A watermark stored in a data file is a method for ensuring data integrity that combines aspects of data hashing and digital watermarking. Both techniques are useful for tamper detection, though each has its own advantages and disadvantages.
A typical data hash processes an input file to produce an alphanumeric string that is, for practical purposes, unique to that file. If the file is modified, even by a single bit, the same hash process applied to the modified file will, with overwhelming probability, produce a different string. A trusted source can therefore calculate the hash of an original data file, and subscribers can verify the integrity of the data by comparing a hash of the received file with the known hash from the trusted source. There are two possible outcomes: the hashes match or they differ.
If the hash results are the same, the systems involved can have an appropriate degree of confidence in the integrity of the received data. If the hash results differ, they can conclude that the received data file has been altered.
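The comparison described above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the choice of SHA-256 and the function names `file_digest` and `verify` are assumptions made for the example.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string.

    SHA-256 is an assumption for this sketch; any cryptographic hash
    agreed on by the source and the subscriber would serve.
    """
    return hashlib.sha256(data).hexdigest()


def verify(received: bytes, trusted_digest: str) -> bool:
    """Return True when the received data hashes to the trusted digest."""
    return file_digest(received) == trusted_digest


# The trusted source publishes the digest of the original file.
original = b"example payload"
trusted = file_digest(original)

# A subscriber receives a copy in which a single bit has been flipped.
tampered = bytes([original[0] ^ 0x01]) + original[1:]

print(verify(original, trusted))  # unmodified copy
print(verify(tampered, trusted))  # altered copy
```

A match gives confidence in the received data only to the extent that the trusted digest itself was obtained through a reliable channel.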
This process is common in P2P networks, for example the protocol. Once a part of the file is downloaded, the data is checked against the hash key (a step known as a hash check). Depending on the result, the data is kept or discarded.
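A piece-by-piece hash check of this kind might look like the following sketch. The piece size, the use of SHA-256, and the helper names are all assumptions chosen for illustration and do not describe any particular protocol.

```python
import hashlib
from typing import List, Optional

PIECE_SIZE = 4  # hypothetical, deliberately tiny piece size for the example


def split_pieces(data: bytes, size: int = PIECE_SIZE) -> List[bytes]:
    """Split the data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def piece_hashes(data: bytes, size: int = PIECE_SIZE) -> List[str]:
    """Compute the expected hash of each piece of the original data."""
    return [hashlib.sha256(p).hexdigest() for p in split_pieces(data, size)]


def accept_pieces(received: List[bytes],
                  expected: List[str]) -> List[Optional[bytes]]:
    """Keep each piece whose hash matches; replace the others with None."""
    kept: List[Optional[bytes]] = []
    for piece, digest in zip(received, expected):
        if hashlib.sha256(piece).hexdigest() == digest:
            kept.append(piece)   # hash check passed: keep the piece
        else:
            kept.append(None)    # hash check failed: discard the piece
    return kept
```

In this sketch a discarded piece is marked with `None`, so a caller could request that piece again from another peer without re-downloading the pieces that passed.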
Digital watermarking is distinctly different from data hashing. It is the process of altering the original data file so that embedded auxiliary data, referred to as a watermark, can subsequently be recovered.
A subscriber, with knowledge of the watermark and how it is recovered, can determine (to a certain extent) whether significant changes have occurred within the data file. Depending on the specific method used, recovery of the embedded auxiliary data can be robust to post-processing (such as lossy compression).
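As a concrete illustration of embedding and recovery, the sketch below hides watermark bits in the least significant bit (LSB) of 8-bit samples. LSB embedding is an assumption chosen for its simplicity; it is a fragile scheme and would not survive lossy compression, unlike the robust methods mentioned above. The function names are hypothetical.

```python
from typing import List


def embed_lsb(samples: bytes, bits: List[int]) -> bytes:
    """Write each watermark bit into the LSB of the corresponding sample."""
    if len(bits) > len(samples):
        raise ValueError("watermark is longer than the cover data")
    out = bytearray(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        out[i] = (out[i] & 0xFE) | (bit & 1)
    return bytes(out)


def extract_lsb(samples: bytes, count: int) -> List[int]:
    """Recover the first `count` watermark bits from the sample LSBs."""
    return [samples[i] & 1 for i in range(count)]


cover = bytes([200, 13, 77, 54, 255, 0, 128, 9])  # e.g. eight pixel values
mark = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_lsb(cover, mark)
recovered = extract_lsb(marked, len(mark))
```

Each sample changes by at most one intensity level, which is why the altered file stays close to the original, and a subscriber who knows where the bits are stored can read them back.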
If the data file to be retrieved is an image, the provider can embed a watermark for protection purposes. The process allows tolerance to some change, while still maintaining an association with the original image file. Researchers have also developed techniques that embed components of the image within the image. This can help identify portions of the image that may contain unauthorized changes and even help in recovering some of the lost data.
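Locating altered portions of an image can be sketched with a block-wise check. The scheme below is an assumed, simplified stand-in for the research techniques mentioned above: it stores hash-derived check bits in each block's LSBs and only detects changes; it does not embed image content or recover lost data. The block length and function names are hypothetical.

```python
import hashlib
from typing import List

BLOCK = 8  # hypothetical block length in samples (must not exceed 256)


def _check_bits(index: int, block: bytes) -> List[int]:
    """Derive one check bit per sample from the block's upper seven bits."""
    masked = bytes(b & 0xFE for b in block)  # ignore the LSBs themselves
    digest = hashlib.sha256(index.to_bytes(4, "big") + masked).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(len(block))]


def embed(samples: bytes) -> bytes:
    """Store each block's check bits in the LSBs of that block."""
    out = bytearray(samples)
    for start in range(0, len(out), BLOCK):
        block = bytes(out[start:start + BLOCK])
        for offset, bit in enumerate(_check_bits(start // BLOCK, block)):
            out[start + offset] = (out[start + offset] & 0xFE) | bit
    return bytes(out)


def tampered_blocks(samples: bytes) -> List[int]:
    """Return the indices of blocks whose stored bits fail the check."""
    bad = []
    for start in range(0, len(samples), BLOCK):
        block = samples[start:start + BLOCK]
        stored = [b & 1 for b in block]
        if stored != _check_bits(start // BLOCK, block):
            bad.append(start // BLOCK)
    return bad
```

With only eight check bits per block, a change to a block's upper bits would go unnoticed roughly once in 256 cases; a larger block or more check bits would lower that rate at the cost of coarser localization or more distortion.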