Data comparison

In computing, file comparison is the calculation and display of the differences and similarities between data objects, typically text files such as source code.

The methods, implementations, and results are typically called a diff, after the Unix diff utility. The output may be presented in a graphical user interface or used as part of larger tasks in networks, file systems, or revision control.

Some widely used file comparison programs are diff, cmp, FileMerge, WinMerge, Beyond Compare, and Microsoft File Compare.

Many text editors and word processors perform file comparison to highlight the changes to a document.

Most file comparison tools find the longest common subsequence between two files. Any data not in the longest common subsequence is presented as an insertion or deletion.
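As a minimal sketch of this idea (not the implementation used by any particular tool), the following Python code builds a dynamic-programming table of longest-common-subsequence lengths and walks it to label each line as kept, deleted, or inserted. The function names `lcs_table` and `simple_diff` are illustrative assumptions, and the quadratic table is practical only for small inputs.

```python
def lcs_table(a, b):
    """table[i][j] = length of the LCS of a[i:] and b[j:]."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def simple_diff(a, b):
    """Return (tag, line) pairs: ' ' = common, '-' = deletion, '+' = insertion."""
    table = lcs_table(a, b)
    i = j = 0
    ops = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Line belongs to the longest common subsequence.
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] loses nothing from the LCS: report a deletion.
            ops.append(("-", a[i]))
            i += 1
        else:
            # Otherwise b[j] is not in the LCS: report an insertion.
            ops.append(("+", b[j]))
            j += 1
    # Whatever remains on either side is outside the LCS.
    ops.extend(("-", line) for line in a[i:])
    ops.extend(("+", line) for line in b[j:])
    return ops


if __name__ == "__main__":
    for tag, line in simple_diff(["a", "b", "c"], ["a", "c", "d"]):
        print(tag, line)
```

For the inputs `["a", "b", "c"]` and `["a", "c", "d"]`, the common subsequence is `a, c`, so `b` is reported as a deletion and `d` as an insertion.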

In 1978, Paul Heckel published an algorithm that identifies most moved blocks of text. The algorithm is used in the IBM History Flow tool. Some other file comparison programs also detect block moves.

Some specialized file comparison tools find the longest increasing subsequence between two files. The rsync protocol uses a rolling hash function to compare two files on two remote computers with low communication overhead.
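The rolling-hash idea can be sketched as follows. This is a simplified, Adler-32-style weak checksum written for illustration, not rsync's actual code. The names `weak_checksum`, `roll`, and `rolling_checksums` and the modulus `MOD` are assumptions. The key property is that the checksum of a window shifted by one byte is derived from the previous checksum in constant time, rather than being recomputed over the whole block.

```python
MOD = 1 << 16  # assumed modulus for this sketch


def weak_checksum(block):
    """Compute the two checksum components of a byte block from scratch."""
    a = sum(block) % MOD
    # Each byte is weighted by its distance from the end of the block.
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b


def rolling_checksums(data, block_len):
    """Checksums of every block_len-byte window of data, computed incrementally."""
    if len(data) < block_len:
        return []
    a, b = weak_checksum(data[:block_len])
    result = [(a, b)]
    for start in range(1, len(data) - block_len + 1):
        a, b = roll(a, b, data[start - 1], data[start + block_len - 1], block_len)
        result.append((a, b))
    return result


if __name__ == "__main__":
    data = b"the quick brown fox"
    # The incremental values match a from-scratch computation at every offset.
    direct = [weak_checksum(data[i:i + 4]) for i in range(len(data) - 3)]
    print(rolling_checksums(data, 4) == direct)
```

A weak checksum like this one can collide, so a scheme built on it would normally confirm each candidate match with a stronger hash before treating two blocks as identical.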

...
Wikipedia