Hunt–McIlroy algorithm

In computer science, the Hunt–McIlroy algorithm is a solution to the longest common subsequence problem. It was one of the first non-heuristic algorithms used in diff. To this day, variations of this algorithm are found in incremental version control systems, , and molecular phylogenetics research software.

The research accompanying the final version of Unix diff, written by Douglas McIlroy, was published in the 1976 paper "An Algorithm for Differential File Comparison", co-written with James W. Hunt, who developed an initial prototype of diff.

The Hunt–McIlroy algorithm is a modification to a basic solution for the longest common subsequence problem. The solution is modified so that there are lower time and space requirements for the algorithm when it is working with typical inputs.

Let A_i be the ith line of the first file.

Let B_j be the jth line of the second file.

Let P_ij be the length of the longest common subsequence for the first i lines of the first file and the first j lines of the second file.

Consider the files A and B.

A contains three lines:

B contains three lines:

The steps the above algorithm would perform to determine the length of the longest common subsequence for both files are shown in the diagram. The algorithm correctly reports that the longest common subsequence of the two files is two lines long.

The above algorithm has worst-case time and space complexities of $O(mn)$ (see big O notation), where m is the number of lines in file A and n is the number of lines in file B. The Hunt–McIlroy algorithm modifies this algorithm to have a worst case time complexity of $O(mn\log m)$ and space complexity of $O(mn)$ , though it regularly beats the worst-case with typical inputs.

...
Wikipedia