In computer programming and computer science, "maximal munch" or "longest match" is the principle that when creating some construct, as much of the available input as possible should be consumed.
The earliest known use of this term is by R.G.G. Cattell in his PhD thesis on automatic derivation of code generators for compilers.
For instance, the lexical syntax of many programming languages requires that tokens be built from the maximum possible number of characters from the input stream. This resolves the inherent ambiguity in commonly used regular expressions such as [a-z]+ (one or more lower-case letters), which on their own could match any non-empty prefix of a run of letters.
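The longest-match rule for tokens can be sketched in Python as follows. This is a minimal illustration, not a production lexer: the rule table `TOKEN_RULES` and the function `tokenize` are hypothetical names invented for this example, and ties between rules that match the same length are broken in favour of the rule listed first.

```python
import re

# Hypothetical token rules, for illustration only.
# Earlier rules win ties between matches of equal length.
TOKEN_RULES = [
    ("KEYWORD", re.compile(r"if|else|while")),
    ("IDENT", re.compile(r"[a-z]+")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("OP", re.compile(r"==|=|\+\+|\+")),
]


def tokenize(text):
    """Split text into (kind, lexeme) pairs using the longest-match rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_kind, best_len = None, 0
        for kind, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            # Keep the rule that consumes the most input at this position.
            if m and len(m.group()) > best_len:
                best_kind, best_len = kind, len(m.group())
        if best_kind is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("ifx == 10"))
print(tokenize("a+++b"))
```

In this sketch the input `ifx` becomes a single identifier rather than the keyword `if` followed by `x`, and `a+++b` is split as `a`, `++`, `+`, `b`, because at each position the scanner takes the longest lexeme any rule can match.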
The term is also used in compilers, in the instruction selection stage, to describe a method of "tiling": determining how a structured tree representing a program in an intermediate language should be converted into linear machine code. An entire subtree might be converted into just one machine instruction, and the problem is how to split the tree into non-overlapping "tiles," each representing one machine instruction. An effective strategy is simply to make a tile of the largest subtree possible at any given point, which is called "maximal munch."
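The tiling strategy can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the tree node kinds (`TEMP`, `CONST`, `ADD`, `MEM`), the instruction mnemonics (`LOAD`, `ADDI`, `ADD`, `LI`), and the function names `munch` and `select` do not correspond to any particular compiler or machine. Trees are nested tuples, and at each node the tiles are tried from largest to smallest.

```python
from itertools import count


def munch(tree, code, fresh):
    """Tile `tree` top-down, largest tile first.

    Appends instructions to `code` and returns the register holding the result.
    """
    op = tree[0]
    if op == "TEMP":  # value already lives in a register: no instruction
        return tree[1]
    # Largest tile (3 nodes): MEM(ADD(e, CONST c)) -> one load with an offset.
    if op == "MEM" and tree[1][0] == "ADD" and tree[1][2][0] == "CONST":
        base = munch(tree[1][1], code, fresh)
        dst = f"r{next(fresh)}"
        code.append(f"LOAD {dst}, [{base}+{tree[1][2][1]}]")
        return dst
    # 2-node tile: ADD(e, CONST c) -> add-immediate.
    if op == "ADD" and tree[2][0] == "CONST":
        src = munch(tree[1], code, fresh)
        dst = f"r{next(fresh)}"
        code.append(f"ADDI {dst}, {src}, {tree[2][1]}")
        return dst
    # 1-node tiles: fall back to the smallest pattern for each operator.
    if op == "MEM":
        addr = munch(tree[1], code, fresh)
        dst = f"r{next(fresh)}"
        code.append(f"LOAD {dst}, [{addr}+0]")
        return dst
    if op == "ADD":
        left = munch(tree[1], code, fresh)
        right = munch(tree[2], code, fresh)
        dst = f"r{next(fresh)}"
        code.append(f"ADD {dst}, {left}, {right}")
        return dst
    if op == "CONST":
        dst = f"r{next(fresh)}"
        code.append(f"LI {dst}, {tree[1]}")
        return dst
    raise ValueError(f"no tile matches node {op!r}")


def select(tree):
    """Return (instruction list, result register) for an expression tree."""
    code = []
    result = munch(tree, code, count())
    return code, result


# Tree for: MEM[fp + 8] + 1
tree = ("ADD", ("MEM", ("ADD", ("TEMP", "fp"), ("CONST", 8))), ("CONST", 1))
code, result = select(tree)
print("\n".join(code))
```

For this six-node tree the sketch emits two instructions: the load-with-offset tile covers `MEM`, the inner `ADD`, and `CONST 8`, and the add-immediate tile covers the root `ADD` and `CONST 1`. Choosing the largest tile at each node is a greedy choice; it gives a valid tiling but is not guaranteed to be the cheapest one under every cost model.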
In some situations, "maximal munch" leads to undesirable or unintuitive outcomes. For instance, in the C programming language, the statement x=y/*z; (without any whitespace) will probably lead to a syntax error, since the /* character sequence initiates an (unintended) comment that is either unterminated or terminated by the end token */ of some later, unrelated actual comment (comments in C do not nest). What was actually meant in the statement was to assign to the variable x the result of dividing the value in y by the value obtained by dereferencing pointer z; this would be perfectly valid (though not very common) code. The intended meaning can be expressed by inserting whitespace (x = y / *z;) or parentheses (x=y/(*z);).
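The effect can be reproduced with a deliberately simplified scanner, written here in Python. It is only a sketch under stated assumptions: the function `scan` is a hypothetical helper that recognises identifiers, `/* ... */` comments, and single-character tokens, and it ignores most of C's real lexical grammar. What it shares with a C lexer is that `/*` is recognised as a comment opener whenever those two characters are adjacent.

```python
def scan(src):
    """Split a C-like source string into tokens, discarding /* ... */ comments."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif src.startswith("/*", i):
            # Adjacent "/" and "*" always open a comment, whatever was intended.
            end = src.find("*/", i + 2)
            if end == -1:
                raise SyntaxError("unterminated comment")
            i = end + 2
        elif ch.isalpha() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(src[i:j])
            i = j
        else:
            # Every other character is treated as a one-character token.
            tokens.append(ch)
            i += 1
    return tokens


print(scan("x=y/ *z;"))
print(scan("x=y/(*z);"))
print(scan("x=y/*z; /* note */ w=1;"))
```

With whitespace or parentheses the division and dereference survive as separate tokens. In the last call, the unintended comment runs up to the `*/` of the later, unrelated comment, so `z;` and the start of that comment are silently swallowed. Calling `scan("x=y/*z;")` on its own raises `SyntaxError`, because no closing `*/` is ever found.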