*** Welcome to piglix ***

Shift-reduce parser


A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used for parsing programming languages, LR parsing and its variations, are shift-reduce methods. The precedence parsers used before the invention of LR parsing are also shift-reduce methods. All shift-reduce parsers have similar outward effects, in the incremental order in which they build a parse tree or call specific output actions. The outward actions of an LR parser are best understood by ignoring the arcane mathematical details of how LR parser tables are generated, and instead looking at the parser as just some generic shift-reduce method.

A shift-reduce parser scans and parses the input text in one forward pass over the text, without backing up. (That forward direction is generally left-to-right within a line, and top-to-bottom for multi-line inputs.) The parser builds up the parse tree incrementally, bottom up, and left to right, without guessing or backtracking. At every point in this pass, the parser has accumulated a list of subtrees or phrases of the input text that have been already parsed. Those subtrees are not yet joined together because the parser has not yet reached the right end of the syntax pattern that will combine them.

At step 7 in the example parse, only "A = B +" has been parsed. Only the shaded lower-left corner of the parse tree exists. None of the parse tree nodes numbered 8 and above exist yet. Nodes 1, 2, 6, and 7 are the roots of isolated subtrees covering all the items 1..7. Node 1 is variable A, node 2 is the delimiter =, node 6 is the summand B, and node 7 is the operator +. These four root nodes are temporarily held in a parse stack. The remaining unparsed portion of the input stream is "C * 2".

A shift-reduce parser works by doing some combination of Shift steps and Reduce steps, hence the name.

The parser continues with these steps until all of the input has been consumed and all of the parse trees have been reduced to a single tree representing an entire legal input.

At every parse step, the entire input text is divided into parse stack, current lookahead symbol, and remaining unscanned text. The parser's next action is determined by the rightmost stack symbol(s) and the lookahead symbol. The action is read from a table containing all syntactically valid combinations of stack and lookahead symbols.


...
Wikipedia

...