Scannerless parsing


In computer science, scannerless parsing (also called lexerless parsing) refers to performing both tokenization (breaking a stream of characters into words) and parsing (arranging the words into phrases) in a single step, rather than breaking the work up into a pipeline of a lexer followed by a parser, the two stages executing concurrently. It also refers to the associated grammar: using a single formalism to express both the lexical (word-level) grammar and the phrase-level grammar used to parse a language.
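As a rough illustration, consider the Python sketch below (the toy sum grammar and all names are assumptions for illustration, not taken from any particular tool): a character-level rule for numbers sits alongside a phrase-level rule for sums in the same recursive-descent parser, so input characters are consumed directly and no intermediate token stream is ever built.

    # Scannerless parsing sketch: lexical and phrase rules live in one parser.
    class ScannerlessParser:
        def __init__(self, text: str):
            self.text = text
            self.pos = 0

        def peek(self) -> str:
            # One character of lookahead; "" signals end of input.
            return self.text[self.pos] if self.pos < len(self.text) else ""

        def expect(self, ch: str) -> None:
            if self.peek() != ch:
                raise SyntaxError(f"expected {ch!r} at position {self.pos}")
            self.pos += 1

        def number(self) -> int:
            # Lexical (word-level) rule, written directly over characters.
            start = self.pos
            while self.peek().isdigit():
                self.pos += 1
            if start == self.pos:
                raise SyntaxError(f"expected a digit at position {self.pos}")
            return int(self.text[start:self.pos])

        def expr(self) -> int:
            # Phrase-level rule: expr -> number ("+" number)*
            value = self.number()
            while self.peek() == "+":
                self.expect("+")
                value += self.number()
            return value

    print(ScannerlessParser("1+22+333").expr())  # prints 356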

Dividing processing into a lexer followed by a parser is generally viewed as the better design because it is more modular, and scannerless parsing is primarily used when a clear lexer–parser distinction is unneeded or unwanted. Examples of when this is appropriate include TeX, most wiki grammars, makefiles, simple application-specific scripting languages, and Perl 6.

Eelco Visser identified five key extensions to classical context-free syntax which handle almost all common non-context-free constructs arising in practice:


...
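Two of the extensions Visser describes in his work on scannerless generalized-LR parsing are follow restrictions (enforcing longest match) and reject productions (reserving keywords). The Python sketch below is a loose, hand-rolled approximation of the effect of both, not Visser's formalism; the keyword set and identifier syntax are illustrative assumptions.

    import re

    # In a scannerless grammar the identifier rule is character-level, so it
    # would also match reserved words such as "if"; a reject production rules
    # those matches out, and a follow restriction forces the longest match.
    KEYWORDS = {"if", "else", "while"}  # assumed keyword set
    IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

    def parse_identifier(text: str, pos: int) -> tuple[str, int]:
        m = IDENT.match(text, pos)
        if not m:
            raise SyntaxError(f"expected an identifier at position {pos}")
        word = m.group()       # longest match: never stop at a shorter prefix
        if word in KEYWORDS:   # reject production: identifiers exclude keywords
            raise SyntaxError(f"{word!r} is a reserved word")
        return word, m.end()

    print(parse_identifier("ifoo = 1", 0))  # ('ifoo', 4), not the keyword 'if'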