*** Welcome to piglix ***

Chomsky normal form


In formal language theory, a context-free grammar G is said to be in Chomsky normal form (first described by Noam Chomsky) if all of its production rules are of the form:

where A, B, and C are nonterminal symbols, a is a terminal symbol (a symbol that represents a constant value), S is the start symbol, and ε denotes the empty string. Also, neither B nor C may be the start symbol, and the third production rule can only appear if ε is in L(G), namely, the language produced by the context-free grammar G.

Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be transformed into an equivalent one which is in Chomsky normal form and has a size no larger than the square of the original grammar's size.

To convert a grammar to Chomsky normal form, a sequence of simple transformations is applied in a certain order; this is described in most textbooks on automata theory. The presentation here follows Hopcroft, Ullman (1979), but is adapted to use the transformation names from Lange, Leiß (2009). Each of the following transformations establishes one of the properties required for Chomsky normal form.

Introduce a new start symbol S0, and a new rule

where S is the previous start symbol. This doesn't change the grammar's produced language, and S0 won't occur on any rule's right-hand side.

To eliminate each rule

with a terminal symbol a being not the only symbol on the right-hand side, introduce, for every such terminal, a new nonterminal symbol Na, and a new rule

Change every rule

to

If several terminal symbols occur on the right-hand side, simultaneously replace each of them by its associated nonterminal symbol. This doesn't change the grammar's produced language.

Replace each rule

with more than 2 nonterminals X1,...,Xn by rules

where Ai are new nonterminal symbols. Again, this doesn't change the grammar's produced language.


...
Wikipedia

...