Subset construction algorithm

In the theory of computation and automata theory, the powerset construction or subset construction is a standard method for converting a nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA) which recognizes the same formal language. It is important in theory because it establishes that NFAs, despite their additional flexibility, are unable to recognize any language that cannot be recognized by some DFA. It is also important in practice for converting easier-to-construct NFAs into more efficiently executable DFAs. However, if the NFA has n states, the resulting DFA may have up to 2ⁿ states, an exponentially larger number, which sometimes makes the construction impractical for large NFAs.
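The exponential bound is reached by a standard textbook family of NFAs: the language of binary strings whose n-th symbol from the end is 1. It has an NFA with n + 1 states. The Python sketch below builds that NFA's transition function directly, rather than taking it as input; the function names are hypothetical. It then counts the subsets of NFA states reachable from the initial state, which are exactly the DFA states the powerset construction produces when unreachable subsets are discarded.

```python
from typing import FrozenSet, Set


def step(n: int, current: FrozenSet[int], symbol: str) -> FrozenSet[int]:
    """One transition of the (n+1)-state NFA for "the n-th symbol from the end is 1".

    State 0 loops on every symbol and, on reading "1", may also guess that this
    1 is the n-th symbol from the end (moving to state 1); states 1..n-1 then
    advance on any symbol, and state n (the accepting state) has no successors.
    """
    nxt: Set[int] = set()
    for q in current:
        if q == 0:
            nxt.add(0)
            if symbol == "1":
                nxt.add(1)
        elif q < n:
            nxt.add(q + 1)
        # q == n: no outgoing transitions
    return frozenset(nxt)


def count_reachable_subsets(n: int) -> int:
    """Count the subsets of NFA states reachable from {0} over the alphabet {0, 1}."""
    start = frozenset({0})
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for symbol in "01":
            target = step(n, current, symbol)
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return len(seen)


for n in range(1, 9):
    print(f"NFA states: {n + 1}, reachable DFA states: {count_reachable_subsets(n)}")
```

Each reachable subset records which of the last n input symbols were 1s, so the loop reports 2, 4, 8 and so on up to 256 DFA states, that is 2ⁿ, for NFAs with only n + 1 states.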

The construction, sometimes called the Rabin–Scott powerset construction (or subset construction) to distinguish it from similar constructions for other types of automata, was first published by Michael O. Rabin and Dana Scott in 1959.

To simulate the operation of a DFA on a given input string, one needs to keep track of a single state at any time: the state that the automaton will reach after seeing a prefix of the input. In contrast, to simulate an NFA, one needs to keep track of a set of states: all of the states that the automaton could reach after seeing the same prefix of the input, according to the nondeterministic choices made by the automaton. If, after a certain prefix of the input, a set $S$ of states can be reached, then after the next input symbol $x$ the set of reachable states is a deterministic function of $S$ and $x$. Therefore, the sets of reachable NFA states play the same role in the NFA simulation as single DFA states play in the DFA simulation, and in fact the sets of NFA states appearing in this simulation may be reinterpreted as being states of a DFA.
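The set-tracking simulation can be sketched in Python. The sketch assumes a hypothetical encoding in which the NFA's transitions are a dictionary from `(state, symbol)` pairs to sets of successor states. The example NFA, which accepts binary strings whose second-to-last symbol is 1, is likewise only an illustration.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

# Hypothetical encoding: delta[(state, symbol)] is the set of successor states;
# a missing key means the NFA has no transition on that symbol.
Transitions = Dict[Tuple[Hashable, str], Set[Hashable]]


def step(delta: Transitions, current: FrozenSet[Hashable], symbol: str) -> FrozenSet[Hashable]:
    """Return every state reachable from some state in `current` on `symbol`.

    The result depends only on `current` and `symbol`, which is what lets a
    set of NFA states act as a single DFA state.
    """
    nxt: Set[Hashable] = set()
    for q in current:
        nxt |= delta.get((q, symbol), set())
    return frozenset(nxt)


def nfa_accepts(delta: Transitions, start: Hashable, accepting: Iterable[Hashable], word: str) -> bool:
    """Simulate the NFA by tracking the set of states reachable after each prefix."""
    current = frozenset({start})
    for symbol in word:
        current = step(delta, current, symbol)
    # Accept if at least one reachable state is accepting.
    return not current.isdisjoint(accepting)


# Example NFA over {0, 1}: accepts strings whose second-to-last symbol is 1.
delta: Transitions = {
    (0, "0"): {0},
    (0, "1"): {0, 1},  # nondeterministic guess: this 1 is second-to-last
    (1, "0"): {2},
    (1, "1"): {2},
}
accepting = {2}
```

For instance, on the input `"0110"` the tracked sets are {0}, {0}, {0, 1}, {0, 1, 2} and finally {0, 2}. The final set contains the accepting state 2, so `nfa_accepts(delta, 0, accepting, "0110")` returns `True`.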

The powerset construction applies most directly to an NFA that does not allow state transitions without consuming input symbols (also known as "ε-moves"). Such an automaton may be defined as a 5-tuple $(Q, \Sigma, T, q_0, F)$, in which $Q$ is the set of states, $\Sigma$ is the set of input symbols, $T$ is the transition function (mapping a state and an input symbol to a set of states), $q_0$ is the initial state, and $F$ is the set of accepting states. The corresponding DFA has one state for each subset of $Q$. The initial state of the DFA is $\{q_0\}$, the (one-element) set of initial states. The transition function of the DFA maps a state $S$ (representing a subset of $Q$) and an input symbol $x$ to the set $T(S, x) = \bigcup \{T(q, x) \mid q \in S\}$, the set of all states that can be reached by an $x$-transition from a state in $S$. A state $S$ of the DFA is an accepting state if and only if at least one member of $S$ is an accepting state of the NFA, that is, if $S \cap F \neq \emptyset$.
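The construction for an NFA without ε-moves can be sketched in Python. The sketch assumes a hypothetical dictionary encoding of $T$ that maps `(state, symbol)` pairs to sets of states, and uses frozensets as DFA states. The definition above gives the DFA a state for every subset of $Q$. The sketch instead generates only the subsets reachable from $\{q_0\}$, found by breadth-first search. This common practical refinement does not change the recognized language. All function and variable names are illustrative.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

NFAState = Hashable
DFAState = FrozenSet[NFAState]


def subset_construction(
    alphabet: Iterable[str],
    delta: Dict[Tuple[NFAState, str], Set[NFAState]],
    q0: NFAState,
    accepting: Set[NFAState],
) -> Tuple[Dict[Tuple[DFAState, str], DFAState], DFAState, Set[DFAState]]:
    """Return (DFA transition table, DFA start state, DFA accepting states).

    Only subsets reachable from {q0} are generated; the empty set shows up as
    a dead state if some input leaves the NFA with no possible state.
    """
    symbols = list(alphabet)
    start: DFAState = frozenset({q0})
    dfa_delta: Dict[Tuple[DFAState, str], DFAState] = {}
    dfa_accepting: Set[DFAState] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        S = queue.popleft()
        # S is accepting iff it contains at least one accepting NFA state.
        if not S.isdisjoint(accepting):
            dfa_accepting.add(S)
        for x in symbols:
            # T(S, x) = union of T(q, x) over all q in S.
            target = frozenset(r for q in S for r in delta.get((q, x), ()))
            dfa_delta[(S, x)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_delta, start, dfa_accepting


def dfa_accepts(dfa_delta, start, dfa_accepting, word: str) -> bool:
    """Run the constructed DFA, which has exactly one current state at every step."""
    S = start
    for x in word:
        S = dfa_delta[(S, x)]
    return S in dfa_accepting


# Example NFA over {0, 1}: accepts strings whose second-to-last symbol is 1.
nfa_delta = {(0, "0"): {0}, (0, "1"): {0, 1}, (1, "0"): {2}, (1, "1"): {2}}
dfa_delta, start, dfa_accepting = subset_construction("01", nfa_delta, 0, {2})
```

For this three-state NFA the construction yields four reachable DFA states, {0}, {0, 1}, {0, 2} and {0, 1, 2}, out of the 2³ = 8 subsets of $Q$. The last two are accepting because they contain the NFA's accepting state 2.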

...
Wikipedia