Star height

In theoretical computer science, more precisely in the theory of formal languages, the star height is a measure for the structural complexity of regular expressions: The star height equals the maximum nesting depth of stars appearing in the regular expression. The concept of star height was first defined and studied by Eggan (1963).

More formally, the star height of a regular expression E over a finite alphabet A is inductively defined as follows:

Here, $\scriptstyle \emptyset$ is the special regular expression denoting the empty set and ε the special one denoting the empty word; E and F are arbitrary regular expressions.

The star height h(L) of a regular language L is defined as the minimum star height among all regular expressions representing L. The intuition is here that if the language L has large star height, then it is in some sense inherently complex, since it cannot be described by means of an "easy" regular expression, of low star height.

While computing the star height of a regular expression is easy, determining the star height of a language can be sometimes tricky. For illustration, the regular expression

over the alphabet A = {a,b} has star height 2. However, the described language is just the set of all words ending in an a: thus the language can also be described by the expression

which is only of star height 1. To prove that this language indeed has star height 1, one still needs to rule out that it could be described by a regular expression of lower star height. For our example, this can be done by an indirect proof: One proves that a language of star height 0 contains only finitely many words. Since the language under consideration is infinite, it cannot be of star height 0.

...
Wikipedia