*** Welcome to piglix ***

Ehrenfeucht–Mycielski sequence


The Ehrenfeucht–Mycielski sequence is a recursively defined sequence of binary digits with pseudorandom properties, defined by Andrzej Ehrenfeucht and Jan Mycielski (1992).

The sequence starts with the single bit 0; each successive digit is formed by finding the longest suffix of the sequence that also occurs earlier within the sequence, and complementing the bit following the most recent earlier occurrence of that suffix. (The empty string is a suffix and prefix of every string.) For example, the first few steps of this construction process are:

The first few digits of the sequence are:

A naive algorithm that generates each bit in the sequence by comparing each suffix with the entire previous sequence could take as much as O(n3) time to generate the first n bits of the sequence; however, as Herman & Soltys (2009) show using a data structure related to a suffix tree, the sequence can be generated much more efficiently, in constant time per generated digit.

The sequence is disjunctive; i.e., every finite subsequence of bits occurs contiguously, infinitely often within the sequence (Ehrenfeucht & Mycielski 1992). More explicitly, the position by which every subsequence of length i can be guaranteed to have occurred at least j times is at most A(4i,j), where A is the Ackermann function (Herman & Soltys 2009). Experimentally, however, each subsequence appears much earlier in this sequence than this upper bound would suggest: the position by which all length-i sequences occur, up to the limit of experimental testing, is close to the minimum possible value it could be, 2i + i, the position by which a de Bruijn sequence contains all length-i substrings (Sutner 2003).

Ehrenfeucht & Mycielski (1992) conjecture that the numbers of 0 and 1 bits each converge to a limiting density of 1/2. That is, if f(i) denotes the number of 0 bits in the first i positions of the Ehrenfeucht–Mycielski sequence, then it should be the case that


...
Wikipedia

...