*** Welcome to piglix ***

Chinese restaurant process


In probability theory, the Chinese restaurant process is a discrete-time , analogous to seating customers at tables in a Chinese restaurant. Imagine a Chinese restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to sit at an occupied table with a probability proportional to the number of customers already there (i.e., they are more likely to sit at a table with many customers than few), or an unoccupied table. At time n, the n customers have been partitioned among m ≤ n tables (or blocks of the partition). The results of this process are exchangeable, meaning the order in which the customers sit does not affect the probability of the final distribution. This property greatly simplifies a number of problems in population genetics, linguistic analysis, and image recognition.

David J. Aldous attributes the restaurant analogy to Jim Pitman and Lester Dubins in his 1983 book.

At any positive-integer time n, the value of the process is a partition Bn of the set {1, 2, 3, ..., n}, whose probability distribution is determined as follows. At time n = 1, the trivial partition { {1} } is obtained with probability 1. At time n + 1 the element n + 1 is either:

The random partition so generated has some special properties. It is exchangeable in the sense that relabeling {1, ..., n} does not change the distribution of the partition, and it is consistent in the sense that the law of the partition of n − 1 obtained by removing the element n from the random partition at time n is the same as the law of the random partition at time n − 1.

The probability assigned to any particular partition (ignoring the order in which customers sit around any particular table) is

where b is a block in the partition B and |b| is the size of b.

This construction can be generalized to a model with two parameters, α and θ, commonly called the discount and strength (or concentration) parameters. At time n + 1, the next customer to arrive finds |B| occupied tables and decides to sit at an empty table with probability


...
Wikipedia

...