Systematic sampling

Systematic sampling is a statistical method involving the selection of elements from an ordered sampling frame. The most common form of systematic sampling is an equiprobability method. In this approach, progression through the list is treated circularly, with a return to the top once the end of the list is passed. The sampling starts by selecting an element from the list at random, and then every kth element in the frame is selected, where k, the sampling interval (sometimes known as the skip), is calculated as:

$$k = \frac{N}{n}$$

where n is the sample size, and N is the population size.
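The circular procedure described above can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: the function name is hypothetical, and it assumes that when N/n is not an integer the skip is rounded down to k = N // n, which guarantees that the n selected positions never collide, so every element is selected with probability n/N.

```python
import random

def circular_systematic_sample(frame, n, rng):
    """Circular systematic sample of size n from an ordered frame.

    Assumption: the skip is k = N // n (N/n rounded down when it is not an
    integer), which keeps the n selected positions distinct.
    """
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = N // n                    # sampling interval (skip)
    start = rng.randrange(N)      # random start anywhere in the list (0-based)
    # Step k positions at a time, wrapping to the top of the list past the end.
    return [frame[(start + j * k) % N] for j in range(n)]

rng = random.Random(2024)         # seeded only so the run is reproducible
print(circular_systematic_sample(list(range(1, 21)), 4, rng))
```

Drawing the start from the whole list, rather than from the first k elements, is what lets the wrap-around give every element the same chance even when N is not a multiple of n.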

Using this procedure, each element in the population has a known and equal probability of selection, namely n/N. This makes systematic sampling functionally similar to simple random sampling (SRS). However, it is not the same as SRS, because not every possible sample of a given size has an equal chance of being chosen (e.g. when k > 1, samples containing two elements that are adjacent in the frame will never be chosen by systematic sampling). It can, however, be more efficient than SRS: this is the case when the variance within the systematic sample is greater than the variance of the population.
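To make the contrast with SRS concrete, the sketch below enumerates every possible linear systematic sample (random start in 1..k) for a small frame. It assumes N is an exact multiple of n, and the function name is illustrative. It checks that each unit appears in exactly one of the k possible samples, which gives the equal inclusion probability 1/k = n/N, and that no sample contains two adjacent units.

```python
from math import comb

def all_systematic_samples(N, n):
    """All possible linear systematic samples (1-based labels), assuming N = n * k."""
    if N % n:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n
    return [[start + j * k for j in range(n)] for start in range(1, k + 1)]

N, n = 12, 3
samples = all_systematic_samples(N, n)   # k = 4 possible samples
print(len(samples), comb(N, n))          # 4 220

# Each unit lies in exactly one of the k samples, so P(selected) = 1/k = n/N.
counts = {u: sum(u in s for s in samples) for u in range(1, N + 1)}
print(set(counts.values()))              # {1}

# With k > 1, no sample contains two units that are adjacent in the frame.
print(any(b - a == 1 for s in samples for a, b in zip(s, s[1:])))  # False
```

Only 4 of the 220 possible samples of size 3 can ever occur, even though every unit has the same 1/4 chance of selection.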

Systematic sampling should be applied only if the given population is logically homogeneous, because systematic sample units are uniformly distributed over the population. The researcher must also ensure that the chosen sampling interval does not hide a pattern: any periodicity in the frame that coincides with the interval would threaten randomness.
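The effect of a hidden pattern can be quantified by computing the exact variance of the sample mean over all k equally likely systematic samples and comparing it with the SRS (without replacement) variance (1 - n/N) S²/n. The two toy populations below are made up for illustration: one has a linear trend, the other has a spike at every 5th unit, which matches the interval k = 5. The sketch assumes N is a multiple of n and uses exact fractions.

```python
from fractions import Fraction
from statistics import mean, variance

def var_systematic_mean(y, n):
    """Exact variance of the mean over the k linear systematic samples."""
    N = len(y)
    if N % n:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n
    ybar = mean(y)
    sample_means = [mean(y[s::k]) for s in range(k)]   # one mean per random start
    return mean([(m - ybar) ** 2 for m in sample_means])

def var_srs_mean(y, n):
    """Variance of the mean under SRS without replacement: (1 - n/N) * S^2 / n."""
    N = len(y)
    return (1 - Fraction(n, N)) * variance(y) / n

N, n = 20, 4                                            # k = 5
trend = [Fraction(i) for i in range(1, N + 1)]
periodic = [Fraction(10 if i % 5 == 0 else 0) for i in range(1, N + 1)]

for name, y in [("trend", trend), ("period = k", periodic)]:
    print(name, float(var_systematic_mean(y, n)), float(var_srs_mean(y, n)))
```

With these inputs the exact variances are 2 (systematic) versus 7 (SRS) for the trend, but 16 versus 64/19 ≈ 3.37 for the periodic population: when the pattern lines up with the interval, every sample either always or never hits the spikes.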

Example: Suppose a supermarket wants to study the buying habits of its customers. Using systematic sampling, it can choose every 10th or 15th customer entering the supermarket and conduct the study on this sample.
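In the supermarket case the frame is simply the order of arrival, so the total number of customers need not be known in advance. A minimal sketch, assuming a random start in 1..k followed by every kth arrival; the function name and the customer labels are hypothetical.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def every_kth(stream: Iterable[T], k: int, rng: random.Random) -> Iterator[T]:
    """Yield every k-th item of a stream, starting at a random position in 1..k."""
    start = rng.randint(1, k)                  # arrival number of the first selection
    for position, item in enumerate(stream, start=1):
        if position >= start and (position - start) % k == 0:
            yield item

rng = random.Random(42)                        # seeded only for reproducibility
arrivals = (f"customer-{i}" for i in range(1, 101))   # hypothetical arrival log
print(list(every_kth(arrivals, 10, rng)))
```

Because it is a generator, the same code works on a live feed of arrivals as well as on a stored list.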

This is random sampling with a system: from the sampling frame, a starting point is chosen at random, and subsequent choices are made at regular intervals. For example, suppose you want to sample 8 houses from a street of 120 houses. Since 120/8 = 15, every 15th house is chosen after a random starting point between 1 and 15. If the random starting point is 11, the houses selected are 11, 26, 41, 56, 71, 86, 101, and 116. If, however, every 15th house were a "corner house", this pattern could destroy the randomness of the sample.
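The house example can be reproduced with a small helper. In this sketch the random start is passed in explicitly so that the worked example is repeatable; in practice it would be drawn with something like `random.randint(1, k)`. The function name is illustrative.

```python
def systematic_sample(N, n, start):
    """Linear systematic sample of labels 1..N with integer skip k = N // n.

    `start` is the random starting point, which must lie in 1..k.
    """
    k = N // n
    if not 1 <= start <= k:
        raise ValueError(f"start must be in 1..{k}")
    return [start + j * k for j in range(n)]

print(systematic_sample(120, 8, 11))   # [11, 26, 41, 56, 71, 86, 101, 116]
```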

If, as is more common, the population size is not evenly divisible by the sample size (suppose you want to sample 8 houses out of 125, where 125/8 = 15.625), should you take every 15th house or every 16th house? If you take every 16th house, 8 × 16 = 128, so there is a risk that the last house chosen does not exist. On the other hand, if you take every 15th house, 8 × 15 = 120, so the last five houses will never be selected. Instead, the random starting point should be selected as a noninteger between 0 and 15.625 (including 15.625 but excluding 0) to ensure that every house has an equal chance of being selected; the interval should now be nonintegral (15.625); and each noninteger selected should be rounded up to the next integer. If the random starting point is 3.6, the houses selected are 4, 20, 35, 51, 67, 82, 98, and 113; counting the wrap-around from house 113 back to house 4, there are 3 cyclic intervals of 15 and 5 of 16 (3 × 15 + 5 × 16 = 125).
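The fractional-interval rule can be checked with exact rational arithmetic. The sketch below assumes the start is uniform on (0, k] and uses `fractions.Fraction` so that no floating-point error can push a value that is exactly an integer up to the next house. The second helper, also hypothetical, computes each house's exact inclusion probability by measuring the set of starting points that land on it.

```python
import math
from fractions import Fraction

def fractional_systematic_sample(N, n, start):
    """Systematic sample with non-integer interval k = N/n; points are rounded up."""
    k = Fraction(N, n)
    start = Fraction(start)              # a string such as "3.6" stays exact
    if not 0 < start <= k:
        raise ValueError("start must lie in (0, N/n]")
    return [math.ceil(start + j * k) for j in range(n)]

def inclusion_probability(N, n, house):
    """Exact P(house is selected) when the start is uniform on (0, k]."""
    k = Fraction(N, n)
    covered = Fraction(0)
    for j in range(n):
        # The j-th point hits `house` when start + j*k lies in (house - 1, house].
        lo = max(house - 1 - j * k, Fraction(0))
        hi = min(house - j * k, k)
        covered += max(hi - lo, Fraction(0))
    return covered / k

print(fractional_systematic_sample(125, 8, "3.6"))   # [4, 20, 35, 51, 67, 82, 98, 113]
print({inclusion_probability(125, 8, h) for h in range(1, 126)})   # {Fraction(8, 125)}
```

Every one of the 125 houses comes out with the same probability 8/125, which is the point of using a real-valued start and interval.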

...
Wikipedia