Class | Selection algorithm |
---|---|
Data structure | Array |
Worst-case performance | |
Best-case performance | |
Worst-case space complexity | auxiliary |
In computer science, the median of medians algorithm is a selection algorithm based on the quickselect algorithm, and is optimal, having worst-case linear time complexity for selecting the kth largest element. The algorithm finds an approximate median in linear time—this is the key step—which is then used as a pivot in quickselect. In other words, it uses an (asymptotically) optimal approximate median-selection algorithm to build an (asymptotically) optimal general selection algorithm.
The approximate median-selection algorithm can also be used as a pivot strategy in quicksort, yielding an optimal algorithm, with worst-case complexity O(n log n). Although this approach optimizes quite well, it is typically outperformed in practice by instead choosing random pivots, which has average O(n) time for selection and average O(n log n) time for sorting, and avoids the overhead of computing the pivot. Median of medians is used in the hybrid introselect algorithm as a fallback, to ensure worst-case linear performance: introselect starts with quickselect, to obtain good average performance, and then falls back to median of medians if progress is too slow.
The algorithm was published in Blum et al. (1973), and thus is sometimes called BFPRT after the last names of the authors. In the original paper the algorithm was referred to as PICK, referring to quickselect as "FIND".
Quickselect is linear-time on average, but it can require quadratic time with poor pivot choices. This is because quickselect is a divide and conquer algorithm, with each step taking O(n) time in the size of the remaining search set. If the search set decreases exponentially quickly in size (by a fixed proportion), this yields a geometric series times the O(n) factor of a single step, and thus linear overall time. However, if the search set decreases slowly in size, such as linearly (by a fixed number of elements, in the worst case only reducing by one element each time), then a linear sum of linear steps yields quadratic overall time (formally, triangular numbers grow quadratically). For example, the worst case occurs when pivoting on the smallest element at each step, such as applying quickselect for the maximum element to already sorted data and taking the first element as pivot each time.