MUSHRA


MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor) is a methodology for subjective listening tests, used to evaluate the perceived quality of the output of lossy audio compression algorithms. It is defined by Recommendation ITU-R BS.1534-3 and is recommended for assessing "intermediate audio quality". For very small audio impairments, Recommendation ITU-R BS.1116-3 (ABC/HR) is recommended instead.

The main advantage over the Mean Opinion Score (MOS) methodology, which serves a similar purpose, is that MUSHRA requires fewer participants to obtain statistically significant results. This is because all codecs are presented at the same time, on the same samples, so that a paired t-test or a repeated-measures ANOVA can be used for statistical analysis. Also, the 0-100 scale makes it possible to rate very small differences.

In MUSHRA, the listener is presented with the reference (labeled as such), a certain number of test samples, a hidden version of the reference and one or more anchors. The recommendation specifies that a low-range and a mid-range anchor should be included in the test signals. These are typically 3.5 kHz and 7 kHz low-pass filtered versions of the reference, respectively. The purpose of the anchors is to bring the scale closer to an "absolute scale", making sure that minor artifacts are not rated as having very bad quality. This is particularly important when comparing or pooling results from different labs.
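The statistical benefit of having every listener rate every codec on the same samples can be illustrated with a small sketch. The scores below are invented for illustration and are not taken from any real listening test; the function names `paired_t` and `welch_t` are likewise hypothetical helpers, written with only the Python standard library. The sketch computes the t statistic twice on the same data: once using the pairing (as a MUSHRA design allows) and once ignoring it (as if two separate groups of listeners had rated the codecs).

```python
import math
import statistics

# Hypothetical MUSHRA scores (0-100) from six listeners who each rated
# both codecs on the same excerpt. The numbers are made up for illustration.
codec_a = [62, 45, 78, 55, 70, 50]
codec_b = [68, 52, 83, 60, 77, 55]


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for a paired t-test on y - x."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


def welch_t(x, y):
    """Return the unpaired (Welch) t statistic, which ignores the pairing."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(y) - statistics.mean(x)) / se


t_paired, df = paired_t(codec_a, codec_b)
t_unpaired = welch_t(codec_a, codec_b)
print(f"paired t   = {t_paired:.2f} (df = {df})")
print(f"unpaired t = {t_unpaired:.2f}")
```

In this made-up data the listeners disagree widely about absolute scores but agree closely on the difference between the two codecs. The paired statistic removes the between-listener spread and is therefore much larger than the unpaired one, which is the reason a within-listener design can reach significance with fewer participants. The standard library has no t-distribution, so the sketch stops at the t statistic; for comparison, the two-sided 5% critical value for 5 degrees of freedom is about 2.571.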

Both MUSHRA and ITU-R BS.1116 tests call for trained expert listeners who know what typical artifacts sound like and where they are likely to occur. Expert listeners have also internalized the rating scale better, which leads to better retest reliability than with untrained listeners. Thus, fewer listeners are needed to achieve statistically significant results.

It is assumed that expert and naive listeners have similar preferences, so that results from expert listeners are also predictive for consumers. In agreement with this assumption, Schinkel-Bielefeld et al. found no differences in rank order between expert and untrained listeners when using test signals containing only timbre artifacts and no spatial artifacts. However, Rumsey et al. showed that for signals containing spatial artifacts, expert listeners weight spatial artifacts slightly more heavily than untrained listeners, who primarily focus on timbre artifacts.


Wikipedia