In medicine and statistics, a gold standard test usually refers to a diagnostic test or benchmark that is the best available under reasonable conditions. At other times, the term refers to the most accurate test possible without restrictions.
The two meanings differ. In medicine, for example, some conditions can be diagnosed with certainty only by autopsy; for such conditions the gold standard test is the best one that keeps the patient alive, not the autopsy.
The phrase is therefore ambiguous and its meaning should be deduced from the context in which it appears. Part of the ambiguity stems from its usage in economics, where "gold" does not imply "best" but is merely one of many possible standards.
"Gold standard" can also refer to the criteria by which scientific evidence is evaluated. For example, in resuscitation research, the "gold standard" test of a medication or procedure is whether it leads to an increase in the number of neurologically intact survivors who walk out of the hospital. Other types of medical research might regard a significant decrease in 30-day mortality as the gold standard.
The AMA Style Guide prefers the phrase "criterion standard" to "gold standard", and many medical journals now mandate this usage in their instructions for contributors. For instance, Archives of Physical Medicine and Rehabilitation specifies this usage. When the criterion is a whole clinical testing procedure, it is usually referred to as a clinical definition or clinical case definition.
A hypothetical ideal "gold standard" test has a sensitivity of 100% with respect to the presence of the disease (it identifies all individuals with a well-defined disease process; it has no false-negative results) and a specificity of 100% (it never identifies someone as having a condition they do not have; it has no false-positive results). In practice, there is sometimes no true "gold standard" test, and the best available test falls short of this ideal. Such tests are called "imperfect" or "alloyed" gold standards.
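The two measures can be illustrated with a short Python sketch. It assumes the usual definitions: sensitivity is the share of diseased individuals the test flags, and specificity is the share of disease-free individuals it clears. The function names and the patient counts are hypothetical and chosen only for illustration; they do not come from any study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of individuals with the disease that the test identifies."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of individuals without the disease that the test clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical cohort: 50 people with the disease, 150 without.
# An ideal test makes no errors at all.
ideal_sens = sensitivity(true_pos=50, false_neg=0)
ideal_spec = specificity(true_neg=150, false_pos=0)

# An "imperfect" test on the same cohort misses 5 cases and
# raises 15 false alarms (illustrative numbers only).
imperfect_sens = sensitivity(true_pos=45, false_neg=5)
imperfect_spec = specificity(true_neg=135, false_pos=15)

print(f"ideal:     sensitivity={ideal_sens:.0%}, specificity={ideal_spec:.0%}")
print(f"imperfect: sensitivity={imperfect_sens:.0%}, specificity={imperfect_spec:.0%}")
```

A false negative lowers only sensitivity and a false positive lowers only specificity, which is why the ideal test must have neither.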
As new diagnostic methods become available, the "gold standard" test may change over time. For instance, for the diagnosis of aortic dissection, the "gold standard" test used to be the aortogram, which had a sensitivity as low as 83% and a specificity as low as 87%. With advances in magnetic resonance imaging, the magnetic resonance angiogram (MRA) has become the new "gold standard" test for aortic dissection, with a sensitivity of 95% and a specificity of 92%. Until a new test gains widespread acceptance, the former test retains its status as the "gold standard."