In computer programming, a characterization test (also known as Golden Master Testing) is a means to describe (characterize) the actual behavior of an existing piece of software, and therefore protect existing behavior of legacy code against unintended changes via automated testing. This term was coined by Michael Feathers.
The goal of characterization tests is to help developers verify that the modifications made to a reference version of a software system did not modify its behavior in unwanted or undesirable ways. They enable, and provide a safety net for, extending and refactoring code that does not have adequate unit tests.
In James Bachs and Michael Boltons classification of test oracles, this kind of testing corresponds to the historical oracle. In contrast to the usual approach of assertions-based software testing, the outcome of the test is not determined by individual values or properties (that are checked with assertions), but by comparing a complex result of the tested software-process as a whole with a the result of the same process in a previous version of the software. In a sense, characterization testing inverts traditional testing: Traditional tests check individual properties (whitelists them), where characterization testing checks all properties that are not removed (blacklisted).
When creating a characterization test, one must observe what outputs occur for a given set of inputs. Given an observation that the legacy code gives a certain output based on given inputs, then a test can be written that asserts that the output of the legacy code matches the observed result for the given inputs. For example, if one observes that f(3.14) == 42, then this could be created as a characterization test. Then, after modifications to the system, the test can determine if the modifications caused changes in the results when given the same inputs.
Unfortunately, as with any testing, it is generally not possible to create a characterization test for every possible input and output. As such, many people opt for either statement or branch coverage. However, even this can be difficult. Test writers must use their judgment to decide how much testing is appropriate. It is often sufficient to write characterization tests that only cover the specific inputs and outputs that are known to occur, paying special attention to edge cases.