A test score is a piece of information, usually a number, that conveys the performance of an examinee on a test. One formal definition is that it is "a summary of the evidence contained in an examinee's responses to the items of a test that are related to the construct or constructs being measured."
Test scores are interpreted with a norm-referenced or criterion-referenced interpretation, or occasionally both. A norm-referenced interpretation means that the score conveys meaning about the examinee with regards to their standing among other examinees. A criterion-referenced interpretation means that the score conveys information about the examinee with regards a specific subject matter, regardless of other examinees' scores.
There are two types of test scores: raw scores and scaled scores. A raw score is a score without any sort of adjustment or transformation, such as the simple number of questions answered correctly. A scaled score is the results of some transformation applied to the raw score.
The purpose of scaled scores is to report scores for all examinees on a consistent scale. Suppose that a test has two forms, and one is more difficult than the other. It has been determined by equating that a score of 65% on form 1 is equivalent to a score of 68% on form 2. Scores on both forms can be converted to a scale so that these two equivalent scores have the same reported scores. For example, they could both be a score of 350 on a scale of 100 to 500.
Two well-known tests in the United States that have scaled scores are the ACT and the SAT. The ACT's scale ranges from 0 to 36 and the SAT's from 200 to 800 (per section). Ostensibly, these two scales were selected to represent a mean and standard deviation of 18 and 6 (ACT), and 500 and 100. The upper and lower bounds were selected because an interval of plus or minus three standard deviations contains more than 99% of a population. Scores outside that range are difficult to measure, and return little practical value.
Note that scaling does not affect the psychometric properties of a test, it is something that occurs after the assessment process (and equating, if present) is completed. Therefore, it is not an issue of psychometrics, per se, but an issue of interpretability.
When tests are scored right-wrong, an important assumption has been made about learning. The number of right answers or the sum of item scores (where partial credit is given) is assumed to be the appropriate and sufficient measure of current performance status. In addition, a secondary assumption is made that there is no meaningful information in the wrong answers.