*** Welcome to piglix ***

Raw data


Raw data, also known as primary data, is data (e.g., numbers, instrument readings, figures, etc.) collected from a source. If a scientist sets up a computerized thermometer which records the temperature of a chemical mixture in a test tube every minute, the list of temperature readings for every minute, as printed out on a spreadsheet or viewed on a computer screen is "raw data". Raw data has not been subjected to processing, "cleaning" by researchers to remove outliers, obvious instrument reading errors or data entry errors, or any analysis (e.g., determining central tendency aspects such as the average or median result). As well, raw data has not been subject to any other manipulation by a software program or a human researcher, analyst or technician. It is also referred to as primary data. Raw data is a relative term (see data), because even once raw data has been "cleaned" and processed by one team of researchers, another team may consider this processed data to be "raw data" for another stage of research. Raw data can be inputted to a computer program or used in manual procedures such as analyzing statistics from a survey. The term "raw data" can refer to the binary data on electronic storage devices, such as hard disk drives (also referred to as "low-level data").

Data has two ways of being created or generated. The first is what is called 'captured data', and is found through purposeful investigation or analysis. The second is called 'exhaust data', and is gathered usually by machines or terminals as a secondary function. For example cash registers, smart phones, and speedometers serve a main function but may collect data as a secondary task. Exhaustive data is usually too large or of little use to process and becomes 'transient' or thrown away. However 'derived' data is that which is useful enough in nature to be further processed for use. Examples include smart phone data, traffic data, and hospital data.

In computing, raw data may have the following attributes: it may possibly contain human, machine, or instrument errors, it may not be validated; it might be in different (colloquial) formats; uncoded or unformatted; or some entries might be "suspect" (e.g., outliers), requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January 1999", "31/01/1999", "31/1/99", "31 Jan", or "today". Once captured, this raw data may be processed stored as a normalized format, perhaps a Julian date, to make it easier for computers and humans to interpret during later processing. Raw data (sometimes colloquially called "sourcey" data or "eggy" data, the latter a reference to the data being "uncooked", that is, "unprocessed", like a raw egg) are the data input to processing. A distinction is made between data and information, to the effect that information is the end product of data processing. Raw data that has undergone processing are sometimes referred to as "cooked" data in a colloquial sense. Although raw data has the potential to be transformed into "information," extraction, organization, analysis and formatting for presentation are required before raw data can be transformed into usable information.


...
Wikipedia

...