Dataset


A data set (or dataset, although this spelling is not present in many contemporary dictionaries) is a collection of data.

Most commonly a data set corresponds to the contents of a single database table, or a single statistical data matrix, where every column of the table represents a particular variable, and each row corresponds to a given member of the data set in question. The data set lists values for each of the variables, such as height and weight of an object, for each member of the data set. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows. The term data set may also be used more loosely, to refer to the data in a collection of closely related tables, corresponding to a particular experiment or event. An example of this type is the data sets collected by space agencies performing experiments with instruments aboard space probes. Data sets that are so large that traditional data processing applications are inadequate to deal with them are known as big data.
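The row-and-column layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the member names, the variable names (`height_cm`, `weight_kg`), the values, and the `column` helper are all hypothetical and do not come from any real data set.

```python
# A minimal sketch of a data set as a table.
# Each dict is one row (one member); each key is one column (one variable).
# All names and values here are made up for illustration.
data_set = [
    {"name": "A", "height_cm": 12.0, "weight_kg": 1.5},
    {"name": "B", "height_cm": 30.5, "weight_kg": 4.2},
    {"name": "C", "height_cm": 21.0, "weight_kg": 2.8},
]

# The variables are the column names shared by every row.
variables = list(data_set[0].keys())

# The number of members equals the number of rows.
n_members = len(data_set)


def column(rows, variable):
    """Return the values of one variable across all members (one column)."""
    return [row[variable] for row in rows]


# One column: the height of every member.
heights = column(data_set, "height_cm")

# One datum: the value of a single variable for a single member.
datum = data_set[1]["weight_kg"]
```

Here `heights` holds one value per member, and `datum` is the single value found where the second row meets the `weight_kg` column.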

In the open data discipline, a dataset is the unit used to measure the information released in a public open data repository. The European Open Data portal aggregates more than half a million datasets. Other definitions have been proposed in this field, but there is currently no official one. Other issues (real-time data sources, non-relational datasets, etc.) increase the difficulty of reaching a consensus.

Historically, the term originated in the mainframe field, where it had a well-defined meaning, very close to that of the contemporary computer file.

Several characteristics define a data set's structure and properties. These include the number and types of the attributes or variables, and various statistical measures applicable to them, such as standard deviation and kurtosis.
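As a rough illustration of such measures, the sketch below computes the standard deviation and kurtosis of a single numeric variable using only the Python standard library. It makes two assumptions that the text does not specify: the values are treated as a whole population rather than a sample, and kurtosis is taken as the fourth central moment divided by the squared variance (not the "excess" form, which subtracts 3). The `describe` function and the sample values are hypothetical.

```python
import statistics


def describe(values):
    """Summarize one numeric variable of a data set.

    Assumes population (not sample) statistics, and defines kurtosis
    as m4 / m2**2, where m2 and m4 are the second and fourth central moments.
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Second and fourth central moments.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "count": n,
        "mean": mean,
        "std_dev": statistics.pstdev(values),  # population standard deviation
        "kurtosis": m4 / m2 ** 2,
    }


# Made-up values for one variable, for illustration only.
summary = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

For these values the mean is 5, the population standard deviation is 2, and the kurtosis under the definition above is 44.5 / 16 = 2.78125.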


Wikipedia
