
Dark data


Dark data is data that is acquired through various computer network operations but is not used in any way to derive insights or to support decision making. An organisation's ability to collect data can exceed the rate at which it can analyse that data. In some cases the organisation may not even be aware that the data is being collected. IBM estimate that roughly 90 percent of the data generated by sensors and analog-to-digital conversions never gets used.

In an industrial context, dark data can include information gathered by sensors and telematics. The term appears to have been first used and defined by the consulting company Gartner.

Organisations retain dark data for many reasons, and it is estimated that most companies analyse only 1% of their data. Often it is stored for regulatory compliance and record keeping. Some organisations believe that dark data could be useful to them in the future, once they have acquired better analytics and business intelligence technology to process the information. Because storage is inexpensive, keeping the data is easy. However, storing and securing the data usually entails greater expense (or even risk) than the potential return.

Much dark data is unstructured, meaning that the information is in formats that may be difficult to categorise, to read by computer and therefore to analyse. Businesses often leave their dark data unanalysed because of the resources the analysis would take and the difficulty of carrying it out. According to Computer Weekly, 60% of organisations believe that their own business intelligence reporting capability is "inadequate" and 65% say that they have "somewhat disorganised content management approaches".

Many companies in the IT sector are looking at creating "cognitive computer systems" that are able to analyse unstructured dark data. IBM Watson is regarded as a system that could, in future, analyse this unstructured data and produce meaningful results from a large amount of dark data that is at present either practically impossible or very difficult to process. In terms of current systems, IBM have advertised the IBM Spark as a system that "can extract insight from that information almost immediately. This enables businesses to build data rich products and services that use that information to transform the customer experience." IBM also give an even broader definition of dark data, one that includes data that is not currently processed by computing systems but could be, for example in law.


Wikipedia
