Differential privacy


In cryptography, differential privacy aims to provide means to maximize the accuracy of queries from statistical databases while minimizing the chances of identifying their records.

Consider a trusted party that holds a dataset of sensitive private information (for example, medical records, movie viewing, or email usage) and would like to provide global, statistical information about the data. Such a system is called a statistical database. However, providing aggregate statistical information about the data may reveal some information about the individuals. In fact, various ad-hoc approaches to anonymizing public records have failed when researchers managed to identify personal information by linking two or more separately innocuous databases. Differential privacy is a framework for formalizing privacy in statistical databases, introduced to protect against these kinds of deanonymization techniques.

For example, in 2007, Netflix offered a $1 million prize for a 10% improvement in its recommendation system. Netflix also released a training dataset for the competing developers to train their systems. While releasing this dataset, they provided a disclaimer: "To protect customer privacy, all personal information identifying individual customers has been removed and all customer ids [sic] have been replaced by randomly assigned ids [sic]."

Netflix is not the only movie-rating portal on the web; there are many others, including IMDb. On IMDb, individuals can register and rate movies, and they have the option of not keeping their details anonymous. Arvind Narayanan and Vitaly Shmatikov, researchers at the University of Texas at Austin, linked the Netflix anonymized training database with the IMDb database (using the date of rating by a user) to partially de-anonymize the Netflix training database, compromising the identity of some users.

Latanya Sweeney from Carnegie Mellon University linked the anonymized GIC database (which retained the birthdate, sex, and ZIP code of each patient) with voter registration records, and was able to identify the medical record of the governor of Massachusetts.
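
An added example, not from the original article: a pre-release check that counts how many records in a de-identified table are unique on the fields the GIC data retained (birthdate, sex, ZIP code), since those shared fields are what allow linking against an outside list. The sample records, the field names and the coarsen helper are constructed for illustration.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("birthdate", "sex", "zip")

# Constructed sample of a "de-identified" release: names removed,
# quasi-identifiers retained (all values are made up).
RELEASE = [
    {"birthdate": "1950-03-14", "sex": "M", "zip": "02139", "diagnosis": "A"},
    {"birthdate": "1950-03-14", "sex": "M", "zip": "02139", "diagnosis": "B"},
    {"birthdate": "1962-11-02", "sex": "F", "zip": "02139", "diagnosis": "C"},
    {"birthdate": "1962-05-20", "sex": "F", "zip": "02141", "diagnosis": "A"},
    {"birthdate": "1971-07-30", "sex": "F", "zip": "02141", "diagnosis": "D"},
    {"birthdate": "1988-01-09", "sex": "M", "zip": "02142", "diagnosis": "B"},
]


def group_sizes(records, keys=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def audit(records, keys=QUASI_IDENTIFIERS):
    """Return (k, unique_fraction): the smallest group size and the share of
    records whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(records, keys)
    if not sizes:
        return 0, 0.0
    unique = sum(1 for n in sizes.values() if n == 1)
    return min(sizes.values()), unique / len(records)


def coarsen(record):
    """Generalize quasi-identifiers: birth year only, 3-digit ZIP prefix."""
    out = dict(record)
    out["birthdate"] = record["birthdate"][:4]
    out["zip"] = record["zip"][:3]
    return out


if __name__ == "__main__":
    print("raw:      ", audit(RELEASE))
    print("coarsened:", audit([coarsen(r) for r in RELEASE]))
```

On the constructed sample, coarsening lowers the share of unique records from 4/6 to 2/6, but the smallest group still contains a single record, so this kind of ad-hoc generalization does not by itself remove the linkage risk.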


...
Wikipedia

...