*** Welcome to piglix ***

Statistical disclosure control


Statistical disclosure control (SDC), also known as statistical disclosure limitation (SDL) or disclosure avoidance, is a technique used in data-driven research to ensure no person or organization is identifiable from the results of an analysis of survey or administrative data, or in the release of microdata. The purpose of SDC is to protect the confidentiality of the respondents and subjects of the research.

There are two main approaches to SDC: principles-based and rules-based. In principles-based systems, disclosure control attempts to upload a specific set of fundamental principles—for example, "no person should be identifiable in released microdata". Rules-based systems, in contrast, are evidenced by a specific set of rules that a person performing disclosure control follows, after which the data are presumed to be safe to release. Using this taxonomy, proposed by Ritchie and Elliot in 2013, disclosure control based on differential privacy can be seen as a principles-based approach, whereas controls based on de-identification, such as the US Health Insurance Portability and Accountability Act's Privacy Rule's Safe Harbor method for de-identifying protected health information can be seen as a rule-based system.

Many kinds of social, economic and health research use potentially sensitive data as a basis for their research, such as survey or Census data, tax records, health records, educational information, etc. Such information is usually given in confidence, and, in the case of administrative data, not always for the purpose of research.

Researchers are not usually interested in information about one single person or business; they are looking for trends among larger groups of people. However, the data they use is, in the first place, linked to individual people and businesses, and SDC ensures that these cannot be identified from published data, no matter how detailed or broad.

It is possible that at the end of data analysis, the researcher somehow singles out one person or business through their research. For example, a researcher may identify the exceptionally good or bad service in a geriatric department within a hospital in a remote area, where only one hospital provides such care. In that case, the data analysis 'discloses' the identity of the hospital, even if the dataset used for analysis was properly anonymised or de-identified.


...
Wikipedia

...