Motivation: Certain cells in statistical tables are risky or sensitive from a confidentiality standpoint. For count data, small cells may be sensitive; for magnitude data, cells in which one contributor can closely estimate another’s contribution may be sensitive. The earliest form of statistical disclosure limitation (SDL) was to remove sensitive cells from published tables, replacing each value with a symbol (“D” for disclosure); this is primary (cell) suppression. Primary suppression alone usually provides inadequate protection: for example, if only one cell contributing to a published total is suppressed, its value can be recovered by subtracting the remaining cells from the total. Nevertheless, many local, state, and national data-releasing organizations today rely on primary suppression alone for SDL, particularly in the public health arena.
Complementary cell suppression (CCS) amounts to suppressing additional, non-sensitive cells in order to thwart close estimation of sensitive cell values. CCS is an NP-hard mathematical problem, yet it has stimulated serious work and publications in statistics, mathematics, computer science, and operations research since the 1970s. The current state of practice and software ranges from sophisticated mixed integer linear programming formulations, to ad hoc approaches, to downright surrender (primary suppression only). Principled formulations (sometimes mathematically optimal, sometimes not) and software for large-scale CCS have been developed and successfully deployed at the US Census Bureau, at Statistics Canada, and within the European Union from the late 1970s onward.
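As a minimal illustration of why primary suppression alone can fail when a published total has a single suppressed contributor, the Python sketch below recovers such a cell by subtraction and shows that a second suppression in the same row leaves only an interval. The row values, the function names, and the assumption that all cells are nonnegative are hypothetical and chosen only for illustration; real tables also carry column totals and other linear relations that an intruder could exploit.

```python
from typing import Optional


def recover_single_suppressed(cells: list[Optional[int]], total: int) -> list[int]:
    """Fill in a row's suppressed cell (None) when exactly one cell is missing."""
    missing = [i for i, v in enumerate(cells) if v is None]
    if len(missing) != 1:
        raise ValueError("exact recovery needs exactly one suppressed cell")
    known_sum = sum(v for v in cells if v is not None)
    filled = list(cells)
    # The suppressed value is whatever is left of the published total.
    filled[missing[0]] = total - known_sum
    return filled


def bounds_for_suppressed(cells: list[Optional[int]], total: int) -> dict[int, tuple[int, int]]:
    """Assuming nonnegative cells, bound each suppressed cell using the row total only."""
    known_sum = sum(v for v in cells if v is not None)
    upper = total - known_sum
    return {i: (0, upper) for i, v in enumerate(cells) if v is None}


# Hypothetical row: None marks a cell published as "D".
published_total = 70

primary_only = [12, None, 30, 25]
print(recover_single_suppressed(primary_only, published_total))  # [12, 3, 30, 25]

with_complement = [12, None, None, 25]
print(bounds_for_suppressed(with_complement, published_total))  # {1: (0, 33), 2: (0, 33)}
```

In this sketch the second suppression widens what an intruder can infer from the row total from an exact value to the interval 0 to 33; whether such an interval is wide enough is the kind of question a CCS formulation has to answer.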
CCS creates “tables with holes” that challenge even sophisticated statistical analysis because primary disclosures are not separately identified and because missingness is the result of deterministic, not stochastic, processes. Nevertheless, CCS is like Listerine: people hate it but use it twice a day.
Objectives: Practical approaches, mathematical models, and deployment of CCS vary widely, both internationally and within the US Federal statistical system. The objective of this workshop is to examine CCS and to move Fedstat agencies and other participants toward a common understanding of its characteristics and the challenges it poses, including its effects on data quality and usability. Motivating questions are: (a) What is best practice for CCS? (b) How should users approach statistical analysis of tables with suppressions, particularly tables of magnitude data? (c) What can releasing agencies do to facilitate and simplify analysis of suppressed tabular data?
Event Type
- NISS Hosted