Distortion Measures for Categorical Data Swapping (2003)


Data swapping is a common technique for statistical disclosure limitation, but its effects on real data are not completely understood. In this paper, we consider measures that quantify the distortion that data swapping introduces when the variables in the data set are categorical. These measures are applied to a data set derived from the Current Population Survey. Their behavior is studied and compared across values of the swapping rate and choices of the swapped variable.
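As a rough illustration of the setup described above, the following Python sketch swaps one categorical variable between randomly paired records and then compares the original and swapped contingency tables using two of the listed measures. It is a minimal sketch under stated assumptions, not the procedure from the report: the function names, the toy records, the unconstrained random pairing, and the normalization of the Hellinger distance to the range [0, 1] are all illustrative choices.

```python
import math
import random
from collections import Counter


def swap_variable(records, var_index, rate, rng):
    """Swap one categorical variable between randomly paired records.

    `rate` is the fraction of records selected for swapping (an assumed
    definition of the swapping rate; the report may define it differently).
    """
    swapped = [list(r) for r in records]
    n_selected = int(rate * len(records))
    n_selected -= n_selected % 2  # an even number is needed to form pairs
    chosen = rng.sample(range(len(records)), n_selected)
    for i, j in zip(chosen[0::2], chosen[1::2]):
        swapped[i][var_index], swapped[j][var_index] = (
            swapped[j][var_index],
            swapped[i][var_index],
        )
    return [tuple(r) for r in swapped]


def cell_probabilities(records):
    """Relative frequency of each cell of the full contingency table."""
    counts = Counter(records)
    n = len(records)
    return {cell: c / n for cell, c in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two cell distributions, scaled to [0, 1]."""
    cells = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(c, 0.0)) - math.sqrt(q.get(c, 0.0))) ** 2 for c in cells
    )
    return math.sqrt(total / 2)


def total_variation(p, q):
    """Total variation distance between two cell distributions."""
    cells = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cells)


# Toy data (hypothetical): each record is (sex, employment status).
rng = random.Random(2003)
original = (
    [("F", "employed")] * 40
    + [("F", "unemployed")] * 10
    + [("M", "employed")] * 35
    + [("M", "unemployed")] * 15
)
p = cell_probabilities(original)
for rate in (0.0, 0.1, 0.5):
    q = cell_probabilities(swap_variable(original, 0, rate, rng))
    print(rate, round(hellinger(p, q), 4), round(total_variation(p, q), 4))
```

Because values are only exchanged between records, the marginal distribution of every single variable is unchanged; any distortion shows up in the joint table, which is why the distances are computed on full cells rather than on marginals.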


Keywords: data utility; data confidentiality; statistical disclosure limitation; Hellinger distance; Shannon entropy; total variation distance; contingency coefficient; Cramer's V.

Shanti Gomatam, Alan F. Karr
Publication Date: Wednesday, January 1, 2003
File Attachment: tr131.pdf
Report Number: