Abstract:
We construct a decision-theoretic formulation of data swapping in which quantitative measures of disclosure risk and data utility are employed to select one release from a possibly large set of candidates. The decision variables are the swap rate, swap attribute(s) and possibly, constraints on the unswapped attributes. Risk–utility frontiers, consisting of those candidates not dominated in (risk, utility) space by any other candidate, are a principal tool for reducing the scale of the decision problem. Multiple measures of disclosure risk and data utility, including utility measures based directly on use of the swapped data for statistical inference, are introduced. Their behavior and resulting insights into the decision problem are illustrated using data from the Current Population Survey, the well-studied “Czech auto worker data” and data on schools and administrators generated by the National Center for Education Statistics.
