On Invariant Post‐randomization for Statistical Disclosure Control |
| |
Authors: | Tapan K Nayak; Samson A Adeshiyan |
| |
Institution: | 1. Center for Disclosure Avoidance Research, U.S. Census Bureau, Washington, DC 20233 and Department of Statistics, George Washington University, Washington, DC 20052, USA; 2. U.S. Energy Information Administration, Washington, DC 20585, USA |
| |
Abstract: | In this paper, we investigate certain operational and inferential aspects of the invariant Post‐randomization Method (PRAM) as a tool for disclosure limitation of categorical data. Invariant PRAM preserves unbiasedness of certain estimators, but inflates their variances and distorts other attributes. We introduce the concept of strongly invariant PRAM, which does not affect data utility or the properties of any statistical method. However, the procedure seems feasible only in limited situations. We review methods for constructing invariant PRAM matrices and prove that a conditional approach, which can preserve the original data on any subset of variables, yields invariant PRAM. For multinomial sampling, we derive expressions for the variance inflation caused by invariant PRAM and for the variances of certain estimators of the cell probabilities, as well as their tight upper bounds. We discuss how to estimate these quantities and thereby assess the loss of statistical efficiency from applying invariant PRAM. We find a connection between invariant PRAM and creating partially synthetic data using a non‐parametric approach, and compare estimation variance under the two approaches. Finally, we discuss some aspects of invariant PRAM in a general survey context. |
| |
Keywords: | Categorical data; randomized response; sampling design; synthetic data; unbiased estimation; variance inflation |
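
As an illustration of what an invariant PRAM matrix is, the sketch below builds one with a two-stage construction known from the PRAM literature. The abstract does not say which construction methods the paper reviews, so this choice is an assumption. Here `p[i][j]` is the probability of recoding category `i` to category `j`, `pi` holds the category proportions, and a matrix `r` is invariant when the expected proportions after perturbation equal `pi`. The function names and all numeric values are hypothetical.

```python
def two_stage_invariant_matrix(p, pi):
    """Return R = P Q, where Q reverses P by Bayes' rule under proportions pi.

    Assumes every perturbed category has positive probability under P and pi.
    """
    k = len(pi)
    # Distribution of the perturbed category after applying P to pi.
    lam = [sum(pi[i] * p[i][j] for i in range(k)) for j in range(k)]
    # q[j][m] = P(original category = m | perturbed category = j).
    q = [[pi[m] * p[m][j] / lam[j] for m in range(k)] for j in range(k)]
    # r[i][m] = sum over j of p[i][j] * q[j][m].
    return [[sum(p[i][j] * q[j][m] for j in range(k)) for m in range(k)]
            for i in range(k)]


def expected_proportions(pi, r):
    """Expected category proportions after perturbing with matrix r."""
    k = len(pi)
    return [sum(pi[i] * r[i][j] for i in range(k)) for j in range(k)]


# Hypothetical category proportions and an arbitrary (non-invariant) PRAM matrix.
pi = [0.5, 0.3, 0.2]
p = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.1, 0.2, 0.7]]

r = two_stage_invariant_matrix(p, pi)
pi_after = expected_proportions(pi, r)
```

In this sketch `pi_after` matches `pi` up to floating-point error, which is the invariance property behind the unbiasedness the abstract mentions. The check says nothing about the variance inflation the paper quantifies.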