Similar Literature
14 similar documents found (search time: 0 ms)
1.
Statistical agencies often release a masked or perturbed version of survey data to protect the confidentiality of respondents' information. Ideally, a perturbation procedure should provide confidentiality protection without much loss of data quality, so that the released data may practically be treated as original data for making inferences. One major objective is to control the risk of correctly identifying any respondent's records in released data by matching the values of some identifying or key variables. For categorical key variables, we propose a new approach to measuring identification risk and setting strict disclosure control goals. The general idea is to ensure that the probability of correctly identifying any respondent or surveyed unit is at most ξ, which is pre-specified. Then, we develop an unbiased post-randomisation procedure that achieves this goal for ξ > 1/3. The procedure allows substantial control over possible changes to the original data, and the variance it induces is of a lower order of magnitude than the sampling variance. We apply the procedure to a real data set, where it performs consistently with the theoretical results and, importantly, shows very little loss of data quality.
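As a rough illustration of the post-randomisation idea (a generic invariant PRAM step, not the authors' specific ξ-calibrated procedure), the sketch below perturbs a categorical key variable with the transition matrix P = θI + (1-θ)1p′, which leaves expected category frequencies unchanged. All data and parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def invariant_pram(values, theta=0.9, rng=rng):
    """Apply a simple invariant PRAM step to a categorical vector.

    Each record keeps its category with probability theta and is
    otherwise re-drawn from the empirical category distribution p.
    The transition matrix P = theta*I + (1-theta)*1p' satisfies
    p'P = p', so expected category frequencies are unchanged (the
    'unbiased' property the abstract refers to).
    """
    cats, inverse = np.unique(values, return_inverse=True)
    p = np.bincount(inverse) / len(values)      # empirical distribution
    k = len(cats)
    P = theta * np.eye(k) + (1 - theta) * np.ones((k, 1)) * p
    out = np.array([rng.choice(k, p=P[c]) for c in inverse])
    return cats[out]

# hypothetical key variable (e.g. a region code) for 1000 respondents
region = rng.choice(["N", "S", "E", "W"], size=1000, p=[0.4, 0.3, 0.2, 0.1])
masked = invariant_pram(region, theta=0.9)
print("original freqs:", {c: round((region == c).mean(), 3) for c in "NSEW"})
print("masked   freqs:", {c: round((masked == c).mean(), 3) for c in "NSEW"})
```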

2.
This paper provides a review of common statistical disclosure control (SDC) methods implemented at statistical agencies for standard tabular outputs containing whole-population counts from a census (either enumerated or based on a register). These methods include record swapping on the microdata prior to tabulation and rounding of entries in the tables after they are produced. The approach for assessing SDC methods is based on a disclosure risk–data utility framework and the need to balance managing disclosure risk against maximizing the amount of information released to users and ensuring high-quality outputs. To carry out the analysis, quantitative measures of disclosure risk and data utility are defined and the methods compared. The analysis shows that record swapping as a sole SDC method leaves a high residual risk of disclosure. Targeted record swapping lowers the disclosure risk, but distorts distributions more. Small-cell adjustments (rounding) protect census tables by eliminating small cells, but only one set of variables and geographies can be disseminated in order to avoid disclosure by differencing nested tables. Full random rounding offers more protection against disclosure by differencing, but margins are typically rounded separately from the internal cells, so tables are not additive. Compared with record swapping, rounding procedures also protect against the perception of disclosure risk, since no small cells appear in the tables. Combining rounding with record swapping raises the level of protection but increases the loss of utility of census tabular outputs. For some statistical analyses, the combination of record swapping and rounding balances, to some degree, the opposing effects that the two methods have on the utility of the tables.
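The full random rounding discussed above can be sketched as follows; this is a generic unbiased random-rounding step to base 3 on hypothetical cell counts, not the specific census implementation evaluated in the paper. A count c with residue r = c mod b is rounded down with probability 1 - r/b and up with probability r/b, so its expected value is preserved and no small cells remain:

```python
import numpy as np

rng = np.random.default_rng(1)

def unbiased_random_round(counts, base=3, rng=rng):
    """Unbiased random rounding of table cells to a multiple of `base`.

    A count c with residue r = c % base is rounded down with
    probability 1 - r/base and up with probability r/base, so
    E[rounded] = c and no small cells (1s and 2s for base 3) remain.
    """
    counts = np.asarray(counts)
    r = counts % base
    up = rng.random(counts.shape) < r / base   # round up with prob r/base
    return counts - r + base * up

table = np.array([[1, 2, 12], [0, 5, 7]])      # hypothetical census cells
print(unbiased_random_round(table))            # every entry a multiple of 3
```

Rounding the margins separately from the internal cells with this step is exactly what breaks additivity, as the abstract notes.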

3.
In this paper we discuss the analysis of data from population-based case-control studies when there is appreciable non-response. We develop a class of estimating equations that are relatively easy to implement. For some important special cases, we also provide efficient semi-parametric maximum-likelihood methods. We compare the methods in a simulation study based on data from the Women's Cardiovascular Health Study discussed in Arbogast et al. (Estimating incidence rates from population-based case-control studies in the presence of non-respondents, Biometrical Journal 44, 227–239, 2002).

4.
In this paper, a new randomized response model is proposed and shown to attain a Cramér–Rao lower bound on variance below that of the model of Singh and Sedory, at equal or greater protection of respondents. A new measure of respondent protection in the setting of the efficient use of two decks of cards, due to Odumade and Singh, is also suggested. The derived Cramér–Rao lower bounds are compared under different situations through exact numerical illustrations. Survey data to estimate the proportion of students who have sometimes driven a vehicle after drinking alcohol and feeling over the legal limit are collected using the proposed randomization device and then analyzed. The proposed randomized response technique is also compared with a black-box technique within the same survey. A method to determine the minimum sample size in randomized response sampling from a small pilot survey is also given.
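The paper's two-deck design is not reproduced here, but the classic Warner (1965) randomized response model illustrates the mechanics such designs build on: each respondent answers the sensitive question or its complement depending on a private randomization, and the sensitive proportion is recovered from the observed "yes" rate. A minimal simulation with hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(7)

def warner_estimate(answers, p):
    """Warner (1965) estimator of a sensitive proportion pi.

    Each respondent is directed, with probability p, to answer the
    sensitive question truthfully and otherwise to answer its
    complement, so P(yes) = p*pi + (1-p)*(1-pi).  Solving for pi
    gives the unbiased estimator below (requires p != 0.5).
    """
    lam = np.mean(answers)                 # observed proportion of 'yes'
    return (lam - (1 - p)) / (2 * p - 1)

# hypothetical: true pi = 0.30 (drove after drinking), design p = 0.7
pi_true, p, n = 0.30, 0.7, 5000
direct = rng.random(n) < p                 # randomization: direct question?
sensitive = rng.random(n) < pi_true        # respondent's true status
answers = np.where(direct, sensitive, ~sensitive)
print(f"estimated pi = {warner_estimate(answers, p):.3f}")
```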

5.
Risk-utility formulations for problems of statistical disclosure limitation are now common. We argue that these approaches are powerful guides for official statistics agencies in thinking about disclosure limitation problems, but that they fall short in essential ways of providing a sound basis for acting on those problems. We illustrate this position in three specific contexts (transparency, tabular data and survey weights), with shorter consideration of two key emerging issues: longitudinal data and the use of administrative data to augment surveys.

6.
Social and economic scientists are tempted to use emerging data sources such as big data to compile information about finite populations as an alternative to traditional survey samples. These data sources generally cover an unknown part of the population of interest, and simply assuming that analyses of these data carry over to the larger population is wrong: the mere volume of data provides no guarantee of valid inference. Tackling this problem with methods originally developed for probability sampling is possible but is shown here to be limited. A wider range of model-based predictive inference methods proposed in the literature is reviewed and evaluated in a simulation study using real-world data on annual mileages by vehicles. We propose to extend this predictive inference framework with machine learning methods for inference from samples generated through mechanisms other than random sampling from a target population. Describing economies and societies using sensor data, internet search data, social media and voluntary opt-in panels is cost-effective and timely compared with traditional surveys, but requires an extended inference framework, as proposed in this article.
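A minimal sketch of the model-based predictive approach extended with a machine learning method, on synthetic data rather than the article's mileage data: fit a model of y given auxiliary x on a selectively observed sample, predict y for the unobserved units, and average over the whole population. The random forest stands in for whichever learner is used:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# hypothetical finite population: auxiliary x known for everyone,
# target y (say, annual mileage) observed only for an opt-in sample
N = 10_000
x = rng.normal(size=(N, 2))
y = 12_000 + 3_000 * x[:, 0] - 1_500 * x[:, 1] ** 2 + rng.normal(0, 1_000, N)

# non-random inclusion: units with large x[:,0] opt in more often
incl_prob = 1 / (1 + np.exp(-1.5 * x[:, 0]))
observed = rng.random(N) < 0.1 * incl_prob     # roughly 5% coverage

# the naive sample mean is biased because inclusion depends on x
print("naive mean:     ", y[observed].mean())

# model-based prediction: learn y|x on the sample, predict the rest
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(x[observed], y[observed])
y_filled = y.copy()
y_filled[~observed] = model.predict(x[~observed])
print("predictive mean:", y_filled.mean())
print("true mean:      ", y.mean())
```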

7.
Many developments have occurred in the practice of survey sampling and survey methodology over the past 60 years or so. These developments have been driven partly by the emergence of computers and the continuous growth in computing power over the years, and partly by the increasingly sophisticated demands of users of survey data. The paper reviews these developments with a main emphasis on survey sampling issues in the design and analysis of social surveys. Design-based inference from probability samples was the predominant approach in the early years, but over time that predominance has been eroded by the need to employ model-dependent methods to deal with missing data and to satisfy analysts' demands for survey estimates that cannot be met with design-based methods. With the continuing decline in response rates in recent years, much current research has focused on the use of non-probability samples and data collected from administrative records and web surveys.

8.
This paper deals with the estimation of the mean of a spatial population. Under a design-based approach to inference, an estimator assisted by a penalized spline regression model is proposed and studied. Proof that the estimator is design-consistent and has a normal limiting distribution is provided. A simulation study is carried out to investigate the performance of the new estimator and its variance estimator, in terms of relative bias, efficiency, and confidence interval coverage rate. The results show that gains in efficiency over standard estimators in classical sampling theory may be impressive.
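A sketch of the general model-assisted idea under simple random sampling, assuming a ridge-penalized truncated-linear spline as the assisting model (the paper's exact estimator and its variance estimator are not reproduced): predict every population unit from the fitted spline, then correct with design-weighted sample residuals:

```python
import numpy as np

rng = np.random.default_rng(11)

def pspline_fit(x, y, knots, lam=1.0):
    """Fit a penalized linear spline m(x) by ridge-penalized least squares.

    Basis: intercept, x, and truncated lines (x - k)_+ at each knot;
    the penalty lam shrinks only the knot coefficients.
    """
    def basis(z):
        return np.column_stack([np.ones_like(z), z] +
                               [np.clip(z - k, 0, None) for k in knots])
    B = basis(x)
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))   # penalize knots only
    beta = np.linalg.solve(B.T @ B + lam * D, B.T @ y)
    return lambda znew: basis(znew) @ beta

# hypothetical spatial population with a smooth trend in y
N, n = 5_000, 200
x = rng.uniform(0, 10, N)
y = np.sin(x) + 0.1 * x + rng.normal(0, 0.3, N)

s = rng.choice(N, n, replace=False)                # simple random sample
m = pspline_fit(x[s], y[s], knots=np.linspace(1, 9, 15), lam=5.0)

# model-assisted (difference) estimator of the population mean:
# predictions for all N units plus design-weighted sample residuals
pi = n / N                                         # SRS inclusion probability
est = m(x).mean() + np.sum((y[s] - m(x[s])) / pi) / N
print("model-assisted estimate:", est, " true mean:", y.mean())
```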

9.
A rich theory of production and of the analysis of productive efficiency has developed since the pioneering work of Tjalling C. Koopmans and Gerard Debreu. Michael J. Farrell published the first empirical study, and it appeared in a statistical journal (Journal of the Royal Statistical Society), even though the article provided no statistical theory. The literature in econometrics, management science, operations research and mathematical statistics has since been enriched by hundreds of papers developing or implementing new tools for analysing the productivity and efficiency of firms. Both parametric and non-parametric approaches have been proposed. The mathematical challenge is to derive estimators of production, cost, revenue or profit frontiers, which represent, in the case of production frontiers, the optimal loci of combinations of inputs (such as labour, energy and capital) and outputs (the products or services produced by the firms). Optimality is defined in terms of various economic considerations, and the efficiency of a particular unit is then measured by its distance to the estimated frontier. The statistical problem can be viewed as estimating the support of a multivariate random variable, subject to shape constraints, in multiple dimensions. These techniques are applied in thousands of papers in the economics and business literature. This 'guided tour' reviews the development of the various non-parametric approaches since the early work of Farrell, and describes the remaining challenges and open issues in this arena.
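One of the simplest non-parametric frontier estimators of this kind is the free disposal hull (FDH); as a flavour of the techniques reviewed, the sketch below computes Farrell input-oriented efficiency scores for hypothetical firms:

```python
import numpy as np

rng = np.random.default_rng(5)

def fdh_input_efficiency(X, Y):
    """Farrell input-oriented efficiency scores under the FDH
    (free disposal hull) frontier estimator.

    For unit i, theta_i = min over units j producing at least as
    much output (Y_j >= Y_i componentwise) of max_k X_jk / X_ik:
    the largest proportional input contraction after which unit i
    is still dominated by an observed unit.  theta = 1 means
    FDH-efficient; smaller theta means less efficient.
    """
    theta = np.ones(len(X))
    for i in range(len(X)):
        dominating = np.all(Y >= Y[i], axis=1)         # at least as productive
        ratios = np.max(X[dominating] / X[i], axis=1)  # per-unit contraction
        theta[i] = ratios.min()                        # unit i itself gives 1
    return theta

# hypothetical firms: 2 inputs (labour, capital), 1 output
X = rng.uniform(1, 10, size=(50, 2))
Y = (X.prod(axis=1, keepdims=True) ** 0.4) * rng.uniform(0.6, 1.0, (50, 1))
theta = fdh_input_efficiency(X, Y)
print(f"{(theta == 1).sum()} FDH-efficient firms; mean score {theta.mean():.3f}")
```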

10.
We consider a new method of semiparametric statistical estimation for continuous-time moving-average Lévy processes. We derive the convergence rates of the proposed estimators and show that these rates are optimal in the minimax sense.

11.
Mean profiles are widely used as indicators of the electricity consumption habits of customers. Currently, at Électricité de France (EDF), class load profiles are estimated using point-wise mean profiles. Unfortunately, the mean is well known to be highly sensitive to the presence of outliers, such as one or more consumers with unusually high levels of consumption. In this paper, we propose an alternative to the mean profile: the L1-median profile, which is more robust. When dealing with large sets of functional data (load curves, for example), survey sampling approaches are useful for estimating the median profile without storing the whole data set. We propose several sampling strategies and estimators for the median trajectory, and compare them on a test population. We develop a stratification based on the linearized variable which substantially improves the accuracy of the estimator compared with simple random sampling without replacement. We also suggest an improved estimator that takes auxiliary information into account. Some potential areas for future research are also highlighted.
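The L1-median of a set of curves (the geometric median in the space of discretized load profiles) can be computed with Weiszfeld's fixed-point iteration. A minimal sketch on hypothetical load curves, ignoring the survey sampling layer the paper focuses on, shows the robustness to heavy consumers:

```python
import numpy as np

rng = np.random.default_rng(9)

def l1_median(curves, tol=1e-8, max_iter=500):
    """Weiszfeld's algorithm for the L1-median (geometric median) of
    functional data stored as rows of `curves`.

    Iterates m <- sum(w_i * x_i) / sum(w_i) with w_i = 1/||x_i - m||,
    the fixed point of the L1-median estimating equation.
    """
    m = curves.mean(axis=0)                 # start at the mean profile
    for _ in range(max_iter):
        d = np.linalg.norm(curves - m, axis=1)
        d = np.maximum(d, 1e-12)            # guard against division by zero
        w = 1.0 / d
        m_new = (w[:, None] * curves).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            break
        m = m_new
    return m

# hypothetical daily load curves (48 half-hours) with a few outliers
t = np.linspace(0, 2 * np.pi, 48)
curves = 1 + 0.5 * np.sin(t) + rng.normal(0, 0.1, (200, 48))
curves[:5] *= 10                            # heavy consumers (outliers)
print("mean profile at noon:     ", curves.mean(axis=0)[24])
print("L1-median profile at noon:", l1_median(curves)[24])
```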

12.
The growth of non-response rates in social science surveys has led to increased concern about the risk of non-response bias. Unfortunately, the non-response rate is a poor indicator of when non-response bias is likely to occur. In this paper we consider a set of alternative indicators. A large-scale simulation study is used to explore how each of these indicators performs in a variety of circumstances. Although, as expected, none of the indicators fully depicts the impact of non-response on survey estimates, we discuss how they can be used to build a plausible account of the risk of non-response bias for a survey. We also describe an interesting characteristic of the fraction of missing information that may be helpful in diagnosing not-missing-at-random mechanisms in certain situations.
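One widely used alternative indicator, which may or may not be among the exact set studied in this paper, is the representativeness (R-) indicator of Schouten et al. (2009), based on the variability of estimated response propensities. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(13)

def r_indicator(X, responded):
    """Representativeness (R-) indicator: R = 1 - 2 * sd(rho_hat),
    where rho_hat are response propensities estimated from auxiliary
    variables X available for the whole sample.  R near 1 means the
    response does not vary with X; smaller R warns of a higher risk
    of non-response bias, whatever the response rate is.
    """
    rho = (LogisticRegression(max_iter=1000)
           .fit(X, responded).predict_proba(X)[:, 1])
    return 1 - 2 * rho.std()

# hypothetical sample: response depends strongly on an auxiliary
# variable, so the response rate alone understates the bias risk
n = 5_000
X = rng.normal(size=(n, 1))
responded = rng.random(n) < 1 / (1 + np.exp(-(0.2 + 1.2 * X[:, 0])))
print(f"response rate: {responded.mean():.2f}, "
      f"R-indicator: {r_indicator(X, responded):.3f}")
```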

13.
In this paper, we consider the use of auxiliary data and paradata for dealing with non-response and measurement errors in household surveys. Three overarching purposes are distinguished: response enhancement, statistical adjustment, and bias exploration. Attention is given to the varying focus at the different phases of statistical production, from collection and processing to analysis, and to how to select and utilize useful auxiliary data and paradata. Administrative registers provide the richest source of relevant auxiliary information, in addition to data collected in previous surveys and censuses. Given their importance in dealing effectively with non-sampling errors, every effort should be made to increase the availability of such data in the statistical system and, at the same time, to develop efficient statistical methods that capitalize on the combined data sources.

14.
Response-adaptive designs are used increasingly in applications, especially in early-phase clinical trials. This paper reviews a particular class of response-adaptive designs that have the property of selecting the superior treatment with probability tending to one, a desirable property from an ethical point of view in clinical trials. The model underlying such designs is a randomly reinforced urn. The paper provides an overview of results for these designs, from the early paper of Durham and Yu (1990) to the recent work of Flournoy, May, Moler and Plo (2010).
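A minimal simulation of a two-colour randomly reinforced urn with Bernoulli responses (illustrative only; parameters hypothetical) shows the property described above, with the allocation proportion of the better arm drifting toward one:

```python
import numpy as np

rng = np.random.default_rng(17)

def randomly_reinforced_urn(p, n_draws=20_000, rng=rng):
    """Simulate a two-colour randomly reinforced urn.

    Start with one ball of each colour.  At each step, draw a ball
    with probability proportional to the urn composition, assign the
    corresponding treatment, and on a success (Bernoulli p[arm]) add
    one extra ball of the drawn colour.  When p[0] > p[1], the
    proportion of arm-0 assignments tends to one.
    """
    balls = np.array([1.0, 1.0])
    assigned = np.zeros(2)
    for _ in range(n_draws):
        arm = 1 if rng.random() < balls[1] / balls.sum() else 0
        assigned[arm] += 1
        if rng.random() < p[arm]:          # success: reinforce this colour
            balls[arm] += 1
    return assigned / n_draws

# hypothetical success probabilities: arm 0 is the superior treatment
print("allocation proportions:", randomly_reinforced_urn(p=[0.7, 0.4]))
```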

