Similar Literature
1.
This paper provides a review of common statistical disclosure control (SDC) methods implemented at statistical agencies for standard tabular outputs containing whole population counts from a census (either enumerated or based on a register). These methods include record swapping on the microdata prior to its tabulation and rounding of entries in the tables after they are produced. The approach for assessing SDC methods is based on a disclosure risk-data utility framework and the need to balance managing disclosure risk against maximizing the amount of information that can be released to users and ensuring high-quality outputs. To carry out the analysis, quantitative measures of disclosure risk and data utility are defined and the methods compared. Conclusions from the analysis show that record swapping as a sole SDC method leaves high probabilities of disclosure risk. Targeted record swapping lowers the disclosure risk, but there is more distortion of distributions. Small cell adjustments (rounding) protect census tables by eliminating small cells, but only one set of variables and geographies can be disseminated in order to avoid disclosure by differencing nested tables. Full random rounding offers more protection against disclosure by differencing, but margins are typically rounded separately from the internal cells and tables are not additive. Rounding procedures protect against the perception of disclosure risk compared to record swapping, since no small cells appear in the tables. Combining rounding with record swapping raises the level of protection but increases the loss of utility of census tabular outputs. For some statistical analyses, the combination of record swapping and rounding balances to some degree the opposing effects that the methods have on the utility of the tables.
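For orientation, the "full random rounding" mentioned here is usually unbiased random rounding to a base such as 3. The following is a minimal, self-contained sketch of that generic technique applied cell by cell to a toy table; it is an illustration, not any agency's production procedure.

```python
import numpy as np

def random_round(counts, base=3, rng=None):
    """Unbiased random rounding of non-negative counts to a multiple of `base`.

    A count with remainder r = count % base is rounded down with probability
    1 - r/base and up with probability r/base, so each rounded cell has the
    same expectation as the original count.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts)
    remainder = counts % base
    round_up = rng.random(counts.shape) < remainder / base
    return counts - remainder + base * round_up.astype(int)

# Toy census-style table with small cells.
table = np.array([[1, 4, 7],
                  [2, 0, 5]])
print(random_round(table, rng=np.random.default_rng(0)))
```

Because internal cells and margins are typically rounded independently, the resulting table is generally not additive, which is exactly the limitation the abstract notes.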

2.
Data fusion or statistical matching techniques merge datasets from different survey samples to achieve a complete but artificial data file which contains all variables of interest. The merging of datasets is usually done on the basis of variables common to all files, but traditional methods implicitly assume conditional independence between the variables never jointly observed, given the common variables. We therefore suggest tackling the data fusion task with more flexible, model-based procedures. Suitable multiple imputation techniques allow the identification problem inherent in statistical matching to be reflected in the imputed values. Here a non-iterative Bayesian version of Rubin's implicit regression model is presented and compared in a simulation study with imputations from a data augmentation algorithm as well as an iterative approach using chained equations.
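For contrast with the model-based, multiple-imputation procedures advocated above, the sketch below shows the traditional matching idea in its simplest regression form: a variable observed only in a donor file is imputed into a recipient file from the common variables, which implicitly invokes the conditional independence assumption. The simulated data, coefficients and use of scikit-learn are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 500

# Donor file observes (X, Z); recipient file observes (X, Y).
# Z and Y are never jointly observed: the identification problem of data fusion.
X_donor = rng.normal(size=(n, 2))
z_donor = X_donor @ np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=n)

X_recipient = rng.normal(size=(n, 2))
y_recipient = X_recipient @ np.array([0.8, 0.2]) + rng.normal(scale=0.3, size=n)

# Traditional matching: model Z from the common variables X in the donor file
# and impute it for the recipients, implicitly assuming Z is independent of Y given X.
z_imputed = LinearRegression().fit(X_donor, z_donor).predict(X_recipient)

# The fused (artificial) file now carries X, Y and an imputed Z for every recipient.
fused = np.column_stack([X_recipient, y_recipient, z_imputed])
print(fused[:3].round(2))
```

A single deterministic imputation of this kind understates uncertainty about the unidentified association between Y and Z; the multiple imputation approaches compared in the paper instead draw several plausible values of Z.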

3.
Vast amounts of data that could be used in the development and evaluation of policy for the benefit of society are collected by statistical agencies. It is therefore no surprise that there is very strong demand from analysts, within business, government, universities and other organisations, to access such data. When allowing access to micro-data, a statistical agency is obliged, often legally, to ensure that access is unlikely to result in the disclosure of information about a particular person or organisation. Managing the risk of disclosure is referred to as statistical disclosure control (SDC). This paper describes an approach to SDC for output from analysis using generalised linear models, including estimates of regression parameters and their variances, diagnostic statistics and plots. The Australian Bureau of Statistics has implemented the approach in a remote analysis system, which returns analysis output from remotely submitted queries. A framework for measuring disclosure risk associated with a remote server is proposed. The disclosure risk and utility of the approach are measured in two real-life case studies and in simulation.

4.
To guard the confidentiality of information provided by respondents, statistical offices apply disclosure limitation techniques. An often-applied technique is to ensure that there are no categories for which the population frequency is presumed to be small ('rare' categories). This is attained by recoding, top-coding or setting values to 'unknown'. Since population frequencies are usually not available, the decision that a category is rare is often based on intuitive considerations. This is a time-consuming process, involving many decisions by the disclosure limitation practitioners. In this paper we explore to what extent sample frequencies can be used to make such decisions. This leads to a procedure that can automatically scan a data set for rare category combinations, where 'rare' is defined by the disclosure limitation policy of the statistical office.
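A minimal sketch of the kind of automatic scan described here, driven by sample frequencies. The variable names, the example records and the frequency threshold are illustrative assumptions; in practice the threshold would follow from the office's disclosure limitation policy.

```python
from collections import Counter
from itertools import combinations

# Toy microdata: each record is a dict of categorical key variables.
records = [
    {"region": "N", "age_band": "20-29", "occupation": "nurse"},
    {"region": "N", "age_band": "20-29", "occupation": "teacher"},
    {"region": "S", "age_band": "60-69", "occupation": "pilot"},
    {"region": "N", "age_band": "20-29", "occupation": "nurse"},
]

def rare_combinations(records, variables, max_size=2, threshold=2):
    """Return every combination of up to `max_size` key variables whose observed
    category combination occurs fewer than `threshold` times in the sample."""
    flagged = {}
    for size in range(1, max_size + 1):
        for subset in combinations(variables, size):
            counts = Counter(tuple(r[v] for v in subset) for r in records)
            rare = {combo: c for combo, c in counts.items() if c < threshold}
            if rare:
                flagged[subset] = rare
    return flagged

for subset, rare in rare_combinations(records, ["region", "age_band", "occupation"]).items():
    for combo, count in rare.items():
        print(subset, combo, "-> frequency", count)
```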

5.
By means of an integration of decision theory and probabilistic models, we explore and develop methods for improving data privacy. Our work encompasses disclosure control tools in statistical databases and privacy requirements prioritization; in particular, we propose a Bayesian approach for on-line auditing in statistical databases and pairwise comparison matrices for privacy requirements prioritization. The first approach is illustrated with examples in the context of statistical analysis of census and medical data, where no salary (resp. medical information) that could be linked to a specific employee (resp. patient) may be released; the second approach is illustrated with examples such as an e-voting system and an e-banking service that have to satisfy privacy requirements in addition to functional and security ones. Several fields in the social sciences, economics and engineering will benefit from advances in this research area: e-voting, e-government, e-commerce, e-banking, e-health, cloud computing and risk management are a few examples of applications for the findings of this research.
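On the requirements-prioritization side, a common way to turn a pairwise comparison matrix into priority weights is the principal-eigenvector method. The sketch below is a generic illustration with invented judgements and requirement names; it does not reproduce the authors' procedure.

```python
import numpy as np

# Pairwise comparison matrix over three hypothetical privacy requirements:
# entry (i, j) records how much more important requirement i is judged than j.
requirements = ["anonymity", "unlinkability", "undetectability"]
pcm = np.array([
    [1.0,   3.0,   5.0],
    [1 / 3, 1.0,   2.0],
    [1 / 5, 1 / 2, 1.0],
])

# Priority weights: the (normalized) eigenvector of the largest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eig(pcm)
principal = np.abs(eigenvectors[:, np.argmax(eigenvalues.real)].real)
weights = principal / principal.sum()

for name, w in zip(requirements, weights):
    print(f"{name:16s} {w:.3f}")
```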

6.
In this paper a model is developed for assessing the disclosure risks of a microdata set. It is an extension of the one presented in Bethlehem et al. (1988). It is used to calculate (an upper bound on) the risk that an investigator is able to re-identify at least one individual in an anonymized data set, and hence disclose some sensitive information about him. This risk is shown to depend on, among other things, two variables which can be controlled by the statistical office disseminating such a data set: the 'coarseness' of the key variables and the size of the data set. The model yields guidelines for the use of these two instruments to control the disclosure risk.
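The sketch below does not reproduce the paper's model; it merely illustrates, on synthetic data, the two instruments the abstract highlights, by showing how coarsening the key variables and changing the number of released records alter the prevalence of unique key combinations, a basic ingredient of re-identification risk.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)

def share_unique(keys):
    """Share of records whose combination of key variables is unique in the file."""
    rows = [tuple(row) for row in keys]
    counts = Counter(rows)
    return sum(counts[r] == 1 for r in rows) / len(rows)

# Synthetic microdata with two key variables recorded at a fine level of detail.
n = 5000
age = rng.integers(0, 100, size=n)            # single years of age
municipality = rng.integers(0, 400, size=n)   # fine geography

fine = np.column_stack([age, municipality])
coarse = np.column_stack([age // 10, municipality // 40])   # age bands, broad regions

print("fine keys,   full file:", share_unique(fine))
print("coarse keys, full file:", share_unique(coarse))
print("fine keys,   10% file :", share_unique(fine[: n // 10]))
```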

7.
This is an expository paper. Here we propose a decision-theoretic framework for addressing aspects of the problem of confidentiality of information in publicly released data. Our basic premise is that the problem needs to be conceptualized by looking at the actions of three agents: a data collector, a legitimate data user, and an intruder. Here we aim to prescribe the actions of the first agent, who desires to provide useful information to the second agent but must protect against possible misuse by the third. The first agent operates under the constraint that the released data have to be public to all, a constraint that may not hold in every society.
A novel aspect of our paper is that all utilities—fundamental to decision making—are in terms of Shannon's information entropy. Thus what gets released is a distribution whose entropy maximizes the expected utility of the first agent. This means that the distribution that gets released will be different from that which generates the collected data. The discrepancy between the two distributions can be assessed via the Kullback–Leibler cross-entropy function. Our proposed strategy therefore boils down to the notion that it is the information content of the data, not the actual data, that gets masked. Current practice of "statistical disclosure limitation" masks the observed data via transformations or cell suppression. These transformations are guided by balancing what are known as "disclosure risks" and "data utility". The entropy indexed utility functions we propose are isomorphic to the above two entities. Thus our approach provides a formal link to that which is currently practiced in statistical disclosure limitation.
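The two quantities the argument turns on are standard and easy to compute; below is a minimal numerical sketch with made-up discrete distributions, where a flatter (higher-entropy) distribution is released and the Kullback–Leibler discrepancy measures how far it sits from the one that generated the collected data.

```python
import numpy as np

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

collected = [0.70, 0.20, 0.10]   # distribution underlying the collected data
released  = [0.55, 0.30, 0.15]   # flatter distribution chosen for release

print("entropy of collected:", round(entropy(collected), 3))
print("entropy of released: ", round(entropy(released), 3))
print("D(collected || released):", round(kl_divergence(collected, released), 4))
```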

8.
Risk-utility formulations for problems of statistical disclosure limitation are now common. We argue that these approaches are powerful guides to official statistics agencies in regard to how to think about disclosure limitation problems, but that they fall short in essential ways of providing a sound basis for acting upon the problems. We illustrate this position in three specific contexts—transparency, tabular data and survey weights, with shorter consideration of two key emerging issues—longitudinal data and the use of administrative data to augment surveys.

9.
This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long-standing idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions.
The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two-step modelling process, in which non-parametric methods such as decision trees and generalized additive models are used to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches.
This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods are applied specifically to this case study, they can be applied in any field, irrespective of the type of response.
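A minimal sketch, on simulated data, of the two-step modelling process described above: a non-parametric learner (here a decision tree) screens for influential variables, and a parsimonious parametric model is then fitted to the selected ones. The data, the 0.05 importance threshold and the use of scikit-learn are illustrative assumptions rather than the paper's actual analysis.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Simulated patient-style data: 8 candidate predictors, only two of which matter.
n, p = 1000, 8
X = rng.normal(size=(n, p))
logits = 1.2 * X[:, 0] - 0.8 * X[:, 3]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Step 1: a decision tree flags candidate predictors via its impurity-based importances.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
selected = np.where(tree.feature_importances_ > 0.05)[0]
print("selected predictors:", selected)

# Step 2: a final parametric (logistic) model is fitted on the selected variables only.
final = LogisticRegression().fit(X[:, selected], y)
print("coefficients:", final.coef_.round(2))
```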

10.
We consider a revenue-maximizing seller who, before proposing a mechanism to sell her object(s), observes a vector of signals correlated with buyers' valuations. Each buyer knows only the signal that the seller observes about him, but not the signals she observes about other buyers. The seller first chooses how to disclose her information and then chooses a revenue-maximizing mechanism. We allow for very general disclosure policies, which can be random, public, private, or any mixture of these possibilities. By disclosing information privately, the seller can create correlation in buyers' private information, which then consists of valuations plus beliefs. For the standard independent private values model, we show that information revelation is irrelevant: irrespective of the disclosure policy, an optimal mechanism for this informed seller generates expected revenue equal to her maximal revenue under full information disclosure. For more general allocation environments that also allow for interdependent values, common values, and multiple items, disclosure policies may matter, and the best the seller can do is to disclose no information at all.

11.
This paper investigates the optimal disclosure strategy for private information in a mixed duopoly market, where a state-owned enterprise (SOE) and a joint-stock company compete to supply products. I construct a model where the two firms compete in either quantity or price, and uncertainty is associated with either marginal cost or market demand. The model identifies the optimal disclosure strategies that constitute a perfect Bayesian equilibrium by type of competition and uncertainty. In Cournot competition, both firms disclose information under cost uncertainty, while only the SOE or neither firm discloses information under demand uncertainty. Alternatively, in Bertrand competition, only the joint-stock company discloses information under cost uncertainty or demand uncertainty. Recently, developed countries have required the same level of disclosure standards for SOEs as for ordinary joint-stock companies. The findings described in this paper warn that such mandatory disclosure by SOEs can trigger a reaction by joint-stock companies, putting the economy at risk of a reduction in welfare.

12.
Environmental information disclosure is an important source of information for banks when assessing the environmental risk of a firm or project. Taking listed companies in China's environmentally sensitive industries over 2011-2016 as the research sample, a fixed-effects model is used to examine empirically how the quality of environmental information disclosure affects firms' cost of debt financing. The results show that overall disclosure quality has no significant effect on lowering the cost of debt; when environmental information is divided into monetary and non-monetary information, however, the quality of monetary environmental disclosure is found to reduce the cost of debt financing significantly. How to enhance the role of non-monetary environmental information is an issue that deserves attention in future research.

13.
In this paper, we investigate certain operational and inferential aspects of the invariant Post-randomization Method (PRAM) as a tool for disclosure limitation of categorical data. Invariant PRAM preserves unbiasedness of certain estimators, but inflates their variances and distorts other attributes. We introduce the concept of strongly invariant PRAM, which does not affect data utility or the properties of any statistical method. However, the procedure seems feasible in limited situations. We review methods for constructing invariant PRAM matrices and prove that a conditional approach, which can preserve the original data on any subset of variables, yields invariant PRAM. For multinomial sampling, we derive expressions for the variance inflation inflicted by invariant PRAM and for the variances of certain estimators of the cell probabilities, along with their tight upper bounds. We discuss estimation of these quantities and thereby the assessment of statistical efficiency loss from applying invariant PRAM. We find a connection between invariant PRAM and creating partially synthetic data using a non-parametric approach, and compare estimation variance under the two approaches. Finally, we discuss some aspects of invariant PRAM in a general survey context.
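For orientation, the sketch below applies PRAM with an invariant transition matrix to a single categorical variable. One simple (and by no means the only) construction satisfying the invariance condition πᵀP = πᵀ is P = θI + (1 − θ)1πᵀ, where π holds the observed category proportions; the data and the choice θ = 0.9 are illustrative, and this is not the paper's conditional construction.

```python
import numpy as np

rng = np.random.default_rng(4)

# Observed categorical variable (codes 0, 1, 2) and its sample proportions pi.
data = rng.choice(3, size=2000, p=[0.5, 0.3, 0.2])
pi = np.bincount(data, minlength=3) / data.size

# Invariant PRAM matrix: P = theta * I + (1 - theta) * 1 pi^T, so that pi^T P = pi^T.
theta = 0.9
P = theta * np.eye(3) + (1 - theta) * np.outer(np.ones(3), pi)

# Release: each record's category is redrawn from the row of P indexed by its true value.
released = np.array([rng.choice(3, p=P[value]) for value in data])

print("original proportions:", np.round(pi, 3))
print("released proportions:", np.round(np.bincount(released, minlength=3) / data.size, 3))
```

Invariance keeps the expected released frequencies equal to the observed ones, but the extra randomness inflates estimator variances, which is the loss the paper quantifies.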

14.
In this paper, we study a Bayesian approach to flexible modeling of conditional distributions. The approach uses a flexible model for the joint distribution of the dependent and independent variables and then extracts the conditional distributions of interest from the estimated joint distribution. We use a finite mixture of multivariate normals (FMMN) to estimate the joint distribution. The conditional distributions can then be assessed analytically or through simulations. The discrete variables are handled through the use of latent variables. The estimation procedure employs an MCMC algorithm. We provide a characterization of the Kullback–Leibler closure of FMMN and show that the joint and conditional predictive densities implied by the FMMN model are consistent estimators for a large class of data generating processes with continuous and discrete observables. The method can be used as a robust regression model with discrete and continuous dependent and independent variables and as a Bayesian alternative to semi- and non-parametric models such as quantile and kernel regression. In experiments, the method compares favorably with classical nonparametric and alternative Bayesian methods.
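As a rough frequentist stand-in for the idea (maximum likelihood via scikit-learn's GaussianMixture rather than the paper's MCMC algorithm, and simulated data), the sketch below fits a finite mixture of bivariate normals to the joint distribution of (x, y) and extracts E[y | x] by standard Gaussian conditioning within each component.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)

# Simulated joint data with a nonlinear, heteroscedastic relationship.
n = 3000
x = rng.normal(size=n)
y = np.sin(2 * x) + rng.normal(scale=0.2 + 0.2 * (x > 0), size=n)

# Fit a finite mixture of bivariate normals to the joint distribution of (x, y).
gmm = GaussianMixture(n_components=5, random_state=0).fit(np.column_stack([x, y]))

def conditional_mean(x0):
    """E[y | x = x0] implied by the fitted mixture, via Gaussian conditioning."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    # Component responsibilities given only x = x0.
    dens_x = np.array([norm.pdf(x0, m[0], np.sqrt(c[0, 0])) for m, c in zip(means, covs)])
    resp = weights * dens_x
    resp /= resp.sum()
    # Conditional mean of y within each component, then mix.
    cond = np.array([m[1] + c[1, 0] / c[0, 0] * (x0 - m[0]) for m, c in zip(means, covs)])
    return float(resp @ cond)

print("E[y | x = 1.0] =", round(conditional_mean(1.0), 3),
      " (truth sin(2) =", round(np.sin(2.0), 3), ")")
```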

15.
The relationship between disclosure quality and cost of equity capital is an important topic in today's economy. In general, economic theory and anecdotal evidence suggest a negative association. Empirical work on this link, however, is confronted with major methodological drawbacks – neither disclosure level nor cost of capital can be observed directly – and has documented somewhat confounding results so far. Adopting a finite horizon version of the residual income model, I provide evidence on the nature of the above relationship and try to quantify the effect of a firm's voluntary disclosure policy on its implied cost of capital. Switzerland seems especially suited for an analysis of this kind given that Swiss firms have considerable reporting discretion and the mandated level of disclosure is low. For a cross-sectional sample of seventy-three non-financial companies I show a negative and highly significant association between the two variables. The magnitude is such that the most forthcoming firms enjoy about a 1.8 to 2.4% cost advantage over the least forthcoming firms. The findings persist even after controlling for other potentially influential variables, e.g. risk characteristics and firm size. Furthermore, adjusting for self-selection bias – a major concern in disclosure studies – the marginal effect remains of the same direction and even increases in magnitude, although at lower levels of statistical significance. One reason for the strong relationship might be found in differing institutional factors between the US and Swiss capital markets.

16.
A Mathematical Demonstration of Information Disclosure Regulation
First, the paper analyses the theoretical basis of information disclosure regulation at the conceptual level; second, it develops an in-depth mathematical derivation of disclosure regulation; third, it stresses the limitations of the mathematical argument. The paper argues that information asymmetry originates in the generation of information inside the company, and it therefore proposes mechanism-level control of the ERP system through the "三九四" programme so as to achieve the management goal of information symmetry.

17.
An important statistical application is the problem of determining an appropriate set of input variables for modelling a response variable. In such an application, candidate models are characterized by which input variables are included in the mean structure. A reasonable approach to gauging the propriety of a candidate model is to define a discrepancy function through the prediction error associated with this model. An optimal set of input variables is then determined by searching for the candidate model that minimizes the prediction error. In this paper, we focus on a Bayesian approach to estimating a discrepancy function based on prediction error in linear regression. It is shown how this approach provides an informative method for quantifying model selection uncertainty.
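As a simple frequentist stand-in for the idea of scoring candidate sets of input variables by a prediction-error discrepancy, the sketch below compares nested linear models by cross-validated squared error on simulated data; the Bayesian estimation of the discrepancy, and of the uncertainty around it, is what the paper itself develops.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)

# Simulated data: only the first two of four candidate inputs drive the response.
n = 400
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

candidates = {"x1": [0], "x1,x2": [0, 1], "x1,x2,x3": [0, 1, 2], "x1..x4": [0, 1, 2, 3]}

for label, cols in candidates.items():
    scores = cross_val_score(LinearRegression(), X[:, cols], y,
                             scoring="neg_mean_squared_error", cv=5)
    print(f"{label:9s} estimated prediction error: {-scores.mean():.3f}")
```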

18.
Microaggregation is a popular statistical disclosure control technique for continuous data. The basic principle of microaggregation is to group the observations in a data set and to replace them by their corresponding group means. However, while reducing the disclosure risk of data files, the technique also affects the results of statistical analyses. The paper deals with the impact of microaggregation on a multiple linear regression in continuous variables. We show that parameter estimates are biased if the dependent variable is used to form the groups. Using this result, we develop a consistent estimator that removes the aggregation bias, and derive its asymptotic covariance matrix.
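A minimal sketch of the basic principle and of the bias the paper analyses: values are sorted, grouped in blocks of k, and replaced by the block means. When the dependent variable is used to form the groups, the regression slope is distorted; grouping on the regressor leaves it essentially intact. The data and k = 3 are illustrative, and the consistent estimator derived in the paper is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(7)

def microaggregate(values, k=3, sort_by=None):
    """Replace each value by the mean of its group; groups of size k are formed
    by sorting on `sort_by` (by default, on the values themselves)."""
    sort_by = values if sort_by is None else sort_by
    order = np.argsort(sort_by)
    out = np.empty(len(values), dtype=float)
    for start in range(0, len(values), k):
        idx = order[start:start + k]
        out[idx] = values[idx].mean()
    return out

# Simple linear model y = 2x + noise.
n = 300
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=2.0, size=n)

slope = lambda a, b: np.polyfit(a, b, 1)[0]   # OLS slope of b on a
print("original data:              ", round(slope(x, y), 2))
print("x microaggregated on x:     ", round(slope(microaggregate(x), y), 2))
print("both aggregated in y-groups:", round(slope(microaggregate(x, sort_by=y),
                                                  microaggregate(y, sort_by=y)), 2))
```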

19.
This paper studies an alternative quasi likelihood approach under possible model misspecification. We derive a filtered likelihood from a given quasi likelihood (QL), called a limited information quasi likelihood (LI-QL), that contains relevant but limited information on the data generation process. Our LI-QL approach, on the one hand, extends the robustness of the QL approach to inference problems for which the existing approach does not apply. Our study in this paper, on the other hand, builds a bridge between the classical and Bayesian approaches to statistical inference under possible model misspecification. We can establish a large sample correspondence between the classical QL approach and our LI-QL based Bayesian approach. An interesting finding is that the asymptotic distribution of an LI-QL based posterior and that of the corresponding quasi maximum likelihood estimator share the same "sandwich"-type second moment. Based on the LI-QL we can develop inference methods that are useful for practical applications under possible model misspecification. In particular, we can develop the Bayesian counterparts of classical QL methods that carry all the nice features of the latter studied in White (1982). In addition, we can develop a Bayesian method for analyzing model specification based on an LI-QL.
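For reference, the "sandwich"-type second moment referred to here is the familiar robust asymptotic covariance of the quasi maximum likelihood estimator from White (1982); writing $q(y;\theta)$ for the quasi likelihood and $\theta_{*}$ for the pseudo-true parameter, it takes the standard form (stated here only as background, under the usual regularity conditions)

```latex
\sqrt{n}\,\bigl(\hat{\theta}_{\mathrm{QML}} - \theta_{*}\bigr)
  \;\xrightarrow{\;d\;}\;
  N\!\bigl(0,\; A(\theta_{*})^{-1} B(\theta_{*}) A(\theta_{*})^{-1}\bigr),
\qquad
A(\theta) = -\,\mathbb{E}\!\left[\frac{\partial^{2}\log q}{\partial\theta\,\partial\theta^{\top}}\right],
\quad
B(\theta) = \mathbb{E}\!\left[\frac{\partial\log q}{\partial\theta}\,
                              \frac{\partial\log q}{\partial\theta^{\top}}\right].
```

The abstract's claim is that the LI-QL based posterior has an asymptotic second moment of this same sandwich form.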

20.
In recent years, the determinants of voluntary disclosure have been explored in an extensive body of empirical research. One major limitation of those studies is that none has tried to find out whether voluntary disclosures were occasional or continuous over time. Yet this point is particularly important, as the voluntary disclosure mechanism can only be fully effective if the manager consistently reports the same items. This paper examines the factors associated with the decision to stop disclosing an item of information previously published voluntarily (henceforth 'information withholding' or IW). To measure information withholding, we code 178 annual reports of French firms for three consecutive years. Although disclosure scores are relatively stable over time, we find that this does not mean there is no change in voluntary disclosure across the years. We document that IW is a widespread practice: on average, one voluntary item out of seven disclosed in a given year is withheld the following year. We show that information withholding is mainly related to the firm's competition environment, ownership diffusion, board independence and the existence of a dual leadership structure (separate CEO and chairman).
