Similar Literature
20 similar documents found (search time: 33 ms)
1.
In this paper, we investigate certain operational and inferential aspects of the invariant Post-Randomization Method (PRAM) as a tool for disclosure limitation of categorical data. Invariant PRAM preserves unbiasedness of certain estimators, but inflates their variances and distorts other attributes. We introduce the concept of strongly invariant PRAM, which does not affect data utility or the properties of any statistical method. However, the procedure seems feasible only in limited situations. We review methods for constructing invariant PRAM matrices and prove that a conditional approach, which can preserve the original data on any subset of variables, yields invariant PRAM. For multinomial sampling, we derive expressions for the variance inflation inflicted by invariant PRAM and the variances of certain estimators of the cell probabilities, together with their tight upper bounds. We discuss estimation of these quantities and thereby assess the statistical efficiency loss from applying invariant PRAM. We find a connection between invariant PRAM and the creation of partially synthetic data using a non-parametric approach, and compare estimation variance under the two approaches. Finally, we discuss some aspects of invariant PRAM in a general survey context.
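As a concrete illustration of the kind of construction reviewed above, the sketch below builds an invariant PRAM matrix by a standard two-step device: an arbitrary transition matrix Q is post-multiplied by a suitably reweighted transpose so that the category proportions are preserved in expectation. The proportions, the choice of Q and the use of true (rather than estimated) proportions are illustrative assumptions, not the paper's own procedure.

```python
# Minimal sketch: construct an invariant PRAM matrix and apply it (illustrative values).
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.5, 0.3, 0.2])                 # category proportions (assumed known here)
Q = np.array([[0.8, 0.1, 0.1],                 # any starting transition matrix (rows sum to 1)
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

lam = Q.T @ pi                                 # expected distribution after applying Q
R = (Q * pi[:, None]).T / lam[:, None]         # r_kl = q_lk * pi_l / lam_k
P = Q @ R                                      # invariant PRAM matrix: pi' P = pi'

assert np.allclose(pi @ P, pi)                 # invariance check
assert np.allclose(P.sum(axis=1), 1.0)         # P is a proper transition matrix

# Apply PRAM: each record's category is perturbed according to its row of P
x = rng.choice(3, size=10_000, p=pi)
x_pram = np.array([rng.choice(3, p=P[c]) for c in x])
print(np.bincount(x) / x.size, np.bincount(x_pram) / x_pram.size)
```

The invariance property pi' P = pi' is what keeps simple frequency-based estimators unbiased after perturbation, at the cost of the variance inflation discussed in the abstract.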

2.
Data sharing in today's information society poses a threat to individual privacy and organisational confidentiality. k-anonymity is a widely adopted model to prevent the owner of a record from being re-identified. By generalising and/or suppressing certain portions of the released dataset, it guarantees that no record can be uniquely distinguished from at least k−1 other records. A key requirement of the k-anonymity problem is to minimise the information loss resulting from data modifications. This article proposes a top-down approach to solve this problem. It first treats each record as a vertex and the similarity between two records as the edge weight to construct a complete weighted graph. Then, an edge-cutting algorithm is designed to divide the complete graph into multiple trees/components. Large components with more than 2k−1 vertices are subsequently split to guarantee that each resulting component has between k and 2k−1 vertices. Finally, the generalisation operation is applied to the vertices in each component (i.e. equivalence class) to make sure all the records inside have identical quasi-identifier values. We prove that the proposed approach runs in polynomial time and has a theoretical performance guarantee of O(k). The empirical experiments show that our approach yields substantial improvements over the baseline heuristic algorithms, as well as over the bottom-up approach with the same approximation bound O(k). Compared with the baseline bottom-up O(log k)-approximation algorithm, when the required k is smaller than 50, the adopted top-down strategy achieves similar performance in terms of information loss while requiring much less computing time. This demonstrates that our approach is a strong choice for the k-anonymity problem when both data utility and runtime matter, especially when k is set to a value smaller than 50 and the record set is large enough that runtime must be taken into account.
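The paper's contribution is the graph-based partitioning; the sketch below only illustrates the final generalisation step on a toy table, using a naive sort-and-chunk partition in place of the edge-cutting algorithm. The column choices and range-style generalisation are illustrative assumptions.

```python
# Minimal sketch: generalise numeric quasi-identifiers within equivalence classes of size >= k.
import numpy as np

def k_anonymise(qi, k):
    """qi: (n, d) array of numeric quasi-identifiers; returns a generalised string table."""
    order = np.lexsort(qi.T[::-1])                  # sort records by their quasi-identifiers
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:     # fold a short tail into the previous group
        groups[-2] = np.concatenate([groups[-2], groups.pop()])
    out = np.empty(qi.shape, dtype=object)
    for g in groups:
        lo, hi = qi[g].min(axis=0), qi[g].max(axis=0)
        for j in range(qi.shape[1]):
            out[g, j] = f"[{lo[j]}-{hi[j]}]"        # generalised value shared by the whole class
    return out

rng = np.random.default_rng(0)
ages = rng.integers(20, 70, size=12)
zips = rng.integers(10000, 10020, size=12)
print(k_anonymise(np.column_stack([ages, zips]), k=3))   # each row matches >= k-1 others
```

Information loss in this setting is driven by how wide the generalised ranges become, which is exactly what the graph-based partitioning in the paper tries to minimise.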

3.
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.

4.
This study attempts to investigate whether corporate performance is affected by the ownership structure, using data from companies quoted on the Athens Stock Exchange for the period 1996–1998. Given this objective, the basic hypothesis examined is that corporate performance, as measured by Tobin's Q ratio, is a function of ownership and other control variables. Our econometric approach relies on the use of a combination of time series and cross-section data (panel-data analysis), a procedure that avoids many statistical problems. After examining the role of each identifiable shareholder, we find a positive relationship between institutional investors and corporate performance. Copyright © 2004 John Wiley & Sons, Ltd.

5.
With cointegration tests often being oversized under time-varying error variance, it is possible, if not likely, to confuse error variance non-stationarity with cointegration. This paper takes an instrumental variable (IV) approach to establish individual-unit test statistics for no cointegration that are robust to variance non-stationarity. The sign of a fitted departure from long-run equilibrium is used as an instrument when estimating an error-correction model. The resulting IV-based test is shown to follow a chi-square limiting null distribution irrespective of the variance pattern of the data-generating process. In spite of this, the test proposed here has, unlike previous work relying on instrumental variables, competitive local power against sequences of local alternatives in 1/T-neighbourhoods of the null. The standard limiting null distribution motivates using the single-unit tests in a multiple testing approach for cointegration in multi-country data sets by combining P-values from individual units. Simulations suggest good performance of the single-unit and multiple testing procedures under various plausible designs of cross-sectional correlation and cross-unit cointegration in the data. An application to the equilibrium relationship between short- and long-term interest rates illustrates the dramatic differences between the results of robust and non-robust tests.
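The core idea, instrumenting the lagged equilibrium error with its own sign, can be sketched in a few lines. The bivariate setup, the omission of short-run dynamics and the simple t-type statistic below are illustrative simplifications; they are not the paper's exact test or its limiting distribution theory.

```python
# Minimal sketch: sign-of-residual IV estimation of the error-correction coefficient.
import numpy as np

rng = np.random.default_rng(0)
T = 400
x = np.cumsum(rng.standard_normal(T))
y = 1.0 + 0.5 * x + rng.standard_normal(T)          # cointegrated pair by construction

X = np.column_stack([np.ones(T), x])                # long-run regression
beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta                                    # fitted departures from equilibrium

dy, u_lag = np.diff(y), u[:-1]
z = np.sign(u_lag)                                  # the sign instrument

delta_iv = (z @ dy) / (z @ u_lag)                   # just-identified IV estimate
resid = dy - delta_iv * u_lag
sigma2 = resid @ resid / (len(dy) - 1)
se = np.sqrt(sigma2 * (z @ z)) / abs(z @ u_lag)     # textbook single-instrument IV std. error
print(f"delta_iv = {delta_iv:.3f}, t = {delta_iv / se:.2f}")   # strongly negative under cointegration
```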

6.
Gower and Blasius (Quality and Quantity, 39, 2005) proposed the notion of multivariate predictability as a measure of goodness-of-fit in data reduction techniques which is useful for visualizing and screening data. For quantitative variables this leads to the usual sums-of-squares and variance-accounted-for criteria. For categorical variables, and in particular for ordered categorical variables, they showed how to predict the levels of all variables associated with every point (case). The proportion of predictions which agree with the true category levels gives the measure of fit. The ideas are very general; as an illustration they used nonlinear principal components analysis. An example of the method is described in this paper using data drawn from 23 countries participating in the International Social Survey Program (1995), paying special attention to two sets of variables concerned with Regional and National Identity. It turns out that the predictability criterion suggests that the fits are rather better than is indicated by the percentage of variance accounted for.
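The predictability criterion itself is simple to compute once predicted category levels are available: it is the share of case-by-variable cells predicted correctly. The toy sketch below assumes the predictions have already been produced by some data reduction (the paper uses nonlinear principal components analysis).

```python
# Minimal sketch: the multivariate predictability measure as a cell-wise agreement rate.
import numpy as np

def predictability(true_levels, predicted_levels):
    """Proportion of case-by-variable cells where the predicted category equals the true one."""
    return (np.asarray(true_levels) == np.asarray(predicted_levels)).mean()

true = np.array([[0, 2], [1, 1], [2, 0], [1, 2]])    # 4 cases, 2 categorical variables
pred = np.array([[0, 2], [1, 0], [2, 0], [1, 2]])    # predictions from some low-dimensional fit
print(predictability(true, pred))                    # 7 of 8 cells correct -> 0.875
```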

7.
In the areas of missing data and causal inference, there is great interest in doubly robust (DR) estimators that involve both an outcome regression (RG) model and a propensity score (PS) model. These DR estimators are consistent and asymptotically normal if either model is correctly specified. Despite their theoretical appeal, the practical utility of DR estimators has been disputed (e.g. Kang and Schafer, Statistical Science 2007; 22: 523–539). One of the major concerns is the possibility of erratic estimates resulting from near-zero denominators due to extreme values of the estimated PS. In contrast, the usual RG estimator based on the RG model alone is efficient when the RG model is correct and generally more stable than the DR estimators, although it can be biased when the RG model is incorrect. In light of the unique advantages of the RG and DR estimators, we propose a class of hybrid estimators that attempt to strike a reasonable balance between the RG and DR estimators. These hybrid estimators are motivated by heuristic arguments that coarsened PS estimates are less likely to take extreme values and less sensitive to misspecification of the PS model than the original model-based PS estimates. The proposed estimators are compared with existing estimators in simulation studies and illustrated with real data from a large observational study on obstetric labour progression and birth outcomes.
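The following sketch contrasts the outcome-regression (RG), inverse-probability-weighted and doubly robust (AIPW) estimators of a population mean under data missing at random, and adds a crude quintile-coarsened propensity score as a stabilising device in the spirit of the hybrid idea. The data-generating process, the model forms and the coarsening rule are illustrative assumptions; the paper's hybrid estimator class is more elaborate than this.

```python
# Minimal sketch: RG, IPW, DR (AIPW) and a coarsened-PS variant for a mean with missing data.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal((n, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.standard_normal(n)            # outcome (true mean is 1.0)
ps_true = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x[:, 0])))       # response probability
r = rng.uniform(size=n) < ps_true                             # response indicator

ps = LogisticRegression().fit(x, r).predict_proba(x)[:, 1]    # estimated propensity score
m = LinearRegression().fit(x[r], y[r]).predict(x)             # estimated outcome regression

mu_rg = m.mean()                                              # outcome-regression estimator
mu_ipw = np.mean(r * y / ps)                                  # inverse-probability weighting
mu_dr = np.mean(m + r * (y - m) / ps)                         # doubly robust (AIPW) estimator

# Coarsen the estimated PS into quintile averages: extreme fitted values are pulled
# towards their stratum mean, which tames near-zero denominators.
strata = np.digitize(ps, np.quantile(ps, [0.2, 0.4, 0.6, 0.8]))
ps_coarse = np.array([ps[strata == s].mean() for s in strata])
mu_dr_coarse = np.mean(m + r * (y - m) / ps_coarse)

print(mu_rg, mu_ipw, mu_dr, mu_dr_coarse)                     # all should be near 1.0
```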

8.
In dynamic panel regression, when the variance ratio of individual effects to disturbance is large, the system-GMM estimator will have a large asymptotic variance and poor finite sample performance. To deal with this variance-ratio problem, we propose a residual-based instrumental variables (RIV) estimator, which uses the residual from an auxiliary regression of Δy_{i,t−1} as the instrument for the level equation. The proposed RIV estimator is consistent and asymptotically normal under general assumptions. More importantly, its asymptotic variance is almost unaffected by the variance ratio of individual effects to disturbance. Monte Carlo simulations show that the RIV estimator has better finite sample performance than alternative estimators. The RIV estimator generates less finite sample bias than the difference-GMM, system-GMM, collapsing-GMM and level-IV estimators in most cases. Under RIV estimation, the variance-ratio problem is well controlled, and the empirical distribution of its t-statistic is similar to the standard normal distribution for moderate sample sizes.

9.
It is argued that univariate long memory estimates based on ex post data tend to underestimate the persistence of ex ante variables (and, hence, that of the ex post variables themselves) because of the presence of unanticipated shocks whose short-run volatility masks the degree of long-range dependence in the data. Empirical estimates of long-range dependence in the Fisher equation are shown to manifest this problem and lead to an apparent imbalance in the memory characteristics of the variables in the Fisher equation. Evidence in support of this typical underestimation is provided by results obtained with inflation forecast survey data and by direct calculation of the finite sample biases. To address the problem of bias, the paper introduces a bivariate exact Whittle (BEW) estimator that explicitly allows for the presence of short memory noise in the data. The new procedure enhances the empirical capacity to separate low-frequency behaviour from high-frequency fluctuations, and it produces estimates of long-range dependence that are much less biased when the data are contaminated by noise. Empirical estimates from the BEW method suggest that the three Fisher variables are integrated of the same order, with memory parameter in the range (0.75, 1). Since the integration orders are balanced, the ex ante real rate has the same degree of persistence as expected inflation, thereby furnishing evidence against the existence of a (fractional) cointegrating relation among the Fisher variables and, correspondingly, showing little support for a long-run form of the Fisher hypothesis. Copyright © 2004 John Wiley & Sons, Ltd.

10.
In this paper, we evaluate the role of a set of variables as leading indicators for Euro-area inflation and GDP growth. Our leading indicators are taken from the variables in the European Central Bank's (ECB) Euro-area-wide model database, plus a set of similar variables for the US. We compare the forecasting performance of each indicator ex post with that of purely autoregressive models. We also analyse three different approaches to combining the information from several indicators. First, ex post, we discuss using the estimated factors from a dynamic factor model for all the indicators as leading indicators. Secondly, within an ex ante framework, an automated model selection procedure is applied to models with a large set of indicators. No future information is used, future values of the regressors are forecast, and the choice of the indicators is based on their past forecasting records. Finally, we consider the forecasting performance of groups of indicators and factors and methods of pooling the ex ante single-indicator or factor-based forecasts. Some sensitivity analyses are also undertaken for different forecasting horizons and weighting schemes of forecasts to assess the robustness of the results.
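One of the simplest pooling schemes mentioned above, equal-weight averaging of single-indicator forecasts, can be sketched directly. Each indicator model here is a direct h-step regression of the target on its own lag and one indicator; the simulated data, the single lag and the equal weights are illustrative choices rather than the paper's specifications.

```python
# Minimal sketch: pool direct h-step single-indicator forecasts by simple averaging.
import numpy as np

rng = np.random.default_rng(0)
T, h, n_ind = 200, 4, 5
indicators = rng.standard_normal((T, n_ind))
target = rng.standard_normal(T)
target[h:] += 0.5 * indicators[:-h, 0]               # indicator 0 truly leads the target by h periods

forecasts = []
for j in range(n_ind):
    X = np.column_stack([np.ones(T - h), target[:-h], indicators[:-h, j]])
    y = target[h:]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    x_last = np.array([1.0, target[-1], indicators[-1, j]])
    forecasts.append(x_last @ beta)                   # h-step-ahead forecast from indicator j

pooled = np.mean(forecasts)                           # equal-weight pooled forecast
print(np.round(forecasts, 3), round(pooled, 3))
```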

11.
Kelejian (Letters in Spatial and Resource Sciences; 1: 3–11) extended the J-test procedure to a spatial framework. Although his suggested test was computationally simple and intuitive, it did not use the available information in an efficient manner. Kelejian and Piras (Regional Science and Urban Economics; 41: 281–292) generalized and modified Kelejian's test to account for all the available information. However, neither Kelejian (2008) nor Kelejian and Piras (2011) considered a panel data framework. In this paper, we generalize these earlier works to a panel data framework with fixed effects and additional endogenous variables. We give theoretical as well as Monte Carlo results relating to our suggested tests. An empirical application to a crime model for North Carolina is also presented. Copyright © 2015 John Wiley & Sons, Ltd.

12.
This paper assesses the classification performance of the Z-Score model in predicting bankruptcy and other types of firm distress, with the goal of examining the model's usefulness for all parties, especially banks that operate internationally and need to assess the failure risk of firms. We analyze the performance of the Z-Score model for firms from 31 European and three non-European countries using different modifications of the original model. This study is the first to offer such a comprehensive international analysis. Except for the United States and China, the firms in the sample are primarily private, and include non-financial companies across all industrial sectors. We use the original Z″-Score model developed by Altman in Corporate Financial Distress: A Complete Guide to Predicting, Avoiding, and Dealing with Bankruptcy (1983) for private and public manufacturing and non-manufacturing firms. While there is some evidence that Z-Score models of bankruptcy prediction have been outperformed by competing market-based or hazard models, in other studies Z-Score models perform very well. Without a comprehensive international comparison, however, the results of competing models are difficult to generalize. This study offers evidence that the general Z-Score model works reasonably well for most countries (the prediction accuracy is approximately 0.75) and that classification accuracy can be improved further (above 0.90) by using country-specific estimation that incorporates additional variables.
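For reference, the sketch below computes the four-variable Z″-Score using the coefficients commonly reported for Altman (1983); the coefficients, the illustrative balance-sheet numbers and the zone cut-offs are the widely cited textbook values, not the country-specific re-estimations developed in the paper.

```python
# Minimal sketch: the four-variable Z''-Score with commonly cited coefficients
# (treat as quoted values; verify against Altman, 1983).
def z_double_prime(working_capital, retained_earnings, ebit, book_equity,
                   total_assets, total_liabilities):
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = book_equity / total_liabilities
    return 6.56 * x1 + 3.26 * x2 + 6.72 * x3 + 1.05 * x4

z = z_double_prime(working_capital=120, retained_earnings=300, ebit=80,
                   book_equity=450, total_assets=1000, total_liabilities=550)
# Commonly cited zones: distress below 1.1, grey zone 1.1-2.6, safe above 2.6.
print(f"Z'' = {z:.2f}")
```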

13.

This paper describes improvements on methods developed by Burgstahler and Dichev (1997, Earnings management to avoid earnings decreases and losses, Journal of Accounting and Economics, 24(1), pp. 99–126) and Bollen and Pool (2009, Do hedge fund managers misreport returns? Evidence from the pooled distribution, Journal of Finance, 64(5), pp. 2257–2288) to test for earnings management by identifying discontinuities in distributions of scaled earnings or earnings forecast errors. While existing methods use preselected bandwidths for kernel density estimation and histogram construction, the proposed test procedure addresses the key problem of bandwidth selection by using a bootstrap test to endogenise the selection step. The main advantage offered by the bootstrap procedure over prior methods is that it provides a reference distribution that cannot be globally distinguished from the empirical distribution, rather than assuming a correct reference distribution. This procedure limits the researcher's degrees of freedom and offers a simple way to find and test a local discontinuity. I apply bootstrap density estimation to earnings, earnings changes, and earnings forecast errors in US firms over the period 1976–2010. Significance levels found in earlier studies are greatly reduced, often to insignificant values. Discontinuities cannot be detected in analysts' forecast errors, and such findings of discontinuities in earlier research can be explained by a simple rounding mechanism. Earnings data show a large drop in loss aversion after 2003 that cannot be detected in changes of earnings.
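The sketch below conveys the general flavour of a bootstrap discontinuity check: a kernel density estimate acts as the smooth reference distribution, bootstrap samples drawn from it describe the sampling variation of a bin count just below zero, and the observed count is compared against that reference. The bandwidth here is SciPy's default, whereas the paper's contribution is precisely to endogenise the bandwidth selection step, so this is an illustration of the idea rather than the proposed procedure.

```python
# Minimal sketch: bootstrap reference distribution for the count of small losses.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
earnings = rng.normal(0.0, 0.05, size=5000)                 # scaled earnings, no management
small_loss = (earnings > -0.005) & (earnings < 0)
earnings[small_loss & (rng.uniform(size=earnings.size) < 0.5)] *= -1   # shift half into small profits

bin_lo, bin_hi = -0.005, 0.0                                # the "small loss" bin
observed = np.sum((earnings >= bin_lo) & (earnings < bin_hi))

kde = gaussian_kde(earnings)                                # smooth reference distribution
boot_counts = []
for _ in range(500):
    sample = kde.resample(earnings.size).ravel()
    boot_counts.append(np.sum((sample >= bin_lo) & (sample < bin_hi)))
boot_counts = np.array(boot_counts)

p_value = np.mean(boot_counts <= observed)                  # how unusually empty is the small-loss bin?
print(f"observed = {observed}, reference mean = {boot_counts.mean():.1f}, p = {p_value:.3f}")
```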

14.
We re-examine studies of cross-country growth regressions by Levine and Renelt (American Economic Review, Vol. 82, 1992, pp. 942–963) and Sala-i-Martin (American Economic Review, Vol. 87, 1997a, pp. 178–183; Economics Department, Columbia University, 1997b). In a realistic Monte Carlo experiment, their variants of Edward Leamer's extreme-bounds analysis are compared with a cross-sectional version of the general-to-specific search methodology associated with the LSE approach to econometrics. Levine and Renelt's method has low size and low power, while Sala-i-Martin's method has high size and high power. The general-to-specific methodology is shown to have near nominal size and high power. Sala-i-Martin's method and the general-to-specific method are then applied to the actual data from Sala-i-Martin's original study.
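For readers unfamiliar with the mechanics being compared, the sketch below runs a bare-bones Leamer-style extreme-bounds analysis on simulated data: the coefficient on a focus variable is re-estimated over all three-variable conditioning sets drawn from a pool of controls, and the variable is labelled robust only if the extreme bounds (estimate plus or minus two standard errors) exclude zero. The data and the size of the conditioning sets are illustrative; this is not the Monte Carlo design of the paper.

```python
# Minimal sketch: extreme-bounds analysis for one focus variable over many conditioning sets.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, n_controls = 100, 6
Z = rng.standard_normal((n, n_controls))             # pool of candidate control variables
focus = rng.standard_normal(n)
growth = 0.3 * focus + 0.5 * Z[:, 0] + rng.standard_normal(n)

bounds = []
for subset in combinations(range(n_controls), 3):    # every 3-control conditioning set
    X = np.column_stack([np.ones(n), focus, Z[:, subset]])
    beta = np.linalg.lstsq(X, growth, rcond=None)[0]
    resid = growth - X @ beta
    cov = np.linalg.inv(X.T @ X) * (resid @ resid) / (n - X.shape[1])
    se = np.sqrt(cov[1, 1])
    bounds.append((beta[1] - 2 * se, beta[1] + 2 * se))

lower = min(b[0] for b in bounds)
upper = max(b[1] for b in bounds)
print(f"extreme bounds: [{lower:.3f}, {upper:.3f}], robust = {lower > 0 or upper < 0}")
```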

15.
This paper contributes to the literature on forecast evaluation by conducting an extensive Monte Carlo experiment using the evaluation procedure proposed by Elliott, Komunjer and Timmermann. We consider recent developments in weighting matrices for GMM estimation and testing. We pay special attention to the size and power properties of variants of the J-test of forecast rationality. Proceeding from a baseline scenario to a more realistic setting, our results show that the approach leads to precise estimates of the degree of asymmetry of the loss function. For correctly specified models, we find the size of the J-tests to be close to the nominal size, while the tests have high power against misspecified models. These findings are quite robust to inducing fat tails, serial correlation and outliers.
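A stripped-down version of this estimation and testing exercise can be sketched as follows: forecast errors are simulated for a forecaster who is rational under a lin-lin loss with asymmetry 0.7, the asymmetry parameter is then estimated by two-step GMM from the moment condition E[v_t(1{e_t < 0} − α)] = 0, and the overidentifying restriction gives a J-test of rationality. The instrument choice, the loss exponent p = 1 and the data-generating process are illustrative assumptions, not the paper's Monte Carlo design.

```python
# Minimal sketch: GMM estimation of lin-lin loss asymmetry and the J-test of rationality.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2, norm

rng = np.random.default_rng(1)
T, alpha_true = 500, 0.7
y = rng.standard_normal(T)
e = y - norm.ppf(alpha_true)                 # optimal lin-lin forecast is the alpha-quantile

v = np.column_stack([np.ones(T - 1), np.abs(e[:-1])])    # instruments: constant, lagged |error|
e1 = e[1:]

def moments(alpha):
    g = v * ((e1 < 0).astype(float) - alpha)[:, None]
    return g.mean(axis=0), g

def objective(alpha, W):
    m, _ = moments(alpha)
    return m @ W @ m

# Two-step GMM: identity weighting first, then the optimal weight from first-step moments.
a1 = minimize_scalar(objective, bounds=(0.01, 0.99), args=(np.eye(2),), method="bounded").x
_, g = moments(a1)
W = np.linalg.inv(g.T @ g / g.shape[0])
a2 = minimize_scalar(objective, bounds=(0.01, 0.99), args=(W,), method="bounded").x

m, g = moments(a2)
J = (T - 1) * m @ np.linalg.inv(g.T @ g / g.shape[0]) @ m    # 2 moments - 1 parameter -> chi2(1)
print(f"alpha_hat = {a2:.3f}, J = {J:.3f}, p-value = {1 - chi2.cdf(J, df=1):.3f}")
```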

16.
This paper adapts the sign-restriction identification methodology of Uhlig (Journal of Monetary Economics, 2005) to investigate the effects of UK monetary policy using a structural vector autoregression (VAR). It shows that shocks which can reasonably be described as monetary policy shocks have played only a small role in the total variation of UK monetary and macroeconomic variables. Most of the variation in UK monetary variables has been due to their systematic reaction to other macroeconomic shocks, namely non-monetary aggregate demand, aggregate supply, and oil price shocks. We also find, without imposing any long-run identifying restrictions, that aggregate supply shocks have permanent effects on output.
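The rejection-sampling version of sign-restriction identification is easy to sketch: estimate the reduced-form VAR, draw random orthogonal rotations of the Cholesky factor, and keep only those rotations whose candidate monetary policy shock satisfies the imposed signs. The variable ordering, the impact-only restrictions (rather than restrictions over several horizons) and the simulated data below are illustrative assumptions, not the paper's UK specification.

```python
# Minimal sketch: impact-only sign-restriction identification by rejection sampling.
import numpy as np

rng = np.random.default_rng(0)

def estimate_var(Y, p=2):
    """OLS reduced-form VAR(p): returns stacked coefficients and residual covariance."""
    T, n = Y.shape
    X = np.hstack([np.ones((T - p, 1))] + [Y[p - l:T - l] for l in range(1, p + 1)])
    B = np.linalg.lstsq(X, Y[p:], rcond=None)[0]
    U = Y[p:] - X @ B
    return B, U.T @ U / (T - p - X.shape[1])

# Simulated stand-in for [output, prices, money, interest rate]
Y = 0.1 * rng.standard_normal((300, 4)).cumsum(axis=0) + rng.standard_normal((300, 4))
_, Sigma = estimate_var(Y)
P = np.linalg.cholesky(Sigma)

accepted = 0
for _ in range(2000):
    Q, R = np.linalg.qr(rng.standard_normal((4, 4)))      # Haar-distributed rotation
    Q = Q @ np.diag(np.sign(np.diag(R)))
    col = (P @ Q)[:, 0]                                   # candidate impact vector of one shock
    if col[3] < 0:
        col = -col                                        # normalise: shock raises the interest rate
    if col[1] <= 0 and col[2] <= 0:                       # ...while lowering prices and money on impact
        accepted += 1
print(f"{accepted} of 2000 rotations satisfy the sign restrictions")
```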

17.
In this paper, we provide an intensive review of the recent developments for semiparametric and fully nonparametric panel data models that are linearly separable in the innovation and the individual-specific term. We analyze these developments under two alternative model specifications: fixed and random effects panel data models. More precisely, in the random effects setting, we focus our attention on the analysis of some efficiency issues that have to do with the so-called working independence condition. This assumption is introduced when estimating the asymptotic variance–covariance matrix of nonparametric estimators. In the fixed effects setting, to cope with the so-called incidental parameters problem, we consider two different estimation approaches: profiling techniques and differencing methods. Furthermore, we are also interested in the endogeneity problem and how instrumental variables are used in this context. In addition, for practitioners, we also show different ways of avoiding the so-called curse of dimensionality problem in pure nonparametric models. In this way, semiparametric and additive models appear as a solution when the number of explanatory variables is large.

18.
In their advocacy of the rank-transformation (RT) technique for analysis of data from factorial designs, Mendeş and Yiğit (Statistica Neerlandica, 67, 2013, 1–26) missed important analytical studies identifying the statistical shortcomings of the RT technique, the recommendation that the RT technique not be used, and important advances that have been made for properly analyzing data in a non-parametric setting. Applied data analysts are at risk of being misled by Mendeş and Yiğit when statistically sound techniques are available for the proper non-parametric analysis of data from factorial designs. The appropriate methods express hypotheses in terms of normalized distribution functions, and the test statistics account for variance heterogeneity.

19.
The problem of finding an explicit formula for the probability density function of the product of two zero-mean correlated normal random variables dates back to 1936. Perhaps surprisingly, this problem was not resolved until 2016. This is all the more surprising given that a very simple proof is available, which is the subject of this note; we identify the product of two zero-mean correlated normal random variables as a variance-gamma random variable, from which an explicit formula for the probability density function is immediate.
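The explicit density that the variance-gamma identification delivers can be checked directly by simulation. For standard normal X and Y with correlation rho, the commonly quoted form is f(z) = exp(rho z / (1 − rho²)) K₀(|z| / (1 − rho²)) / (π √(1 − rho²)); the sketch below treats this as a quoted formula and compares it with a Monte Carlo density estimate at a few points away from the singularity at zero.

```python
# Minimal sketch: Monte Carlo check of the product-of-correlated-normals density.
import numpy as np
from scipy.special import k0

rho = 0.4
rng = np.random.default_rng(0)
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1_000_000)
z = xy[:, 0] * xy[:, 1]

def product_density(t, rho):
    """Density of the product of two standard normals with correlation rho (quoted form)."""
    s = 1.0 - rho ** 2
    return np.exp(rho * t / s) * k0(np.abs(t) / s) / (np.pi * np.sqrt(s))

points = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
h = 0.05
mc = np.array([np.mean(np.abs(z - t) < h) / (2 * h) for t in points])   # local MC density
print(np.round(mc, 3))
print(np.round(product_density(points, rho), 3))                        # should agree closely
```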

20.
Estimating dynamic panel data discrete choice models with fixed effects
This paper considers the estimation of dynamic binary choice panel data models with fixed effects. It is shown that the modified maximum likelihood estimator (MMLE) used in this paper reduces the order of the bias in the maximum likelihood estimator from O(T⁻¹) to O(T⁻²), without increasing the asymptotic variance. No orthogonal reparametrization is needed. Monte Carlo simulations are used to evaluate its performance in finite samples where T is not large. In probit and logit models containing lags of the endogenous variable and exogenous variables, the estimator is found to have a small bias in a panel with eight periods. A distinctive advantage of the MMLE is its general applicability. Estimation and relevance of different policy parameters of interest in this kind of model are also addressed.

