Similar Articles
20 similar articles retrieved.
1.
To verify whether data are missing at random (MAR), we need to observe the missing data. There are only two exceptions: when the relationship between the probability of responding and the missing variables is either imposed by introducing untestable assumptions or recovered using additional data sources. In this paper, we briefly review the estimation and test procedures for selectivity in panel data. Furthermore, by extending the MAR definition from a static setting to the case of dynamic panel data models, we prove that some tests for selectivity do not verify the MAR condition.

2.
Imputation procedures such as fully efficient fractional imputation (FEFI) or multiple imputation (MI) create multiple versions of the missing observations, thereby reflecting uncertainty about their true values. Multiple imputation generates a finite set of imputations through a posterior predictive distribution. Fractional imputation assigns weights to the observed data. The focus of this article is the development of FEFI for partially classified two-way contingency tables. Point estimators and variances of FEFI estimators of population proportions are derived. Simulation results, when data are missing completely at random or missing at random, show that FEFI is comparable in performance to maximum likelihood estimation and multiple imputation, and superior to simple stochastic imputation and complete-case analysis. Methods are illustrated with four data sets.
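For orientation, here is a minimal EM sketch of the maximum-likelihood benchmark the abstract compares FEFI against, using illustrative counts of my own: partially classified counts of a 2x2 table are allocated to cells in proportion to the current cell probabilities, which are then re-estimated. FEFI itself works differently, attaching fractional weights to observed records rather than iterating on probabilities.

```python
# EM for ML estimation of cell proportions in a partially classified
# 2x2 table; all counts below are invented for illustration.
import numpy as np

n_full = np.array([[100., 40.], [30., 80.]])  # both variables observed
n_row = np.array([20., 10.])   # row variable observed, column missing
n_col = np.array([15., 25.])   # column variable observed, row missing

p = np.full((2, 2), 0.25)      # starting cell probabilities
for _ in range(500):
    # E-step: spread margin-only counts over the cells they may occupy.
    exp_row = n_row[:, None] * p / p.sum(axis=1, keepdims=True)
    exp_col = n_col[None, :] * p / p.sum(axis=0, keepdims=True)
    completed = n_full + exp_row + exp_col
    # M-step: re-estimate cell probabilities from the completed table.
    p_new = completed / completed.sum()
    if np.abs(p_new - p).max() < 1e-12:
        break
    p = p_new

print(np.round(p, 4))          # ML estimates of the population proportions
```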

3.
Questions that often come up in contexts where household consumption data are unavailable or missing include: what are the best existing methods for obtaining poverty estimates at a single snapshot in time? Over time? And what are the best available methods for studying poverty dynamics? A variety of techniques have been developed to tackle these questions, but unfortunately they are presented in different forms and lack a unified terminology. We offer a review of poverty imputation methods that address contexts ranging from completely missing and partially missing consumption data in cross-sectional household surveys to missing panel household data. We present the various existing methods under a common framework, with pedagogical discussion of their intuition. Empirical illustrations are provided using several rounds of household survey data from Vietnam. Furthermore, we offer a practical guide with detailed instructions on computer programs that can be used to implement the reviewed techniques.
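As one concrete anchor for the survey-to-survey imputation idea reviewed in this literature, the sketch below fits a log-consumption model in a survey that observes consumption and simulates the poverty headcount in a second survey that shares only the covariates. All variables, coefficients, and sample sizes are hypothetical, and the paper reviews many refinements beyond this bare version.

```python
# Survey-to-survey imputation, stripped to its core steps.
import numpy as np

rng = np.random.default_rng(8)
n1, n2, z = 2000, 1500, 1.0          # sample sizes and poverty line

x1 = rng.normal(size=n1)             # covariates shared across surveys
logc1 = 0.3 + 0.6 * x1 + rng.normal(scale=0.5, size=n1)

b = np.polyfit(x1, logc1, 1)         # imputation model from survey 1
resid_sd = (logc1 - np.polyval(b, x1)).std(ddof=2)

x2 = rng.normal(loc=-0.2, size=n2)   # survey 2: covariates, no consumption
rates = []
for _ in range(100):                 # repeated stochastic draws
    logc2 = np.polyval(b, x2) + rng.normal(scale=resid_sd, size=n2)
    rates.append(np.mean(np.exp(logc2) < z))

print(np.mean(rates))                # imputed poverty headcount, survey 2
```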

4.
A common problem in applied regression analysis is that covariate values may be missing for some observations but imputed values may be available. This situation generates a trade-off between bias and precision: the complete cases are often disarmingly few, but replacing the missing observations with the imputed values to gain precision may lead to bias. In this paper, we formalize this trade-off by showing that one can augment the regression model with a set of auxiliary variables so as to obtain, under weak assumptions about the imputations, the same unbiased estimator of the parameters of interest as complete-case analysis. Given this augmented model, the bias-precision trade-off may then be tackled by either model reduction procedures or model averaging methods. We illustrate our approach by considering the problem of estimating the relation between income and the body mass index (BMI) using survey data affected by item non-response, where the missing values on the main covariates are filled in by imputations.
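The key identity in this abstract, that a suitably augmented regression reproduces the complete-case estimator whatever the imputed values are, can be checked numerically. The toy below uses simulated data, not the survey application: augmenting the design with the missingness dummy and its interaction with the imputed covariate leaves the slope identified by the complete cases alone.

```python
# Numerical check of the augmentation result on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
m = rng.random(n) < 0.3                      # 30% of x missing
x_fill = np.where(m, rng.normal(size=n), x)  # arbitrary imputed values

# Complete-case OLS.
Xc = np.column_stack([np.ones((~m).sum()), x[~m]])
beta_cc = np.linalg.lstsq(Xc, y[~m], rcond=None)[0]

# Augmented OLS: intercept, imputed covariate, dummy, interaction.
Xa = np.column_stack([np.ones(n), x_fill, m, m * x_fill])
beta_aug = np.linalg.lstsq(Xa, y, rcond=None)[0]

print(beta_cc[1], beta_aug[1])  # slopes agree to machine precision
```

The augmented columns give the imputed subsample its own free intercept and slope, so the shared intercept and slope are pinned down by the complete cases only; that is why the match is exact rather than approximate.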

5.
One of the most difficult problems confronting investigators who analyze survey data is how to treat missing data. Many statistical procedures cannot be used immediately if any values are missing. This paper considers the problem of estimating the population mean using auxiliary information when some observations in the sample are missing and the population mean of the auxiliary variable is not available. We use tools of classical statistical estimation theory to find a suitable estimator. We study the model and design properties of the proposed estimator. We also report the results of a broad-based simulation study of the efficiency of the estimator, which reveals very promising results.
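One classical candidate in this setting is a ratio-type estimator that rescales the respondent mean of the study variable by the full-sample mean of the auxiliary variable. The sketch below is only an assumed illustration of that idea on simulated data; the paper derives its own estimator and its properties.

```python
# Ratio-type estimator of a population mean with an auxiliary variable.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.gamma(shape=4.0, scale=2.0, size=n)     # auxiliary, fully observed
y = 3.0 * x + rng.normal(scale=2.0, size=n)     # study variable
p_resp = 1.0 / (1.0 + np.exp(0.3 * (x - x.mean())))
resp = rng.random(n) < p_resp                   # response depends on x

ybar_r, xbar_r = y[resp].mean(), x[resp].mean()
naive = ybar_r                                  # ignores the nonresponse
ratio = ybar_r * (x.mean() / xbar_r)            # rescales via auxiliary x

print(f"true {y.mean():.2f}  naive {naive:.2f}  ratio {ratio:.2f}")
```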

6.
To understand changes in individuals' opinions and attitudes, it would be best to collect data through panels. Such panels, however, often cause irritation among respondents, resulting in low response rates and low response quality. We address whether this problem can be alleviated by designing a panel survey in an alternative way. For this purpose, we perform two field studies in which we measure the effects of several panel design characteristics on response rates and response quality. These characteristics include the number of waves and the time between subsequent waves, which may be either fixed or random. Our findings suggest that response rates and response quality can be improved significantly by surveying at random time intervals. It is then crucial that panel members are not informed about the dates on which they will be surveyed, because respondents are then less likely to develop expectations as to when they will be surveyed again. The methodology we put forward can be used to improve the efficiency of a panel study by carefully calibrating the study's panel design parameters.

7.
In this paper we introduce the Random Recursive Partitioning (RRP) matching method. RRP generates a proximity matrix which might be useful in econometric applications like average treatment effect estimation. RRP is a Monte Carlo method that randomly generates non-empty recursive partitions of the data and evaluates the proximity between two observations as the empirical frequency with which they fall in the same cell of these random partitions over all Monte Carlo replications. From the proximity matrix it is possible to derive both graphical and analytical tools to evaluate the extent of the common support between data sets. The RRP method is “honest” in that it does not match observations “at any cost”: if data sets are separated, the method clearly states it. The match obtained with RRP is invariant under monotonic transformation of the data. Average treatment effect estimators derived from the proximity matrix seem to be competitive compared to more commonly used estimators. The RRP method does not require a particular structure of the data, and for this reason it can be applied when distances like Mahalanobis or Euclidean are not suitable, in the presence of missing data, or when the estimated propensity score is too sensitive to model specifications.
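A compact sketch of the core construction, under assumptions of my own choosing rather than the authors' implementation: axis-aligned cells, splits at randomly chosen observed values (which keeps the partition invariant under monotone transformations of each coordinate), and a fixed minimum cell size. Proximity is the fraction of random partitions in which two observations share a terminal cell.

```python
# RRP-style proximity matrix from random recursive partitions.
import numpy as np

def random_partition(idx, X, rng, min_size=10):
    """Yield the terminal cells of one random recursive partition."""
    j = rng.integers(X.shape[1])            # random coordinate
    vals = np.unique(X[idx, j])
    if len(idx) <= min_size or len(vals) < 2:
        yield idx
        return
    cut = rng.choice(vals[:-1])             # split at an observed value
    yield from random_partition(idx[X[idx, j] <= cut], X, rng, min_size)
    yield from random_partition(idx[X[idx, j] > cut], X, rng, min_size)

def rrp_proximity(X, n_rep=200, seed=0):
    rng = np.random.default_rng(seed)
    prox = np.zeros((X.shape[0], X.shape[0]))
    for _ in range(n_rep):
        for cell in random_partition(np.arange(X.shape[0]), X, rng):
            prox[np.ix_(cell, cell)] += 1.0
    return prox / n_rep                     # co-membership frequencies

X = np.random.default_rng(2).normal(size=(100, 3))
P = rrp_proximity(X)
print(P.shape, P.diagonal().min())          # diagonal is identically 1.0
```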

8.
In this paper, we present an algorithm suitable for analysing the variance of panel data when some observations are either given in grouped form or are missing. The analysis is carried out from the perspective of ANOVA panel data models with general errors. The classification intervals of the grouped observations may vary from one observation to another, so missing observations are in fact a particular case of grouping. The proposed algorithm (1) estimates the parameters of the panel data models; (2) evaluates the covariance matrices of the asymptotic distribution of the time-dependent parameters, assuming that the number of time periods, T, is fixed and the number of individuals, N, tends to infinity, and similarly of the individual parameters when T → ∞ and N is fixed; and, finally, (3) uses these asymptotic covariance matrix estimates to analyse the variance of the panel data.

9.
Conclusions on the development of delinquent behaviour over the life-course can only be drawn from longitudinal data, which are usually gained by repeated interviews with the same respondents. Missing data are a problem for such analyses, as shown here with data from a four-wave panel of adolescents. In this article two alternative techniques for coping with missing data are used: full information maximum likelihood estimation and multiple imputation. Both methods allow one to use all available data (including adolescents with missing information on some variables) in order to estimate the development of delinquency. We demonstrate that self-reported delinquency is systematically underestimated with listwise deletion (LD) of missing data. Further, LD results in false conclusions on gender- and school-specific differences in the age–crime relationship. In the final discussion, some pointers are given to further methods for dealing with bias in panel data affected by the missingness process.
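The direction of the listwise-deletion bias is easy to reproduce on invented data, not the panel used in the article: when the outcome is missing at random given an observed covariate, the complete-case mean is biased, while a single stochastic regression imputation, essentially one draw of a multiple-imputation scheme, recovers the truth.

```python
# Listwise deletion vs. one stochastic regression imputation draw.
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)                           # always observed
y = 1.0 + 0.8 * x + rng.normal(size=n)           # e.g. delinquency score
miss = rng.random(n) < 1.0 / (1.0 + np.exp(-1.5 * x))   # MAR given x

ld_mean = y[~miss].mean()                        # listwise deletion

# One stochastic imputation draw (MI would repeat this M times and pool).
b = np.polyfit(x[~miss], y[~miss], 1)
resid_sd = (y[~miss] - np.polyval(b, x[~miss])).std(ddof=2)
y_draw = np.polyval(b, x[miss]) + rng.normal(scale=resid_sd, size=miss.sum())
imp_mean = np.concatenate([y[~miss], y_draw]).mean()

print(f"true {y.mean():.3f}  listwise {ld_mean:.3f}  imputed {imp_mean:.3f}")
```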

10.
The Lee–Carter method for modeling and forecasting mortality has been shown to work quite well given long time series of data. Here we consider how it can be used when there are few observations at uneven intervals. Assuming that the underlying model is correct and that the mortality index follows a random walk with drift, we find the method can be used with sparse data. The central forecast depends mainly on the first and last observation, and so can be generated with just two observations, preferably not too close in time. With three data points, uncertainty can also be estimated, although such estimates of uncertainty are themselves highly uncertain and improve with additional observations. We apply the methods to China and South Korea, which have 3 and 20 data points, respectively, at uneven intervals.
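Under the stated random-walk-with-drift assumption, the maximum-likelihood drift at unevenly spaced observations is the total change divided by the total elapsed time, which is exactly why two observations pin down the central forecast and three or more let the innovation variance, and hence uncertainty, be estimated. A minimal sketch with hypothetical index values, not the China or South Korea data:

```python
# Drift, central forecast, and forecast uncertainty for a sparse,
# unevenly observed mortality index k_t under a random walk with drift.
import numpy as np

t = np.array([1990.0, 2000.0, 2005.0, 2012.0])  # uneven observation years
k = np.array([0.00, -1.10, -1.72, -2.51])       # hypothetical index values

drift = (k[-1] - k[0]) / (t[-1] - t[0])         # uses endpoints only
h = np.arange(1, 21)
central = k[-1] + drift * h                     # 20-year central forecast

dk, dt = np.diff(k), np.diff(t)                 # interval increments
sigma2 = np.sum((dk - drift * dt) ** 2 / dt) / (len(dk) - 1)
se = np.sqrt(sigma2 * h)                        # forecast standard errors

print(central[:3], se[:3])
```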

11.
Journal of Econometrics, 2005, 126(2): 493–523
The estimated parameters of output distance functions frequently violate the monotonicity, quasi-convexity and convexity constraints implied by economic theory, leading to estimated elasticities and shadow prices that are incorrectly signed, and ultimately to perverse conclusions concerning the effects of input and output changes on productivity growth and relative efficiency levels. We show how a Bayesian approach can be used to impose these constraints on the parameters of a translog output distance function. Implementing the approach involves the use of a Gibbs sampler with data augmentation. A Metropolis–Hastings algorithm is also used within the Gibbs sampler to simulate observations from truncated pdfs. Our methods are developed for the case where panel data are available and technical inefficiency effects are assumed to be time-invariant. Two models, a fixed-effects model and a random-effects model, are developed and applied to panel data on 17 European railways. We observe significant changes in estimated elasticities and shadow-price ratios when regularity restrictions are imposed.
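The Metropolis-within-Gibbs ingredient can be illustrated generically: a random-walk Metropolis step whose target is a posterior truncated to the region where the regularity constraints hold. The toy target below is a two-dimensional normal restricted to non-negative coefficients; it is my own stand-in, not the translog posterior of the paper.

```python
# Random-walk Metropolis targeting a constraint-truncated posterior.
import numpy as np

rng = np.random.default_rng(4)

def log_post(b):
    if np.any(b < 0.0):          # outside the regularity region
        return -np.inf           # such proposals are always rejected
    return -0.5 * np.sum((b - np.array([0.3, -0.1])) ** 2) / 0.04

b = np.array([0.5, 0.5])
draws = []
for _ in range(20_000):
    prop = b + 0.1 * rng.normal(size=2)          # random-walk proposal
    if np.log(rng.random()) < log_post(prop) - log_post(b):
        b = prop
    draws.append(b)

draws = np.array(draws)[5_000:]                  # discard burn-in
print(draws.mean(axis=0))    # mass piles up near the constraint boundary
```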

12.
It is shown that the classical taxonomy of missing data models, namely missing completely at random, missing at random and informative missingness, which has been developed almost exclusively within a selection modelling framework, can also be applied to pattern-mixture models. In particular, intuitively appealing identifying restrictions are proposed for a pattern-mixture MAR mechanism.

13.
The bias of various estimators for static cross-section and panel data models is assessed in a simulation study in which the actual data-generating process is a dynamic adjustment mechanism with random individual effects. It is concluded that the consequences of incorrectly estimating a static model can be rather serious. Therefore, it is important to have an accurate technique available for the detection of dynamics. Two exact similar tests for the presence of a lagged dependent variable in panel data models are developed; in some simulation experiments these tests outperform standard asymptotic test procedures. Empirical results on Engel curves for food illustrate these issues.

14.
Studies of efficiency in banking and elsewhere often impose arbitrary assumptions on the distributions of efficiency and random error in order to separate one from the other. In this study, we impose much less structure on these distributions and only assume that efficiencies are stable over time while random error tends to average out. We are able to do so by estimating firm-specific effects on costs using panel data sets of over 28,000 observations on U.S. banks from 1980 to 1989. We find results similar to the literature: X-efficiencies, or managerial differences in efficiency, are important in banking, while scale-efficiency differences are not. However, we also find that the distributional assumptions usually imposed in the literature are not very consistent with these data.
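A schematic version of the "stable efficiency, averaging noise" logic, on simulated data rather than the bank panel: firm effects are read off mean cost residuals over time, so noise averages out, and relative X-efficiency is the exponentiated gap to the best-practice firm.

```python
# Distribution-free X-efficiency from firm-specific panel effects.
import numpy as np

rng = np.random.default_rng(5)
n_firms, n_years = 50, 10
ineff = rng.exponential(scale=0.15, size=n_firms)        # stable over time
x = rng.normal(size=(n_firms, n_years))                  # cost driver
noise = rng.normal(scale=0.3, size=(n_firms, n_years))   # averages out
log_cost = 1.0 + 0.5 * x + ineff[:, None] + noise

beta = np.polyfit(x.ravel(), log_cost.ravel(), 1)[0]     # pooled slope
alpha = (log_cost - beta * x).mean(axis=1)               # firm effects
x_eff = np.exp(alpha.min() - alpha)                      # 1.0 = best firm

print(f"X-efficiency range: {x_eff.min():.2f} to {x_eff.max():.2f}")
```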

15.
In most surveys, one is confronted with missing or, more generally, coarse data. Traditional methods for dealing with these data require strong, untestable and often doubtful assumptions, for example coarsening at random. But due to the resulting, potentially severe bias, there is a growing interest in approaches that include only tenable knowledge about the coarsening process, leading to imprecise but reliable results. In this spirit, we study regression analysis with a coarse categorical dependent variable and precisely observed categorical covariates. Our (profile) likelihood-based approach can incorporate weak knowledge about the coarsening process and thus offers a synthesis of traditional methods and cautious strategies refraining from any coarsening assumptions. This also allows a discussion of the uncertainty about the coarsening process, besides sampling uncertainty and model uncertainty. Our procedure is illustrated with data from the panel study 'Labour market and social security' conducted by the Institute for Employment Research, whose questionnaire design produces coarse data.

16.
Since the work of Little and Rubin (1987), no substantial advances in the analysis of explanatory regression models for incomplete data with missing-not-at-random values have been achieved, mainly due to the difficulty of verifying the randomness of the unknown data. In practice, the analysis of nonrandom missing data is done with techniques designed for datasets with random or completely random missing data, such as complete-case analysis, mean imputation, regression imputation, maximum likelihood or multiple imputation. However, the data conditions required to minimize the bias derived from an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations have been carried out to establish the best strategy of analysis for random missing data applicable to datasets with nonrandom missing data. The factors involved in the simulations are sample size, percentage of missing data, predictive power of the imputation model and the existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation techniques, although with low percentages of missing data, absence of interaction and high predictive power of the imputation model (frequent data structures in research on child and adolescent psychopathology), acceptable results are obtained with the simpler regression imputation.
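The structure of such a simulation is simple to sketch. The skeleton below uses a toy design of my own: one covariate, the outcome missing when it is large (a nonrandom mechanism), and two simple strategies compared by bias over replications. The study itself varies sample size, missingness rate, predictive power and interactions, and includes ML and MI among the strategies.

```python
# Monte Carlo skeleton: bias of two strategies under MNAR missingness.
import numpy as np

rng = np.random.default_rng(6)

def one_rep(n=500, beta=0.8):
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    miss = y > np.quantile(y, 0.7)             # MNAR: top 30% of y missing
    b_cc = np.polyfit(x[~miss], y[~miss], 1)[0]          # complete cases
    y_mi = np.where(miss, y[~miss].mean(), y)            # mean imputation
    b_mi = np.polyfit(x, y_mi, 1)[0]
    return b_cc, b_mi

reps = np.array([one_rep() for _ in range(1_000)])
print("bias, complete cases: ", reps[:, 0].mean() - 0.8)
print("bias, mean imputation:", reps[:, 1].mean() - 0.8)
```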

17.
Using data from the first 11 waves of the BHPS, this paper measures the extent of the selection bias induced by standard coresidence conditions, bias that is expected to be severe in short panels, on measures of intergenerational mobility in occupational prestige. We try to limit the impact of other selection biases, such as those induced by the labour market restrictions typically imposed in intergenerational mobility studies, by using different measures of socio-economic status that account for missing labour market information. We stress four main results. First, there is evidence of an underestimation of the true intergenerational elasticity, the extent of which ranges between 12% and 39%. Second, the proposed methods for correcting the selection bias seem unable to attenuate it, except for the propensity score weighting procedure, which performs well in most circumstances. This result is confirmed both under the assumption of missing-at-random data and under the assumption of not-missing-at-random data. Third, the two previous sets of results (direction and extent of the bias, and differential ability to correct for it) are also robust when we account for measurement error. Fourth, restricting the sample to a period shorter than the 11 waves under analysis leads to a severe sample selection bias. When the analysis is limited to eight waves, this bias ranges from about 40% to 65%.

18.
In cross-national longitudinal studies it is often impossible to administer the same measurement instruments at the same occasions to all sample units in all participating countries. This quickly results in large quantities of missing data, due to (a) missing measurement instruments in some countries, (b) missing assessment waves within or across countries, and (c) missing data for individual sample units. Compared to cross-sectional studies, the problem of missing values is further aggravated by the fact that missing values are always associated with different time intervals between repeated observations. In the past, this has often been dealt with by the use of phantom variables, but this approach is limited to simple designs with few missing-value patterns. In the present paper we propose a new way to think of, and deal with, missing values in longitudinal studies. Instead of conceiving of a longitudinal study as a study with T discrete time points of which some are missing, we propose to conceive of it as a way to measure an underlying process that develops continuously over time but is only observed at selected discrete time points. This transforms the problem of missing values into a problem of unequal time intervals. After a quick introduction to the basic idea of continuous time modeling, we demonstrate how this approach provides a straightforward solution to missing measurement instruments in some countries, missing assessment waves within or across countries, and missing data for individual sample units.
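The mechanics of "missing waves become unequal intervals" can be shown with the simplest continuous-time process. For a first-order process dx = a·x·dt + dW (a minimal example of my own, not the models of the paper), the implied discrete-time autoregression over a gap of length dt is exp(a·dt), so every gap length gets its own discrete coefficients from the same two underlying parameters:

```python
# Discrete-time parameters implied by a continuous-time first-order
# process, for arbitrary (unequal) time intervals dt.
import numpy as np

a, q = -0.5, 1.0                       # drift and diffusion parameters

def discrete_params(dt):
    phi = np.exp(a * dt)                         # autoregressive weight
    var = q * (phi ** 2 - 1.0) / (2.0 * a)       # innovation variance
    return phi, var

for dt in (0.5, 1.0, 2.5):             # three different wave spacings
    phi, var = discrete_params(dt)
    print(f"dt = {dt}: phi = {phi:.3f}, innovation variance = {var:.3f}")
```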

19.
Sanjoy K. Sinha, Metrika, 2012, 75(7): 913–938
We encounter missing data in many longitudinal studies. When the missing data are nonignorable, it is important to analyze the data by incorporating the missing-data mechanism into the observed-data likelihood function. The classical maximum likelihood (ML) method for analyzing longitudinal missing data has been extensively studied in the literature. However, it is well known that ordinary ML estimators are sensitive to extreme observations or outliers in the data. In this paper, we propose and explore a robust method, developed in the framework of the ML method, which is useful for downweighting any influential observations in the data when estimating the model parameters. We study the empirical properties of the robust estimators in small simulations. We also illustrate the robust method using incomplete longitudinal data on CD4 counts from clinical trials of HIV-infected patients.
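The downweighting idea is easiest to see in a cross-sectional stand-in: Huber-type weights inside an iteratively reweighted least-squares loop, shrinking the influence of large residuals. This is a generic illustration of my own; the article builds analogous weights into the longitudinal ML estimating equations.

```python
# Huber-weighted IRLS for a linear model with heavy-tailed errors.
import numpy as np

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.standard_t(df=2, size=n)   # heavy-tailed errors

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS starting values
for _ in range(50):
    r = y - X @ beta
    s = np.median(np.abs(r)) / 0.6745              # robust scale (MAD)
    u = np.abs(r) / (1.345 * s)
    w = np.where(u <= 1.0, 1.0, 1.0 / u)           # Huber weights
    Xw = X * w[:, None]
    beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ y) # weighted normal eqs.
    if np.abs(beta_new - beta).max() < 1e-8:
        break
    beta = beta_new

print(beta)   # close to (2.0, 1.5) despite the outliers
```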

20.
A class of sequential estimation procedures is considered in the case when relevant data may become available only at random times. The exact distributions of the optimal stopping time and of the number of observations at the moment of stopping are derived for some sequential procedures. The results, obtained in explicit form, are applied to derive the expected time of observing the process, the average number of observations and the expected loss of sequential estimation procedures based on delayed observations. The use of the results is illustrated in a special model with normally distributed observations and Weibull-distributed lifetimes. The probabilistic characteristics are also derived for an adaptive sequential procedure, and the behavior of the adaptive procedure is compared with that of the corresponding optimal sequential procedure.
