Similar Documents (20 results)
1.
Multiple imputation has come to be viewed as a general solution to missing data problems in statistics. However, in order to lead to consistent, asymptotically normal estimators, correct variance estimators and valid tests, the imputations must be proper. So far, it seems that only Bayesian multiple imputation, i.e. using a Bayesian predictive distribution to generate the imputations, or approximately Bayesian multiple imputation, has been shown to lead to proper imputations in some settings. In this paper, we shall see that Bayesian multiple imputation does not generally lead to proper multiple imputations. Furthermore, it will be argued that for general statistical use, Bayesian multiple imputation is inefficient even when it is proper.

2.
Imputation procedures such as fully efficient fractional imputation (FEFI) or multiple imputation (MI) create multiple versions of the missing observations, thereby reflecting uncertainty about their true values. Multiple imputation generates a finite set of imputations through a posterior predictive distribution; fractional imputation instead assigns weights to the observed data. The focus of this article is the development of FEFI for partially classified two-way contingency tables. Point estimators and variances of FEFI estimators of population proportions are derived. Simulation results, when data are missing completely at random or missing at random, show that FEFI is comparable in performance to maximum likelihood estimation and multiple imputation, and superior to simple stochastic imputation and complete case analysis. Methods are illustrated with four data sets.
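A minimal sketch of the fractional-imputation idea for a partially classified two-way table (our illustration, not the article's FEFI estimator): units observed on the row margin but missing the column entry are split fractionally across the columns in proportion to the conditional distribution estimated from the fully classified units.

```python
def fractionally_impute(full_counts, row_only_counts):
    """full_counts: dict (row, col) -> count of fully classified units.
    row_only_counts: dict row -> count of units with the column missing.
    Returns a completed table with possibly fractional counts."""
    rows = {r for r, _ in full_counts}
    cols = {c for _, c in full_counts}
    completed = dict(full_counts)
    for r in rows:
        row_total = sum(full_counts.get((r, c), 0) for c in cols)
        for c in cols:
            # fractional weight = estimated P(col = c | row = r)
            w = full_counts.get((r, c), 0) / row_total if row_total else 0.0
            completed[(r, c)] = completed.get((r, c), 0) + row_only_counts.get(r, 0) * w
    return completed

full = {(0, 0): 30, (0, 1): 10, (1, 0): 20, (1, 1): 20}
partial = {0: 8, 1: 4}  # units whose column entry is missing
completed = fractionally_impute(full, partial)
```

Each partially classified unit contributes fractional weight to every cell in its row, so the completed table preserves the overall total.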

3.
Multiple imputation methods properly account for the uncertainty of missing data. One such method for creating multiple imputations is predictive mean matching (PMM), a general-purpose method. Little is known about the performance of PMM in imputing non-normal semicontinuous data (skewed data with a point mass at a certain value and otherwise continuously distributed). We investigate the performance of PMM as well as dedicated methods for imputing semicontinuous data by performing simulation studies under univariate and multivariate missingness mechanisms. We also investigate the performance on real-life datasets. We conclude that PMM performs at least as well as the investigated dedicated methods for imputing semicontinuous data and, in contrast to the other methods, is the only method that yields plausible imputations and preserves the original data distributions.
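A minimal predictive-mean-matching sketch, under simplifying assumptions: one covariate, a single nearest donor, and no draw from the posterior of the regression coefficients (full PMM as implemented in packages such as mice adds both). It shows the key property the abstract relies on: every imputed value is an actually observed value, so the original data distribution is preserved.

```python
import numpy as np

def pmm_impute(x, y):
    """x: fully observed covariate; y: outcome with NaNs to impute."""
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    yhat = X @ beta                      # predicted means for all cases
    y_imp = y.copy()
    for i in np.where(~obs)[0]:
        donor = np.argmin(np.abs(yhat[obs] - yhat[i]))  # closest predicted mean
        y_imp[i] = y[obs][donor]         # donate the donor's observed value
    return y_imp

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 2.0, 4.0, np.nan, 8.0])
y_imp = pmm_impute(x, y)
```

Because imputations are donated observed values rather than model predictions, a point mass in semicontinuous data survives imputation.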

4.
Imputation: Methods, Simulation Experiments and Practical Examples
When conducting surveys, two kinds of nonresponse may cause incomplete data files: unit nonresponse (complete nonresponse) and item nonresponse (partial nonresponse). The selectivity of the unit nonresponse is often corrected for. Various imputation techniques can be used for the missing values caused by item nonresponse. Several of these imputation techniques are discussed in this report. One is hot deck imputation. This paper describes two simulation experiments with the hot deck method. In the first study, data are randomly generated, and various percentages of missing values are then non-randomly 'added' to the data. The hot deck method is used to reconstruct the data in this Monte Carlo experiment. The performance of the method is evaluated for means, standard deviations, and correlation coefficients, and compared with the available case method. In the second study, the quality of an imputation method is studied by running a simulation experiment. A selection of the data of the Dutch Housing Demand Survey is perturbed by leaving out specific values on a variable. Again, hot deck imputations are used to reconstruct the data. The imputations are then compared with the true values. In both experiments the conclusion is that the hot deck method generally performs better than the available case method. This paper also addresses the questions of which variables should be imputed and how long the imputation process takes. Finally, the theory is illustrated by the imputation approaches of the Dutch Housing Demand Survey, the European Community Household Panel Survey (ECHP) and the new Dutch Structure of Earnings Survey (SES). These examples illustrate the levels of missing data that can be experienced in such surveys and the practical problems associated with choosing an appropriate imputation strategy for key items from each survey.
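A toy hot-deck imputation within adjustment classes, to make the mechanism concrete. The class variable and the random-donor scheme are illustrative assumptions; real surveys use carefully constructed imputation classes and donor rules.

```python
import random

def hot_deck(records, class_var, target, seed=0):
    """records: list of dicts; class_var: imputation-class variable;
    target: variable with None marking item nonresponse. Each missing
    value is replaced by a randomly drawn donor value from responders
    in the same class."""
    rng = random.Random(seed)
    donors = {}
    for rec in records:                     # collect responders per class
        if rec[target] is not None:
            donors.setdefault(rec[class_var], []).append(rec[target])
    completed = []
    for rec in records:
        rec = dict(rec)
        if rec[target] is None:             # item nonresponse: copy a donor
            rec[target] = rng.choice(donors[rec[class_var]])
        completed.append(rec)
    return completed

data = [
    {"region": "a", "rent": 500},
    {"region": "a", "rent": None},
    {"region": "b", "rent": 700},
    {"region": "b", "rent": 900},
    {"region": "b", "rent": None},
]
filled = hot_deck(data, "region", "rent")
```

As with PMM, donated values are real observed values, which helps preserve marginal distributions within each class.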

5.
A typical Business Register (BR) is mainly based on administrative data files provided by organisations that produce them as a by-product of their function. Such files do not necessarily yield a perfect Business Register. A good BR should have the following characteristics: (1) It should reflect the complex structures of businesses with multiple activities, in multiple locations or with multiple legal entities; (2) It should be free of duplication, extraneous or missing units; (3) It should be properly classified in terms of key stratification variables, including size, geography and industry; (4) It should be easily updateable to represent the "newer" business picture, and not lag too much behind it. In reality, not all these desirable features are fully satisfied, resulting in a universe that has missing units, inaccurate structures, as well as improper contact information, to name a few defects.
These defects can be compensated for by using sampling and estimation procedures. For example, coverage can be improved using multiple frame techniques, and the sample size can be increased to account for misclassification of units and deaths on the register. At the time of estimation, auxiliary information can be used in a variety of ways. It can be used to impute missing variables, to treat outliers, or to create synthetic variables obtained via modelling. Furthermore, time lags between the birth of units and the time that they are included on the register can be accounted for by appropriately inflating the design-based estimates.

6.
In this review paper, we discuss the theoretical background of multiple imputation, describe how to build an imputation model and how to create proper imputations. We also present the rules for making repeated imputation inferences. Three widely used multiple imputation methods, the propensity score method, the predictive model method and the Markov chain Monte Carlo (MCMC) method, are presented and discussed.
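The rules for repeated-imputation inference mentioned here are Rubin's combining rules: pool the m completed-data point estimates and variances into a single estimate whose total variance adds a between-imputation component. The formulas below are the standard ones; the function name and toy numbers are ours.

```python
def pool(estimates, variances):
    """Combine m completed-data estimates and their variances."""
    m = len(estimates)
    qbar = sum(estimates) / m                              # pooled point estimate
    ubar = sum(variances) / m                              # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = ubar + (1 + 1 / m) * b                         # total variance
    return qbar, total, b

qbar, total, b = pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

The (1 + 1/m) factor corrects for using only a finite number of imputations; as m grows, the total variance approaches ubar + b.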

7.
In this paper, the two-step generalized estimating equations (GEE) approach developed by Wang and Fitzmaurice (Biom J 2:302–318, 2006) is employed to handle income non-responses in the Panel Study of Family Dynamics survey conducted in Taiwan. In our analysis, we first construct a conditional logit model of the paid work equation by taking the missing patterns into account. We then use the estimation results to impute whether or not the nonrespondents were working for pay. For those who were imputed or observed to work for pay, we adopt the two-step GEE method to estimate the income equation. Compared to simply deleting the missing cases, the two-step imputation procedure is found to improve the estimation results.

8.
Empirical count data are often zero-inflated and overdispersed. Currently, there is no software package that allows adequate imputation of these data. We present multiple-imputation routines for these kinds of count data, based on a Bayesian regression approach or alternatively on a bootstrap approach, that work as add-ons for the popular multiple imputation by chained equations (mice) software in R (van Buuren and Groothuis-Oudshoorn, Journal of Statistical Software, vol. 45, 2011, p. 1). We demonstrate in a Monte Carlo simulation that our procedures are superior to currently available count data procedures. It is emphasized that thorough modeling is essential to obtain plausible imputations and that model mis-specifications can bias parameter estimates and standard errors quite noticeably. Finally, the strengths and limitations of our procedures are discussed, and fruitful avenues for future theory and software development are outlined.

9.
This study investigated the performance of multiple imputation with the Expectation-Maximization (EM) algorithm and the Markov chain Monte Carlo (MCMC) method. We compared the accuracy of imputation based on real data, set up two extreme scenarios, and conducted both empirical and simulation studies to examine the effects of missing data rates and of the number of items used for imputation. In the empirical study, the scenario represented the item with the highest missing rate from the domain with the fewest items; in the simulation study, we selected the domain with the most items, and the item imputed had the lowest missing rate. In the empirical study, the results showed no significant difference between the EM algorithm and the MCMC method for item imputation, and the number of items used for imputation also had little impact. Compared with the actual observed values, the middle responses of 3 and 4 were over-imputed, and the extreme responses of 1, 2 and 5 were under-represented. Similar patterns occurred for domain imputation: again, no significant difference between the EM algorithm and the MCMC method, and little impact of the number of items used for imputation. In the simulation study, we chose the environmental domain to examine the effects of the imputation method (EM versus MCMC), the missing data rate, and the number of items used for imputation. Again, there was no significant difference between the EM algorithm and the MCMC method. The accuracy rates did not significantly decrease as the proportion of missing data increased. The number of items used for imputation contributed somewhat to the accuracy of imputation, but not as much as expected.
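An EM sketch for the simplest missing-data setting behind comparisons like this one: bivariate-normal (x, y) with x complete and some y missing at random given x. The E-step fills in E[y | x] and E[y^2 | x] under the current parameters; the M-step re-estimates the moments. This is a toy illustration of the algorithm, not the study's imputation procedure.

```python
import numpy as np

def em_mean_y(x, y, iters=500):
    """Return the ML estimate of mean(y) under a bivariate-normal model
    when x is complete and y has NaNs (missing at random given x)."""
    obs = ~np.isnan(y)
    mu_x = x.mean()
    mu_y = y[obs].mean()                        # start from complete cases
    sxx = x.var()
    sxy = np.cov(x[obs], y[obs], bias=True)[0, 1]
    syy = y[obs].var()
    for _ in range(iters):
        beta = sxy / sxx
        resid = max(syy - sxy ** 2 / sxx, 0.0)  # conditional variance of y | x
        ey = np.where(obs, y, mu_y + beta * (x - mu_x))  # E-step: E[y | x]
        ey2 = np.where(obs, y ** 2, ey ** 2 + resid)     #         E[y^2 | x]
        mu_y = ey.mean()                                 # M-step: update moments
        syy = ey2.mean() - mu_y ** 2
        sxy = (x * ey).mean() - mu_x * mu_y
    return mu_y

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, np.nan, np.nan])  # y = 2x exactly where observed
mu_y_hat = em_mean_y(x, y)
```

With this monotone pattern the ML answer has a closed form, ybar_obs + beta_obs * (xbar_all - xbar_obs) = 5.0 here, which the iterations approach.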

10.
Data fusion or statistical matching techniques merge datasets from different survey samples to achieve a complete but artificial data file which contains all variables of interest. The merging of datasets is usually done on the basis of variables common to all files, but traditional methods implicitly assume conditional independence between the variables never jointly observed, given the common variables. We therefore suggest tackling the data fusion task with more flexible, model-based procedures. Suitable multiple imputation techniques reflect the identification problem inherent in statistical matching. Here a non-iterative Bayesian version of Rubin's implicit regression model is presented and compared in a simulation study with imputations from a data augmentation algorithm as well as an iterative approach using chained equations.

11.
We study two Durbin-Watson type tests for serial correlation of errors in regression models when observations are missing. We derive them by applying standard methods used in time series and linear models to deal with missing observations. The first test may be viewed as a regular Durbin-Watson test in the context of an extended model. We discuss appropriate adjustments that allow one to use all available bounds tables. We show that the test is locally most powerful invariant against the same alternative error distribution as the Durbin-Watson test. The second test is based on a modified Durbin-Watson statistic suggested by King (1981a) and is locally most powerful invariant against a first-order autoregressive process.
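For reference, the classical Durbin-Watson statistic on a complete residual series is d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, approximately 2(1 - rho_hat), so values near 0 indicate positive serial correlation and values near 4 negative. The paper's tests adapt this idea when some residuals are unavailable; that adjustment is not shown here.

```python
def durbin_watson(residuals):
    """Classical Durbin-Watson statistic on a complete residual series."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

d_neg = durbin_watson([1.0, -1.0, 1.0, -1.0])  # alternating signs: d well above 2
d_pos = durbin_watson([1.0, 1.0, 1.0, 1.0])    # constant residuals: d = 0
```

With missing observations the consecutive pairs (e_t, e_{t-1}) are no longer all available, which is precisely what motivates the two modified tests in the paper.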

12.
Understanding the transitions between disease states is often the goal in studying chronic disease. These studies, however, are typically subject to a large amount of missingness either due to patient dropout or intermittent missed visits. The missing data is often informative since missingness and dropout are usually related to either an individual's underlying disease process or the actual value of the missed observation. Our motivating example is a study of opiate addiction that examined the effect of a new treatment on thrice-weekly binary urine tests to assess opiate use over follow-up. The interest in this opiate addiction clinical trial was to characterize the transition pattern of opiate use (in each treatment arm) as well as to compare both the marginal probability of a positive urine test over follow-up and the time until the first positive urine test between the treatment arms. We develop a shared random effects model that links together the propensity of transition between states and the probability of either an intermittent missed observation or dropout. This approach allows for heterogeneous transition and missing data patterns between individuals as well as incorporating informative intermittent missing data and dropout. We compare this new approach with other approaches proposed for the analysis of longitudinal binary data with informative missingness.

13.
We consider a semiparametric method to estimate logistic regression models in which both covariates and the outcome variable may be missing, and propose two new estimators. The first, which is based solely on the validation set, is an extension of the validation likelihood estimator of Breslow and Cain (Biometrika 75:11–20, 1988). The second is a joint conditional likelihood estimator based on the validation and non-validation data sets. Both estimators are semiparametric as they require neither model assumptions about the missing data mechanism nor specification of the conditional distribution of the missing covariates given the observed covariates. The asymptotic distribution theory is developed under the assumption that all covariate variables are categorical. The finite-sample properties of the proposed estimators are investigated through simulation studies, showing that the joint conditional likelihood estimator is the most efficient. A cable TV survey data set from Taiwan is used to illustrate the practical use of the proposed methodology.

14.
A common problem in applied regression analysis is that covariate values may be missing for some observations but imputed values may be available. This situation generates a trade-off between bias and precision: the complete cases are often disarmingly few, but replacing the missing observations with the imputed values to gain precision may lead to bias. In this paper, we formalize this trade-off by showing that one can augment the regression model with a set of auxiliary variables so as to obtain, under weak assumptions about the imputations, the same unbiased estimator of the parameters of interest as complete-case analysis. Given this augmented model, the bias-precision trade-off may then be tackled by either model reduction procedures or model averaging methods. We illustrate our approach by considering the problem of estimating the relation between income and the body mass index (BMI) using survey data affected by item non-response, where the missing values on the main covariates are filled in by imputations.
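A small simulation of the central point, under our own simplifying construction: augmenting the regression with an imputation indicator and its interaction with the imputed-filled covariate makes the fitted slope and intercept coincide exactly with the complete-case estimates, because the auxiliary terms absorb the imputed cases.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=n)
miss = np.zeros(n, dtype=bool)
miss[:80] = True                      # covariate treated as missing here
# stand-in "imputations": truth plus noise (a simulation convenience)
x_fill = np.where(miss, x + rng.normal(scale=0.5, size=n), x)

# complete-case estimate of (intercept, slope)
Xcc = np.column_stack([np.ones((~miss).sum()), x[~miss]])
beta_cc = np.linalg.lstsq(Xcc, y[~miss], rcond=None)[0]

# augmented model: intercept, filled covariate, indicator, interaction
d = miss.astype(float)
Xaug = np.column_stack([np.ones(n), x_fill, d, d * x_fill])
beta_aug = np.linalg.lstsq(Xaug, y, rcond=None)[0]
```

The imputed group gets its own free intercept and slope through the auxiliary columns, so the least-squares problem separates and (intercept, slope) are determined by the complete cases alone.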

15.
This paper proposes an estimation method for a partial parametric model with multiple integrated time series. Our estimation procedure is based on the decomposition of the nonparametric part of the regression function into homogeneous and integrable components. It consists of two steps: In the first step we parameterize and fit the homogeneous component of the nonparametric part by the nonlinear least squares with other parametric terms in the model, and use in the second step the standard kernel method to nonparametrically estimate the integrable component of the nonparametric part from the residuals in the first step. We establish consistency and obtain the asymptotic distribution of our estimator. A simulation shows that our estimator performs well in finite samples. For the empirical illustration, we estimate the money demand functions for the US and Japan using our model and methodology.  相似文献   

16.
Since the work of Little and Rubin (1987), no substantial advances in the analysis of explanatory regression models for incomplete data with data missing not at random have been achieved, mainly due to the difficulty of verifying the randomness of the unknown data. In practice, the analysis of nonrandom missing data is done with techniques designed for datasets with random or completely random missing data, such as complete case analysis, mean imputation, regression imputation, maximum likelihood or multiple imputation. However, the data conditions required to minimize the bias derived from an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations have been carried out to establish the best strategy of analysis for random missing data applicable to datasets with nonrandom missing data. The factors involved in the simulations are sample size, percentage of missing data, predictive power of the imputation model and existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation techniques, although with low percentages of missing data, absence of interaction and high predictive power of the imputation model (frequent data structures in research on child and adolescent psychopathology) acceptable results are obtained with the simplest regression imputation.

17.
Patents are a key indicator of a city's capacity for scientific and technological innovation. Using principal component analysis, this paper scores, ranks, and grades the overall patent strength of China's 4 municipalities and 15 sub-provincial cities, revealing the differences in overall patent strength among them. Multiple linear regression on the 19 cities then identifies the main factors influencing a city's overall patent strength: per-capita science and technology expenditure, total fixed-asset investment, the proportion of science and technology personnel among the employed, public library holdings per hundred people, and per-capita telecommunications business volume.

18.
Bayesian multiple imputation (MI) has become a highly useful paradigm for handling missing values in many settings. In this paper, I compare Bayesian MI with other methods, maximum likelihood in particular, and point out some of its unique features. One key aspect of MI, the separation of the imputation phase from the analysis phase, can be advantageous in settings where the models underlying the two phases do not agree.

19.
The missing data problem has been widely addressed in the literature. The traditional methods for handling missing data may not be suited to spatial data, which can exhibit distinctive structures of dependence and/or heterogeneity. As a possible solution to the spatial missing data problem, this paper proposes an approach that combines the Bayesian Interpolation method [Benedetti, R. & Palma, D. (1994) Markov random field-based image subsampling method, Journal of Applied Statistics, 21(5), 495–509] with a multiple imputation procedure. The method is developed in a univariate and a multivariate framework, and its performance is evaluated through an empirical illustration based on data related to labour productivity in European regions.

20.
马菲  杨波峰  王郡娴  添玉 《物流科技》2013,(12):82-85,92
This article studies, in two parts, the operating efficiency of aisle-type rail-guided stacker cranes in an automated storage/retrieval warehouse. The first part proposes a mode in which two stacker cranes operate in parallel and analyses its efficiency theoretically, finding that two parallel stacker cranes improve operating efficiency nearly threefold compared with a single crane. The second part uses the Flexsim simulation software to simulate the theoretical analysis and verifies the correctness of the conclusion.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号