Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Incomplete data is a common problem of survey research. Recent work on multiple imputation techniques has increased analysts’ awareness of the biasing effects of missing data and has also provided a convenient solution. Imputation methods replace non-response with estimates of the unobserved scores. In many instances, however, non-response to a stimulus does not result from measurement problems that inhibit accurate surveying of empirical reality, but from the inapplicability of the survey question. In such cases, existing imputation techniques replace valid non-response with counterfactual estimates of a situation in which the stimulus is applicable to all respondents. This paper suggests an alternative imputation procedure for incomplete data for which no true score exists: multiple complete random imputation, which overcomes the biasing effects of missing data and allows analysts to model respondents’ valid ‘I don’t know’ answers.
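As a rough illustration of the idea, the sketch below (Python, with invented toy data and function names) fills each missing response with an independent random draw from the observed response values, once per imputed dataset. It mimics the spirit of multiple complete random imputation; the paper's exact procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(42)

def complete_random_imputation(y, n_imputations=5):
    """Fill every missing entry with a value drawn uniformly at random from the
    variable's observed values, independently in each imputed dataset."""
    observed = y[~np.isnan(y)]
    missing = np.isnan(y)
    imputed_sets = []
    for _ in range(n_imputations):
        y_imp = y.copy()
        y_imp[missing] = rng.choice(observed, size=missing.sum(), replace=True)
        imputed_sets.append(y_imp)
    return imputed_sets

# toy 5-point attitude item where NaN marks a valid "don't know" answer
y = np.array([1, 3, np.nan, 5, 2, np.nan, 4, 4], dtype=float)
for m, y_imp in enumerate(complete_random_imputation(y), start=1):
    print(f"imputation {m}: mean = {y_imp.mean():.2f}")
```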

2.
Since the work of Little and Rubin (1987), no substantial advances have been achieved in the analysis of explanatory regression models for incomplete data with data missing not at random, mainly due to the difficulty of verifying the randomness of the unknown data. In practice, the analysis of non-random missing data is done with techniques designed for datasets with random or completely random missing data, such as complete case analysis, mean imputation, regression imputation, maximum likelihood or multiple imputation. However, the data conditions required to minimize the bias derived from an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations were carried out to establish the best strategy of analysis for random missing data applicable to datasets with non-random missing data. The factors involved in the simulations are sample size, percentage of missing data, predictive power of the imputation model, and existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation techniques, although with low percentages of missing data, absence of interaction, and high predictive power of the imputation model (data structures frequent in research on child and adolescent psychopathology), acceptable results are obtained with the simplest regression imputation.
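To make this kind of comparison concrete, here is a minimal Monte Carlo sketch in Python; the data-generating model, the MAR missingness mechanism, and the three strategies compared (complete-case analysis, mean imputation, regression imputation) are illustrative choices, not the paper's actual simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_replication(n=500):
    """One Monte Carlo draw: y depends on x, and y is MAR given x."""
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(size=n)          # true mean of y is 0
    p_miss = 1 / (1 + np.exp(-(x - 1.0)))     # higher x -> y more likely missing
    miss = rng.random(n) < p_miss
    y_obs = np.where(miss, np.nan, y)

    estimates = {}
    # complete-case analysis
    estimates["complete_case"] = np.nanmean(y_obs)
    # unconditional mean imputation
    y_mean = np.where(miss, np.nanmean(y_obs), y_obs)
    estimates["mean_imputation"] = y_mean.mean()
    # regression imputation: predict missing y from x using observed cases
    b1, b0 = np.polyfit(x[~miss], y_obs[~miss], deg=1)
    y_reg = np.where(miss, b0 + b1 * x, y_obs)
    estimates["regression_imputation"] = y_reg.mean()
    return estimates

reps = [one_replication() for _ in range(1000)]
for method in reps[0]:
    bias = np.mean([r[method] for r in reps])  # true mean is 0, so this is the bias
    print(f"{method:>22}: bias = {bias:+.3f}")
```

Under this setup the complete-case and mean-imputation estimates of the mean are biased downward, while regression imputation is approximately unbiased, illustrating the kind of contrast the simulations above quantify.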

3.
The missing data problem has been widely addressed in the literature. Traditional methods for handling missing data may not be suited to spatial data, which can exhibit distinctive structures of dependence and/or heterogeneity. As a possible solution to the spatial missing data problem, this paper proposes an approach that combines the Bayesian Interpolation method [Benedetti, R. & Palma, D. (1994) Markov random field-based image subsampling method, Journal of Applied Statistics, 21(5), 495–509] with a multiple imputation procedure. The method is developed in a univariate and a multivariate framework, and its performance is evaluated through an empirical illustration based on data related to labour productivity in European regions.

4.
This paper outlines a strategy to validate multiple imputation methods. Rubin's criteria for proper multiple imputation are the point of departure. We describe a simulation method that yields insight into various aspects of bias and efficiency of the imputation process. We propose a new method for creating incomplete data under a general Missing At Random (MAR) mechanism. Software implementing the validation strategy is available as a SAS/IML module. The method is applied to investigate the behavior of polytomous regression imputation for categorical data.
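The paper proposes its own method for generating incomplete data under a general MAR mechanism; as a simpler stand-in, the Python sketch below (the logistic missingness model and the rate-calibration step are my own illustrative choices) deletes values in one column with a probability that depends only on another, fully observed column.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit

rng = np.random.default_rng(1)

def make_mar_missing(X, target_col, predictor_col, overall_rate=0.25):
    """Delete values in `target_col` with a probability that depends only on the
    fully observed `predictor_col` (a simple MAR mechanism); the logistic
    intercept is tuned so the expected missingness rate is roughly `overall_rate`."""
    X = X.copy()
    z = (X[:, predictor_col] - X[:, predictor_col].mean()) / X[:, predictor_col].std()
    intercept = brentq(lambda b: expit(b + z).mean() - overall_rate, -10, 10)
    miss = rng.random(len(X)) < expit(intercept + z)
    X[miss, target_col] = np.nan
    return X

X = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=1000)
X_incomplete = make_mar_missing(X, target_col=1, predictor_col=0)
print(f"missing rate in column 1: {np.isnan(X_incomplete[:, 1]).mean():.2%}")
```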

5.
Imputation: Methods, Simulation Experiments and Practical Examples
When conducting surveys, two kinds of nonresponse may cause incomplete data files: unit nonresponse (complete nonresponse) and item nonresponse (partial nonresponse). The selectivity of unit nonresponse is often corrected for. Various imputation techniques can be used for the missing values due to item nonresponse. Several of these imputation techniques are discussed in this report. One is hot deck imputation. This paper describes two simulation experiments with the hot deck method. In the first study, data are randomly generated, and various percentages of missing values are then non-randomly 'added' to the data. The hot deck method is used to reconstruct the data in this Monte Carlo experiment. The performance of the method is evaluated for the means, standard deviations, and correlation coefficients and compared with the available case method. In the second study, the quality of an imputation method is studied by running a simulation experiment. A selection of the data of the Dutch Housing Demand Survey is perturbed by leaving out specific values on a variable. Again hot deck imputations are used to reconstruct the data. The imputations are then compared with the true values. In both experiments the conclusion is that the hot deck method generally performs better than the available case method. This paper also addresses the questions of which variables should be imputed and how long the imputation process takes. Finally, the theory is illustrated by the imputation approaches of the Dutch Housing Demand Survey, the European Community Household Panel Survey (ECHP) and the new Dutch Structure of Earnings Survey (SES). These examples illustrate the levels of missing data that can be experienced in such surveys and the practical problems associated with choosing an appropriate imputation strategy for key items from each survey.
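A minimal random hot deck within imputation classes, sketched in Python with pandas; the class variable, donor rule, and toy data are illustrative, and the surveys mentioned above use more elaborate variants.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

def random_hot_deck(df, target, by):
    """Random hot deck within classes: every missing value of `target` is replaced
    by a donor value drawn at random from observed respondents in the same
    imputation class defined by `by`."""
    out = df.copy()
    for _, idx in out.groupby(by).groups.items():
        cell = out.loc[idx, target]
        donors = cell.dropna().to_numpy()
        if len(donors) == 0:
            continue  # no donor in this class; leave the gap
        n_miss = cell.isna().sum()
        out.loc[cell.index[cell.isna()], target] = rng.choice(donors, size=n_miss)
    return out

df = pd.DataFrame({
    "region": rng.choice(["north", "south"], size=12),
    "income": rng.normal(30, 5, size=12).round(1),
})
df.loc[[2, 5, 9], "income"] = np.nan
print(random_hot_deck(df, target="income", by="region"))
```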

6.
In this article, we demonstrate by simulations that rich imputation models for incomplete longitudinal datasets produce better-calibrated estimates, in terms of reduced bias and higher coverage rates, without unduly deflating efficiency. We argue that using supplementary variables that are thought to be potential causes or correlates of missingness or of the outcomes in the imputation process may lead to better inferential results than simpler imputation models. The liberal use of these variables is recommended over the conservative strategy.

7.
Repeated measurements are often analyzed by multivariate analysis of variance (MANOVA). An alternative approach is provided by multilevel analysis, also called the hierarchical linear model (HLM), which makes use of random coefficient models. This paper is a tutorial showing that the HLM can be specified in many different ways, corresponding to different sets of assumptions about the covariance matrix of the repeated measurements. The possible assumptions range from the very restrictive compound symmetry model to the unrestricted multivariate model. Thus, the HLM can be used to steer a useful middle road between the two traditional methods for analyzing repeated measurements. Another important advantage of the multilevel approach to analyzing repeated measures is that it can easily be used even if the data are incomplete. It thus provides a way to achieve a fully multivariate analysis of repeated measures with incomplete data.
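A small illustration of the random-coefficient (HLM) approach in Python with statsmodels; the simulated data, sample sizes, and the random-intercept specification are illustrative rather than the tutorial's own example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# simulate repeated measurements: 60 subjects, 4 occasions, random intercepts
n_subj, n_time = 60, 4
subject = np.repeat(np.arange(n_subj), n_time)
time = np.tile(np.arange(n_time), n_subj)
u = rng.normal(scale=1.0, size=n_subj)  # subject-level random effect
y = 2.0 + 0.5 * time + u[subject] + rng.normal(scale=0.8, size=n_subj * n_time)
df = pd.DataFrame({"subject": subject, "time": time, "y": y})

# make the data incomplete: drop ~20% of occasions at random; unlike MANOVA,
# the mixed model simply uses all remaining rows
df = df.sample(frac=0.8, random_state=1)

# random-intercept model (a compound-symmetry-like covariance); adding
# re_formula="~time" would move toward a less restrictive structure
model = smf.mixedlm("y ~ time", data=df, groups=df["subject"]).fit()
print(model.summary())
```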

8.
Data fusion or statistical matching techniques merge datasets from different survey samples to achieve a complete but artificial data file which contains all variables of interest. The merging of datasets is usually done on the basis of variables common to all files, but traditional methods implicitly assume conditional independence between the variables never jointly observed, given the common variables. We therefore suggest model-based approaches that tackle the data fusion task with more flexible procedures. Suitable multiple imputation techniques reflect the identification problem that is inherent in statistical matching. Here a non-iterative Bayesian version of Rubin's implicit regression model is presented and compared in a simulation study with imputations from a data augmentation algorithm as well as an iterative approach using chained equations.

9.
There has been growing interest in generalized classes of distributions in statistical theory and practice because of their flexibility in model formation. Multiple imputation under such distributions, which span a broader area in the symmetry–kurtosis plane, appears to have the potential of better capturing real incomplete data trends. In this article, we impute continuous univariate data that exhibit varying characteristics under two well-known distributions, assess the extent to which this procedure works properly, make comparisons with normal imputation models in terms of commonly accepted bias and precision measures, and discuss possible generalizations to the multivariate case and to larger families of distributions.

10.
Hot deck imputation is a method for handling missing data in which each missing value is replaced with an observed response from a "similar" unit. Despite being used extensively in practice, the theory is not as well developed as that of other imputation methods. We have found that no consensus exists as to the best way to apply the hot deck and obtain inferences from the completed data set. Here we review different forms of the hot deck and existing research on its statistical properties. We describe applications of the hot deck currently in use, including the U.S. Census Bureau's hot deck for the Current Population Survey (CPS). We also provide an extended example of variations of the hot deck applied to the third National Health and Nutrition Examination Survey (NHANES III). Some potential areas for future research are highlighted.

11.
Empirical count data are often zero-inflated and overdispersed. Currently, there is no software package that allows adequate imputation of these data. We present multiple-imputation routines for these kinds of count data, based on a Bayesian regression approach or alternatively on a bootstrap approach, that work as add-ons for the popular multiple imputation by chained equations (mice) software in R (van Buuren and Groothuis-Oudshoorn, Journal of Statistical Software, vol. 45, 2011, p. 1). We demonstrate in a Monte Carlo simulation that our procedures are superior to currently available count data procedures. It is emphasized that thorough modeling is essential to obtain plausible imputations and that model mis-specifications can bias parameter estimates and standard errors quite noticeably. Finally, the strengths and limitations of our procedures are discussed, and fruitful avenues for future theory and software development are outlined.
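The routines described above are add-ons to mice in R; as a rough stand-in for the bootstrap variant, the Python sketch below refits a plain Poisson GLM on a bootstrap resample of the complete cases and draws each missing count from the fitted distribution. Zero inflation and overdispersion, which the article's routines handle, are ignored here, and all names and data are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)

def bootstrap_poisson_imputation(y, X, n_imputations=5):
    """Bootstrap-based regression imputation for a count outcome: refit a Poisson
    GLM on a bootstrap resample of the complete cases, then draw each missing
    count from the fitted Poisson distribution."""
    obs = ~np.isnan(y)
    X_obs, y_obs = X[obs], y[obs]
    imputations = []
    for _ in range(n_imputations):
        boot = rng.integers(0, obs.sum(), size=obs.sum())
        fit = sm.GLM(y_obs[boot], sm.add_constant(X_obs[boot]),
                     family=sm.families.Poisson()).fit()
        mu_missing = fit.predict(sm.add_constant(X)[~obs])
        y_imp = y.copy()
        y_imp[~obs] = rng.poisson(mu_missing)
        imputations.append(y_imp)
    return imputations

x = rng.normal(size=300)
y = rng.poisson(np.exp(0.3 + 0.6 * x)).astype(float)
y[rng.random(300) < 0.2] = np.nan
imps = bootstrap_poisson_imputation(y, x.reshape(-1, 1))
print([f"{imp[np.isnan(y)].mean():.2f}" for imp in imps])
```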

12.
Huisman, Mark. Quality and Quantity, 2000, 34(4): 331–351.
Among the wide variety of procedures to handle missing data, imputing the missing values is a popular strategy to deal with missing item responses. In this paper some simple and easily implemented imputation techniques, like item and person mean substitution and some hot-deck procedures, are investigated. A simulation study was performed based on responses to items forming a scale to measure a latent trait of the respondents. The effects of different imputation procedures on the estimation of the latent ability of the respondents were investigated, as well as the effect on the estimation of Cronbach's alpha (indicating the reliability of the test) and Loevinger's H-coefficient (indicating scalability). The results indicate that procedures which use the relationships between items perform best, although they tend to overestimate the scale quality.
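For reference, person and item mean substitution, two of the simple techniques studied, can be written in a few lines of Python (toy responses; NaN marks item non-response).

```python
import numpy as np

def person_mean_substitution(responses):
    """Replace each respondent's missing item scores with the mean of the items
    that respondent did answer."""
    person_means = np.nanmean(responses, axis=1, keepdims=True)
    return np.where(np.isnan(responses), person_means, responses)

def item_mean_substitution(responses):
    """Replace missing scores with the mean of all respondents on that item."""
    item_means = np.nanmean(responses, axis=0, keepdims=True)
    return np.where(np.isnan(responses), item_means, responses)

# 4 respondents x 5 Likert-type items; NaN marks item non-response
r = np.array([[4, 5, np.nan, 3, 4],
              [2, np.nan, 1, 2, 2],
              [5, 5, 4, np.nan, 5],
              [3, 3, 3, 3, np.nan]], dtype=float)
print(person_mean_substitution(r))
print(item_mean_substitution(r))
```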

13.
Multiple imputation has come to be viewed as a general solution to missing data problems in statistics. However, in order to lead to consistent, asymptotically normal estimators, correct variance estimators and valid tests, the imputations must be proper. So far it seems that only Bayesian multiple imputation, i.e. using a Bayesian predictive distribution to generate the imputations, or approximately Bayesian multiple imputation, has been shown to lead to proper imputations in some settings. In this paper, we shall see that Bayesian multiple imputation does not generally lead to proper multiple imputations. Furthermore, it will be argued that for general statistical use, Bayesian multiple imputation is inefficient even when it is proper.

14.
Multiple imputation methods properly account for the uncertainty of missing data. One of those methods for creating multiple imputations is predictive mean matching (PMM), a general purpose method. Little is known about the performance of PMM in imputing non-normal semicontinuous data (skewed data with a point mass at a certain value and otherwise continuously distributed). We investigate the performance of PMM as well as dedicated methods for imputing semicontinuous data by performing simulation studies under univariate and multivariate missingness mechanisms. We also investigate the performance on real-life datasets. We conclude that PMM performance is at least as good as the investigated dedicated methods for imputing semicontinuous data and, in contrast to other methods, is the only method that yields plausible imputations and preserves the original data distributions.
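A bare-bones, single-pass version of predictive mean matching in Python; the full PMM used inside multiple imputation also perturbs the regression coefficients between imputations, which is omitted here, and the data and parameter choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def pmm_impute(y, X, k=5):
    """Single-pass predictive mean matching: regress y on X using complete cases,
    then replace each missing y with the observed value of one of the k donors
    whose predicted means are closest to that of the incomplete case."""
    obs = ~np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)
    yhat = Xd @ beta
    donors = np.where(obs)[0]
    y_imp = y.copy()
    for i in np.where(~obs)[0]:
        nearest = donors[np.argsort(np.abs(yhat[donors] - yhat[i]))[:k]]
        y_imp[i] = y[rng.choice(nearest)]
    return y_imp

# semicontinuous outcome: a point mass at zero plus a skewed positive part
x = rng.normal(size=400)
y = np.where(rng.random(400) < 0.4, 0.0, np.exp(0.5 * x + rng.normal(size=400)))
y[rng.random(400) < 0.25] = np.nan
share_zero = (pmm_impute(y, x)[np.isnan(y)] == 0).mean()
print(f"share of imputed values equal to zero: {share_zero:.2f}")
```

Because imputations are copied from observed donors, the point mass at zero survives in the imputed values, which is exactly the distribution-preserving property the abstract highlights.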

15.
The use of joint modelling approaches is becoming increasingly popular when an association exists between survival and longitudinal processes. Widely recognized for their gain in efficiency, joint models also offer a reduction in bias compared with naïve methods. With the increasing popularity comes a constantly expanding literature on joint modelling approaches. The aim of this paper is to give an overview of recent literature relating to joint models, in particular those that focus on the time-to-event survival process. A discussion is provided on the range of survival submodels that have been implemented in a joint modelling framework. A particular focus is given to recent advancements in the software used to build these models. The use of the JM and joineR packages within R is demonstrated through two real-life data examples concerning the survival of end-stage renal disease patients. Possible future directions for this field of research are also discussed.

16.
This study concerns list augmentation in direct marketing. List augmentation is a special case of missing data imputation. We review previous work on the mixed outcome factor model and apply it for the purpose of list augmentation. The model deals with both discrete and continuous variables and allows us to augment the data for all subjects in a company's transaction database with soft data collected in a survey among a sample of those subjects. We propose a bootstrap-based imputation approach, which is appealing to use in combination with the factor model, since it allows one to include estimation uncertainty in the imputation procedure in a simple yet adequate manner. We provide an empirical case study of the performance of the approach on the transaction database of a bank.

17.
Receiver operating characteristic (ROC) curves are widely used as a measure of the accuracy of diagnostic tests and can be summarised using the area under the ROC curve (AUC). Often, it is useful to construct a confidence interval for the AUC; however, because a number of different methods have been proposed to estimate the variance of the AUC, there are many different resulting methods for constructing these intervals. In this article, we compare different methods of constructing Wald-type confidence intervals in the presence of missing data where the missingness mechanism is ignorable. We find that constructing confidence intervals using multiple imputation based on logistic regression gives the most robust coverage probability, and the choice of confidence interval method is then less important. However, when the missingness rate is less severe (e.g. less than 70%), we recommend using Newcombe's Wald method for constructing confidence intervals, along with multiple imputation using predictive mean matching.

18.
The most common way of treating item non-response in surveys is to construct one or more replacement values to fill in for a missing value. This process is known as imputation. We distinguish single from multiple imputation: single imputation consists of replacing a missing value by a single replacement value, whereas multiple imputation uses two or more replacement values. This article reviews various imputation procedures used in National Statistical Offices as well as the properties of point and variance estimators in the presence of imputed survey data. It also presents newer developments in the field.
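For readers new to the topic, Rubin's combining rules, which underlie the variance estimators reviewed here for multiply imputed data, fit in a few lines of Python; the numbers fed in are purely illustrative.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Rubin's rules for combining M multiply-imputed analyses: pooled point
    estimate, total variance (within + between), and the resulting standard
    error. Single imputation corresponds to M = 1, for which the between-
    imputation component, and hence the extra uncertainty due to imputation,
    is ignored."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()                       # pooled estimate
    w_bar = variances.mean()                       # within-imputation variance
    b = estimates.var(ddof=1) if m > 1 else 0.0    # between-imputation variance
    t = w_bar + (1 + 1 / m) * b                    # total variance
    return q_bar, t, np.sqrt(t)

# five imputed-data analyses of the same mean (illustrative numbers)
q_hat = [10.2, 10.5, 9.9, 10.4, 10.1]
u_hat = [0.30, 0.28, 0.31, 0.29, 0.30]
est, total_var, se = rubin_pool(q_hat, u_hat)
print(f"pooled estimate {est:.2f}, total variance {total_var:.3f}, SE {se:.3f}")
```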

19.
陈梅. 《价值工程》 (Value Engineering), 2014(16): 216–218.
Quality control in the production process is of great importance for improving product quality. This paper briefly introduces the concepts and methods of data mining and, taking multiple-regression prediction of temperature in the production of extruded products as an example, describes the application of data mining to quality control in the production process, demonstrating the correctness and effectiveness of this method in practical use.
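A minimal sketch of the kind of multiple-regression prediction described, written in Python with scikit-learn; the predictor names, coefficients, and data are invented for illustration and are not taken from the article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)

# Illustrative stand-in for the extrusion example: predict a process temperature
# from upstream settings (screw speed, line speed, barrel zone temperature).
n = 200
screw_speed = rng.normal(60, 5, n)
line_speed = rng.normal(12, 1, n)
zone_temp = rng.normal(180, 4, n)
melt_temp = 0.8 * zone_temp + 0.5 * screw_speed + 2.0 * line_speed + rng.normal(0, 1.5, n)

X = np.column_stack([screw_speed, line_speed, zone_temp])
model = LinearRegression().fit(X, melt_temp)
print("coefficients:", model.coef_.round(2), "intercept:", round(model.intercept_, 2))

# predicted temperature for a new set of operating conditions; a large gap between
# prediction and measurement can be flagged by the quality-control process
print("predicted melt temperature:", model.predict([[62, 12.5, 182]]).round(1))
```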

20.
王清松. 《价值工程》 (Value Engineering), 2014(27): 323–324.
Applying legal theory and drawing on quality-inspection practice, the author analyses the responsible parties at every stage of the whole process of a product's social activities, discusses the rights and obligations (liabilities) of those parties, and offers a preliminary analysis of several questions concerning the principles for attributing product quality liability, in the hope of contributing to research on the legal and regulatory framework for product quality and to legislative work.
