Similar Documents
20 similar documents found (search time: 31 ms)
1.
Imputation: Methods, Simulation Experiments and Practical Examples (total citations: 1; self-citations: 0; citations by others: 1)
When conducting surveys, two kinds of nonresponse may cause incomplete data files: unit nonresponse (complete nonresponse) and item nonresponse (partial nonresponse). The selectivity of the unit nonresponse is often corrected for. Various imputation techniques can be used for the missing values caused by item nonresponse. Several of these imputation techniques are discussed in this report. One is hot deck imputation. This paper describes two simulation experiments with the hot deck method. In the first study, data are randomly generated, and various percentages of missing values are then non-randomly 'added' to the data. The hot deck method is used to reconstruct the data in this Monte Carlo experiment. The performance of the method is evaluated for the means, standard deviations, and correlation coefficients and compared with the available case method. In the second study, the quality of an imputation method is studied by running a simulation experiment. A selection of the data of the Dutch Housing Demand Survey is perturbed by leaving out specific values on a variable. Again hot deck imputations are used to reconstruct the data. The imputations are then compared with the true values. In both experiments the conclusion is that the hot deck method generally performs better than the available case method. This paper also addresses the questions of which variables should be imputed and how long the imputation process takes. Finally, the theory is illustrated by the imputation approaches of the Dutch Housing Demand Survey, the European Community Household Panel Survey (ECHP) and the new Dutch Structure of Earnings Survey (SES). These examples illustrate the levels of missing data that can be experienced in such surveys and the practical problems associated with choosing an appropriate imputation strategy for key items from each survey.
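The hot deck versus available-case comparison described above can be sketched in a few lines (a toy illustration, not the authors' implementation; the function names are my own):

```python
import random

def hot_deck_impute(values, seed=0):
    """Random hot deck: replace each missing value (None) with a value
    drawn at random from the observed (non-missing) donor pool."""
    rng = random.Random(seed)
    donors = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(donors) for v in values]

def available_case_mean(values):
    """Available-case analysis: simply drop the missing entries."""
    donors = [v for v in values if v is not None]
    return sum(donors) / len(donors)

data = [12.0, None, 15.0, 11.0, None, 14.0]
completed = hot_deck_impute(data)   # every None replaced by a donor value
```

In the simulation experiments, completed data sets like `completed` would then be compared with the true values for means, standard deviations and correlations.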

2.
Huisman  Mark 《Quality and Quantity》2000,34(4):331-351
Among the wide variety of procedures to handle missing data, imputing the missing values is a popular strategy to deal with missing item responses. In this paper some simple and easily implemented imputation techniques like item and person mean substitution, and some hot-deck procedures, are investigated. A simulation study was performed based on responses to items forming a scale to measure a latent trait of the respondents. The effects of different imputation procedures on the estimation of the latent ability of the respondents were investigated, as well as the effect on the estimation of Cronbach's alpha (indicating the reliability of the test) and Loevinger's H-coefficient (indicating scalability). The results indicate that procedures which use the relationships between items perform best, although they tend to overestimate the scale quality.
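Item mean substitution and Cronbach's alpha, two of the quantities discussed in this abstract, are easy to state concretely (a minimal sketch with hypothetical function names; the paper's own procedures are richer):

```python
def item_mean_substitution(X):
    """Replace each missing entry (None) in an n x k item-score matrix
    with the mean of the observed scores on that item (column)."""
    n, k = len(X), len(X[0])
    col_means = []
    for j in range(k):
        obs = [X[i][j] for i in range(n) if X[i][j] is not None]
        col_means.append(sum(obs) / len(obs))
    return [[X[i][j] if X[i][j] is not None else col_means[j] for j in range(k)]
            for i in range(n)]

def cronbach_alpha(X):
    """Cronbach's alpha for a complete n x k item-score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    n, k = len(X), len(X[0])
    def var(xs):  # population variance (the divisor cancels in the ratio)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([row[j] for row in X]) for j in range(k)]
    totals = [sum(row) for row in X]
    return k / (k - 1) * (1 - sum(item_vars) / var(totals))
```

Person mean substitution would instead fill a gap with the mean of that respondent's own observed items.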

3.
In this paper, we introduce a threshold stochastic volatility model with explanatory variables. The Bayesian method is considered in estimating the parameters of the proposed model via the Markov chain Monte Carlo (MCMC) algorithm. Gibbs sampling and Metropolis–Hastings sampling methods are used for drawing the posterior samples of the parameters and the latent variables. In the simulation study, the accuracy of the MCMC algorithm, the sensitivity of the algorithm for model assumptions, and the robustness of the posterior distribution under different priors are considered. Simulation results indicate that our MCMC algorithm converges fast and that the posterior distribution is robust under different priors and model assumptions. A real data example was analyzed to explain the asymmetric behavior of stock markets.
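The Metropolis–Hastings updates used inside such an MCMC scheme can be illustrated with a generic random-walk sampler for a one-dimensional target (an illustrative sketch, not the paper's algorithm):

```python
import math
import random

def metropolis_hastings(log_post, x0, n_samples=20000, step=1.0, seed=0):
    """Generic random-walk Metropolis-Hastings for a 1-D log-posterior:
    propose x' = x + N(0, step), accept with prob min(1, p(x')/p(x))."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        prop = x + rng.gauss(0, step)
        lp_prop = log_post(prop)
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            x, lp = prop, lp_prop      # accept the proposal
        samples.append(x)
    return samples
```

For a standard normal target (`log_post = lambda x: -0.5 * x * x`), the chain's sample mean and variance should settle near 0 and 1 after burn-in.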

4.
The most common way for treating item non‐response in surveys is to construct one or more replacement values to fill in for a missing value. This process is known as imputation. We distinguish single from multiple imputation. Single imputation consists of replacing a missing value by a single replacement value, whereas multiple imputation uses two or more replacement values. This article reviews various imputation procedures used in National Statistical Offices as well as the properties of point and variance estimators in the presence of imputed survey data. It also provides the reader with newer developments in the field.
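With multiple imputation, point and variance estimates from the m completed data sets are pooled with Rubin's combining rules, which can be written compactly (a minimal sketch):

```python
def rubin_combine(estimates, variances):
    """Rubin's rules: pool m point estimates and their within-imputation
    variances from m completed data sets."""
    m = len(estimates)
    q_bar = sum(estimates) / m                      # pooled point estimate
    u_bar = sum(variances) / m                      # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between variance
    t = u_bar + (1 + 1 / m) * b                     # total variance
    return q_bar, t
```

The extra `(1 + 1/m) * b` term is what single imputation omits, which is why naive variance estimators on singly imputed data understate uncertainty.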

5.
Although item nonresponse can never be totally prevented, it can be considerably reduced, thereby providing the researcher not only with more usable data, but also with helpful auxiliary information for better imputation and adjustment. To achieve this, an optimal data collection design is necessary. The optimisation of the questionnaire and survey design are the main tools a researcher has to reduce the amount of missing data in any such survey. In this contribution a concise typology of missing data patterns and their sources of origin is presented. Based on this typology, the mechanisms responsible for missing data are identified, followed by a discussion on how item nonresponse can be prevented.

6.
Wang Chao 《Value Engineering》2014,(35):81-82
This study addresses the price adjustment problem arising from omitted measure items, raising three key questions: analysis of the factors that cause measure items to be omitted, the conditions under which the price of omitted measure items may be adjusted, and the analysis of such price adjustments. The study also divides the causes of omitted measure items by responsibility and scope of risk, further classifies the responsibility for omissions, and assigns corresponding adjustment conditions. Based on the analysis of these conditions, it sets out adjustment methods for lump-sum and unit-price measure-item fees, providing guidance for adjusting the price of measure items in real projects.

7.
Panel models with interactive effects are a focus of current frontier research in econometrics and have broad scope for application. For many practitioners, however, estimating the parameters of these models is a thorny problem: the usual Newton-Raphson algorithm often fails when optimising the likelihood function. Building on the theory of the EM and MCMC algorithms, this paper provides applied researchers with a workflow for obtaining parameter estimates. Computational experiments confirm that both estimation methods are robust and reliable, and that in many cases the differences between them are small.

8.
In this review paper, we discuss the theoretical background of multiple imputation, describe how to build an imputation model and how to create proper imputations. We also present the rules for making repeated imputation inferences. Three widely used multiple imputation methods, the propensity score method, the predictive model method and the Markov chain Monte Carlo (MCMC) method, are presented and discussed.

9.
For contingency tables with extensive missing data, the unrestricted MLE under the saturated model, computed by the EM algorithm, is generally unsatisfactory. In this case, it may be better to fit a simpler model by imposing some restrictions on the parameter space. Perlman and Wu (1999) propose lattice conditional independence (LCI) models for contingency tables with arbitrary missing data patterns. When this LCI model fits well, the restricted MLE under the LCI model is more accurate than the unrestricted MLE under the saturated model, but not in general. Here we propose certain empirical Bayes (EB) estimators that adaptively combine the best features of the restricted and unrestricted MLEs. These EB estimators appear to be especially useful when the observed data is sparse, even in cases where the suitability of the LCI model is uncertain. We also study a restricted EM algorithm (called the ER algorithm) with similar desirable features. Received: July 1999

10.
This paper discusses a factor model for short-term forecasting of GDP growth using a large number of monthly and quarterly time series in real-time. To take into account the different periodicities of the data and missing observations at the end of the sample, the factors are estimated by applying an EM algorithm, combined with a principal components estimator. We discuss some in-sample properties of the estimator in a real-time environment and propose alternative methods for forecasting quarterly GDP with monthly factors. In the empirical application, we use a novel real-time dataset for the German economy. Employing a recursive forecast experiment, we evaluate the forecast accuracy of the factor model with respect to German GDP. Furthermore, we investigate the role of revisions in forecast accuracy and assess the contribution of timely monthly observations to the forecast performance. Finally, we compare the performance of the mixed-frequency model with that of a factor model, based on time-aggregated quarterly data.
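The EM-plus-principal-components idea can be sketched as an iterative scheme that alternates between fitting factors and re-imputing the missing cells (a toy version, assuming NumPy is available; it ignores the mixed-frequency structure the paper also handles):

```python
import numpy as np

def em_pca_impute(X, n_factors=1, n_iter=100):
    """EM-style imputation: start from column-mean fills, then alternate
    between a principal-components fit and re-imputing the missing cells."""
    X = np.array(X, dtype=float)
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])   # initial fill
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        approx = (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors] + mu
        X[mask] = approx[mask]                        # update only the missing cells
    return X
```

On data that are exactly consistent with a low-rank factor structure, the iteration converges to the completion implied by the factors.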

11.
Nested multiple imputation of NMES via partially incompatible MCMC (total citations: 1; self-citations: 0; citations by others: 1)
The multiple imputation of the National Medical Expenditure Survey (NMES) involved the use of two new techniques, both having potentially broad applicability. The first is to use distributionally incompatible MCMC (Markov Chain Monte Carlo), but to apply it only partially, to impute the missing values that destroy a monotone pattern, thereby limiting the extent of incompatibility. The second technique is to split the missing data into two parts, one that is much more computationally expensive to impute than the other, and create several imputations of the second part for each of the first part, thereby creating nested multiple imputations with their increased inferential efficiency.

12.
In many surveys, imputation procedures are used to account for non‐response bias induced by either unit non‐response or item non‐response. Such procedures are optimised (in terms of reducing non‐response bias) when the models include covariates that are highly predictive of both response and outcome variables. To achieve this, we propose a method for selecting the sets of covariates used in regression imputation models, or to determine imputation cells, for one or more outcome variables, using the fraction of missing information (FMI) as obtained via a proxy pattern‐mixture (PPM) model as the key metric. In our variable selection approach, we use the PPM model to obtain a maximum likelihood estimate of the FMI for separate sets of candidate imputation models and look for the point at which changes in the FMI level off and further auxiliary variables do not improve the imputation model. We illustrate our proposed approach using empirical data from the Ohio Medicaid Assessment Survey and from the Service Annual Survey.
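The key metric here, the fraction of missing information, has a standard large-sample estimate based on the between- and within-imputation variances (shown in its generic multiple-imputation form, not the PPM-specific maximum likelihood version used in the paper):

```python
def fraction_missing_info(estimates, variances):
    """Large-sample fraction of missing information from m multiply
    imputed estimates: FMI = (1 + 1/m) * B / T, where B is the
    between-imputation variance and T the total (Rubin) variance."""
    m = len(estimates)
    q_bar = sum(estimates) / m
    u_bar = sum(variances) / m
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    t = u_bar + (1 + 1 / m) * b
    return (1 + 1 / m) * b / t
```

In a variable-selection loop, one would compute the FMI for each candidate covariate set and stop adding covariates once the FMI stops decreasing appreciably.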

13.
Receiver operating characteristic curves are widely used as a measure of the accuracy of diagnostic tests and can be summarised using the area under the receiver operating characteristic curve (AUC). Often, it is useful to construct a confidence interval for the AUC; however, because there are a number of different proposed methods to measure the variance of the AUC, there are many different resulting methods for constructing these intervals. In this article, we compare different methods of constructing Wald‐type confidence intervals in the presence of missing data where the missingness mechanism is ignorable. We find that constructing confidence intervals using multiple imputation based on logistic regression gives the most robust coverage probability, and the choice of confidence interval method is then less important. However, when the missingness rate is less severe (e.g. less than 70%), we recommend using Newcombe's Wald method for constructing confidence intervals, along with multiple imputation using predictive mean matching.
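As a point of reference, the simplest of the compared approaches, a plain Wald interval built on the Hanley–McNeil variance approximation for the Mann–Whitney AUC, looks like this (an illustrative sketch on complete data; Newcombe's method and the multiple-imputation step are not shown):

```python
import math

def auc_mann_whitney(pos, neg):
    """AUC as the Mann-Whitney probability that a positive case scores
    higher than a negative one (ties count one half)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_wald_ci(pos, neg, z=1.96):
    """Wald interval using the Hanley-McNeil variance approximation."""
    a = auc_mann_whitney(pos, neg)
    m, n = len(pos), len(neg)
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (a * (1 - a) + (m - 1) * (q1 - a * a) + (n - 1) * (q2 - a * a)) / (m * n)
    se = math.sqrt(var)
    return max(0.0, a - z * se), min(1.0, a + z * se)
```

Under multiple imputation, the AUC and its variance would be computed on each completed data set and pooled before forming the interval.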

14.
Incomplete data is a common problem of survey research. Recent work on multiple imputation techniques has increased analysts’ awareness of the biasing effects of missing data and has also provided a convenient solution. Imputation methods replace non-response with estimates of the unobserved scores. In many instances, however, non-response to a stimulus does not result from measurement problems that inhibit accurate surveying of empirical reality, but from the inapplicability of the survey question. In such cases, existing imputation techniques replace valid non-response with counterfactual estimates of a situation in which the stimulus is applicable to all respondents. This paper suggests an alternative imputation procedure for incomplete data for which no true score exists: multiple complete random imputation, which overcomes the biasing effects of missing data and allows analysts to model respondents’ valid ‘I don’t know’ answers.

15.
This study concerns list augmentation in direct marketing. List augmentation is a special case of missing data imputation. We review previous work on the mixed outcome factor model and apply it for the purpose of list augmentation. The model deals with both discrete and continuous variables and allows us to augment the data for all subjects in a company's transaction database with soft data collected in a survey among a sample of those subjects. We propose a bootstrap-based imputation approach, which is appealing to use in combination with the factor model, since it allows one to include estimation uncertainty in the imputation procedure in a simple, yet adequate manner. We provide an empirical case study applying the approach to the transaction database of a bank.
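The bootstrap-based imputation idea, refitting the model on a resample of the complete cases before imputing so that estimation uncertainty propagates into the imputations, can be sketched with a simple linear predictor standing in for the factor model (hypothetical function, not the authors' code):

```python
import random

def bootstrap_impute(x_obs, y_obs, x_mis, rng):
    """One bootstrap imputation: refit a simple linear predictor on a
    resample of the complete cases, so estimation uncertainty enters
    the imputed values; repeat with fresh resamples for multiple draws."""
    n = len(x_obs)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x_obs[i] for i in idx]
        if len(set(xb)) > 1:               # guard against a degenerate resample
            break
    yb = [y_obs[i] for i in idx]
    xbar, ybar = sum(xb) / n, sum(yb) / n
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xb, yb)) \
        / sum((xi - xbar) ** 2 for xi in xb)
    intercept = ybar - slope * xbar
    return [intercept + slope * xi for xi in x_mis]
```

Because each imputation is based on a different resample, the fitted coefficients vary across draws, which is exactly how estimation uncertainty is carried into the imputed values.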

16.
The missing data problem has been widely addressed in the literature. The traditional methods for handling missing data may be not suited to spatial data, which can exhibit distinctive structures of dependence and/or heterogeneity. As a possible solution to the spatial missing data problem, this paper proposes an approach that combines the Bayesian Interpolation method [Benedetti, R. & Palma, D. (1994) Markov random field-based image subsampling method, Journal of Applied Statistics, 21(5), 495–509] with a multiple imputation procedure. The method is developed in a univariate and a multivariate framework, and its performance is evaluated through an empirical illustration based on data related to labour productivity in European regions.

17.
Since the work of Little and Rubin (1987), no substantial advances have been achieved in the analysis of explanatory regression models for incomplete data with values missing not at random, mainly due to the difficulty of verifying the randomness of the unknown data. In practice, the analysis of nonrandom missing data is done with techniques designed for datasets with random or completely random missing data, such as complete case analysis, mean imputation, regression imputation, maximum likelihood or multiple imputation. However, the data conditions required to minimize the bias derived from an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations have been carried out to establish the best strategy for analysing random missing data that is applicable to datasets with nonrandom missing data. The factors involved in the simulations are sample size, percentage of missing data, predictive power of the imputation model and existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation techniques, although with low percentages of missing data, absence of interaction and high predictive power of the imputation model (frequent data structures in research on child and adolescent psychopathology) acceptable results are obtained with the simplest regression imputation.
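A stripped-down version of such a Monte Carlo comparison, contrasting mean imputation with regression imputation for the mean of an outcome that goes missing depending on a predictor, might look like this (a toy design with assumed parameters, not the study's factorial design):

```python
import random

def simulate_bias(n=1000, n_reps=100, seed=1):
    """Toy Monte Carlo: y = 2x + e with true E[y] = 0; y is missing
    whenever x >= 0.5 (missing at random given x). Returns the average
    bias of mean imputation and of regression imputation for E[y]."""
    rng = random.Random(seed)
    bias_mean = bias_reg = 0.0
    for _ in range(n_reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [2 * xi + rng.gauss(0, 1) for xi in x]
        obs = [xi < 0.5 for xi in x]                 # missingness depends on x
        xo = [xi for xi, o in zip(x, obs) if o]
        yo = [yi for yi, o in zip(y, obs) if o]
        m = sum(yo) / len(yo)
        est_mean = m        # mean imputation: completed-data mean equals observed mean
        # regression imputation: fit y = a + b*x on the observed cases
        xbar = sum(xo) / len(xo)
        b = sum((xi - xbar) * (yi - m) for xi, yi in zip(xo, yo)) \
            / sum((xi - xbar) ** 2 for xi in xo)
        a = m - b * xbar
        y_imp = [a + b * xi for xi, o in zip(x, obs) if not o]
        est_reg = (sum(yo) + sum(y_imp)) / n
        bias_mean += est_mean / n_reps
        bias_reg += est_reg / n_reps
    return bias_mean, bias_reg
```

Here mean imputation is badly biased (the observed cases have systematically low x, hence low y), while regression imputation recovers the truth, mirroring the paper's finding that model-based methods dominate when the imputation model is predictive.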

18.
Most economic applications rely on a large number of time series, which typically have a remarkable clustering structure and are available over different spans. To handle such databases, we combined the expectation–maximization (EM) algorithm outlined by Stock and Watson (JBES, 2002) with the estimation algorithm for large factor models with an unknown number of group structures and unknown membership described by Ando and Bai (JAE, 2016; JASA, 2017). Several Monte Carlo experiments demonstrated the good performance of the proposed method at determining the correct number of clusters, providing the appropriate number of group-specific factors, identifying error-free group membership, and obtaining accurate estimates of unobserved missing data. In addition, we found that our proposed method performed substantially better than the standard EM algorithm when the data had a grouped factor structure. Using the Federal Reserve Economic Data FRED-QD, our method detected two distinct groups of macroeconomic indicators, comprising real activity indicators and nominal indicators. Thus, we demonstrated the usefulness of our group-specific factor model for studies of business cycle chronology and for forecasting purposes.

19.
Sanjoy K. Sinha 《Metrika》2012,75(7):913-938
We encounter missing data in many longitudinal studies. When the missing data are nonignorable, it is important to analyze the data by incorporating the missing data mechanism into the observed data likelihood function. The classical maximum likelihood (ML) method for analyzing longitudinal missing data has been extensively studied in the literature. However, it is well-known that the ordinary ML estimators are sensitive to extreme observations or outliers in the data. In this paper, we propose and explore a robust method, which is developed in the framework of the ML method, and is useful for downweighting any influential observations in the data when estimating the model parameters. We study the empirical properties of the robust estimators in small simulations. We also illustrate the robust method using incomplete longitudinal data on CD4 counts from clinical trials of HIV-infected patients.
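The downweighting idea behind such robust estimators can be illustrated with an iteratively reweighted location estimate using Huber weights (a generic sketch, not the paper's longitudinal estimator):

```python
def huber_mean(xs, c=1.345, n_iter=50):
    """Robust location estimate: iteratively reweighted mean with Huber
    weights, which downweight observations far from the current fit."""
    mu = sorted(xs)[len(xs) // 2]          # start from (roughly) the median
    for _ in range(n_iter):
        # robust scale via the median absolute deviation
        mad = sorted(abs(x - mu) for x in xs)[len(xs) // 2] or 1.0
        s = 1.4826 * mad
        w = [1.0 if abs(x - mu) / s <= c else c * s / abs(x - mu) for x in xs]
        mu = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
    return mu

data = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]   # one gross outlier
```

On `data`, the ordinary mean is pulled to about 16.7 by the outlier, while the reweighted estimate stays near 10 because the outlier receives a weight close to zero.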

20.
Wangli Xu  Lixing Zhu 《Metrika》2013,76(1):53-69
In this paper, we investigate checking the adequacy of varying coefficient models when the response is missing at random. To do so, we first construct two completed data sets based on imputation and on marginal inverse probability weighting, respectively. Empirical process-based tests using these two completed data sets are suggested, and the asymptotic properties of the test statistics under the null and local alternative hypotheses are studied. Because the limiting null distribution is intractable, a Monte Carlo approach is applied to approximate the distribution and determine critical values. Simulation studies are carried out to examine the performance of our method, and a real data set from an environmental study is analyzed for illustration.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号