Related Articles
20 related articles retrieved.
1.
In many surveys, imputation procedures are used to account for non-response bias induced by either unit non-response or item non-response. Such procedures are optimised (in terms of reducing non-response bias) when the models include covariates that are highly predictive of both the response and the outcome variables. To achieve this, we propose a method for selecting the sets of covariates used in regression imputation models, or for determining imputation cells, for one or more outcome variables, using as the key metric the fraction of missing information (FMI) obtained via a proxy pattern-mixture (PPM) model. In our variable selection approach, we use the PPM model to obtain a maximum likelihood estimate of the FMI for separate sets of candidate imputation models and look for the point at which changes in the FMI level off and further auxiliary variables no longer improve the imputation model. We illustrate the proposed approach using empirical data from the Ohio Medicaid Assessment Survey and from the Service Annual Survey.
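The sketch below is not the authors' PPM-based maximum likelihood estimator; it is a minimal illustration of the same selection idea, approximating the FMI of an outcome mean from repeated stochastic imputations via Rubin's combining quantities and screening nested covariate sets for the point where the FMI levels off. The data frame `df`, the outcome name `"y"`, and the covariate list are placeholders.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def approx_fmi(df, outcome, covariates, m=20):
    """Approximate the FMI of the mean of `outcome`, imputed from `covariates`.

    `df` is a pandas DataFrame with NaNs marking item non-response.
    """
    ests, wvars = [], []
    for seed in range(m):
        imputer = IterativeImputer(sample_posterior=True, random_state=seed)
        completed = imputer.fit_transform(df[[outcome] + covariates])
        y = completed[:, 0]
        ests.append(y.mean())
        wvars.append(y.var(ddof=1) / len(y))  # within-imputation variance of the mean
    b = np.var(ests, ddof=1)                  # between-imputation variance
    t = np.mean(wvars) + (1 + 1 / m) * b      # total variance (Rubin's rules)
    return (1 + 1 / m) * b / t                # fraction of missing information

# Screen nested candidate sets and stop where the FMI levels off:
# for k in range(1, len(candidates) + 1):
#     print(candidates[:k], approx_fmi(df, "y", candidates[:k]))
```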

2.
By closely examining the examples provided in Nielsen (2003), this paper further explores the relationship between self-efficiency (Meng, 1994) and the validity of Rubin's multiple imputation (RMI) variance combining rule. The RMI variance combining rule is based on the common assumption/intuition that the efficiency of our estimators decreases when we have less data. However, there are estimation procedures that will do the opposite, that is, they can produce more efficient estimators with less data. Self-efficiency is a theoretical formulation for excluding such procedures. When a user, typically unaware of the hidden self-inefficiency of his choice, adopts a self-inefficient complete-data estimation procedure to conduct an RMI inference, the theoretical validity of his inference becomes a complex issue, as we demonstrate. We also propose a diagnostic tool for assessing potential self-inefficiency and the bias in the RMI variance estimator, at the outset of RMI inference, by constructing a convenient proxy to the RMI point estimator.
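For reference, a minimal sketch of the RMI combining rule under discussion: the pooled point estimate and the variance estimator T = W + (1 + 1/m)B, whose validity presumes self-efficiency of the complete-data procedure. Inputs are placeholder per-imputation estimates and variances.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool m complete-data estimates via the RMI combining rule."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    qbar = estimates.mean()              # RMI point estimate
    w = variances.mean()                 # within-imputation variance
    b = estimates.var(ddof=1)            # between-imputation variance
    t = w + (1 + 1 / m) * b              # RMI variance estimator
    return qbar, t
```

When the complete-data procedure is self-inefficient, T can be biased for the true sampling variance of the pooled estimate, which is what the paper's proxy-based diagnostic is designed to flag before T is trusted.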

3.
Receiver operating characteristic (ROC) curves are widely used as a measure of accuracy of diagnostic tests and can be summarised using the area under the ROC curve (AUC). Often, it is useful to construct a confidence interval for the AUC; however, because a number of different methods have been proposed for estimating the variance of the AUC, there are correspondingly many methods for constructing these intervals. In this article, we compare different methods of constructing Wald-type confidence intervals in the presence of missing data where the missingness mechanism is ignorable. We find that constructing confidence intervals using multiple imputation based on logistic regression gives the most robust coverage probability, and that the choice of confidence interval method is then less important. However, when the missingness rate is less severe (e.g. less than 70%), we recommend using Newcombe's Wald method for constructing confidence intervals, along with multiple imputation using predictive mean matching.
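Newcombe's Wald construction itself is not reproduced here; as a hedged sketch of the overall recipe, the code below forms a Wald-type interval for the AUC pooled across multiply imputed datasets, using the Hanley-McNeil (1982) variance approximation within each completed dataset and Rubin's rules across them.

```python
import numpy as np
from scipy.stats import norm
from sklearn.metrics import roc_auc_score

def auc_and_hm_variance(y_true, scores):
    """AUC with the Hanley-McNeil variance approximation."""
    a = roc_auc_score(y_true, scores)
    n1, n0 = int((y_true == 1).sum()), int((y_true == 0).sum())
    q1, q2 = a / (2 - a), 2 * a**2 / (1 + a)
    v = (a * (1 - a) + (n1 - 1) * (q1 - a**2) + (n0 - 1) * (q2 - a**2)) / (n1 * n0)
    return a, v

def pooled_auc_wald_ci(imputed_sets, alpha=0.05):
    """`imputed_sets`: one (y_true, scores) pair per completed dataset."""
    aucs, vs = zip(*(auc_and_hm_variance(y, s) for y, s in imputed_sets))
    m = len(aucs)
    t = np.mean(vs) + (1 + 1 / m) * np.var(aucs, ddof=1)  # Rubin's rules
    z = norm.ppf(1 - alpha / 2)
    return np.mean(aucs) - z * np.sqrt(t), np.mean(aucs) + z * np.sqrt(t)
```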

4.
The missing data problem has been widely addressed in the literature. The traditional methods for handling missing data may not be suited to spatial data, which can exhibit distinctive structures of dependence and/or heterogeneity. As a possible solution to the spatial missing data problem, this paper proposes an approach that combines the Bayesian Interpolation method [Benedetti, R. & Palma, D. (1994) Markov random field-based image subsampling method, Journal of Applied Statistics, 21(5), 495-509] with a multiple imputation procedure. The method is developed in a univariate and a multivariate framework, and its performance is evaluated through an empirical illustration based on data on labour productivity in European regions.

5.
In intensive care units (ICUs), besides routinely collected admission data, daily monitoring of organ dysfunction using scoring systems such as the sequential organ failure assessment (SOFA) score has become standard practice. Such updated information is valuable for making accurate predictions of patients' survival, yet few prediction models that incorporate it have been reported. We used follow-up data on ICU patients who either died or were discharged at the end of the hospital stay, without censored cases. We propose a joint model comprising a linear mixed effects submodel for the development of longitudinal SOFA scores and a proportional subdistribution hazards submodel for death as the end point, with discharge as a competing risk. The two parts are linked by shared latent terms. Because there was no censoring, it was straightforward to fit our joint model using available software. We compared predictive values, based on the Brier score and the area under the receiver operating characteristic curve, from our model with those obtained from an earlier modeling approach by Toma et al. [Journal of Biomedical Informatics 40, 649 (2007)] that relied on patterns discovered in the SOFA scores over a given period of time.

6.
The most common way of treating item non-response in surveys is to construct one or more replacement values to fill in for a missing value, a process known as imputation. We distinguish single from multiple imputation. Single imputation consists of replacing a missing value by a single replacement value, whereas multiple imputation uses two or more replacement values. This article reviews various imputation procedures used in national statistical offices, as well as the properties of point and variance estimators in the presence of imputed survey data. It also surveys newer developments in the field.
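A toy contrast between the two notions, using random hot-deck draws for the multiple version (all values here are invented):

```python
import numpy as np

rng = np.random.default_rng(42)
y = np.array([3.1, np.nan, 2.7, np.nan, 3.4])
donors = y[~np.isnan(y)]
holes = np.isnan(y)

single = y.copy()
single[holes] = donors.mean()            # one replacement value per hole

multiple = []
for _ in range(5):                       # m = 5 replacement values per hole
    d = y.copy()
    d[holes] = rng.choice(donors, size=holes.sum())  # random hot-deck draw
    multiple.append(d)
```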

7.
This paper outlines a strategy to validate multiple imputation methods. Rubin's criteria for proper multiple imputation are the point of departure. We describe a simulation method that yields insight into various aspects of bias and efficiency of the imputation process. We propose a new method for creating incomplete data under a general Missing At Random (MAR) mechanism. Software implementing the validation strategy is available as a SAS/IML module. The method is applied to investigate the behavior of polytomous regression imputation for categorical data.
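The paper's own MAR-generation method (distributed as a SAS/IML module) is not reproduced here; the sketch below shows the generic construction of a MAR mechanism, in which the probability that y is missing depends only on a fully observed covariate x.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)                        # fully observed covariate
y = 0.5 * x + rng.normal(size=n)              # outcome to be made incomplete

p_miss = 1 / (1 + np.exp(-(-1.0 + 1.5 * x)))  # missingness depends on x only
y_obs = np.where(rng.uniform(size=n) < p_miss, np.nan, y)
# MAR: given x, whether y is missing carries no information about y itself.
```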

8.
Multiple imputation methods properly account for the uncertainty of missing data. Predictive mean matching (PMM) is a general-purpose method for creating multiple imputations. Little is known about the performance of PMM in imputing non-normal semicontinuous data (skewed data with a point mass at a certain value and an otherwise continuous distribution). We investigate the performance of PMM, as well as that of dedicated methods for imputing semicontinuous data, in simulation studies under univariate and multivariate missingness mechanisms, and also on real-life datasets. We conclude that PMM performs at least as well as the investigated dedicated methods for imputing semicontinuous data and, in contrast to the other methods, is the only one that yields plausible imputations and preserves the original data distributions.
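A minimal single-variable sketch of PMM (production implementations such as mice additionally draw the regression coefficients from their posterior, a step omitted here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def pmm_impute(X, y, k=5, seed=0):
    """Impute NaNs in `y` by predictive mean matching on covariates `X`."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    model = LinearRegression().fit(X[obs], y[obs])
    pred_obs = model.predict(X[obs])          # predicted means, donors
    pred_mis = model.predict(X[~obs])         # predicted means, recipients
    donors = y[obs]
    y_imp = y.copy()
    for i, p in zip(np.flatnonzero(~obs), pred_mis):
        nearest = np.argsort(np.abs(pred_obs - p))[:k]  # k closest donors
        y_imp[i] = donors[rng.choice(nearest)]          # copy an observed value
    return y_imp
```

Because every imputed value is one actually observed in the data, PMM automatically preserves the point mass of a semicontinuous variable, which is why it is competitive here.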

9.
10.
In this paper we develop a model for the conditional inflated multivariate density of integer count variables with domain ℤⁿ, n ∈ ℕ. Our modelling framework is based on a copula approach and can be used for a broad set of applications where the primary characteristics of the data are: (i) a discrete domain; (ii) a tendency to cluster at certain outcome values; and (iii) contemporaneous dependence. Such properties are found in high- or ultra-high-frequency data describing the trading process on financial markets. We present a straightforward method for sampling from such an inflated multivariate density through the application of an independence Metropolis–Hastings sampling algorithm. We demonstrate the power of our approach by modelling the conditional bivariate density of bid and ask quote changes in a high-frequency setup, and we show how to derive the implied conditional discrete density of the bid–ask spread, taking quote clustering (at multiples of 5 ticks) into account.
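A generic sketch of the independence Metropolis–Hastings step used for the sampling; `target_pmf`, `proposal_draw`, and `proposal_pmf` are placeholders standing in for the paper's copula-based inflated density and its proposal distribution.

```python
import numpy as np

def independence_mh(target_pmf, proposal_draw, proposal_pmf, n_draws, x0, seed=0):
    """Independence Metropolis-Hastings for a discrete target density."""
    rng = np.random.default_rng(seed)
    x, draws = x0, []                      # x0 must satisfy target_pmf(x0) > 0
    for _ in range(n_draws):
        xp = proposal_draw(rng)            # proposal is independent of current state
        ratio = (target_pmf(xp) * proposal_pmf(x)) / (target_pmf(x) * proposal_pmf(xp))
        if rng.uniform() < min(1.0, ratio):
            x = xp                         # accept the independent proposal
        draws.append(x)
    return draws
```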

11.
Huisman, M. (2000). Quality and Quantity, 34(4), 331-351.
Among the wide variety of procedures to handle missing data, imputing the missing values is a popular strategy for dealing with missing item responses. In this paper some simple and easily implemented imputation techniques, like item and person mean substitution and some hot-deck procedures, are investigated. A simulation study was performed based on responses to items forming a scale to measure a latent trait of the respondents. The effects of the different imputation procedures on the estimation of the latent ability of the respondents were investigated, as well as the effects on the estimation of Cronbach's alpha (indicating the reliability of the test) and Loevinger's H-coefficient (indicating scalability). The results indicate that procedures which use the relationships between items perform best, although they tend to overestimate the scale quality.
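A minimal sketch of two of the simple rules studied, plus Cronbach's alpha for a completed respondents-by-items matrix `R` (NaN marks a missing item response):

```python
import numpy as np

def item_mean_substitution(R):
    """Fill NaNs with the mean of the item (column)."""
    return np.where(np.isnan(R), np.nanmean(R, axis=0), R)

def person_mean_substitution(R):
    """Fill NaNs with the mean of the respondent's own answered items (row)."""
    return np.where(np.isnan(R), np.nanmean(R, axis=1, keepdims=True), R)

def cronbach_alpha(R):
    """Cronbach's alpha for a complete respondents-by-items matrix."""
    k = R.shape[1]
    item_vars = R.var(axis=0, ddof=1).sum()
    total_var = R.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)
```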

12.
Imputation procedures such as fully efficient fractional imputation (FEFI) or multiple imputation (MI) create multiple versions of the missing observations, thereby reflecting uncertainty about their true values. Multiple imputation generates a finite set of imputations through a posterior predictive distribution; fractional imputation instead assigns weights to the observed data. The focus of this article is the development of FEFI for partially classified two-way contingency tables. Point estimators and variances of FEFI estimators of population proportions are derived. Simulation results, when data are missing completely at random or missing at random, show that FEFI is comparable in performance to maximum likelihood estimation and multiple imputation, and superior to simple stochastic imputation and complete case analysis. The methods are illustrated with four data sets.
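A stripped-down numerical sketch of the fractional-imputation idea for a partially classified two-way table, with invented counts: a unit whose column is missing is split into fractional pseudo-counts weighted by the conditional column probabilities estimated from the fully classified cases.

```python
import numpy as np

full = np.array([[40, 10],        # fully classified counts (rows x columns)
                 [15, 35]])
row_only = np.array([12, 8])      # row observed, column missing

cond = full / full.sum(axis=1, keepdims=True)  # P(column | row) from complete cases
fractional = full + row_only[:, None] * cond   # add fractional pseudo-counts
props = fractional / fractional.sum()          # estimated cell proportions
```

For this single missingness pattern the one-step allocation already coincides with the maximum likelihood proportions; with several missingness patterns, FEFI iterates the fractions to the ML fit.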

13.
A common problem in survey sampling is to compare two cross-sectional estimates for the same study variable taken from two different waves or occasions. These cross-sectional estimates often include imputed values to compensate for item non-response. Estimating the sampling variance of the estimator of change is useful for judging whether an observed change is statistically significant. Estimating the variance of a change is not straightforward, because of the rotation used in repeated surveys and because of imputation. We propose a multivariate linear regression approach and show how it can accommodate the effects of rotation and imputation. The regression approach gives a design-consistent estimator of the variance of change when the sampling fraction is small. We illustrate the proposed approach using random hot-deck imputation, although the proposed estimator can be implemented with other imputation techniques.

14.
The asymptotic approach and Fisher's exact approach have often been used for testing the association between two dichotomous variables. The asymptotic approach may be appropriate in large samples but is often criticized for unacceptably high actual type I error rates in small to medium samples, while Fisher's exact approach suffers from conservative type I error rates and low power. For these reasons, a number of exact unconditional approaches have been proposed, and these have been seen to be generally more powerful than their exact conditional counterparts. We consider the traditional unconditional approach based on maximization and compare it to our approach, which is based on estimation and maximization; we extend the latter to designs with the total sum fixed. Procedures based on the Pearson chi-square, Yates's corrected, and likelihood ratio test statistics are evaluated with regard to actual type I error rates and power, and a real example is used to illustrate the various testing procedures. The unconditional approach based on estimation and maximization performs well, having an actual level much closer to the nominal level; the Pearson chi-square and likelihood ratio statistics work well with this efficient unconditional approach, which is generally more powerful than the other p-value calculation methods in the scenarios considered.
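The estimation-and-maximization approach itself is not available in standard libraries; as a point of comparison, SciPy covers the conditional tests and the traditional maximization-based unconditional test (Barnard's), illustrated here on an invented 2x2 table.

```python
import numpy as np
from scipy.stats import barnard_exact, chi2_contingency, fisher_exact

table = np.array([[7, 12],
                  [1, 3]])

chi2, p_pearson, _, _ = chi2_contingency(table, correction=False)  # Pearson
_, p_yates, _, _ = chi2_contingency(table, correction=True)        # Yates-corrected
_, p_fisher = fisher_exact(table)                                  # conditional exact
p_barnard = barnard_exact(table).pvalue   # unconditional: maximizes over nuisance

print(p_pearson, p_yates, p_fisher, p_barnard)
```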

15.
Comparing occurrence rates of events of interest in science, business, and medicine is an important topic. Because count data are often under-reported, we desire to account for this error in the response when constructing interval estimators. In this article, we derive a Bayesian interval for the difference of two Poisson rates when counts are potentially under-reported. The under-reporting causes a lack of identifiability; here, we use informative priors to construct a credible interval for the difference of two Poisson rate parameters with under-reported data. We demonstrate the efficacy of our new interval estimates using a real data example. We also investigate the performance of our newly derived Bayesian approach via simulation and examine the impact of various informative priors on the new interval.
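A minimal sketch of one way to realize such an interval, using sampling importance resampling with the informative priors as the proposal; the counts, exposures, and prior parameters below are invented placeholders, and the paper's construction may differ in detail.

```python
import numpy as np
from scipy.stats import beta, gamma, poisson

rng = np.random.default_rng(0)
S = 200_000
y1, t1, y2, t2 = 35, 100.0, 52, 100.0           # observed (under-reported) counts

# Informative priors: event rates ~ Gamma, reporting probabilities ~ Beta.
lam1 = gamma.rvs(2, scale=0.25, size=S, random_state=rng)
lam2 = gamma.rvs(2, scale=0.25, size=S, random_state=rng)
pi1 = beta.rvs(8, 2, size=S, random_state=rng)
pi2 = beta.rvs(8, 2, size=S, random_state=rng)

# Importance weights: likelihood of the observed counts given (lambda, pi),
# where reported counts are Poisson with mean lambda * pi * exposure.
w = poisson.pmf(y1, lam1 * pi1 * t1) * poisson.pmf(y2, lam2 * pi2 * t2)
w /= w.sum()

idx = rng.choice(S, size=S, replace=True, p=w)  # posterior resample
lo, hi = np.quantile(lam1[idx] - lam2[idx], [0.025, 0.975])  # 95% credible interval
```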

16.
We propose new real-time monitoring procedures for the emergence of end-of-sample predictive regimes using sequential implementations of standard (heteroskedasticity-robust) regression t-statistics for predictability applied over relatively short time periods. The procedures we develop can also be used for detecting historical regimes of temporary predictability. Our proposed methods are robust to both the degree of persistence and endogeneity of the regressors in the predictive regression and to certain forms of heteroskedasticity in the shocks. We discuss how the monitoring procedures can be designed such that their false positive rate can be set by the practitioner at the start of the monitoring period using detection rules based on information obtained from the data in a training period. We use these new monitoring procedures to investigate the presence of regime changes in the predictability of the US equity premium at the 1-month horizon by traditional macroeconomic and financial variables, and by binary technical analysis indicators. Our results suggest that the 1-month-ahead equity premium has temporarily been predictable, displaying so-called "pockets of predictability," and that these episodes of predictability could have been detected in real time by practitioners using our proposed methodology.
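A minimal sketch of the basic ingredient of such monitoring: heteroskedasticity-robust t-statistics for the predictive slope over rolling windows. The calibration of detection boundaries on a training period is omitted; `y` is the return series and `x` the already-lagged predictor, both placeholders.

```python
import numpy as np
import statsmodels.api as sm

def rolling_robust_tstats(y, x, window=60):
    """HC3-robust t-statistics for the slope of y on lagged x, rolling windows."""
    tstats = []
    for end in range(window, len(y) + 1):
        yw, xw = y[end - window:end], x[end - window:end]
        res = sm.OLS(yw, sm.add_constant(xw)).fit(cov_type="HC3")
        tstats.append(res.tvalues[1])
    return np.array(tstats)
# A "pocket of predictability" is flagged when the statistic crosses a boundary
# calibrated on the training period to fix the false positive rate.
```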

17.
Incomplete data is a common problem in survey research. Recent work on multiple imputation techniques has increased analysts' awareness of the biasing effects of missing data and has also provided a convenient solution: imputation methods replace non-response with estimates of the unobserved scores. In many instances, however, non-response to a stimulus does not result from measurement problems that inhibit accurate surveying of empirical reality, but from the inapplicability of the survey question. In such cases, existing imputation techniques replace valid non-response with counterfactual estimates of a situation in which the stimulus is applicable to all respondents. This paper suggests an alternative imputation procedure for incomplete data for which no true score exists: multiple complete random imputation, which overcomes the biasing effects of missing data and allows analysts to model respondents' valid 'I don't know' answers.
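A minimal sketch of multiple complete random imputation as described: each of m completed datasets fills the valid 'I don't know' responses with random draws over the response scale rather than with model-based estimates of a counterfactual answer. The function name and pooling step are schematic.

```python
import numpy as np

def multiple_complete_random_imputation(y, scale_values, m=10, seed=0):
    """Fill valid 'don't know' responses with random draws over the scale."""
    rng = np.random.default_rng(seed)
    holes = np.isnan(y)
    datasets = []
    for _ in range(m):
        d = y.copy()
        d[holes] = rng.choice(scale_values, size=holes.sum())
        datasets.append(d)
    return datasets  # analyse each and pool with the usual MI rules
```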

18.
With cointegration tests often being oversized under time-varying error variance, it is possible, if not likely, to confuse error variance non-stationarity with cointegration. This paper takes an instrumental variable (IV) approach to establish individual-unit test statistics for no cointegration that are robust to variance non-stationarity. The sign of a fitted departure from long-run equilibrium is used as an instrument when estimating an error-correction model. The resulting IV-based test is shown to follow a chi-square limiting null distribution irrespective of the variance pattern of the data-generating process. In spite of this, the test proposed here has, unlike previous work relying on instrumental variables, competitive local power against sequences of local alternatives in 1/T-neighbourhoods of the null. The standard limiting null distribution motivates using the single-unit tests in a multiple testing approach for cointegration in multi-country data sets, combining P-values from individual units. Simulations suggest good performance of the single-unit and multiple testing procedures under various plausible designs of cross-sectional correlation and cross-unit cointegration in the data. An application to the equilibrium relationship between short- and long-term interest rates illustrates the dramatic differences between the results of robust and non-robust tests.

19.
In this review paper, we discuss the theoretical background of multiple imputation, describe how to build an imputation model and how to create proper imputations. We also present the rules for making repeated imputation inferences. Three widely used multiple imputation methods, the propensity score method, the predictive model method and the Markov chain Monte Carlo (MCMC) method, are presented and discussed.

20.
Since the work of Little and Rubin (1987), no substantial advances have been achieved in the analysis of explanatory regression models for incomplete data that are missing not at random, mainly due to the difficulty of verifying the randomness of the unknown data. In practice, the analysis of non-random missing data is done with techniques designed for datasets with random or completely random missing data, such as complete case analysis, mean imputation, regression imputation, maximum likelihood, or multiple imputation. However, the data conditions required to minimize the bias derived from such an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations were carried out to establish the best strategy of analysis for random missing data applicable to datasets with non-random missing data. The factors involved in the simulations are sample size, percentage of missing data, predictive power of the imputation model, and the existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation techniques, although with low percentages of missing data, absence of interaction, and high predictive power of the imputation model (data structures frequent in research on child and adolescent psychopathology), acceptable results are obtained with the simplest regression imputation.
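A scaled-down sketch of this kind of Monte Carlo comparison, measuring the bias of a regression slope under complete-case analysis, mean imputation, and regression imputation. For simplicity the missingness here is MAR (driven by the observed x1) rather than the paper's non-random designs, and all factor levels (n, replications, effect sizes) are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, reps, true_beta = 200, 500, 0.6
bias = {"complete case": [], "mean imputation": [], "regression imputation": []}

for _ in range(reps):
    x1 = rng.normal(size=n)                              # always observed
    x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)        # predictor with holes
    y = true_beta * x2 + rng.normal(size=n)
    miss = rng.uniform(size=n) < 1 / (1 + np.exp(-x1))   # missingness driven by x1
    cc = ~miss

    bias["complete case"].append(np.polyfit(x2[cc], y[cc], 1)[0] - true_beta)

    x2_mean = x2.copy()
    x2_mean[miss] = x2[cc].mean()                        # single mean fill-in
    bias["mean imputation"].append(np.polyfit(x2_mean, y, 1)[0] - true_beta)

    fit = LinearRegression().fit(x1[cc, None], x2[cc])   # impute x2 from x1
    x2_reg = x2.copy()
    x2_reg[miss] = fit.predict(x1[miss, None])
    bias["regression imputation"].append(np.polyfit(x2_reg, y, 1)[0] - true_beta)

for method, b in bias.items():
    print(f"{method}: mean bias {np.mean(b):+.3f}")
```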
