Found 20 similar documents; search took 15 ms
1.
This paper discusses the importance of managing data quality in academic research in its relation to satisfying the customer. The focus is on the data completeness dimension of data quality in relation to recent advancements which have been made in the development of methods for analysing incomplete multivariate data. An overview and comparison of the traditional techniques with the recent advancements are provided. Multiple imputation is also discussed as a method of analysing incomplete multivariate data, which can potentially reduce some of the biases which can occur from using some of the traditional techniques. Despite these recent advancements in the analysis of incomplete multivariate data, evidence is presented which shows that researchers are not using these techniques to manage the data quality of their current research across a variety of academic disciplines. An analysis is then provided as to why these techniques have not been adopted along with suggestions to improve the frequency of their use in the future.
Source-Reference. The ideas for this paper originated from research work on David J. Fogarty's Ph.D. dissertation. The subject area is the use of advanced techniques for the imputation of incomplete multivariate data on corporate data warehouses.
2.
Jaap P.L. Brand Stef van Buuren Karin Groothuis-Oudshoorn Edzard S. Gelsema† 《Statistica Neerlandica》2003,57(1):36-45
This paper outlines a strategy to validate multiple imputation methods. Rubin's criteria for proper multiple imputation are the point of departure. We describe a simulation method that yields insight into various aspects of bias and efficiency of the imputation process. We propose a new method for creating incomplete data under a general Missing At Random (MAR) mechanism. Software implementing the validation strategy is available as a SAS/IML module. The method is applied to investigate the behavior of polytomous regression imputation for categorical data.
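The MAR-generation step described above can be sketched in a few lines (a generic Python illustration, not the authors' SAS/IML module; the variable names are hypothetical): the probability that y is deleted depends only on an always-observed covariate x, never on y itself.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
x = rng.normal(size=n)             # always-observed covariate
y = 0.5 * x + rng.normal(size=n)   # variable that will receive missing values

# MAR mechanism: the deletion probability is a function of x alone,
# so given x the missingness carries no information about y itself
p_miss = 1.0 / (1.0 + np.exp(-(x - 0.5)))   # logistic in x
mask = rng.random(n) < p_miss
y_obs = np.where(mask, np.nan, y)

print(np.isnan(y_obs).mean())   # overall missing fraction
```

Replacing x with y in the deletion probability would instead produce a missing-not-at-random mechanism, which is what makes MAR the convenient benchmark case for validation studies.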
3.
Empirical count data are often zero-inflated and overdispersed. Currently, there is no software package that allows adequate imputation of these data. We present multiple-imputation routines for these kinds of count data based on a Bayesian regression approach or alternatively based on a bootstrap approach that work as add-ons for the popular multiple imputation by chained equations (mice) software in R (van Buuren and Groothuis-Oudshoorn, Journal of Statistical Software, vol. 45, 2011, p. 1). We demonstrate in a Monte Carlo simulation that our procedures are superior to currently available count data procedures. It is emphasized that thorough modeling is essential to obtain plausible imputations and that model mis-specifications can bias parameter estimates and standard errors quite noticeably. Finally, the strengths and limitations of our procedures are discussed, and fruitful avenues for future theory and software development are outlined.
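The bootstrap variant described in this abstract can be illustrated with a rough sketch (a hypothetical pure-NumPy analogue, not the authors' mice add-on): refit a Poisson regression on a bootstrap resample of the observed cases, then draw the missing counts from the refitted model so that parameter uncertainty enters the imputations.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_poisson(X, y, iters=25):
    """Poisson regression via Newton-Raphson (no regularization)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)
        beta += np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    return beta

# simulate a covariate and a count outcome with 25% missing values
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(1.0 + 0.4 * x)).astype(float)
miss = rng.random(n) < 0.25
y[miss] = np.nan

# bootstrap step: refit on a resample of the observed cases so that
# parameter uncertainty propagates into the imputed values
obs_idx = np.flatnonzero(~miss)
idx = rng.choice(obs_idx, size=obs_idx.size, replace=True)
beta_boot = fit_poisson(X[idx], y[idx])

# draw imputations from the refitted Poisson model
y_imp = y.copy()
y_imp[miss] = rng.poisson(np.exp(X[miss] @ beta_boot))
```

Repeating the resample-refit-draw cycle m times yields m completed datasets, which is the usual multiple-imputation setup; the abstract's point is that zero-inflation and overdispersion require richer models than this plain Poisson sketch.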
4.
Susanne Rässler 《Statistica Neerlandica》2003,57(1):58-74
Data fusion or statistical matching techniques merge datasets from different survey samples to achieve a complete but artificial data file which contains all variables of interest. The merging of datasets is usually done on the basis of variables common to all files, but traditional methods implicitly assume conditional independence between the variables never jointly observed given the common variables. Therefore we suggest using model based approaches tackling the data fusion task by more flexible procedures. By means of suitable multiple imputation techniques, the identification problem which is inherent in statistical matching is reflected. Here a non-iterative Bayesian version of Rubin's implicit regression model is presented and compared in a simulation study with imputations from a data augmentation algorithm as well as an iterative approach using chained equations.
5.
Hot deck imputation is a method for handling missing data in which each missing value is replaced with an observed response from a "similar" unit. Despite being used extensively in practice, the theory is not as well developed as that of other imputation methods. We have found that no consensus exists as to the best way to apply the hot deck and obtain inferences from the completed data set. Here we review different forms of the hot deck and existing research on its statistical properties. We describe applications of the hot deck currently in use, including the U.S. Census Bureau's hot deck for the Current Population Survey (CPS). We also provide an extended example of variations of the hot deck applied to the third National Health and Nutrition Examination Survey (NHANES III). Some potential areas for future research are highlighted.
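A minimal random hot deck within adjustment cells might look like this (a generic sketch, not the Census Bureau's CPS procedure; the `income`/`region` data are an invented example, with `region` playing the role of the "similarity" cell):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_hot_deck(values, cells, rng):
    """Replace each missing value with a randomly chosen observed
    donor from the same adjustment cell."""
    out = values.copy()
    for c in np.unique(cells):
        in_cell = cells == c
        donors = out[in_cell & ~np.isnan(out)]
        holes = in_cell & np.isnan(out)
        if donors.size and holes.any():
            out[holes] = rng.choice(donors, size=holes.sum(), replace=True)
    return out

# toy data: income with missing entries, region as the adjustment cell
income = np.array([30.0, np.nan, 35.0, 50.0, np.nan, 55.0])
region = np.array([0, 0, 0, 1, 1, 1])
completed = random_hot_deck(income, region, rng)
```

Because every imputed value is a real observed response, hot deck imputations are always plausible values; the open questions the abstract points to concern donor selection and how to get valid variance estimates from the completed data.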
6.
The missing data problem has been widely addressed in the literature. The traditional methods for handling missing data may be not suited to spatial data, which can exhibit distinctive structures of dependence and/or heterogeneity. As a possible solution to the spatial missing data problem, this paper proposes an approach that combines the Bayesian Interpolation method [Benedetti, R. & Palma, D. (1994) Markov random field-based image subsampling method, Journal of Applied Statistics, 21(5), 495–509] with a multiple imputation procedure. The method is developed in a univariate and a multivariate framework, and its performance is evaluated through an empirical illustration based on data related to labour productivity in European regions.
7.
Hakan Demirtas 《Statistica Neerlandica》2004,58(4):466-482
In this article, we demonstrate by simulations that rich imputation models for incomplete longitudinal datasets produce more calibrated estimates in terms of reduced bias and higher coverage rates without unduly deflating the efficiency. We argue that the use of supplementary variables that are thought to be potential causes or correlates of missingness or outcomes in the imputation process may lead to better inferential results in comparison to simpler imputation models. The liberal use of these variables is recommended as opposed to the conservative strategy.
8.
Since the work of Little and Rubin (1987), no substantial advances in the analysis of explanatory regression models for incomplete data with missing not at random have been achieved, mainly due to the difficulty of verifying the randomness of the unknown data. In practice, the analysis of nonrandom missing data is done with techniques designed for datasets with random or completely random missing data, such as complete case analysis, mean imputation, regression imputation, maximum likelihood or multiple imputation. However, the data conditions required to minimize the bias derived from an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations have been carried out to establish the best strategy of analysis for random missing data applicable in datasets with nonrandom missing data. The factors involved in the simulations are sample size, percentage of missing data, predictive power of the imputation model and existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation techniques, although with low percentages of missing data, absence of interaction and high predictive power of the imputation model (frequent data structures in research on child and adolescent psychopathology) acceptable results are obtained with the simplest regression imputation.
9.
Among the wide variety of procedures to handle missing data, imputing the missing values is a popular strategy to deal with missing item responses. In this paper some simple and easily implemented imputation techniques, like item and person mean substitution and some hot-deck procedures, are investigated. A simulation study was performed based on responses to items forming a scale to measure a latent trait of the respondents. The effects of different imputation procedures on the estimation of the latent ability of the respondents were investigated, as well as the effect on the estimation of Cronbach's alpha (indicating the reliability of the test) and Loevinger's H-coefficient (indicating scalability). The results indicate that procedures which use the relationships between items perform best, although they tend to overestimate the scale quality.
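Item and person mean substitution, the two simplest techniques mentioned above, can be sketched directly (hypothetical toy data; rows are respondents, columns are items):

```python
import numpy as np

def item_mean_substitution(resp):
    """Fill each missing entry with the mean of its item (column)."""
    out = resp.copy()
    item_means = np.nanmean(out, axis=0)
    holes = np.isnan(out)
    out[holes] = np.take(item_means, np.nonzero(holes)[1])
    return out

def person_mean_substitution(resp):
    """Fill each missing entry with the mean of that person's (row's)
    observed responses."""
    out = resp.copy()
    person_means = np.nanmean(out, axis=1)
    holes = np.isnan(out)
    out[holes] = np.take(person_means, np.nonzero(holes)[0])
    return out

# 3 respondents x 3 items, Likert-style scores with two gaps
resp = np.array([[4.0, 5.0, np.nan],
                 [2.0, np.nan, 3.0],
                 [4.0, 4.0, 5.0]])
completed_item = item_mean_substitution(resp)
completed_person = person_mean_substitution(resp)
```

Neither method uses the correlations between items, which is why the study finds that procedures exploiting those relationships (e.g. hot-deck matching on response patterns) perform better.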
10.
Martin Kroh 《Quality and Quantity》2006,40(2):225-244
Incomplete data is a common problem of survey research. Recent work on multiple imputation techniques has increased analysts’ awareness of the biasing effects of missing data and has also provided a convenient solution. Imputation methods replace non-response with estimates of the unobserved scores. In many instances, however, non-response to a stimulus does not result from measurement problems that inhibit accurate surveying of empirical reality, but from the inapplicability of the survey question. In such cases, existing imputation techniques replace valid non-response with counterfactual estimates of a situation in which the stimulus is applicable to all respondents. This paper suggests an alternative imputation procedure for incomplete data for which no true score exists: multiple complete random imputation, which overcomes the biasing effects of missing data and allows analysts to model respondents’ valid ‘I don’t know’ answers.
11.
A Random Effects Transition Model For Longitudinal Binary Data With Informative Missingness
Understanding the transitions between disease states is often the goal in studying chronic disease. These studies, however, are typically subject to a large amount of missingness either due to patient dropout or intermittent missed visits. The missing data is often informative since missingness and dropout are usually related to either an individual's underlying disease process or the actual value of the missed observation. Our motivating example is a study of opiate addiction that examined the effect of a new treatment on thrice-weekly binary urine tests to assess opiate use over follow-up. The interest in this opiate addiction clinical trial was to characterize the transition pattern of opiate use (in each treatment arm) as well as to compare both the marginal probability of a positive urine test over follow-up and the time until the first positive urine test between the treatment arms. We develop a shared random effects model that links together the propensity of transition between states and the probability of either an intermittent missed observation or dropout. This approach allows for heterogeneous transition and missing data patterns between individuals as well as incorporating informative intermittent missing data and dropout. We compare this new approach with other approaches proposed for the analysis of longitudinal binary data with informative missingness.
12.
It is shown that the classical taxonomy of missing data models, namely missing completely at random, missing at random and informative missingness, which has been developed almost exclusively within a selection modelling framework, can also be applied to pattern-mixture models. In particular, intuitively appealing identifying restrictions are proposed for a pattern-mixture MAR mechanism.
13.
Repeated measurements often are analyzed by multivariate analysis of variance (MANOVA). An alternative approach is provided by multilevel analysis, also called the hierarchical linear model (HLM), which makes use of random coefficient models. This paper is a tutorial which indicates that the HLM can be specified in many different ways, corresponding to different sets of assumptions about the covariance matrix of the repeated measurements. The possible assumptions range from the very restrictive compound symmetry model to the unrestricted multivariate model. Thus, the HLM can be used to steer a useful middle road between the two traditional methods for analyzing repeated measurements. Another important advantage of the multilevel approach to analyzing repeated measures is the fact that it can be easily used also if the data are incomplete. Thus it provides a way to achieve a fully multivariate analysis of repeated measures with incomplete data.
This revised version was published online in June 2006 with corrections to the Cover Date.
14.
Joseph L. Schafer 《Statistica Neerlandica》2003,57(1):19-35
Bayesian multiple imputation (MI) has become a highly useful paradigm for handling missing values in many settings. In this paper, I compare Bayesian MI with other methods, maximum likelihood in particular, and point out some of its unique features. One key aspect of MI, the separation of the imputation phase from the analysis phase, can be advantageous in settings where the models underlying the two phases do not agree.
15.
An important application of multiple regression is predictor selection. When there are no missing values in the data, information criteria can be used to select predictors. For example, one could apply the small-sample-size corrected version of the Akaike information criterion (AIC), the AICC. In this article, we discuss how information criteria should be calculated when the dependent variable and/or the predictors contain missing values. We then extensively discuss and evaluate three models that can be employed to deal with the missing data, that is, to predict the missing values. The most complex model, that is, the model with all available predictors, outperforms the other models. These results also apply to more general hypotheses than predictor selection and also to structural equation modeling (SEM) models.
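The small-sample correction behind the AICC has a closed form; a minimal sketch of the generic formula follows (this is the standard definition, not the article's missing-data extension):

```python
def aicc(log_lik, k, n):
    """AICC: the AIC plus the small-sample correction 2k(k+1)/(n-k-1),
    where k is the number of estimated parameters and n the sample size."""
    aic = -2.0 * log_lik + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)

# with few observations per parameter the correction is substantial
print(aicc(log_lik=-100.0, k=3, n=30))   # noticeably above plain AIC = 206
```

As n grows the correction term vanishes and the AICC converges to the AIC, which is why the correction matters mainly in small samples; the article's question is how log_lik, k and n should be defined when parts of the data are imputed.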
16.
Receiver operating characteristic curves are widely used as a measure of accuracy of diagnostic tests and can be summarised using the area under the receiver operating characteristic curve (AUC). Often, it is useful to construct a confidence interval for the AUC; however, because there are a number of different proposed methods to measure variance of the AUC, there are thus many different resulting methods for constructing these intervals. In this article, we compare different methods of constructing Wald-type confidence interval in the presence of missing data where the missingness mechanism is ignorable. We find that constructing confidence intervals using multiple imputation based on logistic regression gives the most robust coverage probability and the choice of confidence interval method is less important. However, when missingness rate is less severe (e.g. less than 70%), we recommend using Newcombe's Wald method for constructing confidence intervals along with multiple imputation using predictive mean matching.
17.
In missing data problems, it is often the case that there is a natural test statistic for testing a statistical hypothesis had all the data been observed. A fuzzy p-value approach to hypothesis testing has recently been proposed which is implemented by imputing the missing values in the "complete data" test statistic by values simulated from the conditional null distribution given the observed data. We argue that imputing data in this way will inevitably lead to loss in power. For the case of a scalar parameter, we show that the asymptotic efficiency of the score test based on the imputed "complete data" relative to the score test based on the observed data is given by the ratio of the observed data information to the complete data information. Three examples involving probit regression, normal random effects model, and unidentified paired data are used for illustration. For testing linkage disequilibrium based on pooled genotype data, simulation results show that the imputed Neyman-Pearson and Fisher exact tests are less powerful than a Wald-type test based on the observed data maximum likelihood estimator. In conclusion, we caution against the routine use of the fuzzy p-value approach in latent variable or missing data problems and suggest some viable alternatives.
18.
Longitudinal data sets with the structure T (time points) × N (subjects) are often incomplete because of data missing for certain subjects at certain time points. The EM algorithm is applied in conjunction with the Kalman smoother for computing maximum likelihood estimates of longitudinal LISREL models from varying missing data patterns. The iterative procedure uses the LISREL program in the M-step and the Kalman smoother in the E-step. The application of the method is illustrated by simulating missing data on a data set from educational research.
19.