Similar articles
20 similar articles found.
1.
Recent work (Seaman et al., 2013; Mealli & Rubin, 2015) attempts to clarify the not always well-understood difference between realised and everywhere definitions of missing at random (MAR) and missing completely at random. Another branch of the literature (Mohan et al., 2013; Pearl & Mohan, 2013) exploits always-observed covariates to give variable-based definitions of MAR and missing completely at random. In this paper, we develop a unified taxonomy encompassing all approaches. In this taxonomy, the new concept of 'complementary MAR' is introduced, and its relationship with the concept of data observed at random is discussed. All relationships among these definitions are analysed and represented graphically. Conditional independence, both at the random variable and at the event level, is the formal language we adopt to connect all these definitions. Our paper covers both the univariate and the multivariate case, where attention is paid to monotone missingness and to the concept of sequential MAR. Specifically, for monotone missingness, we propose a sequential MAR definition that might be more appropriate than both everywhere and variable-based MAR to model dropout in certain contexts.

2.
The growth of non-response rates for social science surveys has led to increased concern about the risk of non-response bias. Unfortunately, the non-response rate is a poor indicator of when non-response bias is likely to occur. We consider in this paper a set of alternative indicators. A large-scale simulation study is used to explore how each of these indicators performs in a variety of circumstances. Although, as expected, none of the indicators fully depicts the impact of non-response on survey estimates, we discuss how they can be used when creating a plausible account of the risks of non-response bias for a survey. We also describe an interesting characteristic of the fraction of missing information that may be helpful in diagnosing not-missing-at-random mechanisms in certain situations.

3.
Understanding the transitions between disease states is often the goal in studying chronic disease. These studies, however, are typically subject to a large amount of missingness either due to patient dropout or intermittent missed visits. The missing data is often informative since missingness and dropout are usually related to either an individual's underlying disease process or the actual value of the missed observation. Our motivating example is a study of opiate addiction that examined the effect of a new treatment on thrice-weekly binary urine tests to assess opiate use over follow-up. The interest in this opiate addiction clinical trial was to characterize the transition pattern of opiate use (in each treatment arm) as well as to compare both the marginal probability of a positive urine test over follow-up and the time until the first positive urine test between the treatment arms. We develop a shared random effects model that links together the propensity of transition between states and the probability of either an intermittent missed observation or dropout. This approach allows for heterogeneous transition and missing data patterns between individuals as well as incorporating informative intermittent missing data and dropout. We compare this new approach with other approaches proposed for the analysis of longitudinal binary data with informative missingness.

4.
This paper develops and evaluates an approximate procedure for testing homogeneity of an arbitrary subset of correlation coefficients among variables measured on the same set of individuals. The sample may have some missing data. The simple test statistic is a multiple of the variance of the Fisher r-to-z transformed correlation coefficients relevant to the null hypothesis being tested and is referred to a chi-square distribution. The use of this test is illustrated through several examples. Given the approximate nature of the test statistic, the procedure was evaluated using a simulation study, and its accuracy, in terms of nominal versus actual significance levels, was assessed for several null hypotheses of interest.
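The statistic described here is closely related to the classical chi-square test for homogeneity of correlation coefficients based on Fisher's r-to-z transform. Below is a minimal sketch of that textbook version for independent correlations; it only illustrates the underlying idea, since the paper's procedure handles correlations measured on the same individuals with missing data, and the function name and the (n - 3) weighting are assumptions of this sketch.

```python
import numpy as np
from scipy import stats

def fisher_z_homogeneity_test(r, n):
    """Approximate chi-square test that k correlation coefficients are equal.

    r : sample correlations; n : corresponding sample sizes
    (each transformed z_i has variance roughly 1 / (n_i - 3)).
    Returns the chi-square statistic and its p-value on k - 1 df.
    """
    r, n = np.asarray(r, dtype=float), np.asarray(n, dtype=float)
    z = np.arctanh(r)                      # Fisher r-to-z transform
    w = n - 3.0                            # approximate inverse variances
    z_bar = np.sum(w * z) / np.sum(w)      # precision-weighted mean of the z's
    chi2 = np.sum(w * (z - z_bar) ** 2)    # weighted variance of the z's
    return chi2, stats.chi2.sf(chi2, df=len(z) - 1)

# Example: do three correlations, from samples of different sizes, share a common value?
stat, p = fisher_z_homogeneity_test([0.42, 0.51, 0.38], [85, 90, 78])
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```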

5.
An important application of multiple regression is predictor selection. When there are no missing values in the data, information criteria can be used to select predictors; for example, one could apply the small-sample-size corrected version of the Akaike information criterion (AIC), the AICc. In this article, we discuss how information criteria should be calculated when the dependent variable and/or the predictors contain missing values. To this end, we extensively discuss and evaluate three models that can be employed to deal with the missing data, that is, to predict the missing values. The most complex model, that is, the model with all available predictors, outperforms the other models. These results also apply to hypotheses more general than predictor selection and to structural equation modeling (SEM) models.
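For reference, here is a minimal sketch of AICc-based predictor selection in the complete-data case, assuming a Gaussian linear model fitted by ordinary least squares; how to compute such criteria when the outcome or the predictors themselves contain missing values is exactly what the article investigates, and the helper names and exhaustive subset search below are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def aicc_ols(y, X):
    """AICc for a Gaussian linear model fitted by OLS (intercept included)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    k = Xd.shape[1] + 1                           # coefficients + error variance
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)    # small-sample correction

def best_subset_by_aicc(y, X):
    """Exhaustive search over predictor subsets; returns the AICc-best one."""
    p = X.shape[1]
    best = None
    for size in range(1, p + 1):
        for idx in combinations(range(p), size):
            crit = aicc_ols(y, X[:, idx])
            if best is None or crit < best[0]:
                best = (crit, idx)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=60)
print(best_subset_by_aicc(y, X))   # likely to select predictors 0 and 2
```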

6.
It is shown that the classical taxonomy of missing data models, namely missing completely at random, missing at random and informative missingness, which has been developed almost exclusively within a selection modelling framework, can also be applied to pattern-mixture models. In particular, intuitively appealing identifying restrictions are proposed for a pattern-mixture MAR mechanism.

7.
Huisman, Mark. Quality and Quantity, 2000, 34(4): 331-351.
Among the wide variety of procedures to handle missing data, imputing the missing values is a popular strategy to deal with missing item responses. In this paper some simple and easily implemented imputation techniques, like item and person mean substitution and some hot-deck procedures, are investigated. A simulation study was performed based on responses to items forming a scale to measure a latent trait of the respondents. The effects of different imputation procedures on the estimation of the latent ability of the respondents were investigated, as well as the effect on the estimation of Cronbach's alpha (indicating the reliability of the test) and Loevinger's H-coefficient (indicating scalability). The results indicate that procedures which use the relationships between items perform best, although they tend to overestimate the scale quality.
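A minimal sketch of three of the simple strategies mentioned above, applied to a respondents-by-items matrix with NaN marking missing item responses: item-mean substitution, person-mean substitution, and a naive random hot deck that draws a donor value from the same item. The matching rules used in the actual simulation study are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def item_mean_impute(X):
    """Replace each missing entry with the mean of its item (column)."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def person_mean_impute(X):
    """Replace each missing entry with the respondent's (row) mean."""
    X = X.astype(float).copy()
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = row_means[rows]
    return X

def random_hot_deck_impute(X):
    """Replace each missing entry with a randomly drawn observed value of the same item.
    Assumes every item has at least one observed response."""
    X = X.astype(float).copy()
    for j in range(X.shape[1]):
        observed = X[~np.isnan(X[:, j]), j]
        missing = np.isnan(X[:, j])
        X[missing, j] = rng.choice(observed, size=missing.sum(), replace=True)
    return X

# Toy matrix: 5 respondents x 4 binary items with two missing responses
X = np.array([[1, 0, 1, 1],
              [0, 0, np.nan, 1],
              [1, 1, 1, np.nan],
              [0, 1, 0, 0],
              [1, 1, 1, 1]], dtype=float)
print(item_mean_impute(X))
```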

8.
This paper deals with the issue of testing hypotheses in symmetric and log-symmetric linear regression models in small and moderate-sized samples. We focus on four tests, namely, the Wald, likelihood ratio, score, and gradient tests. These tests rely on asymptotic results and are unreliable when the sample size is not large enough to guarantee a good agreement between the exact distribution of the test statistic and the corresponding chi-squared asymptotic distribution. Bartlett and Bartlett-type corrections typically attenuate the size distortion of the tests. These corrections are available in the literature for the likelihood ratio and score tests in symmetric linear regression models. Here, we derive a Bartlett-type correction for the gradient test. We show that the corrections are also valid for log-symmetric linear regression models. We numerically compare the various tests and bootstrapped tests through simulations. Our results suggest that the corrected and bootstrapped tests exhibit type I error probabilities closer to the chosen nominal level with virtually no power loss. The analytically corrected tests, including the Bartlett-corrected gradient test derived in this paper, perform as well as the bootstrapped tests, with the advantage of not requiring computationally intensive calculations. We present a real data application to illustrate the usefulness of the modified tests.

9.
When handling missing data, a researcher should be aware of the mechanism underlying the missingness. In the presence of non-randomly missing data, a model of the missing data mechanism should be included in the analyses to prevent the analyses based on the data from becoming biased. Modeling the missing data mechanism, however, is a difficult task. One way in which knowledge about the missing data mechanism may be obtained is by collecting additional data from non-respondents. In this paper the method of re-approaching respondents who did not answer all questions of a questionnaire is described. New answers were obtained from a sample of these non-respondents and the reason(s) for skipping questions was (were) probed for. The additional data resulted in a larger sample and was used to investigate the differences between respondents and non-respondents, whereas probing for the causes of missingness resulted in more knowledge about the nature of the missing data patterns.

10.
We combine the k-Nearest Neighbors (kNN) method with the local linear estimation (LLE) approach to construct a new estimator (LLE-kNN) of the regression operator when the regressor is of functional type and the response variable is a scalar, but observed with some missing at random (MAR) observations. The resulting estimator inherits many of the advantages of both approaches (the kNN and LLE methods). This is confirmed by the established asymptotic results, in terms of pointwise and uniform almost complete consistency and precise convergence rates. In addition, a numerical study was conducted, (i) on simulated data and (ii) on a real dataset concerning sugar quality assessed with fluorescence data. This practical study clearly shows the feasibility and the superiority of the LLE-kNN estimator compared to competing estimators.
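A minimal sketch of the simpler kNN (local constant) regression estimator for a functional regressor observed on a common grid, with missing-at-random responses simply dropped from the donor pool; the paper's LLE-kNN estimator additionally uses a local linear fit and data-driven smoothing, which this sketch does not attempt to reproduce.

```python
import numpy as np

def knn_functional_regression(curves, y, new_curve, k=5):
    """kNN (local constant) prediction of a scalar response from a functional
    regressor observed on a common grid; responses recorded as NaN
    (assumed missing at random) are excluded from the donor pool.

    curves    : (n, T) array, each row a discretized curve
    y         : (n,) responses, possibly containing NaN
    new_curve : (T,) curve at which to predict
    """
    observed = ~np.isnan(y)
    curves, y = curves[observed], y[observed]
    d = np.sqrt(np.sum((curves - new_curve) ** 2, axis=1))  # L2 distance on the grid
    nearest = np.argsort(d)[:k]
    return y[nearest].mean()

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)
n = 200
curves = np.array([np.sin(2 * np.pi * (t + rng.uniform())) * rng.uniform(0.5, 1.5)
                   for _ in range(n)])
y = curves.mean(axis=1) + rng.normal(scale=0.05, size=n)
y[rng.random(n) < 0.2] = np.nan          # roughly 20% of responses missing at random
print(knn_functional_regression(curves, y, curves[0], k=7))
```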

11.
Bayesian multiple imputation (MI) has become a highly useful paradigm for handling missing values in many settings. In this paper, I compare Bayesian MI with other methods, maximum likelihood in particular, and point out some of its unique features. One key aspect of MI, the separation of the imputation phase from the analysis phase, can be advantageous in settings where the models underlying the two phases do not agree.

12.
In missing data problems, it is often the case that there is a natural test statistic for testing a statistical hypothesis had all the data been observed. A fuzzy p-value approach to hypothesis testing has recently been proposed, implemented by imputing the missing values in the "complete data" test statistic with values simulated from the conditional null distribution given the observed data. We argue that imputing data in this way will inevitably lead to a loss in power. For the case of a scalar parameter, we show that the asymptotic efficiency of the score test based on the imputed "complete data" relative to the score test based on the observed data is given by the ratio of the observed data information to the complete data information. Three examples involving probit regression, a normal random effects model, and unidentified paired data are used for illustration. For testing linkage disequilibrium based on pooled genotype data, simulation results show that the imputed Neyman-Pearson and Fisher exact tests are less powerful than a Wald-type test based on the observed data maximum likelihood estimator. In conclusion, we caution against the routine use of the fuzzy p-value approach in latent variable or missing data problems and suggest some viable alternatives.

13.
The paper describes a non-parametric approach to the analysis of a three-period, two-treatment, four-sequence crossover design, in which a test procedure for interchangeability of the treatment effects is obtained. The proposed procedure is based on a non-parametric model which incorporates, along with the direct treatment effects and the usual carryover effects, the long-term carryover effects. Relevant competing procedures are obtained, and related asymptotic results are given. In a simulation study, we compared the procedures with respect to type I error rate and power. Furthermore, confidence intervals for treatment differences are studied. The procedures are illustrated with a data example.

14.
In a database of aluminophosphate synthesis reactions, 29% of the records contain missing values of various kinds. To handle the missing-value problem, this paper proposes, for the first time, using a BP (back-propagation) neural network to estimate and impute the missing values. Extensive randomized experiments at different missingness rates show that the imputation algorithm is reasonably effective and feasible.
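The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, assuming a numeric feature table: train a small multilayer perceptron (a back-propagation network) on the complete rows to predict the column that has gaps from the remaining columns, then fill the gaps with its predictions. The library choice (scikit-learn), the network size and the synthetic data are assumptions of the sketch, not the configuration used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def bp_impute_column(X, target_col, hidden=(16,), seed=0):
    """Fill NaN entries of one column by regressing it on the other columns
    with a small back-propagation (MLP) network trained on the complete rows.
    Assumes the rows to be imputed have the remaining columns observed."""
    X = X.astype(float).copy()
    other = [j for j in range(X.shape[1]) if j != target_col]
    complete = ~np.isnan(X).any(axis=1)
    net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=seed)
    net.fit(X[complete][:, other], X[complete, target_col])
    to_fill = np.isnan(X[:, target_col]) & ~np.isnan(X[:, other]).any(axis=1)
    X[to_fill, target_col] = net.predict(X[to_fill][:, other])
    return X

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
X[:, 4] = 2 * X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=300)  # a learnable column
X[rng.random(300) < 0.29, 4] = np.nan                              # roughly 29% missing, as in the paper
print(np.isnan(bp_impute_column(X, target_col=4)).sum())           # 0 if every gap was filled
```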

15.
Receiver operating characteristic (ROC) curves are widely used as a measure of the accuracy of diagnostic tests and can be summarised using the area under the ROC curve (AUC). Often, it is useful to construct a confidence interval for the AUC; however, because there are a number of different proposed methods to measure the variance of the AUC, there are many different resulting methods for constructing these intervals. In this article, we compare different methods of constructing Wald-type confidence intervals in the presence of missing data where the missingness mechanism is ignorable. We find that constructing confidence intervals using multiple imputation based on logistic regression gives the most robust coverage probability, and the choice of confidence interval method is then less important. However, when the missingness rate is less severe (e.g., less than 70%), we recommend using Newcombe's Wald method for constructing confidence intervals, along with multiple imputation using predictive mean matching.
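A minimal sketch of one way to combine multiple imputation with a Wald-type interval for the AUC: compute the AUC and a Hanley-McNeil standard error on each imputed data set, then pool with Rubin's rules. This is not the article's recommended Newcombe/predictive-mean-matching combination; the imputation step itself is omitted and the M completed data sets are taken as given, so all names below are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Hanley & McNeil (1982) standard error of an empirical AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return np.sqrt(var)

def pooled_auc_wald_ci(y_list, score_list):
    """Pool AUC estimates from M imputed data sets with Rubin's rules,
    then form a simple 95% Wald interval on the pooled estimate."""
    aucs, variances = [], []
    for y, s in zip(y_list, score_list):
        auc = roc_auc_score(y, s)
        se = hanley_mcneil_se(auc, int((y == 1).sum()), int((y == 0).sum()))
        aucs.append(auc)
        variances.append(se ** 2)
    m = len(aucs)
    q_bar = np.mean(aucs)                 # pooled point estimate
    u_bar = np.mean(variances)            # within-imputation variance
    b = np.var(aucs, ddof=1)              # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b
    half = 1.959964 * np.sqrt(total_var)  # normal quantile used instead of Rubin's t, for brevity
    return q_bar, (q_bar - half, q_bar + half)

rng = np.random.default_rng(4)
y = rng.integers(0, 2, size=200)
imputed_scores = [y + rng.normal(scale=0.8, size=200) for _ in range(3)]  # stand-in for M imputations
print(pooled_auc_wald_ci([y] * 3, imputed_scores))
```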

16.
Information loss for 2 × 2 tables with missing cell counts: binomial case
We formulate likelihood-based ecological inference for 2 × 2 tables with missing cell counts as an incomplete data problem and study Fisher information loss by comparing estimation from complete and incomplete data. In so doing, we consider maximum-likelihood (ML) estimators of probabilities governed by two independent binomial distributions and obtain simplified expressions for their covariance. These expressions reflect well the additional uncertainty arising from the unobserved data compared to complete data tables. We also discuss an approximation to the expected conditional variance of the unobserved entries and ML parameter bias correction. An empirical example is used to demonstrate the results.

17.
Longitudinal data sets with the structure T (time points) × N (subjects) are often incomplete because of data missing for certain subjects at certain time points. The EM algorithm is applied in conjunction with the Kalman smoother for computing maximum likelihood estimates of longitudinal LISREL models from varying missing data patterns. The iterative procedure uses the LISREL program in the M-step and the Kalman smoother in the E-step. The application of the method is illustrated by simulating missing data on a data set from educational research.

18.
The missing data problem has been widely addressed in the literature. The traditional methods for handling missing data, however, may not be suited to spatial data, which can exhibit distinctive structures of dependence and/or heterogeneity. As a possible solution to the spatial missing data problem, this paper proposes an approach that combines the Bayesian Interpolation method [Benedetti, R. & Palma, D. (1994) Markov random field-based image subsampling method, Journal of Applied Statistics, 21(5), 495–509] with a multiple imputation procedure. The method is developed in a univariate and a multivariate framework, and its performance is evaluated through an empirical illustration based on data related to labour productivity in European regions.

19.
Since the work of Little and Rubin (1987), no substantial advances have been achieved in the analysis of explanatory regression models for incomplete data with values missing not at random, mainly due to the difficulty of verifying the randomness of the unknown data. In practice, the analysis of non-random missing data is done with techniques designed for datasets with random or completely random missing data, such as complete case analysis, mean imputation, regression imputation, maximum likelihood or multiple imputation. However, the data conditions required to minimize the bias derived from an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations have been carried out to establish the best strategy of analysis for random missing data that remains applicable in datasets with non-random missing data. The factors involved in the simulations are sample size, percentage of missing data, predictive power of the imputation model and existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation techniques, although with low percentages of missing data, absence of interaction and high predictive power of the imputation model (frequent data structures in research on child and adolescent psychopathology), acceptable results are obtained with the simplest regression imputation.
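A minimal sketch of the kind of Monte Carlo comparison described above, restricted to three of the strategies (complete-case analysis, unconditional mean imputation and deterministic regression imputation) under a single mechanism in which missingness in the outcome depends only on the observed predictor. Sample size, missingness rate and the data-generating model are illustrative assumptions, not the study's design.

```python
import numpy as np

rng = np.random.default_rng(5)

def one_replication(n=200, missing_rate=0.3, beta=0.5):
    """Generate (x, y), delete y depending on x (a MAR-type mechanism),
    and return the bias of the estimated slope under three strategies."""
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    p_miss = np.where(x > np.median(x), 2 * missing_rate, 0.0)  # missingness driven by x only
    y_obs = y.copy()
    y_obs[rng.random(n) < p_miss] = np.nan
    miss = np.isnan(y_obs)

    def slope(xv, yv):
        return np.cov(xv, yv)[0, 1] / np.var(xv, ddof=1)

    b_cc = slope(x[~miss], y_obs[~miss])                    # complete-case estimate
    y_mean = np.where(miss, np.nanmean(y_obs), y_obs)       # unconditional mean imputation
    a_hat = np.nanmean(y_obs) - b_cc * np.mean(x[~miss])    # regression imputation from complete cases
    y_reg = np.where(miss, a_hat + b_cc * x, y_obs)
    return {"complete case": b_cc - beta,
            "mean imputation": slope(x, y_mean) - beta,
            "regression imputation": slope(x, y_reg) - beta}

reps = [one_replication() for _ in range(500)]
for method in reps[0]:
    print(method, round(float(np.mean([r[method] for r in reps])), 3))
```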

20.
Hot deck imputation is a method for handling missing data in which each missing value is replaced with an observed response from a "similar" unit. Although the hot deck is used extensively in practice, its theory is not as well developed as that of other imputation methods. We have found that no consensus exists as to the best way to apply the hot deck and obtain inferences from the completed data set. Here we review different forms of the hot deck and existing research on its statistical properties. We describe applications of the hot deck currently in use, including the U.S. Census Bureau's hot deck for the Current Population Survey (CPS). We also provide an extended example of variations of the hot deck applied to the third National Health and Nutrition Examination Survey (NHANES III). Some potential areas for future research are highlighted.
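A minimal sketch of a random hot deck within adjustment cells: each recipient with a missing value borrows the observed response of a randomly chosen donor from the same cell. The cell variable and the purely random donor draw are illustrative assumptions; production hot decks such as the CPS procedure use more elaborate matching, ordering and editing rules.

```python
import numpy as np

rng = np.random.default_rng(6)

def random_hot_deck(values, cells):
    """Impute NaN entries of `values` by sampling an observed value
    from a donor in the same adjustment cell."""
    values = values.astype(float).copy()
    for cell in np.unique(cells):
        in_cell = cells == cell
        donors = values[in_cell & ~np.isnan(values)]
        recipients = in_cell & np.isnan(values)
        if donors.size and recipients.any():
            values[recipients] = rng.choice(donors, size=recipients.sum(), replace=True)
    return values

# Toy example: incomes with gaps, imputed within cells A and B
income = np.array([52.0, np.nan, 61.0, 47.0, np.nan, 58.0, 39.0, 44.0])
cell = np.array(["A", "A", "A", "B", "B", "B", "B", "A"])
print(random_hot_deck(income, cell))
```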
