Similar Documents
20 similar documents found.
1.
Many statistical problems can be formulated as discrete missing data problems (MDPs). Examples include change-point problems, capture–recapture models, sample surveys with non-response, zero-inflated Poisson models, medical screening/diagnostic tests and bioassay. This paper proposes an exact non-iterative sampling algorithm to obtain independently and identically distributed (i.i.d.) samples from the posterior distribution in discrete MDPs. The new algorithm is essentially a conditional sampling, thus completely avoiding the problems of convergence and slow convergence that arise in iterative algorithms such as Markov chain Monte Carlo. Unlike the general inverse Bayes formulae (IBF) sampler of Tan, Tian and Ng (Statistica Sinica, 13, 2003, 625), the implementation of the new algorithm requires neither the expectation–maximization nor the sampling importance resampling algorithm. The key idea is first to use the sampling-wise IBF to derive the conditional distribution of the missing data given the observed data, and then to draw i.i.d. samples from the complete-data posterior distribution. We first illustrate the method with a worked example and then apply it to contingency tables with one supplemental margin for a human immunodeficiency virus study.
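To make the two-step conditional-sampling idea concrete, here is a minimal sketch for one of the examples listed above, the zero-inflated Poisson model, under conjugate Beta/Gamma priors. The priors, the function names, and the reduction of the missing indicators to the count k of structural zeros are our illustration, not the paper's code: p(k | y) is computed exactly over its finite support 0..n0 (n0 = number of observed zeros), and (pi, lam) are then drawn i.i.d. from the conjugate complete-data posterior, with no iteration and no convergence diagnostics.

```python
import numpy as np
from scipy.special import betaln, gammaln

def zip_posterior_iid(y, a=1.0, b=1.0, c=1.0, d=1.0, n_draws=1000, seed=0):
    """Exact i.i.d. posterior draws for a zero-inflated Poisson model:
    y_i = 0 with probability pi (structural zero), else y_i ~ Poisson(lam),
    with pi ~ Beta(a, b) and lam ~ Gamma(c, rate=d). Illustrative sketch."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n, S = len(y), y.sum()
    n0 = int((y == 0).sum())               # only observed zeros can be structural
    k = np.arange(n0 + 1)                  # possible counts of structural zeros
    # log p(k | y) up to a constant:
    #   C(n0, k) * B(a+k, b+n-k) * (d+n-k)^(-(c+S))
    logw = (gammaln(n0 + 1) - gammaln(k + 1) - gammaln(n0 - k + 1)
            + betaln(a + k, b + n - k)
            - (c + S) * np.log(d + n - k))
    w = np.exp(logw - logw.max())
    w /= w.sum()
    ks = rng.choice(k, size=n_draws, p=w)  # step 1: draw the missing-data summary
    pi = rng.beta(a + ks, b + n - ks)      # step 2: conjugate complete-data draws
    lam = rng.gamma(c + S, 1.0 / (d + n - ks))
    return pi, lam
```

For example, `pi_draws, lam_draws = zip_posterior_iid(counts, n_draws=5000)` yields exact posterior samples whose quality depends on no burn-in or mixing, which is the practical payoff the abstract describes.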

2.
3.
This study investigated the performance of multiple imputation with the Expectation-Maximization (EM) algorithm and the Markov chain Monte Carlo (MCMC) method. We compared the accuracy of imputation on real data, set up two extreme scenarios, and conducted both empirical and simulation studies to examine the effects of missing data rates and of the number of items used for imputation. In the empirical study, the scenario represented the item with the highest missing rate from the domain with the fewest items; in the simulation study, we selected the domain with the most items, and the item imputed had the lowest missing rate. In the empirical study, the results showed no significant difference between the EM algorithm and the MCMC method for item imputation, and the number of items used for imputation also had little impact. Compared with the actual observed values, the middle responses of 3 and 4 were over-imputed, and the extreme responses of 1, 2 and 5 were under-represented. Similar patterns occurred for domain imputation: again there was no significant difference between the EM algorithm and the MCMC method, and the number of items used had little impact. In the simulation study, we chose the environmental domain to examine the effects of the imputation method (EM versus MCMC), the missing data rate, and the number of items used for imputation. Again, there was no significant difference between the EM algorithm and the MCMC method. The accuracy rates did not significantly decline as the proportion of missing data increased. The number of items used for imputation contributed somewhat to the accuracy of imputation, but not as much as expected.
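The accuracy comparisons described above reduce to exact-agreement rates and response-category shares; a minimal sketch of how such a comparison could be scored (the function names and the 1-5 response scale are our assumptions):

```python
import numpy as np

def imputation_accuracy(true_vals, imputed_vals):
    """Exact-agreement rate between imputed and actually observed responses."""
    return float(np.mean(np.asarray(true_vals) == np.asarray(imputed_vals)))

def category_shares(vals, categories=(1, 2, 3, 4, 5)):
    """Share of each response category; comparing true versus imputed shares
    exposes over-imputation of middle responses (3, 4) and
    under-representation of the extremes (1, 2, 5)."""
    vals = np.asarray(vals)
    return {cat: float(np.mean(vals == cat)) for cat in categories}
```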

4.
This work tackled the significant problem of missing data by identifying a new substitution procedure, following an empirical approach based on analysis of the information contained in the entire set of data collected. This procedure offers a number of advantages over other techniques commonly mentioned in the statistical–methodological literature.

5.
We consider questions of efficiency and redundancy in the GMM estimation problem in which we have two sets of moment conditions, where two sets of parameters enter into one set of moment conditions, while only one set of parameters enters into the other. We then apply these results to a selectivity problem in which the first set of moment conditions is for the model of interest, and the second set of moment conditions is for the selection process. We use these results to explain the counterintuitive result in the literature that, under an ignorability assumption that justifies GMM with weighted moment conditions, weighting using estimated probabilities of selection is better than weighting using the true probabilities. We also consider estimation under an exogeneity of selection assumption such that both the unweighted and the weighted moment conditions are valid, and we show that when weighting is not needed for consistency, it is also not useful for efficiency.
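In hedged notation of our own (the paper's setup, but not its symbols), the two blocks are

```latex
E\,[\,g_1(w_i,\theta,\gamma)\,] = 0 \quad \text{(model of interest: both $\theta$ and $\gamma$ enter)},
\qquad
E\,[\,g_2(w_i,\gamma)\,] = 0 \quad \text{(selection process: only $\gamma$ enters)}.
```

Redundancy asks when adding $g_2$ leaves the efficient GMM variance for $\theta$ unchanged; the weighting puzzle arises because evaluating $g_1$ at an estimated $\hat\gamma$ (estimated selection probabilities) changes the asymptotic variance of the $\theta$-estimator and can shrink it relative to plugging in the true $\gamma$.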

6.
In this article, we develop Markov random field models for multivariate lattice data. Specific attention is given to building models that incorporate general forms of the spatial correlations and cross-correlations between variables at different sites. The methodology is applied to a problem in environmental equity. Using a Bayesian hierarchical model that is multivariate in form, we examine the racial distribution of residents of southern Louisiana in relation to the location of sites listed with the U.S. Environmental Protection Agency's Toxic Release Inventory.

7.
Sequential tests to decide among three binomial probabilities are needed in many situations, such as acceptance sampling to determine the proportion of defective items and presence/absence sampling to decide whether pest species are causing economic damage to a crop such as corn. Approximate error probabilities associated with Armitage's (1950, JRSS B) method of simultaneously conducting three sequential probability ratio tests (SPRTs) are derived for the binomial distribution. These approximations provide a basis for adjusting the error rates used to establish the individual SPRTs so that the desired overall error rates are attained. Monte Carlo simulation is used to evaluate the revised procedure.
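A hedged sketch of the simultaneous-SPRT idea: our decision rule accepts a hypothesis once every pairwise test involving it favors it, and uses symmetric Wald boundaries rather than the paper's adjusted error rates.

```python
import numpy as np

def pairwise_llr(s, n, p_i, p_j):
    """Binomial log-likelihood ratio of H: p = p_i against H: p = p_j
    after s successes in n trials."""
    return s * np.log(p_i / p_j) + (n - s) * np.log((1 - p_i) / (1 - p_j))

def simultaneous_sprt(draw, ps=(0.2, 0.5, 0.8), alpha=0.05, max_n=100_000, seed=0):
    """Run three pairwise binomial SPRTs at once and accept hypothesis i as
    soon as every pairwise test involving i has crossed the Wald boundary
    log((1 - alpha) / alpha) in favour of p_i. `draw(rng)` returns one 0/1
    observation. Returns (accepted index or None, sample size)."""
    rng = np.random.default_rng(seed)
    log_a = np.log((1 - alpha) / alpha)
    s = n = 0
    while n < max_n:
        s += draw(rng)
        n += 1
        for i in range(3):
            if all(pairwise_llr(s, n, ps[i], ps[j]) >= log_a
                   for j in range(3) if j != i):
                return i, n
    return None, n

# e.g. data generated with true p = 0.45 (should usually accept index 1):
# print(simultaneous_sprt(lambda rng: int(rng.random() < 0.45)))
```

Monte Carlo evaluation in the paper's spirit would run this under each p_i, tabulate how often the wrong hypothesis is accepted, and then adjust alpha until the overall error rates meet the targets.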

8.
I study inverse probability weighted M-estimation under a general missing data scheme. Examples include M-estimation with missing data due to a censored survival time, propensity score estimation of the average treatment effect in the linear exponential family, and variable probability sampling with observed retention frequencies. I extend an important result known to hold in special cases: estimating the selection probabilities is generally more efficient than if the known selection probabilities could be used in estimation. For the treatment effect case, the setup allows a general characterization of a “double robustness” result due to Scharfstein et al. [1999. Rejoinder. Journal of the American Statistical Association 94, 1135–1146].
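The efficiency result is easy to see in a Monte Carlo sketch; everything below (the data-generating process, the Hajek-type weighted mean, the Newton logit fit) is our illustration of the phenomenon, not the paper's estimator:

```python
import numpy as np

def logit_fit(X, s, iters=25):
    """Logistic regression fitted by Newton's method (minimal, no safeguards)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None])   # Hessian of the log-likelihood
        beta += np.linalg.solve(H, X.T @ (s - p))
    return beta

def demo(n=2000, reps=2000, seed=0):
    """Compare IPW (Hajek) means of y that weight by the true versus the
    estimated selection probabilities, under selection on x (ignorability)."""
    rng = np.random.default_rng(seed)
    est_true, est_hat = [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        y = 1.0 + x + rng.normal(size=n)           # target: E[y] = 1
        pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))      # true selection probabilities
        s = (rng.random(n) < pi).astype(float)     # selection indicator
        X = np.column_stack([np.ones(n), x])
        pi_hat = 1.0 / (1.0 + np.exp(-(X @ logit_fit(X, s))))
        est_true.append(np.sum(s * y / pi) / np.sum(s / pi))
        est_hat.append(np.sum(s * y / pi_hat) / np.sum(s / pi_hat))
    print("variance with true weights:     ", np.var(est_true))
    print("variance with estimated weights:", np.var(est_hat))
```

Running `demo()` typically shows a smaller variance in the estimated-weights column, matching the special-case result the abstract extends.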

9.
Empirical count data are often zero-inflated and overdispersed. Currently, there is no software package that allows adequate imputation of these data. We present multiple-imputation routines for these kinds of count data, based either on a Bayesian regression approach or alternatively on a bootstrap approach, that work as add-ons for the popular multiple imputation by chained equations (mice) software in R (van Buuren and Groothuis-Oudshoorn, Journal of Statistical Software, vol. 45, 2011, p. 1). We demonstrate in a Monte Carlo simulation that our procedures are superior to currently available count data procedures. It is emphasized that thorough modeling is essential to obtain plausible imputations and that model mis-specifications can bias parameter estimates and standard errors quite noticeably. Finally, the strengths and limitations of our procedures are discussed, and fruitful avenues for future theory and software development are outlined.
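mice itself is R software; as a language-neutral illustration of the bootstrap variant, here is a hedged Python sketch of one imputation cycle for marginal zero-inflated Poisson counts (no covariates, our parameterization, not the paper's routines):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def zip_nll(params, y):
    """Negative log-likelihood of a zero-inflated Poisson; params are on an
    unconstrained scale (logit of pi, log of lam)."""
    pi = 1.0 / (1.0 + np.exp(-params[0]))
    lam = np.exp(params[1])
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))
    ll_pos = np.log(1 - pi) - lam + y * np.log(lam) - gammaln(y + 1)
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))

def zip_bootstrap_mi(y_obs, n_missing, m=5, seed=0):
    """One way to build m bootstrap imputations for missing zero-inflated
    counts: refit the ZIP model on a resample of the observed counts, then
    draw the missing values from the fitted model."""
    rng = np.random.default_rng(seed)
    imputations = []
    for _ in range(m):
        boot = rng.choice(y_obs, size=len(y_obs), replace=True)
        fit = minimize(zip_nll, x0=np.zeros(2), args=(np.asarray(boot),))
        pi, lam = 1.0 / (1.0 + np.exp(-fit.x[0])), np.exp(fit.x[1])
        structural = rng.random(n_missing) < pi
        imputations.append(np.where(structural, 0, rng.poisson(lam, n_missing)))
    return imputations
```

The bootstrap refit propagates parameter uncertainty between imputations, which is what distinguishes proper multiple imputation from repeatedly drawing from a single fitted model.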

10.
The kappa statistic is suggested as a means to index the degree to which particular patterns occur in social interaction. It is suggested that the value of the kappa statistic for each interaction be used as a dependent measure. Particular formulas for kappa are derived for unidirectional dependence, bidirectional dependence, other additive patterns, and dominance.
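The pattern-specific formulas all specialize the generic form (our notation):

```latex
\kappa = \frac{P_o - P_e}{1 - P_e},
```

where $P_o$ is the observed proportion of interactions exhibiting the pattern and $P_e$ the proportion expected by chance under independence.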

11.
Leng-Cheng Hwang, Metrika, 2011, 74(1): 121–133
The problem of sequentially estimating the intensity parameter of a homogeneous Poisson process, under quadratic loss and a fixed cost per unit time, is considered within the Bayesian framework. Using neither the prior information nor any auxiliary data, this paper proposes a sequential procedure like that suggested by Vardi (Ann Statist 7: 1040–1051, 1979) in classical non-Bayesian sequential estimation. The proposed sequential procedure is robust in the sense that it does not depend on the prior. Second-order approximations to the expected sample size and the Bayes risk of the proposed sequential procedure are established for a large class of prior distributions.
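As a rough illustration only (a heuristic plug-in reconstruction in the spirit of Vardi's rule, not the paper's procedure or its second-order analysis): for a process observed to time t, the fixed-horizon risk lam/t + c*t is minimized at t* = sqrt(lam/c), and plugging in lam_hat = N_t/t gives the stopping rule t >= (N_t/c)**(1/3).

```python
import numpy as np

def plug_in_stop(lam_true, c, t0=1.0, seed=0):
    """Simulate a homogeneous Poisson process and stop at the first time t
    (after a burn-in time t0) with t >= (N_t / c) ** (1/3), i.e. the
    fixed-horizon optimum t* = sqrt(lam / c) with lam replaced by N_t / t.
    Returns (stopping time, event count, intensity estimate)."""
    rng = np.random.default_rng(seed)
    t, n = 0.0, 0
    while True:
        gap = rng.exponential(1.0 / lam_true)   # waiting time to the next event
        fire = max(t0, (n / c) ** (1 / 3))      # time the rule fires if N stays at n
        if fire <= t + gap:                     # rule fires before the next event
            t_stop = max(t, fire)
            return t_stop, n, n / t_stop
        t += gap
        n += 1
```

Averaging the squared error plus c times the stopping time over many runs approximates the risk that the paper analyzes to second order.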

12.
The stratified logrank test can be used to compare the survival distributions of several groups of patients while adjusting for the effect of a discrete variable that may be predictive of the survival outcome. In practice, this discrete variable may be missing for some patients. An inverse-probability-weighted version of the stratified logrank statistic is introduced to tackle this issue. Its asymptotic distribution is derived under the null hypothesis of equality of the survival distributions. A simulation study is conducted to assess the behavior of the proposed test statistic in finite samples. An analysis of a medical dataset illustrates the methodology.
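For orientation, the unweighted stratified logrank statistic sums observed-minus-expected events over strata $s$ and event times $t_j$ (our notation; the paper's weighted version multiplies each subject's contributions by $R_i/\hat\pi_i$, where $R_i$ indicates that the stratum variable is observed):

```latex
U = \sum_{s}\sum_{j}\left( d_{1sj} - Y_{1sj}\,\frac{d_{sj}}{Y_{sj}} \right),
\qquad Z = \frac{U}{\sqrt{\widehat{\operatorname{Var}}(U)}},
```

where $d_{1sj}$ and $Y_{1sj}$ are the group-1 event count and number at risk in stratum $s$ at time $t_j$, and $d_{sj}$, $Y_{sj}$ are the stratum totals.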

13.
For contingency tables with extensive missing data, the unrestricted MLE under the saturated model, computed by the EM algorithm, is generally unsatisfactory. In this case, it may be better to fit a simpler model by imposing some restrictions on the parameter space. Perlman and Wu (1999) propose lattice conditional independence (LCI) models for contingency tables with arbitrary missing data patterns. When this LCI model fits well, the restricted MLE under the LCI model is more accurate than the unrestricted MLE under the saturated model, but not in general. Here we propose certain empirical Bayes (EB) estimators that adaptively combine the best features of the restricted and unrestricted MLEs. These EB estimators appear to be especially useful when the observed data is sparse, even in cases where the suitability of the LCI model is uncertain. We also study a restricted EM algorithm (called the ER algorithm) with similar desirable features.

14.
This paper concerns estimating parameters in a high-dimensional dynamic factor model by the method of maximum likelihood. To accommodate missing data in the analysis, we propose a new model representation for the dynamic factor model. It allows the Kalman filter and related smoothing methods to evaluate the likelihood function and to produce optimal factor estimates in a computationally efficient way when missing data is present. The implementation details of our methods for signal extraction and maximum likelihood estimation are discussed. The computational gains of the new devices are presented based on simulated data sets with varying numbers of missing entries.
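The standard device being built on here is easy to state in code: when an observation is missing, the Kalman filter performs the time update only and contributes nothing to the likelihood. A minimal univariate sketch for a local-level model (the paper's representation handles high-dimensional factor models; this toy version is our own):

```python
import numpy as np

def local_level_filter(y, sigma_eps2, sigma_eta2, a0=0.0, p0=1e7):
    """Kalman filter for the local-level model
        y_t = mu_t + eps_t,  mu_{t+1} = mu_t + eta_t,
    where missing observations are coded as NaN: at a missing time point the
    measurement update is skipped and nothing is added to the log-likelihood.
    Returns (log-likelihood, filtered state means)."""
    a, p, ll = a0, p0, 0.0
    filtered = np.empty(len(y))
    for t, yt in enumerate(y):
        if not np.isnan(yt):
            f = p + sigma_eps2            # innovation variance
            v = yt - a                    # innovation
            ll += -0.5 * (np.log(2 * np.pi * f) + v * v / f)
            k = p / f                     # Kalman gain
            a += k * v
            p *= 1 - k
        filtered[t] = a
        p += sigma_eta2                   # time update (mu is a random walk)
    return ll, filtered
```

Maximizing the returned log-likelihood over the two variances with any numerical optimizer gives maximum likelihood estimates for this toy model, mirroring the likelihood-evaluation role the filter plays in the paper.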

15.
This paper presents an interactive visualization tool for the qualitative exploration of multivariate data that may exhibit cyclic or periodic behavior. Glyphs are used to encode each multivariate data point, and linear, stacked, and spiral glyph layouts are employed to help convey both intra-cycle and inter-cycle relationships within the data. Users may interactively select glyph and layout types, modify cycle lengths and the number of cycles to display, and select the specific data dimensions to be included. We validate the usefulness of the system with case studies and describe our future plans for expanding the system's capabilities.
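A minimal sketch of the spiral-layout idea (not the paper's tool: glyphs are reduced to color-coded dots, the angle encodes position within a cycle, and the radius encodes the cycle index):

```python
import numpy as np
import matplotlib.pyplot as plt

def spiral_plot(values, cycle_len=24):
    """Place observation i at angle 2*pi*(i mod cycle_len)/cycle_len and at a
    radius that grows by one unit per completed cycle, so that intra-cycle
    structure reads around the spiral and inter-cycle drift reads outward."""
    i = np.arange(len(values))
    theta = 2 * np.pi * (i % cycle_len) / cycle_len
    r = 1 + i / cycle_len
    ax = plt.subplot(projection="polar")
    pts = ax.scatter(theta, r, c=values, cmap="viridis", s=15)
    ax.set_yticklabels([])
    plt.colorbar(pts, label="value")
    plt.show()

# e.g. a noisy daily cycle observed for about three weeks:
# spiral_plot(np.sin(2 * np.pi * np.arange(500) / 24)
#             + np.random.default_rng(0).normal(0, 0.2, 500))
```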

16.
17.
D. G. Kabe, Metrika, 1968, 13(1): 86–97
In this paper we consider some aspects of analysis of variance and covariance theory for the complex normal distribution introduced by Goodman (1963). The properties we consider are similar to those in the real case. Goodman (1963) and Khatri (1965) have studied several other properties of the complex normal model and have pointed out the similarity between the real and the complex cases.
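For context, Goodman's (1963) complex multivariate normal density has the standard form (quoted from general knowledge, not from this paper):

```latex
f(\mathbf{z}) = \pi^{-p}\,\lvert \Gamma \rvert^{-1}
\exp\!\left\{ -(\mathbf{z}-\boldsymbol{\mu})^{*}\, \Gamma^{-1} (\mathbf{z}-\boldsymbol{\mu}) \right\},
\qquad \mathbf{z} \in \mathbb{C}^{p},
```

where $\Gamma$ is Hermitian positive definite and $^{*}$ denotes the conjugate transpose; the analysis-of-variance parallels with the real case flow from this density's formal similarity to the real normal.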

18.
Conclusions on the development of delinquent behaviour over the life-course can only be drawn from longitudinal data, which are typically obtained by repeatedly interviewing the same respondents. Missing data are a problem for such analyses, as shown here with data from a four-wave panel of adolescents. In this article two alternative techniques for coping with missing data are used: full information maximum likelihood estimation and multiple imputation. Both methods allow all available data (including adolescents with missing information on some variables) to be used in estimating the development of delinquency. We demonstrate that self-reported delinquency is systematically underestimated with listwise deletion (LD) of missing data. Further, LD leads to false conclusions about gender- and school-specific differences in the age–crime relationship. The final discussion gives some pointers to further methods for dealing with bias in panel data affected by the missingness process.

19.
The treatment of missing data has been overlooked in the OM literature, while other fields such as marketing, organizational behavior, economics, statistics and psychometrics have paid more attention to the issue. A review of 103 survey-based articles published in the Journal of Operations Management between 1993 and 2001 shows that listwise deletion, often the least accurate technique for dealing with missing data, is heavily used by OM researchers. The paper also discusses the research implications of missing data and the types of missing data, and concludes with recommendations on which techniques should be used under different circumstances to improve the treatment of missing data in OM survey research.

20.
Wei Yu, Cuizhen Niu, Wangli Xu, Metrika, 2014, 77(5): 675–693
In this paper, we use the empirical likelihood method to make inferences for the coefficient difference of a two-sample linear regression model with missing response data. The commonly used empirical likelihood ratio is not concave for this problem, so we append a natural and well-explained condition to the likelihood function and propose three types of restricted empirical likelihood ratios for constructing the confidence region of the parameter in question. It can be demonstrated that all three empirical likelihood ratios have, asymptotically, chi-squared distributions. Simulation studies are carried out to show the effectiveness of the proposed approaches in aspects of coverage probability and interval length. A real data set is analysed with our methods as an example.
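The object being restricted is the profile empirical likelihood ratio; in generic notation (ours, for an estimating function $g$):

```latex
R(\beta) = \sup\Big\{ \prod_{i=1}^{n} n w_i \;:\; w_i \ge 0,\;
\sum_{i=1}^{n} w_i = 1,\; \sum_{i=1}^{n} w_i\, g(X_i, \beta) = 0 \Big\},
```

to which the paper appends its extra condition, working with restricted versions of this ratio that all retain the asymptotic chi-squared calibration of $-2\log R(\beta)$.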

