Similar Articles
1.
Empirical count data are often zero-inflated and overdispersed. Currently, there is no software package that allows adequate imputation of these data. We present multiple-imputation routines for these kinds of count data, based either on a Bayesian regression approach or on a bootstrap approach, that work as add-ons for the popular multiple imputation by chained equations (mice) software in R (van Buuren and Groothuis-Oudshoorn, Journal of Statistical Software, vol. 45, 2011, p. 1). We demonstrate in a Monte Carlo simulation that our procedures are superior to currently available count data procedures. It is emphasized that thorough modeling is essential to obtain plausible imputations and that model mis-specifications can bias parameter estimates and standard errors quite noticeably. Finally, the strengths and limitations of our procedures are discussed, and fruitful avenues for future theory and software development are outlined.
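The bootstrap variant described above can be sketched compactly: fit a zero-inflated Poisson (ZIP) model to the observed cases, refit it on a fresh bootstrap resample for each imputation, and draw the imputed counts from the resulting predictive ZIP distribution. This is a minimal Python illustration, not the authors' R add-ons; the hand-rolled likelihood and all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def zip_negloglik(theta, y, X):
    """Negative log-likelihood of a zero-inflated Poisson with a shared
    design matrix X; theta = (gamma, beta), where gamma parameterizes the
    logit of the structural-zero probability and beta the log Poisson mean."""
    k = X.shape[1]
    gamma, beta = theta[:k], theta[k:]
    pi = expit(X @ gamma)                 # P(structural zero)
    mu = np.exp(X @ beta)                 # Poisson mean
    ll_zero = np.log(pi + (1 - pi) * np.exp(-mu))
    ll_pos = np.log(1 - pi) - mu + y * np.log(mu)   # log(y!) constant dropped
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))

def impute_zip_bootstrap(y, X, m=5, seed=None):
    """Return m completed copies of y (NaN = missing): each copy refits the
    ZIP model on a bootstrap resample of the observed cases and draws the
    missing counts from the fitted predictive distribution."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    y_obs, X_obs = y[obs], X[obs]
    k = X.shape[1]
    completed = []
    for _ in range(m):
        idx = rng.integers(0, len(y_obs), len(y_obs))    # bootstrap resample
        fit = minimize(zip_negloglik, np.zeros(2 * k),
                       args=(y_obs[idx], X_obs[idx]), method="BFGS")
        gamma, beta = fit.x[:k], fit.x[k:]
        pi, mu = expit(X[~obs] @ gamma), np.exp(X[~obs] @ beta)
        draws = np.where(rng.random(pi.shape) < pi, 0, rng.poisson(mu))
        y_new = y.copy()
        y_new[~obs] = draws
        completed.append(y_new)
    return completed
```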

2.
This study investigates whether project management maturity (PMM) relates to perceived organizational performance and how an organization's cultural orientation is a contributing factor. Perceived organizational performance is defined as project effectiveness and efficiency followed by resulting business performance. A survey-based study was conducted with 86 project professionals from various U.S. service and manufacturing organizations. The study revealed that PMM is significantly related to business performance but not to project performance. Furthermore, while clan organizational culture is the sole cultural factor contributing to project and business performance, PMM interacts with market culture in improving business performance. This study shows that in order to deal with project time, budget, and expectation issues, an organizational culture change toward sharing, collaboration, and empowerment is essential. Furthermore, increasing project management maturity along with a results-oriented organizational culture improves an organization's competitiveness, resulting in cost savings and increased sales. PMM efforts are therefore crucial. PMM accompanied by an understanding of cultural orientation is the best strategy for today's project-based organizations.

3.
In this review paper, we discuss the theoretical background of multiple imputation, and describe how to build an imputation model and how to create proper imputations. We also present the rules for making repeated imputation inferences. Three widely used multiple imputation methods, the propensity score method, the predictive model method, and the Markov chain Monte Carlo (MCMC) method, are presented and discussed.
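The "rules for making repeated imputation inferences" referred to here are Rubin's combining rules. A minimal sketch for a scalar estimand, assuming the m completed-data analyses each return a point estimate and its variance:

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m completed-data analyses of a scalar estimand by Rubin's rules;
    returns the pooled estimate, its total variance, and the degrees of
    freedom for t-based interval estimates."""
    m = len(estimates)
    qbar = np.mean(estimates)               # pooled point estimate
    w = np.mean(variances)                  # within-imputation variance
    b = np.var(estimates, ddof=1)           # between-imputation variance
    t = w + (1 + 1 / m) * b                 # total variance
    r = (1 + 1 / m) * b / w                 # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2         # Rubin's degrees of freedom
    return qbar, t, df
```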

4.
Multiple imputation has come to be viewed as a general solution to missing data problems in statistics. However, in order to lead to consistent, asymptotically normal estimators, correct variance estimators, and valid tests, the imputations must be proper. So far, it seems that only Bayesian multiple imputation, i.e. using a Bayesian predictive distribution to generate the imputations, or approximately Bayesian multiple imputation, has been shown to lead to proper imputations in some settings. In this paper, we shall see that Bayesian multiple imputation does not generally lead to proper multiple imputations. Furthermore, it will be argued that for general statistical use, Bayesian multiple imputation is inefficient even when it is proper.

5.
In missing data problems, it is often the case that there is a natural test statistic for testing a statistical hypothesis had all the data been observed. A fuzzy p-value approach to hypothesis testing has recently been proposed, implemented by imputing the missing values in the "complete data" test statistic with values simulated from the conditional null distribution given the observed data. We argue that imputing data in this way will inevitably lead to a loss in power. For the case of a scalar parameter, we show that the asymptotic efficiency of the score test based on the imputed "complete data", relative to the score test based on the observed data, is given by the ratio of the observed data information to the complete data information. Three examples involving probit regression, a normal random effects model, and unidentified paired data are used for illustration. For testing linkage disequilibrium based on pooled genotype data, simulation results show that the imputed Neyman-Pearson and Fisher exact tests are less powerful than a Wald-type test based on the observed data maximum likelihood estimator. In conclusion, we caution against the routine use of the fuzzy p-value approach in latent variable or missing data problems and suggest some viable alternatives.
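A small Monte Carlo makes the power-loss claim concrete. The setting is deliberately simpler than the paper's examples: a z-test for the mean of N(theta, 1) data with half the sample missing completely at random, where the "complete data" statistic imputes the missing values from the conditional null distribution (a single imputation, for illustration). The relative efficiency here is n_obs / n, matching the information ratio described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_obs, theta, reps = 100, 50, 0.4, 20_000
crit = 1.96                       # two-sided 5% critical value

rej_obs = rej_imp = 0
for _ in range(reps):
    x_obs = rng.normal(theta, 1, n_obs)
    # z-test on the observed data only
    rej_obs += abs(np.sqrt(n_obs) * x_obs.mean()) > crit
    # impute the n - n_obs missing values from the null N(0, 1)
    x_imp = rng.normal(0, 1, n - n_obs)
    z_full = np.sqrt(n) * np.concatenate([x_obs, x_imp]).mean()
    rej_imp += abs(z_full) > crit

print(f"power, observed-data test:            {rej_obs / reps:.3f}")
print(f"power, imputed 'complete data' test:  {rej_imp / reps:.3f}")
```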

6.
Nested multiple imputation of NMES via partially incompatible MCMC
The multiple imputation of the National Medical Expenditure Survey (NMES) involved the use of two new techniques, both having potentially broad applicability. The first is to use distributionally incompatible MCMC (Markov chain Monte Carlo), but to apply it only partially, to impute the missing values that destroy a monotone pattern, thereby limiting the extent of incompatibility. The second technique is to split the missing data into two parts, one that is much more computationally expensive to impute than the other, and create several imputations of the second part for each imputation of the first part, thereby creating nested multiple imputations with their increased inferential efficiency.
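A sketch of how nested imputations are combined, assuming the variance decomposition usually stated for nested MI (an average within-imputation variance plus between-nest and within-nest components); the exact form should be checked against the paper and the related literature.

```python
import numpy as np

def nested_mi_pool(est, var):
    """est, var: (M, N) arrays of completed-data estimates and variances,
    for M outer (expensive) nests each holding N inner imputations."""
    M, N = est.shape
    qbar = est.mean()                        # overall point estimate
    ubar = var.mean()                        # average within-imputation variance
    nest_means = est.mean(axis=1)
    b_between = nest_means.var(ddof=1)       # between-nest variance
    w_within = ((est - nest_means[:, None]) ** 2).sum() / (M * (N - 1))
    t = ubar + (1 + 1 / M) * b_between + (1 - 1 / N) * w_within
    return qbar, t
```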

7.
Data fusion or statistical matching techniques merge datasets from different survey samples to achieve a complete but artificial data file that contains all variables of interest. The merging of datasets is usually done on the basis of variables common to all files, but traditional methods implicitly assume conditional independence between the variables never jointly observed, given the common variables. We therefore suggest tackling the data fusion task with more flexible, model-based procedures. Suitable multiple imputation techniques make explicit the identification problem that is inherent in statistical matching. Here, a non-iterative Bayesian version of Rubin's implicit regression model is presented and compared in a simulation study with imputations from a data augmentation algorithm as well as with an iterative approach using chained equations.
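The regression-based fusion idea can be sketched directly: with file B observing (X, Z) and file A observing (X, Y), Z is imputed into file A from a Bayesian linear regression of Z on the common variables X, drawing the parameters from a standard noninformative posterior so that imputation uncertainty propagates. This is a generic sketch of this class of methods, not the paper's specific non-iterative algorithm, and it inherits the conditional-independence caveat discussed above.

```python
import numpy as np

def fuse(X_b, z_b, X_a, seed=None):
    """Impute Z for file A (design X_a) from a Bayesian linear regression
    of z_b on X_b fitted on file B, under the usual noninformative prior."""
    rng = np.random.default_rng(seed)
    n, k = X_b.shape
    XtX_inv = np.linalg.inv(X_b.T @ X_b)
    beta_hat = XtX_inv @ X_b.T @ z_b
    resid = z_b - X_b @ beta_hat
    # draw sigma^2 from its scaled inverse-chi-square posterior
    sigma2 = resid @ resid / rng.chisquare(n - k)
    # draw beta from its conditional normal posterior
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    # impute Z with residual noise added (proper, not deterministic, draws)
    return X_a @ beta + rng.normal(0, np.sqrt(sigma2), len(X_a))
```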

8.
In many surveys, imputation procedures are used to account for non-response bias induced by either unit non-response or item non-response. Such procedures are optimised (in terms of reducing non-response bias) when the models include covariates that are highly predictive of both the response and the outcome variables. To achieve this, we propose a method for selecting the sets of covariates used in regression imputation models, or for determining imputation cells, for one or more outcome variables, using as the key metric the fraction of missing information (FMI) obtained via a proxy pattern-mixture (PPM) model. In our variable selection approach, we use the PPM model to obtain a maximum likelihood estimate of the FMI for separate sets of candidate imputation models and look for the point at which changes in the FMI level off and further auxiliary variables do not improve the imputation model. We illustrate our proposed approach using empirical data from the Ohio Medicaid Assessment Survey and from the Service Annual Survey.
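As a rough illustration of the selection criterion (not of the PPM likelihood itself), the FMI of a scalar estimate can be approximated from multiply imputed analyses; `mi_fit` below is a hypothetical stand-in for an imputation engine applied to a given candidate covariate set.

```python
import numpy as np

def fmi(estimates, variances):
    """Large-m approximation to the fraction of missing information for a
    scalar estimate from m multiply imputed analyses."""
    m = len(estimates)
    b = np.var(estimates, ddof=1)
    t = np.mean(variances) + (1 + 1 / m) * b
    return (1 + 1 / m) * b / t

# Hypothetical selection loop: candidate_sets is an ordered list of covariate
# sets, mi_fit(cand) a stand-in returning (estimates, variances) for the
# target statistic under imputation model `cand`; stop adding auxiliaries
# once the FMI levels off.
# for cand in candidate_sets:
#     print(cand, fmi(*mi_fit(cand)))
```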

9.
In this paper, we investigate certain operational and inferential aspects of the invariant post-randomization method (PRAM) as a tool for disclosure limitation of categorical data. Invariant PRAM preserves the unbiasedness of certain estimators, but inflates their variances and distorts other attributes. We introduce the concept of strongly invariant PRAM, which does not affect data utility or the properties of any statistical method; however, the procedure seems feasible only in limited situations. We review methods for constructing invariant PRAM matrices and prove that a conditional approach, which can preserve the original data on any subset of variables, yields invariant PRAM. For multinomial sampling, we derive expressions for the variance inflation inflicted by invariant PRAM and for the variances of certain estimators of the cell probabilities, as well as their tight upper bounds. We discuss estimation of these quantities, and thereby assessment of the statistical efficiency loss from applying invariant PRAM. We find a connection between invariant PRAM and the creation of partially synthetic data using a non-parametric approach, and compare the estimation variance under the two approaches. Finally, we discuss some aspects of invariant PRAM in a general survey context.
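One elementary way to build an invariant PRAM matrix mixes the identity with the observed category distribution, so that the expected marginals are preserved; a minimal sketch (the paper's conditional construction is more general):

```python
import numpy as np

def invariant_pram_matrix(pi, alpha=0.8):
    """P = alpha*I + (1 - alpha) * 1 pi^T satisfies pi @ P = pi for any
    category distribution pi, hence marginals are preserved in expectation."""
    K = len(pi)
    return alpha * np.eye(K) + (1 - alpha) * np.outer(np.ones(K), pi)

def apply_pram(codes, P, seed=None):
    """Perturb integer category codes: record with category c moves to
    category j with probability P[c, j]."""
    rng = np.random.default_rng(seed)
    return np.array([rng.choice(len(P), p=P[c]) for c in codes])

codes = np.array([0, 1, 1, 2, 0, 2, 1])
pi = np.bincount(codes) / len(codes)
P = invariant_pram_matrix(pi)
assert np.allclose(pi @ P, pi)      # invariance check
perturbed = apply_pram(codes, P, seed=0)
```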

10.
Much effort has been devoted to the usefulness of earnings for investment decisions. The low explanatory power of earnings numbers for stock returns urges researchers to further investigate the theoretical and technical aspects of the model. This paper treats the econometric issues of the topic: we allow for heteroscedasticity in the model and, in addition, use panel data methods to estimate it. The outcome is convincing and meaningful, suggesting that adopting panel data methods that accommodate heteroscedasticity greatly improves the fit of the model.
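A minimal sketch of the kind of estimation described, using the linearmodels package with firm fixed effects and a heteroscedasticity-robust covariance; the file and column names are illustrative, not the authors' data.

```python
import pandas as pd
from linearmodels.panel import PanelOLS

# hypothetical firm-year panel with stock returns and earnings variables
df = pd.read_csv("firm_year.csv").set_index(["firm", "year"])
res = PanelOLS.from_formula(
    "ret ~ 1 + earnings + EntityEffects", data=df
).fit(cov_type="robust")     # White-type heteroscedasticity-robust errors
print(res.summary)
```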

11.
This paper implements the generalized maximum entropy (GME) method in a longitudinal data setup to investigate the regression parameters and the correlation among the repeated measurements. We derive the GME system using classical Shannon entropy as well as some higher-order entropies, assuming an autoregressive correlation structure. The method is illustrated using two simulated examples in order to study the effect of changing the support range and to compare the performance of the GME approach with classical estimation methods.
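A minimal primal-GME sketch for a plain linear model, assuming the usual reparameterization: each coefficient and each error is a convex combination of user-chosen support points, and the Shannon entropy of the weights is maximized subject to the data constraints. Practical GME implementations solve the dual; this brute-force version is only viable for small problems, and the sensitivity to the support range z is exactly what the paper studies.

```python
import numpy as np
from scipy.optimize import minimize

def gme_linear(y, X, z, v):
    """GME estimates for y = X beta + e with beta_k = z @ p_k and
    e_i = v @ w_i, p_k and w_i probability vectors over the supports."""
    n, k = X.shape
    J, L = len(z), len(v)

    def unpack(t):
        return t[:k * J].reshape(k, J), t[k * J:].reshape(n, L)

    def neg_entropy(t):
        t = np.clip(t, 1e-12, None)
        return np.sum(t * np.log(t))       # minimizing => maximizing entropy

    def data_constraint(t):
        p, w = unpack(t)
        return X @ (p @ z) + w @ v - y     # must hold exactly

    cons = [
        {"type": "eq", "fun": data_constraint},
        {"type": "eq", "fun": lambda t: unpack(t)[0].sum(axis=1) - 1},
        {"type": "eq", "fun": lambda t: unpack(t)[1].sum(axis=1) - 1},
    ]
    t0 = np.concatenate([np.full(k * J, 1 / J), np.full(n * L, 1 / L)])
    res = minimize(neg_entropy, t0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * t0.size, constraints=cons)
    p, _ = unpack(res.x)
    return p @ z                           # GME coefficient estimates
```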

12.
This paper outlines a strategy to validate multiple imputation methods. Rubin's criteria for proper multiple imputation are the point of departure. We describe a simulation method that yields insight into various aspects of the bias and efficiency of the imputation process. We propose a new method for creating incomplete data under a general missing at random (MAR) mechanism. Software implementing the validation strategy is available as a SAS/IML module. The method is applied to investigate the behavior of polytomous regression imputation for categorical data.
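A minimal sketch of one way to generate item nonresponse under MAR: missingness in y depends only on a fully observed covariate x through a logistic model, with the intercept calibrated so the realized missingness rate hits a target. The paper's mechanism is more general; this shows the basic device.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit

def make_mar(y, x, target_rate=0.3, slope=1.5, seed=None):
    """Blank out y with probability expit(a + slope*x), with a chosen so
    the expected missingness rate equals target_rate; x must be observed."""
    rng = np.random.default_rng(seed)
    a = brentq(lambda a: expit(a + slope * x).mean() - target_rate, -20, 20)
    miss = rng.random(len(y)) < expit(a + slope * x)
    y_inc = y.astype(float).copy()
    y_inc[miss] = np.nan                  # MAR: depends on x, not on y
    return y_inc
```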

13.
This paper analyzes the reasons for differences in the estimated effect of retirement on health in previous studies, focusing on the analysis methods those studies used. Using various health indexes, numerous researchers have examined the effects of retirement on health; however, there is no unified view on the impact of retirement across these indexes. We show that the choice of analysis method is one of the key factors explaining why the estimated effects of retirement on health differ. Moreover, we re-estimate the effect of retirement on health using a fixed analysis method that controls for individual heterogeneity and the endogeneity of retirement behavior. We analyze the effect of retirement on health measures such as cognitive function, self-reported health, activities of daily living (ADL), depression, and body mass index in eight countries. We find that the effects of retirement on self-reported health, depression, and ADL are positive in many of these countries.

14.
We compare five methods for parameter estimation of a Poisson regression model for clustered data: (1) ordinary (naive) Poisson regression (OP), which ignores intracluster correlation; (2) Poisson regression with fixed cluster-specific intercepts (FI); (3) a generalized estimating equations (GEE) approach with an equi-correlation matrix; (4) an exact generalized estimating equations (EGEE) approach with an exact covariance matrix; and (5) maximum likelihood (ML). Special attention is given to the simplest case of Poisson regression with a random cluster-specific intercept, for which the asymptotic covariance matrix is obtained in closed form. We prove that methods 1-5, except GEE, produce the same estimates of slope coefficients for balanced data (an equal number of observations in each cluster and the same vectors of covariates). All five methods lead to consistent estimates of slopes but have different efficiency for unbalanced data designs. It is shown that the FI approach can be derived as a limiting case of maximum likelihood when the cluster variance increases to infinity. Exact asymptotic covariance matrices are derived for each method. In terms of asymptotic efficiency, the methods split into two groups: OP & GEE versus EGEE & FI & ML. Thus, contrary to existing practice, there is no advantage in using GEE, because it is substantially outperformed by EGEE and FI. In particular, EGEE does not require integration and is easy to compute, with asymptotic variances of the slope estimates close to those of ML.
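A minimal simulation sketch of three of the five estimators (OP, FI, and GEE with an exchangeable working correlation) using statsmodels; EGEE and the closed-form ML comparison are beyond a short example. The random-intercept Poisson data-generating process mirrors the simplest case highlighted above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_clusters, n_per = 50, 8
cluster = np.repeat(np.arange(n_clusters), n_per)
x = rng.normal(size=n_clusters * n_per)
u = rng.normal(0, 0.5, n_clusters)[cluster]      # cluster random intercept
y = rng.poisson(np.exp(0.3 + 0.5 * x + u))       # true slope = 0.5
df = pd.DataFrame({"y": y, "x": x, "cluster": cluster})

op = smf.glm("y ~ x", df, family=sm.families.Poisson()).fit()          # OP
fi = smf.glm("y ~ x + C(cluster)", df,
             family=sm.families.Poisson()).fit()                        # FI
gee = smf.gee("y ~ x", groups="cluster", data=df,
              family=sm.families.Poisson(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()             # GEE

print({"OP": op.params["x"], "FI": fi.params["x"], "GEE": gee.params["x"]})
```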

15.
Macro-integration is the process of combining data from several sources at an aggregate level. We review a Bayesian approach to macro-integration with special emphasis on the inclusion of inequality constraints. In particular, an approximate method of dealing with inequality constraints within the linear macro-integration framework is proposed. This method is based on a normal approximation to the truncated multivariate normal distribution. The framework is then applied to the integration of international trade statistics and transport statistics. By combining these data sources, transit flows can be derived as differences between specific transport and trade flows. Two methods of imposing the restriction that transit flows must be non-negative are compared. Moreover, the figures are improved by imposing the equality constraints that aggregates of incoming and outgoing transit flows must be equal.
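The adjustment step can be sketched as a constrained least-squares problem: reconcile the initial estimates subject to accounting identities and non-negativity of the derived transit flows. This uses a generic SLSQP solver rather than the paper's truncated-normal approximation, so it illustrates only the structure of the problem.

```python
import numpy as np
from scipy.optimize import minimize

def reconcile(x0, var, A, b, G=None):
    """Minimize the variance-weighted distance to initial estimates x0
    subject to equality constraints A x = b (accounting identities) and,
    optionally, inequality constraints G x >= 0 (e.g. transit flows
    defined as transport minus trade must be non-negative)."""
    cons = [{"type": "eq", "fun": lambda x: A @ x - b}]
    if G is not None:
        cons.append({"type": "ineq", "fun": lambda x: G @ x})
    obj = lambda x: np.sum((x - x0) ** 2 / var)   # GLS-type criterion
    return minimize(obj, x0, method="SLSQP", constraints=cons).x
```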

16.
Comparing occurrence rates of events of interest in science, business, and medicine is an important topic. Because count data are often under-reported, we wish to account for this error in the response when constructing interval estimators. In this article, we derive a Bayesian interval for the difference of two Poisson rates when counts are potentially under-reported. The under-reporting causes a lack of identifiability; we therefore use informative priors to construct a credible interval for the difference of the two Poisson rate parameters. We demonstrate the efficacy of our new interval estimates using a real data example, investigate the performance of the newly derived Bayesian approach via simulation, and examine the impact of various informative priors on the new interval.
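A minimal Gibbs-sampler sketch of this model structure, assuming true counts N_j ~ Poisson(lambda_j t_j) thinned binomially with reporting probability p_j, Gamma priors on the rates, and informative Beta priors on the p_j (which carry the identification). The paper's exact prior choices and computational approach may differ.

```python
import numpy as np

def gibbs_diff(y, t, a, b, alpha, beta, n_iter=20_000, seed=None):
    """95% credible interval for lambda_1 - lambda_2 from under-reported
    counts y over exposures t; a, b, alpha, beta: length-2 arrays of
    Gamma(a, b) and Beta(alpha, beta) prior parameters per group."""
    rng = np.random.default_rng(seed)
    lam = y / t + 1e-3
    p = np.full(2, 0.8)
    draws = np.empty(n_iter)
    for it in range(n_iter):
        # unreported counts: N - y | lam, p, y ~ Poisson(lam * t * (1 - p))
        N = y + rng.poisson(lam * t * (1 - p))
        # rates: lam | N ~ Gamma(a + N, b + t)   (numpy uses shape, scale)
        lam = rng.gamma(a + N, 1 / (b + t))
        # reporting probs: p | N, y ~ Beta(alpha + y, beta + N - y)
        p = rng.beta(alpha + y, beta + N - y)
        draws[it] = lam[0] - lam[1]
    burn = n_iter // 4                            # discard burn-in
    return np.percentile(draws[burn:], [2.5, 97.5])
```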

17.
Imputation: Methods, Simulation Experiments and Practical Examples
When conducting surveys, two kinds of nonresponse may cause incomplete data files: unit nonresponse (complete nonresponse) and item nonresponse (partial nonresponse). The selectivity of unit nonresponse is often corrected for, while various imputation techniques can be used for values missing because of item nonresponse. Several of these imputation techniques are discussed in this report, among them hot deck imputation. This paper describes two simulation experiments with the hot deck method. In the first study, data are randomly generated and various percentages of missing values are then non-randomly 'added' to the data. The hot deck method is used to reconstruct the data in this Monte Carlo experiment. The performance of the method is evaluated for means, standard deviations, and correlation coefficients, and compared with the available case method. In the second study, the quality of an imputation method is studied by running a simulation experiment in which a selection of the data from the Dutch Housing Demand Survey is perturbed by leaving out specific values on a variable. Again, hot deck imputations are used to reconstruct the data, and the imputations are then compared with the true values. In both experiments, the conclusion is that the hot deck method generally performs better than the available case method. This paper also deals with the questions of which variables should be imputed and how long the imputation process takes. Finally, the theory is illustrated by the imputation approaches of the Dutch Housing Demand Survey, the European Community Household Panel Survey (ECHP), and the new Dutch Structure of Earnings Survey (SES). These examples illustrate the levels of missing data that can be experienced in such surveys and the practical problems associated with choosing an appropriate imputation strategy for key items from each survey.
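A minimal sketch of class-based hot deck imputation of the kind evaluated here: each recipient with a missing value draws a donor at random from the observed cases in its imputation class. (A class with no donors would need a fallback, omitted for brevity.)

```python
import numpy as np

def hot_deck(y, classes, seed=None):
    """Impute NaNs in y by drawing, with replacement, from the observed
    values of y within the same imputation class."""
    rng = np.random.default_rng(seed)
    y = y.astype(float).copy()
    for c in np.unique(classes):
        in_c = classes == c
        donors = y[in_c & ~np.isnan(y)]          # observed cases in class c
        recipients = in_c & np.isnan(y)          # missing cases in class c
        y[recipients] = rng.choice(donors, recipients.sum(), replace=True)
    return y
```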

18.
Imputation procedures such as fully efficient fractional imputation (FEFI) or multiple imputation (MI) create multiple versions of the missing observations, thereby reflecting uncertainty about their true values. Multiple imputation generates a finite set of imputations through a posterior predictive distribution; fractional imputation assigns weights to the observed data. The focus of this article is the development of FEFI for partially classified two-way contingency tables. Point estimators and variances of FEFI estimators of population proportions are derived. Simulation results, when data are missing completely at random or missing at random, show that FEFI is comparable in performance to maximum likelihood estimation and multiple imputation, and superior to simple stochastic imputation and complete case analysis. The methods are illustrated with four data sets.
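A minimal sketch of the FEFI idea for a partially classified two-way table: estimate the cell probabilities by EM from the fully classified counts plus the supplemental one-way margins, then allocate each partially classified unit fractionally across its compatible cells. The weighting and variance details in the paper are more refined than this.

```python
import numpy as np

def fefi_two_way(full, row_only, col_only, n_iter=200):
    """full: (R, C) counts classified on both variables; row_only: length-R
    counts with the column variable missing; col_only: length-C counts with
    the row variable missing. Returns the fractionally completed table and
    the EM estimate of the cell probabilities."""
    p = np.full(full.shape, 1 / full.size)
    for _ in range(n_iter):
        # E-step: spread supplemental margins over compatible cells using
        # the current conditional probabilities (these are the fractional weights)
        exp_row = row_only[:, None] * p / p.sum(axis=1, keepdims=True)
        exp_col = col_only[None, :] * p / p.sum(axis=0, keepdims=True)
        # M-step: re-estimate cell probabilities from all (fractional) counts
        total = full + exp_row + exp_col
        p = total / total.sum()
    return full + exp_row + exp_col, p
```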

19.
We study the suitability of applying lasso-type penalized regression techniques to macroeconomic forecasting with high-dimensional datasets. We consider the performance of lasso-type methods when the true DGP is a factor model, contradicting the sparsity assumption that underlies penalized regression methods. We also investigate how the methods handle unit roots and cointegration in the data. In an extensive simulation study, we find that penalized regression methods are more robust to mis-specification than factor models, even if the underlying DGP possesses a factor structure. Furthermore, the penalized regression methods deliver forecast improvements over traditional approaches when applied to non-stationary data containing cointegrated variables, despite a deterioration in their selective capabilities. Finally, we consider an empirical application to a large macroeconomic U.S. dataset and demonstrate the competitive performance of penalized regression methods.
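A minimal sketch of a direct h-step lasso forecast from a large panel of predictors, with the penalty tuned by time-ordered cross-validation; the lag structure and names are illustrative, not the paper's design.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def lasso_forecast(Y, target, h=1, lags=4):
    """Direct h-step forecast of column `target` from a (T, K) array Y of
    predictors: regress y_{t+h} on the last `lags` values of every series,
    holding out the final observation as the evaluation point."""
    T, _ = Y.shape
    rows = range(lags - 1, T - h)
    X = np.array([Y[t - lags + 1:t + 1].ravel() for t in rows])
    y = np.array([Y[t + h, target] for t in rows])
    model = make_pipeline(StandardScaler(),
                          LassoCV(cv=TimeSeriesSplit(5)))  # time-ordered CV
    model.fit(X[:-1], y[:-1])
    return model.predict(X[-1:])[0], y[-1]    # (forecast, realized value)
```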

20.
The Wooldridge method is based on a simple and novel strategy for dealing with the initial values problem in nonlinear dynamic random-effects panel data models. This characteristic makes the method very attractive in empirical applications. However, its finite sample performance and robustness are not yet fully known. In this paper, we investigate the performance and robustness of this method in comparison with an ideal case in which the initial values are known constants, with a worst-case scenario based on an exogenous initial values assumption, and with Heckman's reduced-form approximation method, which is widely used in the literature. The dynamic random-effects probit and Tobit (type I) models are used as working examples. Various designs of the Monte Carlo experiments and two further empirical illustrations are provided. The results suggest that the Wooldridge method works very well only for panels of moderately long duration (longer than 5-8 periods); Heckman's reduced-form approximation is suggested for short panels (shorter than 5 periods). It is also found that all the methods tend to perform equally well for panels of long duration (longer than 15-20 periods).
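A minimal sketch of the Wooldridge device for the dynamic random-effects probit case: the individual effect is modeled conditional on the initial observation y_i0 (and, in general, on covariate histories, omitted here), and is then integrated out with Gauss-Hermite quadrature. This illustrates the estimator's structure under stated simplifications, not the paper's Monte Carlo design.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(theta, y, y0, x, nodes, weights):
    """theta = (beta, rho, a0, a1, log_sigma); y: (n, T) binary outcomes,
    y0: (n,) initial values, x: (n, T) exogenous covariate. The random
    effect is N(a0 + a1*y0, sigma^2), i.e. conditioned on y_i0."""
    beta, rho, a0, a1, log_sigma = theta
    sigma = np.exp(log_sigma)                        # keep sigma positive
    ylag = np.column_stack([y0, y[:, :-1]])          # lagged outcome
    lik = np.zeros(len(y))
    for node, wgt in zip(nodes, weights):            # Gauss-Hermite sum
        u = np.sqrt(2.0) * sigma * node              # change of variables
        index = beta * x + rho * ylag + a0 + a1 * y0[:, None] + u
        probs = norm.cdf(np.where(y == 1, index, -index))
        lik += wgt / np.sqrt(np.pi) * probs.prod(axis=1)
    return -np.sum(np.log(lik + 1e-300))

nodes, weights = np.polynomial.hermite.hermgauss(12)
# fit on a simulated panel (y, y0, x):
# res = minimize(neg_loglik, np.zeros(5), args=(y, y0, x, nodes, weights))
```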
