Similar Literature
20 similar documents found (search time: 31 ms)
1.
Empirical count data are often zero‐inflated and overdispersed. Currently, there is no software package that allows adequate imputation of these data. We present multiple‐imputation routines for these kinds of count data based on a Bayesian regression approach or alternatively based on a bootstrap approach that work as add‐ons for the popular multiple imputation by chained equations (mice) software in R (van Buuren and Groothuis‐Oudshoorn, Journal of Statistical Software, vol. 45, 2011, p. 1). We demonstrate in a Monte Carlo simulation that our procedures are superior to currently available count data procedures. It is emphasized that thorough modeling is essential to obtain plausible imputations and that model mis‐specifications can bias parameter estimates and standard errors quite noticeably. Finally, the strengths and limitations of our procedures are discussed, and fruitful avenues for future theory and software development are outlined.
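
As a worked illustration of the bootstrap variant of the idea described above (a minimal sketch, not the mice add-on itself; the toy data, the fixed dispersion parameter, and m = 5 imputations are assumptions, and the zero-inflation component is omitted for brevity):

```python
# Sketch: bootstrap-based multiple imputation for overdispersed count data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def impute_counts_bootstrap(y, X, alpha=1.0):
    """Impute missing entries of the count vector y by drawing from a negative
    binomial regression fitted to a bootstrap resample of the observed cases."""
    obs = ~np.isnan(y)
    idx = np.flatnonzero(obs)
    boot = rng.choice(idx, size=idx.size, replace=True)          # bootstrap step
    fit = sm.GLM(y[boot], X[boot],
                 family=sm.families.NegativeBinomial(alpha=alpha)).fit()
    mu = fit.predict(X[~obs])                                     # predicted means
    # Draw from NB2(mu, alpha): size n = 1/alpha, success prob p = n/(n+mu).
    n = 1.0 / alpha
    p = n / (n + mu)
    y_imp = y.copy()
    y_imp[~obs] = rng.negative_binomial(n, p)
    return y_imp

# Toy data: intercept plus one covariate, with roughly 30% of counts missing.
N = 500
x = rng.normal(size=N)
X = sm.add_constant(x)
y = rng.negative_binomial(1.0, 1.0 / (1.0 + np.exp(0.3 + 0.5 * x))).astype(float)
y[rng.random(N) < 0.3] = np.nan
completed = [impute_counts_bootstrap(y, X) for _ in range(5)]     # m = 5 imputations
```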

2.
Data fusion or statistical matching techniques merge datasets from different survey samples to achieve a complete but artificial data file which contains all variables of interest. The merging of datasets is usually done on the basis of variables common to all files, but traditional methods implicitly assume conditional independence between the variables never jointly observed given the common variables. Therefore, we suggest tackling the data fusion task with more flexible, model-based procedures. Suitable multiple imputation techniques reflect the identification problem that is inherent in statistical matching. Here a non-iterative Bayesian version of Rubin's implicit regression model is presented and compared in a simulation study with imputations from a data augmentation algorithm as well as an iterative approach using chained equations.
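
For reference, the conditional independence assumption implicit in traditional matching methods can be written as follows, with X and Y the variables never jointly observed and Z the common variables (notation assumed here):

```latex
f(x, y \mid z) = f(x \mid z)\, f(y \mid z)
\quad\Longrightarrow\quad
f(x, y, z) = f(x \mid z)\, f(y \mid z)\, f(z).
```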

3.
Item nonresponse in survey data can pose significant problems for social scientists carrying out statistical modeling using a large number of explanatory variables. A number of imputation methods exist but many only deal with univariate imputation, or relatively simple cases of multivariate imputation, often assuming a monotone pattern of missingness. In this paper we evaluate a tree-based approach for multivariate imputation using real data from the 1970 British Cohort Study, known for its complex pattern of nonresponse. The performance of this tree-based approach is compared to mode imputation and a sequential regression-based approach within a simulation study.
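
A minimal sketch of the general idea, using scikit-learn's IterativeImputer with an extra-trees base learner as an illustrative stand-in for the tree-based imputation evaluated in the paper (toy data and parameter values are assumptions):

```python
# Sketch: tree-based multivariate imputation for a non-monotone missingness pattern.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[:, 1] += 0.8 * X[:, 0]                       # induce correlation the trees can exploit
X[rng.random(X.shape) < 0.2] = np.nan          # ~20% of values missing

imputer = IterativeImputer(
    estimator=ExtraTreesRegressor(n_estimators=50, random_state=0),
    max_iter=10, random_state=0)
X_completed = imputer.fit_transform(X)         # handles an arbitrary missingness pattern
```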

4.
Measuring technical efficiency in European railways: A panel data approach
We estimate a factor requirement frontier for European railways using a panel data approach in which technical efficiency is assumed to be endogenously determined. This approach has two main outcomes. On the one hand, it allows the identification of factors influencing technical efficiency; on the other hand, it allows the estimation of alternative efficiency indicators free of these influences. In the case under study, particular attention is devoted to an autonomy indicator representing the managerial freedom that firms enjoy with respect to public authorities, which appears to be positively correlated with technical efficiency.

5.
Despite their long history, parametric survival-time models have largely been neglected in the modern biostatistical and medical literature in favour of the Cox proportional hazards model. Here, I present a case for the use of the lognormal distribution in the analysis of survival times of breast and ovarian cancer patients, specifically in modelling the effects of prognostic factors. The lognormal provides a completely specified probability distribution for the observations and a sensible estimate of the variation explained by the model, a quantity that is controversial for the Cox model. I show how imputation of censored observations under the model may be used to inspect the data using familiar graphical and other techniques. Results from the Cox and lognormal models are compared and appear to differ to some extent. However, it is hard to judge which model gives the more accurate estimates. It is concluded that, provided the lognormal model fits the data adequately, it may be a useful approach to the analysis of censored survival data.
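
A minimal sketch of fitting a lognormal survival model to right-censored times by maximum likelihood, illustrating the fully specified distribution the abstract refers to (simulated data; prognostic covariates are omitted for brevity):

```python
# Sketch: censored lognormal MLE with uncensored density and censored survival terms.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
t_true = rng.lognormal(mean=1.0, sigma=0.8, size=300)    # true survival times
c = rng.uniform(0, 8, size=300)                          # censoring times
time = np.minimum(t_true, c)
event = (t_true <= c).astype(float)                      # 1 = observed, 0 = censored

def neg_loglik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    z = (np.log(time) - mu) / sigma
    ll_event = -np.log(time) - np.log(sigma) + stats.norm.logpdf(z)   # log density
    ll_cens = stats.norm.logsf(z)                                     # log survival
    return -np.sum(event * ll_event + (1 - event) * ll_cens)

fit = optimize.minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
```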

6.
We employ a Bayesian approach to analyze experimental data from financial markets. We estimate a structural model of sequential trading in which trading decisions are classified into five types: private-information-based, noise, herd, contrarian and irresolute. Through Monte Carlo simulation, we estimate the posterior distributions of the structural parameters. This technique allows us to compare several non-nested models of trade arrival. We find that the model best fitting the data is that in which a proportion of trades stems from subjects who do not rely only on their private information once the difference between the number of previous buy and sell decisions is at least two. In this model, the majority of trades stem from subjects following their private information. There is also a large proportion of noise trading activity, which is biased towards buying the asset. We observe little herding and contrarianism, as theory suggests. Finally, we observe a significant proportion of (irresolute) subjects who follow their own private information when it agrees with public information, but abstain from trading when it does not.

7.
A common problem in survey sampling is to compare two cross‐sectional estimates for the same study variable taken from two different waves or occasions. These cross‐sectional estimates often include imputed values to compensate for item non‐response. The estimation of the sampling variance of the estimator of change is useful to judge whether the observed change is statistically significant. Estimating the variance of a change is not straightforward because of the rotation in repeated surveys and imputation. We propose using a multivariate linear regression approach and show how it can be used to accommodate the effect of rotation and imputation. The regression approach gives a design‐consistent estimation of the variance of change when the sampling fraction is small. We illustrate the proposed approach using random hot‐deck imputation, although the proposed estimator can be implemented with other imputation techniques.
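
In its simplest form, the estimator of change between waves and the variance the paper targets can be written as (standard result; notation assumed):

```latex
\widehat{\Delta} = \hat{\theta}_2 - \hat{\theta}_1,
\qquad
\operatorname{Var}(\widehat{\Delta})
  = \operatorname{Var}(\hat{\theta}_1) + \operatorname{Var}(\hat{\theta}_2)
  - 2\,\operatorname{Cov}(\hat{\theta}_1, \hat{\theta}_2),
```

where the covariance term is the part complicated by sample rotation and by the imputation carried out on each wave, and which the multivariate regression approach is designed to capture.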

8.
In many surveys, imputation procedures are used to account for non‐response bias induced by either unit non‐response or item non‐response. Such procedures are optimised (in terms of reducing non‐response bias) when the models include covariates that are highly predictive of both response and outcome variables. To achieve this, we propose a method for selecting sets of covariates used in regression imputation models or to determine imputation cells for one or more outcome variables, using the fraction of missing information (FMI) as obtained via a proxy pattern‐mixture (PPM) model as the key metric. In our variable selection approach, we use the PPM model to obtain a maximum likelihood estimate of the FMI for separate sets of candidate imputation models and look for the point at which changes in the FMI level off and further auxiliary variables do not improve the imputation model. We illustrate our proposed approach using empirical data from the Ohio Medicaid Assessment Survey and from the Service Annual Survey.
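
For orientation, the conventional multiple-imputation estimate of the FMI for a scalar estimand has the form below (Rubin's-rules version with m imputations, ignoring the degrees-of-freedom correction; the paper instead obtains a maximum likelihood estimate of the FMI from the proxy pattern-mixture model):

```latex
T = \bar{W} + \Bigl(1 + \tfrac{1}{m}\Bigr) B,
\qquad
\widehat{\mathrm{FMI}} = \frac{\bigl(1 + \tfrac{1}{m}\bigr) B}{T},
```

with $\bar{W}$ the average within-imputation variance and $B$ the between-imputation variance.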

9.
Multi-tenant architectures (MTAs) are considered a cornerstone in the success of Software as a Service as a new application distribution formula. Multi-tenancy allows multiple customers (i.e. tenants) to be consolidated into the same operational system. This way, tenants run and share the same application instance as well as costs, which are significantly reduced. Functional needs vary from one tenant to another: either companies from different sectors run different types of applications or, even when deploying the same functionality, they differ in its complexity. In any case, MTA raises one major concern regarding companies’ data, privacy and security, which requires special attention to the data layer. In this article, we propose an extended data model that enhances traditional MTAs in respect of this concern. This extension – called multi-target – allows MT applications to host, manage and serve multiple functionalities within the same multi-tenant (MT) environment. The practical deployment of this approach will allow SaaS vendors to target multiple markets or address different levels of functional complexity and yet commercialise just a single MT application. The applicability of the approach is demonstrated via a case study of a real multi-tenancy multi-target (MT2) implementation, called Globalgest.

10.
In many of the social sciences it is useful to explore the “working models” or mental schemata that people use to organise items from some cognitive or perceptual domain. With an increasing number of items, versions of the Method of Sorting become important techniques for collecting data about inter-item similarities. Because people do not necessarily all bring the same mental model to the items, there is also the prospect that sorting data can identify a range of mental models within the population of interest, or even distinct subgroups. Anthropology provides one tool for this purpose in the form of Cultural Consensus Analysis (CCA). CCA itself proves to be a special case of the “Points of View” approach. Here factor analysis is applied to the subjects’ method-of-sorting responses, obtaining idealized or prototypal modes of organising the items—the “viewpoints”. These idealised modes account for each subject’s data by combining them in proportions given by the subject’s factor loadings. The separate organisation represented by each viewpoint can be made explicit with clustering or multidimensional scaling. The technique is illustrated with job-sorting data from occupational research, and social-network data from primate behaviour.
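
A minimal sketch of the “Points of View” idea: turn each subject's sorting into a co-assignment vector over item pairs, factor the subjects-by-pairs matrix, and read off idealised viewpoints. PCA is used here as a simple stand-in for a formal factor model, and the random toy sortings are assumptions:

```python
# Sketch: extracting prototypal "viewpoints" from method-of-sorting data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
n_subjects, n_items = 40, 12

# Each subject sorts the 12 items into piles (random toy sortings here).
sortings = rng.integers(0, 4, size=(n_subjects, n_items))

# Co-assignment: 1 if a subject put items i and j in the same pile.
pairs = [(i, j) for i in range(n_items) for j in range(i + 1, n_items)]
co = np.array([[int(s[i] == s[j]) for (i, j) in pairs] for s in sortings])

# Factor the subjects-by-pairs matrix; loadings show how strongly each subject
# shares each prototypal way of organising the items.
pca = PCA(n_components=2)
loadings = pca.fit_transform(co)       # subjects' weights on the viewpoints
viewpoints = pca.components_           # idealised inter-item similarity patterns
```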

11.
Since the work of Little and Rubin (1987), no substantial advances in the analysis of explanatory regression models for incomplete data with data missing not at random have been achieved, mainly due to the difficulty of verifying the randomness of the unknown data. In practice, the analysis of nonrandom missing data is done with techniques designed for datasets with random or completely random missing data, such as complete case analysis, mean imputation, regression imputation, maximum likelihood or multiple imputation. However, the data conditions required to minimize the bias derived from an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations have been carried out to establish the best strategy of analysis for random missing data applicable in datasets with nonrandom missing data. The factors involved in the simulations are sample size, percentage of missing data, predictive power of the imputation model and existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation techniques, although with low percentages of missing data, absence of interaction and high predictive power of the imputation model (frequent data structures in research on child and adolescent psychopathology) acceptable results are obtained with the simplest regression imputation.

12.
Incomplete data are a common problem in survey research. Recent work on multiple imputation techniques has increased analysts’ awareness of the biasing effects of missing data and has also provided a convenient solution. Imputation methods replace non-response with estimates of the unobserved scores. In many instances, however, non-response to a stimulus does not result from measurement problems that inhibit accurate surveying of empirical reality, but from the inapplicability of the survey question. In such cases, existing imputation techniques replace valid non-response with counterfactual estimates of a situation in which the stimulus is applicable to all respondents. This paper suggests an alternative imputation procedure for incomplete data for which no true score exists: multiple complete random imputation, which overcomes the biasing effects of missing data and allows analysts to model respondents’ valid ‘I don’t know’ answers.

13.
Retailers supply a wide range of stock keeping units (SKUs), which may differ, for example, in terms of demand quantity, demand frequency, demand regularity, and demand variation. Given this diversity in demand patterns, it is unlikely that any single model for demand forecasting can yield the highest forecasting accuracy across all SKUs. To save costs through improved forecasting, there is thus a need to match any given demand pattern to its most appropriate prediction model. To this end, we propose an automated model selection framework for retail demand forecasting. Specifically, we consider model selection as a classification problem, where classes correspond to the different models available for forecasting. We first build labeled training data based on the models’ performances in previous demand periods with similar demand characteristics. For future data, we then automatically select the most promising model via classification based on the labeled training data. The performance is measured by economic profitability, taking into account asymmetric shortage and inventory costs. In an exploratory case study using data from an e-grocery retailer, we compare our approach to established benchmarks. We find promising results, but also that no single approach clearly outperforms its competitors, underlining the need for case-specific solutions.
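
A minimal sketch of the classification framing described above: label historical SKUs with the index of the best-performing forecasting model and train a classifier on demand-pattern features to pick a model for new SKUs. The feature set, the candidate models and the random-forest classifier are illustrative assumptions, not the paper's exact design:

```python
# Sketch: model selection for demand forecasting posed as a classification problem.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)

# Demand-pattern features per SKU, e.g. mean demand, CV, fraction of zero weeks.
X_train = rng.random((300, 3))
# Label = index of the most profitable model on that SKU in past periods
# (0 = moving average, 1 = exponential smoothing, 2 = Croston-type model).
y_train = rng.integers(0, 3, size=300)

selector = RandomForestClassifier(n_estimators=200, random_state=0)
selector.fit(X_train, y_train)

X_new = rng.random((10, 3))
chosen_model = selector.predict(X_new)   # forecasting model to use for each new SKU
```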

14.
A set optimization approach to multi-utility maximization is presented, and duality results are obtained for discrete market models with proportional transaction costs. The novel approach allows us to obtain results for non-complete preferences, where the formulas derived closely resemble, but generalize, those of the scalar case.

15.
This study combines the output distance function approach with a latent class model to estimate technical efficiency in English football in the presence of productive heterogeneity within a stochastic frontier analysis framework. The distance function approach allows the researcher to estimate technical efficiency including both on-field and off-field production, which is important in the case of English football where clubs are generally thought to maximize something other than profit. On-field production is measured using total league points, and off-field production is measured using total revenue. The data set consists of 2177 club-level observations on 88 clubs that competed in the four divisions of professional football in England over the 29-season period from 1981/82 to 2009/10. The results show evidence of three separate productivity classes in English football. As might be expected, technical efficiency estimated using the latent class model is, on average, higher than technical efficiency using an alternative method which confines heterogeneity to the intercept coefficient. Specifically, average efficiency for the sample is 87.3% and 93.2% for the random-intercept model and the latent class model, respectively.

16.
This paper proposes SupWald tests from a threshold autoregressive model computed with an adaptive set of thresholds. Simple examples of adaptive threshold sets are given. A second contribution of the paper is a general asymptotic null limit theory when the threshold variable is a level variable. We obtain a pivotal null limiting distribution under some simple conditions for bounded or asymptotically unbounded thresholds. Our general approach is flexible enough to allow a choice of the auxiliary threshold model or of the threshold set involved in the test specifically designed for nonlinear stationary alternatives relevant to macroeconomic and financial topics involving arbitrage in the presence of transaction costs. A Monte Carlo study and an application to the interest rate spread for French, German, New Zealand and US post-1980 monthly data illustrate the ability of the adaptive SupWald tests to reject a unit root when the ADF test does not.
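
A minimal sketch of a SupWald statistic for a two-regime threshold model, taking the supremum of the Wald statistic for the regime-specific coefficients over a fixed, trimmed grid of thresholds. The simulated data and the simple quantile grid are assumptions; the paper's adaptive threshold sets and asymptotic critical values are not reproduced here:

```python
# Sketch: SupWald statistic over a grid of candidate thresholds.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T = 400
q = rng.normal(size=T)                               # threshold (level) variable
x = sm.add_constant(rng.normal(size=T))              # regressors under the null
y = x @ np.array([0.5, 1.0]) + rng.normal(size=T)    # linear null model

# Trimmed grid of candidate thresholds (simple stand-in for an adaptive set).
grid = np.quantile(q, np.linspace(0.15, 0.85, 30))

wald = []
for gamma in grid:
    regime = (q <= gamma)[:, None] * x               # regime-specific regressors
    res = sm.OLS(y, np.column_stack([x, regime])).fit()
    delta = res.params[2:]                           # threshold-regime coefficients
    V = res.cov_params()[2:, 2:]
    wald.append(float(delta @ np.linalg.solve(V, delta)))

sup_wald = max(wald)                                  # SupWald test statistic
```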

17.
With the rapid rise of cryptocurrencies, it has become an urgent problem to bring digital currency into genuine everyday use and to give full play to its utility in the current economic market. This paper innovatively takes the maximization of user benefit as the key point and combines dynamic game theory to predict the transaction bidding price. A user's bidding price not only refers to historical transactions but also considers the impact on future subsequences, and the result describes the interaction between transactions in detail. This paper also proposes a method to express user satisfaction and establishes a user benefit model accordingly, so as to ensure that the transaction is packaged successfully to the greatest extent within the acceptable range of transaction pricing. Finally, this paper compares the proposed model with conventional machine learning prediction algorithms and finds that, when a user does not participate in trading for the first time, the proposal predicts better than machine learning over small data sets; it is moreover superior to machine learning methods in prediction accuracy and sensitivity, with lower time complexity.

18.
Imputation: Methods, Simulation Experiments and Practical Examples
When conducting surveys, two kinds of nonresponse may cause incomplete data files: unit nonresponse (complete nonresponse) and item nonresponse (partial nonresponse). The selectivity of unit nonresponse is often corrected for. Various imputation techniques can be used for the missing values caused by item nonresponse. Several of these imputation techniques are discussed in this report. One is hot deck imputation. This paper describes two simulation experiments of the hot deck method. In the first study, data are randomly generated, and various percentages of missing values are then non-randomly 'added' to the data. The hot deck method is used to reconstruct the data in this Monte Carlo experiment. The performance of the method is evaluated for the means, standard deviations, and correlation coefficients and compared with the available case method. In the second study, the quality of an imputation method is studied by running a simulation experiment. A selection of the data of the Dutch Housing Demand Survey is perturbed by leaving out specific values on a variable. Again hot deck imputations are used to reconstruct the data. The imputations are then compared with the true values. In both experiments the conclusion is that the hot deck method generally performs better than the available case method. This paper also deals with the questions of which variables should be imputed and how long the imputation process takes. Finally, the theory is illustrated by the imputation approaches of the Dutch Housing Demand Survey, the European Community Household Panel Survey (ECHP) and the new Dutch Structure of Earnings Survey (SES). These examples illustrate the levels of missing data that can be experienced in such surveys and the practical problems associated with choosing an appropriate imputation strategy for key items from each survey.
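
A minimal sketch of random hot-deck imputation within adjustment classes, as discussed above: each missing value is replaced by the value of a randomly drawn donor from the same class. The class variable, the rent variable and the missingness rate are illustrative assumptions, not the surveys' actual variables:

```python
# Sketch: random hot-deck imputation within classes defined by a grouping variable.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "region": rng.choice(["north", "south"], size=200),
    "rent":   rng.normal(700, 150, size=200),
})
df.loc[rng.random(200) < 0.25, "rent"] = np.nan    # 25% item nonresponse

def hot_deck(group):
    """Replace missing values with draws from observed donors in the same class."""
    donors = group.dropna().to_numpy()
    out = group.copy()
    miss = out.isna()
    out[miss] = rng.choice(donors, size=miss.sum(), replace=True)
    return out

df["rent_imputed"] = df.groupby("region")["rent"].transform(hot_deck)
```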

19.
This study discusses the validation of an agent-based model of emergent city systems with heterogeneous agents. To this end, it proposes a simplified version of the original agent-based model and subjects it to mathematical analysis. The proposed model is transformed into an analytically tractable discrete Markov model, and its city size distribution is examined. Its discrete nature allows the Markov model to be used to validate the algorithms of computational agent-based models. We show that the Markov chains lead to a power-law distribution when the ranges of migration options are randomly distributed across the agent population. We also identify sufficient conditions under which the Markov chains produce Zipf's Law, which has never been done within a discrete framework. The conditions under which our simplified model yields Zipf's Law are in agreement with, and thus validate, the configurations of the original heterogeneous agent-based model.
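
For reference, the rank-size formulation of Zipf's law that these sufficient conditions deliver is the standard one, a power-law tail with exponent one:

```latex
\Pr(S > s) \propto s^{-1}
\qquad\Longleftrightarrow\qquad
S_{(r)} \approx \frac{S_{(1)}}{r},
```

where $S$ is city size and $S_{(r)}$ the size of the $r$-th largest city.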

20.
The paper addresses the reduction of the total cost of purchasing in public procurement, focusing on tenders called for in the European Union and awarded by the Lowest Price (LP) criterion. Taking into account the main characteristic features of governmental purchasing (competition, prescribed procedures, and transparency) and building upon the related contributions in the literature, we present a probabilistic approach for evaluating and limiting the total cost of purchasing in public tenders awarded according to the LP criterion. The presented framework includes the evaluation of the so-called additional costs of purchasing (ACP), a part of the transaction cost that is typically considered in the related literature from a private organization perspective only. The approach can be applied to a generic transaction in any public tender issued according to the European legislation with the LP criterion. Considering the real case study of the public tender for maintenance works on a municipal sport facility in Bari (Italy), we take into account the costs of both transaction counterparts, i.e., the ACP regarding the contracting authority and those related to the firms involved in the tender. Applying the model to the case study, we underline the relevance of ACP for public tenders and show that, by inviting a suitable number of bidders to participate in the call, it is possible to save money both for the contracting authority and the involved competitors.
