Similar Literature
20 similar documents retrieved.
1.
孙成霖, 《价值工程》 (Value Engineering), 2010, 29(6): 39.
Hypothesis testing is one of the components of statistical inference, and statistical inference also occupies an important position in sports statistics. Two types of errors can occur in hypothesis testing. In practice, attention is often paid only to controlling the Type I error, while the Type II error is frequently ignored. In fact, controlling the Type II error is equally necessary. This paper discusses the causes of the two types of errors and how to control the Type II error, and offers some approaches for its control.
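For readers unfamiliar with the trade-off discussed above, a minimal sketch follows (my own illustration, not taken from the paper): it computes the Type II error of a one-sided z-test while the Type I error is held fixed at α, showing how the Type II error shrinks as the sample size grows. All parameter values are made up.

```python
# Minimal sketch (not from the paper): the Type I error is fixed at alpha, and the
# Type II error of a one-sided z-test of H0: mu = mu0 vs H1: mu = mu1 > mu0 is
# computed for a known sigma and several sample sizes.
from scipy.stats import norm

def type_ii_error(mu0, mu1, sigma, n, alpha=0.05):
    """Probability of failing to reject H0 when the true mean is mu1."""
    se = sigma / n ** 0.5
    critical = mu0 + norm.ppf(1 - alpha) * se   # rejection threshold for the sample mean
    return norm.cdf((critical - mu1) / se)      # P(sample mean below threshold | mu = mu1)

# Increasing n reduces the Type II error while alpha stays at 0.05.
for n in (10, 30, 100):
    print(n, round(type_ii_error(mu0=50, mu1=52, sigma=5, n=n), 3))
```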

2.
We randomly assigned eight different consumption surveys to obtain evidence on the nature of measurement errors in estimates of household consumption. Regressions using data from more error-prone designs are compared with results from a 'gold standard' survey. Measurement errors appear to have a mean-reverting negative correlation with true consumption, especially for food and especially for rural households.

3.
We propose a simple estimator for nonlinear method of moment models with measurement error of the classical type when no additional data, such as validation data or double measurements, are available. We assume that the marginal distributions of the measurement errors are Laplace (double exponential) with zero means and unknown variances, and that the measurement errors are independent of the latent variables and of each other. Under these assumptions, we derive simple revised moment conditions in terms of the observed variables. They are used to make inference about the model parameters and the variance of the measurement error. The results of this paper show that the distributional assumption on the measurement errors can be used to point identify the parameters of interest. Our estimator is a parametric method of moments estimator that uses the revised moment conditions and hence is simple to compute. Our estimation method is particularly useful in situations where no additional data are available, which is the case in many economic data sets. A simulation study demonstrates good finite sample properties of our proposed estimator. We also examine the performance of the estimator in the case where the error distribution is misspecified.

4.
There has been considerable and controversial research over the past two decades into how successfully random effects misspecification in mixed models (i.e. assuming normality for the random effects when the true distribution is non-normal) can be diagnosed and what its impacts are on estimation and inference. However, much of this research has focused on fixed effects inference in generalised linear mixed models. In this article, motivated by the increasing number of applications of mixed models where interest is on the variance components, we study the effects of random effects misspecification on random effects inference in linear mixed models, for which there is considerably less literature. Our findings are surprising and contrary to general belief: for point estimation, maximum likelihood estimation of the variance components under misspecification is consistent, although in finite samples both the bias and the mean squared error can be substantial. For inference, we show through theory and simulation that under misspecification, standard likelihood ratio tests of truly non-zero variance components can suffer from severely inflated type I errors, and confidence intervals for the variance components can exhibit considerable undercoverage. Furthermore, neither of these problems vanishes asymptotically as the number of clusters or the cluster size increases. These results have major implications for random effects inference, especially if the true random effects distribution is heavier tailed than the normal. Fortunately, simple graphical and goodness-of-fit measures of the random effects predictions appear to have reasonable power at detecting misspecification. We apply linear mixed models to a survey of more than 4,000 high school students within 100 schools and analyse how mathematics achievement scores vary with student attributes and across different schools. The application demonstrates the sensitivity of mixed model inference to the true but unknown random effects distribution.

5.
Computerised Record Linkage methods help us combine multiple data sets from different sources when a single data set with all necessary information is unavailable or when data collection on additional variables is time consuming and extremely costly. Linkage errors are inevitable in the linked data set because of the unavailability of error-free unique identifiers. A small amount of linkage errors can lead to substantial bias and increased variability in estimating parameters of a statistical model. In this paper, we propose a unified theory for statistical analysis with linked data. Our proposed method, unlike the ones available for secondary data analysis of linked data, exploits record linkage process data as an alternative to taking a costly sample to evaluate error rates from the record linkage procedure. A jackknife method is introduced to estimate bias, covariance matrix and mean squared error of our proposed estimators. Simulation results are presented to evaluate the performance of the proposed estimators that account for linkage errors.

6.
Record linkage is the act of bringing together records from two files that are believed to belong to the same unit (e.g., a person or business). It is a low-cost way of increasing the set of variables available for analysis. Errors may arise in the linking process if an error-free unit identifier is not available. Two types of linking errors include an incorrect link (records belonging to two different units are linked) and a missed record (an unlinked record for which a correct link exists). Naively ignoring linkage errors may mean that analysis of the linked file is biased. This paper outlines a "weighting approach" to making correct inference about regression coefficients and population totals in the presence of such linkage errors. This approach is designed for analysts who do not have the expertise or time to use specialist software required by other approaches but who are comfortable using weights in inference. The performance of the estimator is demonstrated in a simulation study.

7.
This paper introduces a new representation for seasonally cointegrated variables, namely the complex error correction model, which allows statistical inference to be performed by reduced rank regression. The suggested estimators and test statistics are asymptotically equivalent to their maximum likelihood counterparts. The small sample properties are evaluated by a Monte Carlo study, and an empirical example is presented to illustrate the concepts and methods.

8.
This paper is concerned with statistical inference on seemingly unrelated varying coefficient partially linear models. By combining the local polynomial and profile least squares techniques, and estimating the contemporaneous correlation, we propose a class of weighted profile least squares estimators (WPLSEs) for the parametric components. It is shown that the WPLSEs achieve the semiparametric efficiency bound and are asymptotically normal. For the non-parametric components, by applying the undersmoothing technique, and taking the contemporaneous correlation into account, we propose an efficient local polynomial estimation. The resulting estimators are shown to have mean-squared errors smaller than those of estimators that neglect the contemporaneous correlation. In addition, a class of variable selection procedures is developed for simultaneously selecting significant variables and estimating unknown parameters, based on the non-concave penalized and weighted profile least squares techniques. With a proper choice of regularization parameters and penalty functions, the proposed variable selection procedures perform as efficiently as if one knew the true submodels. The proposed methods are evaluated using extensive simulation studies and applied to a set of real data.

9.
From a data-processing standpoint, record statistics provide an efficient way of identifying the minimum value among a set of measurements. From a sequence of n independent identically distributed continuous random variables only about log(n) records are expected, so we expect to have little data; hence any prior information is welcome (Houchens, Record value theory and inference, Ph.D. thesis, University of California, Riverside, 1984). In this paper, non-Bayesian and Bayesian estimates are derived for the two parameters of the Exponential distribution based on record statistics with respect to the squared error and Linear-Exponential loss functions, and the resulting estimators are compared with one another. The admissibility of some of the estimators is discussed.
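As a small illustration of how few records an iid sequence produces (my own sketch, not the paper's estimators), the following extracts the lower record values from a simulated Exponential sample and compares their count with log(n).

```python
# Sketch (my illustration): lower record values of an iid Exponential sample.
# On average, an iid continuous sequence of length n yields about log(n) records.
import numpy as np

rng = np.random.default_rng(0)

def lower_records(x):
    """Return the lower record values of the sequence x (each new running minimum)."""
    records, current_min = [], np.inf
    for value in x:
        if value < current_min:
            records.append(value)
            current_min = value
    return records

n = 10_000
sample = rng.exponential(scale=2.0, size=n)
recs = lower_records(sample)
print(len(recs), round(np.log(n), 1))   # record count is close to log(n) on average
```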

10.
It is shown that statistical tests of the effectiveness of alternative monetary fiscal policies may be inconclusive because the policies themselves may influence the observed time-series in such a way as to cause regression coefficients to differ from their true values. It is established that simple rules such as a constant rate of monetary growth have no "corrupting" effect on the data, but this is not true for sophisticated policies designed to go beyond naive model relationships.

11.
In forecasting, data mining is frequently perceived as a distinct technological discipline without immediate relevance to the challenges of time series prediction. However, Hand (2009) postulates that when the large cross-sectional datasets of data mining and the high-frequency time series of forecasting converge, common problems and opportunities are created for the two disciplines. This commentary attempts to establish the relationship between data mining and forecasting via the dataset properties of aggregate and disaggregate modelling, in order to identify areas where research in data mining may contribute to current forecasting challenges, and vice versa. To forecasting, data mining offers insights on how to handle large, sparse datasets with many binary variables, in feature and instance selection. Furthermore, data mining and related disciplines may stimulate research into how to overcome selectivity bias using reject inference on observational datasets, how to extend the utility and costs of errors beyond measuring performance through the use of experimental time series data, and how to find suitable time series benchmarks to evaluate computer-intensive algorithms. Equally, data mining can profit from forecasting's expertise in handling nonstationary data to counter the out-of-date-data problem, and from its emphasis on developing empirical evidence beyond the fine tuning of algorithms, leading to a number of potential synergies and stimulating research in both data mining and forecasting.

12.
Recent interest in statistical inference for panel data has focused on the problem of unobservable, individual-specific, random effects and the inconsistencies they introduce in estimation when they are correlated with other exogenous variables. Analysis of this problem has always assumed the variance components to be known. In this paper, we re-examine some of these questions in finite samples when the variance components must be estimated. In particular, when the effects are uncorrelated with other explanatory variables, we show that (i) the feasible Gauss-Markov estimator is more efficient than the within-groups estimator for all but the fewest degrees of freedom and its variance is never more than 17% above the Cramer-Rao bound, (ii) the asymptotic approximation to the variance of the feasible Gauss-Markov estimator is similarly within 17% of the true variance but remains significantly smaller for moderately large sample sizes, and (iii) more efficient estimators for the variance components do not necessarily yield more efficient feasible Gauss-Markov estimators.

13.
In this article, we consider nonparametric regression analysis between two variables when data are sampled through a complex survey. While nonparametric regression analysis has been widely used with data that may be assumed to be generated from independently and identically distributed (iid) random variables, the methods and asymptotic analyses established for iid data need to be extended in the framework of complex survey designs. Local polynomial regression estimators are studied, which include as particular cases design-based versions of the Nadaraya–Watson estimator and of the local linear regression estimator. Special emphasis is given to the local linear regression estimator. Our estimators incorporate both the sampling weights and the kernel weights. We derive the asymptotic mean squared error (MSE) of the kernel estimators using a combined inference framework, and as a corollary the consistency of the estimators is deduced. Selection of a bandwidth is necessary for the resulting estimators; an optimal bandwidth can be determined according to the MSE criterion in the combined mode of inference. Simulation experiments are conducted to illustrate the proposed methodology, and an application with the Canadian Survey of Labour and Income Dynamics is presented.
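A rough sketch of a design-weighted local linear estimator in the spirit described above follows (my own code; the function names, the Gaussian kernel, and the toy data are assumptions, not the authors'). The kernel weights and the survey design weights are simply multiplied inside a local weighted least squares fit.

```python
# Sketch: design-weighted local linear regression. At each point x0 a weighted
# least squares line is fitted, with weights = sampling weights * kernel weights.
import numpy as np

def local_linear(x, y, w_design, x0, bandwidth):
    """Fitted regression value at x0 from a locally weighted linear fit."""
    kernel = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)   # Gaussian kernel weights
    w = np.sqrt(w_design * kernel)                        # combined weights (sqrt for lstsq)
    X = np.column_stack([np.ones_like(x), x - x0])
    beta = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
    return beta[0]                                        # intercept = fitted value at x0

# Toy unequal-probability sample from a curved regression function.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 500)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=500)
w_design = rng.uniform(1, 5, 500)             # stand-in for survey weights (1 / inclusion prob.)
print([round(local_linear(x, y, w_design, x0, bandwidth=0.1), 2)
       for x0 in (0.25, 0.5, 0.75)])          # roughly sin(2*pi*x0): 1, 0, -1
```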

14.
Statistical offices are responsible for publishing accurate statistical information about many different aspects of society. This task is complicated considerably by the fact that data collected by statistical offices generally contain errors. These errors have to be corrected before reliable statistical information can be published. This correction process is referred to as statistical data editing. Traditionally, data editing was mainly an interactive activity with the aim to correct all data in every detail. For that reason the data editing process was both expensive and time-consuming. To improve the efficiency of the editing process it can be partly automated. One often divides the statistical data editing process into the error localisation step and the imputation step. In this article we restrict ourselves to discussing the former step, and provide an assessment, based on personal experience, of several selected algorithms for automatically solving the error localisation problem for numerical (continuous) data. Our article can be seen as an extension of the overview article by Liepins, Garfinkel & Kunnathur (1982). All algorithms we discuss are based on the (generalised) Fellegi–Holt paradigm that says that the data of a record should be made to satisfy all edits by changing the fewest possible (weighted) number of fields. The error localisation problem may have several optimal solutions for a record. In contrast to what is common in the literature, most of the algorithms we describe aim to find all optimal solutions rather than just one. As numerical data mostly occur in business surveys, the described algorithms are mainly suitable for business surveys and less so for social surveys. For four algorithms we compare the computing times on six realistic data sets as well as their complexity.
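As a toy illustration of the Fellegi–Holt paradigm itself (not of the surveyed algorithms, which are far more efficient), the following brute-force sketch searches for the cheapest set of fields whose adjustment can make a numerical record satisfy a set of linear edits; feasibility of each candidate set is checked with a zero-objective linear programme. All names and the example edit are my own.

```python
# Brute-force sketch of Fellegi-Holt error localisation for numerical data:
# find the minimum-weight set of fields that, if allowed to change, lets the
# record satisfy all linear edits A x <= b. Exponential in the number of fields.
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def error_localisation(record, A, b, weights):
    """Cheapest (by weight) set of fields whose adjustment can satisfy A x <= b."""
    n = len(record)
    best_cost, best_set = None, None
    for k in range(n + 1):
        for free in combinations(range(n), k):
            # Fields not in `free` are fixed at their reported values.
            bounds = [(None, None) if j in free else (record[j], record[j])
                      for j in range(n)]
            feasible = linprog(c=np.zeros(n), A_ub=A, b_ub=b,
                               bounds=bounds, method="highs").success
            cost = sum(weights[j] for j in free)
            if feasible and (best_cost is None or cost < best_cost):
                best_cost, best_set = cost, free
    return best_set

# Single edit "turnover = costs + profit", written as two <= inequalities.
A = np.array([[1.0, -1.0, -1.0], [-1.0, 1.0, 1.0]])
b = np.array([0.0, 0.0])
record = [100.0, 60.0, 90.0]                     # violates the edit: 60 + 90 != 100
print(error_localisation(record, A, b, weights=[1.0, 1.0, 1.0]))   # one field suffices
```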

15.
Statistical Inference in Nonparametric Frontier Models: The State of the Art
Efficiency scores of firms are measured by their distance to an estimated production frontier. The economic literature proposes several nonparametric frontier estimators based on the idea of enveloping the data (FDH and DEA-type estimators). Many have claimed that FDH and DEA techniques are non-statistical, as opposed to econometric approaches where particular parametric expressions are posited to model the frontier. We can now define a statistical model allowing determination of the statistical properties of the nonparametric estimators in the multi-output and multi-input case. New results provide the asymptotic sampling distribution of the FDH estimator in a multivariate setting and of the DEA estimator in the bivariate case. Sampling distributions may also be approximated by bootstrap distributions in very general situations. Consequently, statistical inference based on DEA/FDH-type estimators is now possible. These techniques allow correction for the bias of the efficiency estimators and estimation of confidence intervals for the efficiency measures. This paper summarizes the results which are now available, and provides a brief guide to the existing literature. Emphasizing the role of hypotheses and inference, we show how the results can be used or adapted for practical purposes.

16.
Instrumental variable (IV) methods for regression are well established. More recently, methods have been developed for statistical inference when the instruments are weakly correlated with the endogenous regressor, so that estimators are biased and no longer asymptotically normally distributed. This paper extends such inference to the case where two separate samples are used to implement instrumental variables estimation. We also relax the restrictive assumptions of homoskedastic error structure and equal moments of exogenous covariates across two samples commonly employed in the two-sample IV literature for strong IV inference. Monte Carlo experiments show good size properties of the proposed tests regardless of the strength of the instruments. We apply the proposed methods to two seminal empirical studies that adopt the two-sample IV framework.
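For orientation, a minimal sketch of the textbook two-sample two-stage least squares estimator follows (my own simulated example; the paper's weak-instrument-robust tests are not implemented here). The first stage is fitted in the sample that observes (x, z), and the fitted values are formed in the sample that observes (y, z).

```python
# Sketch of two-sample 2SLS: sample 2 provides the first stage (x on z),
# sample 1 provides the second stage (y on predicted x). Toy data only.
import numpy as np

rng = np.random.default_rng(2)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

n1, n2, beta = 1000, 1000, 0.5
z1, z2 = rng.normal(size=n1), rng.normal(size=n2)
u1 = rng.normal(size=n1)
x1 = z1 + u1                               # endogenous regressor (not observed in sample 1)
y1 = beta * x1 + u1 + rng.normal(size=n1)  # sample 1 observes (y, z)
x2 = z2 + rng.normal(size=n2)              # sample 2 observes (x, z)

pi = ols(np.column_stack([np.ones(n2), z2]), x2)        # first stage from sample 2
xhat1 = np.column_stack([np.ones(n1), z1]) @ pi         # predicted x in sample 1
beta_ts2sls = ols(np.column_stack([np.ones(n1), xhat1]), y1)[1]
print(round(beta_ts2sls, 2))                            # close to beta = 0.5
```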

17.
Many studies that involve people's perceptions or behaviors focus on aggregate rather than individual responses. For example, variables describing public perceptions for some set of events may be represented as mean scores for each event. Event mean scores then become the unit of analysis for each variable. The variance of these mean scores for a variable is not only a function of the variation among the events themselves, but is also due to the variation among respondents and their possible responses. This is also the case for the covariances between variables based on event mean scores. In many contexts the variance and covariance components attributable to the sampling of respondents and their responses may be large; these components can be described as measurement error. In this paper we show how to estimate variances and covariances of aggregate variables that are free of these sources of measurement error. We also present a measure of reliability for the event means and examine the effect of the number of respondents on these spurious components. To illustrate how these estimates are computed, forty-two respondents were asked to rate forty events on seven risk perception variables. Computing the variances and covariances for these variables based on event means resulted in relatively large components attributable to measurement error. A demonstration is given of how this error is removed and the resulting effect on our estimates.
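A toy sketch of the variance decomposition described above (my own formulas and simulated data, not necessarily the paper's exact estimators): the variance of event mean scores contains a respondent-sampling component of roughly (within-event variance)/m, and subtracting it yields a corrected variance and a reliability measure for the event means.

```python
# Sketch: split the variance of event means into a true between-event part and a
# respondent-sampling ("measurement error") part, assuming m respondents per event.
import numpy as np

rng = np.random.default_rng(5)
n_events, m = 40, 42                                   # events and respondents, as in the study
true_scores = rng.normal(loc=3.0, scale=0.8, size=n_events)
ratings = true_scores[:, None] + rng.normal(scale=1.5, size=(n_events, m))

event_means = ratings.mean(axis=1)
within_var = ratings.var(axis=1, ddof=1).mean()        # respondent-to-respondent variation
observed_var = event_means.var(ddof=1)
corrected_var = observed_var - within_var / m          # estimate of true between-event variance
reliability = corrected_var / observed_var             # share of mean-score variance that is real
print(round(observed_var, 2), round(corrected_var, 2), round(reliability, 2))
```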

18.
Although attention has been given to obtaining reliable standard errors for the plug-in estimator of the Gini index, all standard errors suggested until now are either complicated or quite unreliable. An approximation is derived for the estimator by which it is expressed as a sum of IID random variables. This approximation allows us to develop a reliable standard error that is simple to compute. A simple but effective bias correction is also derived. The quality of inference based on the approximation is checked in a number of simulation experiments, and is found to be very good unless the tail of the underlying distribution is heavy. Bootstrap methods are presented which alleviate this problem except in cases in which the variance is very large or fails to exist. Similar methods can be used to find reliable standard errors of other indices which are not simply linear functionals of the distribution function, such as Sen's poverty index and its modification known as the Sen–Shorrocks–Thon index.
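For concreteness, here is a small sketch of the plug-in Gini estimator with a generic bootstrap standard error (my own illustration; the paper's IID-sum approximation and bias correction are not reproduced).

```python
# Sketch: plug-in Gini index G = 2*sum(i * x_(i)) / (n * sum(x)) - (n + 1)/n,
# with a simple nonparametric bootstrap standard error on toy income data.
import numpy as np

rng = np.random.default_rng(3)

def gini(x):
    """Plug-in Gini index of a sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return 2.0 * np.sum(i * x) / (n * x.sum()) - (n + 1) / n

def bootstrap_se(x, stat=gini, reps=999):
    stats = [stat(rng.choice(x, size=len(x), replace=True)) for _ in range(reps)]
    return np.std(stats, ddof=1)

income = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # heavy-ish tailed toy incomes
print(round(gini(income), 3), round(bootstrap_se(income), 3))
```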

19.
To determine whether an industry exhibits constant returns to scale, whether the production function is homothetic, or whether inputs are separable, a common approach is to specify a cost function, estimate its parameters using data such as prices and quantities of inputs, and then test the parametric restrictions corresponding to constant returns, a homothetic technology, or separability. Statistically, such inferences are valid if the true cost function is a member of the parametric class considered; otherwise the inference is biased. That is, the true rejection probability is not necessarily adequately approximated by the nominal size of the statistical test. The use of fixed-parameter flexible functional forms such as the Translog, the generalized Leontief, or the Box-Cox will not alleviate this problem. The Fourier flexible form differs fundamentally from other flexible forms in that it has a variable number of parameters and a known bound, depending on the number of parameters, on the error, as measured by the Sobolev norm, of approximation to an arbitrary cost function. Thus it is possible to construct statistical tests for constant returns, a homothetic technology, or separability which are asymptotically size α by letting the number of parameters of the Fourier flexible form depend on sample size. That is, the true rejection probability converges to the nominal size of the test as sample size tends to infinity. The rate of convergence depends on the smoothness of the true cost function: the more times the true cost function is differentiable, the faster the convergence. The method is illustrated using the data on aggregate U.S. manufacturing of Berndt and Wood (1975, 1979) and Berndt and Khaled (1979).

20.
The presence of random measurement error is commonly thought to cause attenuation of statistical relationships. While this is an unquestionable truth in bivariate analysis, it cannot be generalized to the multivariate case without qualification. This paper shows that measurement error may give rise to overestimates of parameters in causal analysis whenever there is more than one independent variable and the independent variables are correlated. If the independent variables are not measured with the same amount of reliability, there may also be considerable error in estimates of the relative magnitude of their impact. Both problems are particularly serious when the amount of measurement error is large relative to some of the causal effects such as in panel analysis with lagged dependent variables.
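A short simulation sketch of the effect described above (my own toy example, not taken from the paper): with two positively correlated regressors and classical measurement error in x1 only, OLS attenuates the coefficient on x1 while overestimating the coefficient on x2, which picks up part of x1's effect through the correlation.

```python
# Sketch: measurement error in one of two correlated regressors. The true model is
# y = 1*x1 + 1*x2 + noise; only a noisy version of x1 is observed (reliability 0.5).
import numpy as np

rng = np.random.default_rng(4)
n, b1, b2 = 100_000, 1.0, 1.0

x1 = rng.normal(size=n)
x2 = 0.7 * x1 + np.sqrt(1 - 0.7 ** 2) * rng.normal(size=n)   # corr(x1, x2) = 0.7
y = b1 * x1 + b2 * x2 + rng.normal(size=n)
x1_obs = x1 + rng.normal(size=n)                              # classical measurement error

X = np.column_stack([np.ones(n), x1_obs, x2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(coef.round(2))   # coefficient on x1_obs is attenuated (~0.34); on x2 it is ~1.46 > 1
```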
