Similar Literature (20 results)
1.
In this study, we suggest pretest and shrinkage methods based on generalised ridge regression estimation that are suitable for both multicollinear and high-dimensional problems. We review and develop theoretical results for some of the shrinkage estimators. The performance of the shrinkage estimators relative to some penalty methods is compared and assessed by both simulation and real-data analysis. We show that the suggested methods are good competitors to regularisation techniques in terms of mean squared error of estimation and prediction error. Earlier work gave a thorough comparison of pretest and shrinkage estimators based on the maximum likelihood method with the penalty methods; in this paper, we extend that comparison using the least squares method for the generalised ridge regression.
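As a rough illustration of the estimator family discussed above, the following is a minimal sketch of the generalised ridge estimator (X'X + K)^{-1}X'y with a diagonal penalty matrix K; the simulated data, the fixed penalty values and the variable names are assumptions for illustration, not the paper's pretest/shrinkage procedure.

```python
import numpy as np

def generalized_ridge(X, y, k):
    """Generalised ridge estimate (X'X + K)^{-1} X'y with K = diag(k).

    k may be a scalar (ordinary ridge) or a vector of per-coefficient
    penalties; this is an illustrative sketch, not the paper's
    pretest/shrinkage procedure.
    """
    p = X.shape[1]
    K = np.diag(np.full(p, k)) if np.isscalar(k) else np.diag(np.asarray(k))
    return np.linalg.solve(X.T @ X + K, X.T @ y)

# Toy usage: collinear design, compare a nearly unpenalised fit with a ridged one.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=50)      # near-collinear column
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=50)
print(generalized_ridge(X, y, 1e-8))  # unstable under collinearity
print(generalized_ridge(X, y, 5.0))   # shrunken, more stable coefficients
```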

2.
Databases of warranty claims for manufactured products record claims experience and information about concomitant factors. If constructed and maintained properly, warranty databases may be used for a variety of purposes, including the prediction of future claims, the comparison of claims experience for different groups of products, the estimation of field reliability, and the identification of opportunities for quality and reliability improvement. This paper reviews some methods of analyzing warranty data and of addressing these objectives. Some extensions of previous work and suggestions for future development are included. Examples involving warranty claims for automobiles and refrigerators are considered.

3.
Texture is one of the most important physical properties of soil because of its influence on other fundamental properties. It is defined according to particle size distribution, which can be measured accurately in the laboratory. However, these measurements are costly and very time consuming, so valid alternatives are necessary. In recent years, statistical techniques have been used to predict textural classification using reflectance spectrometry values as explanatory variables. The estimation of the model parameters may not be very accurate, affecting prediction, when there is multicollinearity among the predictors. Another issue is the large number of explanatory variables usually needed to explain the response. In order to improve the accuracy of prediction in classification problems under multicollinearity and to reduce the dimension of the problem with continuous covariates, in this paper we introduce a new technique based on classification and dimension reduction methods. We show how the new proposal can improve the accuracy of prediction, considering a problem concerning the textural classification of soils of the Campania region.
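The specific classification and dimension-reduction technique is not detailed in the abstract, so the sketch below uses a generic stand-in, principal components followed by logistic regression, to show how collinear spectral predictors can be compressed before classification; the simulated spectra, component count and classifier are illustrative assumptions.

```python
# Generic stand-in for "dimension reduction + classification" on collinear spectra.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p = 200, 120                        # many highly correlated spectral bands
latent = rng.normal(size=(n, 3))       # a few latent soil factors
spectra = latent @ rng.normal(size=(3, p)) + 0.05 * rng.normal(size=(n, p))
texture_class = (latent[:, 0] + 0.5 * latent[:, 1] > 0).astype(int)

model = make_pipeline(StandardScaler(), PCA(n_components=5),
                      LogisticRegression(max_iter=1000))
print(cross_val_score(model, spectra, texture_class, cv=5).mean())
```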

4.
A rough set neural network is used to build a forecasting model for the container throughput of Guangzhou Port, and the container throughput for 2007-2010 is predicted. The forecasting method combines the advantages of rough set theory and neural networks, has strong learning and generalisation ability, and is well suited to complex, multi-factor, nonlinear systems. The forecast results offer useful guidance for the development of Guangzhou Port and can serve as a reference for its future development.

5.
We consider nonlinear heteroscedastic single-index models where the mean function is a parametric nonlinear model and the variance function depends on a single-index structure. We develop an efficient estimation method for the parameters in the mean function by using weighted least squares estimation, and we propose a "delete-one-component" estimator for the single index in the variance function based on absolute residuals. Asymptotic results for the estimators are also investigated. Estimation methods for the error distribution based on the classical empirical distribution function and an empirical likelihood method are discussed. The empirical likelihood method allows the assumptions on the error distribution to be incorporated into the estimation. Simulations illustrate the results, and a real chemical data set is analyzed to demonstrate the performance of the proposed estimators.
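As a rough sketch of the weighted least squares step only (not the paper's single-index variance modelling or the delete-one-component estimator), assuming the variance weights are already known:

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """WLS estimate (X'WX)^{-1} X'Wy with W = diag(w).

    In the heteroscedastic setting, w would be the reciprocal of the
    estimated variance function; here the weights are assumed given.
    """
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
sigma = 0.5 + np.abs(X[:, 1])                 # error spread grows with |x|
y = X @ np.array([1.0, 2.0]) + sigma * rng.normal(size=100)
print(weighted_least_squares(X, y, 1.0 / sigma**2))
```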

6.
Computerised Record Linkage methods help us combine multiple data sets from different sources when a single data set with all necessary information is unavailable or when data collection on additional variables is time consuming and extremely costly. Linkage errors are inevitable in the linked data set because of the unavailability of error-free unique identifiers. A small amount of linkage errors can lead to substantial bias and increased variability in estimating parameters of a statistical model. In this paper, we propose a unified theory for statistical analysis with linked data. Our proposed method, unlike the ones available for secondary data analysis of linked data, exploits record linkage process data as an alternative to taking a costly sample to evaluate error rates from the record linkage procedure. A jackknife method is introduced to estimate bias, covariance matrix and mean squared error of our proposed estimators. Simulation results are presented to evaluate the performance of the proposed estimators that account for linkage errors.
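For readers unfamiliar with the jackknife mentioned above, here is a generic delete-one jackknife for the bias and variance of a one-sample statistic; it is a plain textbook illustration, not the paper's linkage-error-adjusted estimator.

```python
import numpy as np

def jackknife(estimator, data):
    """Delete-one jackknife bias and variance for a 1-D sample statistic."""
    data = np.asarray(data)
    n = len(data)
    theta_hat = estimator(data)
    leave_one_out = np.array([estimator(np.delete(data, i)) for i in range(n)])
    theta_bar = leave_one_out.mean()
    bias = (n - 1) * (theta_bar - theta_hat)
    variance = (n - 1) / n * np.sum((leave_one_out - theta_bar) ** 2)
    return bias, variance

rng = np.random.default_rng(3)
sample = rng.exponential(scale=2.0, size=50)
print(jackknife(np.mean, sample))   # bias is essentially zero for the sample mean
```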

7.
In this paper, the problem of reconstructing past records from the known values of future records is investigated. Different methods are applied when the underlying distributions are exponential and Pareto, and several reconstructors are obtained and then compared. A data set representing the record values of average July temperatures in Neuenburg, Switzerland, is used to illustrate the proposed procedure in the Pareto case. The results may be used for studying the past epoch times of a non-homogeneous Poisson process or the past failure times in a reliability problem when the repair policy is minimal repair. N. Balakrishnan and J. Ahmadi are members of the Ordered and Spatial Data Center of Excellence of Ferdowsi University of Mashhad.

8.
This paper is concerned with the Bayesian estimation and comparison of flexible, high dimensional multivariate time series models with time varying correlations. The model proposed and considered here combines features of the classical factor model with those of the heavy-tailed univariate stochastic volatility model. A unified analysis of the model, and its special cases, is developed that encompasses estimation, filtering and model choice. The centerpieces of the estimation algorithm (which relies on MCMC methods) are: (1) a reduced blocking scheme for sampling the free elements of the loading matrix and the factors and (2) a special method for sampling the parameters of the univariate SV process. The resulting algorithm is scalable in terms of series and factors and simulation-efficient. Methods for estimating the log-likelihood function and the filtered values of the time-varying volatilities and correlations are also provided. The performance and effectiveness of the inferential methods are extensively tested using simulated data where models up to 50 dimensions and 688 parameters are fit and studied. The performance of our model, in relation to various multivariate GARCH models, is also evaluated using a real data set of weekly returns on a set of 10 international stock indices. We consider the performance along two dimensions: the ability to correctly estimate the conditional covariance matrix of future returns and the unconditional and conditional coverage of the 5% and 1% value-at-risk (VaR) measures of four pre-defined portfolios.

9.
Estimating house price appreciation: A comparison of methods
Several parametric and nonparametric methods have been advanced over the years for estimating house price appreciation. This paper compares five of these methods in terms of predictive accuracy, using data from Montgomery County, Pennsylvania. The methods are evaluated on the basis of the mean squared prediction error and the mean absolute prediction error. A statistic developed by Diebold and Mariano is used to determine whether differences in prediction errors are statistically significant. We use the same statistic to determine the effect of sample size on the accuracy of the predictions. In general, parametric methods of estimation produce more accurate estimates of house price appreciation than nonparametric methods. And when the mean absolute prediction error is used as the criterion of accuracy, the repeat sales method produces the most accurate estimate among the parametric methods we tested. Finally, of the five methods we tested, the accuracy of the repeat sales method is least diminished by a reduction in sample size.
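A minimal sketch of the Diebold-Mariano comparison referred to above, using squared-error loss and the simple lag-0 variance of the loss differential (adequate for one-step forecasts; longer horizons need a HAC correction). The error series here are simulated placeholders, not the Montgomery County data.

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2):
    """Simple Diebold-Mariano statistic comparing squared forecast errors.

    Uses the lag-0 variance of the loss differential; a HAC variance would
    be needed for multi-step forecasts.
    """
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2      # loss differential
    T = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / T)
    p_value = 2 * (1 - stats.norm.cdf(abs(dm)))
    return dm, p_value

rng = np.random.default_rng(4)
errors_a = rng.normal(scale=1.0, size=200)   # forecast errors, method A
errors_b = rng.normal(scale=1.2, size=200)   # method B: noisier
print(diebold_mariano(errors_a, errors_b))
```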

10.
We use extreme-value theory to estimate the ultimate world records for the 100-m running, for both men and women. For this aim we collected the fastest personal best times set between January 1991 and June 2008. Estimators of the extreme-value index are based on a certain number of upper order statistics. To optimize this number of order statistics we minimize the asymptotic mean-squared error of the moment estimator. Using the estimate of the extreme-value index thus obtained, the right endpoint of the speed distribution is estimated. The corresponding time can be interpreted as the estimated ultimate world record: the best possible time that could be run in the near future. We find 9.51 seconds for the 100-m men and 10.33 seconds for the women.
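For concreteness, one textbook form of the moment estimator of the extreme-value index and the corresponding right-endpoint estimate is sketched below; the fixed choice of k and the simulated bounded sample are assumptions, whereas the paper chooses k by minimising the asymptotic mean-squared error.

```python
import numpy as np

def moment_estimator(x, k):
    """Moment estimator of the extreme-value index and a right-endpoint estimate.

    One textbook parameterisation; the endpoint is meaningful only when the
    estimated index is negative (finite upper endpoint).
    """
    xs = np.sort(x)
    top = xs[-k:]                      # k largest observations
    threshold = xs[-k - 1]             # (k+1)-th largest order statistic
    logs = np.log(top) - np.log(threshold)
    m1, m2 = logs.mean(), (logs ** 2).mean()
    gamma_minus = 1.0 - 0.5 / (1.0 - m1 ** 2 / m2)
    gamma = m1 + gamma_minus
    endpoint = threshold - threshold * m1 * (1.0 - gamma_minus) / gamma
    return gamma, endpoint

rng = np.random.default_rng(5)
speeds = 12.0 * rng.beta(2.0, 2.0, size=2000)   # toy bounded "speeds", endpoint 12
print(moment_estimator(speeds, k=100))          # index near -0.5, endpoint near 12
```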

11.
Prediction markets have been an important source of information for decision makers due to their high ex post accuracy. Nevertheless, recent failures of prediction markets remind us of the importance of ex ante assessments of their prediction accuracy. This paper proposes a systematic procedure for decision makers to acquire prediction models that may be used to predict the correctness of winner-take-all markets. We commence with a set of classification models and generate combined models following various rules. We also create artificial records in the training datasets to overcome the imbalanced-data issue in classification problems. These models are then empirically trained and tested with a large dataset to see which may best be used to predict the failures of prediction markets. We find that no model universally outperforms the others across different performance measures. Despite this, we identify a set of capable models for decision makers based on different decision goals.
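The "artificial records" device mentioned above can be approximated by simple random oversampling of the minority class; the paper's exact augmentation scheme is not given here, so the routine below is only a generic illustration with simulated data.

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate minority-class rows at random until the classes are balanced."""
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[counts.argmin()]
    deficit = counts.max() - counts.min()
    idx = rng.choice(np.flatnonzero(y == minority), size=deficit, replace=True)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 4))
y = (rng.random(100) < 0.1).astype(int)        # roughly 10% "failures": imbalanced
X_bal, y_bal = random_oversample(X, y, rng)
print(np.bincount(y), np.bincount(y_bal))
```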

12.
From a data-processing standpoint, record statistics provide an efficient way of tracking the minimum (or maximum) value among a stream of measurements. From a sequence of n independent, identically distributed continuous random variables only about log(n) records are expected, so we expect to have little data, and hence any prior information is welcome (Houchens, Record value theory and inference, Ph.D. thesis, University of California, Riverside, 1984). In this paper, non-Bayesian and Bayesian estimates are derived for the two parameters of the exponential distribution based on record statistics with respect to the squared error and linear-exponential (LINEX) loss functions, and are then compared with one another. The admissibility of some estimators is discussed.
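A quick simulation of the "about log(n) records" remark: counting lower records in an i.i.d. continuous sequence (entirely generic, not tied to the paper's exponential-distribution estimators).

```python
import numpy as np

def count_lower_records(x):
    """Number of lower records in a sequence (the first value always counts)."""
    running_min = np.minimum.accumulate(x)
    is_record = np.concatenate([[True], x[1:] < running_min[:-1]])
    return int(is_record.sum())

rng = np.random.default_rng(7)
n, reps = 10_000, 500
counts = [count_lower_records(rng.exponential(size=n)) for _ in range(reps)]
print(np.mean(counts), np.log(n))   # mean count is about log(n) plus a small constant
```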

13.
As a result of novel data collection technologies, it is now common to encounter data in which the number of explanatory variables collected is large, while the number of variables that actually contribute to the model remains small. Thus, a method that can identify the variables with an impact on the model, without including other noneffective ones, makes analysis much more efficient. Many methods have been proposed to resolve model selection problems under such circumstances; however, it is still unknown how large a sample size is sufficient to identify those "effective" variables. In this paper, we apply a sequential sampling method so that the effective variables can be identified efficiently, and sampling is stopped as soon as the "effective" variables are identified and their corresponding regression coefficients are estimated with satisfactory accuracy, which is new to sequential estimation. Both fixed and adaptive designs are considered. The asymptotic properties of the estimates of the number of effective variables and their coefficients are established, and the proposed sequential estimation procedure is shown to be asymptotically optimal. Simulation studies are conducted to illustrate the performance of the proposed estimation method, and a diabetes data set is used as an example.

14.
Forecasting the real estate market is an important part of studying the Chinese economy. Most existing methods have strict requirements on input variables and involve complex parameter estimation. To obtain better prediction results, a modified Holt's exponential smoothing (MHES) method is proposed to predict housing prices using historical data. Unlike traditional exponential smoothing models, MHES sets different weights on historical data, and the smoothing parameters depend on the sample size. Meanwhile, the proposed MHES incorporates the whale optimization algorithm (WOA) to obtain the optimal parameters. Housing price data from Kunming, Changchun, Xuzhou and Handan were used to test the performance of the model. The results for the four cities indicate that the proposed method has a smaller prediction error and shorter computation time than the traditional models. Therefore, WOA-MHES can be applied efficiently to housing price forecasting and can be a reliable tool for market investors and policy makers.
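For context, the classical Holt linear-trend smoother that MHES modifies is sketched below; the hand-picked smoothing constants and the toy price series are illustrative assumptions, whereas the paper re-weights the history, ties the parameters to the sample size and tunes them with WOA.

```python
import numpy as np

def holt_forecast(y, alpha, beta, horizon):
    """Classical Holt's linear-trend exponential smoothing forecast.

    This is the standard method that MHES modifies; alpha and beta are
    chosen by hand here rather than optimised.
    """
    level, trend = y[0], y[1] - y[0]
    for value in y[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend * np.arange(1, horizon + 1)

# Toy monthly price series (illustrative numbers only).
prices = np.array([8200, 8350, 8400, 8600, 8750, 8900, 9100, 9150], float)
print(holt_forecast(prices, alpha=0.6, beta=0.3, horizon=3))
```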

15.
In situations where a regression model is subject to one or more breaks, it is shown that it can be optimal to use pre-break data to estimate the parameters of the model used to compute out-of-sample forecasts. The issue of how best to exploit the trade-off that might exist between bias and forecast error variance is explored and illustrated for the multivariate regression model under the assumption of strictly exogenous regressors. In practice, when this assumption cannot be maintained and both the time and size of the breaks are unknown, the optimal choice of the observation window will be subject to further uncertainties that make exploiting the bias-variance trade-off difficult. To that end we propose a new set of cross-validation methods for selection of a single estimation window and weighting or pooling methods for combination of forecasts based on estimation windows of different lengths. Monte Carlo simulations are used to show when these procedures work well compared with methods that ignore the presence of breaks.

16.
J. Ahmadi, N. R. Arghami. Metrika, 2001, 53(3): 195-206
In this article, we establish some general results concerning the comparison of the amount of the Fisher information contained in n record values with the Fisher information contained in n iid observations from the original distribution. Some common distributions are classified according to this criterion. We also propose some methods of estimation based on record values. The results may be of interest in some life testing problems. Received: September 1999

17.
This paper provides a review of common statistical disclosure control (SDC) methods implemented at statistical agencies for standard tabular outputs containing whole population counts from a census (either enumerated or based on a register). These methods include record swapping on the microdata prior to its tabulation and rounding of entries in the tables after they are produced. The approach for assessing SDC methods is based on a disclosure risk-data utility framework and the need to balance managing disclosure risk against maximizing the amount of information that can be released to users while ensuring high quality outputs. To carry out the analysis, quantitative measures of disclosure risk and data utility are defined and the methods compared. Conclusions from the analysis show that record swapping as a sole SDC method leaves high probabilities of disclosure risk. Targeted record swapping lowers the disclosure risk, but there is more distortion of distributions. Small cell adjustments (rounding) give protection to census tables by eliminating small cells, but only one set of variables and geographies can be disseminated in order to avoid disclosure by differencing nested tables. Full random rounding offers more protection against disclosure by differencing, but margins are typically rounded separately from the internal cells and tables are not additive. Rounding procedures protect against the perception of disclosure risk compared to record swapping, since no small cells appear in the tables. Combining rounding with record swapping raises the level of protection but increases the loss of utility of census tabular outputs. For some statistical analyses, the combination of record swapping and rounding balances to some degree the opposing effects that the methods have on the utility of the tables.
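As an illustration of the rounding protections discussed above, an unbiased random rounding to base 3 of the kind applied to census counts is sketched below; production implementations at agencies differ, for example in how margins and nested tables are treated.

```python
import numpy as np

def random_round(counts, base=3, rng=None):
    """Unbiased random rounding of counts to a multiple of `base`.

    A count with remainder r is rounded up with probability r/base and down
    otherwise, so small cells (1, 2) disappear while expectations are preserved.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts)
    remainder = counts % base
    round_up = rng.random(counts.shape) < remainder / base
    return counts - remainder + base * round_up

table = np.array([[1, 4, 12], [2, 0, 7]])
print(random_round(table, rng=np.random.default_rng(8)))
```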

18.
Nonparametric estimation and inference for conditional distribution functions with longitudinal data have important applications in biomedical studies. We propose in this paper an estimation approach based on time-varying parametric models. Our model assumes that the conditional distribution of the outcome variable at each given time point can be approximated by a parametric model, but the parameters are smooth functions of time. Our estimation is based on a two-step smoothing method, in which we first obtain the raw estimators of the conditional distribution functions at a set of disjoint time points, and then compute the final estimators at any time by smoothing the raw estimators. Asymptotic properties, including the asymptotic biases, variances and mean squared errors, are derived for the local polynomial smoothed estimators. Applicability of our two-step estimation method is demonstrated through a large epidemiological study of childhood growth and blood pressure. Finite sample properties of our procedures are investigated through a simulation study.

19.
Standard bankruptcy prediction methods lead to models weighted by the types of failed firms included in the estimation sample. Such weighted models may lead to severe classification errors when they are applied to types of failing (and non-failing) firms that are in the minority in the estimation sample (the frequency effect). The purpose of this study is to present a bankruptcy prediction method based on identifying two different failure types, the solidity and the liquidity bankruptcy firm, in order to avoid the frequency effect. Each type is depicted by a theoretical gambler's ruin model of its own, yielding an approximation of the failure probability separately for both types. These models are applied to data on randomly selected Finnish bankrupt and non-bankrupt firms. A logistic regression model based on a set of financial variables is used as a benchmark model. Empirical results show that the resulting heavily solidity-weighted logistic model may lead to severe errors in classifying non-bankrupt firms. The present approach avoids these kinds of errors by separately evaluating the probability of solidity and liquidity bankruptcy; the firm is not classified as bankrupt as long as neither of the probabilities exceeds the critical value. This leads the present prediction method to slightly outperform the logistic model in overall classification accuracy.
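For reference, the classical gambler's ruin probability that underlies such models is shown below; this is the generic textbook random-walk formula, not the paper's specific solidity or liquidity parameterisation.

```python
def ruin_probability(capital, target, p):
    """P(hit 0 before `target`) for a random walk starting at `capital`,
    moving +1 with probability p and -1 with probability q = 1 - p."""
    q = 1.0 - p
    if abs(p - 0.5) < 1e-12:                    # fair-game limit
        return (target - capital) / target
    r = q / p
    return (r ** capital - r ** target) / (1.0 - r ** target)

# A firm with a larger capital buffer relative to its cash-flow volatility
# (larger `capital`) has a smaller ruin probability.
print(ruin_probability(capital=3, target=20, p=0.55))
print(ruin_probability(capital=8, target=20, p=0.55))
```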

20.
In this study, we consider Bayesian methods for the estimation of a sample selection model with spatially correlated disturbance terms. We design a set of Markov chain Monte Carlo algorithms based on the method of data augmentation. The natural parameterization for the covariance structure of our model involves an unidentified parameter that complicates posterior analysis. The unidentified parameter - the variance of the disturbance term in the selection equation - is handled in different ways in these algorithms to achieve identification for other parameters. The Bayesian estimator based on these algorithms can account for the selection bias and the full covariance structure implied by the spatial correlation. We illustrate the implementation of these algorithms through a simulation study and an empirical application.

