1.
The center of a univariate data set {x_1,…,x_n} can be defined as the point μ that minimizes the norm of the vector of distances y′ = (|x_1−μ|,…,|x_n−μ|). As the median and the mean are the minimizers of respectively the L_1- and the L_2-norm of y, they are two alternatives to describe the center of a univariate data set. The center μ of a multivariate data set {x_1,…,x_n} can also be defined as the minimizer of the norm of a vector of distances. In multivariate situations, however, there are several kinds of distances. In this note, we consider the vector of L_1-distances y′_1 = (‖x_1−μ‖_1,…,‖x_n−μ‖_1) and the vector of L_2-distances y′_2 = (‖x_1−μ‖_2,…,‖x_n−μ‖_2). We define the L_1-median and the L_1-mean as the minimizers of respectively the L_1- and the L_2-norm of y_1, and the L_2-median and the L_2-mean as the minimizers of respectively the L_1- and the L_2-norm of y_2. In doing so, we obtain four alternatives to describe the center of a multivariate data set. While three of them have already been investigated in the statistical literature, the L_1-mean appears to be a new concept. Received January 1999
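As a quick illustration of one of the four centers in this abstract: the minimizer of the L_1-norm of y_2, i.e. of Σ_i ‖x_i − μ‖_2 (the spatial median, called the L_2-median above), can be computed with the classical Weiszfeld fixed-point iteration. A minimal numpy sketch, not from the paper itself:

```python
import numpy as np

def weiszfeld(X, tol=1e-8, max_iter=1000):
    """Spatial median (the L_2-median above): argmin_mu sum_i ||x_i - mu||_2.

    Classical Weiszfeld fixed-point iteration; a small eps guards against
    division by zero when mu lands on a data point.
    """
    mu = X.mean(axis=0)                      # start from the L_2-mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - mu, axis=1)
        w = 1.0 / np.maximum(d, 1e-12)       # inverse-distance weights
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```

Unlike the mean, this center is translation-equivariant and resistant to outliers in any direction.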
2.
Over the last decades, several methods for selecting the bandwidth in kernel regression have been introduced. They differ quite a bit, and although more selection methods already exist than for any other regression smoother, new ones keep appearing. Given the need for automatic, data-driven bandwidth selectors in applied statistics, this review is intended to explain and, above all, compare these methods. About 20 different selection methods have been reviewed, implemented and compared in an extensive simulation study.
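One of the oldest data-driven selectors the review covers is leave-one-out cross-validation. A minimal sketch for the Nadaraya-Watson estimator with a Gaussian kernel (illustrative only; the review compares many more refined criteria):

```python
import numpy as np

def nw_fit(x, y, x0, h):
    """Nadaraya-Watson estimate at points x0, Gaussian kernel, bandwidth h."""
    K = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    return (K * y).sum(axis=1) / K.sum(axis=1)

def loo_cv_bandwidth(x, y, grid):
    """Pick h from `grid` minimising the leave-one-out squared prediction error."""
    best_h, best_err = None, np.inf
    for h in grid:
        K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        np.fill_diagonal(K, 0.0)          # leave each point out of its own fit
        pred = (K * y).sum(axis=1) / K.sum(axis=1)
        err = np.mean((y - pred) ** 2)
        if err < best_err:
            best_h, best_err = h, err
    return best_h
```

Zeroing the diagonal of the kernel matrix is what removes each observation from its own fit, so the criterion estimates out-of-sample error rather than rewarding interpolation.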
3.
Joseph Ficek Wei Wang Henian Chen Getachew Dagne Ellen Daley 《Revue internationale de statistique》2021,89(1):132-147
Differential privacy is a framework for data analysis that provides rigorous privacy protections for database participants. It has increasingly been accepted as the gold standard for privacy in the analytics industry, yet there are few techniques suitable for statistical inference in the health sciences. This is notably the case for regression, one of the most widely used modelling tools in clinical and epidemiological studies. This paper provides an overview of differential privacy and surveys the literature on differentially private regression, highlighting the techniques that hold the most relevance for statistical inference as practiced in clinical and epidemiological research. Research gaps and opportunities for further inquiry are identified.
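The building block underlying much of the surveyed work is the textbook Laplace mechanism: add noise calibrated to a statistic's sensitivity. A minimal sketch for a bounded mean (illustrative only, not a regression technique from the paper):

```python
import numpy as np

def private_mean(x, lo, hi, epsilon, rng):
    """epsilon-differentially private mean of values known to lie in [lo, hi].

    Textbook Laplace mechanism: the sensitivity of the mean of n values
    bounded in [lo, hi] is (hi - lo) / n, so Laplace noise with scale
    sensitivity / epsilon yields epsilon-DP.
    """
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    sensitivity = (hi - lo) / len(x)
    return x.mean() + rng.laplace(scale=sensitivity / epsilon)
```

Differentially private regression methods extend this idea by perturbing sufficient statistics, the objective function, or the output coefficients instead of a single summary.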
4.
To weight or not to weight in regression analyses with survey data has been debated in the literature. The problem is essentially a tradeoff between the bias and the variance of the regression coefficient estimator. An array of diagnostic tests for informative weights has been developed. Nonetheless, studies comparing the performance of the tests, especially for finite samples, are scarce, and the theoretical equivalence of some tests has not been investigated. Focusing on the linear regression setting, we review a collection of such tests and propose enhanced versions of some of them that require an auxiliary regression model for the weight. Further, the equivalence of two popular tests is established, which has not been reported before. In contrast to existing reviews with no empirical comparison, we compare the sizes and powers of the tests in simulation studies. The reviewed tests are applied to a regression analysis of family expenditure using data from the China Family Panel Study.
5.
José García-Pérez María del Mar López-Martín Catalina García-García Román Salmerón-Gómez 《Revue internationale de statistique》2020,88(3):776-792
Justifying ridge regression from a geometrical perspective is one of the main contributions of this paper. To the best of our knowledge, this question has not been treated previously. This paper shows that ridge regression is a particular case of raising procedures that provide greater flexibility by transforming the matrix X associated with the model. Thus, raising procedures, based on a geometrical idea of the vectorial space associated with the columns of matrix X, lead naturally to ridge regression and justify the presence of the well-known constant k on the main diagonal of matrix X′X. This paper also analyses and compares different alternatives to raising with respect to collinearity mitigation. The results are illustrated with an empirical application.
6.
Traditional linear programming algorithms for quantile regression, for example, the simplex method and the interior point method, work well for data of small to moderate sizes. However, these methods are difficult to generalize to high‐dimensional big data for which penalization is usually necessary. Further, the massive size of contemporary big data calls for the development of large‐scale algorithms on distributed computing platforms. The traditional linear programming algorithms are intrinsically sequential and not suitable for such frameworks. In this paper, we discuss how to use the popular ADMM algorithm to solve large‐scale penalized quantile regression problems. The ADMM algorithm can be easily parallelized and implemented in modern distributed frameworks. Simulation results demonstrate that ADMM is as accurate as traditional LP algorithms while being faster, even in the nonparallel case.
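The ADMM splitting the abstract refers to can be sketched in a few lines for unpenalized quantile regression: introduce residuals r = y − Xb, so the b-update is least squares and the r-update is the proximal operator of the check loss (a shifted soft-threshold). A minimal serial sketch, with no penalty and a dense solve (the paper's large-scale version distributes these steps):

```python
import numpy as np

def qr_admm(X, y, tau=0.5, sigma=1.0, n_iter=500):
    """Quantile regression min_b sum_i rho_tau(y_i - x_i'b) via ADMM."""
    n, p = X.shape
    XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)   # precomputed p x n solve
    b, r, u = np.zeros(p), np.zeros(n), np.zeros(n)   # u: scaled dual
    for _ in range(n_iter):
        b = XtX_inv_Xt @ (y - r + u)             # least-squares step
        v = y - X @ b + u                        # prox of the check loss:
        r = np.where(v > tau / sigma, v - tau / sigma,
            np.where(v < -(1 - tau) / sigma, v + (1 - tau) / sigma, 0.0))
        u = u + (y - X @ b - r)                  # dual ascent step
    return b
```

With an intercept-only design and tau = 0.5, the iterates converge to the sample median, as they should.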
7.
A Survey of Collaborative Filtering Recommendation Algorithms
Recommender systems are among the most important technologies in e-commerce, and collaborative filtering is currently the most widely used and most successful recommendation technique. This paper first introduces the basic concepts and principles of collaborative filtering, then summarizes the key problems in collaborative filtering recommendation algorithms and their solutions, and finally discusses the open problems and possible directions for future development of the field.
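The core of the item-based variant of collaborative filtering fits in a few lines: compute item-item cosine similarity from the rating matrix and predict a missing rating as a similarity-weighted average of the user's other ratings. A minimal sketch (zero entries taken to mean "unrated"; not from the surveyed papers):

```python
import numpy as np

def cosine_sim(R):
    """Column-wise (item-item) cosine similarity of a user x item rating matrix."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0
    Rn = R / norms
    return Rn.T @ Rn

def predict_item_based(R, user, item):
    """Predict R[user, item] as a similarity-weighted average of the
    user's other (nonzero) ratings."""
    S = cosine_sim(R)
    rated = np.nonzero(R[user])[0]
    rated = rated[rated != item]
    w = S[item, rated]
    if w.sum() == 0:
        return 0.0
    return float(w @ R[user, rated] / w.sum())
```

The key problems the survey discusses, such as sparsity and cold start, show up here directly: with few nonzero entries, the similarities and the weighted average both degrade.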
8.
Regression analysis is an important part of mathematical statistics: a body of computational methods and theory that uses statistical principles to uncover the regularities hidden in random phenomena, and it is widely applied across scientific disciplines and sectors of the economy. This paper builds a regression model, obtains an "optimal" result via stepwise regression, and uses the optimal model to forecast the future performance of enterprises above a designated size, thereby providing a scientific basis for decision-making by the relevant authorities.
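The stepwise procedure mentioned above can be sketched as greedy forward selection: repeatedly add the predictor that most improves a fit criterion and stop when none does. A minimal numpy version using adjusted R² as the criterion (the paper does not specify its criterion; adjusted R² is an illustrative assumption):

```python
import numpy as np

def adj_r2(X, y):
    """Adjusted R^2 of the least-squares fit of y on the columns of X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - (ss_res / (n - p)) / (ss_tot / (n - 1))

def forward_stepwise(X, y):
    """Greedy forward selection: add the column that most improves adjusted
    R^2, stop when no column improves it. An intercept is always kept."""
    n, p = X.shape
    ones = np.ones((n, 1))
    selected, best = [], adj_r2(ones, y)
    improved = True
    while improved:
        improved = False
        for j in set(range(p)) - set(selected):
            score = adj_r2(np.hstack([ones, X[:, selected + [j]]]), y)
            if score > best:
                best, best_j, improved = score, j, True
        if improved:
            selected.append(best_j)
    return selected, best
```

Because adjusted R² penalizes extra parameters, pure-noise columns tend not to enter the model.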
9.
《Revue internationale de statistique》2017,85(2):228-249
Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images and so on are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorising the basic model types as linear, non‐linear and non‐parametric. We discuss publicly available software packages and illustrate some of the procedures by application to a functional magnetic resonance imaging data set.
10.
11.
Pieter H. A. J. M. van Gelder 《Statistica Neerlandica》2013,67(2):181-189
Let D be an invertible matrix and L1(x) denote the well‐known 1‐norm of x ∈ R^n. In this note we analyse the kth moment of the ratio L1(D^{-1}z) and its asymptotics in special cases, where z is uniformly distributed over the unit hypersphere. Conceptually, L1(D^{-1}z) can be seen as the (weighted) length, measured along fixed orientations, of a path connecting any two points relative to their straight-line distance.
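The moments studied in this note are easy to check numerically: draw z uniformly on the unit hypersphere (a normalized standard Gaussian vector) and average L1(D^{-1}z)^k. A minimal Monte Carlo sketch, not from the paper:

```python
import numpy as np

def l1_ratio_moment(D, k=1, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[ ||D^{-1} z||_1^k ] for z uniform on the
    unit hypersphere in R^n (z = g / ||g||_2, g standard normal)."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    G = rng.standard_normal((n_samples, n))
    Z = G / np.linalg.norm(G, axis=1, keepdims=True)
    Dinv = np.linalg.inv(D)
    vals = np.abs(Z @ Dinv.T).sum(axis=1)   # ||D^{-1} z||_1 per sample
    return (vals ** k).mean()
```

For D = I in the plane, the first moment is E[|cos θ| + |sin θ|] = 4/π, a handy sanity check.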
12.
Mixture regression models have been widely used in business, marketing and social sciences to model mixed regression relationships arising from a clustered and thus heterogeneous population. The unknown mixture regression parameters are usually estimated by maximum likelihood estimators using the expectation–maximisation algorithm based on the normality assumption of component error density. However, it is well known that the normality-based maximum likelihood estimation is very sensitive to outliers or heavy-tailed error distributions. This paper aims to give a selective overview of the recently proposed robust mixture regression methods and compare their performance using simulation studies.
13.
14.
José García Román Salmerón Catalina García María del Mar López Martín 《Revue internationale de statistique》2016,84(2):245-266
Ridge estimation (RE) is an alternative method to ordinary least squares when there exists a collinearity problem in a linear regression model. The variance inflation factor (VIF) is applied to test whether the problem exists in the original model and is also needed after applying the ridge estimate, to check whether the chosen value of the parameter k has mitigated the collinearity problem. This paper shows that using the original data when working with the ridge estimate leads to non‐monotone VIF values. García et al. (2014) showed some problems with the traditional VIF used in RE. We propose an augmented VIF, VIFR(j,k), associated with RE, obtained by standardizing the data before augmenting the model. VIFR(j,k) coincides with the VIF associated with the ordinary least squares estimator when k = 0. The augmented VIF has the very desirable properties of being continuous, monotone in the ridge parameter and greater than one.
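For reference, the traditional VIF the paper starts from is VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing column j on the remaining columns. A minimal numpy implementation (the paper's augmented VIFR(j,k) modifies this by standardizing and augmenting the model, which is not reproduced here):

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R_j^2), where R_j^2 is
    the R^2 of regressing column j on the other columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.hstack([np.ones((n, 1)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        ss_tot = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out[j] = 1.0 / (1.0 - (1 - (resid @ resid) / ss_tot))
    return out
```

Independent columns give VIFs near one; a near-duplicate column makes the corresponding VIFs explode, which is exactly the diagnostic the paper wants to preserve (continuously and monotonically) under ridge estimation.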
15.
J.P.C. Kleijnen J. Kriens H. Timmermans H. van den Wildenberg 《Statistica Neerlandica》1989,43(4):193-209
Several confidence intervals for the regression estimator are surveyed. A Monte Carlo experiment, based on the Neter and Loebbecke (1975) populations, gives estimated coverages and lengths of the different confidence intervals. One interval is exact under the assumption of multivariate normal distributions; it gives longer intervals (hence better coverage) than the interval based on a popular variance estimator. An interval due to Roberts (1970) is much too long. Jackknifing gives robust intervals. Rules of thumb for practitioners are given.
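The jackknife interval the abstract singles out as robust works from delete-one pseudo-values. A minimal sketch for the slope of a simple linear regression, using a normal critical value for brevity (the surveyed intervals use t-quantiles and ratio estimators, so this is only the general shape of the idea):

```python
import numpy as np

def jackknife_ci(x, y, z=1.96):
    """Delete-one jackknife confidence interval (approx. 95% with z=1.96)
    for the slope of a simple linear regression."""
    n = len(x)
    def slope(xs, ys):
        xc = xs - xs.mean()
        return (xc @ ys) / (xc @ xc)
    full = slope(x, y)
    loo = np.array([slope(np.delete(x, i), np.delete(y, i)) for i in range(n)])
    pseudo = n * full - (n - 1) * loo        # jackknife pseudo-values
    est = pseudo.mean()
    se = pseudo.std(ddof=1) / np.sqrt(n)
    return est - z * se, est + z * se
```

On noise-free data every leave-one-out slope agrees, the pseudo-value variance is zero, and the interval collapses to the true slope.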
16.
Logistic regression analysis may well be used to develop a predictive model for a dichotomous medical outcome, such as short-term mortality. When the data set is small compared to the number of covariables studied, shrinkage techniques may improve predictions. We compared the performance of three variants of shrinkage techniques: 1) a linear shrinkage factor, which shrinks all coefficients with the same factor; 2) penalized maximum likelihood (or ridge regression), where a penalty factor is added to the likelihood function such that coefficients are shrunk individually according to the variance of each covariable; 3) the Lasso, which shrinks some coefficients to zero by setting a constraint on the sum of the absolute values of the coefficients of standardized covariables.
Logistic regression models were constructed to predict 30-day mortality after acute myocardial infarction. Small data sets were created from a large randomized controlled trial, half of which provided independent validation data. We found that all three shrinkage techniques improved the calibration of predictions compared to the standard maximum likelihood estimates. This study illustrates that shrinkage is a valuable tool to overcome some of the problems of overfitting in medical data.
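The second variant above, penalized maximum likelihood (ridge logistic regression), can be sketched with plain gradient descent on the penalized negative log-likelihood. A minimal illustration of the shrinkage effect, not the authors' fitting procedure (they tune the penalty and use proper optimizers):

```python
import numpy as np

def logistic_ridge(X, y, penalty=0.0, lr=0.1, n_iter=2000):
    """Logistic regression with an L2 (ridge) penalty on the non-intercept
    coefficients, fitted by plain gradient descent. Larger `penalty`
    pulls the coefficients toward zero (shrinkage)."""
    n, p = X.shape
    Xi = np.hstack([np.ones((n, 1)), X])
    w = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xi @ w)))
        grad = Xi.T @ (prob - y) / n
        grad[1:] += penalty * w[1:] / n      # do not penalize the intercept
        w -= lr * grad
    return w
```

Replacing the quadratic penalty with a constraint on the sum of absolute coefficient values gives the third variant, the Lasso, which can shrink coefficients exactly to zero.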
17.
Geotechnical test data are usually processed through tedious calculation and plotting, and the work must be both timely and accurate. Linear regression uses simple mathematical relationships to streamline both the calculation and the plotting involved. Processing a large body of shear and liquid/plastic limit test data shows that the results are accurate and trustworthy: linear regression replaces the tedious calculation and makes the plotting simple as well. This paper analyses the application of linear regression in geotechnical testing and demonstrates the accuracy of its results on a large set of combined shear and liquid/plastic limit tests.
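As a hypothetical example of the kind of fit this abstract describes: direct shear test results are commonly reduced to the Mohr-Coulomb line τ = c + σ_n·tan(φ), so a least-squares fit recovers cohesion c and friction angle φ without graphical construction. A minimal sketch (the specific reduction used in the paper is not given):

```python
import numpy as np

def coulomb_fit(sigma_n, tau):
    """Fit the Mohr-Coulomb line tau = c + sigma_n * tan(phi) by least
    squares; returns cohesion c (same units as tau) and friction angle
    phi in degrees."""
    A = np.vstack([np.ones_like(sigma_n), sigma_n]).T
    (c, slope), *_ = np.linalg.lstsq(A, tau, rcond=None)
    return c, np.degrees(np.arctan(slope))
```

The same one-line regression replaces the flow-curve plot in liquid limit determination: fit log(water content) against log(blow count) and read off the value at the standard blow count.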
18.
Wei‐Yin Loh 《Revue internationale de statistique》2014,82(3):329-348
Fifty years have passed since the publication of the first regression tree algorithm. New techniques have added capabilities that far surpass those of the early methods. Modern classification trees can partition the data with linear splits on subsets of variables and fit nearest neighbor, kernel density, and other models in the partitions. Regression trees can fit almost every kind of traditional statistical model, including least‐squares, quantile, logistic, Poisson, and proportional hazards models, as well as models for longitudinal and multiresponse data. Greater availability and affordability of software (much of which is free) have played a significant role in helping the techniques gain acceptance and popularity in the broader scientific community. This article surveys the developments and briefly reviews the key ideas behind some of the major algorithms.
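The key idea behind the earliest least-squares regression trees the survey starts from fits in a few lines: recursively pick the axis-aligned split minimizing the within-child sum of squared errors, and let leaves predict the mean. A toy sketch (modern algorithms add pruning, linear splits, and model leaves):

```python
import numpy as np

def build_tree(X, y, depth=2, min_leaf=5):
    """Tiny least-squares regression tree: each split minimises the total
    within-child sum of squared errors; leaves predict the mean."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return ("leaf", y.mean())
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            sse = ((y[left] - y[left].mean()) ** 2).sum() + \
                  ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, left)
    if best is None:
        return ("leaf", y.mean())
    _, j, t, left = best
    return ("node", j, t,
            build_tree(X[left], y[left], depth - 1, min_leaf),
            build_tree(X[~left], y[~left], depth - 1, min_leaf))

def predict(tree, x):
    while tree[0] == "node":
        _, j, t, l, r = tree
        tree = l if x[j] <= t else r
    return tree[1]
```

Swapping the leaf model (quantile, logistic, Poisson, proportional hazards) and the split criterion is precisely how the modern variants in the survey generalize this template.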
19.
Regression analysis is a very widely used statistical method. Software quality control consists of the reviews, testing and other activities carried out during software development to ensure the quality of the final product. Introducing regression analysis into software quality control makes it possible to identify the factors that significantly affect the various quality control activities; by attending to and adjusting these significant factors, the planning and execution of quality control activities can be improved, thereby ensuring the quality of the intermediate and final software deliverables.
20.
Performance evaluation of supply chain management is crucial to the operation and management of a supply chain. This paper applies a data mining method based on support vector regression (SVR) to the performance evaluation of supply chain management and, through a worked example, discusses the application and characteristics of SVR in this setting.