Jing Pan  Yuan Yu  Yong Zhou 《Metrika》2018,81(7):821-847
With the explosion of digital information, high-dimensional data are frequently collected in many domains, and the dimension of the covariates can be much larger than the sample size. Many effective methods have recently been developed to reduce the dimension of such data; however, few perform well for survival data with censoring. In this article, we develop a novel nonparametric feature screening procedure for ultrahigh-dimensional survival data that incorporates an inverse probability weighting scheme to handle censoring. The proposed method is model-free and hence can be applied to a wide range of survival models. Moreover, it is robust to heterogeneity and invariant to monotone increasing transformations of the response. The sure screening property and the ranking consistency property are established under mild conditions. The competence and robustness of our method are further confirmed through comprehensive simulation studies and an analysis of a real data example.

Existing methods for feature screening focus mainly on the mean function of regression models. The variance function, however, plays an important role in statistical theory and application. We therefore investigate feature screening for the mean and variance functions within a multiple-index framework in high-dimensional regression models. Note that some information about the predictors may be known in advance from previous investigations and experience; for example, a certain set of predictors may be known to be related to the response. Based on this conditional information, together with empirical likelihood, we propose conditional feature screening procedures. Our methods consistently estimate the sets of active predictors in the mean and variance functions. Notably, the proposed screening procedures avoid estimating the unknown link functions in the mean and variance functions and, moreover, work well when the predictors are highly correlated, without an iterative algorithm. Our proposal is therefore computationally simple. Furthermore, as a conditional method, it is robust to the choice of the conditional set. The theoretical results show that the proposed procedures have sure screening properties. The attractive finite-sample performance of our method is illustrated in simulations and a real data application.

In this paper, we propose a new approach to empirical likelihood inference for the parameters in heteroscedastic partially linear single-index models. In the growing-dimensional setting, it is proved that estimators based on the semiparametric efficient score are asymptotically consistent, and that the limit distribution of the empirical log-likelihood ratio statistic for the parameters \((\beta ^{\top },\theta ^{\top })^{\top }\) is normal. Furthermore, we show that the empirical log-likelihood ratio based on a subvector of \(\beta \) is asymptotically chi-square distributed, which can be used to construct a confidence interval or region for the subvector of \(\beta \). The proposed method can naturally be applied to pure single-index models and partially linear models with high-dimensional data. The performance of the proposed method is illustrated via a real data application and numerical simulations.

This article is concerned with feature screening for varying coefficient models with ultrahigh-dimensional predictors. We propose a new sure independence screening method based on quantile partial correlation (QPC-SIS), which is robust against outliers and heavy-tailed distributions. We then establish the sure screening property for the QPC-SIS and conduct simulations to examine its finite-sample performance. The simulation results indicate that the QPC-SIS performs better than other methods such as sure independence screening (SIS), sure independent ranking and screening, distance correlation sure independence screening, conditional correlation sure independence screening, and nonparametric independence screening, which supports the validity and rationality of the QPC-SIS.
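As a point of reference for the comparison above, the following is a minimal sketch of the plain marginal-correlation SIS baseline, not of the QPC-SIS itself. It ranks predictors by the absolute sample Pearson correlation with the response and keeps the top d. The function names (`pearson`, `sis_screen`, `simulate`), the toy data-generating model, and the choice d = 10 are assumptions made for illustration only.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def sis_screen(columns, y, d):
    """Return indices of the d predictors with the largest |corr(X_j, y)|."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:d]


def simulate(n=100, p=300, seed=0):
    """Hypothetical toy data: only predictors 0 and 1 affect the response."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [3 * columns[0][i] + 3 * columns[1][i] + rng.gauss(0, 1)
         for i in range(n)]
    return columns, y


if __name__ == "__main__":
    cols, resp = simulate()
    # Indices of the 10 predictors retained by the screening step.
    print(sis_screen(cols, resp, d=10))
```

A quantile-based procedure such as the QPC-SIS would replace the Pearson score with a quantile partial correlation; that utility is not reproduced here because the abstract does not specify it.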

In hypothesis testing, as in other statistical problems, we may confront imprecise concepts. One case is a situation in which both the hypotheses and the observations are imprecise. This paper develops a new approach for testing fuzzy hypotheses when the available data are also fuzzy. First, some definitions are provided, such as fuzzy sample space, fuzzy-valued random sample, and fuzzy-valued random variable. Then, the problem of fuzzy hypothesis testing with vague data is formulated. Finally, we state and prove a generalized Neyman–Pearson Lemma for this problem. The proposed approach is illustrated by some numerical examples.

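Because financial markets are volatile, the capital asset pricing model (CAPM) often exhibits structural breaks, heteroscedasticity, and serial correlation, so the random errors of the CAPM need to be tested for homogeneity. For a CAPM with a single structural break point, this paper derives adjusted LM test statistics for testing regime-wise heteroscedasticity and autocorrelation. Monte Carlo simulation results show that the adjusted LM test statistics have better power than the ordinary LM test statistics. Finally, a concrete example is used to demonstrate the effectiveness of the method.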

A test statistic is developed for making inference about a block‐diagonal structure of the covariance matrix when the dimensionality p exceeds n, where n = N − 1 and N denotes the sample size. The suggested procedure extends the complete independence results. Because the classical hypothesis testing methods based on the likelihood ratio degenerate when p > n, the main idea is to turn instead to a distance function between the null and alternative hypotheses. The test statistic is then constructed using a consistent estimator of this function, where consistency is considered in an asymptotic framework that allows p to grow together with n. The suggested statistic is also shown to be asymptotically normal under the null hypothesis. Some auxiliary results on the moments of products of multivariate normal random vectors and higher‐order moments of the Wishart matrices, which are important for our evaluation of the test statistic, are derived. We perform empirical power analysis for a number of alternative covariance structures.

A test statistic is considered for testing a hypothesis about the mean vector of multivariate data when the dimension of the vector, p, may exceed the number of vectors, n, and the underlying distribution need not be normal. With n,p→∞, and under mild assumptions, but without assuming any relationship between n and p, the statistic is shown to asymptotically follow a chi‐square distribution. A by-product of the paper is the approximate distribution of a quadratic form, based on a reformulation of the well‐known Box's approximation, under a high‐dimensional setup. Using a classical limit theorem, the approximation is further extended to an asymptotic normal limit under the same high-dimensional setup. Simulation results, generated under different parameter settings, are used to show the accuracy of the approximation for moderate n and large p.

Existing penalized quantile variable selection methods are applicable only to a finite number of predictors or lack the oracle property for the estimator. Quantile regression is considered an alternative to ordinary least squares regression when outliers and heavy‐tailed errors are present in linear models. Variable selection through quantile regression with a diverging number of parameters is investigated in this paper. The convergence rate of the estimator with the smoothly clipped absolute deviation penalty function is also studied. Moreover, the oracle property, with proper selection of the tuning parameter, is established for quantile regression under certain regularity conditions. In addition, the rank correlation screening method is used to accommodate ultra‐high dimensional data settings. Monte Carlo simulations demonstrate the finite-sample performance of the proposed estimator. Results on real data reveal that this approach provides substantially more information than ordinary least squares, conventional quantile regression, and quantile lasso.

In this paper, we present an algorithm suitable for analysing the variance of panel data when some observations are either given in grouped form or are missing. The analysis is carried out from the perspective of ANOVA panel data models with general errors. The classification intervals of the grouped observations may vary from one to another, so the missing observations are in fact a particular case of grouping. The proposed algorithm (1) estimates the parameters of the panel data models; (2) evaluates the covariance matrices of the asymptotic distribution of the time-dependent parameters, assuming that the number of time periods, T, is fixed and the number of individuals, N, tends to infinity, and similarly those of the individual parameters when T → ∞ and N is fixed; and, finally, (3) uses these asymptotic covariance matrix estimates to analyse the variance of the panel data.

We examine a consistent test for the correct specification of a regression function with dependent data. The test is based on the supremum of the difference between the parametric and nonparametric estimates of the regression model. Rather surprisingly, the behaviour of the test depends on whether the regressors are deterministic or stochastic. In the former situation, the normalization constants necessary to obtain the limiting Gumbel distribution are data dependent and difficult to estimate, so it may be difficult to obtain valid critical values, whereas, in the latter, the asymptotic distribution may not even be known. For that reason, under very mild regularity conditions, we describe a bootstrap analogue of the test, showing its asymptotic validity and its finite-sample behaviour in a small Monte Carlo experiment.

Asymptotic theory for nonparametric regression with spatial data
Nonparametric regression with spatial, or spatio-temporal, data is considered. The conditional mean of a dependent variable, given explanatory ones, is a nonparametric function, while the conditional covariance reflects spatial correlation. Conditional heteroscedasticity is also allowed, as well as non-identically distributed observations. Instead of mixing conditions, a (possibly non-stationary) linear process is assumed for disturbances, allowing for long-range, as well as short-range, dependence, while decay in dependence in explanatory variables is described using a measure based on the departure of the joint density from the product of marginal densities. A basic triangular array setting is employed, with the aim of covering various patterns of spatial observation. Sufficient conditions are established for consistency and asymptotic normality of kernel regression estimates. When the cross-sectional dependence is sufficiently mild, the asymptotic variance in the central limit theorem is the same as when observations are independent; otherwise, the rate of convergence is slower. We discuss the application of our conditions to spatial autoregressive models, and models defined on a regular lattice.

Generalized Efficiency Measures (GEMS) for use in DEA are developed and analyzed in a context of differing models where they might be employed. The additive model of DEA is accorded a central role and developed in association with a new measure of efficiency referred to as RAM (Range Adjusted Measure). The need for separately treating input-oriented and output-oriented approaches to efficiency measurement is eliminated because additive models effect their evaluations by maximizing distance from the efficient frontier (in \(\ell_1\), or weighted \(\ell_1\), measure) and thereby simultaneously maximize outputs and minimize inputs. Contacts with other models and approaches are maintained with theorems and accompanying proofs to ensure the validity of the relations thus identified. New criteria are supplied, both managerial and mathematical, for evaluating proposed measures. The concept of approximating models is used to further extend these possibilities. The focus of the paper is on the physical aspects of performance involved in technical and mix inefficiencies. However, an Appendix shows how overall, allocative and technical inefficiencies may be incorporated in additive models.
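For orientation, the range adjusted measure is commonly written in the DEA literature as below. This is a sketch in assumed notation, not a formula quoted from the abstract: m and s denote the numbers of inputs and outputs, \(s_i^{-*}\) and \(s_r^{+*}\) the optimal input and output slacks from the additive model, and \(x_{ij}\), \(y_{rj}\) the observed inputs and outputs of unit j.

\[
\mathrm{RAM} = 1 - \frac{1}{m+s}\left(\sum_{i=1}^{m}\frac{s_i^{-*}}{R_i^{-}} + \sum_{r=1}^{s}\frac{s_r^{+*}}{R_r^{+}}\right),
\qquad
R_i^{-} = \max_j x_{ij} - \min_j x_{ij},\quad
R_r^{+} = \max_j y_{rj} - \min_j y_{rj}.
\]

Dividing each slack by the observed range of its variable makes the terms unit-free, which is what allows input excesses and output shortfalls to be averaged in a single non-oriented score.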

A general framework for frontier estimation with panel data
The main objective of the paper is to present a general framework for estimating production frontier models with panel data. A sample of firms i = 1, ..., N is observed over several time periods t = 1, ..., T. In this framework, nonparametric stochastic models for the frontier are analyzed. The usual parametric formulations in the literature are viewed as particular cases, and the convergence of the resulting estimators in this general framework is investigated. Special attention is devoted to the role of N and of T in the speeds of convergence of the estimators. First, a very general model is investigated. In this model almost no restriction is imposed on the structure of the model or of the inefficiencies. This model is estimable from a nonparametric point of view but needs large values of T and of N to obtain reliable estimates of the individual production functions and of the frontier function. Then more specific nonparametric firm effect models are presented. In these cases, only NT must be large to estimate the common production function; but again both large N and large T are needed for estimating individual efficiencies and for estimating the frontier. The methods are illustrated through a numerical example with real data.

In this paper, empirical likelihood inference for varying-coefficient single-index models with right-censored data is investigated. Using a synthetic data approach, we propose an empirical log-likelihood ratio function for the index parameters, which are of primary interest, and show that its limiting distribution is a mixture of central chi-squared distributions. So that Wilks’ phenomenon holds, we propose an adjusted empirical log-likelihood ratio for the index parameters. The adjusted empirical log-likelihood is shown to have a standard chi-squared limiting distribution. Simulation studies are undertaken to assess the finite-sample performance of the proposed confidence intervals. A real example is presented for illustration.

We consider balanced incomplete block data when ties occur and propose new statistics for testing (a) differences in mean ranks, (b) differences in distributions of ranks, (c) differences in nonlinear effects of ranks and (d) linear contrasts. A sensory evaluation example where the data are ranks is given.

The purchase behaviour of consumers is observed in a panel during a month. The quantity of interest is the penetration of a product. The problem is that this quantity has to be estimated on the basis of incomplete data. For some or all respondents some weeks are missing. To this end the purchasing process is modeled with a variety of stochastic processes. The performance of some existing models is compared for penetrations of the complete population, but also for Bayesian estimates in subpopulations.

Quality & Quantity - This article refers to theory construction on the basis of binary data, where a configuration of several yes/no-variables is used in order to explain a binary outcome. The...

