This paper considers multiple regression procedures for analyzing the relationship between a response variable and a vector of d covariates in a nonparametric setting where tuning parameters need to be selected. We introduce an approach which handles the dilemma that with high dimensional data the sparsity of data in regions of the sample space makes estimation of nonparametric curves and surfaces virtually impossible. This is accomplished by abandoning the goal of trying to estimate true underlying curves and instead estimating measures of dependence that can determine important relationships between variables. These dependence measures are based on local parametric fits on subsets of the covariate space that vary in both dimension and size within each dimension. The subset which maximizes a signal to noise ratio is chosen, where the signal is a local estimate of a dependence parameter which depends on the subset dimension and size, and the noise is an estimate of the standard error (SE) of the estimated signal. This approach of choosing the window size to maximize a signal to noise ratio lifts the curse of dimensionality because for regions with sparsity of data the SE is very large. It corresponds to asymptotically maximizing the probability of correctly finding nonspurious relationships between covariates and a response or, more precisely, maximizing asymptotic power among a class of asymptotic level α t-tests indexed by subsets of the covariate space. Subsets that achieve this goal are called features. We investigate the properties of specific procedures based on the preceding ideas using asymptotic theory and Monte Carlo simulations and find that within a selected dimension, the volume of the optimally selected subset does not tend to zero as n → ∞ unless the volume of the subset of the covariate space where the response depends on the covariate vector tends to zero.
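The window-selection idea in this abstract can be illustrated with a minimal one-covariate sketch. It is not the authors' procedure: it assumes the "signal" is the slope of an ordinary least-squares line fitted inside a window around a point x0, the "noise" is that slope's standard error, and the candidate window half-widths are supplied by hand. The function names `local_slope` and `select_window` are hypothetical.

```python
import numpy as np

def local_slope(x, y, x0, h):
    """OLS slope and its standard error using only points with |x - x0| <= h."""
    mask = np.abs(x - x0) <= h
    xs, ys = x[mask], y[mask]
    if xs.size < 3:
        return 0.0, np.inf  # sparse window: treat the SE as infinite
    xc = xs - xs.mean()
    sxx = np.sum(xc ** 2)
    if sxx == 0.0:
        return 0.0, np.inf  # no spread in x: slope is not identified
    yc = ys - ys.mean()
    slope = np.sum(xc * yc) / sxx
    resid = yc - slope * xc
    se = np.sqrt(np.sum(resid ** 2) / (xs.size - 2) / sxx)
    return slope, se

def select_window(x, y, x0, widths):
    """Return the half-width maximizing |slope| / SE, together with that ratio."""
    best_h, best_ratio = None, -np.inf
    for h in widths:
        slope, se = local_slope(x, y, x0, h)
        ratio = abs(slope) / se if se > 0 else np.inf
        if ratio > best_ratio:
            best_h, best_ratio = h, ratio
    return best_h, best_ratio

# Simulated data (an assumption for illustration): a linear signal plus noise.
rng = np.random.default_rng(0)
x = rng.random(500)
y = 2.0 * x + 0.5 * rng.standard_normal(500)
h_best, ratio_best = select_window(x, y, x0=0.5, widths=[0.05, 0.1, 0.25, 0.5])
```

Windows containing few points get a large or infinite SE and therefore a small ratio, which is the mechanism the abstract credits with avoiding sparse regions.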

In this paper we examine a multiplicative intensity model in which a covariate interacts with two other covariates in the same model. We demonstrate analytically that, in such situations, a log-linear parameterization based on two pairs of baseline levels cannot be uniquely transformed to the otherwise equivalent multiplicative parameterization. We show that the problem lies in an oversight of the conditional independence between the two covariates interacting with a common third covariate. As a solution, therefore, we propose an approach that takes due account of such dependence. Our proposed approach uses a common baseline level for the three covariates involved in interaction while estimating the corresponding relative intensities. The issues addressed are illustrated with a demographic data set involving the estimation of rates of transition to parenthood.

Summarizing the effect of many covariates through a few linear combinations is an effective way of reducing covariate dimension and is the backbone of (sufficient) dimension reduction. Because the replacement of high‐dimensional covariates by low‐dimensional linear combinations is performed with a minimum assumption on the specific regression form, it enjoys attractive advantages but also encounters unique challenges in comparison with the variable selection approach. We review the current literature of dimension reduction with an emphasis on the two most popular models, where the dimension reduction affects the conditional distribution and the conditional mean, respectively. We discuss various estimation and inference procedures in different levels of detail, with the intention of focusing on their underlying ideas rather than technicalities. We also discuss some unsolved problems in this area for potential future research.

The brown rat lives with man in a wide variety of environmental contexts and adversely affects public health by transmission of diseases, bites, and allergies. Understanding behavioral and spatial correlation aspects of pest species can contribute to their effective management and control. Rat sightings can be described by spatial coordinates in a particular region of interest defining a spatial point pattern. In this paper, we investigate the spatial structure of rat sightings in the Latina district of Madrid (Spain) and its relation to a number of distance‐based covariates that relate to the proliferation of rats. Given a number of locations, biologically considered as attractor points, the spatial dependence is modeled by distance‐based covariates and angular orientations through copula functions. We build a particular spatial trivariate distribution using univariate margins coming from the covariate information and provide predictive distributions for such distances and angular orientations.

In this paper, nonparametric instrumental variable estimation of local average treatment effects (LATE) is extended to incorporate covariates. Estimation of LATE is appealing since identification relies on much weaker assumptions than the identification of average treatment effects in other nonparametric instrumental variable models. Including covariates in the estimation of LATE is necessary when the instrumental variable itself is confounded, such that the IV assumptions are valid only conditional on covariates. Previous approaches to handling covariates in the estimation of LATE relied on parametric or semiparametric methods. In this paper, a nonparametric estimator of LATE with covariates is suggested that is root-n asymptotically normal and efficient.

This paper examines a flexible way to empirically model discrete data outcomes using ‘hazard rate’ decompositions. It presents a general data‐generating mechanism based on potential outcomes to describe why the approach should work for almost any discrete distribution. Monte Carlo evidence indicates that these models estimate well the impacts of covariates on expected counts when the data follow a Poisson distribution. With data from more complex processes, these estimators continue to perform well. Since most economic count outcomes arise from occurrence‐dependent behavioral processes, using flexibly estimated distributions should reduce the dependence of results on convenient but invalid assumptions. Copyright © 2010 John Wiley & Sons, Ltd.

crs is a library for R written by Jeffrey S. Racine (Maintainer) and Zhenghua Nie. This add‐on package provides a collection of functions for spline‐based nonparametric estimation of regression functions with both continuous and categorical regressors. Currently, the crs package integrates data‐driven methods for selecting the spline degree, the number of knots and the necessary bandwidths for nonparametric conditional mean, IV and quantile regression. A function for multivariate density spline estimation with mixed data is also currently in the works. As a bonus, the authors have also provided the first simple R interface to the NOMAD (‘nonsmooth mesh adaptive direct search’) optimization solver, which can be applied to solve other mixed integer optimization problems that future users might find useful in other settings. Although the crs package shares some of the same functionalities as its kernel‐based counterpart (the np package by the same author), it currently lacks some of the features the np package provides, such as hypothesis testing and semiparametric estimation. However, what it lacks in breadth, crs makes up for in speed. A Monte Carlo experiment in this review uncovers sizable speed gains compared to its np counterpart, with a marginal loss in terms of goodness of fit. Therefore, the package will be extremely useful for applied econometricians interested in employing nonparametric techniques using large amounts of data with a small number of discrete covariates. Copyright © 2014 John Wiley & Sons, Ltd.

In this paper, we study an estimation problem where the variables of interest are subject to both right censoring and measurement error. In this context, we propose a nonparametric estimation strategy of the hazard rate, based on a regression contrast minimized in a finite‐dimensional functional space generated by spline bases. We prove a risk bound of the estimator in terms of integrated mean square error and discuss the rate of convergence when the dimension of the projection space is adequately chosen. Then we define a data‐driven criterion of model selection and prove that the resulting estimator achieves an adequate compromise. The method is illustrated via simulation experiments that show that the strategy is successful.

We develop methods for analysing the 'interaction' or dependence between points in a spatial point pattern, when the pattern is spatially inhomogeneous. Completely non-parametric study of interactions is possible using an analogue of the K-function. Alternatively one may assume a semi-parametric model in which a (parametrically specified) homogeneous Markov point process is subjected to (non-parametric) inhomogeneous independent thinning. The effectiveness of these approaches is tested on datasets representing the positions of trees in forests.
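Independent thinning, as mentioned in this abstract, can be sketched as follows. This is only an illustration under stated assumptions: the parent pattern is a homogeneous Poisson process on the unit square (the simplest stand-in for a homogeneous Markov point process), the retention probability is a hand-picked function of location, and the name `thin_points` is hypothetical.

```python
import numpy as np

def thin_points(points, retain_prob, rng):
    """Keep each point independently with probability retain_prob(x, y)."""
    p = np.array([retain_prob(x, y) for x, y in points], dtype=float)
    keep = rng.random(len(points)) < p
    return points[keep]

rng = np.random.default_rng(0)

# Homogeneous Poisson parent pattern on the unit square (assumed intensity 200).
n = rng.poisson(200)
points = rng.random((n, 2))

# Retention probability increasing in x, so the thinned pattern is denser on the right.
thinned = thin_points(points, lambda x, y: x, rng)
```

Because each point is retained independently of the others, any interaction structure in the thinned pattern is inherited from the parent process, while the spatial inhomogeneity comes entirely from the retention function.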

This paper studies functional coefficient regression models with nonstationary time series data, allowing also for stationary covariates. A local linear fitting scheme is developed to estimate the coefficient functions. The asymptotic distributions of the estimators are obtained, showing different convergence rates for the stationary and nonstationary covariates. A two-stage approach is proposed to achieve estimation optimality in the sense of minimizing the asymptotic mean squared error. When the coefficient function is a function of a nonstationary variable, the new findings are that the asymptotic bias of its nonparametric estimator is the same as in the stationary covariate case but the convergence rate differs, and further, the asymptotic distribution is a mixed normal, associated with the local time of a standard Brownian motion. The asymptotic behavior at boundaries is also investigated.

Second‐order orientation methods provide a natural tool for the analysis of spatial point process data. In this paper, we extend to the spatiotemporal setting the spatial point pair orientation distribution function. The new space–time orientation distribution function is used to detect space–time anisotropic configurations. An edge‐corrected estimator is defined and illustrated through a simulation study. We apply the resulting estimator to data on the spatiotemporal distribution of fire ignition events caused by humans in a square area of 30 × 30 km² for 4 years. Our results confirm that our approach is able to detect directional components at distinct spatiotemporal scales. © 2014 The Authors. Statistica Neerlandica © 2014 VVS.

A brief survey on methods to handle non-proportional hazards in survival analysis is given with emphasis on short-term and long-term hazard ratio modelling. A drawback of the existing model of this nature is that except at time zero or infinity, the hazard ratio for a unit increase in the value of a covariate depends on the starting value. With two or more covariates, the hazard ratio for a unit increase in one covariate with other covariates held fixed depends in an unintended way on the values of the other covariates. We propose an alternative way to model short-term and long-term hazard ratios without the above drawbacks through a judicious choice of covariate-time interactions. Under the new model, it is easier to describe the time-varying effect of each covariate on the hazard. Nonparametric maximum likelihood estimation for the new model can be carried out in the same way as for the existing model. We also propose a product version of the existing model, which overcomes its second drawback but not the first. The advocated covariate–time interaction model provides a better fit to the Veterans Administration lung cancer data set than the original and product versions of the existing model.

Estimation with longitudinal Y having nonignorable dropout is considered when the joint distribution of Y and covariate X is nonparametric and the dropout propensity conditional on (Y,X) is parametric. We apply the generalised method of moments to estimate the parameters in the nonignorable dropout propensity based on estimating equations constructed using an instrument Z, which is part of X related to Y but unrelated to the dropout propensity conditioned on Y and other covariates. Population means and other parameters in the nonparametric distribution of Y can be estimated based on inverse propensity weighting with estimated propensity. To improve efficiency, we derive a model‐assisted regression estimator making use of information provided by the covariates and previously observed Y‐values in the longitudinal setting. The model‐assisted regression estimator is protected from model misspecification and is asymptotically normal and more efficient when the working models are correct and some other conditions are satisfied. The finite‐sample performance of the estimators is studied through simulation, and an application to the HIV‐CD4 data set is also presented as illustration.
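The inverse propensity weighting step mentioned in this abstract can be sketched generically. The sketch assumes the probabilities of being observed have already been estimated by some other means (the paper's generalised method of moments step is not reproduced here). The function name `ipw_mean` and the toy numbers are hypothetical.

```python
import numpy as np

def ipw_mean(y, observed, propensity, normalize=True):
    """Inverse-propensity-weighted mean of y.

    y          : outcomes; entries for unobserved units may be NaN
    observed   : 1/True where y was observed, 0/False otherwise
    propensity : estimated probability that each unit is observed
    normalize  : True divides by the sum of weights (Hajek form),
                 False divides by the sample size (Horvitz-Thompson form)
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(observed, dtype=bool)
    p = np.asarray(propensity, dtype=float)
    w = np.where(r, 1.0 / p, 0.0)      # weight 1/p for observed units, 0 otherwise
    y_filled = np.where(r, y, 0.0)     # unobserved outcomes never enter the sum
    total = np.sum(w * y_filled)
    return total / np.sum(w) if normalize else total / y.size

# Toy data: the third unit dropped out, so its outcome is missing.
y = [1.0, 2.0, np.nan, 4.0]
observed = [1, 1, 0, 1]
propensity = [0.5, 0.5, 0.5, 1.0]
mean_hajek = ipw_mean(y, observed, propensity)
mean_ht = ipw_mean(y, observed, propensity, normalize=False)
```

Units that were unlikely to be observed receive larger weights, so they stand in for similar units that dropped out.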

Spatial marked point processes are models for systems of points which are randomly distributed in space and provided with measured quantities called marks. This study deals with marking, that is, methods of constructing marked point processes from unmarked ones. The focus is density‐dependent marking where the local point intensity affects the mark distribution. This study develops new markings for log Gaussian Cox processes. In these markings, both the mean and variance of the mark distribution depend on the local intensity. The mean, variance and mark correlation properties are presented for the new markings, and a Bayesian estimation procedure is suggested for statistical inference. The performance of the new approach is studied by means of simulation experiments. As an example, a tropical rainforest data set is modelled.

This paper analyzes the endogeneity bias problem caused by associations of members within a network when the spatial autoregressive (SAR) model is used to study social interactions. When there are unobserved factors that affect both friendship decisions and economic outcomes, the spatial weight matrix (sociomatrix; adjacency matrix) in the SAR model, which represents the structure of a friendship network, might correlate with the disturbance term of the model, and consequently result in an endogenous selection problem in the outcomes. We consider this problem of selection bias with a modeling approach. In this approach, a statistical network model is adopted to explain the endogenous network formation process. By specifying unobserved components in both the network model and the SAR model, we capture the correlation between the processes of network and outcome formation, and propose a proper estimation procedure for the system. We demonstrate that the estimation of this system can be effectively done by using the Bayesian method. We provide a Monte Carlo experiment and an empirical application of this modeling approach on the friendship networks of high school students and their interactions on academic performance in the Add Health data. Copyright © 2015 John Wiley & Sons, Ltd.

Covariate information is often available in randomised clinical trials for each subject prior to treatment assignment and is commonly utilised to make covariate adjustment for baseline characteristics predictive of the outcome in order to increase precision and improve power in the detection of a treatment effect. Motivated by a nonparametric covariance analysis, we study a projection approach to making objective covariate adjustment in randomised clinical trials on the basis of two unbiased estimating functions that decouple the outcome and covariate data. The proposed projection approach extends a weighted least‐squares procedure by projecting one of the estimating functions onto the linear subspace spanned by the other estimating function that is E‐ancillary for the average treatment effect. Compared with the weighted least‐squares method, the projection method allows for objective inference on the average treatment effect by exploiting the treatment specific covariate–outcome associations. The resulting projection‐based estimator of the average treatment effect is asymptotically efficient when the treatment‐specific working regression models are correctly specified and is asymptotically more efficient than other existing competitors when the treatment‐specific working regression models are misspecified. The proposed projection method is illustrated by an analysis of data from an HIV clinical trial. In a simulation study, we show that the proposed projection method compares favourably with its competitors in finite samples.

Contaminated or corrupted data typically require strong assumptions to identify parameters of interest. However, weaker assumptions often identify bounds on these parameters. This paper addresses when covariate data (variables in addition to the one of interest) tighten these bounds. First, we construct the identification region for the distribution of the variable of interest. This region demonstrates that covariate data are useless without knowledge about the distribution of erroneous data conditional on the covariates. Then, we develop bounds both on probabilities and on parameters of this distribution that respect stochastic dominance.

We investigate a novel database of 10,217 extreme operational losses from the Italian bank UniCredit. Our goal is to shed light on the dependence between the severity distribution of these losses and a set of macroeconomic, financial, and firm‐specific factors. To do so, we use generalized Pareto regression techniques, where both the scale and shape parameters are assumed to be functions of these explanatory variables. We perform the selection of the relevant covariates with a state‐of‐the‐art penalized‐likelihood estimation procedure relying on L1‐penalty terms. A simulation study indicates that this approach efficiently selects covariates of interest and tackles spurious regression issues encountered when dealing with integrated time series. Lastly, we illustrate the impact of different economic scenarios on the requested capital for operational risk. Our results have important implications in terms of risk management and regulatory policy.

This paper addresses the problem of estimation of a nonparametric regression function from selectively observed data when selection is endogenous. Our approach relies on independence between covariates and selection conditionally on potential outcomes. Endogeneity of regressors is also allowed for. In both the exogenous and endogenous cases, consistent two-step estimation procedures are proposed and their rates of convergence are derived. The pointwise asymptotic distribution of the estimators is established. In addition, bootstrap uniform confidence bands are obtained. Finite sample properties are illustrated in a Monte Carlo simulation study and an empirical illustration.
