Similar literature
 20 similar documents found.
1.
Mean profiles are widely used as indicators of the electricity consumption habits of customers. Currently, at Électricité de France (EDF), class load profiles are estimated using point-wise mean profiles. Unfortunately, it is well known that the mean is highly sensitive to the presence of outliers, such as one or more consumers with unusually high levels of consumption. In this paper, we propose an alternative to the mean profile: the L1-median profile, which is more robust. When dealing with large data sets of functional data (load curves, for example), survey sampling approaches are useful for estimating the median profile without storing the whole data set. We propose here several sampling strategies and estimators to estimate the median trajectory. A comparison between them is illustrated by means of a test population. We develop a stratification based on the linearized variable which substantially improves the accuracy of the estimator compared to simple random sampling without replacement. We also suggest an improved estimator that takes into account auxiliary information. Some potential areas for future research are also highlighted.
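To make the robustness claim concrete, the sketch below contrasts a point-wise mean profile with an L1 (geometric) median computed by Weiszfeld iterations. It is only an illustration: the abstract does not say how the median is computed, the function names `l1_median` and `mean_profile` are hypothetical, the toy load curves are invented, and no survey weights are used.

```python
import math


def mean_profile(curves):
    """Point-wise mean of equal-length curves."""
    dim = len(curves[0])
    return [sum(c[j] for c in curves) / len(curves) for j in range(dim)]


def l1_median(curves, n_iter=200, tol=1e-9):
    """Approximate the L1 (geometric) median with Weiszfeld iterations.

    Assumption: this is a generic textbook algorithm, not the estimator
    of the paper, which works with sampled and weighted curves.
    """
    dim = len(curves[0])
    m = mean_profile(curves)  # start from the mean profile
    for _ in range(n_iter):
        num = [0.0] * dim
        den = 0.0
        for c in curves:
            dist = math.sqrt(sum((c[j] - m[j]) ** 2 for j in range(dim)))
            w = 1.0 / max(dist, 1e-12)  # guard against division by zero
            den += w
            for j in range(dim):
                num[j] += w * c[j]
        new_m = [v / den for v in num]
        shift = math.sqrt(sum((new_m[j] - m[j]) ** 2 for j in range(dim)))
        m = new_m
        if shift < tol:
            break
    return m


# Invented toy data: four similar customers and one extreme consumer.
curves = [
    [1.0, 2.0, 1.0],
    [1.1, 2.1, 0.9],
    [0.9, 1.9, 1.1],
    [1.0, 2.0, 1.0],
    [50.0, 60.0, 55.0],
]
mean = mean_profile(curves)
median = l1_median(curves)
```

On this toy data the mean profile is pulled far toward the outlying curve, while the L1 median stays with the bulk of the customers.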

2.
We present a modern perspective on the conditional likelihood approach to the analysis of capture-recapture experiments, which shows the conditional likelihood to belong to the generalized linear model (GLM) family. Hence, there is the potential to apply the full range of GLM methodologies. To put this method in context, we first review some approaches to capture-recapture experiments with heterogeneous capture probabilities in closed populations, covering parametric and non-parametric mixture models and the use of covariates. We then review in more detail the analysis of capture-recapture experiments when the capture probabilities depend on a covariate.

3.
Estimation with longitudinal Y having nonignorable dropout is considered when the joint distribution of Y and covariate X is nonparametric and the dropout propensity conditional on (Y,X) is parametric. We apply the generalised method of moments to estimate the parameters in the nonignorable dropout propensity based on estimating equations constructed using an instrument Z, which is part of X related to Y but unrelated to the dropout propensity conditioned on Y and other covariates. Population means and other parameters in the nonparametric distribution of Y can be estimated based on inverse propensity weighting with estimated propensity. To improve efficiency, we derive a model-assisted regression estimator making use of information provided by the covariates and previously observed Y-values in the longitudinal setting. The model-assisted regression estimator is protected from model misspecification and is asymptotically normal and more efficient when the working models are correct and some other conditions are satisfied. The finite-sample performance of the estimators is studied through simulation, and an application to the HIV-CD4 data set is also presented as illustration.
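The inverse propensity weighting step mentioned above can be sketched as follows. This is a simplified illustration that takes the response propensities as given; in the paper they are estimated by the generalised method of moments, which is not reproduced here. The function name `ipw_mean` and the numbers are hypothetical, and the normalised (Hájek-type) form is an assumption.

```python
def ipw_mean(values, observed, propensities):
    """Normalised inverse-propensity-weighted mean of the observed values.

    values:       outcome per unit (ignored where the unit dropped out)
    observed:     True if the unit's outcome was observed
    propensities: probability of being observed, assumed known here
    """
    num = sum(y / p for y, r, p in zip(values, observed, propensities) if r)
    den = sum(1.0 / p for r, p in zip(observed, propensities) if r)
    return num / den


# Invented example: the second unit dropped out, so its value is unknown.
values = [1.0, None, 3.0, 4.0]
observed = [True, False, True, True]
propensities = [0.5, 0.5, 1.0, 0.25]
estimate = ipw_mean(values, observed, propensities)
```

Units that were unlikely to be observed receive larger weights, which compensates for similar units that dropped out.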

4.
Small area estimation typically requires model-based methods that depend on isolating the contribution to overall population heterogeneity associated with group (i.e. small area) membership. One way of doing this is via random effects models with latent group effects. Alternatively, one can use an M-quantile ensemble model that assigns indices to sampled individuals characterising their contribution to overall sample heterogeneity. These indices are then aggregated to form group effects. The aim of this article is to contrast these two approaches to characterising group effects and to illustrate them in the context of small area estimation. In doing so, we consider a range of different data types, including continuous data, count data and binary response data.

5.
In this article, we propose a new method for estimating the randomisation (design-based) mean squared error (DMSE) of model-dependent small area predictors. Analogously to classical survey sampling theory, the DMSE considers the finite population values as fixed numbers and accounts for the MSE of small area predictors over all possible sample selections. The proposed method models the true DMSE as computed for synthetic populations and samples drawn from them, as a function of known statistics and then applies the model to the original sample. Several simulation studies for the linear area-level model and the unit-level mixed logistic model illustrate the performance of the proposed method and compare it with the performance of other DMSE estimators proposed in the literature.

6.
The effective use of spatial information in a regression-based approach to small area estimation is an important practical issue. One approach to account for geographic information is to extend the linear mixed model to allow for spatially correlated random area effects. An alternative is to include the spatial information via non-parametric mixed models. Another option is geographically weighted regression, where the model coefficients vary spatially across the geography of interest. Although these approaches are useful for estimating small area means efficiently under strict parametric assumptions, they can be sensitive to outliers. In this paper, we propose robust extensions of the geographically weighted empirical best linear unbiased predictor. In particular, we introduce robust projective and predictive estimators under spatial non-stationarity. Mean squared error estimation is performed by two analytic approaches that account for the spatial structure in the data. Model-based simulations show that the methodology proposed often leads to more efficient estimators. Furthermore, the analytic mean squared error estimators introduced have appealing properties in terms of stability and bias. Finally, we demonstrate in the application that the new methodology is a good choice for producing estimates for average rent prices of apartments in urban planning areas in Berlin.

7.
When sampling a batch consisting of particulate material, the distribution of a sample estimator can be characterized using knowledge about the sample drawing process. With Bernoulli sampling, the number of particles in the sample is binomially distributed. Because this is rarely realized in practice, we propose a sampling design in which the possible samples have a nearly equal mass. Expected values and variances of the sample estimator are calculated. It is shown that the sample estimator becomes identical to the Horvitz–Thompson estimator in the case of a large batch-to-sample mass ratio and a large sample mass. Simulations and experiments were performed to test the theory. Simulations confirm that the round-off error due to the discrete nature of particles is negligible for large sample sizes. Sampling experiments were carried out with a mixture of polypropylene (PP) and polytetrafluoroethylene (PTFE) spheres suspended in a viscous medium. The measured and theoretical variations are in good agreement.
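As a rough illustration of the two reference points named in this abstract, the sketch below draws Bernoulli samples and applies the Horvitz–Thompson estimator of a total. It does not implement the nearly-equal-mass design proposed in the paper; the function names, the particle masses and the inclusion probability are all invented for the example.

```python
import random


def bernoulli_sample(population, prob, rng):
    """Include each unit independently with the same probability."""
    return [y for y in population if rng.random() < prob]


def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of a population total:
    each sampled value is divided by its inclusion probability."""
    return sum(y / p for y, p in zip(values, inclusion_probs))


rng = random.Random(0)
# Hypothetical particle masses for a batch of 2000 particles.
population = [rng.uniform(0.5, 1.5) for _ in range(2000)]
true_total = sum(population)
prob = 0.1

# Repeat the sampling to look at the average behaviour of the estimator.
estimates = []
for _ in range(500):
    sample = bernoulli_sample(population, prob, rng)
    estimates.append(horvitz_thompson_total(sample, [prob] * len(sample)))
avg_estimate = sum(estimates) / len(estimates)
```

Averaged over many draws, the estimates centre on the true batch total, which is the design-unbiasedness property of the Horvitz–Thompson estimator; the sample size itself varies from draw to draw, as the abstract notes for Bernoulli sampling.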

8.
Surveys usually include questions where individuals must select one in a series of possible options that can be sorted. On the other hand, multiple frame surveys are becoming a widely used method to decrease bias due to undercoverage of the target population. In this work, we propose statistical techniques for handling ordinal data coming from a multiple frame survey using complex sampling designs and auxiliary information. Our aim is to estimate proportions when the variable of interest has ordinal outcomes. Two estimators are constructed following model-assisted generalised regression and model calibration techniques. Theoretical properties are investigated for these estimators. Simulation studies with different sampling procedures are considered to evaluate the performance of the proposed estimators in finite size samples. An application to a real survey on opinions towards immigration is also included.

9.
Many phenomena in the life sciences can be analyzed by using a fixed design regression model with a regression function m that exhibits a crossing-point in the following sense: the regression function runs below or above its mean level, respectively, according as the input variable lies to the left or to the right of that crossing-point, or vice versa. We propose a non-parametric estimator and show weak and strong consistency as long as the crossing-point is unique. It is defined as the maximizing point (arg max) of a certain marked empirical process. For testing the hypothesis H0 that the regression function m actually is constant (no crossing-point), a decision rule is designed for the specific alternative H1 that m possesses a crossing-point. The pertaining test statistic is the ratio max/argmax of the maximum value and the maximizing point of the marked empirical process. Under the hypothesis, the ratio converges in distribution to the corresponding ratio of a reflected Brownian bridge, for which we derive the distribution function. The test is consistent on the whole alternative and superior to the corresponding Kolmogorov–Smirnov test, which is based only on the maximal value max. Some practical examples of possible applications are given, and a study of dental phobia is discussed in more detail.

10.
Social and economic studies are often implemented as complex survey designs. For example, multistage, unequal probability sampling designs utilised by federal statistical agencies are typically constructed to maximise the efficiency of the target domain level estimator (e.g. indexed by geographic area) within cost constraints for survey administration. Such designs may induce dependence between the sampled units; for example, with employment of a sampling step that selects geographically indexed clusters of units. A sampling-weighted pseudo-posterior distribution may be used to estimate the population model on the observed sample. The dependence induced between coclustered units inflates the scale of the resulting pseudo-posterior covariance matrix, which has been shown to induce undercoverage of the credibility sets. By bridging results across Bayesian model misspecification and survey sampling, we demonstrate that the scale and shape of the asymptotic distributions differ between the pseudo-maximum likelihood estimate (MLE), the pseudo-posterior and the MLE under simple random sampling. Through insights from survey-sampling variance estimation and recent advances in computational methods, we devise a correction applied as a simple and fast postprocessing step to Markov chain Monte Carlo draws of the pseudo-posterior distribution. This adjustment projects the pseudo-posterior covariance matrix such that the nominal coverage is approximately achieved. We make an application to the National Survey on Drug Use and Health as a motivating example and we demonstrate the efficacy of our scale and shape projection procedure on synthetic data on several common archetypes of survey designs.

11.
Raghunath Arnab. Metrika, 2001, 54(2): 159-177
The problems of estimating a population total in multi-character surveys are considered in a unified setup. Alternative estimators for Rao-Hartley-Cochran (1962), Midzuno-Sen (1952,53) and other varying probability sampling schemes are proposed when the measure of size is not well related to the study variables. Some of the proposed estimators are found superior to the existing alternatives. A numerical study is carried out to investigate the performance of the proposed alternatives.

12.
The use of auxiliary variables to improve the efficiency of estimators is a well-known strategy in survey sampling. Typically, the auxiliary variables used are the totals of appropriate measurements that are exactly known from registers or administrative sources. Increasingly, however, these totals are estimated from surveys and are then used to calibrate estimators and improve their efficiency. We consider different types of survey structures and develop design-based estimators that are calibrated on known as well as estimated totals of auxiliary variables. The optimality properties of these estimators are studied. These estimators can be viewed as extensions of the Montanari generalised regression estimator adapted to more complex situations. The paper studies interesting special cases to develop insights and guidelines for properly managing the survey-estimated auxiliary totals.

13.
Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images and so on are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorising the basic model types as linear, non-linear and non-parametric. We discuss publicly available software packages and illustrate some of the procedures by application to a functional magnetic resonance imaging data set.

14.
This article is concerned with inference on seemingly unrelated non-parametric regression models with serially correlated errors. Based on an initial estimator of the mean functions, we first construct an efficient estimator of the autoregressive parameters of the errors. Then, by applying an undersmoothing technique, and taking both the contemporaneous correlation among equations and the serial correlation into account, we propose an efficient two-stage local polynomial estimation for the unknown mean functions. It is shown that the resulting estimator has the same bias as those estimators which neglect the contemporaneous and/or serial correlation, but a smaller asymptotic variance. The asymptotic normality of the resulting estimator is also established. In addition, we develop a wild block bootstrap test for the goodness-of-fit of models. The finite sample performance of our procedures is investigated in a simulation study, whose results are very supportive, and a real data set is analysed to illustrate the usefulness of our procedures.

15.
In this paper we discuss the analysis of data from population-based case-control studies when there is appreciable non-response. We develop a class of estimating equations that are relatively easy to implement. For some important special cases, we also provide efficient semi-parametric maximum-likelihood methods. We compare the methods in a simulation study based on data from the Women's Cardiovascular Health Study discussed in Arbogast et al. (Estimating incidence rates from population-based case-control studies in the presence of non-respondents, Biometrical Journal 44, 227–239, 2002).

16.
The Effect of using Household as a Sampling Unit   Total citations: 1 (self-citations: 0, citations by others: 1)
The effect of sampling people through households is considered. Results on design effects for two-stage surveys are reviewed and applied to give design effects of household samples. The main factors that determine the design effect are identified for the designs in which one person, or all people, are selected from each selected household. Within-household correlation is one factor. We show that the relationships between household size and the mean and variance within households are also important factors. Census and survey data are used to empirically compare the design effects for a range of estimators, variables and designs.
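Two standard textbook approximations (usually attributed to Kish) give a feel for the factors listed above. They are shown here only as a sketch and are not the formulas derived in the paper: the first assumes every person in equally sized households is selected, the second captures only the loss from unequal weights, as when one person is picked per household. The function names and inputs are hypothetical.

```python
def deff_clustering(cluster_size, icc):
    """Approximate design effect when all members of equal-sized clusters
    are sampled: 1 + (m - 1) * rho, with rho the intra-cluster correlation."""
    return 1.0 + (cluster_size - 1.0) * icc


def deff_unequal_weights(weights):
    """Approximate design effect due to unequal weights:
    n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# All people in households of size 3 with within-household correlation 0.2.
deff_all = deff_clustering(3, 0.2)

# One person per household: the weight is proportional to household size.
household_sizes = [1, 2, 3, 4]
deff_one = deff_unequal_weights(household_sizes)
```

Selecting everyone in a household pays a price that grows with the within-household correlation, while selecting one person per household pays a price that grows with the variability of household sizes.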

17.
Single-index models are popular regression models that are more flexible than linear models and still maintain more structure than purely nonparametric models. We consider the problem of estimating the regression parameters under a monotonicity constraint on the unknown link function. In contrast to the standard approach of using smoothing techniques, we review different “non-smooth” estimators that avoid the difficult smoothing parameter selection. For about 30 years, it has been conjectured that the profile least squares estimator is a √n-consistent estimator of the regression parameter, but the only non-smooth argmin/argmax estimators that are actually known to achieve this √n-rate are not based on the nonparametric least squares estimator of the link function. However, solving a score equation corresponding to the least squares approach results in √n-consistent estimators. We illustrate the good behavior of the score approach via simulations. The connection with the binary choice and current status linear regression models is also discussed.

18.
In dynamic panel regression, when the variance ratio of individual effects to disturbance is large, the system-GMM estimator will have large asymptotic variance and poor finite sample performance. To deal with this variance ratio problem, we propose a residual-based instrumental variables (RIV) estimator, which uses the residual from regressing Δy_{i,t−1} on […] as the instrument for the level equation. The proposed RIV estimator is consistent and asymptotically normal under general assumptions. More importantly, its asymptotic variance is almost unaffected by the variance ratio of individual effects to disturbance. Monte Carlo simulations show that the RIV estimator has better finite sample performance than alternative estimators. The RIV estimator generates less finite sample bias than difference-GMM, system-GMM, collapsing-GMM and Level-IV estimators in most cases. Under RIV estimation, the variance ratio problem is well controlled, and the empirical distribution of its t-statistic is similar to the standard normal distribution for moderate sample sizes.

19.
Social and economic scientists are tempted to use emerging data sources like big data to compile information about finite populations as an alternative for traditional survey samples. These data sources generally cover an unknown part of the population of interest. Simply assuming that analyses made on these data are applicable to larger populations is wrong. The mere volume of data provides no guarantee for valid inference. Tackling this problem with methods originally developed for probability sampling is possible but shown here to be limited. A wider range of model-based predictive inference methods proposed in the literature are reviewed and evaluated in a simulation study using real-world data on annual mileages by vehicles. We propose to extend this predictive inference framework with machine learning methods for inference from samples that are generated through mechanisms other than random sampling from a target population. Describing economies and societies using sensor data, internet search data, social media and voluntary opt-in panels is cost-effective and timely compared with traditional surveys but requires an extended inference framework as proposed in this article.

20.
In this paper, we investigate certain operational and inferential aspects of the invariant Post-randomization Method (PRAM) as a tool for disclosure limitation of categorical data. Invariant PRAM preserves unbiasedness of certain estimators, but inflates their variances and distorts other attributes. We introduce the concept of strongly invariant PRAM, which does not affect data utility or the properties of any statistical method. However, the procedure seems feasible only in limited situations. We review methods for constructing invariant PRAM matrices and prove that a conditional approach, which can preserve the original data on any subset of variables, yields invariant PRAM. For multinomial sampling, we derive expressions for the variance inflation inflicted by invariant PRAM and the variances of certain estimators of the cell probabilities, and also their tight upper bounds. We discuss estimation of these quantities and, thereby, the assessment of the statistical efficiency loss from applying invariant PRAM. We find a connection between invariant PRAM and creating partially synthetic data using a non-parametric approach, and compare estimation variance under the two approaches. Finally, we discuss some aspects of invariant PRAM in a general survey context.
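One widely described way to build an invariant PRAM matrix is a two-stage construction: apply a base transition matrix R, then apply its Bayes reversal Q under the observed category frequencies, so that the product P = RQ leaves those frequencies unchanged in expectation. The sketch below illustrates that idea only; it is not the conditional approach analysed in the paper, and the function name `invariant_pram`, the frequencies and the base matrix are invented.

```python
def invariant_pram(freqs, base):
    """Two-stage construction of an invariant PRAM matrix P = R Q.

    freqs: category proportions t (must sum to 1)
    base:  base transition matrix R, base[i][l] = P(perturbed = l | original = i)
    Q is the Bayes reversal of R under t:
        q[l][i] = t[i] * R[i][l] / sum_j t[j] * R[j][l]
    """
    k = len(freqs)
    # Expected proportions after one application of the base matrix.
    s = [sum(freqs[i] * base[i][l] for i in range(k)) for l in range(k)]
    q = [[freqs[i] * base[i][l] / s[l] for i in range(k)] for l in range(k)]
    return [
        [sum(base[i][l] * q[l][j] for l in range(k)) for j in range(k)]
        for i in range(k)
    ]


# Invented example with three categories.
freqs = [0.5, 0.3, 0.2]
base = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
P = invariant_pram(freqs, base)

# Expected proportions after perturbation with P: t^T P.
after = [sum(freqs[i] * P[i][j] for i in range(len(freqs))) for j in range(len(freqs))]
```

Because the expected proportions after perturbation equal the original ones, the perturbed frequencies are unbiased for the original frequencies, at the cost of the extra variance discussed in the abstract.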
