Similar literature (20 results)
1.
Social and economic scientists are tempted to use emerging data sources like big data to compile information about finite populations as an alternative to traditional survey samples. These data sources generally cover an unknown part of the population of interest. Simply assuming that analyses made on these data are applicable to larger populations is wrong. The mere volume of data provides no guarantee of valid inference. Tackling this problem with methods originally developed for probability sampling is possible but is shown here to be limited. A wider range of model-based predictive inference methods proposed in the literature is reviewed and evaluated in a simulation study using real-world data on annual mileages by vehicles. We propose to extend this predictive inference framework with machine learning methods for inference from samples that are generated through mechanisms other than random sampling from a target population. Describing economies and societies using sensor data, internet search data, social media and voluntary opt-in panels is cost-effective and timely compared with traditional surveys, but it requires an extended inference framework as proposed in this article.
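A minimal sketch of the model-based predictive inference idea described above (an illustration, not the authors' implementation): fit a model on the covered, non-probability sample and predict the study variable for the uncovered units, assuming auxiliary variables are available for every unit in the population frame. The random forest and all variable names are assumptions of the sketch.

```python
# Sketch of model-based predictive inference from a non-probability sample.
# Assumptions (illustrative): auxiliary variables X are known for every unit in
# the population frame; y is observed only for the covered (non-probability) part.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def predict_population_total(X_sample, y_sample, X_noncovered):
    """Estimate the population total of y: observed part + model predictions."""
    model = RandomForestRegressor(n_estimators=500, random_state=0)
    model.fit(X_sample, y_sample)          # learn y | x on the covered units
    y_hat = model.predict(X_noncovered)    # predict y for the uncovered units
    return np.sum(y_sample) + np.sum(y_hat)
```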

2.
Non-response is a common source of error in many surveys. Because surveys are often costly instruments, quality-cost trade-offs play a continuing role in the design and analysis of surveys. Advances in telephone, computer and Internet technology have had, and continue to have, considerable impact on the design of surveys. Recently, a strong focus on methods for monitoring and tailoring survey data collection has emerged as a new paradigm for efficiently reducing non-response error. Paradata and adaptive survey designs are key words in these new developments. Prerequisites to evaluating, comparing, monitoring, and improving the quality of survey response are a conceptual framework for representative survey response, indicators to measure deviations from it, and indicators to identify subpopulations that need increased effort. In this paper, we present an overview of representativeness indicators, or R-indicators, that are fit for these purposes. We give several examples and provide guidelines for their use in practice.
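As a hedged illustration of the kind of representativeness indicator surveyed above: the commonly used R-indicator is based on the variability of estimated response propensities, R = 1 - 2 S(propensities). The propensity model and variable names below are assumptions of the sketch, not taken from the paper.

```python
# Sketch of an R-indicator computed from estimated response propensities.
# Assumption: 'aux' holds auxiliary variables known for respondents and
# non-respondents; 'responded' is the 0/1 response indicator.
import numpy as np
from sklearn.linear_model import LogisticRegression

def r_indicator(aux, responded):
    propensities = (LogisticRegression(max_iter=1000)
                    .fit(aux, responded)
                    .predict_proba(aux)[:, 1])   # estimated response propensities
    # R-indicator: 1 - 2 * standard deviation of the propensities;
    # values near 1 indicate response that is representative w.r.t. the model.
    return 1.0 - 2.0 * propensities.std(ddof=1)
```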

3.
Social and economic studies are often implemented as complex survey designs. For example, multistage, unequal probability sampling designs utilised by federal statistical agencies are typically constructed to maximise the efficiency of the target domain level estimator (e.g. indexed by geographic area) within cost constraints for survey administration. Such designs may induce dependence between the sampled units, for example, through a sampling step that selects geographically indexed clusters of units. A sampling-weighted pseudo-posterior distribution may be used to estimate the population model on the observed sample. The dependence induced between co-clustered units inflates the scale of the resulting pseudo-posterior covariance matrix, which has been shown to induce undercoverage of the credible sets. By bridging results across Bayesian model misspecification and survey sampling, we demonstrate that the scale and shape of the asymptotic distributions are different between each of the pseudo-maximum likelihood estimate (MLE), the pseudo-posterior and the MLE under simple random sampling. Through insights from survey-sampling variance estimation and recent advances in computational methods, we devise a correction applied as a simple and fast postprocessing step to Markov chain Monte Carlo draws of the pseudo-posterior distribution. This adjustment projects the pseudo-posterior covariance matrix such that the nominal coverage is approximately achieved. We use the National Survey on Drug Use and Health as a motivating application and demonstrate the efficacy of our scale and shape projection procedure on synthetic data for several common archetypes of survey designs.
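For orientation, a sampling-weighted pseudo-posterior of the kind referred to above typically exponentiates each unit's likelihood contribution by its normalised sampling weight. The notation below is a standard formulation and should be read as a sketch rather than the paper's exact definition:

\[
\hat{\pi}(\theta \mid \mathbf{y}, \tilde{\mathbf{w}}) \;\propto\; \left[ \prod_{i \in S} f(y_i \mid \theta)^{\tilde{w}_i} \right] \pi(\theta),
\qquad \tilde{w}_i = n \, \frac{w_i}{\sum_{j \in S} w_j},
\]

where \(w_i\) is the inverse of unit \(i\)'s inclusion probability and the weights are normalised to sum to the sample size \(n\) so that the information in the data is not artificially inflated.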

4.
This paper provides a systematic review of the literature on 80 experimental, hypothetical survey and market data studies of insurance demand against low-probability/high-impact risks. The objective of the review is to extract lessons from these studies and to outline an agenda for future research. We contrast the results of experimental and survey studies with findings from market data. We focus on experimental design methods and insurance characteristics, as well as results about theories, heuristics, behavioural biases and explanatory variables. Lessons are drawn for policymakers that can facilitate disaster preparedness.

5.
6.
Statistical agencies often release a masked or perturbed version of survey data to protect the confidentiality of respondents' information. Ideally, a perturbation procedure should provide confidentiality protection without much loss of data quality, so that the released data may practically be treated as original data for making inferences. One major objective is to control the risk of correctly identifying any respondent's records in released data by matching the values of some identifying or key variables. For categorical key variables, we propose a new approach to measuring identification risk and setting strict disclosure control goals. The general idea is to ensure that the probability of correctly identifying any respondent or surveyed unit is at most ξ, which is pre-specified. We then develop an unbiased post-randomisation procedure that achieves this goal for ξ > 1/3. The procedure allows substantial control over possible changes to the original data, and the variance it induces is of a lower order of magnitude than sampling variance. We apply the procedure to a real data set, where it performs consistently with the theoretical results and, importantly, shows very little loss of data quality.
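As a hedged illustration of post-randomisation for a categorical key variable (a generic PRAM-style sketch, not the authors' specific procedure): each released value is drawn from a transition matrix P, and the original category frequencies can be recovered unbiasedly by inverting P. The 3x3 matrix below is an illustrative assumption.

```python
# Generic post-randomisation (PRAM) sketch for one categorical key variable.
# P[i, j] = probability that true category i is released as category j.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])   # illustrative transition matrix

def perturb(categories):
    """Release a perturbed copy of integer-coded categories 0..K-1."""
    return np.array([rng.choice(P.shape[1], p=P[c]) for c in categories])

def unbiased_counts(released):
    """Moment-based estimate of the original category counts from released data."""
    observed = np.bincount(released, minlength=P.shape[0]).astype(float)
    return np.linalg.solve(P.T, observed)   # solves P' t = observed counts
```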

7.
Surveys usually include questions in which individuals must select one of a series of options that can be ordered. At the same time, multiple frame surveys are becoming a widely used method for decreasing bias due to undercoverage of the target population. In this work, we propose statistical techniques for handling ordinal data coming from a multiple frame survey, using complex sampling designs and auxiliary information. Our aim is to estimate proportions when the variable of interest has ordinal outcomes. Two estimators are constructed following model-assisted generalised regression and model calibration techniques. Theoretical properties are investigated for these estimators. Simulation studies with different sampling procedures are used to evaluate the performance of the proposed estimators in finite-size samples. An application to a real survey on opinions towards immigration is also included.
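To fix ideas, a generic model-assisted (GREG-type) estimator of the proportion in ordinal category \(j\) has the form below; this is a standard template under an assumed assisting model, not necessarily the exact estimator developed in the paper:

\[
\hat{P}^{\mathrm{GREG}}_{j} \;=\; \frac{1}{N}\left( \sum_{k \in U} \hat{\mu}_{jk} \;+\; \sum_{k \in s} w_k \bigl( z_{jk} - \hat{\mu}_{jk} \bigr) \right),
\]

where \(z_{jk}\) indicates that unit \(k\) belongs to category \(j\), \(\hat{\mu}_{jk}\) is the predicted probability from an assisting model (for example, an ordinal logistic model on the auxiliary variables), \(w_k\) are the design weights, and \(U\) and \(s\) denote the population and the sample.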

8.
Christofides (2003) has given an improved modification of Warner's (1965) pioneering randomized response (RR) technique for estimating an unknown proportion of people bearing a sensitive characteristic in a given community. Both of these RR devices have been shown to yield unbiased estimators only under simple random sampling (SRS) with replacement (WR), but in practice samples are mostly taken with unequal selection probabilities and without replacement (WOR); here we present methods of estimation when Christofides' RR data are available from unequal probability samples. Warner's (1965) RR device was earlier shown by Chaudhuri (2001) to be applicable in complex surveys. For completeness, we present estimators for the variance of our estimator and also describe what to do if some people opt to divulge the truth. This research is partially supported by CSIR grant No. 21(0539)/02/EMR-II.
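For context, Warner's original device asks each respondent, with probability \(p\), whether they bear the sensitive trait and, with probability \(1-p\), whether they do not; under SRSWR the standard unbiased estimator is the textbook expression below, included only as background (the paper's own estimators for unequal probability sampling are more involved):

\[
\hat{\pi}_A \;=\; \frac{\hat{\lambda} - (1 - p)}{2p - 1}, \qquad p \neq \tfrac{1}{2},
\]

where \(\hat{\lambda}\) is the observed proportion of "yes" answers.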

9.
This paper is a review of some applications of the combination of data sets, such as combining census or administrative data and survey data, constructing expanded data sets through linkage, combining large-scale commercial databases with survey data, and harnessing designed data collection to be able to make use of non-probability samples. The aim is to highlight their commonalities and differences and to formulate some general principles for data set combination.

10.
Spatially distributed data exhibit particular characteristics that should be considered when designing a survey of spatial units. Unfortunately, traditional sampling designs generally do not allow for spatial features, even though it is usually desirable to use information about spatial dependence in a sampling design. This paper reviews and compares some recently developed randomised spatial sampling procedures, using simple random sampling without replacement as a benchmark for comparison. The approach taken is design-based and serves to corroborate intuitive arguments about the need to explicitly integrate spatial dependence into survey sampling theory. Some guidance for choosing an appropriate spatial sampling design is provided, and some empirical evidence for the gains from using these designs with spatial populations is presented, using two datasets as illustrations.
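A minimal design-based illustration of the kind of gain such designs aim for, using generic spatial stratification on a simulated spatially autocorrelated surface; the population model and the one-unit-per-stratum design are illustrative assumptions, not the procedures reviewed in the paper.

```python
# Compare SRSWOR with a simple spatially stratified design on a synthetic
# spatially autocorrelated population laid out on a 20 x 20 grid.
import numpy as np

rng = np.random.default_rng(1)
side = 20
xx, yy = np.meshgrid(np.arange(side), np.arange(side))
# smooth spatial trend + noise -> spatially dependent study variable
y = (np.sin(xx / 4.0) + np.cos(yy / 5.0) + rng.normal(0, 0.3, (side, side))).ravel()
strata = ((xx // 5) * 4 + (yy // 5)).ravel()   # 16 spatial strata of 25 units each

def srs_mean():
    return y[rng.choice(y.size, 16, replace=False)].mean()

def stratified_mean():
    idx = [rng.choice(np.flatnonzero(strata == h)) for h in range(16)]
    return y[idx].mean()                       # one unit per (equal-size) stratum

reps = 2000
print("SRS variance:       ", np.var([srs_mean() for _ in range(reps)]))
print("Stratified variance:", np.var([stratified_mean() for _ in range(reps)]))
```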

11.
The World Wide Web (WWW) is increasingly being used as a tool and platform for survey research. Two types of electronic or online surveys available for data collection, the email survey and the Web-based survey, constitute the focus of this paper. We address a multitude of issues researchers should consider before and during the use of this method of data collection: the advantages and liabilities of this form of survey research, sampling problems, questionnaire design considerations, suggestions for approaching potential respondents, response rates, and aspects of data processing. Where relevant, the methodological issues involved are illustrated with examples from our own research practice. This methods review shows that most challenges are resolved by taking into account the principles that guide the conduct of conventional surveys.

12.
Survey calibration (or generalized raking) estimators are a standard approach to the use of auxiliary information in survey sampling, improving on the simple Horvitz–Thompson estimator. In this paper we relate the survey calibration estimators to the semiparametric incomplete-data estimators of Robins and coworkers, and to adjustment for baseline variables in a randomized trial. The development based on calibration estimators explains the “estimated weights” paradox and provides useful heuristics for constructing practical estimators. We present some examples of using calibration to gain precision without making additional modelling assumptions in a variety of regression models.
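A hedged sketch of linear (GREG-type) calibration: starting from design weights d, adjust them so the weighted auxiliary totals match known population totals. The chi-square distance and closed-form solution below are the standard textbook case, shown only for illustration.

```python
# Linear (chi-square distance) calibration of design weights d to known
# auxiliary totals t_x, so that sum_i w_i * x_i = t_x exactly.
import numpy as np

def calibrate_weights(d, X, t_x):
    """d: (n,) design weights; X: (n, p) auxiliaries; t_x: (p,) population totals."""
    resid = t_x - d @ X                        # gap between HT totals and benchmarks
    lam = np.linalg.solve(X.T @ (d[:, None] * X), resid)
    return d * (1.0 + X @ lam)                 # calibrated weights

# Example: constrain the weighted count (intercept) and one covariate total.
d = np.array([10.0, 10.0, 10.0, 10.0])
X = np.column_stack([np.ones(4), np.array([2.0, 4.0, 6.0, 8.0])])
t_x = np.array([42.0, 220.0])
w = calibrate_weights(d, X, t_x)
print(w @ X)   # reproduces t_x up to floating point error
```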

13.
14.
With the rapid, ongoing expansions in the world of data, we need to devise ways of getting more students much further, much faster. One of the choke points affecting both accessibility to a broad spectrum of students and faster progress is classical statistical inference based on normal theory. In this paper, bootstrap-based confidence intervals and randomisation tests conveyed through dynamic visualisation are developed as a means of reducing cognitive demands and increasing the speed with which application areas can be opened up. We also discuss conceptual pathways and the design of software developed to enable this approach.
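A short sketch of the two resampling ideas mentioned above, a bootstrap percentile confidence interval for a mean and a randomisation test for a difference in means; the data and the number of resamples are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(x, reps=10_000, level=0.95):
    """Percentile bootstrap confidence interval for the mean of x."""
    means = [rng.choice(x, size=x.size, replace=True).mean() for _ in range(reps)]
    return tuple(np.quantile(means, [(1 - level) / 2, (1 + level) / 2]))

def randomisation_test(a, b, reps=10_000):
    """Two-sided re-randomisation p-value for a difference in means."""
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(reps):
        perm = rng.permutation(pooled)
        diff = perm[:a.size].mean() - perm[a.size:].mean()
        count += abs(diff) >= abs(observed)
    return count / reps
```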

15.
This article provides a practical evaluation of some leading density forecast scoring rules in the context of forecast surveys. We analyse the density forecasts of UK inflation obtained from the Bank of England’s Survey of External Forecasters, considering both the survey average forecasts published in the Bank’s quarterly Inflation Report, and the individual survey responses recently made available to researchers by the Bank. The density forecasts are collected in histogram format, and the ranked probability score (RPS) is shown to have clear advantages over other scoring rules. Missing observations are a feature of forecast surveys, and we introduce an adjustment to the RPS, based on the Yates decomposition, to improve its comparative measurement of forecaster performance in the face of differential non-response. The new measure, denoted RPS*, is recommended to analysts of forecast surveys.
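For reference, the ranked probability score for a histogram forecast with K ordered bins compares cumulative forecast probabilities with the cumulative indicator of the realised bin. The short sketch below is the standard definition, not the paper's adjusted RPS*.

```python
# Ranked probability score for a single histogram (binned) density forecast.
# probs: forecast probabilities over K ordered bins; outcome_bin: index of the
# bin in which the realised value fell. Lower scores are better.
import numpy as np

def ranked_probability_score(probs, outcome_bin):
    probs = np.asarray(probs, dtype=float)
    cum_forecast = np.cumsum(probs)
    cum_outcome = (np.arange(probs.size) >= outcome_bin).astype(float)
    return np.sum((cum_forecast - cum_outcome) ** 2)

# Example: a five-bin inflation forecast where the outcome falls in bin 2.
print(ranked_probability_score([0.1, 0.2, 0.4, 0.2, 0.1], outcome_bin=2))
```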

16.
We study the generalized bootstrap technique under general sampling designs. We focus mainly on bootstrap variance estimation but we also investigate the empirical properties of bootstrap confidence intervals obtained using the percentile method. Generalized bootstrap consists of randomly generating bootstrap weights so that the first two (or more) design moments of the sampling error are tracked by the corresponding bootstrap moments. Most bootstrap methods in the literature can be viewed as special cases. We discuss issues such as the choice of the distribution used to generate bootstrap weights, the choice of the number of bootstrap replicates, and the potential occurrence of negative bootstrap weights. We first describe the generalized bootstrap for the linear Horvitz-Thompson estimator and then consider non-linear estimators such as those defined through estimating equations. We also develop two ways of bootstrapping the generalized regression estimator of a population total. We study in greater depth the case of Poisson sampling, which is often used to select samples in Price Index surveys conducted by national statistical agencies around the world. For Poisson sampling, we consider a pseudo-population approach and show that the resulting bootstrap weights capture the first three design moments of the sampling error. A simulation study and an example with real survey data are used to illustrate the theory.
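A hedged sketch of the generalized bootstrap idea for the Horvitz-Thompson total under Poisson sampling: multiply each design weight by a random adjustment with mean one whose variance is chosen so that the bootstrap variance of the total reproduces the usual Poisson-sampling variance estimator. The normal adjustments below are one illustrative choice and can occasionally produce negative weights, one of the issues discussed in the paper.

```python
# Generalized-bootstrap-style variance estimation for the Horvitz-Thompson total
# under Poisson sampling. Adjustment factors a_i satisfy E(a_i)=1 and
# Var(a_i)=1-pi_i, so Var*(sum w_i a_i y_i) = sum (1-pi_i) y_i^2 / pi_i^2,
# the standard Poisson-sampling variance estimator. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(0)

def generalized_bootstrap_variance(y, pi, reps=5000):
    w = 1.0 / pi                                        # design weights
    totals = np.empty(reps)
    for b in range(reps):
        a = rng.normal(loc=1.0, scale=np.sqrt(1.0 - pi))   # may go negative
        totals[b] = np.sum(w * a * y)
    return totals.var(ddof=1)

y = np.array([12.0, 7.5, 20.0, 3.2, 15.1])
pi = np.array([0.3, 0.5, 0.4, 0.2, 0.6])
print(generalized_bootstrap_variance(y, pi))
print(np.sum((1 - pi) * y**2 / pi**2))                  # analytic comparison
```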

17.
This paper presents Bayesian inference procedures for the continuous time mover–stayer model applied to labour market transition data collected in discrete time. These methods allow us to derive the probability that the discrete-time model is embeddable in the continuous-time one. Special emphasis is placed on two alternative procedures, namely the importance sampling algorithm and a new Gibbs sampling algorithm. Transition intensities, proportions of stayers and functions of these parameters are then estimated with the Gibbs sampling algorithm for individual transition data coming from the French Labour Force Surveys collected over the period 1986–2000. Copyright © 2003 John Wiley & Sons, Ltd.
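For orientation, a standard formulation of the continuous-time mover-stayer model (a sketch of the general setup, not necessarily the paper's exact parameterisation) writes the transition matrix over an interval of length \(t\) as a mixture of stayers and Markov movers:

\[
\mathbf{P}(t) \;=\; \mathbf{S} \;+\; \bigl(\mathbf{I} - \mathbf{S}\bigr)\exp\!\bigl(t\,\mathbf{Q}\bigr),
\]

where \(\mathbf{S} = \operatorname{diag}(s_1,\dots,s_K)\) contains the proportions of stayers in each state and \(\mathbf{Q}\) is the transition intensity matrix governing the movers. The embeddability question is whether an observed discrete-time transition matrix can be written in this form for some valid \(\mathbf{Q}\).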

18.
In analysing big data for finite population inference, it is critical to adjust for the selection bias in the big data. In this paper, we propose two methods of reducing the selection bias associated with the big data sample. The first method uses a version of inverse sampling by incorporating auxiliary information from external sources, and the second one borrows the idea of data integration by combining the big data sample with an independent probability sample. Two simulation studies show that the proposed methods are unbiased and have better coverage rates than their alternatives. In addition, the proposed methods are easy to implement in practice.
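One generic way to operationalise the data-integration idea (a hedged sketch of a propensity-weighting approach in the same spirit, not necessarily the authors' estimator): stack the big data sample with a reference probability sample, estimate each unit's probability of appearing in the big data source, and weight the big data units accordingly. The logistic model and variable names are illustrative assumptions.

```python
# Hedged sketch: propensity-adjusted estimation from a big (non-probability)
# sample using a reference probability sample with design weights w_ref.
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_weighted_mean(x_big, y_big, x_ref, w_ref):
    # Label 1 = big data unit, 0 = reference-sample unit; reference units carry
    # their design weights so that they represent the full population.
    X = np.vstack([x_big, x_ref])
    labels = np.concatenate([np.ones(len(x_big)), np.zeros(len(x_ref))])
    weights = np.concatenate([np.ones(len(x_big)), w_ref])
    model = LogisticRegression(max_iter=1000).fit(X, labels, sample_weight=weights)
    p = model.predict_proba(x_big)[:, 1]
    w_big = (1.0 - p) / p          # pseudo-weights: odds of not being in big data
    return np.sum(w_big * y_big) / np.sum(w_big)
```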

19.
The objective of this paper is to define and compare alternative sampling frames for representative population coverage as a basis for sample selection in internet surveys. The study aims to provide a methodology for domain weighting and adjustment procedures for free-access web surveys, based on restricted-access surveys. Some basic variables can be proposed for the data adjustment, namely gender, age groups, and education groups. The application consists of a first stage based on a web survey by e-mail invitation (restricted access) and a second stage based on a voluntary participation web survey (free access). An advertising company’s registered customer list was taken as the sampling frame population for the first stage; this frame was an electronic e-mail list of the population of registered customers. Two different types of questionnaire were loaded on the company’s internet web site for a month each, in two independent rounds, to test the visual aspects of the questionnaire design. The restricted-access internet survey design relies on probability selection procedures in this study, and its results are used with the provided algorithms for the adjustment procedures of free-access web surveys. A new methodology is also proposed for the estimation and allocation of the population frame characteristics of adult internet users by gender and age groups. The proposed alternative methodologies will be useful tools for future web survey users.
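A minimal sketch of the kind of adjustment such a methodology implies: raking (iterative proportional fitting) of respondent weights to known margins for variables such as gender, age group and education. The margins and data layout are illustrative assumptions, not the paper's algorithm.

```python
# Raking (iterative proportional fitting) of unit weights to known population
# margins for a set of categorical adjustment variables.
import numpy as np

def rake(categories, margins, iters=50):
    """categories: dict name -> (n,) integer codes per respondent;
    margins: dict name -> (K,) known population counts per category."""
    n = len(next(iter(categories.values())))
    w = np.ones(n)
    for _ in range(iters):
        for name, codes in categories.items():
            target = margins[name]
            current = np.bincount(codes, weights=w, minlength=target.size)
            w *= (target / current)[codes]      # scale each cell to its margin
    return w

# Illustrative example with two adjustment variables.
cats = {"gender": np.array([0, 0, 1, 1, 1]),
        "agegrp": np.array([0, 1, 0, 1, 1])}
margs = {"gender": np.array([50.0, 50.0]),
         "agegrp": np.array([40.0, 60.0])}
print(rake(cats, margs))
```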

20.
In most surveys, one is confronted with missing or, more generally, coarse data. Traditional methods for dealing with these data require strong, untestable and often doubtful assumptions, for example, coarsening at random. But because of the resulting, potentially severe bias, there is growing interest in approaches that include only tenable knowledge about the coarsening process, leading to imprecise but reliable results. In this spirit, we study regression analysis with a coarse categorical dependent variable and precisely observed categorical covariates. Our (profile) likelihood-based approach can incorporate weak knowledge about the coarsening process and thus offers a synthesis of traditional methods and cautious strategies that refrain from any coarsening assumptions. This also allows a discussion of the uncertainty about the coarsening process, in addition to sampling uncertainty and model uncertainty. Our procedure is illustrated with data from the panel study ‘Labour market and social security' conducted by the Institute for Employment Research, whose questionnaire design produces coarse data.
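As background for the likelihood-based approach sketched above, the observed-data likelihood for a coarsely observed categorical outcome typically takes the following form (standard coarse-data notation, given as an illustration rather than the paper's exact model):

\[
L(\theta, q) \;=\; \prod_{i=1}^{n} \;\sum_{y \in A_i} q\bigl(A_i \mid y, x_i\bigr)\, p\bigl(y \mid x_i; \theta\bigr),
\]

where \(A_i\) is the set of categories compatible with the coarse observation for unit \(i\), \(p(\cdot \mid x_i;\theta)\) is the regression model of interest, and \(q\) describes the coarsening process. Coarsening at random assumes that \(q(A_i \mid y, x_i)\) does not depend on which \(y \in A_i\) is true; the cautious strategies referred to above instead restrict \(q\) only weakly, which yields a set of (profile) likelihoods and hence imprecise but reliable conclusions.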

