Similar Articles
20 similar articles found.
1.
A typical Business Register (BR) is mainly based on administrative data files provided by organisations that produce them as a by-product of their function. Such files do not necessarily yield a perfect Business Register. A good BR should have the following characteristics: (1) It should reflect the complex structures of businesses with multiple activities, in multiple locations or with multiple legal entities; (2) It should be free of duplication, extraneous or missing units; (3) It should be properly classified in terms of key stratification variables, including size, geography and industry; (4) It should be easily updateable to represent the "newer" business picture, and not lag too much behind it. In reality, not all these desirable features are fully satisfied, resulting in a universe that has missing units, inaccurate structures, as well as improper contact information, to name a few defects.
These defects can be compensated for by using sampling and estimation procedures. For example, coverage can be improved using multiple frame techniques, and the sample size can be increased to account for misclassification of units and deaths on the register. At the time of estimation, auxiliary information can be used in a variety of ways. It can be used to impute missing variables, to treat outliers, or to create synthetic variables obtained via modelling. Furthermore, time lags between the birth of units and the time that they are included on the register can be accounted for by appropriately inflating the design-based estimates.

2.
The most common way of treating item non-response in surveys is to construct one or more replacement values to fill in for a missing value. This process is known as imputation. We distinguish single from multiple imputation. Single imputation consists of replacing a missing value by a single replacement value, whereas multiple imputation uses two or more replacement values. This article reviews various imputation procedures used in National Statistical Offices as well as the properties of point and variance estimators in the presence of imputed survey data. It also provides the reader with newer developments in the field.
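As a rough sketch of how the point and variance estimators mentioned above are combined across multiply imputed datasets, Rubin's combining rules can be written in a few lines; the estimates and variances below are invented for illustration, not taken from any survey discussed here:

```python
import statistics

def rubin_combine(estimates, variances):
    """Combine m completed-data estimates under Rubin's rules:
    point estimate = mean of the m estimates,
    total variance = W + (1 + 1/m) * B,
    where W is the mean within-imputation variance and
    B is the between-imputation variance of the estimates."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)
    w = statistics.mean(variances)       # within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance
    total = w + (1 + 1 / m) * b
    return q_bar, total

# Three completed datasets gave slightly different estimates of a mean.
est, var = rubin_combine([10.0, 10.4, 9.6], [0.25, 0.25, 0.25])
```

The between-imputation term B is what single imputation discards, which is why single-imputation variance estimators tend to understate uncertainty.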

3.
The psychometric literature contains many indices to detect aberrant respondents. A different, promising approach uses ordered latent class analysis to distinguish latent classes of respondents that are scalable from latent classes of respondents that are not scalable (i.e., aberrant) according to the scaling model adopted. This article examines seven latent class models for a cumulative scale. A simulation study was performed to study the efficacy of the different models for data that follow the scale model perfectly. A second simulation study was performed to study how well these models detect aberrant respondents.

4.
Item response theory (IRT) has recently been proposed as a framework to measure deprivation. It allows a latent measure of deprivation to be derived from a set of dichotomous items indicating deprivation, and the determinants of deprivation to be analysed. We investigate further the use of IRT models in the field of deprivation measurement. First, the paper emphasises the importance of item selection, and the Mokken Scale Procedure is applied to select the items to be included in the scale of deprivation. Second, we apply the one- and the two-parameter probit IRT models for dichotomous items to two different sets of items, in order to highlight different empirical results. Finally, we introduce a graphical tool, the Item Characteristic Curve (ICC), and analyse the determinants of deprivation in Luxembourg. The empirical illustration is based on the fourth wave of the “Liewen zu Lëtzebuerg” Luxembourg socioeconomic panel (PSELL-3).
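The item characteristic curve in the two-parameter probit model has a simple closed form: the probability of reporting deprivation on an item is Φ(a(θ − b)), where θ is the latent trait, a the discrimination and b the item severity. A minimal sketch (parameter values are illustrative):

```python
import math

def probit_icc(theta, a, b):
    """Two-parameter probit item characteristic curve:
    P(item indicates deprivation | theta) = Phi(a * (theta - b)),
    with discrimination a and item severity (difficulty) b.
    Phi is computed from the error function."""
    z = a * (theta - b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

At θ = b the curve passes through 0.5, and a larger discrimination a makes the curve steeper around b; the one-parameter model is the special case of a common a across items.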

5.
The missing data problem has been widely addressed in the literature. The traditional methods for handling missing data may be not suited to spatial data, which can exhibit distinctive structures of dependence and/or heterogeneity. As a possible solution to the spatial missing data problem, this paper proposes an approach that combines the Bayesian Interpolation method [Benedetti, R. & Palma, D. (1994) Markov random field-based image subsampling method, Journal of Applied Statistics, 21(5), 495–509] with a multiple imputation procedure. The method is developed in a univariate and a multivariate framework, and its performance is evaluated through an empirical illustration based on data related to labour productivity in European regions.

6.
Imputation: Methods, Simulation Experiments and Practical Examples
When conducting surveys, two kinds of nonresponse may cause incomplete data files: unit nonresponse (complete nonresponse) and item nonresponse (partial nonresponse). The selectivity of the unit nonresponse is often corrected for. Various imputation techniques can be used for the missing values caused by item nonresponse. Several of these imputation techniques are discussed in this report. One is hot deck imputation. This paper describes two simulation experiments with the hot deck method. In the first study, data are randomly generated, and various percentages of missing values are then non-randomly 'added' to the data. The hot deck method is used to reconstruct the data in this Monte Carlo experiment. The performance of the method is evaluated for the means, standard deviations, and correlation coefficients and compared with the available case method. In the second study, the quality of an imputation method is studied by running a simulation experiment. A selection of the data of the Dutch Housing Demand Survey is perturbed by leaving out specific values on a variable. Again hot deck imputations are used to reconstruct the data. The imputations are then compared with the true values. In both experiments the conclusion is that the hot deck method generally performs better than the available case method. This paper also addresses the questions of which variables should be imputed and how long the imputation process takes. Finally, the theory is illustrated by the imputation approaches of the Dutch Housing Demand Survey, the European Community Household Panel Survey (ECHP) and the new Dutch Structure of Earnings Survey (SES). These examples illustrate the levels of missing data that can be experienced in such surveys and the practical problems associated with choosing an appropriate imputation strategy for key items from each survey.
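A minimal sketch of hot deck imputation, drawing each missing value from an observed "donor" record within the same imputation class; the field and class names are invented for illustration, not taken from the Dutch Housing Demand Survey:

```python
import random

def hot_deck_impute(records, field, class_field, seed=0):
    """Fill missing `field` values (None) by drawing a donor value
    at random from observed records in the same imputation class."""
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r[field] is not None:
            donors.setdefault(r[class_field], []).append(r[field])
    out = []
    for r in records:
        r = dict(r)  # leave the input records untouched
        if r[field] is None:
            r[field] = rng.choice(donors[r[class_field]])
        out.append(r)
    return out

households = [
    {"region": "N", "rent": 700}, {"region": "N", "rent": None},
    {"region": "S", "rent": 450}, {"region": "S", "rent": 500},
    {"region": "S", "rent": None},
]
completed = hot_deck_impute(households, "rent", "region")
```

Because imputed values are real observed values, marginal distributions stay plausible; the available case method, by contrast, simply drops the missing entries from each computation.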

7.
This article applies the testing procedures for measurement invariance using multigroup confirmatory factor analysis (MGCFA). It illustrates these procedures by investigating the factorial structure and invariance of the Portraits Value Questionnaire (PVQ; Schwartz et al.: J. Cross Cult. Psychol. 32(5), 519–542 (2001)) across three education groups in a population sample (N = 1,677). The PVQ measures 10 basic values that Schwartz postulates to comprehensively describe the human values recognized in all societies (achievement, hedonism, self-direction, benevolence, conformity, security, stimulation, power, tradition and universalism). We also estimate and compare the latent means of the three education groups. The analyses show partial invariance for most of the 10 values and parameters. As expected, the latent means show that less educated respondents attribute more importance to security, tradition, and conformity values.

8.
This paper explores the possibilities of method triangulation between two methodological approaches for assessing the validity performance of survey items: cognitive interviewing and factor analytic techniques. Although their means of approaching validity differ, both methods attempt to assess whether a measure corresponds to a theoretical (latent) concept (e.g. patriotism vs. nationalism); thus both are concerned with whether an indicator measures what it is supposed to measure. Based on two representative samples for Austria [data gathered within the framework of the International Social Survey Program (ISSP) on National Identity in 1995 and 2003] and 18 cognitive interviews conducted between 2003 and 2005, the paper shows the considerable advantages of using a multi-method approach for ensuring the quality of survey items. On the one hand, we apply exploratory and confirmatory factor analysis in order to identify poorly performing indicators with regard to validity and reliability. On the other hand, the analysis of the cognitive interviews reveals the substantial sources of response error. Results show that, to a large extent, respondents do not understand the items that have been defined to measure national identification and related concepts in Austria in the way intended by the drafting group of this ISSP module, a fact that has considerable implications for the scales' predictive power.

9.
Dynamic factor models have been the main “big data” tool used by empirical macroeconomists during the last 30 years. In this context, Kalman filter and smoothing (KFS) procedures can cope with missing data, mixed frequency data, time-varying parameters, non-linearities, non-stationarity, and many other characteristics often observed in real systems of economic variables. The main contribution of this paper is to provide a comprehensive updated summary of the literature on latent common factors extracted using KFS procedures in the context of dynamic factor models, pointing out their potential limitations. Signal extraction and parameter estimation issues are separately analyzed. Identification issues are also tackled in both stationary and non-stationary models. Finally, empirical applications are surveyed in both cases. This survey is relevant to researchers and practitioners interested not only in the theory of KFS procedures for factor extraction in dynamic factor models but also in their empirical application in macroeconomics and finance.

10.
The article reports the results of a Mokken Scale Procedure (MSP) developing a hierarchical cross-national scale to measure xenophobia, and a qualitative validation of this scale. A pool of 30 xenophobic scale items was collected from several sources and edited according to established unidimensional criteria. The survey was administered to 608 undergraduate students in the USA, 193 in the Netherlands, and 303 in Norway. Fourteen scale statements measuring perceived threat or fear and meeting the criteria of the Stereotype Content Model (e.g., Fiske et al. in Trends Cogn Sci 11:77–83, 2006) were selected for further analysis. A separate item analysis and a subsequent MSP analysis yielded a cumulative scale with the same five items for each of the three samples, meeting the criteria for homogeneity in all samples with H > .40. The result, a cross-national 5-item scale measuring fear-based xenophobia, was tested by means of the Three-Step Test-Interview (Hak et al. in Surv Res Methods 2:143–150, 2008) with 10 students in the Netherlands and 10 students in Norway. The analysis of these qualitative interviews shows that individual respondents' criteria for the ranking of the scale items strongly depend on the way immigrants are framed. Ranking according to different levels of fear turned out to be only one criterion out of several possible ones used by individual respondents.

11.
Incomplete data is a common problem of survey research. Recent work on multiple imputation techniques has increased analysts' awareness of the biasing effects of missing data and has also provided a convenient solution. Imputation methods replace non-response with estimates of the unobserved scores. In many instances, however, non-response to a stimulus does not result from measurement problems that inhibit accurate surveying of empirical reality, but from the inapplicability of the survey question. In such cases, existing imputation techniques replace valid non-response with counterfactual estimates of a situation in which the stimulus is applicable to all respondents. This paper suggests an alternative imputation procedure for incomplete data for which no true score exists: multiple complete random imputation, which overcomes the biasing effects of missing data and allows analysts to model respondents' valid 'I don't know' answers.

12.
In an important paper, Dempster, Laird and Rubin (1977) showed how the expectation maximization (EM) algorithm could be used to obtain maximum likelihood estimates of parameters in a multinomial probability model with missing information. This article extends Dempster, Laird and Rubin's work on the EM algorithm to the estimation of a multinomial logit model with missing information on category membership. We call this new model the latent multinomial logit (LMNL) model. A constrained version of the LMNL model is used to examine the issue of hidden unemployment in transition economies following the approach of Earle and Sakova (2000). We found an additional 0.5% hidden unemployment among workers describing themselves as self-employed in the transition economies of Central and Eastern Europe.
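The EM idea for a multinomial with missing category membership can be sketched on a toy three-cell example: the E-step allocates the partially classified count between cells in proportion to the current probabilities, and the M-step re-estimates the cell probabilities from the completed counts. This is a simplified illustration of the Dempster-Laird-Rubin setup with invented counts, not the LMNL model itself:

```python
def em_multinomial(counts, partial_ab, iters=200):
    """EM for a 3-cell multinomial where `partial_ab` observations are
    known only to fall in cell A or B (category membership missing).
    counts = (n_a, n_b, n_c) are the fully classified counts."""
    n_a, n_b, n_c = counts
    p = [1 / 3, 1 / 3, 1 / 3]  # starting values
    for _ in range(iters):
        # E-step: split the partially classified count between A and B
        # in proportion to the current probability estimates.
        share_a = p[0] / (p[0] + p[1])
        a = n_a + partial_ab * share_a
        b = n_b + partial_ab * (1 - share_a)
        # M-step: maximum likelihood estimates from the completed counts.
        total = a + b + n_c
        p = [a / total, b / total, n_c / total]
    return p

p_hat = em_multinomial(counts=(30, 10, 40), partial_ab=20)
```

The update is a contraction here, so the estimates settle quickly; with these counts the fixed point allocates the 20 ambiguous cases 3:1 in favour of cell A, matching the 30:10 ratio among the classified cases.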

13.
Empirical count data are often zero-inflated and overdispersed. Currently, there is no software package that allows adequate imputation of these data. We present multiple-imputation routines for these kinds of count data, based on a Bayesian regression approach or alternatively on a bootstrap approach, that work as add-ons for the popular multiple imputation by chained equations (mice) software in R (van Buuren and Groothuis-Oudshoorn, Journal of Statistical Software, vol. 45, 2011, p. 1). We demonstrate in a Monte Carlo simulation that our procedures are superior to currently available count data procedures. It is emphasized that thorough modeling is essential to obtain plausible imputations and that model mis-specifications can bias parameter estimates and standard errors quite noticeably. Finally, the strengths and limitations of our procedures are discussed, and fruitful avenues for future theory and software development are outlined.
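As an illustration of what a zero-inflated count model looks like on the imputation side, the sketch below fills missing counts with draws from a zero-inflated Poisson; the mixing weight and rate are assumed to be already fitted, and this is a toy stand-in, not the authors' mice add-on:

```python
import math
import random

def draw_zip(pi_zero, lam, rng):
    """Draw from a zero-inflated Poisson: with probability pi_zero emit
    a structural zero, otherwise a Poisson(lam) count (via inversion)."""
    if rng.random() < pi_zero:
        return 0
    # Poisson sampling by inverting the CDF term by term.
    u, p, k = rng.random(), math.exp(-lam), 0
    cum = p
    while u > cum:
        k += 1
        p *= lam / k
        cum += p
    return k

def impute_zip(values, pi_zero, lam, seed=0):
    """Fill missing (None) counts with draws from the ZIP model."""
    rng = random.Random(seed)
    return [draw_zip(pi_zero, lam, rng) if v is None else v
            for v in values]

completed = impute_zip([0, 3, None, 0, 1, None], pi_zero=0.4, lam=2.0)
```

A plain Poisson imputation model would generate too few zeros for data like these, which is exactly the mis-specification bias the abstract warns about.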

14.
When measuring (complex) attitudes within a social survey, researchers often use balanced lists of positive and negative items. The purpose of the present research is to investigate: (a) whether a specific order of measurement scale items can lead to a bipolar (single-dimensional) concept (attitude) being recognised as a dual (bi-dimensional) concept and vice-versa; and (b) whether item order can affect the consistency (metric characteristics) of a measurement scale. An experiment on a group of social science students was conducted: students were randomly split into three subgroups, and three different versions of a questionnaire (with three differing item orders) were applied. A multi-group confirmatory factor analysis ('CFA') and a single-group CFA for each item order separately were applied. The final conclusion of the experiment is that there is no general rule about how and when respondents form separate (dual) or unidimensional (continuous) representations of measured concepts. Item-order effects are possible, but they are not as important as one would expect. The results of the experiment also suggest that other factors should be taken into account: the content of the measured concept and the cognitive sophistication of the respondents.

15.
This study concerns list augmentation in direct marketing. List augmentation is a special case of missing data imputation. We review previous work on the mixed outcome factor model and apply it for the purpose of list augmentation. The model deals with both discrete and continuous variables and allows us to augment the data for all subjects in a company's transaction database with soft data collected in a survey among a sample of those subjects. We propose a bootstrap-based imputation approach, which is appealing to use in combination with the factor model, since it allows one to include estimation uncertainty in the imputation procedure in a simple, yet adequate manner. We provide an empirical case study of the performance of the approach on the transaction database of a bank.

16.
In missing data problems, it is often the case that there is a natural test statistic for testing a statistical hypothesis had all the data been observed. A fuzzy p-value approach to hypothesis testing has recently been proposed, which is implemented by imputing the missing values in the "complete data" test statistic by values simulated from the conditional null distribution given the observed data. We argue that imputing data in this way will inevitably lead to a loss in power. For the case of a scalar parameter, we show that the asymptotic efficiency of the score test based on the imputed "complete data" relative to the score test based on the observed data is given by the ratio of the observed data information to the complete data information. Three examples involving probit regression, a normal random effects model, and unidentified paired data are used for illustration. For testing linkage disequilibrium based on pooled genotype data, simulation results show that the imputed Neyman-Pearson and Fisher exact tests are less powerful than a Wald-type test based on the observed data maximum likelihood estimator. In conclusion, we caution against the routine use of the fuzzy p-value approach in latent variable or missing data problems and suggest some viable alternatives.

17.
H. Toutenburg, Shalabh. Metrika, 2002, 54(3), 247–259
This article considers a linear regression model with some missing observations on the response variable and presents two estimators of regression coefficients employing the approach of minimum risk estimation. Small-disturbance asymptotic properties of these estimators, along with the traditional unbiased estimator, are analyzed, and conditions for the superiority of one estimator over the other, which are easy to check in practice, are derived. (Received May 2001)

18.
Summary: This paper is an exposition of the model and techniques of factor analysis, a method of studying the covariance matrix of several properties on the basis of a sample covariance matrix of independent observations on n individuals. The indeterminacy of the basis of the so-called factor space and several possibilities of interpretation are discussed. The scale-invariant maximum likelihood estimation of the parameters of the assumed normal distribution, which also provides a test of the dimension of the factor space, is compared with the customary but unjustified attack on the estimation problem by means of component analysis or modifications of it. The prohibitive slowness of convergence of the iterative procedures recommended till now can be removed by steepest ascent methods together with Aitken's acceleration method. An estimate of the original observations according to the assumed model, to be compared with the data, is given.

19.
We consider k (≥ 2) independent negative exponential populations with unknown location parameters and unknown but equal scale parameter. We incorporate the existing purely sequential and three-stage sampling procedures for selecting the "best" population and study the asymptotic second-order characteristics of the proposed fixed-size simultaneous confidence regions for the location parameters constructed after selection and ranking. Some direct estimation procedures are also discussed.

20.
Conclusions about the development of delinquent behaviour over the life-course can only be drawn from longitudinal data, which are typically gained by repeated interviews of the same respondents. Missing data are a problem for such analyses, as shown here with data from a four-wave panel of adolescents. In this article two alternative techniques to cope with missing data are used: full information maximum likelihood estimation and multiple imputation. Both methods allow one to consider all available data (including adolescents with missing information on some variables) in order to estimate the development of delinquency. We demonstrate that self-reported delinquency is systematically underestimated with listwise deletion (LD) of missing data. Further, LD results in false conclusions on gender- and school-specific differences of the age–crime relationship. In the final discussion some hints are given for further methods to deal with bias in panel data affected by the missing process.
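The listwise-deletion bias this kind of study reports can be reproduced on a toy example: when a covariate is missing more often for high scorers, dropping incomplete rows pulls the estimated mean down. All numbers and field names below are invented for illustration:

```python
def listwise_mean(rows):
    """Mean of y using only rows with no missing field
    (listwise deletion)."""
    complete = [r for r in rows if None not in r.values()]
    return sum(r["y"] for r in complete) / len(complete)

# Delinquency score y; covariate x is missing exactly for the high
# scorers, so listwise deletion under-represents them.
rows = [
    {"y": 1, "x": 0}, {"y": 2, "x": 1}, {"y": 8, "x": None},
    {"y": 9, "x": None}, {"y": 3, "x": 1}, {"y": 7, "x": None},
]
full_mean = sum(r["y"] for r in rows) / len(rows)  # uses all six rows
ld_mean = listwise_mean(rows)                      # only complete rows
```

Full information maximum likelihood and multiple imputation both avoid this by keeping the partially observed rows in the estimation.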

