Similar literature
1.
Computerised record linkage methods help us combine multiple data sets from different sources when a single data set with all necessary information is unavailable or when collecting data on additional variables is time consuming and extremely costly. Linkage errors are inevitable in the linked data set because error-free unique identifiers are unavailable. Even a small number of linkage errors can lead to substantial bias and increased variability in estimating the parameters of a statistical model. In this paper, we propose a unified theory for statistical analysis with linked data. Our proposed method, unlike those available for secondary analysis of linked data, exploits record linkage process data as an alternative to taking a costly sample to evaluate error rates from the record linkage procedure. A jackknife method is introduced to estimate the bias, covariance matrix and mean squared error of our proposed estimators. Simulation results are presented to evaluate the performance of the proposed estimators that account for linkage errors.
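
The jackknife step mentioned in this abstract can be illustrated with a generic delete-one jackknife. This is only a sketch of the general technique, not the estimator developed in the paper: the function name `jackknife` and the toy data are assumptions, and the paper applies the idea to its own linkage-adjusted estimators.

```python
from statistics import mean, pvariance

def jackknife(data, estimator):
    """Delete-one jackknife estimates of bias and variance for estimator(data)."""
    n = len(data)
    full = estimator(data)
    # Recompute the estimator n times, leaving out one observation each time.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = mean(leave_one_out)
    bias = (n - 1) * (loo_mean - full)
    variance = (n - 1) / n * sum((v - loo_mean) ** 2 for v in leave_one_out)
    return {
        "estimate": full,
        "bias": bias,
        "variance": variance,
        "bias_corrected": full - bias,
    }

# Hypothetical toy data: the plug-in variance (divide by n) is biased downwards,
# and the jackknife correction moves it back towards the unbiased value.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
result = jackknife(toy, pvariance)
print(result["estimate"], result["bias_corrected"])
```

For the sample mean the estimated bias is zero and the jackknife variance equals the usual s²/n, which is a convenient sanity check before applying the same routine to a more complicated estimator.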

2.
Probabilistic record linkage is the act of bringing together records that are believed to belong to the same unit (e.g., person or business) from two or more files. It is a common way to enhance dimensions such as time and breadth or depth of detail. Probabilistic record linkage is not an error-free process and can link records that do not belong to the same unit. Naively treating such a linked file as if it were linked without errors can lead to biased inferences. This paper develops a method of making inference with estimating equations when records are linked using algorithms that are widely used in practice. Previous methods for dealing with this problem cannot accommodate such linking algorithms. This paper develops a parametric bootstrap approach to inference in which each bootstrap replicate involves applying the same linking algorithm. This paper demonstrates the effectiveness of the method in simulations and in real applications.

3.
Record linkage is the act of bringing together records from two files that are believed to belong to the same unit (e.g., a person or business). It is a low-cost way of increasing the set of variables available for analysis. Errors may arise in the linking process if an error-free unit identifier is not available. The two types of linking error are an incorrect link (records belonging to two different units are linked) and a missed record (an unlinked record for which a correct link exists). Naively ignoring linkage errors may mean that analysis of the linked file is biased. This paper outlines a “weighting approach” to making correct inference about regression coefficients and population totals in the presence of such linkage errors. This approach is designed for analysts who do not have the expertise or time to use specialist software required by other approaches but who are comfortable using weights in inference. The performance of the estimator is demonstrated in a simulation study.

4.
A basic concern in statistical disclosure limitation is the re-identification of individuals in anonymised microdata. Linking against a second dataset that contains identifying information can result in a breach of confidentiality. Almost all linkage approaches are based on comparing the values of variables that are common to both datasets. It is tempting to think that if datasets contain no common variables, then there can be no risk of re-identification. However, linkage has been attempted between such datasets via the extraction of structural information using ordered weighted averaging (OWA) operators. Although this approach has been shown to perform better than randomly pairing records, it is debatable whether it demonstrates a practically significant disclosure risk. This paper reviews some of the main aspects of statistical disclosure limitation. It then goes on to show that a relatively simple, supervised Bayesian approach can consistently outperform OWA linkage. Furthermore, the Bayesian approach demonstrates a significant risk of re-identification for the types of data considered in the OWA record linkage literature.

5.
Effective linkage detection and gene mapping require analysis of data jointly on members of extended pedigrees, jointly at multiple genetic markers. Exact likelihood computation is then often infeasible, but Markov chain Monte Carlo (MCMC) methods permit estimation of posterior probabilities of genome sharing among relatives, conditional upon marker data. In principle, MCMC also permits estimation of linkage analysis location score curves, but in practice effective MCMC samplers are hard to find. Although the whole-meiosis Gibbs sampler (M-sampler) performs well in some cases, better samplers are needed for extended pedigrees and tightly linked markers. However, using the M-sampler as a proposal distribution in a Metropolis-Hastings algorithm does allow genetic interference to be incorporated into the analysis.

6.
Linking administrative, survey and census files to enhance dimensions such as time and breadth or depth of detail is now common. Because a unique person identifier is often not available, records belonging to two different units (e.g. people) may be incorrectly linked. Estimating the proportion of links that are correct, called Precision, is difficult because, even after clerical review, there will remain uncertainty about whether a link is in fact correct or incorrect. Measures of Precision are useful when deciding whether or not it is worthwhile linking two files, when comparing alternative linking strategies and as a quality measure for estimates based on the linked file. This paper proposes an estimator of Precision for a linked file that has been created by either deterministic (or rules‐based) or probabilistic (where evidence for a link being a match is weighted against the evidence that it is not a match) linkage, both of which are widely used in practice. This paper shows that the proposed estimators perform well.
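
The abstract describes probabilistic linkage as weighing the evidence for a match against the evidence that a pair is not a match. One common way to make that concrete is a Fellegi-Sunter style log-likelihood-ratio weight. The abstract does not name this model, so it is an assumption here, and the field names and the m and u probabilities below are hypothetical.

```python
import math

def match_weight(agreements, m_probs, u_probs):
    """Sum of per-field log2 likelihood ratios for one candidate record pair.

    agreements: dict mapping field name -> True if the two records agree on it
    m_probs:    P(field agrees | pair is a true match), per field
    u_probs:    P(field agrees | pair is not a match), per field
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = m_probs[field], u_probs[field]
        if agrees:
            total += math.log2(m / u)              # evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # evidence against a match
    return total

# Hypothetical parameters: surname agrees rarely by chance, sex agrees often.
m_probs = {"surname": 0.95, "birth_year": 0.90, "sex": 0.98}
u_probs = {"surname": 0.01, "birth_year": 0.05, "sex": 0.50}

pair = {"surname": True, "birth_year": True, "sex": False}
print(round(match_weight(pair, m_probs, u_probs), 2))
```

Pairs whose total weight exceeds a chosen threshold would be declared links. Precision, as defined in the abstract, is then the share of those declared links that are true matches, which is the quantity the paper sets out to estimate.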

7.
This paper reviews some applications of combining data sets, such as combining census or administrative data with survey data, constructing expanded data sets through linkage, combining large-scale commercial databases with survey data and harnessing designed data collection to be able to make use of non-probability samples. It aims to highlight their commonalities and differences and to formulate some general principles for data set combination.

8.
We propose a simple estimator for nonlinear method of moments models with measurement error of the classical type when no additional data, such as validation data or double measurements, are available. We assume that the marginal distributions of the measurement errors are Laplace (double exponential) with zero means and unknown variances and that the measurement errors are independent of the latent variables and of each other. Under these assumptions, we derive simple revised moment conditions in terms of the observed variables. They are used to make inference about the model parameters and the variance of the measurement error. The results of this paper show that the distributional assumption on the measurement errors can be used to point identify the parameters of interest. Our estimator is a parametric method of moments estimator that uses the revised moment conditions and hence is simple to compute. Our estimation method is particularly useful in situations where no additional data are available, which is the case in many economic data sets. A simulation study demonstrates good finite sample properties of our proposed estimator. We also examine the performance of the estimator in the case where the error distribution is misspecified.

9.
This paper studies the performance of factor-based forecasts using differenced and nondifferenced data. Approximate variances of forecasting errors from the two forecasts are derived and compared. It is reported that the forecast using nondifferenced data tends to be more accurate than that using differenced data. This paper conducts simulations to compare root mean squared forecasting errors of the two competing forecasts. Simulation results indicate that forecasting using nondifferenced data performs better. The advantage of using nondifferenced data is more pronounced when the forecasting horizon is long and the number of factors is large. This paper applies the two competing forecasting methods to 68 I(1) monthly US macroeconomic variables across a range of forecasting horizons and sampling periods. We also provide detailed forecasting analysis on US inflation and industrial production. We find that forecasts using nondifferenced data tend to outperform those using differenced data.

10.
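To examine the current state of linkage between the manufacturing and logistics industries in Hunan Province, this paper uses enterprise questionnaires and discussion meetings to collect information on the logistics of large manufacturing enterprises in Hunan and on the services that logistics enterprises provide to manufacturers. By analysing the collected data and reviewing the related literature, it studies the models and implementation paths of the two industries in Hunan Province and offers recommendations for developing their linkage.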

11.
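This article presents application examples of four-bar and six-bar linkage mechanisms in welding fixtures and points out that well-chosen linkage mechanisms can simplify the fixture structure and reduce the cost of using the fixture.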

12.
Nonlinear taxes create econometric difficulties when estimating labor supply functions. One estimation method that tackles these problems accounts for the complete form of the budget constraint and uses the maximum likelihood method to estimate parameters. Another method linearizes budget constraints and uses instrumental variables techniques. Using Monte Carlo simulations, I investigate the small-sample properties of these estimation methods and how they are affected by measurement errors in independent variables. No estimator is uniquely best. Hence, in actual estimation the choice of estimator should depend on the sample size and type of measurement errors in the data. Complementing actual estimates with a Monte Carlo study of the estimator used, given the type of measurement errors that characterize the data, would often help in interpreting the estimates. This paper shows how such a study can be performed.

13.
The quality of master data is crucial for the accurate functioning of the various modules of an enterprise resource planning (ERP) system. This study addresses specific data problems arising from the generation of approximately duplicate material records in ERP databases. Such problems are mainly due to the firm’s lack of unique and global identifiers for the material records, and to the arbitrary assignment of alternative names for the same material by various users. Traditional duplicate detection methods are ineffective in identifying such approximately duplicate material records because these methods typically rely on string comparisons of each field. To address this problem, a machine learning-based framework is developed to recognise semantic similarity between strings and to further identify and reunify approximately duplicate material records – a process referred to as de-duplication in this article. First, the keywords of the material records are extracted to form vectors of discriminating words. Second, a machine learning method using a probabilistic neural network is applied to determine the semantic similarity between these material records. The approach was evaluated using data from a real case study. The test results indicate that the proposed method outperforms traditional algorithms in identifying approximately duplicate material records.
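
The first step described above, turning material descriptions into keyword vectors, can be sketched as follows. The second step in the paper uses a probabilistic neural network; the sketch swaps in a plain cosine similarity as a stand-in, so it does not capture semantic similarity and is not the authors' method. The function names, stopword list and material descriptions are all hypothetical.

```python
import math
from collections import Counter

def keyword_vector(description, stopwords=frozenset({"the", "of", "for"})):
    """Bag-of-keywords vector for one material record description."""
    tokens = description.lower().replace(",", " ").replace("-", " ").split()
    return Counter(t for t in tokens if t not in stopwords)

def cosine_similarity(a, b):
    """Cosine of the angle between two keyword-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical material records entered by different users.
rec_a = keyword_vector("Hex bolt, stainless steel, M8")
rec_b = keyword_vector("stainless-steel hex bolt M8")
rec_c = keyword_vector("Rubber gasket 40mm")

print(cosine_similarity(rec_a, rec_b))  # same keywords in a different order
print(cosine_similarity(rec_a, rec_c))  # no shared keywords
```

Word order and punctuation differences disappear at the keyword level, which is why field-by-field string comparison struggles where a keyword representation does not. Synonyms such as alternative names for the same material would still be missed by this stand-in, which is the gap the learned similarity in the paper is meant to close.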

14.
Cross-sectional estimates from repeated surveys form a time series {y_t}. These estimates can be viewed as the sum y_t = Y_t + e_t of two processes: {Y_t}, the population process, and {e_t}, the survey error process. Serial correlations in the latter series are usually present, mainly due to sample overlap. Other sources of data such as censuses, administrative records and demographic population counts are also available. The state–space modelling approach to the analysis of repeated surveys allows combining information from different sources, incorporating benchmarking constraints in a natural way. Results from these methods seem to compare favourably with those from X-11-ARIMA in filtering out survey errors.
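
A minimal sketch of how a state-space filter separates the population signal from survey error in the decomposition y_t = Y_t + e_t. It rests on strong simplifying assumptions that are not taken from the paper: Y_t follows a random walk, e_t is white noise (the abstract notes that survey errors are usually serially correlated), and both variances are known. The function name and the numbers are hypothetical.

```python
def local_level_filter(y, signal_var, error_var, init_mean=0.0, init_var=1e12):
    """Kalman filter for y_t = Y_t + e_t with Y_t = Y_{t-1} + eta_t.

    signal_var: variance of eta_t (movement in the population process)
    error_var:  variance of e_t (survey error, assumed white noise here)
    Returns the filtered estimates of Y_t.
    """
    mean, var = init_mean, init_var  # near-diffuse prior on the initial level
    filtered = []
    for obs in y:
        # Prediction step: a random walk keeps the mean and inflates the variance.
        var_pred = var + signal_var
        # Update step: weight the new observation by the Kalman gain.
        gain = var_pred / (var_pred + error_var)
        mean = mean + gain * (obs - mean)
        var = var_pred * error_var / (var_pred + error_var)
        filtered.append(mean)
    return filtered

# Hypothetical survey estimates with a noisy spike in the third period.
survey = [10.0, 10.4, 12.5, 10.6, 10.9]
print([round(v, 2) for v in local_level_filter(survey, signal_var=0.05, error_var=1.0)])
```

When `error_var` is zero the filter reproduces the survey estimates, and when `signal_var` is zero it converges to their running mean. A model closer to the abstract would add an autoregressive state for e_t to reflect sample overlap and extra observation equations for the other data sources.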

15.
In the domain of IT benchmarking (ITBM), a variety of data and information are collected. Although these data serve as the basis for business analyses, no unified semantic representation of such data yet exists. Consequently, data analysis across different distributed data sets and different benchmarks is almost impossible. This paper presents a system architecture and prototypical implementation for integrated data management of distributed databases based on a domain-specific ontology. To preserve the semantic meaning of the data, the ITBM ontology is linked to data sources and functions as the central concept for database access. Thus, additional databases can be integrated by linking them to this domain-specific ontology and are directly available for further business analyses. Moreover, the web-based system supports the process of mapping ontology concepts to external databases by introducing a semi-automatic mapping recommender and by visualizing possible mapping candidates. The system also provides a natural language interface to easily query linked databases. The expected result of this ontology-based approach to knowledge representation and data access is an increase in knowledge and data sharing in this domain, which will enhance existing business analysis methods.

16.
We study the construction of confidence intervals for efficiency levels of individual firms in stochastic frontier models with panel data. The focus is on bootstrapping and related methods. We start with a survey of various versions of the bootstrap. We also propose a simple parametric alternative in which one acts as if the identity of the best firm is known. Monte Carlo simulations indicate that the parametric method works better than the percentile bootstrap, but not as well as bootstrap methods that make bias corrections. All of these methods are valid only for large time-series sample size (T), and correspondingly none of the methods yields very accurate confidence intervals except when T is large enough that the identity of the best firm is clear. We also present empirical results for two well-known data sets.
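
The percentile bootstrap that serves as a baseline in this abstract can be sketched generically. The example below builds a percentile interval for a simple sample mean; it does not implement the stochastic frontier model or the bias-corrected variants discussed in the paper, and the function name, default settings and data are assumptions.

```python
import random
from statistics import mean

def percentile_bootstrap_ci(data, estimator=mean, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for estimator(data)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement and recompute the estimator each time.
    stats = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(reps)
    )
    # Read the interval end points off the empirical quantiles.
    lower = stats[int((alpha / 2) * reps)]
    upper = stats[min(reps - 1, int((1 - alpha / 2) * reps))]
    return lower, upper

# Hypothetical sample of firm-level measurements.
sample = [1.2, 0.8, 1.9, 1.4, 0.5, 1.1, 1.6, 0.9]
low, high = percentile_bootstrap_ci(sample)
print(low, high)
```

The percentile interval takes the bootstrap quantiles at face value, so any bias in the estimator carries straight into the interval. That is consistent with the abstract's finding that bootstrap methods with bias corrections perform better.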

17.
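Logistics is an important part of the producer services sector; service innovation in logistics is not only an important way for logistics enterprises to strengthen their competitiveness but also plays a vital role in the development of the service economy. The basic questions this article addresses are whether service innovation by logistics enterprises within the two-industry linkage framework is influenced by both the linkage participants and environmental factors, and which specific factors affect it; the analytic hierarchy process (AHP) is used to analyse these factors. The results show that four groups of factors mainly influence service innovation in logistics enterprises: logistics-enterprise factors, government-policy factors, industry-environment factors and manufacturing-enterprise factors.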
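
As a rough illustration of the AHP step, the sketch below derives priority weights from a pairwise comparison matrix using the row geometric-mean method, a common approximation to the principal-eigenvector weights. The four factor groups come from the abstract, but the comparison judgments are invented for illustration and say nothing about the ranking the article actually reports.

```python
import math

def ahp_weights(pairwise):
    """Priority weights from a pairwise comparison matrix (row geometric-mean method)."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_index(pairwise, weights):
    """CI = (lambda_max - n) / (n - 1); zero for perfectly consistent judgments."""
    n = len(pairwise)
    # Estimate lambda_max as the average of (A w)_i / w_i over all rows.
    lam = sum(
        sum(a * w for a, w in zip(row, weights)) / weights[i]
        for i, row in enumerate(pairwise)
    ) / n
    return (lam - n) / (n - 1)

factors = ["logistics enterprise", "government policy",
           "industry environment", "manufacturing enterprise"]
# Hypothetical judgments: entry [i][j] says how much more important i is than j.
judgments = [
    [1.0, 3.0, 2.0, 2.0],
    [1 / 3, 1.0, 1 / 2, 1 / 2],
    [1 / 2, 2.0, 1.0, 1.0],
    [1 / 2, 2.0, 1.0, 1.0],
]
weights = ahp_weights(judgments)
for name, w in zip(factors, weights):
    print(f"{name}: {w:.3f}")
print("CI:", round(consistency_index(judgments, weights), 4))
```

A consistency index near zero indicates that the pairwise judgments do not contradict one another; in practice it is usually compared against a random-index benchmark before the weights are accepted.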

18.
Wangli Xu, Lixing Zhu. Metrika, 2013, 76(1): 53-69
In this paper, we investigate checking the adequacy of varying coefficient models with response missing at random. In doing so, we first construct two completed data sets based on imputation and marginal inverse probability weighted methods, respectively. The empirical process-based tests by using these two completed data sets are suggested and the asymptotic properties of the test statistics under the null and local alternative hypotheses are studied. Because the limiting null distribution is intractable, a Monte Carlo approach is applied to approximate the distribution to determine critical values. Simulation studies are carried out to examine the performance of our method, and a real data set from an environmental study is analyzed for illustration.

19.
Deng Jigang (邓基刚). Value Engineering (《价值工程》), 2013, (33): 133-135
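This paper builds a structural equation model of the factors influencing the linked management mechanism for abnormal cigarette sales. Taking a municipal tobacco company as the empirical case and collecting data through a questionnaire survey, it uses AMOS software to run a confirmatory factor analysis of the model and identify the key influencing factors. Based on the empirical results, it proposes measures to improve the efficiency of the linked management mechanism for abnormal cigarette sales.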

20.
This paper analyzes the determinants of success at the concours d'agrégation en sciences économiques. This is a centralized hiring procedure through which professors of economics are selected in France. Using detailed data from all concours held between 1984 and 2003, we focus on the role of the candidates' publication records (number and quality of articles) and networks (defined as professional links between candidates and the jury members who take the recruitment decisions). Both sets of variables have statistically significant effects on the likelihood of getting hired. The effect of network connections is important in the sense that a substantial improvement of the publication record is needed to compensate for not being linked to the jury.
