Estimating Precision and Recall for Deterministic and Probabilistic Record Linkage
Authors:James Chipperfield, Noel Hansen, Peter Rossiter
Affiliation:1. Australian Bureau of Statistics, Docklands, VIC, Australia;2. Adjunct Associate Professor, National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, NSW, Australia
Abstract:Linking administrative, survey and census files to enhance dimensions such as time and breadth or depth of detail is now common. Because a unique person identifier is often not available, records belonging to two different units (e.g. people) may be incorrectly linked. Estimating the proportion of links that are correct, called Precision, is difficult because, even after clerical review, there will remain uncertainty about whether a link is in fact correct or incorrect. Measures of Precision are useful when deciding whether or not it is worthwhile linking two files, when comparing alternative linking strategies and as a quality measure for estimates based on the linked file. This paper proposes an estimator of Precision for a linked file that has been created by either deterministic (or rules‐based) or probabilistic (where evidence for a link being a match is weighted against the evidence that it is not a match) linkage, both of which are widely used in practice. This paper shows that the proposed estimators perform well.
Keywords:Latent models; link quality
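
Illustration: the abstract defines Precision as the proportion of links that are correct. The sketch below computes it, together with Recall under its standard definition (the proportion of true matches that were linked), for a small made-up example. It assumes the true match status of every record pair is known, which is exactly what is not available in practice and is the reason an estimator is needed. This is not the estimator proposed in the paper; the function name `precision_recall`, the record ids and the data are all hypothetical.

```python
from typing import Set, Tuple

# A link is a pair: (record id in file A, record id in file B).
Link = Tuple[str, str]


def precision_recall(links: Set[Link], true_matches: Set[Link]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of links.

    Assumes the set of true matches is known, which is hypothetical:
    in a real linkage the truth is uncertain even after clerical review.
    """
    correct = links & true_matches  # links that join records of the same unit
    precision = len(correct) / len(links) if links else 0.0
    recall = len(correct) / len(true_matches) if true_matches else 0.0
    return precision, recall


# Made-up example: four links were formed, one of them incorrect,
# and two true matches were missed or mislinked.
links = {("a1", "b1"), ("a2", "b2"), ("a3", "b9"), ("a4", "b4")}
true_matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a5", "b5")}

precision, recall = precision_recall(links, true_matches)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the four links are correct and three of the five true matches are linked, so the two measures differ; a stricter linking rule would typically trade Recall for Precision.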