A note on using the F-measure for evaluating record linkage algorithms
Abstract
Keywords
References
Belin, T.R., Rubin, D.B.: A method for calibrating false-match rates in record linkage. J. Am. Stat. Assoc. 90(430), 694–707 (1995)
Christen, P.: Development and user experiences of an open source data cleaning, deduplication and record linkage system. SIGKDD Explor. 11(1), 39–48 (2009)
Christen, P.: Data Matching—Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Data-Centric Systems and Applications. Springer, Berlin (2012)
Christen, P.: A survey of indexing techniques for scalable record linkage and deduplication. IEEE Trans. Knowl. Data Eng. 24(9), 1537–1555 (2012)
Christen, P.: Preparation of a Real Temporal Voter Data Set for Record Linkage and Duplicate Detection Research. Technical Report, The Australian National University (2014)
Christen, P., Goiser, K.: Quality and complexity measures for data linkage and deduplication. In: Guillet, F., Hamilton, H. (eds.) Quality Measures in Data Mining, Studies in Computational Intelligence, vol. 43, pp. 127–151. Springer, Berlin (2007)
Christen, P., Vatsalan, D., Wang, Q.: Efficient entity resolution with adaptive and interactive training data selection. In: IEEE International Conference on Data Mining, pp. 727–732. Atlantic City (2015)
Copas, J., Hilton, F.: Record linkage: statistical models for matching computer records. J. R. Stat. Soc. Ser. A (Stat. Soc.) 153(3), 287–320 (1990)
Domingo-Ferrer, J., Torra, V.: Disclosure risk assessment in statistical microdata protection via advanced record linkage. Stat. Comput. 13(4), 343–354 (2003)
Fellegi, I.P., Sunter, A.B.: A theory for record linkage. J. Am. Stat. Assoc. 64(328), 1183–1210 (1969)
Getoor, L., Machanavajjhala, A.: Entity resolution: theory, practice and open challenges. Proc. VLDB Endow. 5(12), 2018–2019 (2012)
Gutman, R., Afendulis, C.C., Zaslavsky, A.M.: A Bayesian procedure for file linking to analyze end-of-life medical costs. J. Am. Stat. Assoc. 108(501), 34–47 (2013)
Gutman, R., Sammartino, C., Green, T., Montague, B.: Error adjustments for file linking methods using encrypted unique client identifier (eUCI) with application to recently released prisoners who are HIV+. Stat. Med. 35(1), 115–129 (2016)
Hand, D.J.: Construction and Assessment of Classification Rules. Wiley, New York (1997)
Hand, D.J.: Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach. Learn. 77(1), 103–123 (2009)
Hand, D.J.: Evaluating diagnostic tests: the area under the ROC curve and the balance of errors. Stat. Med. 29(14), 1502–1510 (2010)
Hand, D.J.: Assessing the performance of classification methods. Int. Stat. Rev. 80(3), 400–414 (2012)
Harron, K., Goldstein, H., Dibben, C.: Methodological Developments in Data Linkage. Wiley, New York (2015)
Herzog, T., Scheuren, F., Winkler, W.E.: Data Quality and Record Linkage Techniques. Springer, Berlin (2007)
Jaro, M.A.: Advances in record-linkage methodology as applied to matching the 1985 Census of Tampa, Florida. J. Am. Stat. Assoc. 84(406), 414–420 (1989)
Larsen, M.D., Rubin, D.B.: Iterative automated record linkage using mixture models. J. Am. Stat. Assoc. 96(453), 32–41 (2001)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
McCallum, A., Nigam, K., Ungar, L.H.: Efficient clustering of high-dimensional data sets with application to reference matching. In: ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 169–178. Boston (2000)
Murray, J.S.: Probabilistic record linkage and deduplication after indexing, blocking, and filtering. J. Priv. Confid. 7(1), 2 (2016)
Naumann, F., Herschel, M.: An introduction to duplicate detection. In: Synthesis Lectures on Data Management, vol. 3. Morgan and Claypool Publishers (2010)
Newcombe, H.B.: Handbook of Record Linkage: Methods for Health and Statistical Studies, Administration, and Business. Oxford University Press Inc, New York (1988)
Reid, A., Davies, R., Garrett, E.: Nineteenth-century Scottish demography from linked censuses and civil registers. Hist. Comput. 14(1–2), 61–86 (2002)
Sadinle, M.: Detecting duplicates in a homicide registry using a Bayesian partitioning approach. Ann. Appl. Stat. 8(4), 2404–2434 (2014)
Sadinle, M., Fienberg, S.E.: A generalized Fellegi–Sunter framework for multiple record linkage with application to homicide record systems. J. Am. Stat. Assoc. 108(502), 385–397 (2013)
van Rijsbergen, C.: Information Retrieval. Butterworth, Oxford (1979)
Vatsalan, D., Christen, P., Verykios, V.S.: A taxonomy of privacy-preserving record linkage techniques. Inf. Syst. 38(6), 946–969 (2013)
Winkler, W.E., Yancey, W.E., Porter, E.H.: Fast record linkage of very large files in support of decennial and administrative records projects. In: Proceedings of the Section on Survey Research Methods, pp. 2120–2130. American Statistical Association (2010)