On the consistent estimation of linkage errors without training data