Probabilistic linkage of large public health data files

Statistics in Medicine, Volume 14, Issue 5-7, Pages 491-498, 1995
Matthew A. Jaro
Match Ware Technologies, Inc., 14637 Locustwood Lane, Silver Spring, MD 20905, U.S.A.

Abstract

Probabilistic linkage technology makes it feasible and efficient to link large public health databases in a statistically justifiable manner. The methodology addresses the problem of matching two files of individual data under conditions of uncertainty. Each field is subject to error, which is measured by the probability that the field agrees given that a record pair matches (the m probability) and by the probability of chance agreement of its value states (the u probability). Fellegi and Sunter pioneered record linkage theory. Advances in methodology include use of an EM algorithm for parameter estimation, optimization of matches by means of a linear sum assignment program, and, more recently, a probability model that addresses both m and u probabilities for all value states of a field. This model provides a means of obtaining greater precision from non-uniformly distributed fields, without the theoretical complications arising from frequency-based matching alone. The model includes an iterative parameter estimation procedure that is more robust than pre-match estimation techniques. The methodology was originally developed and tested by the author at the U.S. Census Bureau for census undercount estimation. The more recent advances and a new generalized software system were tested and validated by linking highway crashes to Emergency Medical Service (EMS) reports and to hospital admission records for the National Highway Traffic Safety Administration (NHTSA).
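As an illustration of how m and u probabilities are typically turned into a match score, the sketch below uses the standard Fellegi and Sunter log-likelihood-ratio weights: log2(m/u) when a field agrees and log2((1-m)/(1-u)) when it disagrees, summed over fields. It is a minimal sketch, not the AUTOMATCH implementation; the field names and probability values are hypothetical, and it does not cover the value-specific model, the EM estimation, or the linear sum assignment step described in the abstract.

```python
import math

def field_weight(m, u, agrees):
    """Log2 likelihood-ratio weight for one field comparison.

    m: P(field agrees | record pair is a true match)
    u: P(field agrees | record pair is not a match), i.e. chance agreement
    """
    if agrees:
        return math.log2(m / u)
    return math.log2((1.0 - m) / (1.0 - u))

def composite_weight(m_probs, u_probs, agreement):
    """Sum the field weights for one record pair.

    agreement maps each field name to True (agrees) or False (disagrees).
    """
    return sum(
        field_weight(m_probs[name], u_probs[name], agrees)
        for name, agrees in agreement.items()
    )

# Hypothetical parameters, chosen only for illustration.
M_PROBS = {"surname": 0.95, "birth_year": 0.90, "sex": 0.98}
U_PROBS = {"surname": 0.01, "birth_year": 0.05, "sex": 0.50}

if __name__ == "__main__":
    pair = {"surname": True, "birth_year": True, "sex": False}
    print(round(composite_weight(M_PROBS, U_PROBS, pair), 2))
```

A field with a high m and a low u (such as the hypothetical surname above) contributes a large positive weight on agreement, while a field with u near 0.5 (such as sex) contributes little on agreement but a large negative weight on disagreement. Pairs are then classified by comparing the composite weight with upper and lower thresholds.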

References

Match Ware Technologies, Inc., 1992, AUTOMATCH Generalized Record Linkage System

10.1145/368996.369026

10.1080/01621459.1969.10501049

10.1080/01621459.1989.10478785

Dempster A. P., 1977, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, 39, 1

Jaro M. A., 1972, UNIMATCH: A computer system for general record linkage under conditions of uncertainty, American Federation of Information Processing Societies (AFIPS) Conference Proceedings, 40, 523

Match Ware Technologies, Inc., 1993, AUTOSTAN Generalized Standardization System