Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities

Marc N. Elliott1, Peter A. Morrison2, Allen Fremont1, Daniel F. McCaffrey3, Philip Pantoja1, Nicole Lurie4
1RAND Corporation, Santa Monica, USA
2RAND Corporation, Nantucket, USA
3RAND Corporation, Pittsburgh, USA
4RAND Corporation, Arlington, USA

Abstract

Commercial health plans need member racial/ethnic information to address disparities, but often lack it. We incorporate the U.S. Census Bureau’s latest surname list into a previous Bayesian method that integrates surname and geocoded information to better impute self-reported race/ethnicity. We validate this approach with data from 1,921,133 enrollees of a national health plan. Overall, estimates from the new approach correlated highly with self-reported race/ethnicity (0.76), making it 19% more efficient than its predecessor (and 41% and 108% more efficient than single-source surname and address methods, respectively; P < 0.05 for all). The new approach has an overall concordance statistic (area under the receiver operating characteristic, or ROC, curve) of 0.93. The largest improvements were in areas where prior performance was weakest (for Blacks and Asians). The new Census surname list accounts for about three-fourths of the variance explained in the new estimates. Imputing Native American and multiracial identities from surname and residence remains challenging.
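To make the idea of integrating surname and geocoded information concrete, the following is a minimal sketch of a generic Bayes' rule update, not the authors' implementation. It assumes that surname-based probabilities serve as the prior and that the geocoded area contributes a likelihood (the chance of living in that area given each group). The function name `bayes_update`, the four-group category set, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Combine a prior with a likelihood via Bayes' rule and renormalize.

    prior[g]      : P(group g | surname), e.g. from a surname list (assumed input)
    likelihood[g] : P(residing in this area | group g), e.g. from geocoded data (assumed input)
    Returns P(group g | surname, area) under conditional independence of
    surname and area given group, which is an assumption of this sketch.
    """
    unnormalized = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("prior and likelihood have no overlapping support")
    return {g: v / total for g, v in unnormalized.items()}


# Hypothetical illustration values, not taken from the paper or the Census list.
surname_prior = {"Hispanic": 0.70, "White": 0.20, "Black": 0.05, "Asian": 0.05}
area_likelihood = {"Hispanic": 0.002, "White": 0.0005, "Black": 0.001, "Asian": 0.0005}

posterior = bayes_update(surname_prior, area_likelihood)
for group, prob in posterior.items():
    print(f"{group}: {prob:.3f}")
```

In this made-up case the area evidence reinforces the surname evidence, so the posterior probability for the leading group rises above its surname-only prior; when the two sources disagree, the update pulls the estimate toward whichever source is more informative.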

References