A multi-stage approach to maximizing geocoding success in a large population-based cohort study through automated and interactive processes

  • Jennifer S. Sonderman | Jennifer@iei.us International Epidemiology Institute, Rockville, MD, United States.
  • Michael T. Mumma International Epidemiology Institute, Rockville, MD, United States.
  • Sarah S. Cohen International Epidemiology Institute, Rockville, MD, United States.
  • Elizabeth L. Cope International Epidemiology Institute, Rockville, MD, United States.
  • William J. Blot International Epidemiology Institute, Rockville, MD; Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt University, and Vanderbilt-Ingram Cancer Center, Nashville, TN, United States.
  • Lisa B. Signorello International Epidemiology Institute, Rockville, MD; Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt University, and Vanderbilt-Ingram Cancer Center, Nashville, TN, United States.

Abstract

To enable spatial analyses within a large, prospective cohort study of nearly 86,000 adults enrolled in a 12-state area in the southeastern United States of America from 2002 to 2009, a multi-stage geocoding protocol was developed to efficiently maximize the proportion of participants assigned an address-level geographic coordinate. Addresses were parsed, cleaned and standardized before applying a combination of automated and interactive geocoding tools. Our full protocol increased the non-Post Office (PO) Box match rate from 74.5% to 97.6%. Overall, we geocoded 99.96% of participant addresses, with only 5.2% at the ZIP code centroid level (2.8% PO Box and 2.3% non-PO Box addresses). One key to reducing the need for interactive geocoding was the use of multiple base maps. Still, addresses in areas with population density 920 persons/km² (odds ratio (OR) = 5.24; 95% confidence interval (CI) = 4.23, 6.49), as were addresses collected from participants during in-person interviews compared with mailed questionnaires (OR = 1.83; 95% CI = 1.59, 2.11). This study demonstrates that population density and address ascertainment method can influence automated geocoding results and that high success in address-level geocoding is achievable for large-scale studies covering wide geographical areas.
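The staged workflow summarized in the abstract (standardize, try several base maps automatically, set aside failures for interactive review, fall back to a ZIP code centroid) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the PO Box pattern, the base-map callables and the ZIP centroid table are all hypothetical stand-ins, and assigning a provisional centroid while an address awaits review is an assumption made for illustration.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple

Coord = Tuple[float, float]
# A "base map" here is any callable that returns a coordinate or None (hypothetical).
Geocoder = Callable[[str], Optional[Coord]]


class Match(NamedTuple):
    lat: float
    lon: float
    level: str   # "address" or "zip_centroid"
    source: str  # which base map (or fallback table) produced the match


# Illustrative PO Box pattern; real address data would need a broader rule set.
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*BOX\b")
ZIP_AT_END = re.compile(r"\b(\d{5})(?:-\d{4})?$")


def standardize(raw: str) -> str:
    """Collapse runs of whitespace, trim, and upper-case the address."""
    return re.sub(r"\s+", " ", raw).strip().upper()


def geocode_multistage(
    raw: str,
    base_maps: List[Tuple[str, Geocoder]],
    zip_centroids: Dict[str, Coord],
    review_queue: List[str],
) -> Optional[Match]:
    address = standardize(raw)

    if not PO_BOX.search(address):
        # Automated stage: try each base map in order, stop at the first hit.
        for name, geocoder in base_maps:
            hit = geocoder(address)
            if hit is not None:
                return Match(hit[0], hit[1], "address", name)
        # No automated match: set the address aside for interactive review.
        review_queue.append(address)

    # Fallback (assumption): provisional ZIP code centroid, if the ZIP is known.
    zip_match = ZIP_AT_END.search(address)
    if zip_match and zip_match.group(1) in zip_centroids:
        lat, lon = zip_centroids[zip_match.group(1)]
        return Match(lat, lon, "zip_centroid", "zip_table")
    return None


if __name__ == "__main__":
    # Toy lookup tables standing in for real base maps (made-up coordinates).
    map_a = {"100 MAIN ST, SOMETOWN, TN 37000": (36.0, -86.0)}
    map_b = {"200 RURAL ROUTE 1, SOMETOWN, TN 37000": (36.1, -86.1)}
    maps = [("map_a", map_a.get), ("map_b", map_b.get)]
    queue: List[str] = []
    print(geocode_multistage("200 Rural Route 1, Sometown, TN 37000",
                             maps, {"37000": (36.05, -86.05)}, queue))
```

Ordering the base maps and stopping at the first hit mirrors the abstract's point that multiple base maps reduce the share of addresses needing interactive handling; only addresses that every map misses reach the review queue.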

Published
2012-05-01
Section
Original Articles
Keywords:
epidemiologic methods, geographical information systems, prospective studies, residence characteristics, United States of America.
How to Cite
Sonderman, J. S., Mumma, M. T., Cohen, S. S., Cope, E. L., Blot, W. J., & Signorello, L. B. (2012). A multi-stage approach to maximizing geocoding success in a large population-based cohort study through automated and interactive processes. Geospatial Health, 6(2), 273-284. https://doi.org/10.4081/gh.2012.145