A multi-stage approach to maximizing geocoding success in a large population-based cohort study through automated and interactive processes

Submitted: 18 December 2014
Accepted: 18 December 2014
Published: 1 May 2012

To enable spatial analyses within a large, prospective cohort study of nearly 86,000 adults enrolled in a 12-state area in the southeastern United States of America from 2002 to 2009, a multi-stage geocoding protocol was developed to efficiently maximize the proportion of participants assigned an address-level geographic coordinate. Addresses were parsed, cleaned and standardized before applying a combination of automated and interactive geocoding tools. Our full protocol increased the non-Post Office (PO) Box match rate from 74.5% to 97.6%. Overall, we geocoded 99.96% of participant addresses, with only 5.2% at the ZIP code centroid level (2.8% PO Box and 2.3% non-PO Box addresses). One key to reducing the need for interactive geocoding was the use of multiple base maps. Still, addresses in areas with lower population density were more likely to require interactive geocoding than those in areas with population density >920 persons/km² (odds ratio (OR) = 5.24; 95% confidence interval (CI) = 4.23, 6.49), as were addresses collected from participants during in-person interviews compared with mailed questionnaires (OR = 1.83; 95% CI = 1.59, 2.11). This study demonstrates that population density and address ascertainment method can influence automated geocoding results and that high success in address-level geocoding is achievable for large-scale studies covering wide geographical areas.
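The staged logic described above (standardize the address, try several base maps in turn, and fall back to a ZIP code centroid only when no address-level match is found) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy `standardize` step, and the dictionary-backed "base maps" are all hypothetical stand-ins for real geocoding tools, and the interactive stage is assumed to be representable as one more locator in the list.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)
# A locator maps a standardized address to a coordinate, or None if unmatched.
Locator = Callable[[str], Optional[Coordinate]]


@dataclass
class GeocodeResult:
    address: str                       # standardized address that was searched
    coordinate: Optional[Coordinate]   # None when nothing matched
    level: str                         # "address", "zip_centroid" or "unmatched"
    source: str                        # name of the stage that produced the match


def standardize(raw: str) -> str:
    """Toy cleaning step (assumption): collapse whitespace and upper-case."""
    return " ".join(raw.split()).upper()


def geocode_multistage(
    raw: str,
    locators: Sequence[Tuple[str, Locator]],
    zip_centroid: Locator,
) -> GeocodeResult:
    """Try each locator in order; fall back to a ZIP centroid lookup."""
    address = standardize(raw)
    for name, locate in locators:
        coordinate = locate(address)
        if coordinate is not None:
            return GeocodeResult(address, coordinate, "address", name)
    coordinate = zip_centroid(address)
    if coordinate is not None:
        return GeocodeResult(address, coordinate, "zip_centroid", "zip_centroid")
    return GeocodeResult(address, None, "unmatched", "none")


# Hypothetical stand-ins for two base maps and a ZIP centroid table.
base_map_a = {"12 MAIN ST 37203": (36.16, -86.78)}
base_map_b = {"45 OAK AVE 37402": (35.05, -85.31)}
zip_centroids = {"37203": (36.15, -86.79)}


def lookup_zip_centroid(address: str) -> Optional[Coordinate]:
    """Assume the ZIP code is the last whitespace-separated token."""
    tokens = address.split()
    return zip_centroids.get(tokens[-1]) if tokens else None


locators = [("base_map_a", base_map_a.get), ("base_map_b", base_map_b.get)]
```

Calling `geocode_multistage("po box 9 37203", locators, lookup_zip_centroid)` misses both base maps and resolves at the ZIP centroid level, which mirrors how PO Box addresses are reported in the abstract. Recording the `level` and `source` of each match is what makes it possible to report match rates per stage.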

How to Cite

Sonderman, J. S., Mumma, M. T., Cohen, S. S., Cope, E. L., Blot, W. J., & Signorello, L. B. (2012). A multi-stage approach to maximizing geocoding success in a large population-based cohort study through automated and interactive processes. Geospatial Health, 6(2), 273–284. https://doi.org/10.4081/gh.2012.145