An effective and efficient approach for manually improving geocoded data
Daniel W Goldberg, John P Wilson, Craig A Knoblock, Beate Ritz, and Myles G Cockburn
International Journal of Health Geographics 2008, 7:60
doi:10.1186/1476-072X-7-60
Published: 26 November 2008
The process of geocoding produces output coordinates of varying degrees of quality.
Previous studies have revealed that simply excluding records with low-quality
geocodes from analysis can introduce significant bias, but depending on the number
and severity of the inaccuracies, their inclusion may also lead to bias. Little
quantitative research has been presented on the cost and/or effectiveness of correcting
geocodes through manual interactive processes, so the most cost-effective methods for
improving geocoded data are unclear. The present work investigates the time and effort
required to correct geocodes contained in five health-related datasets that represent
examples of data commonly used in Health GIS.
Geocode correction was attempted on five health-related datasets containing a
total of 22,317 records. The complete processing of these data took 11.4 weeks
(427 hours), averaging 69 seconds of processing time per record. Overall, the geocodes
associated with 12,280 (55%) of records were successfully improved, taking 95 seconds
of processing time per corrected record on average across all five datasets. Geocode correction
improved the overall match rate (the number of successful matches out of the total attempted)
from 79.3% to 95%. The spatial shift between the original successfully matched geocodes
and their corrected counterparts averaged 9.9 km per corrected record. After geocode
correction, the numbers of city-accuracy and USPS ZIP code-accuracy geocodes fell
from 10,959 and 1,031 to 6,284 and 200, respectively, while the number of building
centroid-accuracy geocodes increased from 0 to 2,261.
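As a rough check on the arithmetic above, the following Python sketch recomputes the
per-record processing time, the share of records improved, and the reductions in
coarse-accuracy geocodes from the totals quoted in this abstract. The constant and
function names are illustrative assumptions and do not come from the article. The
95-second figure per corrected record is not recomputed, because the abstract does not
report the time spent on corrected records alone.

```python
# Totals quoted in the abstract; names are illustrative, not from the article.
TOTAL_RECORDS = 22_317      # records across the five datasets
TOTAL_HOURS = 427           # total processing time in hours
IMPROVED_RECORDS = 12_280   # records whose geocodes were improved


def seconds_per_record(hours: float, records: int) -> float:
    """Average processing time per record, in seconds."""
    return hours * 3600 / records


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100 * part / whole


avg_seconds = seconds_per_record(TOTAL_HOURS, TOTAL_RECORDS)
improved_pct = percent(IMPROVED_RECORDS, TOTAL_RECORDS)

# Reductions in coarse-accuracy geocodes (before minus after correction).
city_drop = 10_959 - 6_284
zip_drop = 1_031 - 200

print(f"{avg_seconds:.0f} s per record")       # 69 s per record
print(f"{improved_pct:.0f}% of records improved")  # 55% of records improved
print(f"city-accuracy geocodes reduced by {city_drop}")  # 4675
print(f"ZIP-accuracy geocodes reduced by {zip_drop}")    # 831
```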
The results indicate that manual geocode correction using a web-based interactive
approach is a feasible and cost-effective method for improving the quality of geocoded
data. The level of effort required varies depending on the type of data geocoded. These
results can be used to choose between data improvement options (e.g., manual intervention,
pseudocoding/geo-imputation, field GPS readings).