Background Record linkage to routinely collected national databases is increasingly used in cohort studies as a cost-effective strategy for collecting outcome events. However, incomplete or imperfect data remain a concern.
Aims This paper evaluated the completeness of injury outcome data collected in the Taupo Bicycle Study, in which 2590 cyclists were followed for 55 months (1 December 2006 to 30 June 2011).
Methods Data on injury-producing bicycle crashes were collected through record linkage to insurance claims, hospital discharge data, mortality data and police reports. A capture-recapture analysis using log-linear models was undertaken to estimate the number of undetected cases. The linked data were also compared with self-reported data collected in a follow-up survey (2009/2010).
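To illustrate the idea behind capture-recapture, the following is a minimal sketch in Python. It is not the log-linear model used in the study: it substitutes the simpler two-source Chapman estimator, which assumes the two sources capture cases independently. The function names and the counts are hypothetical and chosen only for illustration.

```python
def chapman_estimate(n1, n2, m):
    """Chapman's two-source estimate of the true number of cases.

    n1, n2: cases found in source 1 and source 2
    m:      cases found in both sources
    Assumes the sources capture cases independently (a simplification;
    log-linear models can relax this when three or more sources exist).
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def completeness(n1, n2, m):
    """Share of the estimated true total that was observed in either source."""
    observed = n1 + n2 - m  # distinct cases seen at least once
    return observed / chapman_estimate(n1, n2, m)


# Illustrative counts only, NOT taken from the study.
n_a, n_b, n_both = 9, 4, 1
print(chapman_estimate(n_a, n_b, n_both))
print(completeness(n_a, n_b, n_both))
```

The fewer cases the sources share relative to their sizes, the larger the estimated number of undetected cases and the lower the estimated completeness.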
Results Collectively, 1336 crashes (including 755 on-road crashes and 120 collisions with a motor vehicle) experienced by 855 participants were identified from these datasets. The estimated completeness of the data was 73.7% (95% CI 68.0% to 78.7%) for total crashes, 74.5% (95% CI 69.1% to 79.3%) for on-road crashes and 83.3% (95% CI 78.9% to 87.6%) for collisions. There was moderate agreement between self-reported and linked data (kappa: 0.55). The agreement varied by participants' demographic characteristics, pre-existing health conditions and confidence in recalling crash events. With self-reported crashes taken as the gold standard, the linked administrative data had 63.1% sensitivity and 93.5% specificity for total crashes, and 40.0% sensitivity and 99.9% specificity for collisions.
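The agreement measures reported above can all be computed from a 2x2 table that cross-classifies participants by self-report (treated as the reference) and linked data. The sketch below shows the standard formulas for sensitivity, specificity and Cohen's kappa. The function name and the cell counts are hypothetical and are not the study's data.

```python
def agreement_measures(tp, fp, fn, tn):
    """Sensitivity, specificity and Cohen's kappa from a 2x2 table.

    tp: crash in both self-report and linked data
    fp: crash in linked data only
    fn: crash in self-report only
    tn: no crash in either
    Self-report is treated as the reference standard.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n  # observed agreement
    # Agreement expected by chance, from the marginal totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Illustrative counts only, NOT taken from the study.
sens, spec, kappa = agreement_measures(tp=40, fp=5, fn=10, tn=45)
print(sens, spec, kappa)
```

Kappa corrects the observed agreement for agreement expected by chance, which is why it can be only moderate even when specificity is high.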
Significance Given the substantial underestimation of bicycle crashes in routinely collected data, cohort studies using record linkage need to consider and account for potential biases in analyses.