Abstract
Background Internet-based big data may offer important and timely information on road traffic injuries, supplementing official government statistics. We developed computer-based approaches to define, extract and automatically collect internet-based Chinese-language big data on road traffic injuries.
Methods Based on injury prevention matrices and ICD-10, we established a thesaurus set and analysis framework for data extraction. A dilated convolutional neural network classifier was developed to filter eligible news stories based on 10,000 researcher-annotated news sources, and algorithms were built to extract information on the relevant variables. Word frequencies were computed using Jieba, a Python module for Chinese word segmentation. Pearson correlation coefficients were used to examine the relation between internet-based big data and official statistics.
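The word-frequency step can be illustrated with a minimal sketch, assuming the stories have already been segmented into token lists (in practice such lists might come from a call like `jieba.lcut`). The function name `word_frequencies`, the stopword set and the two sample token lists are hypothetical and are not taken from the study.

```python
from collections import Counter


def word_frequencies(tokenized_docs, stopwords=frozenset()):
    """Count how often each token occurs across all segmented documents."""
    counts = Counter()
    for tokens in tokenized_docs:
        # Skip whitespace-only tokens and anything listed as a stopword.
        counts.update(t for t in tokens if t.strip() and t not in stopwords)
    return counts


# Hypothetical, already-segmented example stories (not study data).
docs = [
    ["车辆", "在", "现场", "发生", "事故"],
    ["事故", "现场", "车辆", "受损"],
]

freq = word_frequencies(docs, stopwords={"在"})
top = freq.most_common(3)  # the three most frequent tokens with their counts
```

Keeping segmentation separate from counting means the same counting code can be reused with any tokenizer.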
Results 650,140 media reports were captured from 27 Chinese news websites, and 92,813 news pieces were filtered as eligible reports (accuracy=86%). Searches captured information about 71,829 traffic crashes from January 2013 to September 2019. ‘Crash’, ‘vehicle’ and ‘scene’ were the most frequent words in the stories. Our results revealed characteristics that official statistics did not cover, such as changes in travel patterns among the elderly. The number of media-reported crashes was highly correlated with official statistics (r=0.84, p=0.035).
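For readers unfamiliar with the statistic, the following is a minimal sketch of how a Pearson correlation coefficient between two count series can be computed. The helper `pearson_r` and both input series are hypothetical placeholders chosen for illustration; they are not the study's data and do not reproduce the reported r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical yearly counts, for illustration only (not study data).
media_reported = [120, 150, 170, 160, 200, 210]
official_counts = [1000, 1100, 1300, 1250, 1500, 1600]

r = pearson_r(media_reported, official_counts)
```

A value of r close to 1 indicates that the two series rise and fall together; the sketch does not compute the accompanying p-value.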
Conclusion Internet-based big data offers information about traffic crashes that can supplement official government statistics and inform road traffic injury prevention strategies. Extending the approach to countries where government data and statistics are unreliable, but news reporting is reliable, is particularly appealing.
Learning Outcomes Internet-based big data can supplement existing sources of road traffic injury data and guide prevention efforts.