3A.004 Internet-based textual big data and road traffic injuries
  Peixia Cheng (1), Jianxin Wang (2), Wangxin Xiao (1), David Schwebel (3), Peishan Ning (1), Yue Wu (4), Guoqing Hu (1)

  (1) Department of Epidemiology and Health Statistics, Xiangya School of Public Health, Central South University, Changsha, China
  (2) School of Computer Science and Engineering, Central South University, Changsha, China
  (3) Department of Psychology, University of Alabama at Birmingham, Birmingham, USA
  (4) Department of Environmental and Occupational Health, Xiangya School of Public Health, Central South University, Changsha, China


Background Internet-based big data may offer important and timely information on road traffic injuries, supplementing official government statistics. We developed computer-based approaches to define, extract and automatically collect internet-based Chinese-language big data on road traffic injuries.

Methods Based on injury prevention matrices and ICD-10, we established a thesaurus set and an analysis framework for data extraction. A dilated convolutional neural network classifier was developed, using 10,000 researcher-annotated news sources, to filter eligible news stories, and algorithms were built to extract information on the relevant variables. Word frequencies were computed using Jieba, a Python module for Chinese word segmentation. Pearson correlation coefficients were used to examine the relationship between internet-based big data and official statistics.
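The word-frequency step can be illustrated with a minimal sketch. It assumes the stories have already been segmented into tokens (the study used Jieba for that step); the token lists, the stop-word set and the function name `word_frequencies` are hypothetical and chosen only so the example runs without third-party packages.

```python
from collections import Counter

# Hypothetical, already-segmented stories. In the study, segmentation was done
# with Jieba; these token lists are hand-written for illustration only.
segmented_stories = [
    ["事故", "车辆", "现场", "事故"],   # crash, vehicle, scene, crash
    ["车辆", "事故", "司机"],           # vehicle, crash, driver
    ["现场", "事故", "车辆"],           # scene, crash, vehicle
]

# Illustrative stop-word set (an assumption, not taken from the abstract).
STOP_WORDS = {"的", "了", "在"}


def word_frequencies(stories, stop_words=frozenset()):
    """Count how often each token appears across all stories."""
    counts = Counter()
    for tokens in stories:
        counts.update(t for t in tokens if t not in stop_words)
    return counts


freqs = word_frequencies(segmented_stories, STOP_WORDS)
top_three = freqs.most_common(3)  # list of (token, count), most frequent first
```

Counting over pre-segmented tokens keeps the frequency logic separate from the segmenter, so a different segmentation tool could be swapped in without changing the counting code.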

Results A total of 650,140 media reports were captured from 27 Chinese news websites, of which 92,813 were classified as eligible reports (accuracy=86%). Searches captured information about 71,829 traffic crashes from January 2013 to September 2019. The words ‘crash’, ‘vehicle’ and ‘scene’ were the most frequently used in the stories. Our results revealed characteristics that official statistics did not cover, such as changes in travel patterns among the elderly. The number of media-reported crashes was highly correlated with official statistics (r=0.84, p=0.035).
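For readers unfamiliar with the statistic, the following sketch shows how a Pearson correlation coefficient between two count series could be computed. The paired counts are invented for illustration and are not the study's data; the function name `pearson_r` and the variable names are likewise assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired counts per period (NOT the study's data).
media_reported = [120, 150, 170, 160, 210, 230]
official_counts = [1000, 1150, 1400, 1250, 1600, 1900]

r = pearson_r(media_reported, official_counts)
print(f"r = {r:.2f}")
```

A value of r close to 1 indicates that the two series rise and fall together; it does not by itself show that media reports capture the same crashes as the official records.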

Conclusion Internet-based big data offers information about traffic crashes that can supplement official government statistics and inform road traffic injury prevention strategies. Extending the approach to countries where government data and statistics are unreliable but news reporting is reliable is particularly appealing.

Learning Outcomes Internet-based big data can supplement existing road traffic injury data sources and guide prevention efforts.
