Elsevier

Injury

Volume 46, Issue 5, May 2015, Pages 891-897

Making the most of injury surveillance data: Using narrative text to identify exposure information in case-control studies

https://doi.org/10.1016/j.injury.2014.11.012

Abstract

Introduction

Free-text fields in injury surveillance databases can provide detailed information beyond routinely coded data. Additional data, such as exposures and covariates, can be identified from narrative text and used to conduct case-control studies.

Methods

To illustrate this, we developed a text-search algorithm to identify helmet status (worn, not worn, use unknown) in the U.S. National Electronic Injury Surveillance System (NEISS) narratives for bicycling and other sports injuries from 2005 to 2011. We calculated adjusted odds ratios (ORs) for head injury associated with helmet use, with non-head injuries representing controls. For bicycling, we validated ORs against published estimates. We also calculated ORs for other sports and examined factors associated with helmet reporting.

Results

Of 105,614 bicycling injury narratives reviewed, 14.1% contained sufficient helmet information for use in the case-control study. The adjusted ORs for head injuries associated with helmet-wearing were smaller than, but directionally consistent with, previously published estimates (e.g., 1999 Cochrane Review). ORs for other sports were also less than 1, indicating a protective effect of helmets.

Conclusions

This exploratory analysis illustrates the potential utility of relatively simple text-search algorithms to identify additional variables in surveillance data. Limitations of this study include possible selection bias and the inability to identify individuals with multiple injuries. A similar approach can be applied to study other injuries, conditions, risks, or protective factors. This approach may serve as an efficient method to extend the utility of injury surveillance data to conduct epidemiological research.

Introduction

Administrative health and injury surveillance databases often include narrative text fields that provide additional detailed information beyond routinely coded data. Researchers use free text to validate coded data retrospectively or to obtain supplemental information about patients, illnesses, injuries, comorbidities, outcomes, or health services received [1], [2], [3], [4], [5], [6]. Secondary use of data from free text illustrates one way to extend the value of electronic health information for application in clinical research, quality improvement, or public health surveillance [7]. To our knowledge, however, researchers have not used narratives or free text to obtain additional information for a case-control study.

The National Electronic Injury Surveillance System (NEISS) is an electronic database of injury information from a national probability sample of U.S. emergency departments (EDs), managed by the Consumer Product Safety Commission (CPSC). NEISS provides coded information on body part injured and injury type and has been widely used in epidemiological studies to analyze injury mechanisms for many sports. NEISS does not systematically capture information on known risk factors (e.g., alcohol use) or protective factors (e.g., helmets or other protective equipment). However, since January 1, 2002, NEISS data have included 142-character narratives that provide additional detail on the injury and the circumstances around its occurrence. For some injuries, such as those sustained while bicycling, narrative text may provide important etiological information about the injury and whether protective gear was used. Although some studies have used NEISS narrative data to ascertain the activity at the time of injury, the narratives have not been widely used to evaluate the prevalence or effectiveness of protective devices, such as helmets.

Many head injuries can be prevented through improved use of protective gear, especially helmets. Several landmark case-control studies have shown that in bicycling, appropriate helmet use reduces head injuries by up to 85% [8], [9], [10]. These studies provide “gold standards” for estimates of the protective effect of helmets.

The primary objective of this study was to validate the use of narrative-derived exposure information by comparing odds ratios (ORs) for the association between bicycle helmet use and ED-reported head injuries obtained from NEISS with ORs previously reported in the literature. As an exploratory analysis, we also estimated ORs for head injuries associated with helmet use for other sports-related injuries whose NEISS narratives report helmet use. Finally, we investigated the factors associated with the reporting of helmet information in the narrative text.
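As a rough illustration of the quantity being compared, the sketch below computes a crude (unadjusted) odds ratio with a Woolf-type 95% confidence interval from a 2x2 table of helmet status by head injury. This is not the adjusted model used in the study, which is not described in this excerpt; the function name `crude_odds_ratio` and the counts in the usage example are hypothetical and chosen only to show the arithmetic.

```python
import math


def crude_odds_ratio(helmet_cases, helmet_controls,
                     no_helmet_cases, no_helmet_controls, z=1.96):
    """Crude odds ratio and Woolf (log-based) confidence interval.

    Cases are head injuries; controls are non-head injuries.
    All names and inputs here are illustrative assumptions.
    """
    cells = (helmet_cases, helmet_controls, no_helmet_cases, no_helmet_controls)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")

    # Odds of head injury among helmeted vs. unhelmeted riders.
    odds_ratio = (helmet_cases * no_helmet_controls) / (
        helmet_controls * no_helmet_cases
    )

    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / n for n in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = crude_odds_ratio(20, 80, 40, 60)
print(f"OR = {or_:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

An OR below 1 with an interval that excludes 1 would be read as a protective association. An adjusted analysis would typically replace this table-based estimate with a regression model that includes covariates.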

Methods

We obtained NEISS data from the Consumer Product Safety Commission for sports-related injuries that presented to NEISS hospital emergency departments from 2005 to 2011. Injury information provided by NEISS includes age, sex, ethnicity, body part, diagnosis, discharge disposition, location of incident, consumer product(s) associated with injury, and a 142-character narrative text field. Diagnosis code refers to the “most severe and specific diagnosis” given by the attending physician and is

Bicycling injuries

There were 105,614 bicycling injury narratives reported in NEISS from 2005 to 2011, of which 14,925 (14.1%) referenced helmet use. Of the narratives that referenced helmets, 5270 (35.3%) were categorized as helmeted, 7287 (48.8%) as unhelmeted, and 2368 (15.9%) as helmet mentioned but use unknown.
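The three-way categorization above can be pictured with a minimal keyword-based sketch. The study's actual search terms are not given in this excerpt, so the regular expressions, the category labels, and the sample narratives below are all hypothetical; a real rule set would need to be checked against manual review of the narratives.

```python
import re

# Hypothetical phrases suggesting a helmet was NOT worn (checked first so
# that "NOT WEARING A HELMET" is not mistaken for "WEARING A HELMET").
NOT_WORN = re.compile(
    r"\b(?:NO|NOT WEARING(?: AN?)?|WITHOUT(?: AN?)?|W/O(?: AN?)?)\s+HELMET"
    r"|\bUNHELMETED\b"
    r"|\bHELMET\s+(?:WAS\s+)?NOT\s+WORN\b"
)

# Hypothetical phrases suggesting a helmet WAS worn.
WORN = re.compile(
    r"\b(?:WEARING|WORE|WITH|HAD)(?: AN?| HIS| HER)?\s+HELMET"
    r"|\bHELMETED\b"
    r"|\bHELMET\s+(?:WAS\s+)?(?:WORN|ON)\b"
    r"|\+\s*HELMET"
)


def classify_helmet_status(narrative):
    """Return 'worn', 'not_worn', 'unknown', or 'not_mentioned'."""
    text = narrative.upper()
    if "HELMET" not in text:
        return "not_mentioned"
    if NOT_WORN.search(text):
        return "not_worn"
    if WORN.search(text):
        return "worn"
    # Helmet is mentioned, but the wording does not indicate whether it was worn.
    return "unknown"


# Invented narratives for illustration only.
samples = [
    "10 YOM FELL OFF BICYCLE, NOT WEARING A HELMET, HIT HEAD",
    "25 YOF HELMETED BICYCLIST STRUCK BY CAR",
    "PT FELL FROM BIKE, HELMET CRACKED",
    "FELL OFF BIKE, LACERATION TO KNEE",
]
for s in samples:
    print(classify_helmet_status(s), "<-", s)
```

Checking negations before affirmative phrases is a deliberate design choice in this sketch, since most "not worn" wordings contain an affirmative phrase as a substring.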

A 10% random sample of bicycling injury narratives containing the word helmet resulted in 1493 narratives, all of which were reviewed for validation of the text-search algorithm. Among the reviewed

Discussion

This study utilized oft-overlooked narrative text fields in a national injury surveillance database to identify and code supplemental information (in this case exposure data) for use in an illustrative case-control study. This approach can potentially be used on any free text data, such as administrative data or medical records, to extend the utility of this information in an efficient and economical manner. This exploratory case-control study of bicycling head injuries and helmet use resulted

Conflict of interest statement

Authors have no conflicts of interest to declare.

Acknowledgements

Research reported in this publication was supported by the National Institute of Child Health and Human Development of the U.S. National Institutes of Health under award number T32HD057822 (Rivara). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Dr. Rivara holds the Seattle Children's Guild Endowed Chair in Paediatrics. Dr. Hagel holds the Alberta Children's Hospital Foundation Professorship in

References (17)

  • K. McKenzie et al.

    The use of narrative text for injury surveillance research: a systematic review

    Accid Anal Prev

    (2010)
  • H.J. Murff et al.

    Automated identification of postoperative complications within an electronic medical record using natural language processing

    J Am Med Assoc

    (2011)
  • Z. Wang et al.

    Extracting diagnoses and investigation results from unstructured text in electronic health records by semi-supervised machine learning

    PLoS ONE

    (2012)
  • Y. Stahl et al.

    Psychosocial health information in free text notes of Swedish children's health records

    Scand J Caring Sci

    (2012)
  • R.A. Wilke et al.

    Use of an electronic medical record for the identification of research subjects with diabetes mellitus

    Clin Med Res

    (2007)
  • M. Steidl et al.

    Data for free – can an electronic medical record provide outcome data for incontinence/prolapse repair procedures?

    J Urol

    (2013)
  • W.R. Hersh

    Adding value to the electronic health record through secondary use of data for quality assurance, research, and surveillance

    Am J Manag Care

    (2007)
  • R.S. Thompson et al.

    A case-control study of the effectiveness of bicycle safety helmets

    N Engl J Med

    (1989)
There are more references available in the full text version of this article.
