A combined Fuzzy and Naïve Bayesian strategy can be used to assign event codes to injury narratives
H Marucci-Wellman (1), M Lehto (2), H Corns (1)

  1. Center for Injury Epidemiology, Liberty Mutual Research Institute for Safety, 71 Frankland Road, Hopkinton, Massachusetts, USA
  2. School of Industrial Engineering, Purdue University, 1287 Grissom Hall, West Lafayette, Indiana, USA; School of Management/Center for Global Innovation & Entrepreneurship, Kyunghee University, Seoul 130-701, Korea

Correspondence to Dr Helen Marucci-Wellman, Center for Injury Epidemiology, Liberty Mutual Research Institute for Safety, 71 Frankland Road, Hopkinton, MA 01748, USA; helen.wellman{at}libertymutual.com

Abstract

Background Bayesian methods show promise for classifying injury narratives from large administrative datasets into cause groups. This study examined a combined approach in which two Bayesian models (Fuzzy and Naïve) were used together either to classify a narrative or to select it for manual review.

Methods Injury narratives were extracted from claims filed with a workers' compensation insurance provider between January 2002 and December 2004. Narratives were separated into a training set (n=11,000) and a prediction set (n=3,000). Expert coders assigned a two-digit Bureau of Labor Statistics Occupational Injury and Illness Classification event code to each narrative. Fuzzy and Naïve Bayesian models were developed using the manually classified cases in the training set. Two semi-automatic machine coding strategies were evaluated. The first strategy sent a case to manual review if the Fuzzy and Naïve models disagreed on its classification. The second strategy selected additional cases for manual review from the Agree dataset (cases on which the two models agreed), using prediction strength, to reach a level of 50% computer coding and 50% manual coding.
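
The two filtering strategies described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Prediction` record, the `route` function, the single `strength` score per case, and the rule of sending the weakest agreeing cases to review first are all assumptions made for illustration; the abstract says only that prediction strength was used to select the additional cases. The event codes are placeholder strings.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical record holding both models' output for one narrative."""
    narrative_id: int
    fuzzy_code: str    # two-digit event code predicted by the Fuzzy model
    naive_code: str    # two-digit event code predicted by the Naive model
    strength: float    # assumed single prediction-strength score in [0, 1]


def route(predictions, manual_fraction=0.0):
    """Split cases into computer-coded and manually reviewed groups.

    Strategy 1 (manual_fraction=0.0): only cases where the two models
    disagree go to manual review.
    Strategy 2 (e.g. manual_fraction=0.5): agreeing cases are added to
    manual review, weakest first (an assumption), until the requested
    share of all cases is reviewed by hand.
    """
    disagree = [p for p in predictions if p.fuzzy_code != p.naive_code]
    agree = [p for p in predictions if p.fuzzy_code == p.naive_code]

    target_manual = round(manual_fraction * len(predictions))
    extra = max(0, target_manual - len(disagree))

    agree_by_strength = sorted(agree, key=lambda p: p.strength)
    manual = disagree + agree_by_strength[:extra]
    computer = agree_by_strength[extra:]
    return computer, manual


# Placeholder data; the codes and strengths are invented for the example.
cases = [
    Prediction(1, "11", "11", 0.9),
    Prediction(2, "11", "13", 0.8),
    Prediction(3, "42", "42", 0.3),
    Prediction(4, "62", "62", 0.7),
]
computer_1, manual_1 = route(cases)                       # strategy 1
computer_2, manual_2 = route(cases, manual_fraction=0.5)  # strategy 2
```

In this toy input, strategy 1 sends only the single disagreement to review, while strategy 2 also sends the agreeing case with the lowest strength so that half of the cases are reviewed by hand.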

Results When agreement alone was used as the filtering strategy, the majority of narratives were coded by the computer (n=1,928, 64%), leaving 36% for manual review. The overall combined (human plus computer) sensitivity was 0.90, and positive predictive value (PPV) was >0.90 for 11 of 18 two-digit event categories. Implementing the second strategy improved results, with an overall sensitivity of 0.95 and PPV >0.90 for 17 of 18 categories.
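
For readers unfamiliar with the two metrics reported above, the following sketch shows how per-category sensitivity and PPV are conventionally computed from gold-standard codes and assigned codes. The function name `sensitivity_and_ppv` and the sample codes are hypothetical, and this is the standard textbook definition rather than the authors' evaluation code.

```python
from collections import Counter


def sensitivity_and_ppv(gold, assigned):
    """Return {category: (sensitivity, ppv)} for paired code lists.

    sensitivity = correctly assigned cases / cases truly in the category
    ppv         = correctly assigned cases / cases assigned to the category
    A value of None means the denominator was zero.
    """
    gold_counts = Counter(gold)
    assigned_counts = Counter(assigned)
    true_positives = Counter(g for g, a in zip(gold, assigned) if g == a)

    results = {}
    for category in set(gold) | set(assigned):
        tp = true_positives[category]
        sens = tp / gold_counts[category] if gold_counts[category] else None
        ppv = tp / assigned_counts[category] if assigned_counts[category] else None
        results[category] = (sens, ppv)
    return results


# Placeholder codes, invented for the example.
gold = ["11", "11", "13", "42"]
assigned = ["11", "13", "13", "42"]
metrics = sensitivity_and_ppv(gold, assigned)
```

Here one of the two true "11" cases was coded as "13", so category "11" has sensitivity 0.5 and PPV 1.0, while category "13" has sensitivity 1.0 and PPV 0.5.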

Conclusions A combined Naïve-Fuzzy Bayesian approach can classify some narratives with high accuracy and identify others most beneficial for manual review, reducing the burden on human coders.

  • e-Code
  • e-coding
  • injury
  • narrative analyses
  • surveillance
  • text mining

Footnotes

  • Competing interests None.

  • Ethics approval This study was conducted with the approval of the Liberty Mutual Research Institute for Safety and Purdue University.

  • Provenance and peer review Not commissioned; externally peer reviewed.
