Background Emergency departments (EDs) around the world collect valuable injury data with the potential to inform consumer product regulators. However, many of these systems store key information in unstructured text fields, making case identification and analysis difficult. Machine learning approaches allow autocoding of large amounts of data, increasing the utility of these data for surveillance. This study aimed to evaluate the performance of different classifiers for categorising mechanisms and objects involved in injury-related ED presentations.
Methods A sample of 100,000 cases from a special injury surveillance system was used to train the classifiers (Naïve Bayesian, support vector machine (SVM) and logistic regression), and the algorithms were tested on 10,000 cases. The accuracy of each classifier was compared. The classifier obtaining the highest accuracy was then applied to statewide ED text to autocode the data. A sample of cases was manually coded and reviewed to assess the accuracy of the algorithm for the larger dataset.
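The train-then-test evaluation described in the Methods can be illustrated with a minimal sketch. It shows only one of the three classifier families named above (a multinomial Naïve Bayes model with add-one smoothing), written in plain Python. The function names, the toy triage text and the labels are all hypothetical and are not taken from the study, which does not describe its implementation here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplest possible tokeniser: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(records):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in records:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, records):
    """Proportion of held-out records whose predicted label matches the code."""
    correct = sum(predict(model, text) == label for text, label in records)
    return correct / len(records)


# Hypothetical free-text descriptions with invented mechanism codes.
train = [
    ("fell off ladder at home", "fall"),
    ("slipped on wet floor and fell", "fall"),
    ("fell from bunk bed", "fall"),
    ("cut finger with kitchen knife", "cut"),
    ("laceration to hand from broken glass", "cut"),
    ("cut thumb on tin can lid", "cut"),
    ("burn to arm from hot water", "burn"),
    ("scald from kettle of boiling water", "burn"),
    ("burn to hand on hot stove", "burn"),
]
test = [
    ("fell off chair", "fall"),
    ("cut hand with knife", "cut"),
    ("burn from hot oil", "burn"),
]

model = train_naive_bayes(train)
print("held-out accuracy:", accuracy(model, test))
```

In a comparison like the one described, the same held-out cases would be scored against each classifier in turn and the accuracies set side by side; the toy data here are far too small to say anything about relative performance.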
Results All classifiers achieved high levels of accuracy for categorising mechanism and moderate levels of accuracy for categorising objects involved. The SVM approach showed the highest accuracy and was used to classify statewide ED injury data. Over 75% of the statewide database was assigned a specified mechanism, and almost a quarter of cases were categorised as involving a consumer product. Comparison with gold standard manual coding for a sample of cases found high accuracy of the SVM classifier for the statewide data.
Conclusions Consumer product regulators increasingly require an evidence base to support regulatory responses, and ED data are a valuable yet underutilised source of injury data. Machine learning approaches can be used to quickly and accurately code free text descriptions to categorise data for further extraction, analysis and interpretation.
- Consumer product safety
- Emergency department data
- Injury surveillance
- Machine learning