Information Extraction Approaches to Unconventional Data Sources for “Injury Surveillance System”: the Case of Newspapers Clippings

  • Original Paper
  • Journal of Medical Systems

Abstract

Injury Surveillance Systems based on traditional hospital records or clinical data have the advantage of being a well-established, highly reliable source of information for active surveillance of specific injuries, such as choking in children. However, they suffer from delays in making data available for analysis, owing to inefficiencies in data collection procedures. Integrating clinically based registries with unconventional data sources such as newspaper articles can therefore make the system more useful for early alerting. Such sources are difficult to use because the information is available only as free natural-language documents rather than as the structured databases required by traditional data mining techniques. Information Extraction (IE) addresses the problem of transforming a corpus of textual documents into a more structured database. In this paper, on a corpus of Italian newspaper articles about choking in children due to ingestion or inhalation of a foreign body, we compared the performance of three IE algorithms: (a) a classical rule-based system, which requires manual annotation of the rules; (b) a rule-based system that allows rules to be built automatically; and (c) a machine learning method based on Support Vector Machines. Although some useful indications can be extracted from the newspaper clippings, this approach is at present far from being routinely implemented for injury surveillance purposes.
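
As an illustration of what the rule-based setting in (a) involves, the following minimal sketch turns one free-text sentence into a structured record using two hand-written patterns. It is not the system evaluated in the paper: the rules `AGE_RULE` and `OBJECT_RULE`, the function `extract_record`, the field names, and the English example sentence are all hypothetical and were invented for this illustration (the study itself used Italian articles).

```python
import re

# Hypothetical hand-written rules, invented for illustration only.
# Rule 1: an age expression such as "3-year-old" or "18 month old".
AGE_RULE = re.compile(r"\b(\d{1,2})[- ](year|month)s?[- ]old\b", re.IGNORECASE)
# Rule 2: the single noun that follows a choking-related verb and an article.
OBJECT_RULE = re.compile(
    r"\b(?:swallowed|inhaled|choked on)\s+(?:a|an|the)\s+([a-z]+)",
    re.IGNORECASE,
)


def extract_record(text: str) -> dict:
    """Apply both rules to one sentence and return a structured record.

    Fields that no rule matches are left as None rather than guessed.
    """
    age = AGE_RULE.search(text)
    obj = OBJECT_RULE.search(text)
    return {
        "age_value": int(age.group(1)) if age else None,
        "age_unit": age.group(2).lower() if age else None,
        "object": obj.group(1).lower() if obj else None,
    }


# Invented example sentence, not taken from the study corpus.
example = "A 3-year-old boy was taken to hospital after he choked on a peanut."
record = extract_record(example)
print(record)
```

Each new phrasing of an age or an object needs another hand-written rule in this setting, which is the manual effort that approaches (b) and (c) aim to reduce.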

Author information

Corresponding author

Correspondence to Dario Gregori.

About this article

Cite this article

Berchialla, P., Scarinzi, C., Snidero, S. et al. Information Extraction Approaches to Unconventional Data Sources for “Injury Surveillance System”: the Case of Newspapers Clippings. J Med Syst 36, 475–481 (2012). https://doi.org/10.1007/s10916-010-9492-1

  • DOI: https://doi.org/10.1007/s10916-010-9492-1