Big data and opportunities for injury surveillance
  1. Julia E Gunn1,
  2. Snehal N Shah2,3
  1. 1Communicable Disease Control Division, Boston Public Health Commission, Boston, Massachusetts, USA
  2. 2Research and Evaluation Office, Boston Public Health Commission, Boston, Massachusetts, USA
  3. 3Department of Pediatrics, Boston University School of Medicine, Boston, Massachusetts, USA
  1. Correspondence to Julia E Gunn, Communicable Disease Control Division, Boston Public Health Commission, 1010 Massachusetts Ave, Boston, MA 02118, USA;

This issue of Injury Prevention presents injury surveillance activities spanning a range of available methods from paper data collection systems to machine-learning techniques. We have been invited to share thoughts on future opportunities for injury surveillance and public health in general. Our perspective is focused on electronic information and its capacity to transform public health surveillance and provide opportunities for improved targeted interventions. Public health surveillance risks becoming irrelevant if it does not take advantage of electronic information. This includes contextual data that defines the circumstances of an event or outcome and allows for a more complete and informed response to public and population health problems.

In considering the future of public health surveillance, we see opportunities in electronic data sources that can be grouped into three categories. We consider electronic sources to be ‘big data’, which is defined by volume, velocity, variety and veracity.1 The definition provides a framework for evaluating big data. For example, veracity, which can range from inadequate to high quality, is critical for determining usability. Big data repurposed for public health surveillance will require tools and methods, such as machine learning and quality metrics, that may improve veracity. The three categories are: electronic health records (EHRs); sources of electronic information that describe context, such as weather, crime, environmental conditions and emergency medical services data; and social media and internet-based data. These data may allow for a level of granularity critical to designing interventions relevant to specific settings, and pose both challenges and opportunities for public health agencies.

EHRs are a potential source of public health surveillance data. The Health Information Technology for Economic and Clinical Health Act of 2009 was enacted to promote the adoption and meaningful use of health information technology and has resulted in programmes to improve quality, safety and efficiency of healthcare.2 Meaningful use defines standards for and incentivises the use of certified EHRs and secure electronic health information exchanges. This has resulted in the expanded use of EHRs. While individual institutions or healthcare systems may be using EHR data to monitor populations served, the value of EHR data for public health surveillance lies in integrating data from multiple provider organisations to generate population level indicators. In this EHR use case, local public health departments can act as the integrator of this information, which also offers the option of multidirectional data sharing and fosters partnership between providers, healthcare systems, first responders, other community agencies and local public health. While still in its infancy, the use of EHR data for public health surveillance offers several advantages over traditional methods and could revolutionise injury surveillance, in particular. EHR data can provide information on types of injuries, severity, treatment and outcomes that is timely, more granular and more accurate. Algorithms to mine the variety of data types within an EHR such as prescriptions, referrals and laboratory reports along with diagnosis codes could increase detection of relevant events. Automation of these processes could reduce or eliminate the need for inefficient reporting practices. With the development of methods to convert text, screening tools and other non-standard data into standard concepts and formats, EHRs could contribute much needed data about a range of determinants contributing to health outcomes, which are often unavailable through current surveillance activities. 
In addition, EHRs may play a role in detecting disparities in injury incidence and outcomes. Though EHR-based surveillance holds promise, much work is needed to develop systems to categorise, standardise and repurpose EHR data for public health surveillance. Standard messaging for sending data and the continued development of systems to electronically transfer information are needed. Finally, questions regarding the level of detail and type of information to be used, with a focus on protecting patient privacy, must continue to be addressed by healthcare systems and public health agencies.

In 2004, the Boston Public Health Commission (BPHC) developed a syndromic surveillance system to use EHR data to monitor for symptoms reported in chief complaints that were associated with a potential bioterrorism agent such as anthrax or plague. The system uses a limited dataset from hospital emergency department visits to identify unusual patterns in those visits. Privacy, security and legal authority were initial challenges. To address the legal authority issue, BPHC, as the local health authority, passed a regulation requiring this information be sent electronically every 24 h by all acute care hospitals in Boston.
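
As a rough illustration of what identifying unusual patterns in daily visit counts can involve, the sketch below flags days whose count sits well above a short moving baseline. It is not BPHC's method: the function name `flag_unusual_days`, the seven-day baseline, the threshold of three standard deviations and the counts themselves are all assumptions made for the example.

```python
from statistics import mean, stdev

def flag_unusual_days(daily_counts, baseline_days=7, threshold=3.0):
    """Return indices of days whose count is more than `threshold`
    standard deviations above the mean of the preceding baseline window."""
    flagged = []
    for i in range(baseline_days, len(daily_counts)):
        baseline = daily_counts[i - baseline_days:i]
        mu = mean(baseline)
        # Floor the SD at 1.0 so a flat baseline does not divide by zero
        # (an arbitrary choice for this sketch).
        sigma = max(stdev(baseline), 1.0)
        if (daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Made-up daily counts for one syndrome, for illustration only.
counts = [12, 14, 11, 13, 12, 15, 13, 12, 30, 14]
print(flag_unusual_days(counts))  # [8]
```

Operational systems typically also account for day-of-week and seasonal effects, which this sketch ignores.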

The initial system development faced many challenges. Misspellings, abbreviations and negations (eg, ‘denies headache’) limited its utility. BPHC's first chief complaint coder lacked the flexibility to address other issues of public health significance. The system evolved to address these challenges. BPHC's current system uses an emergency medical text processor, a natural language processing system, to standardise chief complaint text and has the capacity to develop syndrome definitions on an ad hoc basis.3 This system, like other surveillance systems, requires ongoing monitoring, maintenance and development. In addition, quality control, to identify when electronic data are missing or incorrect, is a daily activity. We routinely monitor our syndromic surveillance system for events such as acute gastrointestinal illness, influenza-like illness, asthma, alcohol-related events and carbon monoxide exposure. Data are used to inform programming and direct public health resources. Calls for standardisation of terms in EHRs will pose challenges, because there will always be events that cannot be anticipated and coded from a standardised list of chief complaints. After the 2013 Boston Marathon bombing, BPHC wanted to ensure that mental health services and resources were available for those in need. To understand the potential need for mental health services, we created syndromes to search for, categorise and analyse terms related to anxiety, post-traumatic stress disorder and suicidal ideation. Aggregated information was shared with BPHC's Office of Public Health Preparedness for situational awareness.
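
To make the chief complaint challenges concrete, the following toy sketch normalises misspellings and abbreviations and skips terms that directly follow a negation cue before testing a complaint against an ad hoc syndrome definition. It is not the emergency medical text processor described above. The lookup table, the negation cues, the syndrome term sets and the function names are hypothetical and far smaller than anything usable in practice.

```python
import re

# Hypothetical lookup of misspellings and abbreviations -> standard term.
NORMALISE = {
    "ha": "headache",
    "headach": "headache",
    "vomitting": "vomiting",
    "diarhea": "diarrhoea",
}

# Hypothetical negation cues; a term directly after one of these is ignored.
NEGATION_CUES = {"denies", "no", "without"}

def normalise(text):
    """Lower-case the complaint, split it into word tokens and map known variants."""
    tokens = re.findall(r"[a-z/]+", text.lower())
    return [NORMALISE.get(tok, tok) for tok in tokens]

def matches_syndrome(text, terms):
    """True if any syndrome term appears in the complaint without a preceding negation cue."""
    tokens = normalise(text)
    for i, tok in enumerate(tokens):
        if tok in terms:
            if i > 0 and tokens[i - 1] in NEGATION_CUES:
                continue  # negated mention, eg 'denies headache'
            return True
    return False

# Ad hoc syndrome definitions are just sets of standard terms in this sketch.
GI_ILLNESS = {"vomiting", "diarrhoea", "nausea"}
HEADACHE = {"headache"}

print(matches_syndrome("Vomitting and diarhea x2 days", GI_ILLNESS))  # True
print(matches_syndrome("denies HA", HEADACHE))                        # False
```

Because a syndrome is defined here as a plain set of terms, a new one (for example, terms related to anxiety) can be added without changing the matching logic, which is the kind of flexibility the first coder lacked.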

In our work with BPHC's syndromic surveillance system, we have seen the evolution of healthcare practices and related documentation. For example, ‘narcan’ (naloxone) as a chief complaint rarely appeared in the early years and has become more common with the opioid epidemic. This suggests the need for tools to identify new conditions, treatments and practices to maximise the value of EHR data for syndromic and other types of population health surveillance.

Traditional surveillance systems can be described as one-dimensional when relying on a single information source. Public health surveillance systems must develop methods to integrate multiple data streams to create a multidimensional understanding of public health threats, which may be particularly relevant for injury surveillance. For instance, could we rely on the growing number of traffic video cameras to provide information on the prevalence of cycling and helmet use by cyclists? Public health will be challenged with complex ethical and privacy issues as social trade-offs are considered. Automated algorithms could be applied to the review of video footage, obviating the need for human participation and, potentially, addressing privacy concerns. These data could be combined with multiple data sources to understand factors that may influence bicycle injuries. Imagine an amalgamation of geographic data on weather patterns, traffic conditions, public works projects, construction permits, city-sponsored events, vehicles per household, prevalence of bicycling and bicycle helmet use, and data on bicycle injuries from police, emergency medical services and emergency departments. What kind of picture would emerge? Could one imagine more robust interventions resulting from such a contextualised picture of injury? Coordinating data from multiple sectors would provide an opportunity to engage agencies and organisations that provide such data in injury prevention as well as surveillance and analysis activities. Progress is needed in linking, visualising and analysing these integrated multiple data streams to produce coherent information for surveillance purposes. Public health must understand and respect the limitations and biases of each data source and of the integrated data streams while maximising their usefulness.

Many local communities, including Boston, have seen an increase in bicycling. Bicyclists are particularly vulnerable to injury because of environmental conditions, such as street infrastructure and road hazards, and the behaviours of riders and vehicle drivers. The BPHC syndromic surveillance system combined chief complaints and ICD-9-CM codes to define a bicycle-related injury syndrome. The analysis provided information on the characteristics of the cyclist such as age, gender, race/ethnicity and zip code of residence.4 Information on the context of the event, such as location, cause (eg, hit by a car door or a car) and weather conditions, is rarely available, or not available in a standardised format, in EHRs. Contextual data were obtained from emergency medical services (EMS) and police electronic systems. The combination of multiple data sources identified ‘hot spots’ for incidents and areas that may benefit from bike lanes. The findings were shared with a wide range of stakeholders working to reduce bicycle-related injuries in Boston.
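
A syndrome that combines free text with diagnosis codes can be sketched as a visit matching either a chief complaint keyword or a code prefix. The example below is illustrative only and does not reproduce the definition used in the BPHC analysis. The keyword list, the function name `is_bicycle_injury` and the choice of the ICD-9-CM external-cause category E826 (pedal cycle accident) as the sole code prefix are assumptions.

```python
import re

# Hypothetical keyword pattern; word boundaries keep 'motorbike' from matching.
BIKE_TERMS = re.compile(r"\b(bike|bicycle|bicyclist|cyclist)\b", re.IGNORECASE)

# Illustrative ICD-9-CM external-cause prefix (E826, pedal cycle accident).
# A real definition would likely include further codes.
BIKE_CODE_PREFIXES = ("E826",)

def is_bicycle_injury(chief_complaint, diagnosis_codes):
    """True if the visit matches on chief complaint text or on any diagnosis code."""
    text_hit = bool(BIKE_TERMS.search(chief_complaint))
    code_hit = any(
        code.upper().startswith(BIKE_CODE_PREFIXES) for code in diagnosis_codes
    )
    return text_hit or code_hit

print(is_bicycle_injury("fell off bike, wrist pain", []))        # True
print(is_bicycle_injury("wrist pain", ["E826.1", "813.42"]))     # True
print(is_bicycle_injury("wrist pain", ["813.42"]))               # False
```

Matching on either source favours sensitivity: a visit with an uninformative chief complaint is still captured if it was coded, and an uncoded visit is still captured if the complaint mentions a bicycle.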

Electronic data available through the mining of social media and the internet can be used to identify public health concerns. Understanding the information being shared, the characteristics of the users and the mood of the ‘contributing public’ may be used to tailor public health messaging. These types of data have been used by public health agencies for influenza and restaurant-associated foodborne illness tracking.5–7 There is limited experience with the use of these systems for non-infectious event monitoring, including injury surveillance. A recent literature review of social media and disease surveillance underscored the need for integration of these systems into public health practice.8 Possible advantages of these data include the provision of more timely data, the potential of crowdsourced data to provide details otherwise not available in traditional systems and the opportunity for multidirectional communication to directly engage affected communities. In addition, given the widespread use of handheld technology across the globe, these types of data may be valuable in resource-limited settings outside of the USA. Machine-learning analytics can be applied to maximise the utility of these data. These types of analyses will continue to require human oversight for quality assurance and ongoing system development. In addition, the current methods have limited ability to support multiple languages, and their utility in non-English speaking populations needs to be considered.

In 2015, influenza and historic levels of snow collided in Boston. BPHC syndromic surveillance data revealed fewer influenza cases presented to emergency departments after blizzards during the 2015 influenza season. Flu Near You, a community participatory surveillance tool that allows individuals to report their symptoms, correlated well with data from the CDC sentinel influenza network, and confirmed that the trends in influenza cases presenting to emergency departments were likely due to lower disease incidence rather than decreased care-seeking due to difficult road conditions and adverse weather.9 Challenges of using social media and crowdsourced internet information may include small sample sizes and participants who are not representative of the local population. Communication patterns have changed, and robust tools for public engagement are needed. While the current focus of crowdsourcing tools is infectious disease monitoring, they offer the potential to monitor for disaster-related injuries, mental health or healthy living.

Electronic data sources have the potential to contribute multidimensional, contextually rich public health surveillance data. An understanding of the interaction of individuals with factors such as the built environment, policies and social norms can improve the design of public health educational messaging, affect policy, regulations, systems and environmental changes, and influence the design of prevention and control interventions. This may represent the next stage of surveillance and the science of injury prevention and control. Research, focused particularly on novel data sources and surveillance systems, costs and benefits of such systems, and analytical methods, is needed to guide the development of more timely and informative population health monitoring systems that can be used to inform public health interventions. A community resource of sharable datasets, computer code, analytical methods and other sources is needed to support the cost-effective use of big data. A framework for ‘response epidemiology’, which uses timely and contextualised population health data to respond to public health priorities through analytics and visualisations, is needed for success in the 21st-century information age. Public health agencies will require funding for systems to receive and manage electronic information and a public health workforce skilled in using these systems and interpreting the findings to respond to events of public health significance and improve population health.



  • Competing interests None declared.

  • Provenance and peer review Commissioned; externally peer reviewed.