Methodological considerations in MVC epidemiological research
  1. Liraz Fridman (1)
  2. Linda Rothman (2)
  3. Andrew William Howard (1, 3)
  4. Brent E Hagel (4)
  5. Colin Macarthur (1)

  1. Child Health Evaluative Sciences, Hospital for Sick Children Research Institute, Toronto, Ontario, Canada
  2. School of Occupational and Public Health, Faculty of Community Services, Ryerson University, Toronto, Ontario, Canada
  3. Orthopaedic Surgery, Hospital for Sick Children, Toronto, Ontario, Canada
  4. Department of Paediatrics, University of Calgary, Calgary, Alberta, Canada

Correspondence to Dr Liraz Fridman, Hospital for Sick Children Research Institute, Toronto, ON M5G 0A4, Canada; liraz.fridman{at}


Background The global burden of motor vehicle collision (MVC) injuries and deaths among vulnerable road users has led to the implementation of prevention programmes and policies at the local and national level. MVC epidemiological research is key to quantifying MVC burden, identifying risk factors and evaluating interventions. There are, however, several methodological considerations in MVC epidemiological research.

Methods This manuscript collates and describes methodological considerations in MVC epidemiological research, using examples drawn from published studies, with a focus on the vulnerable road user population of children and adolescents.

Results Methodological considerations in MVC epidemiological research include the availability and quality of data to measure counts and calculate event rates and challenges in evaluation related to study design, measurement and statistical analysis. Recommendations include innovative data collection (eg, naturalistic design, stepped-wedge clinical trials), combining data sources for a more comprehensive representation of collision events, and the use of machine learning/artificial intelligence for large data sets.

Conclusions MVC epidemiological research can be challenging at all levels: data capture and quality, study design, measurement and analysis. Addressing these challenges using innovative data collection and analysis methods is required.

  • methodology
  • epidemiology
  • motor vehicle - non traffic

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Globally, an estimated 1.35 million people die each year because of road traffic collisions, with motor vehicle collision (MVC) death rates three times higher in low-income countries than in high-income countries.1 Furthermore, while MVC death rates are falling in high-income countries, since 2013 the number of MVC deaths in low-income countries has not declined.1 As with most injuries, MVC fatalities are the tip of the iceberg: in Massachusetts, the ratio of deaths to hospital admissions to emergency department visits because of MVCs was shown to be 1:12:256.2 Similar ratios (1 death to 25 hospital admissions to 363 emergency department visits) for MVC injuries have been reported in Canada.3

The WHO considers pedestrians, pedal cyclists and motorcyclists as ‘vulnerable road users’ as they are less visible on the road and are not protected by an external ‘shield’ that would absorb energy in the event of a collision.4 Individuals with disabilities or reduced mobility are also considered vulnerable road users. The high burden of MVC deaths among vulnerable road users, particularly children and adolescents, has led to the implementation of many road safety initiatives, MVC injury prevention programmes, and enactment of legislation at the local and national level.

The epidemiological approach to injury prevention involves estimation of the burden of injury, identification of modifiable risk factors and interventions (programmes and policies) to reduce the burden of injury. Estimating the burden of MVC injuries and evaluating the effectiveness of programmes and policies are dependent on high-quality data and methodologically robust research.

In practice, a number of methodological considerations in MVC epidemiological research have been identified, including: the availability and quality of data to measure counts and calculate event rates, and challenges in evaluation related to study design, measurement and statistical analysis. The purpose of this review paper is to highlight these methodological considerations using published examples, with a focus on the vulnerable road user population of child and adolescent pedestrians and cyclists. The review uses examples drawn mainly from MVC epidemiological research in high-income countries. This methodological review is intended as a companion paper to the state-of-the-art literature review on the prevention of child and youth pedestrian MVCs by Cloutier et al.5

Data considerations: measuring counts

Estimating counts and calculating rates are the foundation of any epidemiological study on MVC injuries. In this context, mortality data, hospital admissions data, emergency department visits, police reports and surveillance data have been used as count data for epidemiological studies and as numerator data for MVC rate calculations. As outlined below, each of these data sources has challenges and limitations.

Mortality data

MVC mortality data can vary in quality and accuracy. For example, injury coding by medical examiners and coroners is not standardised. Therefore, comparison of MVC death data across cities, regions and countries may be difficult.6 In addition, with respect to data collected on pedal cyclist fatalities, reliability and validity issues with International Classification of Diseases, 10th revision (ICD-10) E-codes have been identified.7 Furthermore, the ICD coding system has undergone several revisions over time. For example, ICD version 8 specified MVC deaths by road user (pedestrian, cyclist, motorist and so on) and type of collision (collision with a bicycle, collision with a bus, collision with a vehicle and so on) based on the addition of a fourth code that had not existed in previous versions.8 Such changes can lead to challenges in the analysis of fatality trends for specific MVC injuries.

Hospitalisation data

MVC hospitalisation data have been used to estimate counts; however, a meta-analysis of studies from 13 countries showed that the capture of MVC injuries admitted to hospital varied widely, with reporting levels ranging from 21% to 88%.9 In addition, hospitalisation data have sometimes been used as a measure of ‘severe’ MVC injuries. It is difficult, though, to disentangle injury severity leading to hospital admission from local health service utilisation patterns, availability of care, and social and personal factors (eg, pain tolerance and individual frailty) that also influence the decision to admit to hospital.10 Therefore, MVC injury trends based solely on hospital admission data may reflect changes in any or all of these factors and not simply injury incidence or severity.11

Single-centre hospital studies of MVC injuries are particularly prone to referral centre bias. For example, MVC injuries considered life threatening are more likely to be transported to specialty centres. Therefore, single-centre studies, depending on the type of centre, may over-represent or under-represent the frequency of MVC injury, particularly ‘severe’ injuries.3

Emergency department data

Population-level emergency department data on MVC injuries are rarely available. As a result, MVC epidemiological research is often focused on the tip of the injury pyramid, that is, deaths and hospitalisations. In Canada, MVC deaths and hospitalisations account for only 1% and 6%, respectively, of all MVC injuries.12 A narrow research focus on deaths and hospitalisations may lead to the development of injury prevention strategies that fail to prevent the much more frequent, although less severe, MVC injuries.

Police-reported data

While police reports of MVC injuries can be rich in explanatory data, for example, road surface, weather conditions, time of day, type of crash (rear end, rollover), location and so on, the most commonly reported limitation of police-reported MVC data is under-reporting. For example, a California study compared police reports with hospital reports of children (pedestrians and cyclists) injured after an MVC and showed that police reports were simply not completed for many individuals, particularly those with minor injuries.13 14 Under-reporting is also an issue for non-roadway MVC injury events, such as those occurring on driveways, sidewalks and on private property.13 15

Last, the spatial coordinates of an MVC event documented in police reports may be incorrect, in particular, when police officers use the Global Positioning System (GPS) coordinates from their parked cruiser to estimate the location of the MVC event, rather than the actual location.16 Such errors can lead to misclassification of whether the MVC occurred at an intersection or at a midblock location, which has important implications for prevention strategies.
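The effect of a displaced GPS fix on location classification can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a method from the cited study: the coordinates, the 30 m intersection buffer and the function names are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; adequate for short distances


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def classify_location(lat, lon, intersections, buffer_m=30.0):
    """Return 'intersection' if any intersection lies within buffer_m, else 'midblock'."""
    nearest = min(haversine_m(lat, lon, i_lat, i_lon) for i_lat, i_lon in intersections)
    return "intersection" if nearest <= buffer_m else "midblock"


# Hypothetical intersection and two candidate fixes for the same collision.
intersections = [(43.6570, -79.3890)]
actual_site = (43.6571, -79.3890)      # roughly 11 m north of the intersection
parked_cruiser = (43.6576, -79.3890)   # roughly 67 m north of the intersection

print(classify_location(*actual_site, intersections))
print(classify_location(*parked_cruiser, intersections))
```

With these invented coordinates, the same event is labelled differently depending on which fix is recorded, which is the misclassification described above. The buffer distance is an analyst's choice and would need to be justified for a given road network.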

Surveillance data

Examples of North American surveillance data sets that have been used in MVC epidemiological research include the Canadian Hospitals Injury Reporting and Prevention Program (CHIRPP) and the National Highway Traffic Safety Administration Crash Outcomes Data Evaluation System (CODES) in the USA. A study of the sensitivity and representativeness of CHIRPP data showed that the data were of relatively high quality.17 The major limitation of CHIRPP, however, is completeness, given that only those individuals seen in the emergency departments of hospitals that participate in the surveillance system are included.18 In the USA, CODES was originally established to mitigate the challenge of MVC data comparison across states, and CODES data have been used to evaluate a number of programmes and policies, including graduated driver licensing and seat belt laws.19 20

Surveys—national, community-based, school and household—can also provide useful information on MVC injury. A pre-eminent survey is the million-death study (MDS), a nationally representative survey in India that uses an enhanced version of the verbal autopsy to monitor 1.1 million households.21 MDS data have been used to measure counts and describe the mechanism of road traffic injury deaths in India.21 Of note, the MVC mortality rate using MDS data was higher than that estimated from police-reported data. The limitations of MDS data relate to the potential for recall bias and inaccuracies associated with verbal autopsies.21

Self-reported data have also been used in MVC epidemiological research. A systematic review by Kamaluddin et al reported that most MVC self-report studies focused on car users and very few on self-report by vulnerable road users.22 The systematic review also acknowledged the lack of completeness of such data, in addition to other biases associated with self-report such as social desirability bias and recall bias.22

Data considerations: calculating MVC event rates

Rate calculations require that a base population be defined and selecting the ‘correct’ denominator can often be challenging in injury prevention research.10 Of note, while rates predominate in injury epidemiological research, engineers and city planners sometimes focus on the absolute number of collision events. This is especially true when considering the safety impact of design modifications in the context of initiatives with a Vision Zero philosophy (ie, reducing the numerator to zero).23

MVC injury rates are often calculated using the number of MVC events in a region as the numerator and the population of the region as the denominator. This crude approach, however, may overestimate the injury rate in areas with many visitors and underestimate the rate in areas with many road users who travel outside the spatial boundary of study.10 Vehicle volumes, the number of licensed drivers in a region and vehicle miles/km driven have also been used as denominators for MVC rate calculations.10 It is important to note, though, that crash rates are not independent of travel patterns. In other words, exposure to risk is different for drivers who avoid highways and drive mainly in urban settings, given that urban environments present more hazards to a driver, because of more points of potential conflict such as intersections and stop-and-go traffic flow. Some authors have suggested that exposure to risk (distance, frequency and duration of travel) ought to be incorporated into a single risk exposure density variable for MVC rate calculation.24
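How strongly a rate depends on its denominator can be made concrete with a short sketch. The figures below are invented for illustration and are not taken from any cited study; the function name `rate_per` is hypothetical.

```python
def rate_per(events: int, denominator: float, scale: float) -> float:
    """Return the number of events per `scale` units of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return events / denominator * scale


# Hypothetical region: every number here is an illustrative assumption.
collisions = 120
resident_population = 400_000
vehicle_km_travelled = 1_500_000_000

per_100k_residents = rate_per(collisions, resident_population, 100_000)
per_100m_vehicle_km = rate_per(collisions, vehicle_km_travelled, 100_000_000)

print(f"{per_100k_residents:.1f} collisions per 100,000 residents")
print(f"{per_100m_vehicle_km:.1f} collisions per 100 million vehicle-km")
```

The same numerator yields two rates that answer different questions: risk per resident versus risk per unit of travel. Comparisons across regions are only meaningful when the denominator is the same and reflects the population actually exposed.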

Limited data on exposure to risk for pedestrians and cyclists in road traffic research are also a significant challenge; a literature review of bicycling safety studies showed that 98% of such studies did not collect exposure data.25 Ling et al used a pre–post quasi-experimental design to compare cyclist-MVC (CMVC) rates before and after the implementation of cycle tracks. Crude CMVC rates (based on cycle track length) increased after the implementation of cycle tracks. After adjustment for the increased cycling volumes that followed track implementation, however, cycle tracks were associated with a significantly decreased CMVC rate.26
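The reversal between crude and exposure-adjusted rates can be reproduced with invented numbers. The sketch below does not use data from the cycle track study; the counts, track length, cyclist-kilometres and function names are assumptions chosen only to show the arithmetic.

```python
def crude_rate(collisions: int, track_km: float) -> float:
    """Collisions per kilometre of cycle track (no exposure adjustment)."""
    return collisions / track_km


def exposure_adjusted_rate(collisions: int, cyclist_km: float, per: float = 1_000_000) -> float:
    """Collisions per `per` cyclist-kilometres travelled."""
    return collisions / cyclist_km * per


# Hypothetical before/after periods on the same 10 km of roadway.
before = {"collisions": 20, "track_km": 10.0, "cyclist_km": 2_000_000}
after = {"collisions": 30, "track_km": 10.0, "cyclist_km": 6_000_000}

crude_before = crude_rate(before["collisions"], before["track_km"])
crude_after = crude_rate(after["collisions"], after["track_km"])
adjusted_before = exposure_adjusted_rate(before["collisions"], before["cyclist_km"])
adjusted_after = exposure_adjusted_rate(after["collisions"], after["cyclist_km"])

print(f"crude: {crude_before:.1f} -> {crude_after:.1f} per track-km")
print(f"adjusted: {adjusted_before:.1f} -> {adjusted_after:.1f} per million cyclist-km")
```

In this example collisions rise by half while cycling volume triples, so the crude rate increases and the exposure-adjusted rate halves. Without a volume measure, the two scenarios (more danger versus more cyclists) cannot be distinguished.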

Of note, the impact of these specific methodological considerations related to data availability and quality in the context of measuring counts and estimating rates will vary across countries, depending on the predominant mode of transport for the population. For example, data considerations differ when the majority of the population moves as pedestrians, as compared with a population that moves mainly in motor vehicles.

Innovative data collection

Naturalistic Driving Studies (NDS) examine driver performance and behaviour in the real-world setting.27 In such studies, vehicles are instrumented with cameras, sensors, and radar to automatically and continuously capture driving parameters such as location, speed, lateral and longitudinal acceleration, deceleration, yaw and eye movement. A National Academy of Sciences-sponsored naturalistic driving study captured 2 petabytes (PB) of driving data (35 million miles) over 3 years from 3500 participants.27 The study showed that driver-related factors such as error, impairment, fatigue and distraction were present in almost 90% of crashes. The authors also calculated a population attributable risk that indicated that 4 million of 11 million annual crashes in the USA could be avoided if driver distraction could be eliminated. An earlier NDS involving 42 newly licensed adolescents confirmed historical data and showed an increased risk of crash in the first 6 months of licensure.28
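The population attributable risk quoted above corresponds to roughly 36% of crashes (4 million of 11 million). The text does not state how it was calculated; one widely used formulation is Levin's formula, sketched below. The function name and the prevalence and relative risk inputs are assumptions for illustration only and are not estimates from the naturalistic driving study.

```python
def levin_attributable_fraction(prevalence: float, relative_risk: float) -> float:
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1)).

    prevalence    -- proportion of the population exposed (0 to 1)
    relative_risk -- risk in the exposed divided by risk in the unexposed
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if relative_risk <= 0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Fraction implied by the figures quoted in the text.
reported_fraction = 4_000_000 / 11_000_000

# Invented inputs: exposure present half of the time, doubling crash risk.
illustrative_fraction = levin_attributable_fraction(prevalence=0.5, relative_risk=2.0)

print(f"fraction implied by the text: {reported_fraction:.2f}")
print(f"illustrative Levin fraction: {illustrative_fraction:.2f}")
```

The formula shows why a modest relative risk can still translate into a large share of crashes when the exposure is common.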

There are a number of challenges with NDS data. NDS data often involve small crash sample sizes (particularly in the early studies), making them subject to considerable statistical variability. In many of these studies, ‘near crash’ events are included in the models as surrogates for crashes. Given the relative rarity of MVC events, long observation periods are needed. The ‘Hawthorne effect’, whereby the behaviour of individuals changes simply because they are being observed, may also occur. Last, the richness of data from the second Strategic Highway Research Program NDS (SHRP 2 NDS), as described earlier (with over 2 PB of data), brings with it challenges related to statistical modelling of a large number of potential explanatory variables with limited outcome events.

Evaluation considerations: design, measurement and analysis

Challenges in the evaluation of epidemiological interventions to prevent MVC injuries include study design, measurement and analysis issues, as well as policy considerations.

Study design challenges

A systematic review of published injury research on unintentional childhood injury, including transportation injuries, showed that analytical or hypothesis-testing study designs were relatively infrequent and descriptive studies predominated.29 With respect to traffic safety research, Kim and Mooney discussed the prevention of biases associated with four analytical study designs often used in traffic safety research: case–control, case–crossover, culpability and quasi-induced exposure (QIE) designs.30

Recruiting controls that are ‘representative’ of the source population that produced the cases, and ensuring that cases and controls are sampled independently of the exposure of interest, are challenges for MVC case–control studies. For example, a case–control study examining the influence of marijuana and alcohol on MVC fatalities recruited ‘controls’ from a national roadside survey of drivers.31 Given the sensitive nature of the study, relying on drivers who chose to participate as controls may have led to underestimation of drug and alcohol use in the source population that produced the fatal MVC cases.31

In the case–crossover design, researchers attempt to mitigate control selection bias by having cases serve as their own controls. For example, the case–crossover design was used to examine and compare cell phone use by drivers on the day of the crash and during the preceding week.32 Cell phone use was associated with a fourfold increased risk of crash. The case–crossover study design, however, is prone to reporting bias, particularly if self-report of sensitive information is required. A review of interpretation and bias issues in case–crossover studies has been published by Redelmeier and Tibshirani.33
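For the simplest case-crossover layout, with one hazard window and one control window per driver, the conditional odds ratio reduces to the ratio of discordant pairs. The sketch below illustrates that calculation with invented records; it is not the analysis used in the cited cell phone study, and the function name and data are hypothetical.

```python
def case_crossover_odds_ratio(records):
    """Matched-pair odds ratio from (exposed_in_hazard, exposed_in_control) tuples.

    Only discordant drivers contribute: those exposed in exactly one window.
    """
    hazard_only = sum(1 for hazard, control in records if hazard and not control)
    control_only = sum(1 for hazard, control in records if control and not hazard)
    if control_only == 0:
        raise ValueError("no drivers exposed in the control window only")
    return hazard_only / control_only


# Invented data: each tuple is one crash-involved driver.
records = (
    [(True, False)] * 24     # exposed before the crash only
    + [(False, True)] * 8    # exposed in the comparison window only
    + [(True, True)] * 15    # exposed in both windows (uninformative)
    + [(False, False)] * 150 # exposed in neither window (uninformative)
)

print(f"odds ratio: {case_crossover_odds_ratio(records):.1f}")
```

Because each driver is compared with himself or herself, stable characteristics such as age or driving experience cancel out. Misreported exposure in either window, however, shifts drivers between the discordant cells and biases the ratio directly.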

Responsibility study designs (culpability and QIE studies) focus only on cases, and drivers are classified as responsible or non-responsible for the crash.30 The exposure distribution of responsible drivers is then compared with the exposure distribution of non-responsible drivers. Culpability studies select drivers regardless of crash type (eg, single vs multi-vehicle) and responsibility is determined for each individual driver. QIE studies sample pairs of drivers from multi-vehicle crashes and assess responsibility. Because the QIE design samples drivers from the same crash, the QIE study design, by definition, matches drivers on factors such as road conditions, weather conditions and time of day. Though this design has many strengths, some implementation challenges exist.

Assigning responsibility in culpability and QIE studies (eg, based on police citations or using responsibility assessment tools) can be challenging. For example, a driver receiving a citation for cannabis use may have an increased probability of being labelled ‘responsible’ for the crash, even if he or she was not at fault. Studies that employ ‘responsibility tools’ may also be difficult to interpret, as there is no established gold standard for such measurements.
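The comparison of exposure between responsible and non-responsible drivers is commonly summarised as an odds ratio from a 2×2 table. The sketch below is a minimal, unmatched version of that calculation with invented counts; the function name is hypothetical, and a real QIE analysis would typically account for the pairing of drivers within crashes.

```python
def responsibility_odds_ratio(
    exposed_responsible: int,
    unexposed_responsible: int,
    exposed_nonresponsible: int,
    unexposed_nonresponsible: int,
) -> float:
    """Odds of exposure among responsible drivers relative to non-responsible drivers."""
    cells = (
        exposed_responsible,
        unexposed_responsible,
        exposed_nonresponsible,
        unexposed_nonresponsible,
    )
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    return (exposed_responsible * unexposed_nonresponsible) / (
        unexposed_responsible * exposed_nonresponsible
    )


# Invented counts for illustration only.
odds_ratio = responsibility_odds_ratio(
    exposed_responsible=60,
    unexposed_responsible=240,
    exposed_nonresponsible=30,
    unexposed_nonresponsible=270,
)
print(f"odds ratio: {odds_ratio:.2f}")
```

If exposure itself influences who is labelled responsible, as in the citation example above, drivers move into the first cell for reasons unrelated to fault and the odds ratio is inflated.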

While the randomised controlled trial (RCT) is considered the gold standard study design, the RCT is rare in MVC research. Randomised allocation (to the intervention or ‘untreated control state’) may not be considered ethical if robust observational studies have shown that the intervention is effective in reducing MVC injury or death. In addition, key outcomes in MVC research such as severe and/or fatal injuries are relatively infrequent and thus may make trials impractical with respect to duration or geography in efforts to accrue sample size. Innovative trial designs, for example, the stepped-wedge design, may overcome some of these issues as the intervention is sequentially administered over a specified period of time. By the end of the trial, all participants have received the intervention, although the order in which the intervention is received is selected randomly.34
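The allocation logic of a stepped-wedge design can be sketched as a randomised rollout schedule. This is a simplified illustration, assuming one cluster crosses over per step after a shared baseline period; the cluster names and function name are hypothetical, and real trials often move several clusters per step.

```python
import random


def stepped_wedge_schedule(clusters, seed=None):
    """Return {cluster: [0/1 per period]}, where 1 means the intervention is in place.

    Period 0 is a baseline in which no cluster is treated. One randomly
    ordered cluster crosses over at each subsequent period and stays treated.
    """
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)  # the random element: order of rollout
    periods = len(order) + 1
    schedule = {}
    for step, cluster in enumerate(order, start=1):
        schedule[cluster] = [1 if period >= step else 0 for period in range(periods)]
    return schedule


# Hypothetical clusters, eg, school zones receiving a traffic calming measure.
schedule = stepped_wedge_schedule(["Zone A", "Zone B", "Zone C", "Zone D"], seed=7)
for cluster, row in sorted(schedule.items()):
    print(cluster, row)
```

Every cluster contributes both control and intervention periods, and every cluster is treated by the final period, which addresses the ethical concern about withholding an intervention believed to be effective.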

Measurement challenges

Collision analysis is a common method of evaluating traffic safety; however, researchers often have to rely on historical data, given that collisions are relatively rare events.35 This poses a major analytical challenge as driver habits, vehicle safety features and built environment features are likely to change over time. In other words, the risk factor profile for MVCs occurring a decade apart may not be the same.

The use of proxy or surrogate measures as safety indicators can also be challenging to interpret. For example, a European study combined a number of indicators such as driver reaction time and vehicle braking capabilities to estimate the ‘pedestrian risk index’.36 Other variables, such as speed, have been used as a proxy measure of risk. Wherever possible, multiple proxy indicators, such as speed, pedestrian volumes and traffic conflict assessments should be combined to provide a better understanding of risk.35

Analytical challenges

Mannering and Bhat have written an excellent review article on analytical challenges in MVC research.37 The review traces the evolution of methods for studying crash frequencies, counts and severity, as well as identifying critical methodological and statistical issues in the analysis of crash data. Critical issues that may lead to biased parameter estimates include the failure to fully specify models (ie, relevant explanatory variables are not included) and unobserved heterogeneity. The latter issue occurs when a variable such as age is included in the model as a proxy for other factors, such as health, fitness and reaction times, which can vary significantly across individuals of the same age. The authors also describe maximum likelihood statistical approaches to deal with the issues of missing data, risk compensation and regression to the mean. Interventions implemented in areas with apparently high MVC rates may be subject to a ‘regression to the mean’ effect, where random extreme rates will naturally decrease over time, regardless of intervention.
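Regression to the mean can be demonstrated with a small simulation in which every site has the same underlying risk and no intervention is applied anywhere. The sketch below uses only the Python standard library; the number of sites, the count model (200 independent chances of a collision, each with probability 0.05) and the 10% selection rule are all assumptions made for illustration.

```python
import random
from statistics import mean


def simulate_counts(rng, n_sites, n_trials=200, p=0.05):
    """Draw one collision count per site; every site shares the same true risk."""
    return [sum(rng.random() < p for _ in range(n_trials)) for _ in range(n_sites)]


rng = random.Random(2024)  # fixed seed so the run is repeatable
n_sites = 500
period_1 = simulate_counts(rng, n_sites)
period_2 = simulate_counts(rng, n_sites)  # no intervention anywhere

# Select the 10% of sites with the highest period 1 counts, as a
# "treat the worst locations first" programme might.
ranked = sorted(range(n_sites), key=lambda i: period_1[i], reverse=True)
hotspots = ranked[: n_sites // 10]

before = mean(period_1[i] for i in hotspots)
after = mean(period_2[i] for i in hotspots)

print(f"hotspot mean, period 1: {before:.1f}")
print(f"hotspot mean, period 2: {after:.1f}")
```

The selected sites look hazardous in the first period only because of chance, so their average falls back towards the common expected count of 10 in the second period. A naive before/after comparison at these sites would credit that fall to whatever intervention happened to be installed.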

Abdulhafedh also identified several sources of error in modelling crash data including: small sample sizes, overdispersion (greater variability than expected) and underdispersion of data, and explanatory variables that change over time.19 Mitra and Buliung identified scale and zone effects as sources of bias in MVC spatial analysis research.38 Scale effects refer to differences in results depending on the spatial units used for measurement. For example, studies using census tract data may provide more stable results than those using smaller dissemination area levels. Even if space is measured on the same scale, zonal effects—differences in results depending on how space is divided—may also occur.38 For example, zonal effects may lead to differences in the interpretation of the relationship between built environment features and active school transportation prevalence.38
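A quick screen for overdispersion is the ratio of the sample variance to the sample mean, which is close to 1 for Poisson-distributed counts. The sketch below applies it to invented site-level counts; the function name and data are hypothetical, and a formal assessment would normally be made within a fitted count model.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson data, above 1 if overdispersed."""
    average = mean(counts)
    if average == 0:
        raise ValueError("dispersion is undefined when all counts are zero")
    return variance(counts) / average


# Invented collision counts for 12 road segments: mostly zeros, two hot spots.
segment_counts = [0, 0, 1, 0, 2, 0, 0, 9, 0, 1, 0, 12]

index = dispersion_index(segment_counts)
print(f"dispersion index: {index:.2f}")
```

A ratio well above 1, as here, suggests that a plain Poisson model would understate the variability in the data and produce standard errors that are too small.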

Geographic Information Systems based on national population-level data have also been used to study risk factors and access to healthcare for areas with higher road traffic injury rates.39 One challenge of using population-level data is the ‘ecological fallacy,’ that is, when researchers make an incorrect interpretation about an association at the individual level based on aggregated data from a population.39 40

Evaluation of policy challenges

Evaluation of policy interventions related to road safety (eg, booster seat legislation, graduated driver licensing and speed limit reductions) can be challenging, largely because of the time lag between legislative intervention and outcome. The literature suggests that the time frame for the impact of policy interventions on safety metrics may be years, if not decades.41 The other major challenge in the evaluation of policy interventions is attributing causality. For example, provinces in Canada with booster seat legislation have shown a decrease in child occupant MVC injuries over time following the legislative change.42 It is difficult, however, to attribute the decline in child occupant MVC injuries to policy change alone, as the change may also be related to safer vehicles, built environment changes, health services utilisation and population density.42 Comprehensive and systematic collection of baseline data and the use of appropriate control locations in pre/post-studies are required to ensure that estimation of the effectiveness of any programme, intervention or policy on MVC injury rates is as robust as possible.
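One common way to use control locations in a pre/post study is a difference-in-differences comparison, which subtracts the change seen in the control area from the change seen in the intervention area. The text does not prescribe this estimator; the sketch below is an illustration with invented rates and a hypothetical function name.

```python
def difference_in_differences(
    treated_pre: float, treated_post: float, control_pre: float, control_post: float
) -> float:
    """Change in the treated area minus the change in the control area."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Invented injury rates per 100,000 children, before and after legislation.
treated_pre, treated_post = 50.0, 35.0   # jurisdiction that legislated
control_pre, control_post = 48.0, 42.0   # comparable jurisdiction that did not

naive_change = treated_post - treated_pre
did_estimate = difference_in_differences(treated_pre, treated_post, control_pre, control_post)

print(f"naive pre/post change: {naive_change:.1f}")
print(f"difference-in-differences: {did_estimate:.1f}")
```

In this example, 6 of the 15-point decline also occurred where no law was passed, so only a 9-point decline remains after accounting for the background trend. The estimate is credible only if the two areas would have followed similar trends in the absence of the policy.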

Recommendations and future directions

Combining multiple sources of collision-related data would ensure a more comprehensive representation of the collision event. For example, linking police-reported collision data with hospitalisation records, weather data, traffic volume data and insurance claims would allow better insights into mechanism, injury severity and long-term outcomes.14 Almost 20 years ago, Durbin et al published a paper on such an initiative, which combined insurance claims data with telephone survey and police-reported investigation data to create the first large-scale child-focused MVC surveillance system in the USA, called Partners for Child Passenger Safety.43 This collaboration led to epidemiologically sound estimates of the protective effect of seating position and restraints44 and the effectiveness of forward facing child restraints45 and booster seats.46
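A minimal form of record linkage matches police and hospital records on a set of shared fields. The sketch below is a deterministic, exact-match illustration with invented records; the field names, the choice of linkage keys and the function name are assumptions, and operational systems usually rely on probabilistic matching to tolerate recording errors.

```python
from collections import defaultdict


def link_records(police, hospital, keys=("date", "age", "sex")):
    """Link each hospital record to at most one police record with identical key values.

    Returns (linked_pairs, hospital_only), where hospital_only lists injuries
    with no matching police report.
    """
    index = defaultdict(list)
    for record in police:
        index[tuple(record[k] for k in keys)].append(record)

    linked, hospital_only = [], []
    for record in hospital:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if candidates:
            linked.append((candidates.pop(0), record))  # use each police record once
        else:
            hospital_only.append(record)
    return linked, hospital_only


# Invented records for illustration only.
police = [
    {"date": "2019-05-02", "age": 9, "sex": "F", "road_user": "pedestrian"},
    {"date": "2019-05-04", "age": 14, "sex": "M", "road_user": "cyclist"},
]
hospital = [
    {"date": "2019-05-02", "age": 9, "sex": "F", "injury": "fracture"},
    {"date": "2019-05-04", "age": 14, "sex": "M", "injury": "concussion"},
    {"date": "2019-05-07", "age": 6, "sex": "M", "injury": "laceration"},
]

linked, hospital_only = link_records(police, hospital)
print(f"linked: {len(linked)}, hospital only: {len(hospital_only)}")
```

The linked pairs combine crash circumstances from the police report with clinical detail from the hospital record, while the unlinked hospital records give a direct view of injuries missing from police data.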

The current focus on ‘big data’ and ‘machine learning’ in the clinical and research contexts may also be of value in MVC research. Data streams generated by GPS embedded in vehicles, phones and other devices could potentially be used to create rich, detailed exposure data for all classes of road users.47 Data streams from closed circuit TV cameras installed for traffic monitoring could be used to generate data on exposure and collisions. The wealth of data captured by SHRP 2 NDS will require sophisticated statistical approaches to analyse and interpret the data. Analysing and interpreting these large data streams will likely require application of artificial intelligence/machine-learning algorithms in combination with traditional multivariate statistical methods.47

Injury prevention strategies must manage the competing priorities of citizens, local communities, city planners, elected officials and researchers. Significant progress in the prevention of MVC injuries and deaths (particularly in high-income countries) is testament to the collaborative effort of all stakeholders. For example, the evolution of safe roadways for pedestrians and cyclists required effective urban development and transport planning. Likewise, effective vehicle speed management through police enforcement and the use of traffic calming approaches makes roads safer for all users, as does legislation on blood alcohol concentration limits for drivers.48 Last, child occupants are protected by standards and regulations regarding seat belt use, including specific laws for child safety seats and booster seats. These initiatives across different sectors have led to a marked decline in MVC rates in high-income countries. It is incumbent on MVC researchers, however, to collaborate early and often with communities and policymakers in the process of selecting safety interventions to ensure (wherever possible) that evidence drives the process and that evaluation studies are built-in and of the highest quality.

Globally, almost half of all road traffic deaths occur in vulnerable road users, with one-quarter of the deaths occurring among motorcyclists.49 In some countries, motorcycle collisions (with pedestrians and motor vehicles) account for the majority of road traffic deaths.4 The methodological challenges associated with different (and changing) modes of population transport and the ethical issue of the increasing burden of road traffic deaths and injuries in low-income countries require urgent attention.


MVC data are crucial to inform roadway design, vehicle design, driver education, prevention programmes and policies. This review paper has summarised a number of methodological considerations in MVC epidemiological research related to data and evaluation. The examples used in the review focus mainly on children and adolescents; however, the methodological considerations identified are applicable to MVC epidemiological research in general. The findings reinforce the established need for improved data capture and linkage of data sources, the need for novel methods and analyses to handle the large volumes of data and the collaboration of all stakeholders in the evaluation of interventions.

What is already known on the subject

  • Global burden of deaths and injuries from MVCs is huge.

  • Comparison across MVC epidemiological studies can be difficult because of methodological issues.

What this study adds

  • Comprehensive review of methodological considerations in MVC epidemiological research, using published studies as examples.

  • Review of challenges with data sources used to estimate MVC counts and calculate event rates.

  • Assessment of study design, measurement and analytical issues in the context of evaluation of MVC prevention interventions.



  • Twitter @lirazfridman1

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement There are no data in this work.