An overview of the issues involved in evaluating the effectiveness of injury interventions is presented. An intervention should be evaluated to show it prevents injuries in the target population, to identify unintended consequences, to correct problems that limit effectiveness, to justify current and future resources from funding agencies, and to guide its replication elsewhere. Problems in conducting evaluations include obtaining sufficient resources, coping with rare events, establishing reliability and validity of measurement instruments, separating effects of multiple simultaneous events, and adjusting for the time lag between an intervention and its effects.
When feasible, changes in injury rates (documented by medical records) should be used. These are more convincing for demonstrating intervention effectiveness than changes in observed or reported behaviors or in knowledge and attitudes (documented by surveys). Quasiexperimental evaluation designs are often useful, such as measuring injury rates before and after an intervention in a time series design, or intervening in one of two comparable communities in a non-equivalent control group design. Evaluations using true experimental designs, in which individuals or groups are randomized to receive or not receive an intervention, are highly desirable but are often difficult due to logistical or ethical considerations. An evaluation component should be integral to the introduction of any new injury intervention.
In 1974, an intensive television campaign to increase seat belt use was broadcast in a test community on one of the two cable television systems previously developed for marketing studies in that community.1 Although the campaign “spots” were independently judged as outstanding public service advertisements, evaluation showed them to have no effect on seat belt use. Without the evaluation, millions of dollars might have been wasted on this campaign.
The essential components of an injury control program consist of identifying and analyzing an injury problem, selecting an appropriate intervention, implementing the intervention, and evaluating the outcomes. The most common types of preventive interventions, alone or in combination, are education, regulation, and technological changes. The resources and methods needed to conduct an evaluation should be an integral part of the initial plans to implement any such intervention.
Reasons to evaluate an intervention include:
To determine whether it prevents or reduces the severity of injuries in the target population. For example, mandatory helmet laws reduce motorcyclist death rates and the repeal of such laws leads to increased rates.2
To identify any problems that limit the effectiveness of an intervention and to make minor or major changes that will optimize the success of the intervention. For example, current efforts to promote graduated licensure for teenage drivers3, 4 have evolved in part from the difficulties in demonstrating a benefit from earlier teenage driver education programs alone.5
To justify current and future resources from funding agencies and to prevent wasted resources if the intervention is not effective. For example, an educational campaign targeted to the directors of child care centers was found to be ineffective in reducing the number of playground hazards at those centers, for reasons that are unclear.6
To assist other injury control practitioners in adapting an intervention in different settings, or to discourage others from replicating an unsuccessful intervention. Dissemination of evaluation results is particularly important. For example, numerous states have now adopted mandatory bicycle helmet laws after initial evaluations demonstrated increased helmet use when such laws are combined with educational campaigns.7–9
To identify unintended or unexpected positive and negative consequences. For injury related interventions, the potential for unintended consequences should be, but often is not, considered during the initial design phase. Laws and regulations designed to meet objectives unrelated to injury prevention may also unexpectedly affect injury rates.
Expanding on this last topic, positive unintended consequences are those that reduce injury rates as a corollary of the primary intent of the intervention. For example, a Massachusetts law mandating a deposit on glass bottles was designed for its environmental benefits. However, a review of emergency department records revealed a decline in the number of glass related lacerations in children after the law went into effect.10
Negative unintended consequences are often not fully recognized until after an intervention or new technology has been introduced. For example, adolescents who are permitted to drive at an earlier age after taking a driver education course have a higher motor vehicle injury rate than their non-licensed peers because of increased exposure.5 A ban on the local sale of alcohol to reduce alcohol related injuries on a Native American reservation led to an increased rate of pedestrian and hypothermia deaths as individuals sought alcohol from sources farther from home.11 The right-turn-on-red law, which was promoted to save fuel, led to an increase in pedestrian injuries.12 The widespread use of automobile airbags has saved numerous lives but also contributed to the deaths of a small number of unrestrained children seated in the front passenger seat.13, 14
As indicated above, interventions unrelated to injury prevention (such as glass bottle deposits and right-turn-on-red laws) may affect injury rates. Such unintended consequences may be recognized promptly if interdisciplinary injury prevention researchers regularly follow legal and environmental changes reported in the news media.
Barriers to conducting evaluations
A number of issues need to be considered in the design and conduct of evaluations of injury interventions. These issues include:
WELL DEFINED GOALS
One barrier to evaluation may be the absence of clearly defined goals and objectives for the intervention. For example, it would be difficult to evaluate an advertising campaign that advised teenagers to “drive carefully”.
Because many types of injuries are relatively rare, a large sample may be needed to provide sufficient statistical power to detect a change in injury rates due to the intervention. To illustrate, if suicides on college campuses occur at a rate of one per 10 000 students per year, then an evaluation of a suicide prevention program would need to involve hundreds of thousands of students.
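The scale of this requirement can be checked with a standard sample size formula for comparing two proportions. The sketch below is an illustration only: it assumes a one-year follow-up, randomization of individual students rather than campuses, a program that halves the annual rate, a two-sided significance level of 0.05, and 80% power. None of these figures come from the studies cited here, and the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control: float, p_intervention: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group needed to detect a difference between two annual
    risks, using the normal approximation for a two-sided two-proportion test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the significance level
    z_beta = z(power)            # quantile corresponding to the desired power
    p_bar = (p_control + p_intervention) / 2
    spread = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * sqrt(p_control * (1 - p_control)
                              + p_intervention * (1 - p_intervention)))
    return ceil(spread ** 2 / (p_control - p_intervention) ** 2)


# Hypothetical scenario: baseline risk of 1 per 10 000 students per year,
# and a prevention program assumed to cut that risk in half.
per_group = n_per_group(1 / 10_000, 0.5 / 10_000)
print(per_group, 2 * per_group)
```

Under these assumptions each arm needs roughly 470 000 students, close to a million in total. Clustering of students within campuses would raise the requirement further, and a smaller assumed effect raises it sharply, since the required size grows roughly with the inverse square of the difference in risk.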
Financial support, appropriate expertise, and adequate staff time are all required to conduct evaluations. Depending on the evaluation design, the resources needed for an evaluation may amount to as much as perhaps 20% of the total cost of the intervention. Ideally, a budget for the cost of the evaluation should be, but often is not, established during the initial planning of the intervention.
The effect of an intervention may differ in the long term compared with the short term, so both should be examined. Educational campaigns and enforcement efforts often increase knowledge and affect behavior in the short term, but additional evaluation is needed to assess whether short term successes are sustainable in the long term.
It is often difficult to separate the effects of an intervention from other simultaneous related events, a phenomenon known as the “history effect”.
It is necessary to establish the reliability and validity of survey instruments and other outcome measures used.
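One common way to quantify the reliability of an observational instrument is agreement between independent observers corrected for chance (Cohen's kappa). The article does not prescribe a particular statistic; the sketch below simply assumes that two observers independently coded helmet use ("H" worn, "N" not worn) for the same ten cyclists, and the codes are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters who coded the same items."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and paired item by item")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters assigned codes independently
    # at their own marginal frequencies.
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical roadside codes for the same ten cyclists.
observer_1 = "H H H N N H N H H N".split()
observer_2 = "H H N N N H N H H H".split()
print(round(cohens_kappa(observer_1, observer_2), 3))
```

Here the observers agree on 8 of 10 cyclists, but 52% agreement would be expected by chance alone, so kappa is only about 0.58. Raw percent agreement can therefore overstate how reliable an instrument is.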
In some settings, it is important to take into account the time lag between an intervention and its effects. For example, a reduction in child entrapment would not be expected to occur until some years after the passage of new federal standards on refrigerator doors because of the expected lifespan of existing refrigerators.15
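A rough sense of such lags can come from a simple stock-turnover model. The sketch below assumes, purely for illustration, that a constant fraction of the old (pre-standard) units is retired each year; real appliance retirement depends on age, so the numbers should be read qualitatively. The one-in-15 retirement fraction and the function names are hypothetical.

```python
import math


def share_of_old_stock(years: float, annual_retirement: float) -> float:
    """Fraction of pre-standard units still in use after `years`,
    assuming a constant annual retirement fraction (an assumption)."""
    return (1 - annual_retirement) ** years


def years_until_old_share_below(target: float, annual_retirement: float) -> float:
    """Years until the pre-standard share falls to `target` (0 < target < 1)."""
    return math.log(target) / math.log(1 - annual_retirement)


# Hypothetical: about 1 in 15 old refrigerators retired each year.
rate = 1 / 15
print(round(share_of_old_stock(10, rate), 2))
print(round(years_until_old_share_below(0.5, rate), 1))
```

Even under this simple assumption, about half of the old units would still be in homes a decade after a standard took effect, so an evaluation begun too soon could miss much of the eventual benefit.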
A failure to demonstrate an impact of an intervention does not necessarily prove that it is ineffective. A negative result may be due to the evaluation design or outcomes selected. For example, a small decrease in motor vehicle deaths over a short time period after the raising of a speed limit neither proves nor disproves a relationship between motor vehicle deaths and speed limits; other factors, such as weather, law enforcement efforts, and random variation due to small numbers could account for such findings.
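How easily small numbers mislead can be shown with an exact test for two event counts. The sketch assumes, hypothetically, that deaths fell from 12 to 9 between two periods of equal person-time. Conditional on the 21 total deaths, the number in the later period is binomial under the null hypothesis of an unchanged rate. The function name and counts are illustrative.

```python
from math import comb


def two_count_exact_p(before: int, after: int, exposure_ratio: float = 1.0) -> float:
    """Two-sided exact p-value for H0: the event rate did not change.

    Conditional on the total count, events in the 'after' period are
    Binomial(total, p_after) under H0, where exposure_ratio is
    exposure_after / exposure_before. Intended for small counts."""
    total = before + after
    p_after = exposure_ratio / (1 + exposure_ratio)
    probs = [comb(total, k) * p_after ** k * (1 - p_after) ** (total - k)
             for k in range(total + 1)]
    cutoff = probs[after] * (1 + 1e-9)  # tolerance for floating-point ties
    # Sum every outcome that is no more likely than the one observed.
    return min(1.0, sum(q for q in probs if q <= cutoff))


print(round(two_count_exact_p(12, 9), 2))   # small decline: far from significant
print(round(two_count_exact_p(40, 20), 3))  # halving of a larger count
```

A drop from 12 to 9 deaths is entirely consistent with chance, whereas a halving from 40 to 20 would be unlikely to arise from random variation alone. Neither result says anything about weather, enforcement, or other competing explanations.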
Process versus outcome evaluation
Two types of measures may be used in the conduct of any evaluation. Process measures assess whether the steps of the intervention actually occurred. For example, in a program to prevent fire related injuries, the number of smoke detectors distributed is a measure of whether the process of handing out smoke detectors worked.16 Outcome measures assess whether the intervention was effective in changing injury rates, knowledge, behavior, or policy. In the same example, a decrease in fire related injuries after the distribution program16 suggests (but does not prove) that the intervention had the desired effect.
Completing all the steps of an intervention is necessary but not sufficient to demonstrate that an intervention is effective. If some essential steps did not occur, then any changes in injury outcomes may be due to causes other than the planned intervention. For example, if a large number of smoke detectors are distributed in a community but few units are actually installed or are poorly maintained, it would be difficult to attribute any subsequent change in fire related injuries to the distribution of the smoke detectors.
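Process data of this kind can be summarized as a cascade showing where an intervention loses reach. The counts and stage names below are hypothetical; the point is that any change in outcomes can plausibly be attributed only to the detectors that were actually installed and working.

```python
def process_cascade(stages: list[tuple[str, int]]) -> list[tuple[str, float, float]]:
    """For each stage, return (name, share of first stage, share of previous stage)."""
    if not stages or stages[0][1] <= 0:
        raise ValueError("first stage must have a positive count")
    first = stages[0][1]
    rows, previous = [], first
    for name, count in stages:
        step_share = count / previous if previous else 0.0
        rows.append((name, count / first, step_share))
        previous = count
    return rows


cascade = process_cascade([
    ("distributed", 1000),
    ("installed at follow-up visit", 620),
    ("working at follow-up visit", 540),
])
for name, overall, step in cascade:
    print(f"{name:30s} {overall:6.1%} of distributed, {step:6.1%} of previous stage")
```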
With educational campaigns, process measures may include a tabulation of the number of brochures and coupons distributed, public service announcements televised, billboards set up, or newspaper advertisements printed.17 Outcome measures may be assessed using surveys of knowledge, attitudes, and behavior, either in comparable communities or before and after in the same community.
Types of outcome measures and sources of relevant data
The types of outcome measures commonly used in injury evaluations are listed in a hierarchical order in table 1. Evaluations that document changes in rates of actual injuries are more convincing than those that show changes in surrogate measures. Among types of data on actual injuries, computerized records are usually more readily available for events leading to hospitalization or death, but such severe events occur less frequently than those leading to emergency department or other outpatient treatment. Accurate denominators of persons at risk are essential to calculate changes in injury rates that may be attributed to an intervention. The defined population served by a health maintenance organization is a good setting for some evaluations because the number of persons treated for injury and the number at risk of injury are both known.18 For exposures that may change over time, ideally one should estimate person time at risk, for example, person hours spent using in-line skates (also known as “rollerblades”).
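With person-time denominators, a before-after comparison becomes a rate ratio. The sketch below uses invented injury counts and hours of in-line skating, and the usual large-sample confidence interval on the log scale; it illustrates the calculation and is not the analysis of any study cited here.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def rate_ratio(events_after: int, time_after: float,
               events_before: int, time_before: float,
               level: float = 0.95) -> tuple[float, float, float]:
    """Rate ratio (after / before) with a Wald confidence interval on the
    log scale. Both event counts must be positive."""
    rr = (events_after / time_after) / (events_before / time_before)
    se = sqrt(1 / events_after + 1 / events_before)  # SE of log(rate ratio)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)


# Hypothetical: 30 injuries in 200 000 skating hours before, 18 in 240 000 after.
rr, lower, upper = rate_ratio(18, 240_000, 30, 200_000)
print(f"per 100 000 h: before {30 / 200_000 * 1e5:.1f}, after {18 / 240_000 * 1e5:.1f}")
print(f"rate ratio {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Counting injuries alone would suggest a 40% drop (30 to 18); dividing by exposure shows that the rate actually halved, because time spent skating increased over the same period.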
Surrogate measures (table 1) are useful as outcomes when actual injuries are difficult to count (such as near drownings), or are rare events (such as child pedestrian injuries in a small community19). The use of a surrogate measure presupposes a clear link between it and actual injuries. For example, it was assumed that increased bicycle helmet use after the passage of a mandatory helmet law7, 8 would be associated with reduced injuries because prior work demonstrated the protective effect of helmets.20
Among types of surrogate measures, observed behavior is usually a better indicator of the impact of an intervention than self reported behavior, knowledge, or attitudes measured in a survey. For example, the majority of Maryland children surveyed believed that bicycle helmets are protective,21 but many continued not to wear them even after legislative and educational interventions.7, 8 Responses to survey questions on attitudes and behaviors may be biased if the respondent preferentially selects socially desirable answers. The validity and generalizability of observed behavior are limited to the times and places where the observations are made.8
Types of evaluation designs
The types of study designs used to evaluate injury interventions are similar to those used in the social sciences and may be considered in three categories: non-experimental, quasiexperimental, or experimental designs.22–24 Examples of each type of design are listed in tables 2–4.
Non-experimental designs include case studies,25 observing an outcome before and after an intervention without a comparison group (fig 1), or static group comparisons without prior observations (table 2). Evaluations using non-experimental designs usually can be conducted without extensive resources but are difficult to interpret because they do not control for potential confounding factors discussed below.
Quasiexperimental designs are commonly used to evaluate injury interventions. These designs include single and multiple time series (fig 2), non-equivalent control groups (fig 3), sequential cohort designs, and case-control studies (table 3). Evaluations using quasiexperimental designs usually require more resources than non-experimental designs, but are easier to interpret because they control for at least some potential confounding factors.
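For a non-equivalent control group design, one simple summary is the difference in differences: the change in the intervention community minus the change in the comparison community. The sketch assumes, as any such analysis must, that the two communities would have followed parallel trends without the intervention, which is exactly what the selection and history effects described in this article can undermine. Rates are hypothetical, per 100 000 population.

```python
from typing import NamedTuple


class PrePost(NamedTuple):
    """Outcome (e.g. injury rate per 100 000) before and after the intervention."""
    before: float
    after: float


def difference_in_differences(intervention: PrePost, comparison: PrePost) -> float:
    """Absolute change attributable to the intervention under parallel trends."""
    return ((intervention.after - intervention.before)
            - (comparison.after - comparison.before))


def ratio_of_rate_ratios(intervention: PrePost, comparison: PrePost) -> float:
    """Relative version: intervention rate ratio divided by comparison rate ratio."""
    return ((intervention.after / intervention.before)
            / (comparison.after / comparison.before))


town = PrePost(before=50.0, after=35.0)      # intervention community
control = PrePost(before=48.0, after=44.0)   # comparison community
print(difference_in_differences(town, control))
print(round(ratio_of_rate_ratios(town, control), 2))
```

A before-after comparison in the intervention community alone would credit the program with a drop of 15 per 100 000; the comparison community's own decline of 4 suggests that part of that drop reflects trends unrelated to the intervention.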
Evaluations using experimental designs include the randomization of two or more groups to receive or not receive an injury intervention (table 4, fig 4). Such designs are infrequently used because there are often logistical or ethical obstacles to random assignment. While such evaluations may require substantial resources, experimental designs yield the most convincing results because they control for most potential confounding factors.
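When groups rather than individuals are randomized, pairing similar units first and then randomizing within each pair is one common way to improve balance when only a few groups are available. The sketch below is hypothetical (the school names and seed are made up); recording the seed allows the allocation to be reproduced and audited.

```python
import random


def randomize_within_pairs(pairs: list[tuple[str, str]], seed: int) -> dict[str, str]:
    """Assign one member of each matched pair to the intervention arm at random."""
    rng = random.Random(seed)  # independent, reproducible generator
    assignment: dict[str, str] = {}
    for first, second in pairs:
        treated, untreated = (first, second) if rng.random() < 0.5 else (second, first)
        assignment[treated] = "intervention"
        assignment[untreated] = "comparison"
    return assignment


matched_schools = [("School A", "School B"), ("School C", "School D"), ("School E", "School F")]
allocation = randomize_within_pairs(matched_schools, seed=12345)
for school, arm in sorted(allocation.items()):
    print(school, arm)
```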
Potential problems with non-experimental and quasiexperimental designs
A number of potential problems may affect the design and interpretation of quasiexperimental and non-experimental evaluations. The major threats to the internal validity of an evaluation include history effects, maturation effects, testing effects, instrumentation effects, regression artifact effects, selection effects, and differential attrition. Shortell and Richardson describe these threats and other issues related to their interactions in detail and provide numerous examples related to the evaluation of health programs.23
History effects refer to external events that occur during the intervention but which are not connected with it. For example, news coverage of a dramatic fatal house fire may have as much impact on fire prevention as a series of public service announcements. Maturation effects are events related to passage of time. For example, children are likely to learn pedestrian skills with increased age, regardless of whether these skills are specifically taught in school.19
Testing effects refer to the knowledge communicated by the test itself. For example, a child may become more aware of bicycle helmets by completing a survey about bicycle related injuries.26 Instrumentation effects refer to changes in the content or administration of the survey instrument. For example, results may be affected if one compares data from a current knowledge and attitude survey with previous knowledge and attitude data collected for other purposes.
Interventions may appear to be effective due to regression to the mean. For example, random fluctuation can cause a community to have a very high motor vehicle death rate one year, but the rate is likely to drop the next year even in the absence of an intervention. Similarly, an educational effort focused on persons with the least knowledge about a given topic is likely to increase their knowledge but may or may not be effective when given to the general population.
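A small simulation makes this artifact concrete. By construction every hypothetical community below has the same true rate (a Poisson mean of 10 deaths per year), so any "improvement" among the communities selected for a bad first year is pure regression to the mean. The Poisson sampler uses Knuth's simple multiplication method to avoid external dependencies; the community counts and seed are arbitrary.

```python
import math
import random


def poisson_draw(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; adequate for small means such as these."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def regression_to_mean_demo(n_communities: int = 200, true_mean: float = 10.0,
                            n_selected: int = 10, seed: int = 7) -> tuple[float, float]:
    """Mean deaths in the 'worst' communities in year 1 and, for the same
    communities, in year 2, when no intervention takes place."""
    rng = random.Random(seed)
    year1 = [poisson_draw(rng, true_mean) for _ in range(n_communities)]
    year2 = [poisson_draw(rng, true_mean) for _ in range(n_communities)]
    worst = sorted(range(n_communities), key=lambda i: year1[i], reverse=True)[:n_selected]
    mean_year1 = sum(year1[i] for i in worst) / n_selected
    mean_year2 = sum(year2[i] for i in worst) / n_selected
    return mean_year1, mean_year2


before, after = regression_to_mean_demo()
print(f"selected communities: {before:.1f} deaths in year 1, {after:.1f} in year 2")
```

The selected communities fall back toward 10 deaths in the second year although nothing changed, which is why an intervention targeted at an unusually bad year needs a comparison group or a longer baseline.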
Selection effects refer to differences between intervention and comparison groups in non-equivalent control group designs. Any two communities chosen to be similar are unlikely to be identical in all characteristics that may affect the impact of an intervention.8, 27, 28 Differential attrition refers to differences in drop out rates between intervention and comparison groups.
While the above examples relate to internal validity, the generalizability of an evaluation requires that the study results also be externally valid. The major threats to the external validity of an evaluation include selection-treatment interactions, testing-treatment interactions, situational effects, and multiple treatment effects. These issues are discussed in more detail with relevant examples elsewhere.23 Selection-treatment interactions refer to the possibility that the intervention may not be generalizable because of unique characteristics of the population studied, such as their predisposition to accept a particular intervention.
Testing-treatment interactions reflect the possibility that an intervention may work in other groups only if a pretest is included. Situational effects refer to the possibility that the intervention may only work in the specific circumstances under which it was tested. For example, a message conveyed by a particularly enthusiastic health educator may be less effective when conveyed by other individuals. Finally, multiple treatment effects refer to the difficulty in separating out the effect of individual components when multiple interventions are occurring simultaneously. The potential synergism of several components may also complicate one's ability to measure the impact of individual components and to assess the generalizability of the evaluation results.29
Once an injury intervention has been shown to be effective, a benefit-cost analysis may facilitate or impede wider implementation of the intervention. For example, favorable benefit-cost ratios have been calculated for child safety seats,30 farm tractor rollover protective structures,31 and an occupational back injury prevention program.32 Based on a corporate but not societal perspective, an unfavorable benefit-cost ratio may have led to a delay in the redesign of a particular crash vulnerable automobile fuel tank when the manufacturer compared re-engineering costs to estimated liability costs.
The major steps in the conduct of a benefit-cost analysis for injury interventions have been described in detail by Miller and Levy.33 These include defining the intervention; choosing a viewpoint (personal, corporate, or societal); selecting a discount rate to adjust for the present value of future costs and benefits; quantifying the costs of the intervention and the proportion of injuries preventable by the intervention; quantifying the cost of the injuries prevented including medical costs, lost earnings, and reduced quality of life; calculating the benefit-cost ratio; describing any unquantified costs and benefits; examining who benefits from, and who pays for, the intervention; and performing a sensitivity analysis to examine the results with varying assumptions.33
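The discounting, ratio, and sensitivity steps can be sketched in a few lines. All cash flows below are hypothetical (an up-front program cost, small yearly upkeep costs, and yearly injury costs averted), and the sketch covers only the quantified items, not the choice of viewpoint, the distribution of who pays and who benefits, or the unquantified costs and benefits that Miller and Levy also describe.

```python
def present_value(flows: list[float], rate: float) -> float:
    """Discount yearly amounts (index 0 = today) to present value."""
    return sum(amount / (1 + rate) ** year for year, amount in enumerate(flows))


def benefit_cost_ratio(benefits: list[float], costs: list[float], rate: float) -> float:
    """Ratio of discounted benefits to discounted costs."""
    return present_value(benefits, rate) / present_value(costs, rate)


# Hypothetical program: 100 000 up front plus 10 000 a year in upkeep,
# preventing injuries whose costs are valued at 60 000 a year for four years.
costs = [100_000, 10_000, 10_000, 10_000, 10_000]
benefits = [0, 60_000, 60_000, 60_000, 60_000]

# Sensitivity analysis over a range of discount rates.
for rate in (0.0, 0.03, 0.07):
    ratio = benefit_cost_ratio(benefits, costs, rate)
    print(f"discount rate {rate:.0%}: benefit-cost ratio {ratio:.2f}")
```

Because the costs fall early and the benefits later, the ratio shrinks as the discount rate rises, which is why the choice of rate belongs in the sensitivity analysis.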
Guidelines for selecting an evaluation design
Ideally, the advantages and disadvantages of each method should be considered when selecting an evaluation design. In general, designs with comparison groups and with randomization of study subjects are more likely to yield valid and generalizable results. The actual selection of an evaluation design may, however, be strongly influenced by the availability of resources, political acceptability, and other practical issues. Such issues include the presence of clearly defined goals and objectives for the intervention, access to existing baseline data, the ability to identify and recruit appropriate intervention and comparison groups, ethical considerations in withholding an intervention from the comparison group, the time available if external events (such as passage of new laws) may affect the intervention or the injury of primary interest, and the timely cooperation of necessary individuals and agencies (such as school principals or health care providers).
Sample size considerations are important to ensure that an evaluation has sufficient statistical power to document the effect of the intervention. The availability of resources may affect the size of the groups that can be studied, the type and scope of evaluation that can be performed, and the conclusions reached. For example, a classroom based knowledge survey before and after a pedestrian skills class is substantially less costly than individualized field observations of the children's ability to cross streets.19 However, field observations provide more convincing data to document the value of such a class.
In certain situations, a non-standard design may be useful for the evaluation of an intervention. For example, when rates were not available, a well documented decline in the proportion of children treated with sleepwear related burns at a single burn unit in Boston provided early suggestive evidence of the impact of sleepwear flammability standards.34 The evaluation of a community program to reduce alcohol impaired driving in Massachusetts included two types of control towns and involved multiple measures, including monitoring trends in crashes and traffic citations, roadside observations of speeding and seat belt use, and telephone surveys of self reported driving after drinking.29 An evaluation of the impact of daylight saving time on fatal pedestrian injuries involved regression models based on national injury mortality data.35 In some complex situations, a qualitative case study may serve as a useful evaluation tool.25
A final consideration is whether existing interventions need to be evaluated repeatedly. In general, repeat evaluations are important in the same setting to show the effect is sustainable and in different settings to show the effect is real and generalizable. Repeat evaluations are needed when there have been changes in environmental, political, or other factors that affect an intervention's success. However, repeat evaluations require resources. When resources are limited, it is reasonable to select a proven intervention36 for a given problem and to do a limited evaluation to confirm that the intervention is effective in that setting. In such situations, attention should be given to process evaluation to ensure that the intervention is being implemented as intended. Attention should also be paid to differences between the target population and the populations in which the intervention was found to be effective.
In summary, evaluation should be integral to the introduction of any new injury intervention to demonstrate effectiveness, identify unintended consequences, and justify present and future resources. A variety of evaluation designs have been described for use in fields other than injury prevention. The numerous published examples of the use of these designs to examine injury interventions serve as models for future evaluations. An appropriate choice of evaluation methodology and careful quality control of the entire evaluation process are necessary but not sufficient for evaluation success.
Berk and Rossi have commented that “there is no recipe for success”, especially as injury control practitioners deal with resource constraints, new information technologies, and the need to evaluate interventions for complex societal issues.37 They conclude: “Prescriptions for successful evaluations are, in practice, prescriptions for failure. The techniques that evaluators may bring to bear are only tools, and even the very best tools do not ensure a worthy product”.37 The best evaluations combine strong technique with flexibility, creativity, and perseverance.
Preparation of this manuscript was supported by grant R49/CCR302486 to the Johns Hopkins Center for Injury Research and Policy from the National Center for Injury Prevention and Control, Centers for Disease Control and Prevention. The authors are grateful to Professor Susan P Baker for her thoughtful comments on this manuscript and her valuable mentoring over many years.