Program evaluation—balancing rigor with reality
  1. Susan S Gallagher
  1. Center for Injury and Violence Prevention, Education Development Center, 55 Chapel Street, Newton, MA 02458-1060, USA e-mail: sgallagher{at}

    Funders require it. Practitioners fear it. Epidemiologists narrowly focus it on outcomes. Perspectives on evaluation vary, but there is general agreement that we need more of it in the injury field. The questions are how much, and what kinds of evaluation can be done in a real world situation with limited resources. Randomized controlled trials may be the “gold standard” and quasi-experimental design a close second choice, but most of us do not have the luxury of engaging in such studies. Rigorous and multilevel evaluations that encompass formative (design), process (implementation), and outcome (knowledge, behaviors, injury rates, and institutionalization) measures are still relatively rare in the injury prevention literature.

    Four articles in this issue (see 125, 130, 151, and 154) illustrate the range of possibilities as well as many of the difficulties inherent in evaluating community based programs. Three of the four employ multiple strategies in a pre-post design, with varying outcome measures. Although none of these studies is perfect, there is, nevertheless, much that can be learned from them.

    The study of Bhide et al (154) contains some elements of a process evaluation by examining the effectiveness of the manner of distribution of a one shot, prepackaged educational program. It presents the number exposed to the program and the components that were implemented as intended, for example the leader's guide. Given the minimal penetration (16%) and only partial implementation (46%), it is unlikely that a better designed outcome based evaluation would have shown more encouraging, statistically significant changes. Indeed, if the researchers had used only an outcome evaluation, we might have concluded that the program was not effective, when the real issues are its limited penetration and incomplete implementation. This study clearly shows why an essential part of evaluation is determining whether a program has been implemented as planned. In this case, it also illustrates the special difficulties that arise when programs must depend on schools to distribute the prevention material.

    In the study of Lee et al (151), one element of an outcome evaluation of a three year bicycle helmet promotion campaign is presented. The study compares self reported helmet wearing among teenagers in intervention and control sites as a proxy measure for the injuries of interest. Despite the somewhat greater, but modest, increase in self reported helmet wearing over the five year period, many questions are left unanswered. Process measures describing the extent of implementation, or the inclusion of observed helmet use as an outcome measure (as noted by the authors), would provide greater assurance that the conclusions are valid. They would also assist others who wish to replicate this approach. The use of a control group is an advantage of this study, but adds to the complexity. This is evident in the inability to obtain accident and emergency reports of injuries as planned. It also raises questions about possible contamination of the control sites by television and radio coverage of the campaign.

    In contrast, the study of Hanfling et al (125) uses actual observations of the proper use of seat belts and car seats to evaluate the short term effects of a six component public education campaign. The study compares two intervention sites to control sites after nine months of intervention. Such a design has a higher level of methodological rigor than the previous examples and lends credibility to the positive results. The use of control groups is clearly an asset, but does not answer all questions. As in the previous study, the use of a multilevel intervention approach with several partners probably contributed to the positive findings, but it is next to impossible to conceive of methods that would determine the contribution of each component, that is, whether one element of the intervention was more effective than another. This study also identified potential confounding variables that spring up in the assignment of control sites. It illustrates the complications that so often arise when trying to obtain good controls. Finally, had the authors included process measures, such as exposure to the interventions, the number of incentives, or the number of citations issued by the police, these might have explained why the rate of restraint use achieved was much less than that reported in countries with well enforced legislation.

    The study of Coggan et al (130) is the most comprehensive, ambitious, and rigorous in its evaluation. It is based on a multitargeted community injury prevention program that combines both qualitative and quantitative data. The investigators include a multilevel process and outcome evaluation, with intervention and control sites matched on several variables. They conducted a formative evaluation to choose the interventions for each of the targeted audiences. The qualitative data enhance our understanding of implementation details and illuminate the successful aspects of the program. The case studies provide the information needed to adapt programs to the culture. It is clearly the most sophisticated but probably the most costly of the four studies. Knowing more about the penetration rate might influence the replicability of this program, but these details are not given.

    So where does this leave us? Despite the best intentions, constraints on evaluation are inevitable for community programs. What do we need to do a better job?

    Achieving balance

    Regardless of the size of the program and its budget, some level of evaluation can, and always should, be performed. We should not be debating whether quantitative or qualitative research is better. Formative, process, and outcome measures are inter-related and are meant to complement each other. It is not a question of either/or, but rather a question of finding the best combination to adequately address the goals of the evaluation. Program staff and evaluators will do a better job if they work together right from the beginning to set clear, detailed goals and objectives for intervention programs and their evaluation. This approach is bound to yield better evaluation priorities.


    The extent and nature of the evaluation are often determined by the expertise of the researcher or the program manager. The tools of epidemiology are critical for improving the evaluation of community based programs but are enhanced when combined with those from other disciplines. There is much more to evaluation than outcome measures, such as morbidity and mortality. Although high level evaluation methods may not always be feasible or may be too costly, a greater barrier to their use is often the lack of training. The need to expand research training opportunities was one of the key recommendations in the recent report by the Institute of Medicine. Ideally, researchers should be skilled in understanding the constraints of doing an evaluation in real community settings.

    Increasing resources

    Multilevel evaluation research is more expensive and often requires a longer follow up period than traditional grant awards of three years or less allow. Funders need to be persuaded to change their award structure to accommodate truly comprehensive, rigorous program evaluations. Without their support, or a creative linkage of several funders, we will not be able to improve the quality of evaluation research.

    The state of the art of evaluation, especially of community programs, has not evolved as much as we would like in the last two decades. The logistics and difficulties of managing one intervention in one community, let alone a complex set of interventions in several communities simultaneously, and of evaluating their efficacy, continue to be a great challenge. In spite of this, there is much to learn from the kinds of studies published in this issue of the journal. They help us to better understand the critical elements of a successful or unsuccessful intervention, and highlight both a program's true worth and its limitations. Nevertheless, we can do better and must strive to do so.
