Bias in estimates of seat belt effectiveness
T D Koepsell, F P Rivara, D C Grossman, C Mock

Department of Epidemiology, Box 357236, University of Washington, Seattle, WA 98195, USA

Correspondence to: Dr Koepsell;

In his recent commentary entitled “Bias in estimates of seat belt effectiveness”,1 Robertson criticizes our study of seat and shoulder belts in relation to crash injury risk.2 He writes: “In one of the recent studies claiming high belt effectiveness, missing data on velocity changes in crashes were imputed partly from injury severity scores, again a cause imputed from an effect and then used as a control in the study, a true scientific ‘no-no’.”

Robertson’s criticism is incorrect. When multiple imputation is used to deal with missing data on a covariate, the imputation model needs to preserve relationships between that covariate and other key variables that will be used in the main analysis.3 These other key variables include both exposure and outcome. In contrast, Robertson argues that measures of crash outcome should not be used to impute values on a covariate which will later enter the main analysis as a predictor of crash outcome.

In our study, velocity change during the crash (delta-V) was a clear confounder: when known, larger delta-V was associated with higher case fatality and also with greater likelihood of being unrestrained. However, delta-V was often missing, and missingness was related both to restraint use and to crash outcome, which motivated our use of imputation.

The problem with Robertson’s argument can be illustrated by considering how imputation was done under these conditions for a subject with missing data on delta-V. The form of multiple imputation that we used involved drawing several delta-V values from the distribution of known values among subjects who were similar to the one with missing data. (Technically, values were drawn randomly from a bootstrap sample of these potential data donors, but since this detail affects only the variance of imputed values and not their expected value, it can be ignored here.)
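The donor-based draw described above can be sketched in a few lines of Python. This is illustrative only: the function name and the donor pool are hypothetical, and the study's actual implementation matched donors on many covariates rather than taking a pre-built list.

```python
import random

def impute_delta_v(donors, m=5, seed=1):
    """Draw m imputed delta-V values for one subject with missing data.

    `donors` holds known delta-V values from subjects similar to the one
    with missing data (matched on exposure, outcome, and other covariates).
    Each imputation draws from a fresh bootstrap resample of the donor
    pool, so between-imputation variability also reflects uncertainty
    about the donor distribution itself, not just within-pool variation.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(m):
        boot = [rng.choice(donors) for _ in donors]  # bootstrap resample
        draws.append(rng.choice(boot))               # one donated value
    return draws

# Hypothetical donor pool of known delta-V values (km/h):
print(impute_delta_v([45, 52, 60, 48, 55, 70, 50]))
```

As the parenthetical note above indicates, the bootstrap step changes only the variance of the imputed values; their expected value is still the donor-pool mean.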

By Robertson’s argument, even if the subject with missing data on delta-V was known to have died in the crash, that fact should have been ignored, and he or she should have received imputed values drawn from the distribution of delta-V among otherwise similar fatalities and survivors combined. Because most occupants survived, this implies that most of the imputed delta-V values for fatalities would have come from survivors—who, as a group, were in crashes with lower delta-V. Imputed delta-V values for fatal cases would thus have been systematically biased downward compared with known values. Imputed delta-V values for survivors would have been biased upward, because some of them came from fatal cases. In fact, among subjects with imputed values, delta-V would no longer have behaved as a confounder at all, since the imputation model would have wiped out any association between delta-V and outcome among them.

What difference does this make in terms of the relative risk estimates for restraint use? Simulation suggests that it matters. Suppose that case fatality in 10 000 crashes is considered in relation to restraint use and delta-V (dichotomized into high or low, for simplicity). Say that in the absence of any missing data, in high-delta-V crashes, case fatality is 200/1000 among restraint users and 2000/4000 among non-users. In low-delta-V crashes, case fatality is 160/4000 among restraint users and 100/1000 among non-users. Thus the true relative risk is exactly 0.4 in each delta-V stratum. Also by construction, high delta-V is associated with higher case fatality and with lower use of restraints, so that delta-V is a confounder.
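The figures in this constructed example can be checked directly; the short sketch below verifies the stratum-specific relative risks and the two associations that make delta-V a confounder (all numbers come from the text above).

```python
# Hypothetical 10 000-crash example: (deaths, occupants) by restraint
# use within each delta-V stratum, as stated in the text.
high_dv = {"belted": (200, 1000), "unbelted": (2000, 4000)}
low_dv  = {"belted": (160, 4000), "unbelted": (100, 1000)}

def stratum_rr(stratum):
    """Risk ratio for restraint users vs non-users within one stratum."""
    risk_belted = stratum["belted"][0] / stratum["belted"][1]
    risk_unbelted = stratum["unbelted"][0] / stratum["unbelted"][1]
    return risk_belted / risk_unbelted

def case_fatality(stratum):
    """Overall case fatality within one delta-V stratum."""
    deaths = sum(cell[0] for cell in stratum.values())
    occupants = sum(cell[1] for cell in stratum.values())
    return deaths / occupants

# RR = 0.4 in each stratum, by construction.
print(stratum_rr(high_dv), stratum_rr(low_dv))

# Confounding: high delta-V carries higher case fatality (0.44 vs 0.052)
# and lower restraint use (1000/5000 vs 4000/5000 belted).
print(case_fatality(high_dv), case_fatality(low_dv))
```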

Now let us examine how different analysis approaches perform, depending on the missing data mechanism. Table 1 shows three missing data patterns:4

  1. Delta-V is missing completely at random (MCAR): a random 40% of values are missing at all combinations of exposure, outcome, and the true value of delta-V.

  2. Delta-V is missing more often in some exposure-outcome combinations than in others. The proportions shown are those observed in our study. However, missingness does not depend on the true value of delta-V, conditional on exposure and outcome. This pattern is generally termed missing at random (MAR).4

  3. Missingness on delta-V varies not only by exposure and outcome, but also by the true value of delta-V. This pattern is termed missing not at random (MNAR).
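The three mechanisms can be made concrete by writing down how the probability of missingness is allowed to depend on each variable. The probabilities below are invented for illustration; they are not the proportions observed in the study.

```python
import random

def missing_prob(record, mechanism):
    """P(delta-V is missing) for one occupant under each mechanism.

    `record` has boolean fields: belted (exposure), died (outcome), and
    high_dv (the true delta-V, dichotomized). Probabilities are
    hypothetical, chosen only to illustrate the three patterns.
    """
    if mechanism == "MCAR":
        return 0.40                      # same 40% in every cell
    if mechanism == "MAR":
        # may depend on exposure and outcome, but not on delta-V itself
        return 0.55 if (record["died"] and not record["belted"]) else 0.30
    if mechanism == "MNAR":
        # additionally depends on the true (unobserved) delta-V
        base = 0.50 if record["died"] else 0.25
        return base + (0.15 if record["high_dv"] else 0.0)
    raise ValueError(mechanism)

def mask(records, mechanism, seed=2):
    """Flag delta-V as missing according to the chosen mechanism."""
    rng = random.Random(seed)
    return [dict(r, dv_missing=rng.random() < missing_prob(r, mechanism))
            for r in records]
```

Under MCAR the same rate applies everywhere; under MAR the rate varies across exposure-outcome cells but, within a cell, is identical for high- and low-delta-V crashes; under MNAR it differs even within a cell.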

Table 2 shows the relative risk that would be obtained in each of these situations using each of three methods for handling missing data. When the analysis is restricted to cases with complete data on delta-V, the observed relative risk is biased toward 1.0 except when delta-V is missing completely at random—a situation that did not match our data and that probably rarely occurs in practice. If imputation is carried out by ignoring crash outcome when imputing delta-V values, as Robertson advocates, the relative risk is always biased. Ironically, the observed relative risks actually exaggerate the effectiveness of restraints, because the imputation method thwarts removal of some of the confounding by delta-V. When imputation of delta-V is done conditional on crash outcome, the relative risk is unbiased under the MCAR and MAR patterns, and it is less biased than either of the other analytic approaches under the MNAR pattern.
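The direction of these biases can be reproduced with a small expected-count calculation. This is a sketch, not the study's actual simulation: the cell counts come from the constructed 10 000-crash example above, while the MAR missingness rates are invented for illustration.

```python
STRATA = ("high", "low")
BELTS = ("belted", "unbelted")

# True cell counts (deaths, survivors) from the hypothetical example.
true = {
    ("high", "belted"):   (200.0, 800.0),
    ("high", "unbelted"): (2000.0, 2000.0),
    ("low", "belted"):    (160.0, 3840.0),
    ("low", "unbelted"):  (100.0, 900.0),
}

# Illustrative MAR pattern: P(delta-V missing) depends on belt use and
# outcome but not on delta-V itself. NOT the study's observed rates.
p_miss = {
    ("belted", "died"): 0.40, ("belted", "survived"): 0.30,
    ("unbelted", "died"): 0.50, ("unbelted", "survived"): 0.35,
}

def split(cell, belt):
    """Expected (complete, missing) counts for deaths and survivors."""
    d, s = cell
    pd, ps = p_miss[(belt, "died")], p_miss[(belt, "survived")]
    return (d * (1 - pd), s * (1 - ps)), (d * pd, s * ps)

complete = {k: split(v, k[1])[0] for k, v in true.items()}
missing = {k: split(v, k[1])[1] for k, v in true.items()}

def mh_rr(cells):
    """Mantel-Haenszel risk ratio (belted vs unbelted) across strata."""
    num = den = 0.0
    for st in STRATA:
        db, sb = cells[(st, "belted")]
        du, su = cells[(st, "unbelted")]
        n = db + sb + du + su
        num += db * (du + su) / n
        den += du * (db + sb) / n
    return num / den

def impute(condition_on_outcome):
    """Allocate missing subjects to delta-V strata in expectation."""
    filled = {k: list(v) for k, v in complete.items()}
    for belt in BELTS:
        # Missing deaths / survivors with this belt status, stratum unknown.
        miss_d = sum(missing[(st, belt)][0] for st in STRATA)
        miss_s = sum(missing[(st, belt)][1] for st in STRATA)
        for i, miss in ((0, miss_d), (1, miss_s)):
            if condition_on_outcome:
                # Donor pool matched on belt use AND outcome.
                pool = {st: complete[(st, belt)][i] for st in STRATA}
            else:
                # Robertson's alternative: ignore outcome when imputing.
                pool = {st: sum(complete[(st, belt)]) for st in STRATA}
            total = sum(pool.values())
            for st in STRATA:
                filled[(st, belt)][i] += miss * pool[st] / total
    return {k: tuple(v) for k, v in filled.items()}

print(f"full data:            RR = {mh_rr(true):.3f}")
print(f"complete-case only:   RR = {mh_rr(complete):.3f}")
print(f"impute, ignore death: RR = {mh_rr(impute(False)):.3f}")
print(f"impute, use outcome:  RR = {mh_rr(impute(True)):.3f}")
```

With these inputs the complete-case estimate drifts toward 1.0, imputation that ignores outcome overshoots below 0.4 (exaggerating effectiveness), and outcome-conditional imputation returns exactly 0.4, reproducing the pattern described for Table 2 under MAR.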

In short, both theory and simulation results indicate that the method we used to impute delta-V was sound, in contrast to Robertson’s alternative, and we stand by it.

Table 1

Missing data patterns

Table 2

Performance of alternative approaches to handling missing data on delta-V