The effects of recall on reporting injury and poisoning episodes in the National Health Interview Survey
 ^{1}Office of Analysis and Epidemiology, National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, USA
 ^{2}Office of Research and Methodology, National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, USA
 ^{3}Northern New England Poison Centre, Portland, ME, USA
 Correspondence to: Dr M Warner, Office of Analysis and Epidemiology, National Center for Health Statistics, Centers for Disease Control and Prevention, 3311 Toledo Road, Hyattsville, MD 20782, USA; mwarner@cdc.gov
 Accepted 21 July 2005
Abstract
Objective: To examine effects of length of time between injury or poisoning and interview on the number of reported injury and poisoning episodes in the National Health Interview Survey (NHIS). (Hereinafter, both injuries and poisonings will be referred to as “injuries”.)
Design: The NHIS collects data continuously on medically attended injuries occurring to family members during the three months before interview. Time between injury and interview was established by subtracting the reported injury date from the interview date. Values were multiply imputed for the 25% of the episodes for which dates were only partially reported.
Main outcome measures: An analysis of mean square error (MSE) was used to quantify the extent of errors in estimated annual numbers of injuries and to compare the contributions of bias and variance to these errors.
Results: The lowest estimated MSEs for annualized estimates for all injuries and for less severe injuries were attained when the annualized estimates were based on 3–6 elapsed cumulative weeks between injury and interview. The average weighted number of injuries reported per week per year was 8% lower in later weeks (weeks 6–13) than in earlier weeks (weeks 1–5) for all episodes, and 24% lower in later weeks than in earlier weeks for contusions/superficial injuries, with both differences being statistically significant. For fractures, however, the averages in the two periods were statistically similar.
Conclusions: The error associated with the estimated annual number of injuries was large with a three month reference period for all injuries and for less severe injuries. Limiting analysis to episodes with up to five weeks between injury and interview has statistical, intuitive, and analytic appeal for all injuries and for less severe injuries.
In 1997 the injury and poisoning section of the National Health Interview Survey (NHIS) was redesigned to address criticism about the lack of detail collected on the circumstances of the event in question and limitations in analyses related to sample size.^{1–5} A significant change to the section was to increase the recall reference period from two weeks to three months so as to yield more injury episodes to allow for detailed analyses and to provide more stable estimates. However, studies of the ability of respondents to recall injury events suggest that as the recall reference period is increased minor injuries are less frequently reported.^{6–13} Therefore, to limit the universe of the redesigned NHIS to injuries that would be less likely to be forgotten, the severity threshold for inclusion of injury episodes was raised to include only medically attended events.
In this paper, we examine the effects of the length of time between injury and interview on the number of reported injury episodes using the 1997–99 NHIS injury data.
METHODS
The NHIS uses a multistage sample designed to represent the US civilian noninstitutionalized population. Sampling and interviewing are continuous throughout each year.^{3,14} Information on all medically attended injuries (that is, injuries for which a healthcare professional was contacted either in person or by telephone for advice or treatment) occurring to any family member during the three month period before the interview is obtained from an adult family member.^{3,4} For the three year period 1997–99, a total of 8191 people were reported to have had 8592 episodes of injury.
For this study, episodes were classified by the first-listed external cause (categorized using the ICD-9-CM External Cause of Injury Matrix) and up to four diagnosis codes per episode (categorized using the Barell Injury Diagnosis Matrix).^{15,16}
Estimating time elapsed between injury and interview
The time elapsed between the date of the injury and the date of the interview was examined in terms of both individual weeks (denoted by week 1, 2, 3, … 13) and cumulative weeks (denoted by weeks 1–2, 1–3, … 1–13).
Episodes reported were assumed to have occurred within three months of the interview, unless a date of injury was specified that clearly contradicted this assumption, because the respondents answered yes to the following screen question:^{3,4}
“During the past three months, that is since [91 days before today’s date], [were/was] [you/anyone in the family] [injured/poisoned] seriously enough that [you/they] got medical advice or treatment?”
The date of injury was established by asking the respondent to report the day, month, and year of the injury. A date was fully specified for 75% of the episodes; 22% had only a month specified; and 3% no date specified. The interview date recorded was the last day the interview was conducted, which in some cases was one or more days after the respondent completed the injury section. In these cases, the calculated number of elapsed days included days for which it was not possible for a person to report episodes.
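The week assignment described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the mapping of days to weeks (week 1 = elapsed days 1–7, ..., week 13 = elapsed days 85–91) is inferred from the statement in the Results section that week 5 covers days 29–35.

```python
from datetime import date

def elapsed_week(injury_date, interview_date):
    """Return the elapsed week (1-13), or None if outside 1-91 elapsed days."""
    days = (interview_date - injury_date).days
    if not 1 <= days <= 91:
        return None  # outside the range used for most analyses
    # Days 1-7 map to week 1, days 8-14 to week 2, ..., days 85-91 to week 13.
    return (days - 1) // 7 + 1

def in_cumulative_period(week, k):
    """True if an episode in the given week belongs to cumulative weeks 1-k."""
    return week is not None and week <= k
```

An episode falling in week 5, for example, contributes to the cumulative periods 1–5 through 1–13 but not to weeks 1–4.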
Imputation for incomplete dates of injury
We did not limit our analyses to injuries with dates fully specified, because we hypothesized that respondents might partially report the date more frequently as the time between injury and interview increases. We imputed the missing data in two stages. First, for the 22% with month but no day of the month specified, we randomly selected a day of the month out of those days resulting in an elapsed time of no greater than 91 days, if possible. Second, for the 3% with no information provided, we assumed that the distribution of elapsed times would be more similar to that for the cases with partially specified dates than to that for all cases, and that the distribution would also differ by survey year and by severity. We therefore stratified by year and by hospitalization status, and then randomly selected with replacement from the elapsed times of 91 days or less from the first stage of imputation in the same strata.
To allow the assessment of variability due to imputation, we used multiple imputation,^{17} with five sets of imputations created independently via the two stage procedure just described.
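The first imputation stage, repeated to give five imputed values per episode, might look like the sketch below. It is an illustration under stated assumptions rather than the procedure actually used: the function names and the seed are hypothetical, days falling after the interview date are excluded (an assumption; the text only states the 91 day upper bound), and the second stage (stratified resampling for episodes with no date information) is not shown.

```python
import calendar
import random
from datetime import date

def impute_injury_date(year, month, interview_date, rng):
    """Draw a day of the reported month, preferring elapsed times of 0-91 days."""
    days_in_month = calendar.monthrange(year, month)[1]
    candidates = [date(year, month, d) for d in range(1, days_in_month + 1)]
    # Keep days that give an elapsed time of at most 91 days.
    # Assumption: days after the interview date are also excluded.
    in_range = [d for d in candidates if 0 <= (interview_date - d).days <= 91]
    # "If possible": fall back to the whole month when no day is in range.
    return rng.choice(in_range or candidates)

def five_imputations(year, month, interview_date, seed=0):
    """Create five imputed dates from independent random draws."""
    rng = random.Random(seed)
    return [impute_injury_date(year, month, interview_date, rng) for _ in range(5)]
```

For an interview on 15 June 1998 and a reported injury month of March 1998, only 16–31 March give elapsed times of 91 days or less, so every draw falls in that range.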
Analyses of the multiply imputed data were carried out as described in section 3.1 of Rubin,^{17} but with the refinements described in Barnard and Rubin.^{18} Most analyses in this paper were limited to episodes with time lapses between injury and interview of 1–91 days. The percentage of episodes with imputed values tended to increase as the number of weeks since injury increased, with 10% imputed in week 1 and 37% imputed in week 13.
Calculating national estimates
To produce national estimates, each reported episode in the sample was weighted using the final annual weight provided for the NHIS.^{3} To annualize the estimates, the weighted number of injury episodes reported in each elapsed time period was multiplied by the number of time periods in the year (table 1). The annualized estimate with multiple imputation was the average of the estimates obtained under each of the five sets of imputed values.
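A minimal sketch of the annualization step follows. The multiplier 52/k for cumulative weeks 1–k is an assumption made for illustration (the multipliers actually used are those in table 1), and the function names are hypothetical.

```python
WEEKS_PER_YEAR = 52  # assumption; table 1 gives the multipliers actually used

def annualized_estimate(weekly_weighted_counts, k):
    """Annualize the weighted episode counts reported for cumulative weeks 1-k.

    weekly_weighted_counts[i] is the weighted count for elapsed week i + 1.
    """
    total = sum(weekly_weighted_counts[:k])
    return total * WEEKS_PER_YEAR / k

def multiply_imputed_estimate(weekly_counts_by_imputation, k):
    """Average the annualized estimates over the sets of imputed values."""
    estimates = [annualized_estimate(c, k) for c in weekly_counts_by_imputation]
    return sum(estimates) / len(estimates)
```

Under this assumption, a constant 625 (thousand) weighted episodes per week annualizes to 625 × 52 = 32 500 (thousand) regardless of k, so any systematic decline of the annualized estimate with increasing k reflects reporting differences rather than the multiplier.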
Analysis of mean square error
An analysis of mean square error (MSE) was used to examine the total error associated with each point estimate of the annual number of episodes by the cumulative elapsed weeks. The relative contributions of the squared bias and variance to the MSE for each elapsed week period were estimated. The variance could be estimated directly from the data, whereas estimation of the bias required additional modeling and assumptions.
Variance estimates under each set of imputed values were calculated using the Taylor series linearization method in the SUDAAN software package.^{19} The NHIS design information available in the public use data files was incorporated into the estimates.^{3} The variance estimate with multiple imputation was calculated as a combination of the average of the variance estimates obtained under each of the five sets of imputed values and the variance among the point estimates obtained under each of the five sets of imputed values.
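The combination of within-imputation and between-imputation variance can be illustrated with the standard multiple imputation combining rule, T = U + (1 + 1/m)B, where U is the average within-imputation variance, B is the variance among the m point estimates, and m = 5 here. The sketch below assumes this standard rule is the combination meant in the text; the function name is hypothetical, and the Barnard and Rubin degrees of freedom refinement is not shown.

```python
from statistics import mean, variance

def combine_imputations(estimates, variances):
    """Combine point estimates and variances from m imputed data sets."""
    m = len(estimates)
    point = mean(estimates)      # multiply imputed point estimate
    within = mean(variances)     # average within-imputation variance (U)
    between = variance(estimates)  # sample variance among estimates (B), m - 1 divisor
    total = within + (1 + 1 / m) * between
    return point, total
```

For example, five estimates of 10, 12, 11, 9, and 13, each with a design-based variance of 4, give a point estimate of 11 and a total variance of 4 + 1.2 × 2.5 = 7.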
To estimate the potential recall biases in the annualized estimates of the number of injury episodes, we first fitted a regression model to the estimates for weeks 1, 1–2, 1–3, …, 1–13 to obtain smooth estimates of their expected values. Then, for each cumulative week period we subtracted the regression estimate for weeks 1–2 from the regression estimate for the period to obtain the estimated bias for the period. The regression estimate for weeks 1–2 was used as the reference value (that is, treated as the “truth”) under the assumption that there would be better recall associated with the shorter reporting period. Similar methods for estimating a reference for comparison were used in previous studies.^{7–9}
The estimated MSE, squared bias, and variance were converted into relative measures by dividing each quantity by the square of the regression estimate for weeks 1–2. The relative measures were expressed per 10 000, which allowed the square root of the relative variance per 10 000 to equal the relative standard error in percent.
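The decomposition and scaling just described can be sketched as follows, using the identity MSE = bias² + variance. The function and argument names are hypothetical; the inputs are the regression estimate for the period, the regression estimate for the reference period (weeks 1–2), and the multiply imputed variance estimate.

```python
def relative_mse_components(regression_estimate, reference_estimate, variance):
    """Return (MSE, squared bias, variance), each relative and per 10 000."""
    bias = regression_estimate - reference_estimate
    # Divide by the squared reference value and express per 10 000.
    scale = 10_000 / reference_estimate ** 2
    rel_sq_bias = bias ** 2 * scale
    rel_variance = variance * scale
    return rel_sq_bias + rel_variance, rel_sq_bias, rel_variance
```

With a reference value of 100, a regression estimate of 95, and a variance of 25 (standard error 5), the relative squared bias and the relative variance are both 25 per 10 000, the relative MSE is 50, and the square root of the relative variance is 5, which equals the relative standard error in percent.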
To assess sensitivity to the choice of regression model used to estimate bias, the MSE analysis was carried out separately using five different models:

a linear model (E(y) = a+bW+cI_{1998}+dI_{1999});

a quadratic model (E(y) = a+bW+cW^{2}+dI_{1998}+eI_{1999});

a cubic model (E(y) = a+bW+cW^{2}+dW^{3}+eI_{1998}+fI_{1999});

an exponential model with the number of elapsed weeks as a predictor (E(y) = exp(a+bW+cI_{1998}+dI_{1999}));

and an exponential model with the squared number of elapsed weeks as a predictor (E(y) = exp(a+bW^{2}+cI_{1998}+dI_{1999})).
In these models, y is the cumulative annualized estimate of the number of injury episodes for each week since injury, W is the number of cumulative weeks elapsed between injury and interview, and I_{1998} and I_{1999} are indicator variables for the survey years.
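As a worked illustration of how the estimated bias follows from a fitted model, consider the difference between the fitted value at W cumulative weeks and the fitted value at the reference period (W = 2, that is, weeks 1–2). This is a sketch under the assumption that the bias is evaluated within a single survey year; the hat notation for fitted coefficients is introduced here for illustration.

```latex
% Linear model: the intercept and the survey year terms cancel in the difference.
\widehat{\mathrm{bias}}(W) = \hat{E}(y \mid W) - \hat{E}(y \mid 2) = \hat{b}\,(W - 2)

% Exponential model in W: the survey year terms scale the bias multiplicatively.
\widehat{\mathrm{bias}}(W) = e^{\hat{a} + \hat{c} I_{1998} + \hat{d} I_{1999}}
  \left( e^{\hat{b} W} - e^{2 \hat{b}} \right)
```

Under the linear model, a negative fitted slope therefore implies an estimated underreporting bias that grows in proportion to the number of weeks beyond the reference period.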
As weeks 1–2 could be affected by telescoping (for example, shifts of reported injuries from week 3 to week 2), we repeated the MSE analyses with weeks 1–3 and weeks 1–4 as reference periods. In addition, because weeks 1 and 13 were estimated to be affected the most by the possible discrepancy between the recorded interview date and the date the respondent completed the injury section, and because our imputation procedure might have included some out of range episodes in week 13 (as discussed in the Results section), we repeated the MSE analyses with the annualized estimates for week 1 and weeks 1–13 excluded.
Statistically, the optimal recall period is based on the time period with the lowest estimated MSE. Our suggestions of an optimal recall period are based on this statistical analysis as well as factors related to human memory and comparisons to other data sources.
Comparison of estimates based on earlier versus later weeks
To illustrate differences in reporting between earlier and later weeks, the average weighted numbers of injuries reported per week per year for weeks 1–5 and weeks 6–13 were tabulated and the percent change between the averages was calculated. The average annualized estimates over the three year period based on weeks 1–5 and weeks 1–13 and the percent change were also calculated. (We chose five weeks as the cutoff because, as stated in the Discussion section, we felt that an elapsed 1–5 weeks has statistical, intuitive, and analytic appeal for the NHIS data.)
RESULTS
Patterns in reporting
For 1997–99, the weighted number of injury episodes reported per week averaged 625 000 and ranged from 545 000 in week 10 to 750 000 in week 13 (fig 1). Overall and for the less severe incidents, such as those involving contusions and superficial injuries, there were more episodes reported each week in weeks 1, 2, 3, and 5 than in week 4 and weeks 6 through 11 (see figs 1 and 2). However, for more severe injuries, such as those involving fractures and hospitalization, the pattern of having more injury episodes reported in weeks 1, 2, 3, and 5 was not found (see fig 2).
In some cases, the weighted number reported in week 13 was as high as or higher than the number in week 1, 2, or 3 (figs 1 and 2). This suggests that respondents may be reporting some incidents that actually occurred outside the bounds of the recall reference period used in the screening question, in an attempt to attribute the injuries to an incident that was “about three months ago”. This is commonly referred to as telescoping.^{9} The high numbers in week 13 may also be due in part to the restriction of most of our imputed elapsed times to be no greater than 91 days, which might have resulted in the inclusion of some cases in week 13 that were actually out of range.
There was generally a decline in week 4 followed by an increase in week 5. Week 5 covers elapsed days 29–35, and some elapsed times of five weeks might reflect respondents who were attempting to report dates for injuries they believed to have occurred about a month before the interview.
Mean square error
Across the five regression models used to estimate bias (with weeks 1–2 as the reference period), the number of cumulative weeks since injury with the lowest estimated relative MSE ranged from 5 to 6 for all episodes, from 3 to 5 for contusions/superficial injuries, and from 5 to 6 for episodes resulting in no time lost, and it was constant at 5 for episodes not resulting in hospitalization. In contrast, for fractures, the estimated optimal number of cumulative weeks since injury ranged from 8 to 13.
Table 2 illustrates components of the MSE results, for all episodes, contusions/superficial injuries, and fractures, when a linear regression model was used to estimate bias. The variance component of the MSE decreases as the length of time between injury and interview increases, because the likelihood that someone is injured increases and thus there are more injury episodes reported (that is, larger sample size). For example, there were an average of 463 episodes (unweighted) reported in weeks 1–2, 1150 episodes reported in weeks 1–5, and 2543 episodes reported in weeks 1–12.
For all episodes and for contusions/superficial injuries, the MSE values in table 2 have a roughly U-shaped relation with the number of cumulative weeks since injury, in the sense that the MSE tends to decrease steadily until the estimated optimal number of weeks, after which it increases steadily. Moreover, the MSE tends not to vary substantially for numbers of weeks adjacent to and including the estimated optimal number. In contrast, for fractures, the relation of the MSE with the number of cumulative weeks is less U-shaped, with the MSE changing slowly after five or six weeks. (For the quadratic and cubic models, for which results are not shown, the MSE for fractures decreases monotonically as the number of weeks increases from 1 to 13.) The linear regression estimate of the expected value of the estimated annual number of injuries, used in estimating the bias, decreases as the number of cumulative weeks increases for all episodes and for less severe injuries, whereas it increases slightly for fractures.
When the reference period of weeks 1–2 was replaced with weeks 1–3 or weeks 1–4 in the MSE analysis, the resulting estimated optimal recall periods either stayed the same or increased by one week (with the exception of one case for contusions, in which the estimated optimal recall period changed from 3 to 5 weeks, and one case for fractures, in which the estimated optimal recall period changed from 3 to 5 weeks). When the annualized estimates for week 1 and weeks 1–13 were excluded from the MSE analysis, the resulting estimated optimal recall periods either stayed the same or decreased by one week (with the exception of fractures, for which there were differences in both directions, with some differences of two or three weeks).
Estimates based on earlier versus later weeks
Overall, the average weighted number of injuries reported per week in weeks 6–13 was 8% lower than for weeks 1–5 (table 3), and the difference was statistically significant (based on a regression test; see the notes to table 3). For contusions/superficial injuries, there was a 24% decline (also statistically significant), but for fractures the numbers were statistically similar across the earlier and later periods. Other analyses examining the weighted number of injuries reported by individual weeks yielded results consistent with those in table 3.
Table 3 also shows the differences between average annualized estimates based on weeks 1–5 and on weeks 1–13. The relative differences in the average annualized estimates were smaller than the relative differences in the corresponding average weighted numbers reported per week (based on weeks 1–5 and on weeks 6–13) shown in table 3, simply because the annualized estimates were based on overlapping periods (that is, weeks 1–13 include weeks 1–5). However, the difference between average annualized estimates based on overlapping sets of weeks is algebraically equivalent to the difference between the corresponding average weighted numbers reported per week based on nonoverlapping sets of weeks, after multiplication by a constant. Thus, a test of significance for either difference in averages can be performed by testing whether the difference between the average weighted numbers reported per week (based on nonoverlapping periods) is significant, as was done in table 3 under the assumption that estimates based on nonoverlapping sets of weeks are independent.
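The algebraic equivalence can be made explicit with a short derivation. This is a sketch that assumes an annualization multiplier of 52/k for cumulative weeks 1–k; the symbols n_w (weighted number reported in elapsed week w), m_early, m_late, and A are introduced here for illustration.

```latex
% Average weighted numbers per week in the nonoverlapping periods
m_{\text{early}} = \frac{1}{5} \sum_{w=1}^{5} n_w , \qquad
m_{\text{late}} = \frac{1}{8} \sum_{w=6}^{13} n_w

% Annualized estimates based on the overlapping periods
A_{1\text{--}5} = \frac{52}{5} \sum_{w=1}^{5} n_w = 52\, m_{\text{early}}
A_{1\text{--}13} = \frac{52}{13} \sum_{w=1}^{13} n_w
               = 4 \left( 5\, m_{\text{early}} + 8\, m_{\text{late}} \right)

% Their difference is a constant multiple of the nonoverlapping difference
A_{1\text{--}5} - A_{1\text{--}13}
  = 52\, m_{\text{early}} - 20\, m_{\text{early}} - 32\, m_{\text{late}}
  = 32 \left( m_{\text{early}} - m_{\text{late}} \right)
```

Because the two differences are proportional, a test of one against zero is equivalent to a test of the other, while the relative difference is smaller for the annualized estimates because weeks 1–13 include weeks 1–5.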
DISCUSSION
The NHIS is the only US based national survey that contains detailed questions related to injury episodes and is conducted by interviewing people rather than by abstracting information from medical records. As predicted, the respondents appeared to experience some memory decay as they were asked to recall events happening further in the past, and the decay appeared to vary by severity of the episodes.
A central aim in increasing the recall reference period to three months in the NHIS was to decrease the variance of point estimates, but the effect of increasing the recall reference period on the bias needed to be addressed as well. Estimating the components of the MSE allowed the tradeoff between decreased variance and increased bias to be examined. Two main factors affected the MSE in this analysis: (1) the sample size on which the estimate was based, which influenced the variance; and (2) the severity of the injury, which influenced the rate at which injuries were reported (that is, recall bias).^{6–12} Given the dependence of the findings on sample size, our MSE results should be regarded as specific to the NHIS.
The results of the MSE analysis suggest that the total error associated with NHIS estimates of the annual total number of injuries was large when a three month recall reference period was used, but that the error varies with the characteristics of the episodes under study.
For estimating the annual total number of injuries or the annual number of less severe injuries based on the NHIS data, limiting analyses to episodes with an elapsed five cumulative weeks has statistical, intuitive, and analytic appeal. The lowest estimated MSEs were attained when the annualized estimates were based on 3–6 cumulative weeks elapsing between injury and interview. Moreover, for comparison of NHIS data with other data, five weeks would make the estimate analogous to an estimate with a one month recall reference period.
For injuries that are less likely to be forgotten, such as those requiring hospitalization or that are more severe (for example, fractures), episodes with a three month period can be used. The longer time period between injury and interview increases the number of recorded events. The resultant larger sample size allows for more detailed analyses and greater stability of estimates, which is particularly beneficial for studies of rarer events. Although data can be limited to differing time periods based on the characteristics of the episodes under study, estimates based on different time periods should not be combined to obtain estimates for totals.
Researchers suggest that obtaining the date of injury is a useful tool for analyzing recall bias.^{6–10} This and other studies have shown that some respondents do not remember the exact date of injury.^{6,8} Our analysis was limited because of such missing information. However, 75% of the cases had full dates of injury reported, 22% had months but no days reported, and only 3% had neither months nor days reported. In addition, the early weekly periods appeared to be less affected than the later periods, as the proportion of episodes with imputed values tended to increase as the weeks progressed.
Recall reference periods and severity thresholds in national surveys from 10 countries producing national annual estimates of the numbers of injuries were compared by the International Collaborative Effort on Injury Statistics.^{20} The recall reference periods used ranged from four weeks to 12 months, although the sizes of the countries and their surveys varied as well. Countries with longer recall reference periods must accept the tradeoff of loss of episodes as a result of forgotten events versus increased sample sizes due to the longer periods.
As with any survey, choosing the optimal recall period means balancing the need for accurate, reliable information against the need for adequate numbers of injuries to provide acceptable statistical power and precision. The NHIS is a unique source of nationally representative data on the circumstances of injury and poisoning and, thus, the careful balancing of these factors is advised for all data users.
Key points

The lowest estimated mean square errors for annualized estimates for all injuries and for less severe injuries were attained when the annualized estimates were based on 3–6 elapsed cumulative weeks between injury and interview.

An elapsed five cumulative weeks between injury and interview has statistical, intuitive, and analytic appeal for estimating the annual total number of injuries and annual number of less severe injuries.

For injuries that are less likely to be forgotten, such as those requiring hospitalization or that are more severe, a three month recall reference period can be used.

The average weighted number of injuries reported per week per year was 8% lower in later weeks than in earlier weeks for all episodes, and 24% lower in later weeks than in earlier weeks for contusions/superficial injuries, with both differences being statistically significant. For fractures, however, the averages in the two periods were statistically similar.
Acknowledgments
The authors wish to thank Jennifer Madans, Diane Makuc, Jane Gentleman, and Patricia Barnes from the National Center for Health Statistics for advice and support throughout this project.