Measuring injury risk factors: question reliability in a statewide sample
Jane Koziol-McLain (1), David Brand (2), Daniel Morgan (2), Marilyn Leff (2), Steven R Lowenstein (3)

(1) School of Nursing, Johns Hopkins University, 525 N Wolfe Street, Room 306, Baltimore, MD 21205–2110, USA
(2) Colorado Department of Public Health and Environment, Denver, Colorado
(3) Departments of Emergency Medicine, Preventive Medicine and Biometrics, School of Medicine, University of Colorado Health Sciences Center, Denver, Colorado
Correspondence to: Dr Koziol-McLain
Background—Recently (1996–98), Colorado added 15 questions pertaining to injury related risks and behaviors to the behavioral risk factor surveillance system (BRFSS). Questions addressed bicycle helmet use, traffic crashes, exposure to violence, suicidal behavior, and gun storage.

Objective—To measure the test-retest reliability of these injury related questions.

Methods—Of 330 BRFSS participants, 229 (69%) were called a second time and reasked nine selected injury questions. Retests were completed 7–28 days after the original interview.

Results—Test-retest agreement was very high (κ >0.80) for bicycle helmet use, domestic police visits, and gun ownership. All other injury risk questions had substantial agreement (κ >0.60).

Conclusions—The injury related questions added to the Colorado BRFSS have high test-retest reliability.

Personal habits and lifestyles play an important part in causing injury, disability, and premature death. Yet, with a few exceptions, injury related risk factors and behaviors are omitted from surveillance systems. To remedy this, from 1996 through 1998 Colorado added questions pertaining to injury related risks and behaviors to its behavioral risk factor surveillance system (BRFSS). The injury module included 15 questions about bicycle helmet use, traffic crashes, exposure to violence, suicidal behavior, and gun storage.

The BRFSS, sponsored by the Centers for Disease Control and Prevention, is a population based random digit dial telephone survey that has been conducted by 50 states since 1993. Approximately 150 Colorado residents aged 18 years and over are surveyed each month throughout the year. Although reliability testing has been performed on core questions of the BRFSS,1–4 state added questions, including those that address injury risk factors, have not typically been subjected to the same rigor.

This study was conducted to measure the test-retest reliability of the state added questions. Test-retest reliability is an assessment of the stability of a measure over time—that is, the extent to which people answer questions consistently at different times.5–8 Reliability is a necessary survey characteristic to ensure that the data are useful for surveillance, monitoring trends, and intervening to prevent injuries.


During April and May of 1998, BRFSS respondents were called a second time and reasked selected injury questions. Trained interviewers conducted both initial and recall interviews. Standard BRFSS survey methods were employed, including computer assisted telephone interviewing to facilitate direct data entry and coding, interviewer monitoring, and quality control. Recalls were completed one to four weeks after the initial BRFSS interview. Up to 15 calls in three different calling periods were placed to speak with the original respondent. Calls were conducted in either English or Spanish.

The injury control module included 15 questions adapted from published surveys that have not previously been evaluated for reliability. Seven questions were asked of all respondents and eight were asked of only a subset of respondents based on their previous answers. The seven questions that were asked of all respondents were included in the retest module. In addition, two questions that were asked of only a subset of respondents were included in the retest module because the previous month's BRFSS data demonstrated a prevalence of at least 3%.

Data analysis proceeded by first assessing differences between respondents who were successfully recalled for retest and those who could not be recalled. The χ2 (for sex, marital status, and Hispanic origin) and Kruskal-Wallis (for age) test statistics were used to test for differences. Then, the reliability of injury question responses was calculated using the κ and weighted κ statistics. Measuring κ is preferable to measuring “per cent agreement,” as κ measures the agreement that occurs beyond what would be expected by chance alone.9–12 Landis and Koch provide the following benchmarks for the interpretation of κ: 0.41–0.60 = moderate; 0.61–0.80 = substantial; and 0.81–1.0 = almost perfect.10 Data were analyzed using the SAS statistical package (SAS Institute, Cary, North Carolina). Simple κ statistics were used to measure agreement for variables with dichotomous response sets, and the weighted κ was used for variables with ordinal response sets. A two month cohort study sample was chosen to allow precise estimates of the κ statistic (±0.13). Ninety five per cent confidence intervals were calculated for the simple and weighted κ statistics.
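As an illustration of the statistics named above, the following Python sketch computes a simple κ, a weighted κ, and a rough 95% confidence interval from a test-retest contingency table. It is not the SAS analysis used in the study. The counts are hypothetical, the linear weighting scheme is an assumption (the weighting used in the study is not stated), and the confidence interval relies on a simple large-sample standard error that is only an approximation.

```python
from math import sqrt


def linear_weights(k):
    """Agreement weights 1 - |i - j| / (k - 1); an assumed scheme for ordinal answers."""
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def kappa(table, weights=None):
    """Return (kappa, observed agreement, chance agreement, n) for a square table.

    table[i][j] is the number of respondents who gave answer i at the
    first interview and answer j at the retest.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    if weights is None:
        # Simple kappa: only exact matches count as agreement.
        weights = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_obs = sum(weights[i][j] * table[i][j] for i in range(k) for j in range(k)) / n
    p_exp = sum(weights[i][j] * row_p[i] * col_p[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1 - p_exp), p_obs, p_exp, n


def approx_ci95(kappa_value, p_obs, p_exp, n):
    """Rough 95% interval for simple kappa from a large-sample standard error."""
    se = sqrt(p_obs * (1 - p_obs) / n) / (1 - p_exp)
    return kappa_value - 1.96 * se, kappa_value + 1.96 * se


# HYPOTHETICAL counts for a yes/no question (n = 229); not data from the study.
yes_no = [[40, 5],
          [6, 178]]
k_simple, p_obs, p_exp, n = kappa(yes_no)
low, high = approx_ci95(k_simple, p_obs, p_exp, n)

# HYPOTHETICAL counts for a three-level ordinal question (n = 229).
ordinal = [[60, 8, 2],
           [7, 70, 9],
           [1, 10, 62]]
k_weighted = kappa(ordinal, linear_weights(3))[0]

print(f"simple kappa = {k_simple:.2f} (approx. 95% CI {low:.2f} to {high:.2f})")
print(f"weighted kappa = {k_weighted:.2f}")
```

With two response categories the linear weights reduce to exact-match weights, so the weighted and simple κ coincide; the weighting matters only for ordinal questions, where near misses earn partial credit.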


Of the initial 330 BRFSS interviews conducted during the two month study period, 229 (69%) were successfully contacted for retesting. The time between the initial and second call varied: 34% (n=78) were called the second week, 45% (n=104) the third week, and 21% (n=47) the fourth week. Persons who were recontacted were similar to those who could not be recontacted with respect to sex, age, marital status, and Hispanic origin (see table 1). Initial interview injury risk factor prevalence rates did not differ between those recontacted and those not contacted.
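The recontact figures above can be checked with a few lines of arithmetic. The short Python sketch below uses only the counts reported in this paragraph; the variable names are illustrative.

```python
# Counts reported in the text.
initial_interviews = 330
retested = 229
retests_by_week = {"second week": 78, "third week": 104, "fourth week": 47}

# The weekly counts should account for every completed retest.
total_by_week = sum(retests_by_week.values())

# Percentages rounded to whole numbers, as reported in the text.
recontact_rate = round(100 * retested / initial_interviews)
week_shares = {week: round(100 * count / retested)
               for week, count in retests_by_week.items()}

print(total_by_week, recontact_rate, week_shares)
```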

Table 1

Demographic characteristics

Despite varied injury risk factor prevalence rates, test-retest reliability was high for all injury questions (see table 2). Three questions (bicycle helmet use, domestic police visits, and gun ownership) had κ values that exceeded 0.80, considered “almost perfect” agreement by some authors.10 Test-retest agreement was also high (κ >0.60) for the remaining six injury questions.
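The interpretation applied here can be written as a small lookup. The Python sketch below is illustrative only: it encodes the three Landis and Koch bands quoted in the Methods, uses the same cut points as the text (κ >0.60 and κ >0.80), and groups everything lower under a placeholder label of our own.

```python
def agreement_label(kappa_value):
    """Map a kappa value to the Landis and Koch bands quoted in the text."""
    if kappa_value > 0.80:
        return "almost perfect"
    if kappa_value > 0.60:
        return "substantial"
    if kappa_value > 0.40:
        return "moderate"
    # Placeholder label (an assumption); lower bands are not quoted in the text.
    return "below moderate"


# 0.66 is the lowest kappa reported in the study.
lowest_reported = agreement_label(0.66)
print(lowest_reported)
```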

Table 2

Agreement between test and retest administrations


Most large behavioral risk factor surveys, including the national BRFSS, focus on chronic diseases. Injury prone behaviors and risk factors should receive greater emphasis. Our findings demonstrate that the injury related questions added to the Colorado BRFSS are highly reliable.

One important limitation of this study is that it is more difficult to assess agreement beyond chance when prevalence is low.13 Thus, the precision in this study varied among the questions.
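The effect of low prevalence on κ can be seen in a small numerical example. The Python sketch below compares two hypothetical 2×2 test-retest tables (invented counts, not study data) that have identical per cent agreement but very different prevalence of "yes" answers.

```python
def simple_kappa(table):
    """Simple kappa and per cent agreement for a 2x2 test-retest table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    p_obs = (a + d) / n
    # Chance agreement from the marginal proportions of "yes" and "no".
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp), p_obs


# HYPOTHETICAL counts, n = 229 in both tables; rows = first interview,
# columns = retest, first category = "yes".
common_behavior = [[105, 5],
                   [6, 113]]   # about half of respondents answer "yes"
rare_behavior = [[4, 5],
                 [6, 214]]     # about 4% of respondents answer "yes"

kappa_common, agree_common = simple_kappa(common_behavior)
kappa_rare, agree_rare = simple_kappa(rare_behavior)

print(f"common: agreement {agree_common:.1%}, kappa {kappa_common:.2f}")
print(f"rare:   agreement {agree_rare:.1%}, kappa {kappa_rare:.2f}")
```

Both tables have the same 11 discordant pairs, yet κ is far lower for the rare behavior because almost all of the observed agreement is already expected by chance when nearly everyone answers "no".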

There are two additional important limitations to consider when testing reliability by the test-retest method.5–8 First, real change could have occurred between the testing occasions that caused participants to answer differently. For example, improving weather conditions may have made riding a bicycle more prevalent in the recall interview. Perhaps it is not surprising that the question with the lowest κ statistic (0.66) asked women about “feeling unsafe now”—a state that could easily change over short time intervals.

Second, persons may respond on the retest based on their memory of how they responded on the first test, leading to an overestimate of retest reliability. In this study, testing “memory” was thought to be less likely, given that the initial BRFSS interview included over 150 questions. However, by administering only a portion of the original 150 question survey, test-retest reliability may have been affected in other ways. Finally, although this study supports the test-retest reliability of the injury questions, tests of validity are still needed.

Reducing high risk behaviors is a priority of the national health objectives for the year 2010 and a cornerstone of state injury control strategic plans.14 Therefore, information about the epidemiology of common injury prone behaviors is needed. This study supports the reliability of questions to measure and monitor the prevalence of injury prone behaviors.


This study was supported by a grant from the Centers for Disease Control and Prevention (R49/CCR811509). During the project period Dr Koziol-McLain was supported by a National Research Service Award from the National Institute of Mental Health (F31 MH11716).

