Application of negative binomial modeling for discrete outcomes: A case study in aging research

https://doi.org/10.1016/S0895-4356(03)00028-3

Abstract

We present a case study using the negative binomial regression model for discrete outcome data arising from a clinical trial designed to evaluate the effectiveness of a prehabilitation program in preventing functional decline among physically frail, community-living older persons. The primary outcome was a measure of disability at 7 months that had a range from 0 to 16 with a mean of 2.8 (variance of 16.4) and a median of 1. The data were right skewed with clumping at zero (i.e., 40% of subjects had no disability at 7 months). Because the variance was nearly 6 times greater than the mean, the negative binomial model provided an improved fit to the data and accounted better for overdispersion than the Poisson regression model, which assumes that the mean and variance are the same. Although correcting the variance and corresponding test statistics for overdispersion is a standard procedure in the Poisson model, the estimates of the regression parameters are inefficient because they have more sampling variability than is necessary. The negative binomial model provides an alternative approach for the analysis of discrete data where overdispersion is a problem, provided that the model is correctly specified and adequately fits the data.

Introduction

Many outcomes in clinical medicine and aging research take values in a finite set of non-negative integers and are not normally distributed; hence, they may be more appropriately analyzed as discrete rather than as continuous measures. Examples include disability in activities of daily living (ADL) or instrumental ADL; the frequency of falls or injurious falls; and the number of episodes of incontinence, delirium, or restricted activity [1]. These types of outcomes have generally been analyzed as continuous measures or dichotomous events. When the outcome is considered to be continuous, the data frequently are assumed to be normally distributed, and multiple regression techniques are applied. When the outcome is considered dichotomous, the logistic regression model often is applied.

A typical example in aging research is the outcome of ADL disability [2], [3]. Composite measures of ADL function assess the ability of individuals to perform essential tasks, including walking inside the house, bathing, upper and lower body dressing, transferring from a chair, toileting, feeding, and grooming. One scoring method assigns a value of 0 for no (personal) help and no difficulty, 1 for difficulty but no help, and 2 for help regardless of difficulty [4]. Scores are summed to produce an overall score ranging from 0 to 16 (for eight tasks). Thus, an individual with a score of 0 would have no disability, whereas an individual with a score of 4 would have difficulty or dependence in two to four tasks. Because the outcome is discrete and non-negative, the distribution is not likely to be normal; thus, applying standard regression techniques may be inappropriate.
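
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the task identifiers, the function names, and the (difficulty, help) input format are assumptions made for the example and do not come from the study instruments.

```python
# Hypothetical identifiers for the eight ADL tasks named in the text.
ADL_TASKS = (
    "walking_inside_house",
    "bathing",
    "upper_body_dressing",
    "lower_body_dressing",
    "transferring_from_chair",
    "toileting",
    "feeding",
    "grooming",
)


def task_score(has_difficulty: bool, needs_help: bool) -> int:
    """Score one task: 2 for help regardless of difficulty,
    1 for difficulty but no help, 0 for neither."""
    if needs_help:
        return 2
    return 1 if has_difficulty else 0


def adl_disability_score(responses: dict) -> int:
    """Sum the task scores over all eight tasks (range 0 to 16).

    `responses` maps each task name to a (has_difficulty, needs_help) pair.
    """
    missing = set(ADL_TASKS) - set(responses)
    if missing:
        raise ValueError(f"missing responses for: {sorted(missing)}")
    return sum(task_score(*responses[task]) for task in ADL_TASKS)


# Two ways to reach a total score of 4.
no_disability = {task: (False, False) for task in ADL_TASKS}

help_in_two = dict(no_disability)
help_in_two["bathing"] = (True, True)
help_in_two["toileting"] = (False, True)

difficulty_in_four = dict(no_disability)
for task in ADL_TASKS[:4]:
    difficulty_in_four[task] = (True, False)

print(adl_disability_score(help_in_two))         # help with two tasks
print(adl_disability_score(difficulty_in_four))  # difficulty with four tasks
```

Both example profiles sum to 4, which is why a score of 4 corresponds to difficulty or dependence in anywhere from two to four tasks.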

We present a case study using the negative binomial model for the analysis of a discrete outcome in a clinical trial designed to evaluate the effectiveness of a prehabilitation program in preventing decline in ADL function among physically frail, community-living older persons. The negative binomial regression model has been well described in the statistical literature [5], [6]; however, it has been infrequently used in the clinical and epidemiologic literature.

Review of statistical models

The analysis of continuous outcome data using linear regression models assumes that the errors are independent and identically normally distributed with a mean of 0. Because discrete data often do not follow the underlying assumptions of normality, other analytic methods should be considered, particularly when the distribution is highly skewed (e.g., many scores or counts of zero). Transformations of the data can be tried to meet the normality assumption, but there are inherent problems with

PREHAB trial

The PREHAB trial has been described in detail elsewhere [3], [18]. Of the 188 physically frail, community-living persons, aged 75 years or older, who were enrolled in the study, 94 were randomized to the prehabilitation program, and 94 were randomized to the educational control program. Randomization was stratified by level of physical frailty (moderate versus severe) and recruitment strategy (office-based versus roster-based).

The prehabilitation program was a 6-month, home-based intervention

Results

The distribution of the ADL disability scores at 7 months is displayed in Fig. 1. The observed data are asymmetric (skewed right), with a modal disability score of 0 (40% of subjects). The disability scores range from 0 (no disability) to 16 (total disability), with a mean of 2.8 (variance of 16.4) and median of 1. There is clumping at zero, and the data are not normally distributed. Because of the clumping at zero, standard transformations (e.g., the square root) do not normalize the data.
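
A rough back-of-the-envelope check, using only the summary statistics reported above, shows why a Poisson model struggles with these data. The sketch below ignores covariates and is not the fitted regression model from the study. It assumes the common quadratic ("NB2") variance function, Var(Y) = mu + alpha * mu^2, and estimates alpha by matching moments; both choices are assumptions made for illustration.

```python
import math

# Summary statistics reported for the 7-month disability scores.
mean = 2.8
variance = 16.4

# Under a Poisson model the variance equals the mean, so this ratio would be 1.
dispersion_ratio = variance / mean

# Moment estimate of the dispersion parameter under the assumed NB2
# variance function Var(Y) = mu + alpha * mu**2.
alpha = (variance - mean) / mean**2

# Probability of a zero score implied by each marginal distribution.
poisson_p0 = math.exp(-mean)
nb_p0 = (1.0 + alpha * mean) ** (-1.0 / alpha)

print(f"variance / mean      : {dispersion_ratio:.2f}")
print(f"moment estimate alpha: {alpha:.2f}")
print(f"P(Y = 0), Poisson    : {poisson_p0:.3f}")
print(f"P(Y = 0), neg. binom.: {nb_p0:.3f}")
```

With these inputs, the moment-matched negative binomial puts roughly a third of its mass at zero, much closer to the observed 40% than the Poisson figure of under one tenth. A covariate-adjusted model would give different numbers, so this is a plausibility check rather than a substitute for model fitting.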

Discussion

We have applied the negative binomial model to discrete data that have the problem of overdispersion. The model is a generalization of the Poisson regression model with an added term to correct for overdispersion. The addition of the dispersion parameter enables the variance to be more accurately estimated, which leads to valid test statistics. McCullagh and Nelder [6] have noted that overdispersion may arise from intersubject variability where the count of incidents for a given individual is

Acknowledgements

This work was supported by the Claude D. Pepper Older Americans Independence Center (P60AG10469) and the Academic Awards K23AG00759 and K24AG021507 (TMG) from the National Institute on Aging and by a training grant from the National Institute of Mental Health (T32MH14235) (ALB).

References (26)

  • D Berry. Logarithmic transformations in ANOVA. Biometrics (1987)
  • M Ridout et al. A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics (2001)
  • P Allison. Logistic regression using the SAS system: theory and application (1999)