Application of negative binomial modeling for discrete outcomes: A case study in aging research
Introduction
Many outcomes in clinical medicine and aging research take a finite set of non-negative integer values and are not normally distributed; hence, they may be more appropriately analyzed as discrete rather than continuous measures. Examples include disability in activities of daily living (ADL) or instrumental ADL; the frequency of falls or injurious falls; and the number of episodes of incontinence, delirium, or restricted activity [1]. These types of outcomes have generally been analyzed as continuous measures or dichotomous events. When the outcome is treated as continuous, the data are frequently assumed to be normally distributed, and multiple regression techniques are applied. When the outcome is treated as dichotomous, the logistic regression model is often applied.
A typical example in aging research is the outcome of ADL disability [2], [3]. Composite measures of ADL function assess the ability of individuals to perform essential tasks, including walking inside the house, bathing, upper and lower body dressing, transferring from a chair, toileting, feeding, and grooming. One scoring method assigns a value of 0 for no (personal) help and no difficulty, 1 for difficulty but no help, and 2 for help regardless of difficulty [4]. Scores are summed to produce an overall score ranging from 0 to 16 (for eight tasks). Thus, an individual with a score of 0 would have no disability, whereas an individual with a score of 4 would have difficulty or dependence in two to four tasks. Because the outcome is discrete and non-negative, the distribution is not likely to be normal; thus, applying standard regression techniques may be inappropriate.
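The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the 0/1/2 rule and the 0 to 16 sum; the task identifiers, the function names, and the `(needs_help, has_difficulty)` input format are assumptions made for the example and do not come from the original study.

```python
# Hypothetical identifiers for the eight ADL tasks named in the text.
ADL_TASKS = (
    "walking_inside",
    "bathing",
    "upper_body_dressing",
    "lower_body_dressing",
    "transferring",
    "toileting",
    "feeding",
    "grooming",
)


def score_task(needs_help: bool, has_difficulty: bool) -> int:
    """Score one task: 2 for help (regardless of difficulty),
    1 for difficulty without help, 0 otherwise."""
    if needs_help:
        return 2
    if has_difficulty:
        return 1
    return 0


def adl_score(responses: dict) -> int:
    """Sum the task scores over all eight tasks (range 0 to 16).

    `responses` maps each task name to a (needs_help, has_difficulty) pair.
    """
    missing = set(ADL_TASKS) - set(responses)
    if missing:
        raise ValueError(f"missing responses for: {sorted(missing)}")
    return sum(score_task(*responses[task]) for task in ADL_TASKS)


# Example: help with two tasks, no difficulty elsewhere -> total score of 4.
example = {task: (False, False) for task in ADL_TASKS}
example["bathing"] = (True, False)
example["toileting"] = (True, True)
print(adl_score(example))
```

The same total of 4 also arises from difficulty without help in four tasks, which is why a score of 4 corresponds to two to four affected tasks.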
We present a case study using the negative binomial model for the analysis of a discrete outcome in a clinical trial designed to evaluate the effectiveness of a prehabilitation program in preventing decline in ADL function among physically frail, community-living older persons. The negative binomial regression model has been well described in the statistical literature [5], [6]; however, it has been infrequently used in the clinical and epidemiologic literature.
Section snippets
Review of statistical models
The analysis of continuous outcome data using linear regression models assumes that the errors are independent and identically normally distributed with a mean of 0. Because discrete data often do not follow the underlying assumptions of normality, other analytic methods should be considered, particularly when the distribution is highly skewed (e.g., many scores or counts of zero). Transformations of the data can be tried to meet the normality assumption, but there are inherent problems with
PREHAB trial
The PREHAB trial has been described in detail elsewhere [3], [18]. Of the 188 physically frail, community-living persons, aged 75 years or older, who were enrolled in the study, 94 were randomized to the prehabilitation program, and 94 were randomized to the educational control program. Randomization was stratified by level of physical frailty (moderate versus severe) and recruitment strategy (office-based versus roster-based).
The prehabilitation program was a 6-month, home-based intervention
Results
The distribution of the ADL disability scores at 7 months is displayed in Fig. 1. The observed data are asymmetric (skewed right), with a modal disability score of 0 (40% of subjects). The disability scores range from 0 (no disability) to 16 (total disability), with a mean of 2.8 (variance of 16.4) and median of 1. There is clumping at zero, and the data are not normally distributed. Because of the clumping at zero, standard transformations (e.g., the square root) do not normalize the data.
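A rough way to see why these summary statistics point away from a Poisson model is to compare the variance with the mean. The sketch below uses only the reported mean (2.8) and variance (16.4). It assumes the common NB2 parameterization, in which the variance equals mean + alpha × mean², and it uses a simple method-of-moments estimate of alpha that ignores covariates; neither choice is confirmed by the text shown here, and the function names are illustrative.

```python
import math


def variance_to_mean_ratio(mean: float, variance: float) -> float:
    """A Poisson outcome has a ratio near 1; larger values suggest overdispersion."""
    return variance / mean


def moment_alpha(mean: float, variance: float) -> float:
    """Method-of-moments dispersion under the assumed NB2 form
    variance = mean + alpha * mean**2 (floored at 0)."""
    return max((variance - mean) / mean ** 2, 0.0)


def poisson_p0(mean: float) -> float:
    """Probability of a zero under a Poisson distribution with this mean."""
    return math.exp(-mean)


def negbin_p0(mean: float, alpha: float) -> float:
    """Probability of a zero under NB2 with this mean and dispersion."""
    if alpha == 0:
        return poisson_p0(mean)
    return (1.0 + alpha * mean) ** (-1.0 / alpha)


mean, variance = 2.8, 16.4  # values reported for the 7-month ADL scores
ratio = variance_to_mean_ratio(mean, variance)
alpha = moment_alpha(mean, variance)
print(f"variance/mean ratio: {ratio:.2f}")
print(f"moment estimate of alpha: {alpha:.2f}")
print(f"P(score = 0), Poisson: {poisson_p0(mean):.3f}")
print(f"P(score = 0), NB2: {negbin_p0(mean, alpha):.3f}")
```

Under these assumptions the variance is nearly six times the mean, and a Poisson distribution with mean 2.8 would put only about 6% of subjects at zero, against about 36% for the moment-matched negative binomial and the 40% observed. This is a back-of-the-envelope check, not the regression model fitted in the trial.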
Discussion
We have applied the negative binomial model to discrete data that have the problem of overdispersion. The model is a generalization of the Poisson regression model with an added term to correct for overdispersion. The addition of the dispersion parameter enables the variance to be more accurately estimated, which leads to valid test statistics. McCullagh and Nelder [6] have noted that overdispersion may arise from intersubject variability where the count of incidents for a given individual is
Acknowledgements
This work was supported by the Claude D. Pepper Older Americans Independence Center (P60AG10469) and the Academic Awards K23AG00759 and K24AG021507 (TMG) from the National Institute on Aging and by a training grant from the National Institute of Mental Health (T32MH14235) (ALB).
References (26)
- et al. Analyzing data with clumping at zero: an example demonstration. Biometrics (2000)
- et al. Modeling traffic accident occurrence and involvement. Accid Anal Prev (2000)
- et al. Modelling MRI enhancing lesion counts in multiple sclerosis using a negative binomial model: implications for clinical trials. J Neurol Sci (1999)
- et al. A prehabilitation program for physically frail community-living older persons. Arch Phys Med Rehabil (2003)
- et al. Restricted activity among community-living older persons: incidence, precipitants, and health care utilization. Ann Intern Med (2001)
- et al. The Index of ADL: a standardized measure of biological and psychosocial function. JAMA (1963)
- et al. A program to prevent functional decline in physically frail elderly persons who live at home. N Engl J Med (2002)
- et al. Difficulty and dependence: two components of the disability continuum among community-living older persons. Ann Intern Med (1998)
- An introduction to categorical data analysis (1996)
- Logarithmic transformations in ANOVA. Biometrics
- A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics
- Logistic regression using the SAS system: theory and application