Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts

https://doi.org/10.1016/j.ijfoodmicro.2009.10.016

Abstract

In many cases, microbial data are characterised by a relatively high proportion of zero counts, as occurs with some hygiene indicators and pathogens, which complicates the statistical treatment under the assumption of log normality. The objective of this work was to introduce an alternative Poisson-based distribution framework capable of representing this kind of data without incurring loss of information. The negative binomial and two zero-modified parameterisations of the Poisson and negative binomial distributions (zero-inflated and hurdle) were fitted to actual zero-inflated bacterial data consisting of total coliforms (n = 590) and Escherichia coli (n = 677) present on beef carcasses sampled from nine Irish abattoirs. Improvement over the simple Poisson was shown by the simple negative binomial (p = 0.426 for χ² test for the coliforms data) due to the added heterogeneity parameter, although it slightly overestimated the zero counts and underestimated the first few positive counts for both data sets. Whereas the zero-modified Poisson could not cope with the data over-dispersion in any of its parameterisations (p < 0.001 for χ² tests), the parameterisations of the zero-modified negative binomial presented differences in fit due to approximation errors. While the zero-inflated negative binomial parameterisation was apparently reduced to a negative binomial due to a non-convergence of the logit parameter estimate, the goodness of fit of the hurdle negative binomial parameterisation indicated that for the data sets under evaluation (coliforms data with ~13% zero counts and E. coli data with ~42% zero counts), the zero-modified negative binomial distribution was comparable to the simpler negative binomial distribution.
Thus, bacterial data consisting of a considerable number of zero counts can be appropriately represented by using such count distributions, and this work serves as the starting point for an alternative statistical treatment of this kind of data and stochastic risk assessment modelling.

Introduction

In the evaluation of microbiological quality of foodstuffs, bacterial load is conventionally expressed in terms of log CFU cm⁻² or g⁻¹. Logarithmic transformation is believed to approximate or induce data normality, which is fundamental for the application of parametric statistical data analysis such as analysis of variance. While logarithmic transformation can be suitable for bacterial counts of high occurrence, such as mesophile or total viable counts, whose log CFU can approximate to a normal distribution, this approach may be unsuitable for bacterial counts of lower occurrence, such as the hygiene indicators, coliforms, Escherichia coli, or pathogens (e.g., Salmonella Typhimurium, Listeria monocytogenes, etc). This has led to the widely held practice (e.g. Gill et al., 1996, Gill et al., 1998) that whenever bacterial colonies are not observed (zero counts), a low log value corresponding to the limit of enumeration of the microbiological test is inserted. This statistical practice for ‘censored’ observations is known as imputation, and, depending on the proportion of zero counts or censored points, the mean values are normally overestimated (Hirano et al., 1994, Hornung and Reed, 1990). A maximum likelihood procedure for censored values was introduced by Rouse et al. (1985), whose assumption is that the underlying frequency distribution approximates a lognormal. With this assumption, Pouillot et al. (2007) modelled the contamination of L. monocytogenes in cold-smoked salmon. However, it is unclear how the method will perform when the untransformed data are not normal, and, while it is possible to modify the maximum likelihood for other data distributions, this still requires that the distribution be known.
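The effect of imputation on the mean can be illustrated with a small arithmetic sketch. The counts and the limit of enumeration below are hypothetical values chosen only for illustration; they are not taken from the data sets in this study.

```python
# Hypothetical plate counts (CFU); NOT data from the study.
counts = [0, 0, 0, 1, 2, 5, 12, 40]
limit_of_enumeration = 1  # assumed lowest countable value

# Imputation: every zero count is replaced by the limit of enumeration.
imputed = [c if c > 0 else limit_of_enumeration for c in counts]

mean_raw = sum(counts) / len(counts)        # 60 / 8 = 7.5
mean_imputed = sum(imputed) / len(imputed)  # 63 / 8 = 7.875

# Share of observations that were replaced by the imputed value.
zero_fraction = counts.count(0) / len(counts)  # 3 / 8 = 0.375

print(mean_raw, mean_imputed, zero_fraction)
```

The gap between the two means grows with the proportion of zero counts, which is why imputation becomes more problematic for bacteria of low occurrence.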

In recent years, there have been considerable developments (Karlis and Ntzoufras, 2005) and interest in models for count data, particularly in econometrics (Ridout et al., 1998), clinical research (Cheung, 2002), epidemiology (Bulsara et al., 2004) and social science (Lord et al., 2005). Poisson models provide only a standard framework for the analysis of count data because, in practice, many real-life counting outcomes exhibit more variability than the nominal variance under the Poisson distribution (which is equal to the expected value), a condition called over-dispersion. One frequent manifestation of over-dispersion is that the incidence of zero counts is greater than expected for the Poisson distribution. It is therefore worthwhile to consider the mechanism by which the over-dispersion occurs and to use more flexible models such as the heterogeneous Poisson models and zero-modified models. The most commonly used heterogeneous Poisson distribution is the negative binomial (Masago et al., 2004, Gale et al., 1997, Hinde and Demetrio, 1998), which loosens the Poisson restrictions by allowing the expected number of events (λ) to be a function of some unobserved random variable that follows a gamma distribution (Ridout et al., 1998). Zero-inflated models (Lambert, 1992) are mixture models of two data generation processes: one always generating zero counts (point mass at zero) and the other generating both zero and non-zero counts (either a Poisson or a negative binomial process). On the other hand, hurdle models (Mullahy, 1986) consist of a truncated count component employed for positive counts and a hurdle component modelling zero versus larger counts. More specifically, in both zero-modified models, a logit model with binomial assumption is used to determine which of the two processes generates an observation.
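The distributions described above can be written down compactly. The following is a minimal sketch, assuming the common mean/dispersion form of the negative binomial (variance = mu + alpha * mu²); the function and parameter names are illustrative and do not come from the article.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing k counts with mean lam (> 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def negbin_pmf(k, mu, alpha):
    """Negative binomial (gamma-mixed Poisson) with mean mu and
    dispersion alpha, so that variance = mu + alpha * mu**2."""
    r = 1.0 / alpha          # shape of the gamma mixing distribution
    p = r / (r + mu)
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def zero_inflated_pmf(k, pi_zero, base_pmf):
    """Mixture: with probability pi_zero the count is always zero;
    otherwise it is drawn from base_pmf (which can also yield zeros)."""
    mixed = (1.0 - pi_zero) * base_pmf(k)
    return pi_zero + mixed if k == 0 else mixed


def hurdle_pmf(k, p_zero, base_pmf):
    """Hurdle: zeros occur with probability p_zero; positive counts
    follow base_pmf truncated at zero."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * base_pmf(k) / (1.0 - base_pmf(0))
```

The key difference is visible at k = 0: the zero-inflated form adds a point mass to the zeros already produced by the count process, while the hurdle form models all zeros with a single probability and leaves the truncated count process to describe positive counts only.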

An alternative conceptual framework for bacterial data with a large number of zero counts is introduced in this work, whereby a distribution is fitted not to log-transformed data but to plate count data. Additionally, identifying a suitable distribution for this type of bacterial count data can proceed in parallel with building appropriate count data regression models that would produce more accurate estimates of the experimental effects under study (covariates). For instance, a negative binomial regression or a zero-modified negative binomial regression would make it possible to better assess possible differences in bacterial contamination among abattoirs, or to assess the effects of an intervention during processing on the numbers of a pathogen. Therefore, bearing in mind the potential use of the methods presented here, the distribution fitting has been performed in this article within a regression modelling context as a preamble to a follow-up work where covariates will be included. In the following sections, regression concepts and notations are introduced for the specific case of null regression models (intercepts only and absence of covariates).
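To make the idea of a null (intercept-only) regression model concrete, the following is a minimal sketch of a maximum likelihood fit of a negative binomial with a log link. It assumes the mean/dispersion form (variance = mu + alpha * mu²); the golden-section search, the search bounds, the function names and the counts are all illustrative assumptions, since this excerpt does not state which software or optimiser was used.

```python
import math


def negbin_loglik(counts, mu, alpha):
    """Log-likelihood of counts under a negative binomial with mean mu
    and dispersion alpha (variance = mu + alpha * mu**2)."""
    r = 1.0 / alpha
    p = r / (r + mu)
    return sum(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p) for k in counts)


def golden_max(f, a, b, iters=60):
    """Golden-section search for the maximiser of a unimodal f on [a, b]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:   # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:         # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def fit_null_negbin(counts):
    """Return (intercept, alpha) for an intercept-only model.
    With no covariates the ML estimate of the mean is the sample mean,
    so the log-link intercept is log(mean); alpha is found numerically."""
    mu = sum(counts) / len(counts)
    log_alpha = golden_max(
        lambda t: negbin_loglik(counts, mu, math.exp(t)),
        math.log(1e-3), math.log(100.0))  # assumed search bounds
    return math.log(mu), math.exp(log_alpha)


# Hypothetical over-dispersed counts; NOT data from the study.
example_counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 13, 40]
intercept, alpha_hat = fit_null_negbin(example_counts)
print(intercept, alpha_hat)
```

When covariates are added, the single intercept is replaced by a linear predictor, and the mean no longer has a closed form, so both sets of parameters have to be estimated numerically.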

The main objective of this study was to introduce count data frequency distributions for fitting bacterial load data that do not approximate to a normal distribution after logarithmic transformation due to the high proportion of zero counts. The fitting procedure shown in this article provides a protocol that can serve as a starting point for the statistical treatment of this kind of bacterial data. Two actual data sets with different levels of zero counts were used in this study; they corresponded to total coliforms and E. coli counts from pre-chill beef carcasses produced at nine Irish slaughterhouses over a two-year period. Poisson, negative binomial, and two zero-modified (zero-inflated and hurdle) parameterisations for the Poisson and the negative binomial distributions were fitted to both data sets, and the results were compared and analysed.

Sampling of beef carcasses and microbiological analyses

Nine beef export abattoirs, with a throughput of at least 30 000 cattle/annum each, located in the south, east and west of Ireland, were visited to obtain a representative sample of cattle being slaughtered throughout the country. Five of the abattoirs were each visited three times and the remaining four on two occasions. During each visit, approximately 30 animals were randomly sampled at the end of the slaughter line after washing by swabbing the two carcass sides. Polyurethane sponges (Sydney

Results

Defining commercially attainable acceptance criteria for the hygienic performance of beef carcass dressing processes, Gill et al. (1998) observed that the bacteria of interest must be counted in approximately 85% of samples for there to be an approximation to normal distribution of log CFU. In the present study, the logarithmic transformation of the total viable counts (log [CFU/10,000 cm²]) on pre-chill beef carcasses (n = 672) brought about the approximation of the data to a normal distribution (

Discussion

Although the Poisson distribution is normally the recommended approach for analysing count data, the extra variability of the bacterial data can be handled using the modifications to the Poisson shown in this paper. Bacterial data made up of a considerable number of zero counts can be appropriately represented by using such modified count distributions. These distributions have been demonstrated to depict with great accuracy the observed data since they are capable of dealing with the

Conclusions

An alternative conceptual framework that accurately represents the dispersion of microbial counts from bacteria of low occurrence has been introduced. The typical logarithmic transformation of CFU/cm² (as a way to approximate data normality) was disregarded and analysis was conducted on the discrete variable of CFUs counted on Petri dishes. Distributions for counting outcome data – modified from the baseline Poisson so as to account for both the large variance of the count data and the excess

Acknowledgments

The authors wish to acknowledge safefood, The Food Safety Promotion Board and the Food Institutional Research Measure (FIRM) administered by the Irish Department of Agriculture, Fisheries and Food. The authors also wish to acknowledge the partial financial support of ProSafeBeef, an EU 6th Framework project. The reviewers are gratefully acknowledged for detailed useful comments.

References (30)

  • P. Gale et al. Drinking water treatment increases micro-organism clustering: the implications for microbiological risk assessment. Journal of Water Supply Research and Technology – Aqua (1997)
  • W. Gardner et al. Regression analyses of counts and rates: Poisson, overdispersed Poisson and negative binomial models. Psychological Bulletin (1995)
  • C. Gill et al. Evaluation of the hygienic performances of the processes for beef carcass dressing at 10 packing plants. Journal of Applied Microbiology (1998)
  • W. Greene. Econometric Analysis (2002)
  • S. Hirano et al. Estimation of and temporal changes in means and variances of populations of Pseudomonas syringae on snap bean leaflets. Phytopathology (1994)