Estimating person-based injury incidence: accuracy of an algorithm to identify readmissions from hospital discharge data
  1. Gabrielle Davie,
  2. Ari Samaranayaka,
  3. John D Langley,
  4. Dave Barson
  1. Injury Prevention Research Unit, Department of Preventive and Social Medicine, University of Otago, Dunedin, New Zealand
  1. Correspondence to Gabrielle Davie, Injury Prevention Research Unit, Department of Preventive and Social Medicine, University of Otago, PO Box 913, Dunedin, New Zealand; gabrielle.davie{at}ipru.otago.ac.nz

Abstract

Background Effective use of routinely collected hospital discharge data (HDD) to estimate injury incidence requires a separate identification of new injuries from readmissions for a previous injury. The aim was to determine the accuracy of a computerised algorithm to identify injury readmissions in HDD.

Methods A random sample of 2000 events (‘key events’) was selected from the 2006 injury subset of New Zealand's HDD. Discharge histories from 1989 to 2007 were extracted for individuals and manually reviewed by at least two people to determine the ‘gold standard’ readmission status of each key event. The algorithm relies on four variables: unique national person identifier, dates of injury, admission and discharge. Reviewers were provided with these variables as well as additional discharge information (eg, discharge type and external cause code narrative) recorded in the HDD. Results of the manual review were compared to those obtained from the algorithm.

Results The algorithm assigned 1811 (90.6%) as incident admissions compared to 1800 (90.0%) classified by the gold standard. Agreement was 97.9%, and accuracy measures (sensitivity, specificity, negative predictive value and positive predictive value) ranged from 87% to 99%. No statistically significant differences between readmission assignation by the algorithm and the gold standard were observed by age, nature of injury, external cause of injury or body region.

Conclusions Any country with electronic HDD could readily identify readmissions and, thus, accurately estimate injury incidence from HDD, providing that a unique person identifier and the date of injury were included in addition to the obligatory dates of admission and discharge.

  • Injury
  • incidence
  • hospital discharge data
  • readmissions
  • public health
  • database
  • biostatistics
  • surveillance
  • advocacy
  • scalds
  • government
  • e-code
  • methods
  • epidemiologic
  • disability
  • terminology
  • legislation

Introduction

Routinely collected hospital discharge data (HDD) are a rich source of information for injury epidemiology. As people are admitted to hospital for treatment of injuries in both the acute and rehabilitative phases, incident injury admissions (new injuries) need to be distinguished from readmissions for the same injury to accurately estimate incidence. Typically, 8%–10% of hospital injury discharges are readmissions, so estimates based on HDD that do not exclude readmissions will considerably overstate incidence.1–3 Unfortunately, hospital discharge registers often do not include information about whether treatment is for a previous injury, and if they do, its accuracy is questionable.2 4 The ability of countries/states to identify incident cases depends on whether certain variables are routinely collected in their HDD. The existence of national unique person identifiers that enable record linkage within the HDD is helpful, as this allows the identification of multiple admissions for the same patient.3 Date of injury is another variable that is useful in identifying incident injuries. As far as we are aware, only three countries currently capture both a unique person identifier and the date of injury in their HDD: New Zealand (NZ),5 Denmark6 and Finland7. This number may increase if these variables are found to be useful in accurately estimating injury incidence.

Currently, in NZ's Injury Prevention Research Unit (IPRU), whether a hospital discharge is a first admission (new injury) or a readmission is determined by automated SAS8 code that relies on four variables: unique person identifier (Master National Health Index (NHI)), date of injury, date of admission and date of discharge.1 These variables have been available in NZ's HDD, the National Minimum Data Set, since 1989.5 A discharge event is classified as a readmission by the algorithm if it has the same Master NHI and:

  1. the same date of injury as an event with an earlier date of admission or

  2. a date of admission within 1 day of an earlier date of discharge.

The algorithm also assigns to each readmission the first admission in the chain of discharge events that relates to that readmission discharge event.
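The two criteria and the chaining step can be sketched in code. The following is a minimal illustration only, not the IPRU SAS code: the `Discharge` fields and the `classify` function are hypothetical names, and criterion 2 is read here as an admission 0 or 1 days after an earlier discharge, which is an assumption about how ‘within 1 day’ is applied.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Discharge:
    record_id: str               # hypothetical record identifier
    person_id: str               # unique person identifier
    injury_date: Optional[date]  # None when not recorded
    admitted: date
    discharged: date


def classify(events: List[Discharge]) -> Dict[str, Optional[str]]:
    """Map each record_id to the record_id of its first admission,
    or to None when the event is itself a first (incident) admission."""
    by_person: Dict[str, List[Discharge]] = {}
    for event in events:
        by_person.setdefault(event.person_id, []).append(event)

    first_admission: Dict[str, Optional[str]] = {}
    for history in by_person.values():
        history.sort(key=lambda e: (e.admitted, e.discharged))
        for i, event in enumerate(history):
            first_admission[event.record_id] = None
            for earlier in history[:i]:
                # Criterion 1: same date of injury, earlier date of admission.
                same_injury = (
                    event.injury_date is not None
                    and event.injury_date == earlier.injury_date
                    and earlier.admitted < event.admitted
                )
                # Criterion 2 (assumed reading): admitted 0 or 1 days after
                # the earlier event's discharge.
                gap = (event.admitted - earlier.discharged).days
                if same_injury or 0 <= gap <= 1:
                    # Follow the chain back to the earliest linked admission.
                    root = first_admission[earlier.record_id]
                    first_admission[event.record_id] = root or earlier.record_id
                    break
    return first_admission
```

Because each readmission inherits the first admission of the event it links to, a transfer that follows a readmission is traced back to the original incident admission rather than to the intermediate event.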

The overall aim of this project was to assess the accuracy of an algorithm in identifying readmissions from HDD. The specific aims were as follows:

  1. To estimate the overall accuracy (sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and agreement) of IPRU's algorithm for allocating readmission status

  2. To investigate how the accuracy of IPRU's readmissions algorithm varies by age, nature of injury, external cause of injury and body region

  3. To suggest ways to improve the readmission algorithm if the accuracy of the present method is found to be unacceptable.

Methods

The Ministry of Health Information Directorate is the custodian for the National Minimum Dataset (NMDS), which contains all publicly funded inpatient treatment of injuries in NZ hospitals. It has been estimated that for 2006 and 2007, at least 99.5% of all hospital injury discharges were publicly funded (Chris Lewis, Ministry of Health, personal communication, 2010).9

Since 1999, diagnoses in the NMDS have been coded using the Australian Modification of the International Classification of Disease, 10th Revision. For 2006, there were 77 900 hospital discharges in the NMDS with a principal diagnosis (PDx) in the Australian Modification of the International Classification of Disease, 10th Revision range S00-T78 (Injury and Poisoning chapter excluding complications of surgical and medical care and sequelae). From these, a random sample of 2000 discharges (‘key events’) was selected for a manual review. This year was chosen because, at the time, it was the most recent year that was not considered ‘provisional’. A sample of this size was chosen based on consideration of the following: (1) precision of estimates from a pilot study of 500 discharges, (2) previous research that assessed the accuracy of injury discharges in the NMDS10 and (3) resources available to do the manual review.

The unique person identifiers (NHIs) of these key events were then used to extract all other discharge events with at least one external cause of injury code from 1989 to 2007 to create a discharge history for each person. NHI has been a mandatory field in the NMDS since its creation in 1988, and date of injury was first recorded in 1989. At the time this study commenced, 2007 was the most recent NMDS year available. Discharge data for 2007 were included, as it is theoretically possible for records from a later year to provide a date of injury that enables an earlier discharge to be identified as a readmission. Inclusion of 2007 discharge data maintained consistency with the data set used by the IPRU automated readmissions code, which examines all related discharges before and after a given discharge. Some illustrative examples of discharge histories are presented in table 1.

Table 1

Illustrative de-identified discharge history scenarios obtained from the National Minimum Dataset of hospital discharges, 1989–2007

If a key event was determined to be unique for the person (ie, that person had no other hospital discharge events during 1989–2007), it was, by default, classified as a ‘first admission’ (incident case) and was subsequently not included in the cases for the manual review. The remaining (‘non-singleton’) key events were distributed randomly among eight volunteer coders from IPRU who were asked to determine if the key event was a first admission or a readmission, based on the discharge history of the injured person. In addition to the four variables used by the algorithm (unique person identifier, date of injury, date of admission and date of discharge), coders were provided with the following variables to aid their decision: discharge type (eg, routine, died, transfer, etc), broad classification of the External Cause Code (major external cause), narrative of the external cause code, free-text description of the primary diagnosis code and a hospital identifier (table 1). A record identifier was also included so that a particular hospital discharge event could be referred to. To protect the anonymity of the injured persons, a fictitious unique person identifier was used rather than the Master NHI. In addition, proxy hospital and record identifiers were used. Manual reviewers were not provided with the readmission status obtained from the algorithm.

If the key event was classified as a readmission, the coder had to identify which previous discharge event was the first admission for that injury and person. In addition to its own allocation, each coder blind-reviewed a sample of another coder's allocation such that all non-singleton key events were manually reviewed twice. If two coders selected the same readmission status for a key event, this was used as the gold standard. Similarly, if two coders identified the same discharge event from the discharge history as the first admission, this was taken as the gold standard. Key events for which the assessments from the two coders differed were blind-reviewed by a third coder (coder C), who had not taken part in the previous process. For these cases, the gold standard was created from the readmission status and the first-admission discharge event that two of the three coders agreed on.
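The consensus rule described above can be illustrated with a short sketch. This is only one way to express it, under the assumption that readmission status and first-admission record are each settled by agreement between at least two coders; the names `majority` and `gold_standard` are hypothetical and do not come from the study.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple

# One coder's decision: (is_readmission, record id of the first admission).
Decision = Tuple[bool, Optional[str]]


def majority(values: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value chosen by at least two coders, else None."""
    if not values:
        return None
    value, votes = Counter(values).most_common(1)[0]
    return value if votes >= 2 else None


def gold_standard(decisions: Sequence[Decision]) -> Tuple[Optional[bool], Optional[str]]:
    """Resolve readmission status first, then the first-admission record.

    A None in either position means the coders have not yet agreed and the
    key event would need to go to a further reviewer (coder C in the text).
    """
    status = majority([is_readmission for is_readmission, _ in decisions])
    first = None
    if status:
        first = majority([rec for is_readmission, rec in decisions if is_readmission])
    return status, first
```

For example, two coders who both mark a key event as a readmission but name different first admissions yield `(True, None)`, which corresponds to the three key events referred to coder C for that reason.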

The readmission status assigned by the algorithm was compared with the gold standard by calculating the per cent agreement, sensitivity, specificity, PPV and NPV for detecting a readmission.11

The Barell matrix was used to classify the key events by nature of injury and body region injured.12 Nature of injury groupings and major body region were based on the PDx of the key event with external cause groupings based on the first listed external cause code of the key event. Age at time of injury was grouped into five: 0–14, 15–24, 25–64, 65–79 and 80 years or older. Stata V.11.0 was used for the analysis, with binomial exact CIs calculated.13 14

Results

Of the 2000 randomly selected key events, 893 were ‘singleton’ key events and, thus, were, by default, first admissions. Of the remaining 1107 key events from 1098 people, 44% had only one additional discharge event in their discharge history. At the other extreme, one key event was associated with 121 discharge events. The median number of discharge events in the discharge history of key events was three.

These 1107 key events were examined manually at least twice, with each of eight coders reviewing between 272 and 278 key events. Coder C received 58 key events where two of the initial coders had disagreed on admission status, and a further three key events where the coders had both agreed that the event was a readmission but had identified a different first admission.

Twenty-seven key events were assigned as incident admissions by the algorithm but as readmissions by the gold standard, and 16 key events were classified as readmissions by the algorithm but as incident admissions by the gold standard, giving only 43 key events with discordant readmission status (tables 2 and 3). As the key events were a random sample of discharges from 2006 NMDS, the total number of key events was analysed to give an estimate of the accuracy of the readmissions algorithm with the NMDS. Using n=2000, the algorithm assigned 1811 (90.6%) as incident admissions compared to 1800 (90.0%) classified as incident admissions by the gold standard. Specificity, NPV and per cent agreement for identifying readmission were all 98% or above, whereas sensitivity and PPV were noticeably lower (table 2). It could be argued that the 893 singleton key events should be excluded from the accuracy measures, as these events were not subject to the manual review. The accuracy of the readmissions algorithm for non-singleton discharges in the NMDS is thus estimated by using n=1107. By definition, sensitivity and PPV do not change. Specificity, NPV and per cent agreement are slightly lower when n=1107 is used but not materially so (table 3). All remaining estimates have been calculated using n=2000.

Table 2

Comparison of the automated readmissions algorithm to the gold standard using the entire sample (n=2000) as the denominator

Table 3

Comparison of the automated readmissions algorithm to the gold standard using the number of key events associated with multiple discharges (n=1107) as the denominator
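The counts reported in the text are enough to rebuild the underlying 2×2 table and recompute the accuracy measures for both denominators. The sketch below does this arithmetic; the `Confusion` class is a hypothetical helper, and the cell counts are derived from the reported totals (1811 and 1800 incident admissions, 27 and 16 discordant events, 893 singletons) rather than copied from tables 2 and 3.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # readmission according to both algorithm and gold standard
    fp: int  # readmission by algorithm, incident by gold standard
    fn: int  # incident by algorithm, readmission by gold standard
    tn: int  # incident according to both

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def agreement(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


n, algorithm_incident, gold_incident = 2000, 1811, 1800
fn, fp, singletons = 27, 16, 893

tp = (n - algorithm_incident) - fp  # algorithm readmissions that were correct
tn = gold_incident - fp             # true incident admissions left as incident

full = Confusion(tp=tp, fp=fp, fn=fn, tn=tn)
# Singletons are all incident by both methods, so they only reduce tn.
non_singleton = Confusion(tp=tp, fp=fp, fn=fn, tn=tn - singletons)

for label, table in (("n=2000", full), ("n=1107", non_singleton)):
    print(label, {
        "sensitivity": round(table.sensitivity, 3),
        "specificity": round(table.specificity, 3),
        "ppv": round(table.ppv, 3),
        "npv": round(table.npv, 3),
        "agreement": round(table.agreement, 3),
    })
```

Removing the singletons changes only the count of events that both methods call incident, which is why sensitivity and PPV are identical for the two denominators while specificity, NPV and agreement fall slightly.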

Of the 58 cases that were referred to coder C, per cent agreement between the algorithm and the gold standard was only 67% (95% CI 54% to 79%), indicating that these ‘difficult’ key events that caused disagreement between the manual coders also created disagreement between the algorithm and the gold standard.
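Binomial exact (Clopper-Pearson) intervals such as the one quoted above can be reproduced without a statistics package. The following sketch uses only the Python standard library and finds each limit by bisection; `exact_ci` is a hypothetical helper, and the numerator of 39 agreeing events out of 58 is inferred from the reported 67% rather than stated in the text.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Clopper-Pearson interval for x successes in n trials."""

    def solve(k: int, target: float) -> float:
        # binom_cdf(k, n, p) decreases as p increases, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: p with P(X >= x) = alpha/2, ie P(X <= x-1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)
    # Upper limit: p with P(X <= x) = alpha/2.
    upper = 1.0 if x == n else solve(x, alpha / 2)
    return lower, upper


low, high = exact_ci(39, 58)  # 39 of 58 is an inferred count, see lead-in
print(f"{39 / 58:.0%} (95% CI {low:.0%} to {high:.0%})")
```

The exact interval is not symmetric around the point estimate, which matters most for small subgroups such as this one.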

When the individual coders' responses were compared to the gold standard, slight differences in accuracy between coders were noted, but seven of eight coders had sensitivity >89%, and specificity was above 98% for all. The lowest PPV was 84%, and the lowest NPV was 94%. Agreement varied from 94% to 99%.

Both the gold standard and algorithm identified those aged 80 years and above as having the highest rate of readmission (11.2% and 10.7%, respectively). The lowest rate of readmission was observed for those aged 65–79 years, with 8.1% of this age group identified as readmissions by the gold standard and 7.6% by the algorithm.

No statistically significant differences were observed in the rate of disagreement by age, nature of injury, body region or external cause of injury (table 4). Lack of agreement between the algorithm and the gold standard was highest for key events in which the injured person was aged 80 years or older. For nature of injury, the highest disagreement rate observed was 7% for dislocation. Fractures were the most commonly recorded nature of injury. Twenty-three (54%) of the 43 disagreements over readmission status were for events with fracture as the nature of injury, 12 percentage points higher than expected. Lack of agreement between the algorithm and the gold standard was noticeably higher than expected for key events in which the major body regions injured were the ‘extremities’ and ‘head and neck’. Motor vehicle crashes stand out as the external cause of injury group associated with higher disagreement rates.

Table 4

Disagreement between the automated algorithm and the gold standard overall and by age, nature, body region and external cause of injury

Overall, the accuracy of the algorithm to identify readmissions was high. The level of error was minimal, so research related to the third aim of this study, which is to identify possible methodological improvements, was not undertaken.

Discussion

Overall, the algorithm identified 11 fewer readmissions than the gold standard (0.6% of the sample). The 2006 rate of hospitalisation for incident injury events in NZ obtained using the algorithm to identify readmissions is 1703 per 100 000, compared to a rate of 1862 per 100 000 when no readmissions are excluded. Assuming that this 0.6% underestimate of readmissions is correct, the estimated true incidence rate was 1693 per 100 000. This indicates that analyses based on IPRU's algorithm may slightly overstate injury incidence in NZ but to a lesser extent than in countries that are unable to identify readmissions at all. Accuracy measures (sensitivity, specificity, PPV, NPV) for identifying readmission were between 87% and 99%. Agreement was very high at 98%.
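The rate adjustment can be checked with a few lines of arithmetic. The sketch below assumes the correction is applied by scaling the algorithm-based rate by the ratio of gold standard to algorithm incident admissions in the sample (1800/1811); this is one plausible reading of the adjustment, and the variable names are illustrative only.

```python
crude_rate = 1862.0       # per 100 000, no readmissions excluded
algorithm_rate = 1703.0   # per 100 000, readmissions excluded by the algorithm
algorithm_incident = 1811  # incident admissions in the sample, algorithm
gold_incident = 1800       # incident admissions in the sample, gold standard

# Scale the algorithm-based rate down by the sample over-count (assumption).
adjusted_rate = algorithm_rate * gold_incident / algorithm_incident

# Share of the crude rate that the algorithm removes as readmissions.
readmission_share = 1 - algorithm_rate / crude_rate

print(f"adjusted rate: {adjusted_rate:.0f} per 100 000")
print(f"readmissions removed: {readmission_share:.1%} of discharges")
```

Under this reading the adjusted rate rounds to the 1693 per 100 000 quoted in the text, and the share of discharges removed as readmissions falls within the 8%–10% range cited in the Introduction.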

As there were only 43 key events for which the algorithm produced a different readmission status than the gold standard, the power to detect differences by age, nature of injury, body region and external cause of injury was reduced. Also, the small number of key events in some of the nature of injury and external cause of injury categories resulted in wide CIs, limiting the ability to detect differences if they did, in fact, exist. Although not significant, the observed percentage of key events incorrectly classified by the algorithm did show variation, with the highest percentages of incorrectly identified readmissions observed for those aged 80 years or older, those with a PDx of injury of ‘dislocation’, those whose injuries were due to a motor vehicle crash and those who injured their head and neck.

The analysis undertaken is based on the comparisons between outcomes from the algorithm and gold standard manual review. One of the limitations is that there may be some degree of error in the manual review. In this study, the degree of error was limited by having key events considered independently by two coders, with events where differences were observed given to a third coder. One could argue that the gold standard readmission status should be determined by a review of the individual's hospital medical records. Although this may give the best indication of whether an injury event is a readmission or not, access to and independent review of hospital medical records would be costly and time consuming. As a first step, it was important to judge whether the algorithm was classifying readmissions consistently with a manual review based on additional information from the electronic record.

The number of discharges per individual obviously impacts on the likelihood of the algorithm being correct. If an individual has only ever been admitted to and discharged from hospital once, the event, by definition, cannot be a readmission. If, though, an individual has had 10 previous discharges, the key event is more likely to be classified a readmission than one in which the individual had only one previous admission.

NZ is fortunate to have both a unique person identifier and the date of injury in its HDD. The automated IPRU code and the manual review of key events relied heavily on the existence of these two variables. One concern is that the data quality of these fields may negatively affect the correct identification of readmissions. The accuracy of the unique person identifier in NZ HDD is likely to be high, as one of the main tasks of the Ministry of Health Data Quality team is to maintain the integrity of the Master NHIs. If the date of injury has been recorded incorrectly, the algorithm would be unlikely to assign that discharge record as a readmission using the criterion ‘the same date of injury as an event with an earlier date of admission’, but it would still be possible for the discharge to be identified as a readmission if it meets the second criterion (‘a date of admission within 1 day of an earlier date of discharge’).

Observations from those involved in the manual review indicated that there were a few key events with multiple admissions for which dates of injury varied but were within a few weeks of each other and the date of admission was not within 1 day of an earlier date of discharge. Key events with information such as this are obviously classified as first admissions by the algorithm. Those performing the manual review could be guided by information provided in the additional fields such as external cause and diagnosis code narratives and, on the balance of subjective probability, decide on the likelihood of an individual sustaining, for example, a fractured ankle from a motorcycle crash twice within a short period of time. Events like this obviously result in some of the disagreements observed between the algorithm and the gold standard. Manual review coders also reported that they relied on the information provided by the discharge-type (routine, transfer, etc) field to help identify readmissions. Further research could be undertaken to see if including discharge type in the algorithm improved performance.

The algorithm assessed in this study performed surprisingly well in distinguishing first admissions from readmissions compared to the manual review. Despite the gold standard using information from five more variables than the algorithm, agreement was high at 98%. Accuracy measures (sensitivity, specificity, PPV, NPV) also indicated that the algorithm was performing well. Although the use of additional variables has the potential to improve the automated identification of readmissions, the complexity of the code would need to be greatly increased for minimal gains. This study has shown that the inclusion of a unique national person identifier and three other variables (date of injury, date of admission and date of discharge) in HDD is sufficient to readily identify readmissions and, thus, accurately estimate injury incidence. Countries and states that do not currently capture a national person identifier and date of injury in their electronic HDD should lobby to have these included to improve injury incidence estimates. Concern about privacy and cost implications can be lessened through the use of ‘dummy’ personal identifiers with access to the data controlled by standard ethical processes.

What is already known on this subject

  • Routinely collected hospital discharge data contain a rich source of information for injury epidemiology. Around 10% of hospital injury discharges are readmissions, so incidence based on hospital discharge data without excluding readmissions will considerably overstate incidence

  • Most countries are currently unable to accurately distinguish readmissions from incident injury admissions

What this study adds

  • Accuracy of a computerised algorithm to distinguish incident injury admissions from readmissions for a previous injury in NZ hospital discharge data was extremely high. The algorithm relied on only four variables: date of admission, date of discharge, date of injury and a national unique person identifier

  • There was no evidence to suggest that the accuracy of the algorithm varied by age, nature of injury, external cause of injury or body region

  • Countries and states that do not currently capture a national person identifier and date of injury in their electronic hospital discharge data should lobby to have these included to improve injury incidence estimates

Acknowledgments

The authors wish to thank the volunteer coders from the Injury Prevention Research Unit for their essential input into this study. The authors acknowledge the Information Directorate at the Ministry of Health as the custodians of NZ's hospital discharge data. We also thank Associate Professor Colin Cryer and Professor Hank Weiss for helpful comments on an earlier version of this paper.

References

Footnotes

  • Funding This research was supported by the University of Otago.

  • Competing interests None.

  • Ethics approval New Zealand Health and Disability Multi-region Ethics Committee.

  • Provenance and peer review Not commissioned; externally peer reviewed.