The burden of Clostridium difficile infection: estimates of the incidence of CDI from U.S. administrative databases

Background: Many administrative data sources are available to study the epidemiology of infectious diseases, including Clostridium difficile infection (CDI), but few publications have compared CDI event rates across databases using similar methodology. We used comparable methods with multiple administrative databases to compare the incidence of CDI in older and younger persons in the United States.
Methods: We performed a retrospective study using three longitudinal data sources (Medicare, OptumInsight LabRx, and Healthcare Cost and Utilization Project State Inpatient Database (SID)), and two hospital encounter-level data sources (Nationwide Inpatient Sample (NIS) and Premier Perspective database) to identify CDI in adults aged 18 and older, with calculation of CDI incidence rates/100,000 person-years of observation (pyo) and CDI categorization (onset and association).
Results: The incidence of CDI ranged from 66/100,000 in persons under 65 years (LabRx) to 383/100,000 in elderly persons (SID) and 677/100,000 in elderly persons (Medicare). Ninety percent of CDI episodes in the LabRx population were characterized as community-onset compared to 41 % in the Medicare population. The majority of CDI episodes in the Medicare and LabRx databases were identified based on only a CDI diagnosis, whereas almost ¾ of encounters coded for CDI in the Premier hospital data were confirmed with a positive test result plus treatment with metronidazole or oral vancomycin. Using only the Medicare inpatient data to calculate encounter-level CDI events resulted in 553 CDI events/100,000 persons, virtually the same as the encounter proportion calculated using the NIS (544/100,000 persons).
Conclusions: We found that the incidence of CDI was 35 % higher in the Medicare data and fewer episodes were attributed to hospital acquisition when all medical claims were used to identify CDI, compared to only inpatient data lacking information on diagnosis and treatment in the outpatient setting. The incidence of CDI was 10-fold lower and the proportion of community-onset CDI was much higher in the privately insured younger LabRx population compared to the elderly Medicare population. The methods we developed to identify incident CDI can be used by other investigators to study the incidence of other infectious diseases and adverse events using large generalizable administrative datasets.
Electronic supplementary material: The online version of this article (doi:10.1186/s12879-016-1501-7) contains supplementary material, which is available to authorized users.


Background
Clostridium difficile infection (CDI) incidence in the United States has increased dramatically since 2000 [1,2]. The number of discharges from non-federal, acute care hospitals assigned the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis code for CDI (008.45) increased by 2.7-fold between 2000 and 2012 using data from the Healthcare Cost and Utilization Project (HCUP) Nationwide Inpatient Sample (NIS) [3]. CDI was estimated to cause as many as 14,000 deaths in 2007 and an attributable mortality ranging from 5.7 % in endemic settings to 16.7 % in severe outbreaks since 2000 [2][4][5][6][7].
Much research has focused on identifying specific risk factors for CDI, but this might not be the best approach to identify high risk populations. The results of risk factor studies have not always been consistent [8][9][10][11][12][13][14][15], with potential reasons for discrepancies including differences in patient populations, data availability, and/or study definitions. These differences limit both the ability to compare results across studies and the generalizability of results, making it difficult to identify which populations have the highest CDI burden and how to best target CDI prevention practices.
Billing and claims data (referred to collectively as administrative data) are increasingly used for health services and outcomes research because of large population sizes, generalizability of findings, and the ability to follow individuals across the spectrum of health care. Unfortunately there is no single, comprehensive database in the U.S. that can be used to identify all populations at risk for CDI. In order to better understand the epidemiology of CDI we applied common definitions to identify and classify CDI from five large administrative databases, the Medicare 5 % Sample, HCUP State Inpatient Databases (SID) and the NIS, OptumInsight™ Retrospective Database (LabRx), and Premier Perspective, to improve our understanding of the burden of CDI in the U.S. from a population perspective.

Methods
The databases used for this study were anonymized; some contained encrypted identifiers to link longitudinal data within a person ("cohort" data), while the others consisted of only unlinked hospital billing data. The hospital billing databases (NIS, Premier) were analyzed at the hospital discharge level. The cohort databases containing a person-level identifier to track persons across healthcare encounters (Medicare, LabRx, and SID) were analyzed at both the person level and the hospital discharge level. For all cohort data, hospitalizations with same-day transfers to the same or a different hospital were aggregated and treated as a single hospital stay, to avoid over-counting long hospitalizations or direct transfers as distinct hospital visits. The Washington University Human Research Protection Office and Geisel School of Medicine at Dartmouth Committee for the Protection of Human Subjects gave approval to conduct this research with a waiver of informed consent.
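The aggregation of same-day transfers can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `merge_same_day_transfers`, the tuple layout, and the choice to also merge overlapping records are assumptions made for illustration.

```python
from datetime import date

def merge_same_day_transfers(stays):
    """Collapse one person's stays when an admission falls on (or before)
    the discharge date of the preceding stay.

    `stays` is a list of (admit_date, discharge_date) tuples (hypothetical layout).
    Returns a list of merged (admit_date, discharge_date) tuples.
    """
    merged = []
    for admit, discharge in sorted(stays):
        if merged and admit <= merged[-1][1]:
            # Same-day transfer (or overlapping record): extend the previous stay.
            prev_admit, prev_discharge = merged[-1]
            merged[-1] = (prev_admit, max(prev_discharge, discharge))
        else:
            merged.append((admit, discharge))
    return merged

example_stays = [
    (date(2009, 3, 1), date(2009, 3, 5)),
    (date(2009, 3, 5), date(2009, 3, 12)),  # transferred on the discharge day
    (date(2009, 6, 1), date(2009, 6, 3)),   # separate, later admission
]
merged_stays = merge_same_day_transfers(example_stays)
```

In this example the first two records become one stay from 1 March to 12 March, so a direct transfer is counted once rather than twice.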

Identification of CDI
Criteria used to identify CDI combined any of the following: 1) ICD-9-CM diagnosis code for CDI (008.45) during an inpatient hospital stay; 2) ICD-9-CM diagnosis code for CDI in an outpatient encounter with specific restrictions (see Additional file 1: Appendix); 3) Positive test result for C. difficile toxins or toxin genes (LabRx); and 4) Non-topical metronidazole or oral vancomycin therapy within ± 14 days of a CPT-4 code for a C. difficile test or diagnosis code for CDI (Medicare, LabRx, and Premier).
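As an illustration of criterion 4, the following sketch checks whether a treatment date falls within 14 days before or after a C. difficile test or CDI diagnosis date. The function name and the input lists are hypothetical, and the outpatient restrictions described in the Appendix are not modeled here.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=14)  # the +/- 14 day window from criterion 4

def treatment_supports_cdi(treatment_dates, index_dates, window=WINDOW):
    """Return True if any treatment date lies within +/- `window`
    of any test or diagnosis date (the `index_dates`)."""
    return any(
        abs(treated - index) <= window
        for treated in treatment_dates
        for index in index_dates
    )

# Oral vancomycin filled 9 days after a coded CDI diagnosis: supported.
supported = treatment_supports_cdi([date(2009, 5, 10)], [date(2009, 5, 1)])
# Metronidazole filled 20 days after the diagnosis: outside the window.
unsupported = treatment_supports_cdi([date(2009, 5, 21)], [date(2009, 5, 1)])
```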
For person-level analyses, subsequent unique episodes of CDI were identified if the person met criteria for CDI again after an 84-day period during which there were no healthcare encounters meeting the CDI case definition. We used a conservative definition for subsequent unique episodes of CDI to minimize misclassifying carry-forward of the CDI diagnosis code or CDI recurrence as a unique episode of CDI.
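The episode rule can be sketched as a grouping of one person's qualifying encounter dates. This is an illustrative reading, not the study code: the function name is hypothetical, and treating a gap of more than 84 days since the last qualifying encounter as the start of a new episode is an assumption about how the boundary was handled.

```python
from datetime import date

def split_into_episodes(qualifying_dates, gap_days=84):
    """Group dates of encounters meeting the CDI case definition into episodes.

    A new episode starts when more than `gap_days` have passed since the
    previous qualifying encounter (assumed boundary handling).
    """
    episodes = []
    previous = None
    for current in sorted(qualifying_dates):
        if previous is None or (current - previous).days > gap_days:
            episodes.append([current])    # start a new unique episode
        else:
            episodes[-1].append(current)  # same episode (carry-forward or recurrence)
        previous = current
    return episodes

encounters = [
    date(2009, 1, 5),
    date(2009, 1, 20),
    date(2009, 3, 1),  # 40 days after the previous encounter: same episode
    date(2009, 9, 1),  # 184 days later: new episode
]
episodes = split_into_episodes(encounters)
```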

Inclusion/exclusion criteria
For Medicare and LabRx data, enrollment and complete health insurance coverage for the year prior to the first onset date of CDI was required. For Medicare, age ≥ 66 years at the time of CDI onset was required; for the SID, NIS, and Premier, all persons aged ≥ 18 years were included. Individuals 65 and older were excluded from the LabRx data since they represented only 7 % of the privately insured population. For the cohort data, CDI episodes were excluded if the person had CDI within the prior 84 days in order to identify new episodes of CDI in 2009.

Date of onset and determination of the location of onset and attribution of CDI
The date of onset of CDI was defined as the first date corresponding to a coded diagnosis of CDI. In the LabRx data, if a CDI toxin test was performed, the date of the first positive test was used as the date of CDI onset.
The location of onset and attribution for each CDI episode was determined using an algorithm based on the most recent SHEA/IDSA definitions [16][17][18]. CDI coded during a hospitalization was classified as community-onset if: 1) CDI was the primary diagnosis; 2) the primary diagnosis was diarrhea, abdominal pain, or nausea and CDI was coded in a secondary position; or 3) CDI was coded in a secondary position and the hospital length of stay was ≤ 3 days. If no further information was available from outpatient or physician claims, CDI was classified as hospital-onset if it was coded in a secondary position and the hospital length of stay was > 3 days. If the database did not contain a common person identifier, no further categorization beyond community- or hospital-onset was possible. If a common person identifier was available and the CDI episode was community-onset, hospitalizations and other healthcare facility exposures prior to the CDI hospital admission were identified to classify the episode (see Additional file 1: Appendix).
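The inpatient onset rules above can be expressed as a short decision function. The sketch below is illustrative only: the function and argument names are hypothetical, the symptom diagnoses are represented by plain labels rather than ICD-9-CM codes, and the further attribution step that relies on prior healthcare exposures (described in the Appendix) is not modeled.

```python
# Primary diagnoses treated as symptoms consistent with CDI (labels, not codes).
SYMPTOM_DIAGNOSES = {"diarrhea", "abdominal pain", "nausea"}

def classify_inpatient_onset(cdi_is_primary, primary_diagnosis, length_of_stay_days):
    """Classify a hospitalization coded for CDI as community- or hospital-onset.

    Assumes the stay already carries a CDI diagnosis code in some position.
    """
    if cdi_is_primary:
        return "community-onset"                  # rule 1
    if primary_diagnosis in SYMPTOM_DIAGNOSES:
        return "community-onset"                  # rule 2
    if length_of_stay_days <= 3:
        return "community-onset"                  # rule 3
    return "hospital-onset"                       # secondary code, stay > 3 days

short_stay = classify_inpatient_onset(False, "pneumonia", 2)
long_stay = classify_inpatient_onset(False, "pneumonia", 9)
```

Here `short_stay` is classified as community-onset and `long_stay` as hospital-onset, since only the length of stay differs between the two calls.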

Analysis
The rate of CDI in a population group was defined as the number of CDI episodes divided by the person-years of observation (pyo, defined from 1/1/2009 up to the next CDI event, death, or 12/31/2009, whichever came first). For the SID the population of adults aged 18-64 and the elderly in the seven states was obtained from the 2010 census (www.census.gov). Person-years in the SID data were calculated taking into account death (using the midpoint of the death discharge quarter to define the date of death). SAS version 9.3 and SPSS 20.0 were used for data management and analysis.
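The rate calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, follow-up is counted inclusive of the censoring day, and a 365-day year is used for 2009.

```python
from datetime import date

STUDY_START = date(2009, 1, 1)
STUDY_END = date(2009, 12, 31)

def person_years(censor_date=None, start=STUDY_START, end=STUDY_END):
    """Person-years observed from `start` until the earliest of the censoring
    date (next CDI event or death) and the end of the study year."""
    stop = min(censor_date, end) if censor_date else end
    days = (stop - start).days + 1  # inclusive day count (assumption)
    return max(days, 0) / 365.0

def rate_per_100k(n_episodes, total_pyo):
    """CDI episodes per 100,000 person-years of observation."""
    return 100_000 * n_episodes / total_pyo

full_year = person_years()                    # followed through 12/31/2009
half_year = person_years(date(2009, 7, 1))    # censored on 7/1/2009
example_rate = rate_per_100k(677, 100_000)    # 677 episodes over 100,000 pyo
```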

Results
The demographic characteristics and number of hospitalizations and outpatient encounters in the different databases are shown in the Additional file 1: Appendix. In the three longitudinal datasets approximately 0.2 % of the initially identified hospitalizations coded for CDI were excluded because the patient was previously identified with CDI within the prior 12 weeks. The criteria used to identify CDI are shown in Table 1. In the Medicare data 23 % of inpatient CDI episodes were identified by the CDI diagnosis code together with an outpatient prescription for metronidazole or oral vancomycin within 14 days after hospital discharge; when restricted to patients with Part D coverage this corresponded to 40 % of inpatient CDI episodes (1303/3280). In the Medicare data approximately 53 % of unique CDI inpatient hospital episodes were identified by a secondary diagnosis code during the hospitalization, consistent with hospital-onset CDI, compared to 70 % in the SID, and 23 % in the LabRx data. In the encounter-level
Approximately 42 % of the CDI episodes in the Medicare data were first identified in the outpatient setting (Table 1). Of these outpatient CDI episodes, 78 % were identified by the CDI diagnosis code alone, and 21.8 % were identified by the diagnosis code plus outpatient CDI prescription (35 % for individuals with Part D coverage). Fifteen percent (610/4076) of the persons identified with CDI outside of the hospital were hospitalized within 14 days of CDI diagnosis; of these 60 % (364/610) were coded for CDI during the inpatient hospitalization. In the LabRx data of younger persons, 38.5 % of outpatient CDI episodes were identified by the CDI diagnosis code only with no supporting laboratory or prescription evidence for infection. A total of 28.7 % of the outpatient CDI episodes in the LabRx data were identified by a diagnosis code plus therapy, while 13.4 % of outpatient CDI episodes were identified by a positive C. difficile test plus therapy within 14 days.
The categorization of CDI episodes by database is shown in Table 2. Fifty-nine percent of CDI episodes (5648/9652) were categorized as healthcare facility onset (hospital or other facility) in the Medicare data, compared to 68 % in the SID (46,739/68,440). Community-onset healthcare facility-associated CDI made up 13 % of the CDI episodes in Medicare, compared to 11.8 % in the SID. Community-onset community-associated CDI episodes included 22.6 % of episodes in Medicare vs. 13.9 % in the SID and 35.2 % in the Premier data. Only 22.4 % (1102/4913) of the CDI episodes in the LabRx data were healthcare facility associated (excluding indeterminate association), while 68.4 % of episodes were categorized as community-onset community-associated.
The number of persons with one or more than one unique episode of CDI in the longitudinal datasets is shown in Table 3 and the cumulative incidence of CDI in Table 4. In 2009, 2.6 % of persons in the Medicare data and 5.0 % of persons in the LabRx data had > 1 unique episode of CDI spaced at least 12 weeks apart. The rate of CDI in the Medicare data was 677/100,000 pyo, while the rate was 43 % lower (383) in the SID. The rate of CDI in the younger adult population in the SID was ten-fold lower (37.5) than the rate in the elderly SID population, while the rate of CDI in the LabRx data including outpatient CDI was 1.8-fold higher than in the SID younger population. The rate of hospital-onset CDI per 10,000 patient days was higher in the SID for elderly persons (15.9) compared to the Medicare data (9.8), lower in the SID and Premier data for younger adults, and lowest (1.1) in the LabRx data.
To determine the impact of including outpatient medical claims and linkage within a person on CDI incidence, we compared the cumulative incidence, categorization of episodes, and attribution of CDI in the Medicare data when complete claims were used vs. only inpatient facility claims, with and without linkage within a person. When only the inpatient facility claims were used (analogous to the SID), the total number of CDI episodes was reduced to 6276 and the cumulative incidence of CDI decreased to 440/100,000 pyo. In addition, the number of hospital-onset cases and the rate of hospital-onset CDI increased while community-associated CDI decreased over two-fold (Table 5). When the person-level linkage in the inpatient Medicare data was removed (analogous to the
Episodes in the Premier and NIS encounter-level databases were classified as community-onset unknown association since it was not possible to determine prior health care exposures due to lack of a person-level identifier.

Discussion
We used five types of billing or claims data to define the burden of CDI in U.S. adults in 2009. To our knowledge this is the first study to compare the burden of CDI from a population perspective in different administrative   D) and LabRx data, inpatient treatment in Premier, and outpatient C. difficile test results in LabRx. Not surprisingly, we found a higher cumulative incidence of CDI in the databases that contained inpatient and outpatient data compared to only inpatient billing data, similar to what was reported recently using Kaiser Permanente data [19]. The number of CDI episodes per 100,000 elderly persons was almost 1.8-fold higher in the Medicare data compared to the inpatient-only longitudinal SID. However, when only inpatient data were used to identify CDI in the Medicare population and the analysis was conducted at the person level, the cumulative incidence of CDI was very close to that calculated in the SID (440 vs. 383/100,000 pyo). The 54 % increase in the cumulative incidence using complete (677) vs. inpatient-only Medicare data (440) emphasizes the importance of using complete data from inpatient and outpatient settings to calculate CDI incidence. In addition, when we treated the inpatient Medicare data as encounter-level (i.e., hospitalizations as unique encounters), the number of CDI events/100,000 hospitalizations was remarkably similar to the 544/100,000 hospitalizations in elderly persons in the 2009 NIS.
More CDI cases were identified as hospital-onset in the datasets with only inpatient facility data, resulting in a higher apparent hospital-onset CDI incidence. The rate of hospital-onset CDI increased in the Medicare data to 11.4 cases/10,000 patient days when analysis was restricted to only inpatient facility claims, and this rate increased further when the linkage within a patient was ignored (14.1 cases/10,000 patient days). This suggests that analysis of encounter-level data, such as the NIS, may result in over-estimation of CDI hospital rates by as much as 25 % due to continued coding in subsequent hospitalizations that are part of the same CDI episode, and that caution should be used when using these data for surveillance purposes.
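The relative differences quoted in the Discussion follow directly from the reported rates, as the short check below illustrates. The helper name `percent_increase` is introduced here for illustration only; the input values are the rates stated in the text.

```python
def percent_increase(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return 100 * (new - old) / old

# Cumulative incidence per 100,000 pyo: complete vs. inpatient-only Medicare claims.
complete_vs_inpatient = percent_increase(677, 440)

# Hospital-onset CDI per 10,000 patient days: unlinked vs. person-linked inpatient claims.
unlinked_vs_linked = percent_increase(14.1, 11.4)

print(f"{complete_vs_inpatient:.1f} %")
print(f"{unlinked_vs_linked:.1f} %")
```

The first value rounds to the 54 % increase cited for complete versus inpatient-only data, and the second is just under the "as much as 25 %" over-estimation cited for encounter-level analysis.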
In analysis of complete Medicare claims, 33 % of the CDI events were categorized as hospital-onset, whereas in the analyses using only inpatient Medicare facility data, approximately 60 % of the CDI events were categorized as hospital-onset, suggesting that hospital-onset cases will be over-estimated by almost two-fold when only inpatient claims or billing data are used. These results are consistent with previous reports of the overattribution of hospital-onset CDI [20,21] and the overestimation of CDI cases identified by the ICD-9-CM diagnosis code compared to positive C. difficile toxin assay from facility billing data [20,[22][23][24][25]. More hospital-onset cases were identified in the HCUP and Premier data, likely due to misclassification of CDI with onset in the community.
The addition of laboratory results in the LabRx data suggests that 20 % of CDI episodes may be missed when analyzing data without C. difficile test results. Identification of fecal transplant in administrative data via CPT-4 and HCPCS codes (available beginning 2013) may aid in the identification of CDI in future, particularly in combination with a positive C. difficile test result. Interestingly, approximately three-quarters of the outpatient CDI episodes in the LabRx data were not supported by a positive test result, with 38 % identified on the basis of a CDI diagnosis alone. The percentage of outpatient CDI diagnoses without confirmation by a positive C. difficile laboratory test was very similar in our previous study using Veterans Administration data, in which only 32 % of the total outpatient CDI cases had a C. difficile test result [26]. Further studies to validate the use of the ICD-9-CM diagnosis code for CDI in the outpatient setting in the absence of positive C. difficile test results are warranted to determine the accuracy of coding outside of the hospital.
In the Medicare data 23 % of CDI episodes first diagnosed during an inpatient hospital stay were linked to a filled outpatient prescription consistent with CDI treatment within 2 weeks after hospital discharge. Since 47.5 % of the Medicare patients had Part D coverage, this would suggest that almost half of elderly persons diagnosed with CDI during an inpatient hospitalization continue CDI treatment after leaving the hospital. In the LabRx data almost two-thirds of episodes identified during a hospitalization were linked to outpatient CDI treatment. We identified temporally related treatment of outpatient CDI in at most one-half of persons with prescription drug coverage in the Medicare (46 %) and LabRx (49 %) data. In contrast, in the Premier data containing inpatient medications, 73 % of inpatient CDI episodes had evidence of treatment during the hospitalization. Despite lack of documentation of treatment for many CDI cases, particularly in the Medicare data, the overall incidence of CDI in the elderly of 677/100,000 pyo is remarkably similar to the incidence of 628/100,000 elderly persons reported by the Centers for Disease Control and Prevention's Emerging Infections Program (EIP) for 2011 [2].
Lessa reported that 53 % of CDI events (159,700 community-associated + 81,300 community-onset, health care facility-associated) in persons of all ages were community-onset using the EIP data [2], similar to the 41.4 % community-onset CDI episodes we identified in the Medicare data. In the recent publication using Kaiser Permanente data, 76 % of the CDI events were community-onset, with a total of 40 % characterized as community-onset community-associated [19]. This is lower than our finding that almost 90 % of the CDI events in the LabRx data from younger persons had onset in the community, with 68 % characterized as community-onset community-associated CDI. The varying proportions of CDI with onset outside of the hospital in the Medicare and LabRx data compared to the EIP data may be related to differences in age of the populations. In the EIP data, 44 % of persons with CDI were < 65 years of age, and the proportion of CDI that is community-associated is higher in younger populations [27]. Consistent with our current study, 52 % of laboratory-identified CDI during inpatient hospitalizations in the EIP were present at admission to the hospital [28].

Conclusions
The similarities between our findings concerning the incidence and site of onset of CDI from several different administrative databases with recent EIP results validate use of these administrative databases to identify populations at risk for CDI. We determined how results may be skewed when important information is missing, such as outpatient data and encrypted identifiers, and the advantages of using complete claims data allowing for substantiation of the CDI diagnosis using laboratory claims for CDI testing, pharmacy claims for CDI treatment, and other diagnoses consistent with CDI (e.g., diarrhea). Although there are limitations to use of administrative data, these databases offer the opportunity to analyze CDI from a population perspective, including data from many different hospitals and from other healthcare facilities. These databases can provide more complete information on the epidemiology of CDI and enrich our understanding of the impact of CDI on young and older persons in the U.S. In addition, the methods we developed to extract comparable information can be used to determine the incidence of other infectious diseases (e.g., MRSA, septicemia) and adverse events (e.g., deep venous thrombosis) in varying populations using a combination of different administrative databases.

Ethics approval and consent to participate
The Washington University Human Research Protection Office and Geisel School of Medicine at Dartmouth Committee for the Protection of Human Subjects gave approval to conduct this research with a waiver of informed consent.

Consent for publication
Not Applicable.

Availability of data and materials
Five different data sources were used for this study. None of the data sources can be shared by the authors, per data use agreements with the individual organizations. All of the data sources are available for purchase, as described below.
Medicare claims (Chronic Condition Warehouse 5 % random sample), obtained from the Centers for Medicare and Medicaid Services and the Research Data Assistance Center (www.resdac.org).
OptumInsight™

Competing interests
Dr. Olsen reports personal fees from Sanofi Pasteur, Merck, and Pfizer, and grants from Cubist Pharmaceuticals and Pfizer outside the submitted work; Dr. Young-Xu reports personal fees from Sanofi-Pasteur outside the submitted work; Mr. Stwalley reports other from Abbott Laboratories and Bristol-Myers Squibb outside the submitted work; Dr. Kelly reports personal fees from Astellas, Cubist, Optimer, Novartis, MedImmune, Merck, grants and personal fees from Sanofi-Pasteur, grants and personal fees from Optimer, grants from CSL-Behring, grants from Merck, and personal fees from QuantiaMed outside the submitted work; Dr. Kelly has a patent for Passive immunotherapy for CDI using IgA pending; Dr. Gerding holds patents for the treatment and prevention of CDI licensed to ViroPharma/Shire, is a consultant for Merck, Shire, Cubist, Rebiotix, Sanofi Pasteur and Actelion and holds research grants from CDC and US Dept of Veterans Affairs Research Service; Dr. Saeed has no disclosures; Dr. Mahé reports other from Sanofi Pasteur, outside the submitted work; Dr. Dubberke reports personal fees from Sanofi-Pasteur during the conduct of the study; grants from Microdermis, personal fees and other from Cubist, Merck, and Rebiotix outside the submitted work.