A population-based analysis of invasive fungal disease in haematology-oncology patients using data linkage of state-wide registries and administrative databases: 2005 - 2016

Background: Little is known about the morbidity and mortality of invasive fungal disease (IFD) at a population level. The aim of this study was to determine the incidence, trends and outcomes of IFD in all haematology-oncology patients by linking Victorian hospital data to state-based registries.
Methods: Episodes of IFD complicating adult haematological malignancy (HM) and haematopoietic stem cell transplantation (HSCT) patients admitted to Victorian hospitals from 1st July 2005 to 30th June 2016 were extracted from the Victorian Admitted Episodes Dataset and linked to the date of HM diagnosis from the Victorian Cancer Registry and mortality from the Victorian Death Index. Descriptive analyses and regression modelling were used.
Results: There were 619,702 inpatient-episodes among 32,815 HM and 1,765 HSCT-patients. IFD occurring twelve-months from HM-diagnosis was detected in 669 (2.04%) HM-patients and 111 (6.29%) HSCT-recipients, respectively. Median time to IFD-diagnosis was 3, 5, 15 and 22 months in acute myeloid leukaemia, acute lymphoblastic leukaemia, Hodgkin lymphoma and multiple myeloma, respectively. Median survival from IFD-diagnosis was 7, 7 and 3 months for invasive aspergillosis, invasive candidiasis and mucormycosis, respectively. From 2005-2016, IFD incidence decreased 0.28% per 1,000 bed-days. Fungal incidence coincided with spring peaks on time-series analysis.
Conclusions: Data linkage is an efficient means of evaluating the epidemiology of a rare disease, however the burden of IFD is likely underestimated, arguing for better quality hospital level surveillance data to improve management strategies.
Electronic supplementary material: The online version of this article (10.1186/s12879-019-3901-y) contains supplementary material, which is available to authorized users.


Background
Invasive fungal disease (IFD) represents a significant challenge in the management of patients with haematological malignancies (HM) undergoing cytotoxic chemotherapy and/or haematopoietic stem cell transplantation (HSCT) [1]. IFD is associated with a high mortality ranging from 29-90% [1,2] and may affect long-term leukaemia outcomes by delaying or modifying curative chemotherapy or HSCT [3]. Few studies have evaluated the value of data linkage for IFD surveillance [4] and none have focused on the disease burden of these infections at a population-level in Victoria, Australia.
Administrative datasets are an efficient source of epidemiological data [5], yet their utility for IFD surveillance in Australia has not been well studied. The only population-based analysis of IFD in Australia used hospital discharge-coded data from 1995 to 1999 and showed that invasive candidiasis (IC) was more common than invasive aspergillosis (IA), representing 0.36% and 0.03% of all acute hospital discharges, respectively, with mortality rates between 8-26% for both IC and IA [4]. Importantly, these data predated the introduction of broad-spectrum triazole antifungal drugs that have resulted in a shift in fungal epidemiology to filamentous moulds [6], and the analysis excluded Victoria, the second most populous state in Australia, with a population of 6.39 million residents [7]. The availability of state-based datasets has afforded an opportunity to revisit IFD disease burden and trends among haematology patients, capturing the era of potent mould-active antifungal therapies and improvements in supportive care in cancer [8].
In this study, we linked existing population-based datasets and state registry data to characterise the epidemiology of IFD among the HM and HSCT populations across Victoria. The Victorian Admitted Episodes Dataset (VAED) is Australia's largest hospital morbidity database and comprises demographic, administrative and clinical information coded according to the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification (ICD-10-AM) associated with every hospitalisation in Victorian public and private hospitals [9]. The Victorian Cancer Registry (VCR) has recorded all cancer diagnoses from 1982 with the exception of basal and squamous cell carcinomas of the skin in Victorian residents [10], but is only available for haematological malignancies from 1st January 2008 to 31st December 2014. Overall, in-hospital and out-of-hospital mortality was evaluated with linkage to the Victorian Death Index (VDI), thus allowing comparisons of survival in patients with and without IFD. We performed data linkage between the VAED, VCR and VDI to characterise the epidemiology of IFD at a population-level over a decade in order to evaluate trends, risk-factors and to identify patient groups at high-risk for IFD.

Study design and setting
This was an observational, retrospective, longitudinal study of adult patients (≥16-years) diagnosed with a HM and/or who had undergone allogeneic (allo) or autologous (auto) HSCT across Victorian public and private hospitals. All reporting parameters are consistent with the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) Statement [11] (Additional file 1).

Data sources, linkage and clinical definitions
VAED data were linked to the dates of death from the VDI between 1st July 2005 and 30th June 2016. The datasets were linked by the Victorian Data Linkages Unit (VDLU) using probabilistic and stepwise deterministic linkage. A linkage map based on encrypted statistical linkage keys for every record across each dataset was assigned to differentiate individual patients as well as reports of multiple episodes-of-care for any given patient (Fig. 1).
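The deterministic part of such a linkage can be sketched as follows. This is only an illustration under stated assumptions: the field names, the use of a SHA-256 hash as the "encrypted statistical linkage key" and the exact-match rule are all hypothetical, and the sketch does not reproduce the VDLU's actual procedure or its probabilistic step.

```python
import hashlib
from collections import defaultdict


def linkage_key(surname: str, given_name: str, dob: str, sex: str) -> str:
    """Hypothetical linkage key: a one-way hash of normalised identifiers."""
    raw = "|".join([surname.strip().upper(), given_name.strip().upper(), dob, sex.strip().upper()])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def _key_for(record: dict) -> str:
    # Assumed field names; a real extract would define its own schema.
    return linkage_key(record["surname"], record["given_name"], record["dob"], record["sex"])


def link_episodes_to_deaths(episodes: list, deaths: list) -> dict:
    """Group admission episodes per patient and attach a date of death on an exact key match."""
    death_by_key = {_key_for(d): d["date_of_death"] for d in deaths}
    patients = defaultdict(lambda: {"episodes": [], "date_of_death": None})
    for episode in episodes:
        key = _key_for(episode)
        # Multiple episodes-of-care for one patient collapse onto the same key.
        patients[key]["episodes"].append(episode["admission_date"])
        patients[key]["date_of_death"] = death_by_key.get(key)
    return dict(patients)
```

Exact matching on a hashed key is fast but brittle to spelling variation, which is one reason linkage units typically combine it with probabilistic matching.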
All patients with a HM diagnostic code or HSCT procedural code were included. An episode-of-care was defined as a hospitalisation. After linkage of the VAED with the VCR, index hospitalisation for a HM was defined as the first appearance in an episode of an ICD-10-AM code denoting a HM (Additional file 2) appearing soon after the date of HM diagnosis as recorded in the VCR (e.g. C92, acute myeloid leukaemia (AML) appearing after the date of diagnosis from the VCR). For hospitalisations outside the dates of the VCR, i.e. before 1st January 2008 or after 31st December 2014, the first episode-of-care recorded in the VAED was assumed to be the index hospitalisation. A HSCT-recipient was defined by the first procedural ICD-10-AM code denoting a HSCT with the corresponding date in month and year format in the VAED.
We assessed IFD incidence as the first appearance of an ICD-10-AM code for IFD in the twelve-months from index hospitalisation for a HM-patient and twelve-months post-transplantation among the HSCT-cohort. Exclusion criteria included paediatric patients (<16-years) and HM- or HSCT-patients diagnosed with endemic or superficial fungal infections (refer to Additional file 3 for excluded codes). Cases were defined as HM or HSCT patients who had an IFD-code assigned in the first twelve-months post HM-diagnosis or HSCT and controls were defined as patients with no IFD diagnostic code. Duplicate IFD codes occurring in the same twelve-month time period were treated as the same IFD. No censoring was used when investigating the median time to IFD-onset among the HM- and HSCT-cohorts. Hospitalisation for induction chemotherapy was defined as an episode-of-care where an ICD-10-AM procedural code for chemotherapy first appeared either during or after the index hospitalisation.
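The case definition above can be sketched in code as follows. This is a minimal illustration, not the study's extraction logic: the function names are hypothetical, the set of IFD code prefixes is passed in by the caller because the study's actual code lists sit in the additional files and are not reproduced here, and matching on the three-character ICD category is an assumption.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping to the last valid day of the month."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def first_ifd_within_window(index_date, episodes, ifd_codes, window_months=12):
    """Return the date of the first IFD-coded episode within the window, or None for a control.

    episodes: iterable of (episode_date, [diagnosis codes]) pairs for one patient.
    ifd_codes: set of three-character code prefixes treated as IFD (caller-supplied assumption).
    Repeat IFD codes inside the window are ignored, so they count as the same IFD.
    """
    window_end = add_months(index_date, window_months)
    hits = sorted(
        episode_date
        for episode_date, codes in episodes
        if index_date <= episode_date <= window_end
        and any(code[:3] in ifd_codes for code in codes)
    )
    return hits[0] if hits else None
```

For example, with an index hospitalisation on 1 March 2010, an IFD-coded episode in May 2010 makes the patient a case, whereas a first IFD-coded episode in June 2011 falls outside the window and returns None.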

Statistical analyses
Chi-squared (χ²) and Wilcoxon rank-sum tests were used to compare categorical and non-parametric continuous covariates, respectively. A multivariable logistic regression was used to identify risk-factors for IFD following induction chemotherapy. Variables with a p<0.20 on univariate and p<0.05 on multivariate analysis were included in a manual stepwise backward elimination process. To quantify the percent risk of IFD twelve-months from induction chemotherapy, a sigmoid function that uses marginal standardisation and prediction at the means described elsewhere was applied [12,13]. A receiver operating characteristic curve and its C-statistic (Additional file 4) examined the model's discrimination and a Hosmer-Lemeshow χ² test assessed the model's calibration.
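To make the two ways of turning a fitted logistic model into a percent risk concrete, the sketch below contrasts prediction at the means with marginal standardisation. It is an illustration only: the coefficients, covariate names and data are hypothetical and are not the model reported in this study, and the functions are not a reproduction of the cited methods [12,13].

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_risk(intercept: float, coefs: dict, row: dict) -> float:
    """Predicted probability for one patient from hypothetical log-odds coefficients."""
    return sigmoid(intercept + sum(beta * row[name] for name, beta in coefs.items()))


def prediction_at_means(intercept, coefs, rows, exposure, level):
    """Risk for an 'average' patient: covariates fixed at their sample means, exposure set to level."""
    means = {name: sum(r[name] for r in rows) / len(rows) for name in coefs}
    means[exposure] = level
    return predicted_risk(intercept, coefs, means)


def marginal_standardisation(intercept, coefs, rows, exposure, level):
    """Average of individual predicted risks with the exposure set to level for everyone."""
    risks = [predicted_risk(intercept, coefs, {**r, exposure: level}) for r in rows]
    return sum(risks) / len(risks)
```

Because the sigmoid is non-linear, the two summaries generally differ: prediction at the means describes a possibly non-existent average patient, whereas marginal standardisation averages over the covariate distribution actually observed.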
Kaplan-Meier analysis and log-rank test were used to compare survival (in months) among HM-patients with and without IFD from the date of HM- and IFD-diagnosis, respectively. A time-series analysis of IFD incidence risk-adjusted for bed-day occupancy was developed. A two-sided p-value <0.05 was considered statistically significant. All statistical analyses were undertaken using Stata/SE v14.2 software (StataCorp® LLC, College Station, Texas, U.S.A.).
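For readers unfamiliar with the Kaplan-Meier estimator used for the survival comparisons, the product-limit calculation and the resulting median survival can be sketched as below. The analyses in this study were run in Stata; this standalone Python version, with hypothetical function names and toy data, is only meant to show the arithmetic.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: returns [(time, survival)] at each distinct event time.

    times: follow-up in months; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is at or below 50%, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

With follow-up times of 2, 3, 3, 5 and 8 months, where one of the patients at 3 months and the patient at 8 months are censored, the estimated survival steps down at 2, 3 and 5 months and the median survival is 5 months.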

Ethics
Written ethics approval was granted by the Alfred Health Human Research Ethics Committee (project number: 93/17).

Patient characteristics
A total of 32,815 HM-patients were identified from 619,702 hospitalisations recorded in the VAED from 1st July 2005 to 30th June 2016. IFD occurred in 669 (2.04%) patients within twelve-months following HM-diagnosis. Among 1,765 allo- or auto-HSCT-recipients, 111 (6.29%) were diagnosed with an IFD twelve-months post-transplantation (Table 1).

Survival analysis and risk of mortality
The hazard ratio for IFD-onset post-HM-diagnosis was 1.24 (95% CI: 0.88-1.93; p=0.329), denoting that the instantaneous risk of mortality was 1.24-times greater when a HM-patient developed an IFD within twelve-months from index hospitalisation, although this difference was not statistically significant. After stratifying by IFD, the shortest median survival time was for mucormycosis-infected patients (median survival time: 3-months from IFD diagnosis), followed by IA and IC (7-months each) (Fig. 4).

Time-series analysis of the risk-adjusted incidence of invasive fungal disease
Over the 11-year study period, the incidence of IFD per 1,000 bed-days decreased by 0.28% (Fig. 5). A seasonal trend in IFD incidence was evident with peaks coinciding with the onset of spring (September to November in Australia).
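The bed-day adjustment underlying this time series amounts to dividing IFD cases by occupied bed-days in each period and scaling to 1,000. The sketch below shows that rate together with a simple least-squares trend; the function names and any input figures are hypothetical, and the straight-line trend is a stand-in rather than the time-series model fitted in this study.

```python
def incidence_per_1000_bed_days(cases: int, bed_days: float) -> float:
    """IFD cases per 1,000 occupied bed-days for one period."""
    if bed_days <= 0:
        raise ValueError("bed_days must be positive")
    return 1000.0 * cases / bed_days


def linear_trend(xs, ys) -> float:
    """Ordinary least-squares slope of ys on xs (change in rate per unit of time)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

Computing the rate per month rather than per year would keep the seasonal peaks visible, at the cost of noisier estimates for a rare outcome.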

Risk factors for invasive fungal disease
A multivariate analysis assessing risk factors for IFD identified 10 significant covariates, including neutropenia, acute renal failure, ICU admission, residence in rural Victoria, haemodialysis, viral infection, Clostridium difficile infection, haematological malignancy, increasing age in years and admission to a metropolitan hospital (Table 3).

Discussion
This is the first comprehensive study of IFD incidence and survival in Victoria among haematology patients over a period of 11 years and highlights the possibilities of data linkage, but also the shortcomings of administrative data for surveillance of a rare disease. The most striking finding from this study is the low overall incidence of IFD among HM-patients (2.04%) and HSCT-recipients (6.29%). It is likely that IFD is under-reported at a hospital level in coding data [14,15] and this translates into the data generated by the VAED. Despite this shortcoming, we were able to identify periods of high-risk for a range of HMs, seasonal trends in IFD and an overall decrease in IFD incidence over the 11 years. In addition, access to a high number of clinical covariates allowed for exploration of risk-factors for IFD through multivariate regression analysis that may assist in tailoring preventative therapies like antifungal prophylaxis according to individual risk. Description of the epidemiological trends in IFD incidence and mortality in the HM population has historically been limited to institution-specific reports and multicentre studies focusing predominantly on IA, IC and mucormycosis [1,4,16]. By contrast, through data linkage of hospital administrative data (VAED) with state-based registries (VCR and VDI), we described epidemiological trends among all HM-patients. Mould diseases predominated in keeping with global trends [6], accounting for 61% of IFD compared to 39% due to invasive candidiasis. Among mould diseases, IA was the predominant disease (91%), followed by mucormycosis (8.76%); a finding concordant with recent studies [17,18]. Mucormycosis most commonly affected allo-HSCT-recipients (1.19%), followed by ALL (0.75%) and AML (0.45%) patients and was associated with the shortest median survival time of 3-months compared to 7-months each for IA and IC.
The emergence of mucormycosis as the predominant non-Aspergillus mould is consistent with the largest multicentre surveillance study of IFD epidemiology in HSCT recipients [6] and is likely due to several factors including longer survival post-HSCT [6,19].
(Figure legend fragment, median months to IFD-onset with bracketed ranges as extracted: … [4-45]; CML, 12 [2-25]; HL, 15 [1-29]; MM, 22 [7-36]; NHL, 6 [2-18]; allogeneic-HSCT, 4 [2-18]; autologous-HSCT, 7 [5-10])
A higher IFD incidence (11%) in ALL compared to AML (9.42%) is intriguing but confirms that the ALL cohort is an emerging subgroup at high-risk of IFD with a variable fungal incidence ranging from 6.5-12% [20,21]. Prophylaxis with azole antifungals is contra-indicated due to the drug-drug interactions with vinca alkaloids used in ALL treatment regimens [22], and the lack of an approved standard of care from clinical trial data [21] means that clinical variation in prophylactic strategies for ALL patients is likely [23]. Patients with CLL have the third-highest IFD incidence (1.33%) and are increasingly recognised as being at high-risk of IFD due to a shift from chemo-immunotherapies to agents targeting specified B-lymphocyte pathways [24]. Indeed, IFD incidence in non-Hodgkin lymphoma (NHL) (1.26%) was the fourth highest of all HM (Table 2), which may reflect the effects of multi-agent chemotherapy in combination with immunotherapy used to treat NHL [25].
Attempts at clinical risk-stratification for IFD have been crude and restricted to broadly identifying low-, intermediate- and high-risk groups [20], in large part because large datasets for a rare disease like IFD do not exist [26]. We confirmed risk-factors that are associated with IFD including viral infections [27], admission to a rural hospital that may reflect rural place of residence [20] and Clostridium difficile infection (p<0.05), which has not been previously described as a risk-factor for IFD but is prevalent in immunocompromised populations [28]. In addition, access to a high number of clinical covariates allowed exploration of a predictive tool to quantify IFD-risk at the patient-level informed by a range of risk-factors elucidated on multivariate regression analysis. We identified periods of high-risk for IFD from the time of HM diagnosis with the shortest median time seen in AML-patients (3-months) and the longest in patients with MM (22-months). The latter finding reflects the cumulative immunosuppression associated with successive lines of therapy, including immunomodulatory chemotherapies and prolonged corticosteroid exposure that is characteristic of myeloma treatment [29]. Consistent with intervals of high-risk described in Hammond et al. [30], risk-periods of IFD for other HMs were defined including ALL (5-months) and NHL (6-months) [20]. The shorter median time to IFD-onset after transplantation among GVHD-positive (1-month) compared to GVHD-negative HSCT-recipients (6-months) reflects the increased immunosuppression associated with GVHD and its treatment [31] (Additional file 5). During the study period, there was an overall decreasing trend in IFD incidence in Victoria. The 0.28% decline in IFD incidence from 2005-2016 is contrary to the overall 3.5% increase observed in an earlier retrospective study from 1995-1999 [4].
This progressive decrease in IFD incidence is likely multifactorial and related to improved supportive care encompassing broad-spectrum antifungal prophylactic regimens for some subgroups of patients (e.g. AML, HSCT-recipients with GVHD), coupled with improved diagnostic investigations [32], clinical guidelines for IFD [33], better management of GVHD [34], cytomegalovirus prevention [27] and the introduction of high-efficiency particulate air filtration systems into some transplantation wards [35]. While an intensive diagnostic approach incorporating non-culture-based tests increases diagnostic yield [36] and corresponding fungal incidence, the availability of these tests is limited, with only 35% of centres in a national Australian survey providing on-site Aspergillus galactomannan (GM) or polymerase chain reaction (PCR) diagnostic tests. Therefore, it seems likely that the decline in IFD incidence we observed may be explained by the uptake of mould-active prophylaxis targeting high-risk groups, as seen in a major Victorian transplant centre, which reported a reduction in IFD incidence in patients with AML from 25% with fluconazole use to 3% with posaconazole use over a 12-year period [37]. Indeed, this practice is widespread, with a nationwide survey reporting that posaconazole prophylaxis was used in 90% of AML patients undergoing chemotherapy and 68% of allogeneic-HSCT recipients, with lower rates among ALL patients of 53% [38], highlighting the lack of a standardised approach in this patient group. Consistent with the 5.7% increase in IA incidence during the warmer months as reported by Panackal et al. [39], the peaks in IFD incidence at the onset of spring indicate seasonality not previously described in the southern hemisphere (Fig. 5). This knowledge could ensure that preventative strategies, coupled with enhanced surveillance, also take seasonality into consideration.
Linkage of administrative and clinical datasets could potentially improve knowledge discovery for a rare disease such as IFD, but is contingent on the completeness of hospital-level data collection. Cancer surveillance systems that leverage data linkage between the VCR and clinical registries are considered a technological solution to more accurately determine the epidemiology of rare leukaemia in Victoria [40]. Limited international [14] and Australian data [15] suggest that IFD is under-reported in hospital administrative systems. This is in large part because fungal surveillance is difficult, requiring multidisciplinary input followed by adjudication of cases according to complex definitions [41]. Chang et al. described the poor sensitivity of coding data of 32% for proven/probable IA in HSCT-recipients and its poor positive predictive value of 15% [14]. However, the quality of coding practice is dependent on the quality of medical record documentation, particularly discharge summaries, and this has been shown to be suboptimal for IFD even when fungaemia was present [15]. Institutional underreporting has implications for hospital reimbursement but also diminishes the utility of large datasets for rare disease surveillance. Furthermore, the fact that no HSCT-recipient with an IFD was admitted to the ICU in our study is implausible considering that patients with mucormycosis frequently have multiple surgeries and require ICU support (Table 1) [6]. The introduction of sensitive machine learning-based data analytics [42] could enable real-time surveillance of IFD and improve the quality of fungal reporting at the hospital level where most of these infections are managed.
There are several limitations to this study. The quality of coding data for IFD is the foremost consideration as previously discussed. Linkage of the VAED with the VCR was only available between the 1 st January 2008 and the 31 st December 2014. Thus, we relied on the VAED to identify index hospitalisations for the other years without verification against confirmed HM-diagnosis from the VCR. Secondly, as a retrospective study, our analysis is subject to misclassification or miscoding of IFD [14]. Finally, the risk-factors we identified from multivariate analysis require validation against a separate dataset, but large datasets for IFD are currently unavailable due to the lack of comprehensive surveillance systems.

Conclusions
The true burden of IFD among haematology-oncology patients is difficult to accurately determine from hospital-based data. We hypothesise that the true incidence is likely to be higher but without implementation of surveillance systems, it will remain underestimated. The migration of hospital systems both within Victoria, and globally, to the electronic medical record provides an opportunity to improve IFD surveillance through innovative data mining techniques [42].