A surveillance method to identify patients with sepsis from electronic health records in Hong Kong: a single centre retrospective study

Background Currently there are only two population studies on sepsis incidence in Asia. The burden of sepsis in Hong Kong is unknown. We developed a sepsis surveillance method to estimate sepsis incidence from a population electronic health record (EHR) in Hong Kong using objective clinical data. The study objective was to assess our method’s performance in identifying sepsis using a retrospective cohort. We compared its accuracy to administrative sepsis surveillance methods such as Angus’ and Martin’s methods. Method In this single centre retrospective study we applied our sepsis surveillance method on adult patients admitted to a tertiary hospital in Hong Kong. Two clinicians independently reviewed the clinical notes to determine which patients had sepsis. Performance was assessed by sensitivity, specificity, positive predictive value, negative predictive value and area under the curve (AUC) of Angus’, Martin’s and our surveillance methods using clinical review as “gold standard.” Results Between January 1 and February 28, 2018, our sepsis surveillance method identified 1352 adult patients hospitalised with suspected infection. We found that 38.9% (95%CI 36.3–41.5) of these patients had sepsis. Using a 490 patient validation cohort, two clinicians had good agreement with weighted kappa of 0.75 (95% CI 0.69–0.81) before coming to consensus on diagnosis of uncomplicated infection or sepsis for all patients. Our method had sensitivity 0.93 (95%CI 0.89–0.96), specificity 0.86 (95%CI 0.82–0.90) and an AUC 0.90 (95%CI 0.87–0.92) when validated against clinician review. In contrast, Angus’ and Martin’s methods had AUCs 0.56 (95%CI 0.53–0.58) and 0.56 (95%CI 0.52–0.59), respectively. Conclusions A sepsis surveillance method based on objective data from a population EHR in Hong Kong was more accurate than administrative methods. It may be used to estimate sepsis population incidence and outcomes in Hong Kong. Trial registration This study was retrospectively registered at clinicaltrials.gov on October 3, 2019 (NCT04114214).


Background
Global burden of severe sepsis is estimated to affect 48.9 million patients and accounts for 11 million deaths each year [1]. These latest estimates are significantly higher than previously reported figure of 19.4 million patients with sepsis globally each year [2]. This is because previous estimates were exclusively based on data from 26 high income western countries alone. The only exception was a single study from Taiwan, which provided an estimate based on a 1% sample of the population cohort [3]. Nevertheless, the updated estimate of sepsis incidence in 2020 is likely still lower than the real world incidence since it is computed from death registry data [1].
Sepsis incidence is often estimated from administrative sepsis surveillance methods such as Angus' and Martin's definitions [4][5][6][7]. These methods use readily available discharge coding and administrative data to estimate burden of sepsis. However variation in case definition of sepsis by diagnostic coding lead to very different incidence estimates of sepsis [8,9]. Moreover, change in discharge coding habits result in an apparent but misleading change in incidence of sepsis over time [10]. Epidemiological research based on claims data alone is prone to bias as hospitals are reimbursed and paid accordingly. In addition, coding practice for sepsis has changed even in countries where coding is not used for insurance claims [11]. Increased coding for sepsis is in part due to better sepsis awareness rather than increasing true sepsis incidence alone [12,13]. More recently, clinical data from electronic health records (EHR) have been used to estimate sepsis incidence [14,15]. Compared to claims data, clinical data provides objective estimate that has more stable sepsis case definition and incidence over time [16].
There is currently limited epidemiological data on sepsis in Asia. Incidence of sepsis in Asia has mostly been exclusively studied in the Intensive Care Unit (ICU) [17][18][19]. Apart from a Taiwan population study, the only other population-based study from Asia is from a subdistrict of Beijing [3,20]. Although Hong Kong is one of the most densely populated regions in Asia, the local estimate of sepsis incidence is unknown.
The purpose of this feasibility study is to develop an objective sepsis surveillance method based on a population EHR in Hong Kong to study incidence and outcomes of sepsis in this region. We evaluated its performance in identifying sepsis and compare its accuracy to Angus' and Martin's sepsis surveillance methods using a retrospective cohort of patients in Hong Kong.

Study design
This was a single centre, retrospective study on hospitalised adult patients with infection or sepsis between January 1, 2018 and February 28, 2018. Patients were identified using the Clinical Data Analysis and Reporting System (CDARS), a population EHR database managed by the Hospital Authority, which provides more than 90% of acute care in Hong Kong [21,22]. CDARS contains diagnoses, procedure codes, admission and discharge data, medication records, operation records, laboratory and microbiology results of all outpatient and inpatients treated in Hong Kong public hospitals since 1995. We applied our database sepsis surveillance method (explained below) to all patients admitted to a tertiary teaching hospital in Hong Kong during the study period. We validated our method against clinician's manual review of clinical records and compared it to Angus' and Martin's definitions using the same cohort [4,5]. This study was approved by The Joint Chinese University of Hong Kong -New Territories East Cluster Clinical Research Ethics Committee with waiver of informed consent (2019.214) and is registered on clinicaltrials.gov (NCT04114214). This manuscript was prepared using the STROBE guidelines for reporting observational studies.

Identification of patients with suspected infection
To identify patients with suspected infection within CDAR S, we first identified all inpatients at who had at least one bacterial sensitivity or culture test during the study period ( Fig. 1). We excluded patients who only had screening microbiological tests such as methicillin-resistant Staphylococcus aureus nasal swab or vancomycin-resistant Enterococci rectal swab as these were performed for cohort screening rather than on suspicion of infection. First microbiological sampling date was set as reference day for each patient. Criteria on use of antibiotics to identify patients with infection are based on the United States Centre for Disease Control Prevention Hospital Toolkit for Adult Sepsis Surveillance (ASE) [23]. Patients were excluded if antibiotics were not started 2 days before or after first microbiological sampling. They were also excluded if duration of antibiotics was less than 4 days unless death occurred before the fourth day. Lastly, patients discharged without antibiotics before the fourth day were also excluded.

Identification of patients with sepsis
To identify patients with sepsis we applied the Sepsis 3 definition to patients identified with suspected infection from CDARS [24,25]. Patients with suspected infection with an estimated increase in sequential organ failure assessment (SOFA) score of ≥2 were classified as patients with sepsis. Patients with suspected infection but estimated ΔSOFA < 2 were classified as uncomplicated infection. The estimated ΔSOFA score is calculated by the difference between hospital SOFA and pre-hospital (baseline) SOFA. Similar to previous studies, hospital discharge diagnostic/procedural coding and clinical data were used to estimate each component of SOFA (supplementary Table 1) [14,15]. Hospital SOFA is the estimated SOFA within 2 days of first microbiological sampling. Baseline SOFA is defined as the estimated SOFA score before hospitalisation. For hospital SOFA estimation, only respiratory and neurological SOFA component included discharge diagnosis coding. All other organ scores were based on objective clinical data from laboratory results or procedure codes. Patients who did not have at least one of the three lab results (bilirubin, platelet or creatinine) measured within 2 days of first microbiological sampling were excluded. Consistent with previous methods, all missing values for components of SOFA were assumed to be normal [14,15]. Lastly we calculated the proportion of patients with missing laboratory data to assess for bias.

Validation
Similar to previous studies, we used a clinician reviewed validation cohort to assess the performance of our sepsis surveillance method [14,15]. Based on data from the United States, we assumed a sepsis prevalence of 50% amongst hospitalised patients with infection, expecting a sensitivity or specificity of 0.99, lower confidence limit of 0.04, a significance level of 0.05 and power of 0.95, the minimum sample size for a validation cohort was 208 in each group using "MKmisc" [14,26]. Anticipating missing data and diagnoses, we used a computer to randomly select 245 patients with sepsis and 245 patients with uncomplicated infection as classified by our surveillance method for clinician review. Two clinicians, blinded to the surveillance method results, each reviewed the clinical notes and laboratory results to determine whether patients had infection or sepsis according to Sepsis-3 definition [25]. Clinician disagreements were resolved by discussion. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), negative likelihood ratio (NLR) and area under the curve (AUC) of the receiver operating characteristic curve were used to assess the performance of our sepsis surveillance method, Angus' and Martin's definitions compared to classification by clinician review [4,5].

Statistical analyses
Normality of continuous data was assessed by visual inspection and Shapiro-Wilk's test. The chi-square test or Fisher exact test were used to compare differences in proportion. For non-parametric comparisons, Wilcoxon rank sum test was used and median with interquartile range (IQR) were reported. Cohen's kappa coefficient test was used to measure the inter-observer agreement between clinicians. Statistical analysis were performed in SPSS (Version 22.0) and RStudio (Version 3.6). Level of significance was set at p < 0.05.

Cohort identification and characteristics
From January 1, 2018 to February 28, 2018, there were 20,475 inpatient microbiological culture tests performed at Prince of Wales Hospital in Hong Kong (Fig. 2). After exclusions, there were 1385 unique adult patients who had antibiotics initiated within 2 days of first microbiological sampling and prescribed continuously for more than 4 days. Within this group, 33 patients were excluded because they did not have any laboratory results for in-hospital SOFA scoring. A final cohort of 1352 patients with suspected infection was identified. Of these 826 (61.1, 95%CI 58.5-63.7%) had uncomplicated infection whilst 526 (38.9, 95%CI 36.3-41.5%) had a ΔSOFA score of ≥2 and thus were classified as having sepsis. There were 3 patients in the uncomplicated infection group who had ΔSOFA = − 1 because their hospital SOFA was less than baseline SOFA.
The baseline characteristics of 490 patients in the validation cohort are shown in Table 1

Performance of our sepsis surveillance method
Clinicians had an initial inter-rater agreement of weighted kappa of 0.75 (95% CI 0.69-0.81). They were able to reach consensus on diagnosis of uncomplicated infection or sepsis for all cases after discussion. The performance of our method, Angus' and Martin's methods are shown in Table 2. Compared to clinician review our method misidentified sepsis in 53 patients (supplementary Table S2). The most common reason was due to non-infection (15/53). Other reasons for incorrect identification were due to inaccurate or missing objective data: dementia code (8/53), oxygen therapy code (9/53), non-invasive ventilation code (7/53), baseline creatinine (3/53), baseline platelet (6/53), baseline bilirubin (2/53), hospital GCS score (1/53), long term home oxygen therapy code (1/53) and mechanical ventilation without respiratory failure (1/53).

Discussion
We developed a sepsis surveillance method based on stored objective clinical data from a population EHR in Hong Kong. It had an AUC of 0.90 (95% CI 0.87-0.92) to distinguish patients with sepsis from a cohort of hospitalised adult patients with suspected infection. It performed better than Angus' and Martin's sepsis surveillance methods in capturing patients with sepsis from EHR in Hong Kong. Since coding system and laboratory results are shared across all public hospitals in Hong Kong, it may be feasible to use our surveillance method to calculate the local population incidence of sepsis in this region.
In our cohort, we found that 39% of patients hospitalised with infection developed sepsis, which is similar to data from the United States [14]. Similar to studies that included non-ICU patients, we showed that 86.5% of septic patients in this cohort were managed outside the ICU [27,28]. Therefore sepsis epidemiological studies that only capture septic patients in the ICU would underestimate the burden of sepsis. This is particularly important in Hong Kong and other parts of Asia since critical care resources are limited compared to western countries [29,30]. Overall, 28-day mortality for all patients with sepsis in our cohort was 20.4% (95%CI 15.7-25.8), which is lower than the hospital mortality of 44.5% (95%CI 41.8-47.2) for patients with sepsis described in the multicentre MOSAICS study from Asian ICUs [31]. The lower overall estimate of sepsis mortality is because our method captured both patients with less severe sepsis treated on the wards and patients with severe sepsis who are managed in the ICU.
Our approach differs from previously published objective methods to estimate sepsis from EHR in several ways. First previous sepsis surveillance methods either did not account for preadmission comorbidities or assumed best laboratory values during hospitalisation as baseline values [4,14]. In contrast, we accounted for patient's pre-hospital organ dysfunction to generate a baseline organ function score and is similar to a recently reported sepsis surveillance method from Valik et al. in Sweden [15]. Thus we captured patients who at baseline had organ dysfunction and were admitted to hospital with an infection but did not constitute sepsis since there were no changes in their organ dysfunctions from baseline. Epidemiologically it is important to distinguish patients with sepsis from patients who have infection with a background of organ dysfunction as their pathophysiology differ. Second, similar to Valik et al.'s method, we used graded SOFA component scores to account for severity of organ dysfunction [15,25]. This is different to threshold based scoring where each organ either has or has not failed [14]. Third, we identified cases of suspected infection by any microbiological sampling rather than blood culture alone since the latter assumes septic patients always have blood cultures performed. In fact, in our cohort 82.1% of septic patients identified by clinician review did not have blood cultures taken. Thus using Rhee's et al.'s approach of defining onset of sepsis by blood culture is not feasible in our setting [14]. Lastly, we used the US CDC's ASE definition of 4 days of antibiotics rather than 2 doses that Valik et. al used. Together, these differences explain why our method appears to be more sensitive but less specific compared to Rhee et al.'s (sensitivity 0.697 and specificity 0.981) and Valik et al.'s (sensitivity 0.887 and specificity 0.985) methods [14,15].
Sepsis surveillance using administrative and coding data have generally shown a steady increase in sepsis incidence over time with decreasing mortality [5,32].   However estimates from clinical data showed minimal change in population incidence of sepsis [14]. Locally, our study demonstrated that coding based sepsis surveillance such as the Angus' or Martin's methods have very low sensitivity to identify septic patients in Hong Kong. This would result in significant underestimation of sepsis burden in Hong Kong. This is likely due to incomplete discharge summary coding in our population EHR. Likewise, low sensitivity but high specificity of these two diagnostic coding based methods have been reported in patient cohorts based in the United States [33].
In contrast, several studies have demonstrated the utility of using objective clinical data from EHR to provide consistent sepsis surveillance [14,15]. This has particular advantages as it is less prone to deficiencies in coding, change in coding practices or bias from claims benefits. However there is a need validate this approach in diverse populations and different EHR systems [34]. Our study is the first to demonstrate that this a feasible approach in an Asian healthcare setting and population to provide accurate and consistent sepsis epidemiological data. In Asia, the National Electronic Health Record of Singapore, Hospital Information of Thailand and Hospital Information System of West China are large EHR that contain laboratory test values, procedure records and diagnosis of more than 11 million patients combined [35]. Apart from Hong Kong, it may be possible to apply a similar sepsis surveillance method in these countries and regions with existing EHR databases to provide much needed local population estimates of sepsis.
There are a few limitations with our sepsis surveillance method. First, oxygen and non-invasive ventilation use were not captured well as it was often not recorded in the EHR. Second, we used first microbiological sampling as reference day which identified community acquired sepsis better than for hospital acquired sepsis. We could not capture patients who had cultures taken despite initial admission for non-infectious reasons and then subsequently developed nosocomial sepsis. These limitations are not unique to our method and are also weaknesses in previous studies [16]. Third, our method falsely attributed some non-infectious causes of multiorgan failure as sepsis. Fourth, we are unable to capture hypotension treated with fluid boluses since regular blood pressures recordings are not available. Fifth, we had to use either a single GCS reading upon hospital or ICU admission or diagnostic coding due to missing data. Sixth, we assumed patients had normal values for missing laboratory tests, which is an accepted limitation in previous methods [14,15]. Although 22.7-27% of  AUC area under the curve, PLR positive likelihood ratio, PPV positive predictive value, NLR negative likelihood ratio, NPV negative predictive value patients did not have prehospital bilirubin, platelet or creatinine results, these patients had normal median hospital bilirubin, platelet and creatinine. Normal hospital laboratory results suggests these were likely previously healthy patients with normal baseline results. It supports the assumption for baseline SOFA component score of 0 in patients with missing baseline data. In addition, the low Charlson Comorbidity Index in our cohort is in agreement with these assumptions. Furthermore, only 0.2-2.2% of patients in our validation cohort had missing bilirubin, platelet or creatinine results. Seventh, it cannot identify septic patients in whom infection is not recognised by the treating clinicians and thus no microbiological samples were sent or no antibiotics were given. Lastly we are unable to identify patients who were not admitted to hospital but treated for sepsis in the community. As this was a single centre feasibility study, further multicentre data would be needed to validate our surveillance method in Hong Kong.

Conclusion
In this feasibility study we found that an objective sepsis surveillance method based on clinical data from a population EHR in Hong Kong to identify sepsis was more accurate than methods based on administrative data. Although burden of sepsis in Hong Kong is currently unknown, our surveillance method may be used to provide an accurate estimate of population incidence and outcomes of sepsis in Hong Kong.