Comparison of mortality risk evaluation tools efficacy in critically ill COVID-19 patients

Background As the COVID-19 pandemic continues, the number of patients admitted to the intensive care unit (ICU) is still increasing. The aim of our article is to estimate which of the conventional ICU mortality risk scores is the most accurate at predicting mortality in COVID-19 patients and to determine how these scores can be used in combination with the 4C Mortality Score. Methods This was a retrospective study of critically ill COVID-19 patients treated in tertiary reference COVID-19 hospitals during the year 2020. The 4C Mortality Score was calculated upon admission to the hospital. The Simplified Acute Physiology Score (SAPS) II, Acute Physiology and Chronic Health Evaluation (APACHE) II, and Sequential Organ Failure Assessment (SOFA) scores were calculated upon admission to the ICU. Patients were divided into two groups: ICU survivors and ICU non-survivors. Results A total of 249 patients were included in the study, of which 63.1% were male. The average age of all patients was 61.32 ± 13.3 years. The all-cause ICU mortality ratio was 41.4% (n = 103). To determine the accuracy of the ICU mortality risk scores a ROC-AUC analysis was performed. The most accurate scale was the APACHE II, with an AUC value of 0.772 (95% CI 0.714–0.830; p < 0.001). All of the ICU risk scores and 4C Mortality Score were significant mortality predictors in the univariate regression analysis. The multivariate regression analysis was completed to elucidate which of the scores can be used in combination with the independent predictive value. In the final model, the APACHE II and 4C Mortality Score prevailed. For each point increase in the APACHE II, mortality risk increased by 1.155 (OR 1.155, 95% CI 1.085–1.229; p < 0.001), and for each point increase in the 4C Mortality Score, mortality risk increased by 1.191 (OR 1.191, 95% CI 1.086–1.306; p < 0.001), demonstrating the best overall calibration of the model. Conclusions The study demonstrated that the APACHE II had the best discrimination of mortality in ICU patients. Both the APACHE II and 4C Mortality Score independently predict mortality risk and can be used concomitantly.


Background
As the SARS-CoV-2 (COVID- 19) pandemic has continued in the year 2020, the number of COVID-19 patients admitted to the intensive care unit (ICU) around the world has severely increased. According to several reports, over 20% of patients hospitalized with COVID-19 are admitted to the ICU [1]. Despite working at their maximum capacity and increasing the number of beds and personnel, these services are overstretched, leading to worse clinical outcomes [2]. Having regard for the collapsing health care systems, this might raise the question of triage criteria amendments, since not all patients can or should be admitted to the ICU [3][4][5]. Furthermore, the strategy of admission to the ICU varies in different Open Access *Correspondence: vaidas555@gmail.com 1 Clinic of Anaesthesiology and Intensive Care, Institute of Clinical Medicine, Faculty of Medicine, Vilnius University, Vilnius, Lithuania Full list of author information is available at the end of the article countries. For example, in Belgium, more patients died in the wards (72%) than in the ICU (28%) [6]. The chances of survival are affected by many variables, including the diagnosis upon admission, the patient's comorbidities, the severity of organ failure, the patient's age and the patient's health status before admission. Therefore, it is critical to triage the patients vigorously, determining which patients have the best chances of successful treatment [7].
There are several scores used in the ICU to help clinicians estimate the mortality risk of patients. Three of the most common are the Simplified Acute Physiology Score (SAPS) II, Acute Physiology and Chronic Health Evaluation (APACHE) II and Sequential Organ Failure Assessment (SOFA) score. The SOFA score uses clinical parameters and laboratory values, while the SAPS II and APACHE II also include age, history of severe organ failure or chronic disease and type of admission. The APACHE II and SAPS II should be calculated on newly admitted patients, while the SOFA score can be recalculated every 24 h [8]. All of these scales have perfect calibration and discrimination across all ranges of possible values, determining the risk of mortality from 1% to almost up to 100%. However, it is important to note that the APACHE II was originally developed to fit all kinds of ICU populations. Therefore, it might not be as precise when evaluating specific patient groups and individual patients. The SAPS II has been built in an European/ North American environment, which is important while evaluating patients on different continents [9,10]. Furthermore, all these scores are focused on a momentary evaluation and do not give enough attention to the previous state of the patient, both chronic comorbidities and ongoing decompensation. As we have established, none of these scores are perfect in every setting, and none of them is specific to one illness, being less accurate in patients which have a particular disease.
Since the beginning of the COVID-19 pandemic, the patients with the highest mortality were those with chronic diseases, such as arterial hypertension, diabetes, heart failure, obesity and chronic kidney disease [11]. Due to these particularities, COVID-19 merits a risk stratification model of its own. Thus, the 4C Mortality Score was developed in the year 2020 in the middle of the pandemic. Using eight different parameters to evaluate patients, it was originally tested in a population that was fully Caucasian, 43% male, had a mean age of 73 and mortality rate of 32.3% [12]. The 4C Mortality Score is designed to be implemented at the moment of the hospitalization and was not intended to be used when admitting the patient to the ICU. However, it is highly specific to COVID-19, as it encompasses the parameters that are the most critical in this disease (i.e., those that reflect respiratory function and inflammatory processes) and patient demographics and comorbidities. It is likely that the effect of the 4C Mortality Score determinants remain present during treatment of the patients, aiding in deterioration, transfer to the ICU and, sequentially, mortality of the population.
The aim of this study was to estimate which of the conventional ICU mortality risk scores is the most accurate at predicting mortality in COVID-19 patients and to determine how can these scores be used in combination with the 4C Mortality Score.

Study population
This was a retrospective study of patients who were admitted to a tertiary referral university hospital in the year of 2020 and tested positive for SARS-CoV-2. Ethical approval was gained from the Regional Research Ethics Committee to conduct the study. Inclusion criteria were as follows: tested positive for SARS-CoV-2 infection, 18 years or older and admitted to the intensive care unit (ICU).

Mortality risk evaluation
The patients were evaluated two times during the study. Upon admission to the hospital, the 4C Mortality Score was calculated. Upon admission to the ICU, the APACHE II, SAPS II and SOFA scores were implemented. All the scores were used as suggested by the creators of the scores [9,10,12,13].

Definition of the outcome and groups in the study
Mortality was set as all-cause mortality in the ICU. All the cases in the study had been either discharged or deceased during collection of the data. Mortality ratios were calculated and standardized according to the recommendations of the authors of the risk evaluation tools. The patients were split into two groups: ICU survivors and ICU non-survivors.

Statistical analysis Descriptive analysis
Statistical analysis was carried out by the SPSS statistical software package version 26.0 (IBM/SPSS, Inc., Chicago, IL). Baseline characteristics were defined using descriptive statistics. Categorical variables were stated as an absolute number (n) and a relative frequency (%), and continuous variables were represented as a median (interquartile range) or as a mean (± SD), depending on the normality of the distribution. The normality of distribution was tested by the one sample Kolmogorov-Smirnov test.

Comparison of survivors and non-survivors
To compare the categorical variables, the chi-square test was performed. To compare the continuous variables, the independent samples t-test was used for the normally distributed data and the Mann-Whitney test for the non-parametric data.

Standardized mortality ratio calculation
The standardized mortality ratio (SMR) represents the excess mortality and was calculated using the observed number of lethal cases divided by the predicted number of lethal cases. The observed count was obtained from the study data. The predicted number was obtained by implementing the percentage of mortality risk from all the tools used (APACHE II, SOFA, SAPS II and 4C Mortality Score). The individual values of the risk scores were averaged to represent the study population.

Accuracy testing
The receiver operating characteristic area under the curves (ROC-AUCs) were examined to identify the accuracy of discrimination of the APACHE II, SAPS II, SOFA and 4C Mortality Score.

Regression analysis
To determine the independent predictive value of all the risk scores, these scores were integrated into a forward logistic regression analysis. The Lemeshow-Hosmer goodness-of-fit test was used to evaluate calibration.

Baseline characteristics
A total of 249 patients were included in the study, of which 63.1% were male. The mean age of the patients was 61.32 ± 13.3 years. Most of the patients were aged 50-70 years old (55.4%). The highest mortality was observed in the > 80 years age group (63.6%). The most common comorbidities were obesity (28.9%), hypertension (75.9%), and chronic cardiac disease (46.6%) ( Table 1).
SMRs were calculated, revealing several times higher values for the APACHE II (SMR = 2.84), SOFA (SMR = 4.14) and SAPS II (SMR = 4.14). The SMR of the 4C Mortality Score was 1.05, showing a good concordance to the actual mortality rate of the group ( Table 1).

Accuracy of mortality risk scores
To determine their accuracy of discrimination, a ROC-AUC analysis of the ICU mortality risk scores was performed. All the risk scores were good predictors of mortality, generating ROC-AUC values above 0.5. The most accurate scale was the APACHE II, with an AUC value of 0.772 (95% CI 0.714-0.830; p < 0.001). The 4C Mortality Score had an AUC value of 0.754 (95% CI 0.694-0.814; p < 0.001). These results are presented in Fig. 1 and Table 2.

Regression analysis of mortality risk scores
Univariate regression analysis was performed to determine the link between the risk scores and ICU mortality. All of the ICU risk scores and 4C Mortality Score were significant mortality predictors in the analysis, with acceptable calibration. The results are presented in Table 3.
Multivariate regression analysis was performed to elucidate which of the scores can be used together with the independent predictive value. In the final model, the APACHE II and 4C Mortality Score prevailed, with the best fit of the model (χ 2 = 4.72; degrees of freedom 8; p > 0.787). For each point increase in the APACHE II, mortality risk increased by 1.155 (OR 1.155, 95% CI 1.085-1.229; p < 0.001), and for each point increase in the 4C Mortality Score, mortality risk increased by 1.191 (OR 1.191, 95% CI 1.086-1.306; p < 0.001). Results are presented in Table 3. The R 2 value of the final model was 0.358, suggesting a polymodal origin of the mortality in the ICU, which is supposed to not only be determined by the pre-hospitalization factors.

Discussion
The main finding of the study is that the APACHE II score was the most accurate and had the best discrimination at predicting mortality risk in COVID-19 patients treated in the ICU. However, the best calibration was observed when the 4C Mortality Score was added to the model. Therefore, the APACHE II and 4C Mortality Score independently predict mortality risk and can be used concomitantly.
One of the main findings of the study is that conventional ICU mortality risk scores perform quite well in COVID-19 patients. In our study, the mean values of the APACHE II, SAPS II and SOFA scores are comparable with other reports. The overall mean APACHE II score  [14][15][16]. Furthermore, in our study, the APACHE II score prevailed as the most accurate one with the highest AUC of all the scores (0.772) [17,18]. These results correspond with the report from Sao Paolo (AUC 0.8). However, the overall accuracy is slightly lower than expected and reported in the literature [19]. Furthermore, the SMRs are several times higher than expected in these patients. This can be explained by the pathophysiology of the COVID-19 disease, which affects several organ systems far more extensively (i.e., respiratory and coagulation) in the beginning of the disease. Thus, the conventional scores, even when giving a maximum score in these dimensions, may under evaluate the overall mortality risk of these patients. Secondly, the 4C Mortality Score fits into the risk prediction model with the APACHE II score for ICU patients. In studies comparing the risk scores developed specifically for COVID-19 patients, the 4C Mortality Score was the most accurate, with AUCs of 0.799 and 0.774, and 0.754 in our study. This score has a more extensive evaluation of the comorbidities of patients than conventional ICU risk scores, which tend to rely on stratifying the on-the-spot evaluation of the organ systems. In our study, the combination of the APACHE II and 4C Mortality Score increased the calibration of the risk determination model in the regression analysis [12,18]. Despite this, the R 2 value of 0.358 in the final model was only satisfactory when predicting the mortality in this group. It is obvious that the clinical course in the ICU and the application of mechanical ventilation, renal replacement therapy and other treatment options are major contributors to the outcome. Thus, further studies should be done either to elucidate more risk factors or to define the key moments in the treatment of this specific population.
It is important to discuss the potential clinical implementation of our findings. In the case of the pandemic and the overload of patients, the most valuable feature of the mortality score is good discriminative performance and pragmatic identification of the patients who are likely   to die and will not benefit from treatment. Thus, the 4C Mortality Score is a perfect tool in the emergency department (ED). However, when evaluating the mortality of COVID-19 patients in the ICU, a more precise tool is needed. For example, since the 4C Mortality Score was developed for triage in the ED, a noninvasive oxygenation parameter (SpO 2 ) was chosen to evaluate respiratory function. Pulse oximetric saturation has a good correlation with PaO 2 in the range of 80-100%. However, critically ill COVID-19 patients usually stay in the ward until SpO 2 levels drop below 80%. The accuracy of SpO 2 drops when the SpO 2 level is below 80% [20]. Moreover, in the case of the APACHE II score, it has a potentially better evaluation of chronic respiratory failure because of the inclusion of base excess. Furthermore, SpO 2 strongly depends on blood flow, pulsatility and microcirculatory disturbances (such as microthrombosis), which can affect the accuracy of SpO 2 . Moreover, the APACHE II and SAPS II scores include a lot of parameters that are usually normal on admission in COVID-19 patients, but they may deteriorate during the clinical course of these patients, leading to high prognostic value when admitting patients to the ICU. Thus, we conclude that there is a benefit to combine the use of these scores. We suggest using the 4C Mortality Score when admitting patients to the hospital and to consider it when accepting the patient to the ICU, alongside the calculation of the APACHE II score. This study is retrospective. Therefore, all the assumptions should be regarded as associations rather than as causations. Furthermore, it is important to consider the sample size of our study and keep in mind that, to get more generalizable results, more patients should be included in the study. There is a slight difference in the age of the patients in the study when compared with the literature. The mean age of the patients in our study was 61.32 years, which is an average number compared to several other studies with mean ages of 50, 51.26 and 74 years [14,16,18]. However, the proportion of patients over 70 years of age is quite large, almost 30%. Increasing age is known to be associated with higher mortality risk and poor outcome. This and other differences between the population of our study and the studies that validated the risk scores (ethnic, demographic, cultural and economic conditions of the patients) should also be considered.
This was a study with a pragmatic approach, having its design oriented towards clinical decision-making. If a patient in the ED met the criteria for hospitalization, he/she was evaluated with the 4C Mortality Score, independently whether he/she was hospitalized on a general ward or directly to the ICU. If the patient was transferred to the ICU, either directly or later, he/she was evaluated with ICU risk scores (APACHE II, SOFA and SAPS II). Therefore, the study design has a potential for bias, since the patient from the general ward may deteriorate and be transferred in a more grave state as compared to the state he/she was hospitalized and evaluated with the 4C Mortality Score. However, the 4C Mortality Score was not developed to use in patients that are already hospitalized. Thus, to offer a clinician a pragmatic way of risk evaluation, we designed this study with evaluations at two points in time, indeed, in some cases, occurring at the same time, when the patient is being admitted directly to the ICU and, in some cases, at two different points in time.

Conclusion
The main finding of the study is that the APACHE II score was the most accurate and had the best discrimination at predicting mortality risk in COVID-19 patients treated in the ICU. However, the best calibration was observed when the 4C Mortality Score was added to the model. Therefore, the APACHE II score and 4C Mortality Score independently predict mortality risk and can be used concomitantly.