- Research article
- Open Access
Severity-associated markers and assessment model for predicting the severity of COVID-19: a retrospective study in Hangzhou, China
BMC Infectious Diseases volume 21, Article number: 774 (2021)
The severity of COVID-19 is associated with clinical decision-making and with the prognosis of COVID-19 patients. Therefore, early identification of patients who are likely to develop severe or critical COVID-19 is important in clinical practice. The aim of this study was to screen severity-associated markers and to construct an assessment model for predicting the severity of COVID-19.
A total of 172 confirmed COVID-19 patients were enrolled from two designated hospitals in Hangzhou, China. Ordinal logistic regression was used to screen severity-associated markers, and Least Absolute Shrinkage and Selection Operator (LASSO) regression was performed for further feature selection. Assessment models were constructed using logistic regression, ridge regression, support vector machine and random forest. The area under the receiver operating characteristic curve (AUROC) was used to evaluate the performance of the models. Internal validation was performed using bootstrap with 500 resamples in the training set, and external validation of each of the four models was performed in the validation set.
Age, comorbidity, fever, and 18 laboratory markers were associated with the severity of COVID-19 (all P values < 0.05). By LASSO regression, eight markers were included for the assessment model construction. The ridge regression model had the best performance with AUROCs of 0.930 (95% CI, 0.914–0.943) and 0.827 (95% CI, 0.716–0.921) in the internal and external validations, respectively. A risk score, established based on the ridge regression model, had good discrimination in all patients with an AUROC of 0.897 (95% CI 0.845–0.940), and a well-fitted calibration curve. Using the optimal cutoff value of 71, the sensitivity and specificity were 87.1% and 78.1%, respectively. A web-based assessment system was developed based on the risk score.
Eight clinical markers, namely lactate dehydrogenase, C-reactive protein, albumin, comorbidity, electrolyte disturbance, coagulation function, and eosinophil and lymphocyte counts, were associated with the severity of COVID-19. An assessment model constructed with these eight markers could help clinicians evaluate, at admission, the likelihood that a patient will develop severe COVID-19, so that treatment measures can be taken early.
Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and has spread worldwide. On March 12, 2020, the World Health Organization (WHO) declared the disease a pandemic. As of July 1, 2020, it had affected more than 200 countries, with about 10,000,000 confirmed cases. The COVID-19 epidemic has therefore become a global public health crisis.
Patients with COVID-19 present with different clinical patterns, ranging from mild and moderate to severe and critical types. Although most COVID-19 patients have mild or moderate symptoms and signs, findings from China indicated that about 14% of patients had the severe type and 5% the critical type. Previous studies and clinical practice have shown that the degree of severity is associated with the clinical treatment and prognosis of the disease [3,4,5,6]. The overall case-fatality rate of confirmed COVID-19 patients was 2.3%, but it reached 49.0% in critical patients. Missed diagnoses delay appropriate treatment and increase the likelihood of a poor prognosis. On the other hand, treating a severe or critical COVID-19 patient requires substantial medical resources, so overdiagnosis wastes those resources and increases the medical burden. Therefore, early identification of patients who are likely to develop severe or critical COVID-19 is especially important for clinical practice and epidemic control.

In clinical practice, the severity of COVID-19 is categorised into four levels (mild, moderate, severe, and critical) according to the Seventh Edition of the Guide to Diagnosis and Treatment of New Coronary Pneumonia. This classification is based mainly on clinical symptoms, oxygen saturation (SaO2), and imaging evidence from computed tomography (CT); no laboratory markers are included. Previous studies have found that lymphopenia, organ dysfunction, coagulopathy, and elevated D-dimer levels are associated with severity [3,4,5,6, 8].
In this study, we aimed to screen severity-associated markers and construct an assessment model for predicting the severity of patients with COVID-19 based on the data from two hospitals in Hangzhou, Zhejiang province, China.
This study enrolled 172 confirmed COVID-19 patients in Hangzhou, Zhejiang Province, China, between January 20, 2020 and April 1, 2020. Of these, 104 patients from Hangzhou Xixi Hospital formed the training set, which was used to screen severity-associated markers and construct the assessment model; some of these 104 patients were included in previously published studies [9, 10]. The remaining 68 patients, from the First Affiliated Hospital, School of Medicine, Zhejiang University (FAHZJU), formed the validation set; these patients were part of a previously published sample. COVID-19 was diagnosed according to the WHO interim guidance. The severity of COVID-19 was categorised into four levels according to the Seventh Edition of the Guide to Diagnosis and Treatment of New Coronary Pneumonia. The mild type was defined as mild clinical symptoms with normal CT imaging. The moderate type was defined as fever, respiratory symptoms, or other symptoms, together with imaging evidence of pneumonia. The severe type was defined as meeting at least one of the following criteria: shortness of breath (respiratory rate ≥ 30/min), SaO2 at rest ≤ 93%, ratio of partial pressure of arterial oxygen (PaO2) to fraction of inspired oxygen (FiO2) ≤ 300 mmHg, or progression of lung infiltrates by > 50% within 24 to 48 h. The critical type was defined as any of the following: respiratory failure requiring mechanical ventilation, shock, or other organ failure requiring intensive care unit (ICU) monitoring and treatment.
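The four-level definitions above amount to a rule cascade checked from the most to the least severe level. The Python sketch below encodes that reading. The `AdmissionFindings` record and its field names are hypothetical, and collapsing "respiratory failure requiring mechanical ventilation" into one flag, as well as treating imaging alone as the mild/moderate split (assuming every enrolled patient has some symptoms), are simplifying assumptions rather than the hospitals' actual data model.

```python
from dataclasses import dataclass


@dataclass
class AdmissionFindings:
    """Hypothetical record of the findings used by the severity definitions."""
    respiratory_rate: float            # breaths per minute
    sao2_at_rest: float                # oxygen saturation at rest, %
    pao2_fio2: float                   # PaO2/FiO2 ratio, mmHg
    infiltrate_progression: float      # fractional lesion progression within 24-48 h (0.5 = 50%)
    pneumonia_on_ct: bool              # imaging evidence of pneumonia
    mechanical_ventilation: bool = False
    shock: bool = False
    organ_failure_needing_icu: bool = False


def classify_severity(f: AdmissionFindings) -> str:
    """Return 'critical', 'severe', 'moderate' or 'mild', checking the most severe level first."""
    if f.mechanical_ventilation or f.shock or f.organ_failure_needing_icu:
        return "critical"
    if (f.respiratory_rate >= 30
            or f.sao2_at_rest <= 93
            or f.pao2_fio2 <= 300
            or f.infiltrate_progression > 0.5):
        return "severe"
    # Assumes every enrolled patient has some symptoms, so imaging alone
    # separates the moderate type from the mild type.
    return "moderate" if f.pneumonia_on_ct else "mild"


# Example: normal oxygenation with pneumonia on CT.
print(classify_severity(AdmissionFindings(22, 96, 400, 0.1, True)))  # moderate
```

Checking critical criteria before severe ones matters: a ventilated patient may also meet the severe thresholds, and the cascade assigns the highest applicable level.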
This was a retrospective study and the protocol was approved by the Ethics Committee of Xixi Hospital and FAHZJU.
Data at admission, including demographic information, comorbidities, clinical symptoms and laboratory tests, were extracted from electronic medical records and reviewed by a trained team of clinical physicians. Demographic information included age, sex and body mass index (BMI). Comorbidity was defined as having at least one of the following diseases: diabetes, hypertension, cardiovascular disease, severe congenital disease, cancer, or chronic disease of the liver, kidney, or respiratory system. Clinical symptoms included fever, fatigue, cough, expectoration, shortness of breath, diarrhoea and myalgia. Laboratory markers were grouped into eight categories: inflammation, electrolytes, nutritional metabolism, and liver, renal, cardiac, respiratory and coagulation function.
Continuous variables were presented as median (interquartile range [IQR]), and categorical variables as number (percentage). Continuous laboratory markers were dichotomised (normal versus abnormal) according to their clinical reference values. Severity-associated markers of COVID-19 were screened using ordinal logistic regression.
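A minimal sketch of the two descriptive steps, assuming each marker has a single lower/upper reference interval (the laboratories' actual reference values are not given here, so the interval in the example is hypothetical). The quartiles use `statistics.quantiles(..., method="inclusive")`, which interpolates the same way as R's default `quantile()` (type 7); whether the authors relied on that default is an assumption.

```python
import statistics
from typing import Sequence, Tuple


def dichotomise(value: float, lower: float, upper: float) -> int:
    """Return 1 (abnormal) if value falls outside [lower, upper], else 0 (normal)."""
    return 0 if lower <= value <= upper else 1


def median_iqr(values: Sequence[float]) -> Tuple[float, Tuple[float, float]]:
    """Median and (Q1, Q3); method='inclusive' interpolates like R's default quantile type 7."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, (q1, q3)


# Hypothetical reference interval 0-10 for a marker that is abnormal only when raised.
print(dichotomise(12.0, 0.0, 10.0))  # 1
print(median_iqr([1, 2, 3, 4, 5]))   # (3.0, (2.0, 4.0))
```

For markers that are abnormal only when low, the same function works with a very large `upper` bound.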
To construct the assessment model, candidate markers had to meet two criteria: P value < 0.05 in the ordinal logistic regression, and an abnormal value in at least half of the severe or critical patients. LASSO regression was then used for further feature selection, with the optimal regularization parameter (λ) estimated by fivefold cross-validation. To increase the stability of feature selection, we drew 1000 bootstrap resamples and fitted a LASSO regression model to each. Markers selected in more than half of the bootstrap models were included in the final model.
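The resample-and-count rule can be sketched as below. The LASSO fit itself is passed in as a hypothetical `fit_and_select` callable, which should fit a LASSO model on one resample (with λ tuned by fivefold cross-validation, for example via `cv.glmnet(..., nfolds = 5)` in R's glmnet package, although the article does not name the package it used) and return the names of markers with non-zero coefficients. The wrapper only handles the bootstrap and the "more than half" threshold.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set, Tuple


def stability_select(
    rows: Sequence,
    fit_and_select: Callable[[list], Set[str]],
    n_boot: int = 1000,
    threshold: float = 0.5,
    seed: int = 2020,
) -> Tuple[List[str], Dict[str, float]]:
    """Bootstrap stability selection.

    `fit_and_select` is a hypothetical callable: given one bootstrap resample it
    should fit a LASSO model (lambda tuned by 5-fold CV) and return the names of
    markers with non-zero coefficients. Markers chosen in more than
    `threshold` of the resamples are kept.
    """
    rng = random.Random(seed)
    n = len(rows)
    counts: Counter = Counter()
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]  # draw n rows with replacement
        counts.update(fit_and_select(resample))
    freq = {name: c / n_boot for name, c in counts.items()}
    kept = sorted(name for name, f in freq.items() if f > threshold)
    return kept, freq
```

Any fitter that returns a set of selected names can be plugged in; for instance, a toy selector `lambda rs: {"LDH"}` makes `stability_select` keep `["LDH"]` with a frequency of 1.0. The returned `freq` dictionary corresponds to the kind of per-marker selection frequencies reported in the supplementary tables.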
Assessment models were constructed in the training set using logistic regression, ridge regression, support vector machine, and random forest. The performance of the models was evaluated by the area under the receiver operating characteristic curve (AUROC). For internal validation, we used bootstrap with 500 resamples to reduce the optimism caused by overfitting. For external validation, each of the four models was assessed in the validation set. A risk score was established from the best-performing model, and its performance in all patients was evaluated using AUROC and a calibration curve. The optimal cutoff value was determined by maximising the Youden index. A web-based assessment system was developed based on the risk score.
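The two performance quantities named in this paragraph can be computed directly. In this sketch, AUROC uses its Mann-Whitney interpretation (the probability that a randomly chosen severe/critical patient scores higher than a randomly chosen non-severe one, ties counting one half), and the cutoff is the score that maximises Youden's J = sensitivity + specificity − 1. The function names are illustrative, and calling a score at or above the cutoff "positive" is an assumption.

```python
from typing import Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximising J = sens + spec - 1.

    A score >= cutoff is called positive (severe or critical).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best, best_j = None, -1
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        j = tp * n_neg + tn * n_pos  # integer proportional to sens + spec; avoids float ties
        if j > best_j:
            best_j, best = j, (c, tp / n_pos, tn / n_neg)
    return best


scores = [10, 20, 30, 40, 80, 90]
labels = [0, 0, 1, 0, 1, 1]
print(auroc(scores, labels))          # 8/9
print(youden_cutoff(scores, labels))  # cutoff 30, sensitivity 1.0, specificity 2/3
```

The pairwise AUROC is quadratic in the number of patients, which is negligible at this study's sample size.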
All statistical analyses were conducted using R software, version 3.6.2 (R Foundation for Statistical Computing). A two-sided P value < 0.05 was considered statistically significant.
Basic characteristics of the study population
The flowchart of the study procedure is illustrated in Fig. 1. Basic characteristics of the COVID-19 patients are summarised in Table 1. Patients in the training set had a median age of 42.0 years (IQR: 33.0–56.5) and a median BMI of 22.5 kg/m² (IQR: 20.3–25.0). Among them, 47 (45.2%) were men, and 23 (22.1%) had at least one comorbidity. During hospitalisation, 21 (20.2%) patients were classified as mild type, 72 (69.2%) as moderate type, and 11 (10.6%) as severe type. In the validation set, the median age and BMI were 59.0 years (IQR: 48.0–66.0) and 24.7 kg/m² (IQR: 22.1–27.0), respectively; 44 (64.7%) patients were men, and 56 (82.4%) had at least one comorbidity. During hospitalisation, 16 (23.5%) patients were classified as moderate type, 29 (42.7%) as severe type, and 23 (33.8%) as critical type. In both the training and validation sets, the most common clinical symptoms were fever and cough, followed by expectoration and shortness of breath (Figs. 2, 3 and 4).
Severity-associated markers of COVID-19
Table 2 presents the associations of clinical characteristics with the severity of COVID-19 in the training set. Among demographic characteristics and clinical symptoms, age, comorbidity, and fever were associated with severity (all P values < 0.05). Among the dichotomised laboratory markers, elevated C-reactive protein (CRP), lactate dehydrogenase (LDH), serum amyloid A, fibrinogen (FIB), D-dimer and adenosine deaminase, and reduced haemoglobin, lymphocyte count, eosinophil count, platelet count, calcium, phosphorus, albumin (ALB), albumin/globulin ratio, prealbumin, total cholesterol, high-density lipoprotein cholesterol, retinol binding protein, apolipoprotein A1, SaO2 and PaO2/FiO2 were associated with a higher risk of more severe COVID-19 (all P values < 0.05). Detailed results for the continuous laboratory markers are summarised in Additional file 1: Table S1.
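What "increased the risk of elevated severity" means depends on how the ordinal model is parametrised, which the article does not state. One common choice, assumed here, is the proportional-odds (cumulative logit) model, written below with the sign convention used by R's `MASS::polr`. Y is the severity level ordered mild < moderate < severe < critical (in the training set, which contained no critical cases, J = 3), x is a dichotomised marker (1 = abnormal), and θ_j are the cut-point intercepts.

```latex
\operatorname{logit} \Pr(Y \le j \mid x) = \theta_j - \beta x, \qquad j = 1, \dots, J-1

\frac{\Pr(Y > j \mid x = 1) \,/\, \Pr(Y \le j \mid x = 1)}
     {\Pr(Y > j \mid x = 0) \,/\, \Pr(Y \le j \mid x = 0)} = e^{\beta}
     \quad \text{for every } j
```

Under this model, an abnormal marker is associated with more severe disease when β > 0, that is, when the common odds ratio e^β exceeds 1; the proportional-odds assumption is that this ratio is the same at every cut point j.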
Model construction and evaluation
Based on the criteria described in the Methods, 18 candidate markers and 90 patients were selected for model construction. Because D-dimer and FIB reflect the same clinical function, they were combined into a new coagulation-function variable, DFIB, which was defined as abnormal if either D-dimer or FIB was abnormal. Electrolyte disturbance was calculated as the number of abnormal values among calcium, phosphorus, potassium, sodium and chloride. Thus, 16 markers were entered into LASSO regression for further feature selection. After 1000 bootstrap resamples, ALB, CRP, LDH, DFIB, comorbidity, lymphocyte count, eosinophil count, and electrolyte disturbance were selected as the predictors in the final model. The selection frequency of each marker across the 1000 LASSO models is summarised in Additional file 1: Tables S2 and S3.
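The two derived predictors can be written as simple functions of the dichotomised markers. The function names and the dictionary layout are hypothetical; the logic follows the definitions above (DFIB abnormal if either component is abnormal, electrolyte disturbance as a 0–5 count).

```python
from typing import Mapping

# Electrolytes counted in the electrolyte-disturbance score.
ELECTROLYTES = ("calcium", "phosphorus", "potassium", "sodium", "chloride")


def dfib(d_dimer_abnormal: int, fib_abnormal: int) -> int:
    """Coagulation function (DFIB): 1 if D-dimer or fibrinogen is abnormal, else 0."""
    return 1 if (d_dimer_abnormal or fib_abnormal) else 0


def electrolyte_disturbance(flags: Mapping[str, int]) -> int:
    """Number of abnormal electrolytes (0-5); `flags` maps each electrolyte to 0/1."""
    return sum(1 for name in ELECTROLYTES if flags[name])


print(dfib(0, 1))  # 1
print(electrolyte_disturbance(
    {"calcium": 1, "phosphorus": 1, "potassium": 0, "sodium": 0, "chloride": 0}))  # 2
```

Using OR for DFIB makes the combined variable more sensitive than either component alone, while the count keeps five correlated electrolyte flags from entering the model separately.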
Table 3 presents the performance of each model in the internal and external validations. In the internal validation, all four models (logistic regression, ridge regression, support vector machine, and random forest) achieved high AUROCs, ranging from 0.919 (95% CI 0.793–0.955) to 0.973 (95% CI 0.935–0.993). In the external validation, the ridge regression model performed best, with the highest AUROC of 0.827 (95% CI 0.716–0.921). The ridge regression model was therefore selected as the best model because of its superior predictive performance in the external validation.
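The 95% CIs for AUROC are obtained by bootstrap resampling. A minimal percentile-bootstrap sketch follows, under two assumptions the article does not confirm: the interval uses the percentile method, and the predictions are held fixed (a full optimism-corrected internal validation would instead refit the model on every resample). Resamples that contain only one class are redrawn because AUROC is undefined for them.

```python
import random
from typing import Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(scores: Sequence[float], labels: Sequence[int],
                       n_boot: int = 500, alpha: float = 0.05,
                       seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap CI for AUROC with the predictions held fixed."""
    if len(set(labels)) < 2:
        raise ValueError("need both positive and negative cases")
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUROC is undefined if a resample has only one class
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

With `n_boot=500`, matching the number of resamples reported for internal validation, the interval is read from roughly the 2.5th and 97.5th percentiles of the bootstrap AUROCs.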
A risk score was then calculated according to the result of the ridge regression model using the following formula:
All markers except electrolyte disturbance were dichotomous (1 = abnormal, 0 = normal); electrolyte disturbance ranged from 0 to 5. Figure 2 presents the receiver operating characteristic curve (A) and calibration curve (B) of the risk score. The risk score showed good discrimination of the severe or critical type, with an AUROC of 0.897 (95% CI 0.845–0.940), and the calibration curve showed good agreement between the predicted and observed probabilities of the severe or critical type. Using the optimal cutoff value of 71, the risk score had a sensitivity of 87.1% and a specificity of 78.1% for predicting COVID-19 severity. Figure 3 presents the distribution of risk scores across degrees of COVID-19 severity. Mild patients had the lowest median risk score of 9.19 (IQR: 0–26.82), followed by moderate (median: 45.65, IQR: 19.56–76.91) and severe patients (median: 102.38, IQR: 81.37–120.92). Critical patients had the highest median risk score of 113.42 (IQR: 87.89–125.75). To help clinicians identify, at admission, patients who are likely to develop severe or critical COVID-19, we developed a web-based assessment system based on our risk score (Fig. 4; website: http://www.gtrsp.com:8011/).
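The sketch below assumes the risk score is a linear, weighted sum of the eight marker values (consistent with electrolyte disturbance entering as a 0–5 count) that is compared against the reported cutoff of 71. The weights and optional intercept are hypothetical caller-supplied placeholders, not the published coefficients, and treating a score of exactly 71 as high risk (`>=`) is also an assumption.

```python
from typing import Mapping

# The eight predictors in the final model.
MARKERS = ("LDH", "CRP", "ALB", "comorbidity", "electrolyte_disturbance",
           "DFIB", "eosinophil", "lymphocyte")


def risk_score(values: Mapping[str, int], weights: Mapping[str, float],
               intercept: float = 0.0) -> float:
    """Assumed linear score: intercept + sum(weight * value) over the eight markers.

    Binary markers must be 0/1 and electrolyte_disturbance a count from 0 to 5.
    `weights` and `intercept` are placeholders, not the published coefficients.
    """
    for name in MARKERS:
        upper = 5 if name == "electrolyte_disturbance" else 1
        if not 0 <= values[name] <= upper:
            raise ValueError(f"{name}={values[name]} is outside 0..{upper}")
    return intercept + sum(weights[name] * values[name] for name in MARKERS)


def flag_high_risk(score: float, cutoff: float = 71.0) -> bool:
    """Flag a likely severe/critical patient; '>=' at the cutoff is an assumption."""
    return score >= cutoff
```

With placeholder weights of 10 for every marker, a patient with seven abnormal binary markers other than comorbidity and two abnormal electrolytes would score 80 and be flagged; real use requires the published coefficients.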
Early identification of patients who are likely to develop severe or critical COVID-19 could help reduce the case-fatality rate and make efficient use of limited medical resources. In this study, we identified a panel of clinical markers associated with the severity of COVID-19 and constructed several severity-prediction models. The ridge regression model performed best, with AUROCs of 0.930 (95% CI, 0.914–0.943) and 0.827 (95% CI, 0.716–0.921) in the internal and external validations, respectively. Furthermore, we established a risk score and a web-based assessment system to help clinicians identify, at admission, patients who are likely to develop severe or critical COVID-19.
Previous studies showed that severe or critical COVID-19 patients were older, had more comorbidities, had higher levels of LDH, D-dimer and CRP, and had lower ALB levels and lymphocyte counts [3,4,5,6, 8]. Our findings are consistent with these results. Moreover, using data from 208 patients in Fuyang, Anhui Province, Ji et al. established a scoring model named CALL to predict the severity of COVID-19, and Dong et al. developed a scoring system with data from 147 patients in Wuhan, Hubei Province. The AUROCs of their models were 0.910 and 0.843, respectively, lower than the AUROC of our assessment model in the internal validation (0.930). However, their models were not validated in an external dataset, which limits their generalizability. In contrast, our model was validated in an independent dataset and obtained a satisfactory AUROC of 0.827.
Among the eight markers in our model, LDH, CRP, ALB, and lymphocyte count are well-recognized predictors of COVID-19 severity. For eosinophil count, Zhu et al. demonstrated that decreased eosinophils could induce acute lung injury in a mouse model, and Liu et al. found that an increase in eosinophil count predicted improvement in COVID-19 progression. Several studies have reported that severe or critical COVID-19 patients often experience electrolyte disturbances [3, 4, 16]. In our study, we used the number of abnormalities in potassium, calcium, sodium, phosphorus and chloride to evaluate the overall degree of electrolyte disturbance. D-dimer and FIB are indicators of coagulation function, and Chen et al. reported that patients infected with SARS-CoV-2 had abnormal (hypercoagulable) coagulation function. We combined the two indicators to increase the sensitivity for detecting abnormal coagulation and to avoid collinearity between the two markers. Unlike other studies, age was not included in our final model. This may be due to the high correlation between age and comorbidity in the training set, with LASSO regression identifying comorbidity as the more important marker.
This study has several limitations. First, the distribution of COVID-19 severity differed between the training and validation sets: there were no critical cases in the training set and no mild cases in the validation set. This difference resulted from the government's COVID-19 prevention and control rules in Zhejiang Province, China. Xixi Hospital (a municipal-level hospital for infectious diseases) mainly admitted and treated patients with mild, moderate, and severe COVID-19 (no critical patients), whereas FAHZJU (a provincial-level hospital) was mainly responsible for moderate, severe, and critical patients (no mild patients). This difference in severity distribution might have influenced model construction and validation. Nevertheless, the model still performed well in the validation set, which suggests relatively good generalizability. Second, the subjects were mainly recruited from Hangzhou and the sample size was relatively small, which limits the generalizability of the model; further validation in regions outside Zhejiang is needed. Third, because of the retrospective design, some laboratory tests were not performed in some patients, so their associations with the severity of COVID-19 might be misestimated. Fourth, the clinical data were not comprehensive; adding other specific markers such as cytokines might improve the performance of the model. Finally, because of the low prevalence of comorbidity, the risks associated with different types of comorbidity were not considered in the assessment model.
In this study, we identified eight severity-associated clinical markers in COVID-19 patients: lactate dehydrogenase, C-reactive protein, albumin, comorbidity, electrolyte disturbance, coagulation function, and eosinophil and lymphocyte counts. Based on these eight markers, we constructed an assessment model that could help clinicians evaluate, at admission, the likelihood that a patient will develop severe COVID-19, so that treatment measures can be taken early.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding authors on reasonable request.
Abbreviations
- AUROC: Area under the receiver operating characteristic curve
- BMI: Body mass index
- COVID-19: Coronavirus disease 2019
- FiO2: Inspired oxygen fraction
- LASSO: Least Absolute Shrinkage and Selection Operator
- PaO2: Partial pressure of oxygen in arterial blood
- SaO2: Oxygen saturation
- SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2
- WHO: World Health Organization
Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, Zhao X, Huang B, Shi W, Lu R, et al. A novel coronavirus from patients with pneumonia in China, 2019. N Engl J Med. 2020;382(8):727–33.
World Health Organization: Novel Coronavirus (2019-nCoV). Situation report-163. 2020. https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200701-covid-19-sitrep-163.pdf?sfvrsn=9a56f2ac_4. Accessed 2 Jul 2020.
Wu ZY, McGoogan JM. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention. JAMA. 2020;323(13):1239–42.
Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, Zhang L, Fan G, Xu J, Gu X, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395(10223):497–506.
Guan WJ, Ni ZY, Hu Y, et al. Clinical characteristics of coronavirus disease 2019 in China. N Engl J Med. 2020;382:1708–20.
Xu XW, Wu XX, Jiang XG, Xu KJ, Ying LJ, Ma CL, Li SB, Wang HY, Zhang S, Gao HN, et al. Clinical findings in a group of patients infected with the 2019 novel coronavirus (SARS-Cov-2) outside of Wuhan, China: retrospective case series. BMJ. 2020;368:m606.
Wang DW, Hu B, Hu C, Zhu FF, Liu X, Zhang J, Wang BB, Xiang H, Cheng ZS, Xiong Y, et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China. JAMA. 2020;323(11):1061–9.
National Health Commission of the People’s Republic of China: Chinese management guideline for COVID-19 (version 7.0) [in Chinese]. http://www.nhc.gov.cn/yzygj/s7653p/202003/46c9294a7dfe4cef80dc7f5912eb1989/files/ce3e6945832a438eaae415350a8ce964.pdf. Accessed 15 Mar 2020.
Chen NS, Zhou M, Dong X, Qu JM, Gong FY, Han Y, Qiu Y, Wang JL, Liu Y, Wei Y, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet. 2020;395(10223):507–13.
Chen ZH, Li YJ, Wang XJ, et al. Chest CT of COVID-19 in patients with a negative first RT-PCR test: Comparison with patients with a positive first RT-PCR test. Medicine (Baltimore). 2020;99(26):e20837.
Chen Z, Fan H, Cai J, et al. High-resolution computed tomography manifestations of COVID-19 infections in patients of different ages. Eur J Radiol. 2020;126:108972.
Zheng S, Fan J, Yu F, et al. Viral load dynamics and disease severity in patients infected with SARS-CoV-2 in Zhejiang province, China, January-March 2020: retrospective cohort study. BMJ. 2020;369:m1443.
World Health Organization: Clinical management of severe acute respiratory infection when novel coronavirus (2019-nCoV) infection is suspected: interim guidance. 2020. https://www.who.int/docs/default-source/coronaviruse/clinical-management-of-novel-cov.pdf. Accessed 15 Mar 2020.
Ji D, Zhang D, Xu J, et al. Prediction for progression risk in patients with COVID-19 pneumonia: the CALL score. Clin Infect Dis. 2020. https://doi.org/10.1093/cid/ciaa414.
Dong Y, Zhou H, Li M, et al. A novel simple scoring model for predicting severity of patients with SARS-CoV-2 infection. Transbound Emerg Dis. 2020. https://doi.org/10.1111/tbed.13651.
Zhu C, Weng QY, Zhou LR, et al. Homeostatic and early recruited CD101− eosinophils suppress endotoxin-induced acute lung injury. Eur Respir J. 2020. https://doi.org/10.1183/13993003.02354-2019.
Sun S, Cai X, Wang H, et al. Abnormalities of peripheral blood system in patients with COVID-19 in Wenzhou, China. Clin Chim Acta. 2020;507:174–80.
Liu F, Xu A, Zhang Y, et al. Patients of COVID-19 may benefit from sustained Lopinavir-combined regimen and the increase of Eosinophil may predict the outcome of COVID-19 progression. Int J Infect Dis. 2020;95:183–91.
Lippi G, South AM, Henry BM. Electrolyte imbalances in patients with severe coronavirus disease 2019 (COVID-19). Ann Clin Biochem. 2020;57(3):262–5.
This work was partially supported by the special funding of New Coronary Pneumonia, Institute of China's System Research, Zhejiang University (YQYB2006). The funder had no role in the design of the study, data collection, data analysis, data interpretation, or writing of the manuscript.
Ethics approval and consent to participate
This study was approved by the Ethics Committee of Hangzhou Xixi Hospital (2020–24) and the First Affiliated Hospital, School of Medicine, Zhejiang University (FAHZJU) (2020-149). Individual informed consent was waived by the ethics committees listed above because this study used existing samples collected during routine medical care and posed no additional risk to the patients. All patient data were anonymized prior to analysis.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Table S1. Associations of continuous laboratory markers with the severity of COVID-19 in the training set. Table S2. Remaining frequency of the 16 markers in 1000 LASSO regression models. Table S3. Associations of the selected eight markers with the severity of COVID-19 in the external validation set.
Qi, J., He, D., Yang, D. et al. Severity-associated markers and assessment model for predicting the severity of COVID-19: a retrospective study in Hangzhou, China. BMC Infect Dis 21, 774 (2021). https://doi.org/10.1186/s12879-021-06509-6
- Assessment model
- Web-based assessment system