A novel CT-based radiomics in the distinction of severity of coronavirus disease 2019 (COVID-19) pneumonia

Background Convenient and precise assessment of disease severity in coronavirus disease 2019 (COVID-19) contributes to timely treatment and improved prognosis. We aimed to evaluate the ability of a CT-based radiomics nomogram to discriminate the severity of COVID-19 pneumonia. Methods A total of 150 patients (training cohort, n = 105; test cohort, n = 45) with COVID-19 confirmed by reverse transcription polymerase chain reaction (RT-PCR) were enrolled. Two feature selection methods, Max-Relevance and Min-Redundancy (mRMR) and the least absolute shrinkage and selection operator (LASSO), were applied to features extracted from CT images to construct the model. A total of 30 radiomic features were finally retained. The Rad-score was calculated by summing the selected features weighted by their coefficients. The radiomics nomogram incorporating clinical-radiological features was then constructed by multivariate logistic regression analysis. The nomogram was assessed with calibration and decision-curve analysis. Results Across both cohorts, 40 patients with COVID-19 pneumonia were severe and 110 were non-severe. By combining the 30 radiomic features extracted from CT images, the radiomics signature showed high discrimination between severe and non-severe patients in the training set [area under the curve (AUC), 0.857; 95% confidence interval (CI), 0.775–0.918] and the test set (AUC, 0.867; 95% CI, 0.732–0.949). The final combined model, which integrated age, comorbidity, CT score, number of lesions, ground glass opacity (GGO) with consolidation, and the radiomics signature, improved the AUC to 0.952 in the training cohort and 0.98 in the test cohort. The nomogram based on the combined model likewise exhibited excellent discrimination in both the training and test cohorts. Conclusions The developed model based on a radiomics signature derived from CT images can be a reliable marker for discriminating the severity of COVID-19 pneumonia.


Background
Coronavirus disease 2019 (COVID-19) has become a global pandemic since it emerged in December 2019 [1]. Although most confirmed COVID-19 cases are mild, about 20% of cases can still be severe [1]. COVID-19 patients can develop pneumonia [2]. In some severe patients with COVID-19, dyspnea was observed more than 1 week after symptom onset [3]. Septic shock, acute respiratory distress syndrome, metabolic acidosis that is difficult to correct, and coagulation dysfunction often develop rapidly in severe patients [4]. The critical factor for decreasing complications and mortality is the effective identification of severe patients. In other words, convenient and precise assessment of COVID-19 severity will contribute to timely treatment and improved prognosis.
CT examination can serve as an important auxiliary tool for diagnosing COVID-19 [4][5][6][7]. As the literature reveals [4,5], the sensitivity of imaging examination, especially CT, is relatively high, and imaging signs can manifest earlier than clinical symptoms; thus CT examination is important in preclinical screening, primary diagnosis, and evaluation of disease severity. Although recent studies have reported the CT findings of COVID-19 pneumonia [5,6], the value of CT imaging in assessing the severity of COVID-19 has rarely been reported, even though it may contribute to a more comprehensive and accurate understanding of this new infectious disease. Radiomics, an emerging technique involving the high-throughput extraction of quantitative imaging features and the subsequent association of these features with clinical data, has been applied in various diseases. For example, radiomics has often been applied to discriminate tumors and to predict histologic grade, tumor recurrence, and metastasis [7,8]. Presumably, CT-based radiomics has great advantages in the diagnosis and follow-up of COVID-19 pneumonia. As far as we know, the existing literature has mainly focused on the identification and diagnosis of COVID-19 [9,10]. Some studies [11][12][13] have shown CT-based radiomics to be a superior tool for discriminating COVID-19 from other types of viral pneumonia or from non-COVID-19 pneumonia. Few studies have reported the application of CT-based radiomics to evaluating the severity of COVID-19. Therefore, the purpose of this study was to apply a CT-based radiomics nomogram, combining radiomics signatures and clinical factors, to discriminate the severity of COVID-19 pneumonia and thereby help optimize therapeutic regimens.

Demographic data
Between January and February 2020, a total of 213 patients diagnosed with COVID-19 pneumonia at 2 hospitals in Anhui, China were enrolled. This retrospective study was approved by the Ethics Committee of the First Affiliated Hospital of Bengbu Medical College, and the requirement for informed consent was waived. All procedures were performed in accordance with the 1964 Declaration of Helsinki and relevant policies in China.
Our inclusion criteria were: (a) positive result on real-time reverse-transcriptase polymerase-chain-reaction (RT-PCR) assay of nasal and pharyngeal swab specimens; (b) thin-section CT scanning; (c) pneumonia demonstrated on CT images; (d) CT examination performed at the patient's first visit. Exclusion criteria were: (a) incomplete medical data (n = 31); (b) no thin-section CT or missing CT images (n = 32). Finally, 150 patients were included and divided into a severe group and a non-severe group. According to the American Thoracic Society criteria, a patient was defined as severe if meeting any of the following conditions [14]: (a) respiratory rate ≥ 30 breaths/min; (b) respiratory distress; (c) fingertip oxygen saturation ≤ 93% at rest; (d) arterial oxygen tension (PaO2)/fraction of inspired oxygen (FiO2) ≤ 300 mmHg; (e) respiratory failure requiring mechanical ventilation; (f) shock; (g) other organ failure requiring ICU monitoring and treatment. Complete medical data, including demographic characteristics, epidemiological information, laboratory data, symptoms, comorbidity, and medical treatment data, were recorded.
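The severity definition above is an any-of rule, so it can be written as a single predicate. This is an illustrative sketch only: the record type and its field names are hypothetical, and criteria such as respiratory distress are reduced to booleans that would in practice come from clinical judgment.

```python
from dataclasses import dataclass


@dataclass
class PatientStatus:
    """Hypothetical record of the findings used in the severity definition."""
    respiratory_rate: float        # breaths/min
    respiratory_distress: bool
    resting_spo2: float            # fingertip oxygen saturation at rest, %
    pao2_fio2: float               # PaO2/FiO2, mmHg
    ventilated_for_failure: bool   # respiratory failure requiring mechanical ventilation
    shock: bool
    other_organ_failure_icu: bool  # other organ failure requiring ICU care


def is_severe(p: PatientStatus) -> bool:
    """Severe if ANY of criteria (a)-(g) is met."""
    return any((
        p.respiratory_rate >= 30,   # (a)
        p.respiratory_distress,     # (b)
        p.resting_spo2 <= 93,       # (c)
        p.pao2_fio2 <= 300,         # (d)
        p.ventilated_for_failure,   # (e)
        p.shock,                    # (f)
        p.other_organ_failure_icu,  # (g)
    ))
```

Because the rule is a disjunction, a single abnormal finding (for example, a resting saturation of exactly 93%) is enough to place a patient in the severe group.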

CT acquisition
Patients underwent chest CT on two 64-detector CT scanners (LightSpeed, GE; Aquilion, TOSHIBA). The protocols were as follows: 120 kV; automatic tube current (350 mA for LightSpeed, GE and 440 mAs for Aquilion, TOSHIBA); detector width, 40 mm (LightSpeed, GE) and 43 mm (Aquilion, TOSHIBA); rotation time, 0.8 s (LightSpeed, GE) and 1.0 s (Aquilion, TOSHIBA); section thickness, 5 mm; interlayer spacing, 5 mm; matrix, 512 × 512; and breath-hold at full inspiration. Images were displayed with a mediastinal window (width 350 HU, level 40 HU) and a lung window (width 1200 HU, level −600 HU). The acquired images were then reconstructed using an iterative reconstruction technique with a slice thickness of 0.625 mm (LightSpeed, GE) and 1.25 mm (Aquilion, TOSHIBA), respectively. To minimize discrepancies in acquisition parameters attributable to the different CT scanners, all images were resampled to a voxel size of 1 × 1 × 1 mm³ before image analysis and feature extraction.
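The display windows above follow the usual linear windowing rule: values below level − width/2 are shown black, values above level + width/2 are shown white, and values in between are mapped linearly. A minimal sketch, assuming an 8-bit (0 to 255) display range; the function and constant names are illustrative.

```python
def window_to_gray(hu: float, width: float, level: float, max_gray: int = 255) -> int:
    """Map a CT value in HU to a display gray level for the given window."""
    lower = level - width / 2.0
    upper = level + width / 2.0
    if hu <= lower:
        return 0            # below the window: black
    if hu >= upper:
        return max_gray     # above the window: white
    return round((hu - lower) / width * max_gray)  # linear ramp inside the window


# (width, level) in HU, as listed in the acquisition protocol
LUNG_WINDOW = (1200.0, -600.0)
MEDIASTINAL_WINDOW = (350.0, 40.0)
```

With the lung window, everything from −1200 HU to 0 HU is spread across the gray scale, which is why soft-tissue detail is lost there and a separate mediastinal window is needed.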

Clinical findings and laboratory tests
Confirmed COVID-19 patients were treated in isolation. Patients received symptomatic treatment, including correction of electrolyte disturbances, anti-infection treatment, nutritional support, and bed rest. Patients who developed difficulty breathing were placed on a ventilator.
The time course (defined as the interval between symptom onset and the initial CT examination), main clinical features (age, gender, fever, and cough), comorbidities (such as diabetes, hypertension, chronic liver disease, cardiac disease, and chronic obstructive pulmonary disease), and main laboratory tests (C-reactive protein, lymphocyte count, and blood leukocyte count) were independently reviewed by two clinicians.

Image interpretation
The main CT signs analyzed were: pure ground glass opacity (GGO); GGO with consolidation; consolidation; interlobular septal thickening; crazy-paving pattern; halo sign; reversed halo sign; air bronchogram; and pleural effusion. Lesion distribution was recorded as left, right, or bilateral. The CT score was based on lung segment involvement in an 18-segment model and ranged from 1 to 4: 1, 1–4 segments involved; 2, 5–8 segments involved; 3, 9–12 segments involved; 4, 13–18 segments involved. The number of lesions was defined with reference to previous literature [15]: a lesion confined to one lung segment was counted as one, and a large lesion involving more than one segment was counted as the number of affected segments. All radiological data were independently reviewed by two radiologists (with 6 and 13 years of experience in chest CT imaging, respectively), who were blinded to the patients' clinical data. Disagreements were referred to a third observer (with 18 years of experience in chest CT imaging), and a majority decision was reached.
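The CT score and lesion-count rules above translate directly into code. A minimal sketch with hypothetical function names; it assumes the radiologists have already recorded how many segments each lesion involves.

```python
from typing import Sequence


def ct_score(segments_involved: int) -> int:
    """CT score (1-4) from the number of involved segments in the 18-segment model."""
    if not 1 <= segments_involved <= 18:
        raise ValueError("expected 1-18 involved segments")
    if segments_involved <= 4:
        return 1
    if segments_involved <= 8:
        return 2
    if segments_involved <= 12:
        return 3
    return 4


def lesion_count(segments_per_lesion: Sequence[int]) -> int:
    """A lesion within one segment counts as 1; a lesion spanning k segments counts as k."""
    if any(k < 1 for k in segments_per_lesion):
        raise ValueError("each lesion occupies at least one segment")
    return sum(segments_per_lesion)
```

For example, two small lesions plus one lesion spanning three segments give a lesion count of 5.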

CT images segmentation and features extraction
Two radiologists (R1, 6 years' and R2, 13 years' experience in chest imaging) segmented the lesions using ITK-SNAP software. A 2D region of interest (ROI) was delineated on the coronal slice showing the largest cross-section of each lesion. To assess feature reproducibility and improve model robustness, R1 segmented the lesions twice with a one-week interval, and R2 segmented them once.
The images were first preprocessed by resampling and intensity discretization. All images were resampled to a voxel size of 1 × 1 × 1 mm³. Resampling transformed the images to isotropic voxel spacing so that texture features were rotationally invariant and comparable between images from different scanners. Intensity discretization grouped the gray levels into 128 bins to reduce computational complexity and make the features tractable.
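The two preprocessing operations can be sketched as follows. The study used AK software, whose internal formulas are not described here, so this sketch assumes the common fixed-bin-number discretization rule (as defined by the Image Biomarker Standardisation Initiative) and computes only the output grid size of resampling, not the interpolation itself. Function names are illustrative.

```python
import math


def resampled_shape(shape, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Grid size after resampling a volume from `spacing` to `new_spacing` (mm)."""
    return tuple(max(1, round(n * s / ns)) for n, s, ns in zip(shape, spacing, new_spacing))


def discretize_fixed_bins(values, n_bins=128):
    """Fixed-bin-number discretization of ROI intensities into gray levels 1..n_bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)  # constant ROI: a single gray level
    levels = []
    for v in values:
        if v >= hi:
            levels.append(n_bins)  # the maximum falls in the top bin
        else:
            levels.append(math.floor(n_bins * (v - lo) / (hi - lo)) + 1)
    return levels
```

A fixed bin number adapts the bin width to each ROI's intensity range; the alternative fixed-bin-width scheme keeps HU meaning constant across patients, and the choice changes texture feature values.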
We used AK software (Artificial intelligence Kit, GE Healthcare) to extract radiomic features from the preprocessed images. Three classes of features were obtained: histogram features, texture features (355, based on the gray-level co-occurrence matrix (GLCM), run-length matrix (RLM), and gray-level size zone matrix (GLSZM)), and geometry features, for a total of 396 radiomic features. To construct a robust model, inter-observer and intra-observer agreement tests were performed to identify reproducible features. Each lesion had 3 ROIs: 2 from R1 and 1 from R2. Features extracted from the two ROIs of R1 were used to test intra-observer agreement, while the first ROI of R1 and the ROI of R2 were used to test inter-observer agreement. Features that passed both tests were retained for constructing the diagnostic model, using the values from R1's first ROI [8].
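The reproducibility filter can be illustrated with a short sketch. The study computed ICCs with the R 'DescTools' package and kept features with ICC > 0.75, but it does not state which ICC form was used; this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single rater, in the Shrout and Fleiss scheme). The helper names are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per lesion and one column per reading."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        return float("nan")  # undefined (e.g. constant feature); fails the threshold test
    return (msr - mse) / denom


def reproducible_features(r1_first, r1_second, r2, threshold=0.75):
    """Keep features whose intra-observer (R1 vs R1) and inter-observer (R1 vs R2)
    ICCs both exceed the threshold. Each argument maps feature name -> values per lesion."""
    keep = []
    for name in r1_first:
        intra = icc_2_1(list(zip(r1_first[name], r1_second[name])))
        inter = icc_2_1(list(zip(r1_first[name], r2[name])))
        if intra > threshold and inter > threshold:
            keep.append(name)
    return keep
```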

Radiomics model construction
The patients were divided into a training cohort and a test cohort by stratified random sampling at a ratio of 7:3. Unless use of the test cohort is explicitly stated, all of the following operations were performed in the training cohort. Before constructing the radiomic model, feature engineering was conducted in three feature selection steps. The first step excluded zero-variance features, i.e., features whose values were identical across the training cohort and which therefore could not contribute to discrimination. The second step used the mRMR method to exclude redundant features and retain the features most relevant to the target; after mRMR, 30 features were retained. The last step used LASSO regression to find the most predictive feature subset, in two stages: the optimal hyperparameter λ was first determined by 10-fold cross-validation with binomial deviance as the criterion, and the features with non-zero coefficients at that λ were then chosen as the final features. The LASSO regression model served as the radiomics model and yielded the Rad-score, calculated by summing the selected features, each multiplied by its corresponding coefficient.
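The feature-engineering steps above can be sketched in plain Python. This is a minimal illustration rather than the authors' R pipeline (which used 'caret' and 'mRMRe'): the mRMR step here is a simplified greedy difference-form variant that uses absolute Pearson correlation as a stand-in for mutual information, the LASSO step is omitted, the optional Rad-score intercept is an assumption, and all function names are hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices class by class so both cohorts keep the severe/non-severe ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def drop_zero_variance(features):
    """features: dict of name -> values in the training cohort; keep non-constant ones."""
    return {name: vals for name, vals in features.items() if len(set(vals)) > 1}


def _abs_corr(x, y):
    """Absolute Pearson correlation (assumes neither input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def mrmr_select(features, target, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy with chosen features.
    |correlation| stands in for mutual information (a simplification)."""
    remaining = list(features)
    selected = []
    while remaining and len(selected) < k:
        def score(name):
            relevance = _abs_corr(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(_abs_corr(features[name], features[s]) for s in selected)
            return relevance - redundancy / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def rad_score(sample, coefficients, intercept=0.0):
    """Rad-score: selected feature values times their LASSO coefficients, summed.
    The intercept is an assumption; leave it at 0 for a pure weighted sum."""
    return intercept + sum(c * sample[name] for name, c in coefficients.items())
```

With this study's class counts (40 severe, 110 non-severe), a per-class 7:3 split puts 12 severe and 33 non-severe patients in the test cohort, consistent with the cohort sizes reported in the results.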

Clinical and morphologic risk features
In addition to the radiomic features, we collected radiological features and clinical data (collectively referred to below as "clinical features"), which were used to construct the clinical model. First, a two-sample test or the χ² test was used to assess whether each clinical feature differed significantly between the two groups. The significantly different features were then analyzed using univariate logistic regression, and the features with non-zero coefficients in the univariate analysis were entered into a backward stepwise multivariable logistic regression [16]. The resulting multivariable model served as the clinical model.
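For a categorical clinical feature, the χ² comparison between severe and non-severe groups reduces to a 2 × 2 contingency table. The sketch below computes Pearson's statistic without continuity correction and its 1-degree-of-freedom p-value via the identity P(χ²₁ > x) = erfc(√(x/2)). Note that R's chisq.test applies Yates' continuity correction to 2 × 2 tables by default, so its results can differ. The function name and counts are illustrative.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]],
    e.g. rows = severe / non-severe, columns = feature present / absent."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value
```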

Nomogram construction
After the Rad-score was calculated and clinical factors were selected using backward step-wise multivariate logistic regression, the Rad-score and the remaining clinical features were subsequently combined to construct the nomogram using multivariate logistic regression.
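A nomogram is a graphical form of the final logistic model. The sketch below shows the two computations behind it under stated assumptions: the predicted probability from the linear predictor, and a common point-scaling convention in which the predictor with the widest effect range spans 0 to 100 points. The coefficients, predictor names, and ranges in the usage block are hypothetical, not the fitted values from this study.

```python
import math


def predicted_risk(x, coefs, intercept):
    """P(severe) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    lp = intercept + sum(b * x[name] for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-lp))


def nomogram_points(x, coefs, ranges):
    """Points per predictor on a 0-100 scale.

    The predictor with the largest |b| * (max - min) spans the full 100 points and
    every other predictor uses the same scale factor, so for values inside their
    ranges the total points are a linear function of the linear predictor.
    """
    spans = {name: abs(b) * (ranges[name][1] - ranges[name][0]) for name, b in coefs.items()}
    scale = 100.0 / max(spans.values())
    points = {}
    for name, b in coefs.items():
        lo, hi = ranges[name]
        low_risk_end = lo if b >= 0 else hi  # zero points at the lowest-risk end
        points[name] = scale * abs(b) * abs(x[name] - low_risk_end)
    return points


if __name__ == "__main__":
    # Hypothetical coefficients and ranges, for illustration only.
    coefs = {"rad_score": 1.2, "age_decades": 0.5, "ct_score": 0.8}
    ranges = {"rad_score": (-3.0, 3.0), "age_decades": (2.0, 9.0), "ct_score": (1.0, 4.0)}
    patient = {"rad_score": 0.5, "age_decades": 6.0, "ct_score": 3.0}
    print(nomogram_points(patient, coefs, ranges))
    print(predicted_risk(patient, coefs, intercept=-6.0))
```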

Model validation
Receiver operating characteristic (ROC) analysis was performed to evaluate the performance of the radiomic model, the clinical model, and the nomogram. Accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for both cohorts. In addition, calibration curves were plotted to assess the agreement between predicted and observed event probabilities, and the Hosmer-Lemeshow test was applied to test the difference between them. Finally, decision curve analysis (DCA) was used to determine clinical utility. The flowchart of segmentation, feature extraction, and model building is depicted in Fig. 1.
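The validation metrics above can be computed from first principles, as in the sketch below. It is an illustration, not the 'caret', 'pROC', or 'rmda' implementations the study used. AUC is computed as the Mann-Whitney probability that a severe patient receives a higher score than a non-severe one (ties count one half), and net benefit at threshold probability p_t uses the standard decision-curve formula TP/n − (FP/n) · p_t/(1 − p_t). Function names are illustrative.

```python
def _ratio(num, den):
    return num / den if den else float("nan")  # undefined when the denominator is 0


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV from binary labels (1 = severe)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(y_true)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def auc_mann_whitney(y_true, scores):
    """AUC as P(score of a severe case > score of a non-severe case); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, risks, threshold):
    """Decision-curve net benefit of treating patients whose risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, r in zip(y_true, risks) if r >= threshold and t == 1)
    fp = sum(1 for t, r in zip(y_true, risks) if r >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)
```

A decision curve plots `net_benefit` over a grid of thresholds and compares it with the "treat all" and "treat none" (net benefit 0) strategies.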

Statistical analysis
All statistical analyses were performed using R software (version 3.6.1; R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL www.r-project.org). The inter-observer and intra-observer agreement tests were performed using the 'DescTools' package to calculate the intraclass correlation coefficient (ICC). Features with ICC > 0.75 were considered reproducible [7]. The 'caret' package was used to split the cohort, preprocess the features, and build the confusion matrix to obtain the accuracy, sensitivity, specificity, PPV, and NPV. mRMR feature selection was conducted using the 'mRMRe' package. The 'pROC' package was used to perform ROC analysis, while the 'ModelGood' and 'rmda' packages were used to perform calibration curve analysis and decision curve analysis, respectively. A two-sided p < 0.05 indicated statistical significance.

Clinical and radiological features
A total of 150 patients (68 men and 82 women) confirmed with COVID-19 were divided into a training cohort (n = 105; 28 severe and 77 non-severe; 43 men and 62 women) and a test cohort (n = 45; 12 severe and 33 non-severe; 25 men and 20 women). Comparisons of clinical, laboratory, and CT features in the training and test cohorts are shown in Tables 1 and 2, respectively.

Construction and validation of the developed nomogram
The final statistically significant clinical features and the Rad-score were used to construct the nomogram (Fig. 5a). The AUCs were 0.99 (0.98–1.00) and 0.98 (0.94–1.00) for the training and test cohorts, respectively. The accuracy, sensitivity, specificity, PPV, and NPV of the nomogram were 0.95, 1.00, 0.94, 0.85, and 1.00 in the training cohort and 0.89, 0.71, 1.00, 1.00, and 0.85 in the test cohort. The calibration curves in the training and test cohorts and a non-significant Hosmer-Lemeshow test (P > 0.05) both indicated good calibration of the constructed nomogram (Fig. 5b). In decision curve analysis (Fig. 5c), the nomogram provided a higher net benefit in clinical practice at risk thresholds ranging from 0 to 0.6 and from 0.8 to 1.
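The Hosmer-Lemeshow statistic cited above compares observed and expected event counts within groups of patients ranked by predicted risk. A minimal sketch, assuming equal-size groups of sorted predictions, predicted risks strictly between 0 and 1, and an even number of groups, so that the χ² tail probability with (groups − 2) degrees of freedom has the closed form exp(−x/2) Σ (x/2)^i / i!. R implementations may form groups differently (for example, by risk quantiles), so results can differ slightly. Function names are illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, risks, groups=10):
    """HL statistic over equal-size groups of sorted predicted risk; p-value with groups - 2 df."""
    pairs = sorted(zip(risks, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(r for r, _ in chunk)   # expected events in the group
        observed = sum(y for _, y in chunk)   # observed events in the group
        stat += (observed - expected) ** 2 / expected
        stat += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A large p-value (P > 0.05) means the test found no significant gap between predicted and observed event rates, which is why it is read as a calibration check rather than a measure of discrimination.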

Discussion
In the present study, we developed and validated a predictive nomogram incorporating radiomics signatures and clinical data to discriminate the severity of COVID-19 more precisely. The results showed that adding radiomic characteristics to the clinical model improved discrimination of severity in patients with COVID-19, with a higher AUC (from 0.895 to 0.952) and relatively higher sensitivity, specificity, PPV, and NPV in the test cohort. Moreover, the high NPV and specificity indicated that the developed model was reliable and could minimize the number of false-positive and false-negative patients, which is valuable in current clinical work [17].

Fig. 2 Thin-section CT images of severe and non-severe patients. a–c Images of a 25-year-old woman with non-severe COVID-19 pneumonia (CT score = 2) who presented with dry cough and fever. Axial, coronal, and sagittal images all show subpleural GGO (with a crazy-paving pattern) in the lower lobes of both lungs (white arrows). d–f Images of a 55-year-old woman with non-severe COVID-19 pneumonia (CT score = 1) who presented with fever. Axial, coronal, and sagittal images all show GGO in the anterior segment of the right upper lobe, containing an air bronchogram (white arrowheads) and vascular thickening (white arrow). g–i Images of a 52-year-old man with severe COVID-19 pneumonia (CT score = 4) with fever and comorbidities (diabetes, hypertension). Axial, coronal, and sagittal images show diffuse large regions of GGO with partial consolidation and interlobular septal thickening (white arrow). j–l Images of a 64-year-old man with severe COVID-19 pneumonia (CT score = 4) with fever and cough. Axial, coronal, and sagittal images show diffuse large regions of GGO accompanied by consolidation (black arrows) and a beaded air bronchogram (black arrowheads).
The relatively high PPV implied that the clinical-radiomics model helps to identify truly high-risk patients. Based on our findings, such high-risk patients could be recommended for more follow-up imaging to monitor changes in their condition. Furthermore, we constructed a clinical-radiomics nomogram as an individualized, visual model to improve the accuracy of severity assessment in confirmed COVID-19 patients. The radiomics nomogram showed favorable calibration and clinical benefit, as verified by calibration curves and DCA. Radiomics has been widely applied in tumor research because it can extract comprehensive quantitative data from an image. For example, radiomics can characterize tumor heterogeneity and has often been used to predict the prognosis of various cancers [18,19]. Hence, radiomics, a noninvasive, fast, reproducible, and low-cost technique, was applied to COVID-19 to identify disease severity, so as to avoid unnecessary treatment and decrease patients' […] analyzed 154 viral pneumonia patients (including 89 cases of COVID-19 and 65 cases of influenza pneumonia) to establish a CT-based radiomics model, and the results showed that the radiomics model performed satisfactorily in distinguishing COVID-19 from influenza pneumonia. Liu et al. [12] reported that a CT-based radiomics model could facilitate rapid and accurate differentiation of COVID-19 from non-COVID-19 pneumonia. Fang et al. [13] studied 136 patients with COVID-19 and developed a CT-based radiomics model for discriminating COVID-19 from other types of viral pneumonia, which performed well in predicting COVID-19 pneumonia. Homayounieh et al. [20] studied 315 patients with COVID-19 and developed a CT-based whole-lung radiomics model, which outperformed subjective assessment by radiologists in predicting the outcome and disease severity of patients with COVID-19.
Our study also showed that clinical data, including age and comorbidity, were associated with the severity of COVID-19. This is consistent with the observation that elderly patients with other diseases are more likely to develop severe pneumonia, in accordance with other studies [4], possibly because of the poorer immune function of elderly patients. Ruan et al. [3] verified that age and underlying diseases are predictors of a worse outcome in COVID-19. Wang et al. [21] also found that older age complicated by underlying diseases might be an important risk factor for severe COVID-19. Several researchers have also found that multiple laboratory indicators may be linked with disease severity [22]: cytokine storm, concurrent infections, and suppressed immune function may lead to an increased neutrophil ratio, decreased lymphocyte counts, and elevated C-reactive protein and procalcitonin. However, these were not significant factors in the present study, possibly because of differences in sample size and statistical methods. In addition, radiological findings, including CT score, number of lesions, and GGO with consolidation, were also independent indicators of COVID-19 severity. Compared with non-severe cases, severe cases were more likely to involve a wider extent of both lungs, reflected in more lesions and higher CT scores (more lung segments involved). Several studies suggested that lung lesions increase in number and worsen as the disease progresses [23], which is also in line with our results. Moreover, GGO with consolidation appeared more frequently in severe/critical patients, implying that the damaged alveoli are more extensively filled with inflammatory exudate, such as fibromyxoid exudate [24]. The remaining CT findings did not differ significantly between the severe and non-severe groups.
Our study has some limitations. First, as a retrospective study, it is subject to inevitable selection bias. Second, ROI delineation was manual, and irregular lesion shapes might introduce variation in manual segmentation; in addition, 2D ROIs were used because of time and technical constraints, and 3D ROIs should be used in future research. Third, the sample size of our cohort was relatively small. Fourth, the relationship between prognosis and clinical-radiological characteristics was not studied. Further studies with more cases and longer follow-up are therefore needed to verify our results.
In conclusion, we introduced a CT-based radiomics nomogram and evaluated the feasibility of combining a radiomics signature with clinical factors to discriminate the severity of COVID-19. We developed and validated a radiomics nomogram incorporating the radiomics signature, age, comorbidity, CT score, number of lesions, and GGO with consolidation, which improved diagnostic performance in the severity stratification of COVID-19 patients.