- Research article
- Open Access
Machine learning-based CT radiomics model distinguishes COVID-19 from non-COVID-19 pneumonia
BMC Infectious Diseases volume 21, Article number: 931 (2021)
Developing a machine learning-based CT radiomics model is critical for the accurate diagnosis of the rapidly spreading coronavirus disease 2019 (COVID-19).
In this retrospective study, a total of 326 chest CT exams from 134 patients (63 confirmed COVID-19 patients and 71 non-COVID-19 patients) were collected from January 20 to February 8, 2020. A semi-automatic segmentation procedure was used to delineate the volume of interest (VOI), and radiomic features were extracted. The Support Vector Machine (SVM) model was built on the combination of 4 groups of features, including radiomic features, traditional radiological features, quantifying features, and clinical features. Model parameters were tuned by repeated cross-validation, and the performance on the time-independent testing cohort was evaluated by the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity.
For the SVM model built on the combination of 4 groups of features (integrated model), the per-exam AUC was 0.925 (95% CI 0.856 to 0.994) for differentiating COVID-19 on the testing cohort, and the sensitivity and specificity were 0.816 (95% CI 0.651 to 0.917) and 0.923 (95% CI 0.621 to 0.996), respectively. As for the SVM models built on radiomic features, radiological features, quantifying features, and clinical features, individually, the AUC on the testing cohort reached 0.765, 0.818, 0.607, and 0.739, respectively, significantly lower than the integrated model, except for the radiomic model.
The machine learning-based CT radiomics models may accurately classify COVID-19, helping clinicians and radiologists to identify COVID-19 positive cases.
Coronavirus disease 2019 (COVID-19) has spread throughout the world widely and rapidly since late December 2019 [1, 2]. The newly emerging disease is highly contagious and may cause severe acute respiratory distress or multiple organ failure in severe cases [3,4,5,6]. The World Health Organization (WHO) declared the outbreak of COVID-19 as a “public health emergency of international concern” (PHEIC) on January 30, 2020.
At present, the gold standard for the diagnosis of COVID-19 is reverse-transcription polymerase chain reaction (RT-PCR). However, the high false-negative rate and the shortage of RT-PCR assays in the early stage of the outbreak limited the early detection and treatment of presumptive patients [8, 9], which accelerated the spread of COVID-19. Therefore, fast diagnosis is important for controlling the spread of COVID-19. Recent studies have demonstrated that computed tomography (CT), as a non-invasive imaging approach, is of great value in detecting lung lesions in patients with COVID-19 infection [2, 10]. Besides, CT had much higher sensitivity than initial RT-PCR in diagnosing COVID-19 [8, 9]. Consequently, CT could be used as an effective tool for early detection and diagnosis of COVID-19. However, COVID-19 may share CT imaging features with other types of pneumonia, which makes the two hard to differentiate. Although measures have been taken to control the spread of the disease, there had been 176,531,710 confirmed cases of COVID-19 globally, including 3,826,181 deaths, as of 11:32 am CEST, 17 June 2021. Given the pandemic, accurate and fast diagnosis of COVID-19 is vital to isolate infected patients and slow down the spread of this disease.
Current studies have demonstrated that artificial intelligence can distinguish COVID-19 from other pneumonia [11, 12], improve radiologists’ performance in distinguishing COVID-19 from non-COVID-19 pneumonia on chest CT, and provide clinical prognosis with good accuracy, which can help clinicians adjust their clinical management in a timely manner and allocate resources appropriately [13,14,15,16,17,18,19]. However, CT manifestations of COVID-19 resemble those of other types of viral pneumonia, such as severe acute respiratory syndrome coronavirus and Middle East respiratory syndrome coronavirus. Additionally, the non-COVID-19 cases included as comparison groups in these studies were collected long before the COVID-19 outbreak. Since the CT manifestations of common pneumonia resemble those of COVID-19 pneumonia, the most difficult situation in clinical diagnosis and treatment is to identify other types of pneumonia that occur in the same period as the outbreak of COVID-19.
In recent years, much attention has been paid to radiomics in diagnosing diseases and evaluating treatment outcomes [21, 22]. Specifically, radiomics is of great value in medical imaging because of its ability to extract high-throughput quantitative descriptors from routine computed tomography (CT) studies. Radiomics has been applied to many areas of cancer research, such as tumor detection, preoperative prediction of lymph node metastasis, and therapeutic response assessment [21, 23, 24]. Recently, radiomics has proved helpful in COVID-19 screening, diagnosis, prediction of the length of hospital stay, and assessment of the imaging characteristics and risk factors associated with adverse composite endpoints in patients with COVID-19 pneumonia [25,26,27,28]. Radiomics is also useful in the identification of COVID-19 [29, 30], in differentiating clinical types of COVID-19, and in the prediction of poor prognostic outcomes in COVID-19. Recently, CT radiomics was found to diagnose COVID-19 pneumonia more accurately than the COVID-19 Reporting and Data System. However, these studies were limited by small sample sizes. For example, the study of Qi et al. included a total of 31 patients. Some did not extract high-throughput imaging features. Besides, few studies have performed a holistic analysis of different radiomics features regarding COVID-19. The purpose of this study was to develop and test machine learning-based CT radiomics models including different radiomics features for the classification of COVID-19.
This retrospective study was waived by the ethics committees of the Hainan General hospital. In total, 74 patients confirmed with COVID-19 infection from January 20 to February 8, 2020, and 82 patients with other types of pneumonia in the corresponding period were collected. In the COVID-19 dataset, 63 patients who met the following inclusion criteria were finally included: (i) RT-PCR confirmed COVID-19; (ii) non-contrast CT at diagnosis time; (iii) positive CT findings. 71 patients with non-COVID-19 pneumonia who met the following inclusion criteria were included: (i) RT-PCR excluded COVID-19; (ii) non-contrast CT at diagnosis time; (iii) pneumonia highly suspected with COVID-19 by CT. The exclusion criteria were as follows: (1) contrast CT exams; (2) exams without slice thickness of 1 mm; (3) negative CT findings. Finally, 326 chest CT exams from 134 patients were included in this study (Fig. 1). The average age was 47.0 ± 15.4 years. Specifically, we included 244 (75%) exams for COVID-19 and 82 (25%) for non-COVID-19 pneumonia in the study.
All the patients with COVID-19 were confirmed as positive by RT-PCR, and their data were acquired from January 21, 2020, to February 8, 2020. The most common symptoms were fever (82%) and cough (77%). Each patient had one or multiple CT scans during the progression of the disease. The follow-up study was continued until February 19, 2020.
Patients with other types of pneumonia over the corresponding period, from January 23 to March 16, 2020, were selected from the same hospital. For 82 patients with negative RT-PCR results, pneumonia was diagnosed according to the Infectious Diseases Society of America/American Thoracic Society (IDSA/ATS) guidelines. Patients were diagnosed with pneumonia if they had at least one of the following clinical symptoms: cough, sputum, fever, dyspnea, or pleuritic chest pain, plus at least one finding of coarse crackles on auscultation or elevated inflammatory biomarkers, in addition to a new pulmonary infiltration on chest CT. The admission distribution of the patients with other types of pneumonia was: outpatient (86%, 61 of 71), inpatient (14%, 10 of 71). None received laboratory confirmation of the etiology because of limited medical resources.
CT examinations were performed on the NeuViz 128 CT (Neusoft, China) with automatic tube current (300 mA–496 mA), tube voltage = 120 kV. The pitch was set at 1.5 and breath-hold at full inspiration. The slice-thickness of each CT scan was 1 mm. The reconstruction matrix was 512 × 512 pixels. The image enhancement factor was 1.0. The window width was 1000, and the window level was −700.
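As a brief illustration of what the reported display window means in Hounsfield units (HU), the sketch below converts a window level and width into lower and upper HU limits. The function names `window_bounds` and `apply_window` and the rescaling to [0, 1] are assumptions made for this example; the article reports only the width and level.

```python
import numpy as np

def window_bounds(level, width):
    """Return the (low, high) HU limits of a display window."""
    half = width / 2.0
    return level - half, level + half

def apply_window(hu, level=-700.0, width=1000.0):
    """Clip HU values to the window and rescale them linearly to [0, 1]."""
    low, high = window_bounds(level, width)
    clipped = np.clip(np.asarray(hu, dtype=float), low, high)
    return (clipped - low) / (high - low)

# Window reported in the text: width 1000, level -700.
low, high = window_bounds(-700.0, 1000.0)

# Three illustrative HU values: below, at the centre of, and above the window.
scaled = apply_window([-1500, -700, 0])
```

With these settings the window spans -1200 HU to -200 HU, so values outside that range are saturated on display.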
All subjects’ demographic characteristics and clinical data were retrospectively reviewed and collected, including age, gender, exposure history, diabetes, hypertension, chronic obstructive pulmonary disease (COPD), chronic liver disease, chronic kidney disease, cancer, cardiovascular disease, fever, cough, myalgia, fatigue, headache, nausea, diarrhea, bellyache, dyspnea, other symptoms, white blood cell count, number of neutrophils, lymphocyte count, hemoglobin, and platelet count. The demographic statistics of the patients are summarized in Table 1. In the training cohort, COVID-19 patients were significantly older, had more exposure history, more cough, myalgia, fatigue, headache, nausea, and diarrhea symptoms, and lower lymphocyte and platelet counts than patients with other types of pneumonia. In both the training and testing cohorts, COVID-19 patients had significantly lower white blood cell and neutrophil counts than patients with other types of pneumonia.
The flow chart of data collection, ROI and feature annotation, radiomics and quantitative feature extraction, model building, and evaluation is shown in Fig. 2.
Lesion segmentation and radiological evaluation
All the CT scans were split into a training and a testing cohort with a ratio of 85:15 at the patient level according to the visiting time of the hospital. Feature selection and model building were performed on the training cohort, and the testing cohort was not used for the training procedure.
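A patient-level, time-ordered split of this kind can be sketched as follows. This is a minimal illustration, assuming that the most recently seen patients form the testing cohort; the article does not state which end of the timeline was held out, and the helper name `split_by_visit_time`, the patient IDs, and the dates are all hypothetical.

```python
from datetime import date, timedelta

def split_by_visit_time(first_visit, test_fraction=0.15):
    """Split patient IDs chronologically into (train_ids, test_ids).

    first_visit maps a patient ID to the date of that patient's first exam.
    Patients are ordered by date and the latest ones go to the test cohort,
    so every exam of a given patient ends up in exactly one cohort.
    """
    ordered = sorted(first_visit, key=lambda pid: first_visit[pid])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]

# Hypothetical patients with consecutive first-visit dates.
visits = {f"P{i:02d}": date(2020, 1, 20) + timedelta(days=i) for i in range(20)}
train_ids, test_ids = split_by_visit_time(visits)

# Exams inherit the cohort of their patient (hypothetical exam records).
exams = [("P00", "scan-a"), ("P00", "scan-b"), ("P19", "scan-c")]
train_set = set(train_ids)
train_exams = [exam for exam in exams if exam[0] in train_set]
test_exams = [exam for exam in exams if exam[0] not in train_set]
```

Splitting on patients rather than on exams prevents repeated scans of the same person from appearing on both sides of the split.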
The pneumonia lesions were segmented semi-automatically. First, the anonymized thin-slice DICOM format non-enhanced CT images were imported into an AI pneumonia assessment system, on which the pneumonia lesions were automatically detected and delineated. On the assessment platform, an MVP-Net (Multi-View FPN with Position-aware attention), which was trained on the NIH DeepLesion dataset and had achieved state-of-the-art performance, was used to detect abnormal patterns and classify them into consolidation and ground-glass opacity. Then a 3D U-Net model trained with a local dataset of over 10,000 lung CT scans was used to segment the detected consolidation and ground-glass opacity lesions. Besides, pulmonary lobes were segmented by a pre-trained lobe segmentation model [36, 37]. Subsequently, fifteen radiologists with more than 5 years of experience in chest imaging, blinded to the pathological report and other clinical information, refined the segmentation results (Volume of Interest, VOI) and evaluated the radiological characteristics. Each series was refined and evaluated by one of the fifteen radiologists. The segmentations and radiological characteristics were confirmed by two radiologists (F.C. and Y.C.) with 16 and more than 30 years of experience, respectively.
The 7 radiological characteristics included ground-glass opacity, crazy paving pattern, halo sign, reversed halo sign, vascular perforating in the lesion, subpleural line, and lesion locations (Fig. 3). For each series, the frequency of the radiological characteristics occurring was used for modeling.
Quantifying CT characteristics and radiomics features
The segmentation results were used to extract quantifying CT characteristics and radiomics features.
There was a total of 33 quantitative characteristics. Apart from the segmentation results, the AI pneumonia assessment system also provided the number of lesions showing bulla, emphysema, pleural thickening, reticular, and stripe patterns, which were included as quantitative characteristics. Similar to a previous study, the mean and standard deviation of the CT values of the consolidation lesions, ground-glass lesions, and both types of lesions were calculated from the segmentation. In addition, the volumes of the consolidation lesions, the ground-glass lesions, and their sum were calculated, together with their ratios: the volume of the consolidation lesions, of the ground-glass lesions, and of both types of lesions, each relative to the volume of the entire lung and of each of the five pulmonary lobes.
Before radiomics features were extracted, the intensities were discretized with a fixed bin width of 25, and the voxel spacing of the images was resampled to 1.0 mm × 1.0 mm × 1.0 mm by the BSpline algorithm. Apart from the original images, wavelet filters and Laplacian of Gaussian filters were applied to generate several filtered images. A total of 1218 radiomics features were extracted from the manually confirmed 3D VOIs of the original images and the filtered images by PyRadiomics V2.1.0, including (1) 252 First-order features; (2) 14 Shape-based features; (3) 308 Gray Level Co-occurrence Matrix (GLCM) Features; (4) 224 Gray Level Size Zone Matrix (GLSZM) Features; (5) 224 Gray Level Run Length Matrix (GLRLM) Features; (6) 196 Gray Level Dependence Matrix (GLDM) Features. The pre-processing methods and radiomic feature descriptions are detailed in Additional file 1 (Information 1.1 and 1.2).
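The intensity and volume descriptors in this paragraph can be illustrated with a small NumPy sketch. It assumes boolean masks for a lesion type and for a reference region (the whole lung or one lobe) on the same voxel grid as the CT volume; the function name `lesion_stats`, the dictionary keys, and the toy arrays are hypothetical and do not come from the study's software.

```python
import numpy as np

def lesion_stats(ct_hu, lesion_mask, region_mask, voxel_volume_mm3=1.0):
    """Summarise one lesion type inside one non-empty reference region.

    Returns the lesion volume, its ratio to the region volume, and the mean
    and (population) standard deviation of the CT values inside the lesion.
    """
    ct_hu = np.asarray(ct_hu, dtype=float)
    region = np.asarray(region_mask, dtype=bool)
    inside = np.asarray(lesion_mask, dtype=bool) & region
    values = ct_hu[inside]
    return {
        "volume_mm3": float(inside.sum() * voxel_volume_mm3),
        "volume_ratio": float(inside.sum() / region.sum()),
        "mean_hu": float(values.mean()) if values.size else float("nan"),
        "std_hu": float(values.std()) if values.size else float("nan"),
    }

# Toy 4 x 4 x 4 volume: aerated lung at -850 HU with a 2 x 2 x 2 lesion.
ct = np.full((4, 4, 4), -850.0)
ct[0:2, 0:2, 0] = -600.0
ct[0:2, 0:2, 1] = -400.0
lesion = np.zeros((4, 4, 4), dtype=bool)
lesion[0:2, 0:2, 0:2] = True
lung = np.ones((4, 4, 4), dtype=bool)

stats = lesion_stats(ct, lesion, lung)
```

Calling the same function with a lobe mask in place of the whole-lung mask would give the per-lobe ratios described in the text.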
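To make the discretization step concrete, the sketch below applies fixed-bin-width binning to a few intensity values, following the formula given in the PyRadiomics documentation (bin index = floor(x / W) − floor(min(x) / W) + 1). It is an independent NumPy illustration rather than a call into PyRadiomics, and the sample HU values are invented. The last lines simply check that the six feature-class counts listed above add up to the stated total.

```python
import numpy as np

def discretize_fixed_bin_width(values, bin_width=25.0):
    """Map intensities to 1-based bin indices using a fixed bin width."""
    values = np.asarray(values, dtype=float)
    lowest_edge = np.floor(values.min() / bin_width)
    bins = np.floor(values / bin_width) - lowest_edge + 1
    return bins.astype(int)

# Hypothetical HU values sampled from a lesion.
hu = np.array([-710.0, -700.0, -676.0, -675.0, -600.0])
bins = discretize_fixed_bin_width(hu, bin_width=25.0)

# Feature counts per class as reported in the text.
feature_counts = {
    "first_order": 252,
    "shape": 14,
    "glcm": 308,
    "glszm": 224,
    "glrlm": 224,
    "gldm": 196,
}
total_features = sum(feature_counts.values())
```

A fixed bin width keeps the meaning of one gray level constant across exams, at the cost of a varying number of bins per lesion.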
Development of predictive models
Four groups of features were included in the model building: radiomics features, radiological features, quantifying features, and clinical features. The Support Vector Machine (SVM) models with the radial basis function kernel were built on the 4 groups of features individually and on the combination of them.
Before model building, all numerical features were normalized by the z-score method, and the categorical features were encoded by the one-hot encoder. To avoid overfitting, feature selection methods were used to reduce the number of features. The optimal parameters of the combination of the feature selection methods and the model were found by grid searching with a ten-run fivefold cross-validation procedure on the training cohort. After they were determined, the model was built using the entire training cohort and the performance on the testing cohort was evaluated. After the cross-validation procedure, the threshold that maximized the Youden Index on the validation cohort was used to cut off the discriminative score to differentiate the COVID-19 from other pneumonia.
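The tuning procedure described here can be sketched with scikit-learn as below. This is an illustration under several assumptions: the data are synthetic, the parameter grid is invented, only numerical features are used (so no one-hot encoding step appears), and `SelectKBest` with an ANOVA F-score is a stand-in for the study's feature selection, because mRMR is not part of scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the training cohort (not study data).
X, y = make_classification(n_samples=120, n_features=30, n_informative=5,
                           random_state=0)

# Scaling and selection live inside the pipeline so that they are re-fitted
# on the training part of every fold, which avoids information leakage.
pipeline = Pipeline([
    ("scale", StandardScaler()),          # z-score normalisation
    ("select", SelectKBest(f_classif)),   # stand-in for mRMR (assumption)
    ("svm", SVC(kernel="rbf")),
])

# Hypothetical grid: number of kept features and SVM hyperparameters.
param_grid = {
    "select__k": [5, 10],
    "svm__C": [1, 10],
    "svm__gamma": ["scale", 0.01],
}

# Ten runs of fivefold cross-validation, i.e. 50 train/validation splits.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)

search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv,
                      refit=True)
search.fit(X, y)
best_params = search.best_params_
best_cv_auc = search.best_score_
```

With `refit=True`, the best configuration is retrained on the entire training set, which mirrors the final model-building step in the text; the held-out testing cohort would only be scored afterwards.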
Features were selected by a two-step method. (1) The Mann–Whitney U test was used, and p values were corrected by the Benjamini–Hochberg method. The features that were significantly different (p < 0.05) between the COVID-19 cohort and the non-COVID-19 cohort were preserved. (2) The minimum-redundancy maximum-relevancy (mRMR) method was used, and the number of selected features was determined by the cross-validation procedure. For the radiological features, the mRMR step was omitted because there were only 7 radiological features.
The discrimination performance of the model was evaluated by the area under the receiver operator characteristic curve (AUC), accuracy (ACC), sensitivity, and specificity. The AUCs of the SVM model built on the combined features and those of the models built on each individual feature group were compared by the DeLong test. Because the SVM model with the radial basis function kernel is nonlinear, the feature importance cannot be derived directly. Permutation importance was therefore used to evaluate the feature importance, with the AUC measuring the difference between the baseline model and the model with the permuted feature. The consistency of the traditional radiological features was evaluated by the Kappa coefficient, and the Dice coefficient between the corrected segmentation and the AI segmentation results was used to evaluate the reproducibility of the radiomic features. These statistical analyses were performed in the R software environment (version 3.6.0; https://www.r-project.org/). Feature selection and model building procedures were performed with the scikit-learn package.
Table 1 shows the study population characteristics for the training and testing cohorts. Age, exposure history, cough, myalgia, fatigue, headache, and diarrhea were significantly different between COVID-19 and other types of pneumonia in the training cohort (p < 0.05). Regarding the laboratory results, the white blood cell count and the number of neutrophils were significantly lower in the COVID-19 group than in the negative group (p < 0.05) for both the training cohort and the testing cohort. In addition, the lymphocyte and platelet counts were significantly lower in the COVID-19 group than in the other types of pneumonia group (p < 0.05).
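The first selection step can be sketched as follows: one Mann–Whitney U test per feature, then a Benjamini–Hochberg adjustment of the resulting p values. The helper names and the synthetic data are hypothetical, and the mRMR step is not shown.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p value downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

def select_features(X, y, alpha=0.05):
    """Indices of features whose adjusted p value is below alpha."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p = np.array([
        mannwhitneyu(X[y == 1, j], X[y == 0, j],
                     alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ])
    return np.flatnonzero(benjamini_hochberg(p) < alpha)

# Synthetic data: only feature 0 differs between the two groups.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 4))
X[y == 1, 0] += 2.0

selected = select_features(X, y)
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Controlling the false discovery rate matters here because more than a thousand features are tested at once, so many would pass an uncorrected 0.05 threshold by chance alone.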
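Permutation importance with AUC as the score can be sketched with scikit-learn's `permutation_importance`. Note the assumption: this function shuffles one feature at a time on held-out data and scores the already fitted model, which may differ in detail from the procedure used in the study. The data are synthetic, with only the first feature carrying signal.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.svm import SVC

# Synthetic data: the label depends on feature 0 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

# Fit on the first 150 samples, evaluate importance on the remaining 50.
model = SVC(kernel="rbf").fit(X[:150], y[:150])

result = permutation_importance(
    model, X[150:], y[150:],
    scoring="roc_auc",   # importance = drop in AUC after shuffling a feature
    n_repeats=20,
    random_state=0,
)
importances = result.importances_mean
```

A feature whose shuffling barely changes the AUC contributes little to the fitted model, whereas a large drop marks a feature the model relies on.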
Evaluation of the model performance
A total of 1218 radiomic features were extracted from each patient; the correlation cluster map is shown in Fig. 4. The cluster map shows that most of the radiomic features were correlated and redundant. The Dice coefficient between the corrected segmentation and the AI segmentation result reached 0.82 ± 0.14, indicating satisfactory performance of the AI segmentation and robustness of the radiomic feature extraction. For the ground-glass opacity, crazy paving pattern, halo sign, reversed halo sign, vascular perforating in the lesion, subpleural line, and lesion locations, the Kappa values were 0.728, 0.733, 0.728, 0.701, 0.841, 0.866, and 0.818, respectively.
The SVM model built on the combination of 4 groups of features reached an AUC of 0.984 (0.971 to 0.997), 0.893 (0.841 to 0.946), and 0.925 (0.856 to 0.994) on the training, cross-validation, and testing cohorts, respectively. Its sensitivity and specificity on the testing cohort were 0.816 and 0.923. For the SVM models built on radiomic features, radiological features, quantifying features, and clinical features individually, the AUC on the testing cohort reached 0.765 (95% CI 0.585 to 0.946), 0.818 (95% CI 0.698 to 0.938), 0.607 (95% CI 0.414 to 0.8), and 0.739 (95% CI 0.58 to 0.898), respectively, significantly lower than the integrated model, except for the radiomic model. The details of the performance are shown in Table 2, and the ROC curves of the SVM models on the time-independent testing cohort are shown in Fig. 5.
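The two agreement measures reported in this paragraph can be written out in a few lines of NumPy. This is a sketch on toy inputs: the masks and ratings are invented, and the article does not specify which pair of readings the Kappa values compare, so the two rating vectors simply stand for any two sets of labels on the same series.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks (1.0 if both are empty)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)

def cohen_kappa(ratings_1, ratings_2):
    """Cohen's kappa for two raters labelling the same items.

    Undefined (division by zero) when chance agreement equals 1.
    """
    r1 = np.asarray(ratings_1)
    r2 = np.asarray(ratings_2)
    observed = np.mean(r1 == r2)
    expected = sum(np.mean(r1 == k) * np.mean(r2 == k)
                   for k in np.union1d(r1, r2))
    return float((observed - expected) / (1.0 - expected))

# Toy masks: automatic segmentation versus a corrected one.
ai_mask = [1, 1, 1, 0, 0, 0]
corrected_mask = [1, 1, 0, 0, 0, 0]
dice = dice_coefficient(ai_mask, corrected_mask)

# Toy presence/absence ratings of one sign on eight series.
reader_a = [1, 1, 0, 0, 1, 0, 1, 0]
reader_b = [1, 1, 0, 0, 1, 0, 0, 1]
kappa = cohen_kappa(reader_a, reader_b)
```

Dice measures spatial overlap of segmentations, while Kappa corrects raw label agreement for the agreement expected by chance.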
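For readers who want to see how sensitivity and specificity follow from a confusion matrix, a short sketch is given below. The counts are hypothetical: they were chosen only because they round to the proportions reported above, and the article text itself does not give the per-exam confusion matrix of the testing cohort.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical counts (assumption, not taken from the article).
metrics = classification_metrics(tp=31, fn=7, tn=12, fp=1)
sensitivity = round(metrics["sensitivity"], 3)
specificity = round(metrics["specificity"], 3)
```

With a small number of negative exams, a single misclassification moves the specificity by several percentage points, which is consistent with the wide confidence interval reported for it.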
There were 30 features involved in the integrated SVM model building, including 14 radiomic features, 9 clinical features, 4 quantifying features, and 3 radiological features. The feature importance of these features was shown in Fig. 6.
Figure 7 showed the decision function value distribution of the non-COVID-19 pneumonia and COVID-19 in the test cohort. The function values were proportional to the distance of the patient to the separating hyperplane, thus indicating the integrated model’s confidence in the result of classification. The separating hyperplane was adjusted to maximize the Youden index on the cross-validation cohort. From the CT images, we could see that when the lesions of COVID-19 were at the absorption stage, they became small, and thus it was difficult to differentiate from non-COVID-19 pneumonia. On the contrary, when the lesions of COVID-19 were relatively big, it was easy to differentiate it from non-COVID-19 pneumonia with typical lesion locations and CT manifestation.
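Choosing the cut-off that maximizes the Youden index (sensitivity + specificity − 1) can be sketched as follows. The decision scores and labels are invented for illustration, and the helper name `youden_threshold` is hypothetical; in the study the cut-off was chosen on the cross-validation results rather than on data like these.

```python
import numpy as np

def youden_threshold(scores, labels):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A case is called positive when its score is >= threshold. Every observed
    score is tried as a candidate threshold; ties keep the lowest threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_threshold, best_j = None, -1.0
    for t in np.unique(scores):
        predicted = scores >= t
        sensitivity = np.mean(predicted[labels == 1])
        specificity = np.mean(~predicted[labels == 0])
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_threshold, best_j = float(t), float(j)
    return best_threshold, best_j

# Hypothetical SVM decision-function values and true labels (1 = COVID-19).
scores = [-1.2, -0.8, -0.3, 0.1, 0.4, 0.9, 1.5]
labels = [0, 0, 0, 1, 0, 1, 1]
threshold, j_stat = youden_threshold(scores, labels)
```

Moving the cut-off away from zero in this way shifts the decision boundary without retraining the SVM, trading sensitivity against specificity.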
In this study, we developed and tested a machine learning-based CT radiomics model for differentiating COVID-19 from non-COVID-19 pneumonia on chest CT images. CT radiomics features of lesions were extracted, and the model showed good performance on the training cohort, in cross-validation, and on the testing cohort. On the testing dataset, the model achieved a high sensitivity of 0.816 (95% CI 0.651 to 0.917) and a high specificity of 0.923 (95% CI 0.621 to 0.996) in diagnosing COVID-19. To the best of our knowledge, this is the first study that uses comprehensive information by including both imaging and clinical data in the classification of COVID-19.
Since the outbreak of COVID-19, clinical characteristics have been regarded as important clues for diagnosing COVID-19. However, the value of clinical characteristics in the diagnosis of COVID-19 has not yet been fully evaluated. Our present study revealed that clinical features were valuable, but not the only strong clue for diagnosing COVID-19. This result is of great significance since the number of confirmed COVID-19 cases is still rising all over the world. We included both COVID-19 patients without a history of exposure and non-COVID-19 patients with a history of exposure in the current study. Exposure history has been regarded as an important indicator in diagnosing COVID-19. Besides, our study demonstrated that, compared with non-COVID-19 patients, COVID-19 patients had significantly lower leukocyte, neutrophil, lymphocyte, and platelet counts. A possible explanation is that COVID-19 is a viral infection, whereas non-COVID-19 patients were likely to have bacterial infections with high leukocyte counts. This is consistent with a previous study in which normal or abnormally low leukocyte and lymphocyte counts were found to be significant indicators for diagnosing COVID-19.
CT manifestations of COVID-19 have been deemed indispensable for the clinical diagnosis of COVID-19. However, few studies have elucidated the role of CT features in diagnosing COVID-19. Therefore, we assessed the diagnostic value of radiological characteristics including ground-glass opacity, crazy paving pattern, halo sign, reversed halo sign, vascular perforating in the lesion, subpleural line, and lesion locations in our study. Among these features, peripheral lesion location seemed to be the most important for the classification. This was in line with a previous study in which the lesions of COVID-19 were distributed mainly in the subpleural area. We found that when only the radiological features were included, the model performed well, with AUCs for the training, validation, and testing cohorts of 92.2%, 86.9%, and 81.8%, respectively. This result was in accord with a previous study, in which the model was built on the basis of clinical data, laboratory results, and CT features. Our study indicated that CT is valuable for diagnosing COVID-19.
The encouraging diagnostic performance of the machine learning-based CT radiomics model indicates that radiomics might be particularly helpful for the detection of COVID-19, as the AUCs of the other models in the testing dataset were significantly lower than that of the integrated model, except for the radiomics model. Radiomics features in our model included first-order features, shape-based features, and the distribution, correlation, and variance in gray level intensities. These radiomics features described the relationship between voxels and contained quantitative information on the spatial heterogeneity of pneumonia lesions. Importantly, when only radiomics features were included, the model performed well, with AUCs for the training, validation, and testing cohorts of 96.2%, 82.8%, and 76.5%, respectively. Similarly, Fang et al. found that the radiomics model outperformed the clinical model in the prediction/diagnosis of COVID-19 pneumonia. Using a deep learning classifier, the multi-layer perceptron (DL-MLP), Zhang et al. found that DL-MLP achieved optimal performance with AUCs of 0.922 (95% CI 0.856–0.988) and 0.959 (95% CI 0.910–1.000), the same sensitivity of 0.879, and specificities of 0.900 and 0.887 on internal and external testing datasets, indicating that DL-MLP may be helpful in efficiently screening COVID-19 patients. Besides, Tan et al. demonstrated that automatic machine learning based on radiomics of the non-focus area in the first chest CT could be used to distinguish different clinical types of COVID-19. To summarize, radiomics has proved useful in the effort to control the spread of COVID-19. Importantly, by combining the radiological features, quantifying features, and clinical characteristics, the performance of the model was significantly improved. Its AUCs on the training, validation, and testing cohorts were all over 89%, indicating that the models have the potential to be applied in a general situation.
Using deep learning techniques, a previous study was able to distinguish COVID-19 from community-acquired pneumonia. In contrast, we were able to collect patients who were diagnosed with other types of pneumonia on CT during the corresponding period. More importantly, these cases of pneumonia were highly suspected of being COVID-19 in consideration of the epidemic, CT findings, and laboratory results.
A majority of the countries all over the world have been affected by COVID-19. Early diagnosis is important for preventing the spread of the disease. Though RT-PCR is considered the gold standard for the diagnosis of COVID-19, CT is used as an effective supplementary tool for the diagnosis of COVID-19 [8, 9]. Our study revealed that the machine learning-based CT radiomics model combining radiomics, subjective characteristics, quantitative characteristics, and clinical characteristics achieved good performance for the diagnosis of COVID-19 and for differentiating it from non-COVID-19 pneumonia. This is in line with the idea that adding clinical information could significantly improve the performance of radiomics [44, 45]. Shiri et al. revealed that the combination of radiomic features with clinical and radiological data could effectively predict survival in COVID-19 patients. Similarly, Chao et al. demonstrated that the integration of both imaging and non-imaging data significantly improved the prediction of the need for ICU admission in patients with COVID-19 pneumonia. All in all, holistic information is effective in the diagnosis of COVID-19.
The study has several limitations. First, the sample size was relatively small. A larger prospective multicenter cohort is needed to test the effectiveness of machine learning-based CT radiomics models. Second, patients with non-COVID-19 pneumonia did not receive laboratory confirmation of the etiology because of limited medical resources during the COVID-19 outbreak. Third, we did not use quantitative characteristics to evaluate the evolution of the disease. Future work should include quantitative information regarding disease progression. Regarding the field of radiomics, it remains unclear which algorithms, classifiers, and feature selectors would achieve optimal results [46,47,48]. In the present study, we integrated different biological and clinical information together with radiomics, and better diagnostic performance was achieved. This was in line with the study of Parmar et al., who found that a comparative investigation could be helpful in the identification of optimal and reliable machine learning methods for radiomics-based prognostic analyses. Future studies should integrate different biological and clinical information together with radiomics.
In conclusion, a machine learning-based CT radiomics model is valuable for accurately classifying COVID-19, which would be helpful for clinicians and radiologists to identify COVID-19 patients.
Availability of data and materials
The datasets used during the current study are available from the corresponding author on reasonable request.
Abbreviations
COVID-19: Coronavirus disease 2019
ROI: Region of interest
SVM: Support Vector Machine
WHO: World Health Organization
PHEIC: Public health emergency of international concern
RT-PCR: Reverse-transcription polymerase chain reaction
GLSZM: Gray Level Size Zone Matrix Features
GLRLM: Gray Level Run Length Matrix Features
GLDM: Gray Level Dependence Matrix Features
AUC: Area under the receiver operator characteristic curve
Wang D, Hu B, Hu C, Zhu F, Liu X, Zhang J, Wang B, Xiang H, Cheng Z, Xiong Y, et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China. JAMA. 2020;323:1061–1065.
Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, Zhang L, Fan G, Xu J, Gu X, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395:497–506.
Chen N, Zhou M, Dong X, Qu J, Gong F, Han Y, Qiu Y, Wang J, Liu Y, Wei Y, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet. 2020;395:507–13.
Phan LT, Nguyen TV, Luong QC, Nguyen TV, Nguyen HT, Le HQ, Nguyen TT, Cao TM, Pham QD. Importation and human-to-human transmission of a novel coronavirus in Vietnam. N Engl J Med. 2020;382:872–4.
Liu YC, Liao CH, Chang CF, Chou CC, Lin YR. A locally transmitted case of SARS-CoV-2 infection in Taiwan. N Engl J Med. 2020;382:1070–2.
Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, Ren R, Leung KSM, Lau EHY, Wong JY, et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N Engl J Med. 2020. https://doi.org/10.1056/NEJMoa2001316.
Chan JF, Yuan S, Kok KH, To KK, Chu H, Yang J, Xing F, Liu J, Yip CC, Poon RW, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. Lancet. 2020;395:514–523.
Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, Tao Q, Sun Z, Xia L. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology. 2020. https://doi.org/10.1148/radiol.2020200642.
Fang Y, Zhang H, Xie J, Lin M, Ying L, Pang P, Ji W. Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR. Radiology. 2020. https://doi.org/10.1148/radiol.2020200432.
Shi H, Han X, Jiang N, Cao Y, Alwalid O, Gu J, Fan Y, Zheng C. Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study. Lancet Infect Dis. 2020;20:425–434.
Li L, Qin L, Xu Z, Yin Y, Wang X, Kong B, Bai J, Lu Y, Fang Z, Song Q, et al. Using Artificial Intelligence to Detect COVID-19 and Community-acquired Pneumonia Based on Pulmonary CT: Evaluation of the Diagnostic Accuracy. Radiology. 2020. https://doi.org/10.1148/radiol.2020200905.
Wang S, Kang B, Ma J, Zeng X, Xiao M, Guo J, Cai M, Yang J, Li Y, Meng X, Xu B. A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19). medRxiv 2020.
Bai HX, Wang R, Xiong Z, Hsieh B, Chang K, Halsey K, Tran TML, Choi JW, Wang DC, Shi LB, et al. AI augmentation of radiologist performance in distinguishing COVID-19 from pneumonia of other etiology on chest CT. Radiology 2020; 201491.
Dong D, Tang Z, Wang S, Hui H, Gong L, Lu Y, Xue Z, Liao H, Chen F, Yang F, et al. The role of imaging in the detection and management of COVID-19: a review. IEEE Rev Biomed Eng. 2020;14:16–29.
Wang S, Zha Y, Li W, Wu Q, Li X, Niu M, Wang M, Qiu X, Li H, He Y, et al. A fully automatic deep learning system for COVID-19 diagnostic and prognostic analysis. medRxiv 2020.
Chen J, Wu L, Zhang J, Zhang L, Gong D, Zhao Y, Hu S, Wang Y, Hu X, Zheng B, et al. Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: a prospective study. medRxiv 2020.
Shan F, Gao Y, Wang J, Shi W, Shi N, Han M, Xue Z, Shi Y. Lung infection quantification of COVID-19 in CT images with deep learning. arXiv preprint; arXiv:2003.04655. 2020.
Gozes O, Frid-Adar M, Greenspan H, Browning P, Zhang H, Ji W, Bernheim A, Siegel E. Rapid AI Development Cycle for the Coronavirus (COVID-19) pandemic: initial results for automated detection & patient monitoring using deep learning CT image analysis. arXiv preprint; arXiv:2003.05037. 2020.
Bai HX, Hsieh B, Xiong Z, Halsey K, Choi JW, Tran TML, Pan I, Shi LB, Wang DC, Mei J, et al. Performance of radiologists in differentiating COVID-19 from non-COVID-19 viral pneumonia at chest CT. Radiology. 2020;296:E46–54.
Dong D, Zhang F, Zhong LZ, Fang MJ, Huang CL, Yao JJ, Sun Y, Tian J, Ma J, Tang LL. Development and validation of a novel MR imaging predictor of response to induction chemotherapy in locoregionally advanced nasopharyngeal cancer: a randomized controlled trial substudy (NCT01245959). BMC Med. 2019;17:190.
Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, Zegers CM, Gillies R, Boellard R, Dekker A, Aerts HJ. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48:441–6.
Huang YQ, Liang CH, He L, Tian J, Liang CS, Chen X, Ma ZL, Liu ZY. Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer. J Clin Oncol. 2016;34:2157–64.
Dong D, Tang L, Li ZY, Fang MJ, Gao JB, Shan XH, Ying XJ, Sun YS, Fu J, Wang XX, et al. Development and validation of an individualized nomogram to identify occult peritoneal metastasis in patients with advanced gastric cancer. Ann Oncol. 2019;30:431–8.
Chen X, Tang Y, Mo Y, Li S, Lin D, Yang Z, Yang Z, Sun H, Qiu J, Liao Y, et al. A diagnostic model for coronavirus disease 2019 (COVID-19) based on radiological semantic and clinical features: a multi-center study. Eur Radiol. 2020;30:4893–4902.
Qi X, Jiang Z, Yu Q, Shao C, Zhang H, Yue H, Ma B, Wang Y, Liu C, Meng X, et al. Machine learning-based CT radiomics model for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection: a multicenter study. medRxiv 2020.
Fang M, He B, Li L, Dong D, Yang X, Li C, Meng L, Zhong L, Li H, Li H, Tian J. CT radiomics can help screen the coronavirus disease 2019 (COVID-19): a preliminary study. Sci China Inf Sci. 2020;63:172103.
Yu Q, Wang Y, Huang S, Liu S, Zhou Z, Zhang S, Zhao Z, Yu Y, Yang Y, Ju S. Multicenter cohort study demonstrates more consolidation in upper lungs on initial CT increases the risk of adverse clinical outcome in COVID-19 patients. Theranostics. 2020;10:5641–8.
Zhang X, Wang D, Shao J, Tian S, Tan W, Ma Y, Xu Q, Ma X, Li D, Chai J, et al. A deep learning integrated radiomics model for identification of coronavirus disease 2019 using computed tomography. Sci Rep. 2021;11:3938.
Fang X, Li X, Bian Y, Ji X, Lu J. Radiomics nomogram for the prediction of 2019 novel coronavirus pneumonia caused by SARS-CoV-2. Eur Radiol. 2020;30:6888–901.
Tan HB, Xiong F, Jiang YL, Huang WC, Wang Y, Li HH, You T, Fu TT, Lu R, Peng BW. The study of automatic machine learning base on radiomics of non-focus area in the first chest CT of different clinical types of COVID-19 pneumonia. Sci Rep. 2020;10:18926.
Wu Q, Wang S, Li L, Wu Q, Qian W, Hu Y, Li L, Zhou X, Ma H, Li H, et al. Radiomics Analysis of Computed Tomography helps predict poor prognostic outcome in COVID-19. Theranostics. 2020;10:7231–44.
Liu H, Ren H, Wu Z, Xu H, Zhang S, Li J, Hou L, Chi R, Zheng H, Chen Y, et al. CT radiomics facilitates more accurate diagnosis of COVID-19 pneumonia: compared with CO-RADS. J Transl Med. 2021;19:29.
Mandell LA, Wunderink RG, Anzueto A, Bartlett JG, Campbell GD, Dean NC, Dowell SF, File TM, Musher DM, Niederman MS, et al. Infectious Diseases Society of America/American Thoracic Society consensus guidelines on the management of community-acquired pneumonia in adults. Clin Infect Dis. 2007;44(Suppl 2):S27–72.
Li Z, Zhang S, Zhang J, Huang K, Wang Y, Yu Y. MVP-Net: Multi-view FPN with Position-aware Attention for Deep Universal Lesion Detection. International Conference on Medical Image Computing and Computer-Assisted Intervention 2019. Springer, Cham.
Wang X, Zhang Q, Zhou Z, Liu F, Yu Y, Wang Y, Gao W. Evaluating multi-class segmentation errors with anatomical priors. In 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI); 3–7. 2020: 953–956.
Ni Q, Sun ZY, Qi L, Chen W, Yang Y, Wang L, Zhang X, Yang L, Fang Y, Xing Z, et al. A deep learning approach to characterize 2019 coronavirus disease (COVID-19) pneumonia in chest CT images. Eur Radiol. 2020;30:6517–27.
Qin L, Yang Y, Cao Q, Cheng Z, Wang X, Sun Q, Yan F, Qu J, Yang W. A predictive model and scoring system combining clinical and CT characteristics for the diagnosis of COVID-19. Eur Radiol. 2020;30:6797–807.
van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, Beets-Tan RGH, Fillion-Robin JC, Pieper S, Aerts H. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017;77:e104–7.
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
Song F, Shi N, Shan F, Zhang Z, Shen J, Lu H, Ling Y, Jiang Y, Shi Y. Emerging 2019 novel coronavirus (2019-nCoV) pneumonia. Radiology. 2020;295:210–7.
Chen HJ, Qiu J, Wu B, Huang T, Gao Y, Wang ZP, Chen Y, Chen F. Early chest CT features of patients with 2019 novel coronavirus (COVID-19) pneumonia: relationship to diagnosis and prognosis. Eur Radiol. 2020;30:6178–85.
Shiri I, Sorouri M, Geramifar P, Nazari M, Abdollahi M, Salimi Y, Khosravi B, Askari D, Aghaghazvini L, Hajianfar G, et al. Machine learning-based prognostic modeling using clinical data and quantitative radiomic features from chest CT images in COVID-19 patients. Comput Biol Med. 2021;132:104304.
Chao H, Fang X, Zhang J, Homayounieh F, Arru CD, Digumarthy SR, Babaei R, Mobin HK, Mohseni I, Saba L, et al. Integrative analysis for COVID-19 patient outcome prediction. Med Image Anal. 2021;67:101844.
Shiri I, Maleki H, Hajianfar G, Abdollahi H, Ashrafinia S, Hatt M, Zaidi H, Oveisi M, Rahmim A. Next-generation radiogenomics sequencing for prediction of EGFR and KRAS mutation status in NSCLC patients using multimodal imaging and machine learning algorithms. Mol Imaging Biol. 2020;22:1132–48.
Hajianfar G, Shiri I, Maleki H, Oveisi N, Haghparast A, Abdollahi H, Oveisi M. Noninvasive O(6) methylguanine-DNA methyltransferase status prediction in glioblastoma multiforme cancer using magnetic resonance imaging radiomics features: univariate and multivariate radiogenomics analysis. World Neurosurg. 2019;132:e140–61.
Nazari M, Shiri I, Zaidi H. Radiomics-based machine learning model to predict risk of death within 5-years in clear cell renal cell carcinoma patients. Comput Biol Med. 2021;129:104135.
Parmar C, Grossmann P, Rietveld D, Rietbergen MM, Lambin P, Aerts HJ. Radiomic machine-learning classifiers for prognostic biomarkers of head and neck cancer. Front Oncol. 2015;5:272.
We thank all patients for their participation.
This work was supported by the Key R & D plan of Hainan Province (ZDYF (XGFY) 2020001); the National Natural Science Foundation of China [Grant numbers 81971602, 81760308, 81801684]; the Program of Hainan Association for Science and Technology Plans to Youth R & D Innovation [QCXM201919]; and the Hainan Provincial Natural Science Foundation of China [818MS124]. This project was also supported by Hainan Province Clinical Medical Center. The funders played no role in study design, data collection, data analysis, interpretation of data, or writing of the manuscript.
Ethical approval and consent to participate
The study was conducted in line with the principles of the Declaration of Helsinki, and Institutional Review Board approval was obtained. Written informed consent was waived for this retrospective study.
Consent for publication
The authors declare that there is no conflict of interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The pre-processing methods and radiomic feature descriptions are detailed.
For the clinical model, the feature importance was shown. The top 3 most important clinical factors were the occurrence of fatigue, age, and the occurrence of cough.
The feature importance of the quantifying model was shown. The frequencies of pleural thickening, consolidation lesions, and ground-glass lesions were the top 3 most important features.
The feature importance of the radiological model was shown. The frequencies of the paving stone pattern, peripheral position, and subpleural line were important for discriminating COVID-19 from other pneumonia.
The feature importance of the radiomic model was shown. The most important feature was the Zone Entropy of the gray-level size zone matrix (glszm) computed on the wavelet-filtered image, indicating heterogeneity in the texture patterns. Lesion shape was also important: Minor Axis Length and Maximum 2D Diameter Slice were the second and third most important radiomic features.
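To illustrate why Zone Entropy reflects texture heterogeneity, the following is a minimal sketch of the feature on a 2D image. It is not the authors' extraction pipeline: the function name `glszm_zone_entropy`, the 2D 8-connectivity, the convention that gray level 0 marks background, and the toy arrays are all assumptions made for illustration, and the wavelet filtering step is omitted. A zone is a connected group of pixels sharing one gray level, and the entropy is taken over the normalized counts of (gray level, zone size) pairs.

```python
import numpy as np
from scipy import ndimage


def glszm_zone_entropy(image, levels):
    """Zone entropy of a 2D integer image with gray levels 1..levels.

    Assumption for this sketch: gray level 0 is background and is ignored.
    """
    counts = {}  # (gray level, zone size) -> number of zones
    structure = np.ones((3, 3), dtype=int)  # 8-connectivity in 2D
    for g in range(1, levels + 1):
        labeled, n_zones = ndimage.label(image == g, structure=structure)
        if n_zones == 0:
            continue
        # Pixel count per zone; index 0 is the unlabeled remainder, so drop it.
        sizes = np.bincount(labeled.ravel())[1:]
        for s in sizes:
            key = (g, int(s))
            counts[key] = counts.get(key, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no zones at all (image is entirely background)
    # Normalized size zone matrix entries; only non-zero entries are stored.
    p = np.array(list(counts.values()), dtype=float) / total
    return float(-np.sum(p * np.log2(p)))


# Toy inputs (hypothetical): one uniform patch and one with mixed zones.
homogeneous = np.ones((4, 4), dtype=int)
mixed = np.array([[1, 1, 2],
                  [1, 1, 2],
                  [3, 3, 3]])
print(glszm_zone_entropy(homogeneous, levels=3))
print(glszm_zone_entropy(mixed, levels=3))
```

The uniform patch contains a single zone, so its entropy is zero, while the mixed patch spreads its zones over several gray level and size combinations and scores higher. This matches the reading of Zone Entropy as a measure of heterogeneous texture.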
Cite this article
Chen, H.J., Mao, L., Chen, Y. et al. Machine learning-based CT radiomics model distinguishes COVID-19 from non-COVID-19 pneumonia. BMC Infect Dis 21, 931 (2021). https://doi.org/10.1186/s12879-021-06614-6
- Machine learning
- Coronavirus Disease 2019 (COVID-19)
- Non-COVID-19 pneumonia