Revealing potential diagnostic gene biomarkers of septic shock based on machine learning analysis
BMC Infectious Diseases volume 22, Article number: 65 (2022)
Sepsis is an inflammatory response caused by infection with pathogenic microorganisms. The body shock caused by it is called septic shock. In view of this, we aimed to identify potential diagnostic gene biomarkers of the disease.
Material and methods
First, mRNA expression data sets of septic shock were retrieved and downloaded from the GEO (Gene Expression Omnibus) database for differential expression analysis. Functional enrichment analysis was then used to identify the biological function of DEmRNAs (differentially expressed mRNAs). Second, machine learning analysis was used to determine the diagnostic gene biomarkers for septic shock. Third, RT-PCR (real-time polymerase chain reaction) verification was performed. Lastly, the GSE65682 data set was used for further diagnostic and prognostic analysis of the identified superlative diagnostic gene biomarkers.
A total of 843 DEmRNAs, including 458 up-regulated and 385 down-regulated DEmRNAs, were obtained in septic shock. Fifteen superlative diagnostic gene biomarkers (such as RAB13, KIF1B, CLEC5A, FCER1A, CACNA2D3, DUSP3, HMGN3, MGST1 and ARHGEF18) for septic shock were identified by machine learning analysis. RF (random forests), SVM (support vector machine) and DT (decision tree) algorithms were used to construct classification models. The accuracy of the DT, SVM and RF models was very high, and the RF model had the highest accuracy. ARHGEF18 and FCER1A were related to survival. CACNA2D3 and DUSP3 participated in the MAPK signaling pathway to regulate septic shock.
Identified diagnostic gene biomarkers may be helpful in the diagnosis and therapy of patients with septic shock.
Sepsis is an inflammatory response caused by infection with pathogenic microorganisms, and the shock it causes is called septic shock. Sepsis is a reaction to systemic infections [1, 2]. Septic shock, associated with critical hypotension, is a common acute condition in the ICU (intensive care unit) [2, 3]. It is estimated that about 8 million people worldwide die from sepsis (usually septic shock) every year, and abnormalities in the circulatory system, cells, and metabolism can significantly increase mortality [1, 4].
Most cases of septic shock are caused by microbial infections (bacteria, viruses, fungi, etc.). In early microbial infections, humoral reactions are activated, in which immune cells (macrophages, neutrophils, etc.) recognize and destroy invading organisms. Reduced blood vessel volume, cardiac dysfunction and peripheral vasodilation are major causes of septic shock [6, 7]. In view of this, active fluid resuscitation and symptomatic anti-infective treatment are performed in these patients [8,9,10]. However, 28-day and hospital mortality in these patients remain very high. Moreover, the probability of re-admission after discharge from hospital is higher than that of ordinary ICU patients, and a considerable proportion of patients have cognitive dysfunction after treatment [11,12,13].
Diagnosis and prognostic assessment of diseases at the molecular level are now a general trend, and this approach is widely used by researchers in sepsis [14, 15]. Mohammed et al. used high-throughput sequencing technology to identify potential biomarkers and signaling pathways related to septic shock. In addition, some researchers have used TSD (transcriptomic signature distance) and meta-analysis to analyze the transcriptome data of septic shock patients [17, 18]. Machine learning is a branch of computer science and statistics that plays an important role in the detection, diagnosis and treatment of diseases [19, 20]. Machine learning has also been used to study septic shock [21, 22]. However, most of these studies use machine learning to predict the progression of septic shock; it is rarely used to identify potential diagnostic and prognostic biomarkers of septic shock. Therefore, in order to identify potential diagnostic gene biomarkers of septic shock, a machine learning method was applied in this study, followed by prognostic analysis. Our study could be valuable in understanding the pathological mechanism of septic shock and in exploring novel diagnostic gene biomarkers for the diagnosis and therapy of the disease.
The GEO (Gene Expression Omnibus) database, mainly based on chip data, is developed by NCBI (National Center for Biotechnology Information). The GSE4607, GSE13904, GSE26378, GSE26440, GSE65682 and GSE95233 data sets were obtained (Table 1). The original files were downloaded and the RMA algorithm was used for background adjustment and normalization. If multiple probes corresponded to the same gene, their average value was taken. Among them, the GSE4607, GSE13904, GSE26378 and GSE26440 data sets were used for differential expression analysis and machine learning (test set), and the GSE65682 data set was used for survival analysis. The GSE95233 data set was used for electronic expression verification of gene biomarkers (validation set). In this study, the GSE65682 data set was based on chip data from the GPL13667 platform, and the GSE4607, GSE13904, GSE26378, GSE26440 and GSE95233 data sets were based on the GPL570 platform. In order to avoid differences caused by the detection technology of different platforms, the GSE65682 data set was not analyzed together with the other data sets. Since the GSE4607, GSE13904, GSE26378 and GSE26440 data sets all came from the GPL570 platform, batch effects were assessed with the SVA package, which showed that the batch effect between the four data sets was not significant (Additional file 1: Fig. S1).
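The probe-averaging step described above can be illustrated with a minimal sketch. It is not the code used in this study; the function name, the probe identifiers and the probe-to-gene map are hypothetical, and the expression values are made up.

```python
from collections import defaultdict


def collapse_probes_to_genes(probe_values, probe_to_gene):
    """Average the expression of all probes that map to the same gene.

    probe_values:  dict probe_id -> list of expression values (one per sample)
    probe_to_gene: dict probe_id -> gene symbol (hypothetical annotation map)
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_values.items():
        gene = probe_to_gene.get(probe_id)
        if gene is None:  # skip probes without a gene annotation
            continue
        grouped[gene].append(values)

    gene_values = {}
    for gene, rows in grouped.items():
        n_probes = len(rows)
        # zip(*rows) walks the samples column by column
        gene_values[gene] = [sum(col) / n_probes for col in zip(*rows)]
    return gene_values


# Toy input: two probes for one gene, one probe for another (made-up values)
probe_values = {
    "probe_a": [2.0, 4.0],
    "probe_b": [4.0, 6.0],
    "probe_c": [1.0, 1.5],
}
probe_to_gene = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}
gene_values = collapse_probes_to_genes(probe_values, probe_to_gene)
```

Here the two probes of GENE1 are averaged sample by sample, while GENE2 keeps the values of its single probe.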
Identification of DEmRNAs (differentially expressed mRNAs)
In this study, the Limma and metaMA packages were used to identify the DEmRNAs. The inverse normal method in the metaMA package was used to combine P values. The FDR (false discovery rate) was obtained by correcting the original P values for multiple testing with the Benjamini and Hochberg method [24, 25]. FDR < 0.01 and |Combined.ES (effect size)| > 1.5 were the screening thresholds for DEmRNAs.
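To make the two statistical steps above concrete, the following sketch shows an inverse normal combination of per-study P values and a Benjamini and Hochberg adjustment in plain Python. It is an illustration under stated assumptions, not the metaMA implementation: the P values are treated as one-sided, equal study weights are used by default, and the function names are our own.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_EPS = 1e-15  # keeps P values away from 0 and 1, where inv_cdf is undefined


def inverse_normal_combine(p_values, weights=None):
    """Combine one-sided P values from several studies (inverse normal method)."""
    if weights is None:
        weights = [1.0] * len(p_values)  # assumption: equal weight per study
    clipped = [min(max(p, _EPS), 1.0 - _EPS) for p in p_values]
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in clipped]
    combined_z = sum(w * z for w, z in zip(weights, z_scores))
    combined_z /= sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(combined_z)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def passes_screen(fdr, combined_es, fdr_cut=0.01, es_cut=1.5):
    """Screening rule quoted in the text: FDR < 0.01 and |Combined.ES| > 1.5."""
    return fdr < fdr_cut and abs(combined_es) > es_cut
```

For example, two studies that each report P = 0.05 for the same gene combine to a P value of roughly 0.01, since agreement between studies strengthens the evidence.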
To identify the function of identified genes, the DAVID (Database for Annotation, Visualization and Integrated Discovery, https://david.ncifcrf.gov/) database was used for GO (Gene Ontology, http://www.geneontology.org/) and KEGG (Kyoto encyclopedia of genes and genomes, http://www.genome.jp/kegg/pathway.html) functional enrichment analysis [26,27,28]. P < 0.05 was the threshold of significantly enriched GO and KEGG terms.
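As a rough illustration of how an enrichment P value of this kind can arise, the sketch below computes a one-sided hypergeometric tail probability. This is a stand-in, not DAVID's procedure: DAVID reports a modified Fisher's exact test (the EASE score), and the counts used here are made up.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_background: genes in the background set
    n_term:       background genes annotated to the GO/KEGG term
    n_selected:   genes in the query list (e.g. the DEmRNAs)
    n_overlap:    query genes annotated to the term
    """
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Made-up counts: 10 background genes, 5 in the term, 5 selected, all 5 overlap
p_toy = hypergeom_enrichment_p(10, 5, 5, 5)
```

A term would then be called significantly enriched when this P value falls below the threshold of 0.05 stated above.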
Identification of the superlative diagnostic gene biomarkers
Firstly, the glmnet package in R was used to reduce data dimensions. The package supports a large number of models and is fast. Secondly, the random forest algorithm was used to rank the mRNAs by importance according to the Mean Decrease Accuracy value, from largest to smallest. Then, the superlative number of features was identified by adding one differentially expressed mRNA at a time in a top-down forward-wrapper approach. The superlative DEmRNAs with diagnostic value for septic shock were selected to establish classification models including DT (decision tree), SVM (support vector machine) and RF (random forests). The ‘rpart’ package in R (https://cran.r-project.org/web/packages/rpart/), the ‘e1071’ package in R (https://cran.r-project.org/web/packages/e1071/index.html) and the ‘randomForest’ package (https://cran.r-project.org/web/packages/randomForest/) were used to establish the DT model, SVM model and RF model, respectively. Tenfold cross-validation was used to compare the average misjudgment rates of the three models and to avoid overfitting [30, 31]. The diagnostic ability of classification prediction was evaluated by the accuracy, sensitivity, specificity, and AUC (area under curve) values of the ROC (receiver operating characteristic) curve. Subsequently, the Matthews correlation coefficient of the model was calculated using the mcc function in the mltools package (https://pypi.org/project/mltools/1.0.2/).
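The tenfold cross-validation estimate of the misjudgment rate can be sketched as follows. This is only an illustration: a nearest-centroid classifier stands in for the DT, SVM and RF models (which were fitted in R), the folds are not stratified, and the toy data are made up.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Stand-in classifier: one mean expression vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cv_error_rate(X, y, k=10, seed=0):
    """Average misclassification rate over k held-out folds."""
    fold_errors = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit(train_X, train_y)
        wrong = sum(nearest_centroid_predict(model, X[i]) != y[i] for i in test_idx)
        fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / len(fold_errors)


# Toy data: 10 "controls" near 0 and 10 "cases" near 5 (made-up values)
X = [[i * 0.1, 0.0] for i in range(10)] + [[5 + i * 0.1, 1.0] for i in range(10)]
y = [0] * 10 + [1] * 10
toy_error = cv_error_rate(X, y, k=10)
```

Because every sample is predicted only by a model that never saw it during fitting, the averaged error is less optimistic than the error measured on the training data itself.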
Electronic expression verification, diagnostic and prognostic analysis of superlative diagnostic gene biomarkers
The GSE95233 data set (124 blood samples from 102 cases and 22 normal controls) was used for electronic expression verification. The GSE65682 data set (521 blood samples from 479 cases and 42 normal controls) contains 28 days of survival information of patients. This data set was used to further analyze the diagnostic and survival ability of key diagnostic gene biomarkers.
In vitro validation of identified DEmRNAs
Patients diagnosed with septic shock were included. Detailed inclusion criteria for patients with septic shock were as follows: (1) body temperature > 38 ℃ or < 36 ℃; (2) heart rate > 90 beats per minute or more than 2 standard deviations above the normal heart rate for age; (3) respiratory rate > 20 breaths per minute or PaCO2 (partial pressure of carbon dioxide in artery) < 32 mmHg; (4) white blood cell count > 12.0 × 10^9/L or < 4.0 × 10^9/L, or more than 10% immature neutrophils; (5) patients with initial septic shock; (6) patients had cardiovascular organ dysfunction, acute respiratory distress syndrome, or dysfunction of two or more other organs; (7) patients had complete clinical data, including gender, age, height, weight, etc. Patients with a history of cancer or other diseases, chemotherapy, radiotherapy, etc., and those with incomplete clinical data were excluded. The individuals in the normal control group were gender and age matched with the case group and had no disease before and within 2 weeks after sampling. Individuals who took glucocorticoids, or who had a history of febrile disease or any chronic/acute disease even slightly associated with inflammation within 2 weeks of sampling, were excluded.
According to the above criteria for septic shock, 16 blood samples from 8 patients and 8 normal controls were obtained for RT-PCR (real-time polymerase chain reaction). Total RNA was extracted using the RNAliquid ultra-speed whole blood (liquid sample) kit (RN2602, Beijing Huitian Oriental Technology Co., Ltd.). The FastQuant cDNA synthesis kit (KR106, TIANGEN) was used to synthesize the cDNA. RT-PCR was performed using SuperReal PreMix Plus (SYBR Green) SuperReal reagent (FP205, TIANGEN). Each experiment was repeated three times. GAPDH (glyceraldehyde-3-phosphate dehydrogenase) and ACTB (actin beta) were used as internal controls for gene detection. The relative expression levels were calculated as fold-changes using the 2^−ΔΔCt method.
This study was approved by the ethics committee of the Second Affiliated Hospital of Shandong First Medical University (20200406).
GraphPad Prism was used to perform all statistical analyses. The significance cutoff for RT-PCR was P = 0.05 (Duncan’s multiple range test). One-way ANOVA (analysis of variance) with orthogonal contrasts and mean comparison procedures was used to detect differences between cases and normal controls. Experiments were independently repeated at least 3 times.
According to the screening criteria of FDR < 0.01 and |Combined.ES| > 1.5, a total of 843 DEmRNAs were identified. Of these, 458 were up-regulated and 385 were down-regulated (Additional file 3: Table S1). The heat map of the top 100 DEmRNAs is shown in Fig. 1.
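The fold-change calculation named above follows a fixed formula, sketched below in Python. The Ct values are made up, and averaging the Ct values of the two internal controls is our assumption for illustration; the text does not state how GAPDH and ACTB were combined.

```python
def mean_ct(ct_values):
    """Arithmetic mean of replicate (or multiple reference gene) Ct values."""
    return sum(ct_values) / len(ct_values)


def fold_change_ddct(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference), computed separately per group
    ddCt = dCt(case) - dCt(control)
    """
    delta_case = ct_target_case - ct_ref_case
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** (-(delta_case - delta_ctrl))


# Made-up Ct values; reference Ct is the mean of two internal controls (assumption)
ref_case = mean_ct([18.2, 17.8])   # e.g. GAPDH and ACTB in a patient sample
ref_ctrl = mean_ct([18.1, 17.9])   # the same genes in a control sample
toy_fold = fold_change_ddct(24.0, ref_case, 26.0, ref_ctrl)
```

A fold-change above 1 indicates higher expression in the case sample than in the control, and a value below 1 indicates lower expression.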
Functional enrichment analysis of DEmRNAs
In order to understand the potential biological function of DEmRNAs, GO and KEGG functional enrichment analyses were performed. In GO terms of BP (biological process), the DEmRNAs were mainly involved in immune response, positive regulation of immune system process and leukocyte activation. In GO terms of CC (cellular component), the DEmRNAs were mainly associated with vesicle, cytoplasmic vesicle and nucleolus. In GO terms of MF (molecular function), the DEmRNAs were mainly involved in protein dimerization activity, cytokine binding and non-membrane spanning protein tyrosine kinase activity. The result is shown in Fig. 2A. Several signaling pathways were identified in the KEGG enrichment analysis, such as T cell receptor signaling pathway, primary immunodeficiency, MAPK signaling pathway, Jak–STAT signaling pathway and Fc epsilon RI signaling pathway (Fig. 2B). Among the 15 superlative diagnostic gene biomarkers, CACNA2D3 and DUSP3 participated in the MAPK signaling pathway.
Identification of superlative diagnostic gene biomarkers
After reducing data dimensions, a total of 28 DEmRNAs were retained (Table 2). The 28 DEmRNAs were ranked in order of importance according to the Mean Decrease Accuracy value (Fig. 3A). Following the RF importance ranking, one mRNA was added at a time from top to bottom, and the RF algorithm was used for classification. Tenfold cross-validation was used to obtain the accuracy rate and AUC (Fig. 3B, C). When the number of mRNAs reached 15, the accuracy reached its maximum value for the first time. Therefore, the first 15 DEmRNAs (KLRF1, UPP1, RAB13, KIF1B, CLEC5A, NARF, DUSP3, FCER1A, CACNA2D3, HMGN3, ECRP, HDAC4, LHFPL2, MGST1 and ARHGEF18) were selected as the superlative diagnostic gene biomarkers. The heat map analysis of the 15 superlative diagnostic gene biomarkers is shown in Fig. 4.
Classification models were constructed based on the 15 screened genes. The RF model had the highest accuracy. The accuracy, sensitivity, specificity and AUC of each model under tenfold cross-validation are listed in Table 3. In addition, the AUC values of the ROC curves of DT, RF and SVM were 0.962, 0.993 and 0.991, respectively (Fig. 5). The diagnostic efficacy of the model composed of these 15 genes was also validated using the GSE95233 data set. The results showed that the diagnostic model also performed well in the validation set (Additional file 2: Fig. S2B–D). In addition, the Matthews correlation coefficient showed that the model had high accuracy in the test set. Although the performance in the validation set was not as good as in the test set, the accuracy was still good (Additional file 4: Table S2). Notably, among the 15 superlative diagnostic gene biomarkers, the AUC values of CLEC5A, DUSP3, ECRP, HDAC4, KIF1B, KLRF1, NARF, RAB13 and UPP1 were higher than 0.9, and their sensitivity and specificity were higher than 0.8 in the ROC curve analysis (Fig. 6).
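The selection rule described above (add one ranked gene at a time and stop at the feature count where cross-validated accuracy first reaches its maximum) can be sketched as follows. The function names are ours, `evaluate` is a hypothetical callback that returns the cross-validated accuracy for a gene subset, and the accuracy values in the toy example are made up.

```python
def first_max_feature_count(accuracies):
    """accuracies[i] is the accuracy obtained with the top (i + 1) ranked genes.

    Returns the smallest number of genes at which the maximum is reached.
    """
    best = max(accuracies)
    return accuracies.index(best) + 1  # list.index returns the first occurrence


def forward_wrapper(ranked_genes, evaluate):
    """Grow the gene set along the importance ranking and keep the first optimum."""
    accuracies = [
        evaluate(ranked_genes[:n]) for n in range(1, len(ranked_genes) + 1)
    ]
    n_best = first_max_feature_count(accuracies)
    return ranked_genes[:n_best], accuracies


# Toy run with a made-up accuracy curve that plateaus from the third gene on
toy_ranking = ["g1", "g2", "g3", "g4", "g5"]
toy_curve = {1: 0.80, 2: 0.90, 3: 0.95, 4: 0.95, 5: 0.94}
selected, curve = forward_wrapper(toy_ranking, lambda genes: toy_curve[len(genes)])
```

Taking the first occurrence of the maximum favours the smaller gene panel when adding further genes no longer improves accuracy.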
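For reference, the evaluation measures reported above can be computed from predicted labels and scores as in the sketch below. It is a plain-Python illustration with made-up labels and scores, not the R code or the mltools function used in this study; the AUC is computed by pairwise comparison of case and control scores.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # MCC is conventionally set to 0 when a margin of the table is empty
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a case scores higher than a control."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 3 cases (label 1) and 3 controls (label 0)
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
metrics = classification_metrics(*confusion_counts(y_true, y_pred))
auc = roc_auc(y_true, scores)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion table, which makes it informative when cases greatly outnumber controls.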
Electronic expression verification, diagnosis and prognostic analysis of superlative diagnostic gene biomarkers
In order to further verify the expression of the 15 diagnostic gene biomarkers, expression verification was performed using the GSE95233 data set. The results showed that ARHGEF18, CACNA2D3, FCER1A, HMGN3 and KLRF1 were significantly down-regulated in the disease group, while CLEC5A, DUSP3, ECRP, HDAC4, KIF1B, LHFPL2, MGST1, NARF, RAB13 and UPP1 were significantly up-regulated compared with the normal control group (Additional file 2: Fig. S2A). This verification result was completely consistent with the previous analysis result. The GSE65682 data set was selected for further diagnostic and prognostic analysis of the identified superlative diagnostic gene biomarkers (Fig. 7). The analysis showed that only ARHGEF18 and FCER1A were related to survival. The AUC, sensitivity and specificity of ARHGEF18 were 0.997, 0.967 and 1.000, respectively. The AUC, sensitivity and specificity of FCER1A were 0.985, 0.929 and 1.000, respectively (Fig. 7A, B). Box plots showed the expression levels of ARHGEF18 and FCER1A in different populations (Fig. 7C, D). In the survivor population, the expression levels of ARHGEF18 and FCER1A were significantly down-regulated, which was consistent with the bioinformatics analysis. Expression was lowest among patients who died. ARHGEF18 and FCER1A may influence the treatment effect in patients to a certain extent. Then the survival package in R (https://cran.r-project.org/web/packages/survival/index.html) was used to analyze the prognostic value of ARHGEF18 and FCER1A. The results showed that ARHGEF18 and FCER1A were significantly negatively correlated with survival (Fig. 7E, F).
The information of enrolled individuals is shown in Table 4. According to diagnostic analysis, prognostic analysis and literature reports, ARHGEF18, CLEC5A, FCER1A, HDAC4, KLRF1, DUSP3 and UPP1 were selected for RT-PCR verification. The primers are shown in Table 5. The results showed that CLEC5A, DUSP3, HDAC4 and UPP1 showed an up-regulated trend, and FCER1A and KLRF1 showed a down-regulated trend (Fig. 8). The gene expression trends in the verification result were consistent with the bioinformatics analysis, except for ARHGEF18. The small sample size may account for this inconsistency, and further research is needed.
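Survival curves of the kind shown in Fig. 7E, F are typically Kaplan-Meier estimates. The sketch below is a minimal plain-Python version for illustration only; it does not reproduce the survival R package, and the follow-up times and outcomes are made up.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time per patient (e.g. days, up to 28)
    events: 1 if the patient died at that time, 0 if censored (still alive)
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Handle all patients sharing the same follow-up time together
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Made-up cohort of six patients followed for up to 28 days
toy_times = [5, 10, 10, 20, 28, 28]
toy_events = [1, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

To compare prognosis by gene expression, one curve would be estimated for patients with high expression and one for patients with low expression; how the cut-off was chosen in this study is not stated here, so any split (for example at the median) would be an assumption.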
Based on the machine learning method, 15 DEmRNAs, such as HMGN3, CACNA2D3, DUSP3, MGST1, CLEC5A, KIF1B, RAB13, ARHGEF18 and FCER1A, were determined as the superlative diagnostic gene biomarkers. The final survival analysis showed that only FCER1A and ARHGEF18 had obvious prognostic value.
HMGN3 (high mobility group nucleosomal binding domain 3) plays an important regulatory role in pancreatic cells. In patients with sepsis, high blood sugar is a risk factor for poor prognosis. During sepsis, rapid changes in the microvascular circulation of skeletal muscle seriously hinder the delivery of insulin. HMGN3 can reduce the level of glucagon in the plasma to maintain a stable blood sugar level in the body. In this study, HMGN3 was down-regulated in patients, which lays the foundation for further verification of its role in sepsis.
MGST1 (microsomal glutathione S-transferase 1), an important redox and detoxification enzyme, plays a crucial role in cell defense and hematopoiesis [36, 37]. CLEC5A (C-type lectin domain containing 5A) is a Syk (spleen tyrosine kinase) coupled C-type lectin that is mainly expressed in myeloid cells, such as macrophages and neutrophils, and participates in host defense, inflammation, platelet activation and development. The KIF1B (kinesin family member 1B) gene belongs to the kinesin superfamily, which is responsible for encoding proteins that transport mitochondria and synaptic vesicle precursors within the cell. In addition, KIF1B has been found to be a tumor suppressor gene [41, 42], which has a potential role in mitochondrial morphological changes. KIF1B and the mitochondrial metalloproteinase YME1L1 (YME1 like 1 ATPase) coordinately regulate mitochondrial fission to induce mitochondrial apoptosis. In the early stage of sepsis, released NO (nitric oxide) can directly block mitochondrial respiration and cause shock when accumulated to a certain degree. The potential role of KIF1B in mitochondria suggests that it may play a role in septic shock. RAB13 (RAB13, member RAS oncogene family) is present in all macrophage-related cells. In our study, MGST1, CLEC5A, KIF1B and RAB13 were all up-regulated in patients. This suggests that MGST1, CLEC5A, KIF1B and RAB13 could play a crucial role in septic shock.
KLRF1 (killer cell lectin like receptor F1) is an activating homodimeric C-type lectin-like receptor, which plays an important role in regulating the activity of natural killer cells and monocytes. UPP1 (uridine phosphorylase 1) has recently been reported to play an important role in immune and inflammatory biological processes of disease [46,47,48]. Previous studies have found that the expression of UPP1 is increased in the brain of sepsis rats. HDAC4 (histone deacetylase 4) plays an important regulatory role in sepsis and may be an effective target for sepsis treatment [50, 51]. The expression level of NARF (nuclear prelamin A recognition factor) was increased in multiple sclerosis (a chronic neuroinflammatory disease). So far, we have not found any studies on ECRP (ribonuclease A family member 2C, pseudogene) and LHFPL2 (LHFPL tetraspan subfamily member 2) in inflammatory or immune diseases. This article may be the first to report that ECRP and LHFPL2 play a role in the progression of septic shock. In our study, KLRF1 (down-regulated), UPP1 (up-regulated), HDAC4 (up-regulated), NARF (up-regulated), ECRP (up-regulated) and LHFPL2 (up-regulated) were all abnormally expressed and could be considered as potential diagnostic biomarkers. These results suggest that KLRF1, UPP1, HDAC4, NARF, ECRP and LHFPL2 play a key role in septic shock, which provides a potential direction for further research on septic shock.
The protein encoded by ARHGEF18 (Rho/Rac guanine nucleotide exchange factor 18) plays an important role in activating eosinophils and other white blood cells. Sepsis is a high-risk, life-threatening disease caused by a dysregulated host response. Eosinophils are white blood cell components of the immune defense system, and play a role in the evolution of inflammation and disease [55, 56]. FCER1A (Fc fragment of IgE receptor Ia) is an IgE receptor (immunoglobulin receptor), which is the initiating factor of allergic reactions and plays a role in allergic inflammation [57, 58]. The interaction between FCER1B and other immunoglobulin-related inflammatory genes will increase the risk of asthma. In this study, ARHGEF18 and FCER1A were related to survival. In the enriched GO functions, ARHGEF18 was mainly involved in regulating cell death and apoptosis, and FCER1A was mainly involved in immune regulation and metabolic processes. This further suggests that ARHGEF18 and FCER1A may be related to the survival of septic shock patients.
The MAPK (mitogen-activated protein kinase) signaling pathway plays a crucial part in the regulation of diseases, for example through anti-inflammatory, analgesic and injury-protective effects. MAPK contains three sub-pathways: p38MAPK (p38 mitogen-activated protein kinase), ERK-1/2 (extracellular signal-regulated kinase), and JNK (c-Jun N-terminal kinase) [61, 62]. Among them, the p38MAPK and JNK signaling pathways play a role in growth inhibition, inflammation and pro-apoptotic signaling. The MAPK pathway can be activated by extracellular signals, such as cytokines involved in the inflammatory response, growth factors that regulate growth and metabolism, and bacterial complexes. Inhibiting the activation of the MAPK pathway can reduce lung injury caused by septic shock. In the KEGG enrichment, CACNA2D3 and DUSP3 took part in the MAPK signaling pathway. CACNA2D3 (calcium voltage-gated channel auxiliary subunit alpha2delta3) plays an important role in canceration [64,65,66]. CACNA2D3 is expressed at low levels in endometrial cancer tissues and cells. Overexpression of CACNA2D3 in vitro significantly inhibits tumor cell proliferation and migration. CACNA2D3, as a new tumor suppressor gene, can significantly inhibit lymph node metastasis of esophageal squamous cell carcinoma in clinical studies. Lymph nodes are immune sites for lymphocytes, which lays the foundation for studying the role of CACNA2D3 in septic shock. DUSP3 (dual specificity phosphatase 3), also called VHR (vaccinia-H1 related phosphatase), is a founding member of the dual-specificity protein phosphatase group. DUSP3 plays a role in Staphylococcus aureus infection. As a positive regulator of the innate immune response, DUSP3 is the main protein tyrosine phosphatase in macrophages mediating cellular processes (including immune responses). This further illustrates that the MAPK signaling pathway may play an important role in septic shock by regulating related genes such as CACNA2D3 and DUSP3.
However, this study has certain limitations. Firstly, the sample size of the RT-PCR experiment is small, which may lead to a certain degree of error. More blood samples from septic shock patients are further needed to verify the expression of the identified mRNAs. Secondly, the molecular mechanism of DEmRNAs during septic shock has not been studied. More experiments are needed to further research the underlying mechanism of the disease.
In this study, in order to identify potential diagnostic gene biomarkers of septic shock, a machine learning method was applied, followed by prognostic analysis. Fifteen superlative diagnostic gene biomarkers (KLRF1, UPP1, RAB13, KIF1B, CLEC5A, NARF, DUSP3, FCER1A, CACNA2D3, HMGN3, ECRP, HDAC4, LHFPL2, MGST1 and ARHGEF18) for septic shock were identified by machine learning analysis. ARHGEF18 and FCER1A were related to survival. CACNA2D3 and DUSP3 participated in the MAPK signaling pathway to regulate septic shock. The identified diagnostic gene biomarkers may be helpful in the diagnosis and therapy of patients with septic shock. This study can provide a basis for further research on septic shock.
Availability of data and materials
All data generated or analyzed during this study are included in this published article. The data sets (GSE4607, GSE13904, GSE26378, GSE26440, GSE65682 and GSE95233) analyzed during the current study are available in the GEO (Gene Expression Omnibus) database; the persistent web link to the database is https://www.ncbi.nlm.nih.gov/geo/.
Fernando SM, Rochwerg B, Seely AJE. Clinical implications of the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). Can Med Assoc J. 2018;190(36):E1058–9.
Fabri-Faja N, Calvo-Lozano O, Dey P, Terborg RA, Estevez MC, Belushkin A, et al. Early sepsis diagnosis via protein and miRNA biomarkers using a novel point-of-care photonic biosensor. Anal Chim Acta. 2019;1077:232–42.
Essandoh K, Fan GC. Role of extracellular and intracellular microRNAs in sepsis. Biochim Biophys Acta. 2014;1842(11):2155–62.
Shankar-Hari M, Phillips GS, Levy ML, Seymour CW, Liu VX, Deutschman CS, et al. Developing a new definition and assessing new clinical criteria for septic shock: for the third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. 2016;315(8):775–87.
Jacob JA. New sepsis diagnostic guidelines shift focus to organ dysfunction. JAMA. 2016;315(8):739–40.
Goodwin JK, Schaer M. Septic shock. Vet Clin N Am Small Anim Pract. 1989;19(6):1239–58.
Hernandez G, Bruhn A, Castro R, Regueira T. The holistic view on perfusion monitoring in septic shock. Curr Opin Crit Care. 2012;18(3):280–6.
Fang F, Zhang Y, Tang J, Lunsford LD, Li T, Tang R, et al. Association of corticosteroid treatment with outcomes in adult patients with sepsis: a systematic review and meta-analysis. JAMA Intern Med. 2019;179(2):213–23.
Dellinger RP, Levy MM, Rhodes A, Annane D, Gerlach H, Opal SM, et al. Surviving Sepsis Campaign: international guidelines for management of severe sepsis and septic shock, 2012. Intensive Care Med. 2013;39(2):165–228.
Coopersmith CM, De Backer D, Deutschman CS, Ferrer R, Lat I, Machado FR, et al. Surviving sepsis campaign: research priorities for sepsis and septic shock. Crit Care Med. 2018;46(8):1334–56.
Shankar-Hari M, Ambler M, Mahalingasivam V, Jones A, Rowan K, Rubenfeld GD. Evidence for a causal link between sepsis and long-term mortality: a systematic review of epidemiologic studies. Crit Care (Lond, Engl). 2016;20:101.
Norman BC, Cooke CR, Ely EW, Graves JA. Sepsis-associated 30-day risk-standardized readmissions: analysis of a nationwide medicare sample. Crit Care Med. 2017;45(7):1130–7.
Venkatesh B, Finfer S, Myburgh J, Cohen J, Billot L. Long-term outcomes of the ADRENAL trial. N Engl J Med. 2018;378(18):1744–5.
Ma J, Chen C, Barth AS, Cheadle C, Guan X. Lysosome and cytoskeleton pathways are robustly enriched in the blood of septic patients: a meta-analysis of transcriptomic data. Mediat Inflamm. 2015;2015:984825.
Yang J, Zhang S, Zhang J, Dong J, Wu J, Zhang L, et al. Identification of key genes and pathways using bioinformatics analysis in septic shock children. Infect Drug Resist. 2018;11:1163–74.
Mohammed A, Cui Y, Mas VR, Kamaleswaran R. Differential gene expression analysis reveals novel genes and pathways in pediatric septic shock patients. Sci Rep. 2019;9(1):11270.
Manatakis DV, VanDevender A, Manolakos ES. An information-theoretic approach for measuring the distance of organ tissue samples using their transcriptomic signatures. Bioinformatics (Oxf, Engl). 2021;36(21):5194–204.
Banerjee S, Mohammed A, Wong HR, Palaniyar N, Kamaleswaran R. Machine learning identifies complicated sepsis course and subsequent mortality based on 20 genes in peripheral blood immune cells at 24 h post-ICU admission. Front Immunol. 2021;12:592303.
Peiffer-Smadja N, Rawson TM, Ahmad R, Buchard A, Georgiou P, Lescure FX, et al. Machine learning for clinical decision support in infectious diseases: a narrative review of current applications. Clin Microbiol Infect. 2020;26(5):584–95.
Radakovich N, Nagy M, Nazha A. Machine learning in haematological malignancies. Lancet Haematol. 2020;7(7):e541–50.
Kim J, Chang H, Kim D, Jang DH, Park I, Kim K. Machine learning for prediction of septic shock at initial triage in emergency department. J Crit Care. 2020;55:163–70.
Dhungana P, Serafim LP, Ruiz AL, Bruns D, Weister TJ, Smischney NJ, et al. Machine learning in data abstraction: a computable phenotype for sepsis and septic shock diagnosis in the intensive care unit. World J Crit Care Med. 2019;8(7):120–6.
Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30(1):207–10.
Reiner-Benaim A. FDR control by the BH procedure for two-sided correlated tests with implications to gene expression data analysis. Biom J. 2007;49(1):107–26.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodol). 1995;57(1):289–300.
Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
Kanehisa M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 2019;28(11):1947–51.
Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021;49(D1):D545–51.
Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1–22.
Wang Y, Chen L, Ju L, Xiao Y, Wang X. Tumor mutational burden related classifier is predictive of response to PD-L1 blockade in locally advanced and metastatic urothelial carcinoma. Int Immunopharmacol. 2020;87:106818.
Zlobec I, Steele R, Nigam N, Compton CC. A predictive model of rectal tumor response to preoperative radiotherapy using classification and regression tree methods. Clin Cancer Res. 2005;11(15):5440–3.
Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods (San Diego, Calif). 2001;25(4):402–8.
Ueda T, Furusawa T, Kurahashi T, Tessarollo L, Bustin M. The nucleosome binding protein HMGN3 modulates the transcription profile of pancreatic beta cells and affects insulin secretion. Mol Cell Biol. 2009;29(19):5264–76.
Mignemi NA, McClatchey PM, Kilchrist KV, Williams IM, Millis BA, Syring KE, et al. Rapid changes in the microvascular circulation of skeletal muscle impair insulin delivery during sepsis. Am J Physiol Endocrinol Metab. 2019;316(6):E1012–23.
Kurahashi T, Furusawa T, Ueda T, Bustin M. The nucleosome binding protein HMGN3 is expressed in pancreatic alpha-cells and affects plasma glucagon levels in mice. J Cell Biochem. 2010;109(1):49–57.
Bräutigam L, Zhang J, Dreij K, Spahiu L, Holmgren A, Abe H, et al. MGST1, a GSH transferase/peroxidase essential for development and hematopoietic stem cell differentiation. Redox Biol. 2018;17:171–9.
Björkhem-Bergman L, Johansson M, Morgenstern R, Rane A, Ekström L. Prenatal expression of thioredoxin reductase 1 (TRXR1) and microsomal glutathione transferase 1 (MGST1) in humans. FEBS Open Bio. 2014;4:886–91.
Sung PS, Chang WC, Hsieh SL. CLEC5A: a promiscuous pattern recognition receptor to microbes and beyond. Adv Exp Med Biol. 2020;1204:57–73.
Brown GD, Willment JA, Whitehead L. C-type lectins in immunity and homeostasis. Nat Rev Immunol. 2018;18(6):374–89.
Nangaku M, Sato-Yoshitake R, Okada Y, Noda Y, Takemura R, Yamazaki H, et al. KIF1B, a novel microtubule plus end-directed monomeric motor protein for transport of mitochondria. Cell. 1994;79(7):1209–20.
Munirajan AK, Ando K, Mukai A, Takahashi M, Suenaga Y, Ohira M, et al. KIF1Bbeta functions as a haploinsufficient tumor suppressor gene mapped to chromosome 1p36.2 by inducing apoptotic cell death. J Biol Chem. 2008;283(36):24426–34.
Schlisio S, Kenchappa RS, Vredeveld LC, George RE, Stewart R, Greulich H, et al. The kinesin KIF1Bbeta acts downstream from EglN3 to induce apoptosis and is a potential 1p36 tumor suppressor. Genes Dev. 2008;22(7):884–93.
Ando K, Yokochi T, Mukai A, Wei G, Li Y, Kramer S, et al. Tumor suppressor KIF1Bβ regulates mitochondrial apoptosis in collaboration with YME1L1. Mol Carcinog. 2019;58(7):1134–44.
Hirvonen MJ, Mulari MT, Büki KG, Vihko P, Härkönen PL, Väänänen HK. Rab13 is upregulated during osteoclast differentiation and associates with small vesicles revealing polarized distribution in resorbing cells. J Histochem Cytochem. 2012;60(7):537–49.
Roda-Navarro P, Arce I, Renedo M, Montgomery K, Kucherlapati R, Fernández-Ruiz E. Human KLRF1, a novel member of the killer cell lectin-like receptor gene family: molecular characterization, genomic structure, physical mapping to the NK gene complex and expression analysis. Eur J Immunol. 2000;30(2):568–76.
Yang T, Wang R, Zhang J, Bao C, Zhang J, Li R, et al. Mechanism of berberine in treating Helicobacter pylori induced chronic atrophic gastritis through IRF8-IFN-γ signaling axis suppressing. Life Sci. 2020;248:117456.
Remy S, Verstraelen S, Van Den Heuvel R, Nelissen I, Lambrechts N, Hooyberghs J, et al. Gene expressions changes in bronchial epithelial cells: markers for respiratory sensitizers and exploration of the NRF2 pathway. Toxicol In Vitro. 2014;28(2):209–17.
Wang J, Xu S, Lv W, Shi F, Mei S, Shan A, et al. Uridine phosphorylase 1 is a novel immune-related target and predicts worse survival in brain glioma. Cancer Med. 2020;9(16):5940–7.
Hamasaki MY, Severino P, Puga RD, Koike MK, Hernandes C, Barbeiro HV, et al. Short-term effects of sepsis and the impact of aging on the transcriptional profile of different brain regions. Inflammation. 2019;42(3):1023–31.
Park EJ, Kim YM, Kim HJ, Chang KC. Degradation of histone deacetylase 4 via the TLR4/JAK/STAT1 signaling pathway promotes the acetylation of high mobility group box 1 (HMGB1) in lipopolysaccharide-activated macrophages. FEBS Open Bio. 2018;8(7):1119–26.
Ha ZL, Yu ZY. Downregulation of miR-29b-3p aggravates podocyte injury by targeting HDAC4 in LPS-induced acute kidney injury. Kaohsiung J Med Sci. 2021;37:1069–76.
Ding D, Valdivia AO, Bhattacharya SK. Nuclear prelamin A recognition factor and iron dysregulation in multiple sclerosis. Metab Brain Dis. 2020;35(2):275–82.
Turton KB, Wilkerson EM, Hebert AS, Fogerty FJ, Schira HM, Botros FE, et al. Expression of novel “LOCGEF” isoforms of ARHGEF18 in eosinophils. J Leukoc Biol. 2018;104(1):135–45.
Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. 2016;315(8):801–10.
Oliveira TM, de Faria FR, de Faria ER, Pereira PF, Franceschini SC, Priore SE. Nutritional status, metabolic changes and white blood cells in adolescents. Rev Paul Pediatr. 2014;32(4):351–9.
Weller PF, Spencer LA. Functions of tissue-resident eosinophils. Nat Rev Immunol. 2017;17(12):746–60.
Baioumy SA, Esawy MM, Shabana MA. Assessment of circulating FCεRIa in Chronic Spontaneous Urticaria patients and its correlation with clinical and immunological variables. Immunobiology. 2018;223(12):807–11.
Liao EC, Chang CY, Hsieh CW, Yu SJ, Yin SC, Tsai JJ. An exploratory pilot study of genetic marker for IgE-mediated allergic diseases with expressions of FcεR1α and Cε. Int J Mol Sci. 2015;16(5):9504–19.
Hua L, Zuo XB, Bao YX, Liu QH, Li JY, Lv J, et al. Four-locus gene interaction between IL13, IL4, FCER1B, and ADRB2 for asthma in Chinese Han children. Pediatr Pulmonol. 2016;51(4):364–71.
Du W, Hu H, Zhang J, Bao G, Chen R, Quan R. The mechanism of MAPK signal transduction pathway involved with electroacupuncture treatment for different diseases. Evid Based Complement Altern Med. 2019;2019:8138017.
Tiano S, Zhong-Ren L. Acupuncture-moxibustion and mitogen-activated protein kinase signal transduction pathways. Zhongguo zhen jiu Chin Acupunct Moxibustion. 2012;32(3):284–8.
Liang C, Wang S, Qin C, Bao M, Cheng G, Liu B, et al. TRIM36, a novel androgen-responsive gene, enhances anti-androgen efficacy against prostate cancer by inhibiting MAPK/ERK signaling pathways. Cell Death Dis. 2018;9(2):155.
Pan W, Wei N, Xu W, Wang G, Gong F, Li N. MicroRNA-124 alleviates the lung injury in mice with septic shock through inhibiting the activation of the MAPK signaling pathway by downregulating MAPK14. Int Immunopharmacol. 2019;76:105835.
Kong X, Li M, Shao K, Yang Y, Wang Q, Cai M. Progesterone induces cell apoptosis via the CACNA2D3/Ca2+/p38 MAPK pathway in endometrial cancer. Oncol Rep. 2020;43(1):121–32.
Jin Y, Cui D, Ren J, Wang K, Zeng T, Gao L. CACNA2D3 is downregulated in gliomas and functions as a tumor suppressor. Mol Carcinog. 2017;56(3):945–59.
Wong AM, Kong KL, Chen L, Liu M, Wong AM, Zhu C, et al. Characterization of CACNA2D3 as a putative tumor suppressor gene in the development and progression of nasopharyngeal carcinoma. Int J Cancer. 2013;133(10):2284–95.
Li Y, Zhu CL, Nie CJ, Li JC, Zeng TT, Zhou J, et al. Investigation of tumor suppressing function of CACNA2D3 in esophageal squamous cell carcinoma. PLoS ONE. 2013;8(4):e60027.
Ishibashi T, Bottaro DP, Chan A, Miki T, Aaronson SA. Expression cloning of a human dual-specificity phosphatase. Proc Natl Acad Sci USA. 1992;89(24):12170–4.
Yan Q, Sharma-Kuinkel BK, Deshmukh H, Tsalik EL, Cyr DD, Lucas J, et al. Dusp3 and Psme3 are associated with murine susceptibility to Staphylococcus aureus infection and human sepsis. PLoS Pathog. 2014;10(6):e1004149.
Singh P, Dejager L, Amand M, Theatre E, Vandereyken M, Zurashvili T, et al. DUSP3 genetic deletion confers M2-like macrophage-dependent tolerance to septic shock. J Immunol. 2015;194(10):4951–62.
Amand M, Erpicum C, Bajou K, Cerignoli F, Blacher S, Martin M, et al. DUSP3/VHR is a pro-angiogenic atypical dual-specificity phosphatase. Mol Cancer. 2014;13:108.
Ethics approval and consent to participate
This study was approved by the ethics committee of the Second Affiliated Hospital of Shandong First Medical University (20200406). All participants were informed of the purpose of this study, which complied with the Declaration of Helsinki. Informed consent was obtained from all participants.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Batch effect processing between different data sets.
Validation analysis in the GSE95233 data set. A: Electronic expression validation of the 15 diagnostic gene biomarkers in the GSE95233 data set; **** represents P < 0.0001. B: ROC curve of the DT classifier. C: ROC curve of the RF classifier. D: ROC curve of the SVM classifier. AUC: area under the curve; ROC: receiver operating characteristic.
Table S2. Calculation of the Matthews correlation coefficient.
Cite this article
Fan, Y., Han, Q., Li, J. et al. Revealing potential diagnostic gene biomarkers of septic shock based on machine learning analysis. BMC Infect Dis 22, 65 (2022). https://doi.org/10.1186/s12879-022-07056-4