Revealing potential diagnostic gene biomarkers of septic shock based on machine learning analysis

Abstract

Background

Sepsis is an inflammatory response caused by infection with pathogenic microorganisms, and the circulatory shock that it causes is called septic shock. In this study, we aimed to identify potential diagnostic gene biomarkers of septic shock.

Material and methods

First, mRNA expression data sets of septic shock were retrieved from the GEO (Gene Expression Omnibus) database for differential expression analysis. Second, functional enrichment analysis was used to identify the biological functions of the DEmRNAs (differentially expressed mRNAs), and machine learning analysis was used to determine the diagnostic gene biomarkers for septic shock. Third, RT-PCR (real-time polymerase chain reaction) verification was performed. Lastly, the GSE65682 data set was used to perform further diagnostic and prognostic analysis of the identified superlative diagnostic gene biomarkers.

Results

A total of 843 DEmRNAs, including 458 up-regulated and 385 down-regulated DEmRNAs, were obtained in septic shock. Fifteen superlative diagnostic gene biomarkers (such as RAB13, KIF1B, CLEC5A, FCER1A, CACNA2D3, DUSP3, HMGN3, MGST1 and ARHGEF18) for septic shock were identified by machine learning analysis. RF (random forests), SVM (support vector machine) and DT (decision tree) classification models were then constructed. All three models achieved high accuracy, with the RF model achieving the highest (ROC AUC: DT 0.962, SVM 0.991, RF 0.993). Notably, ARHGEF18 and FCER1A were related to survival. CACNA2D3 and DUSP3 participate in the MAPK signaling pathway and may thereby regulate septic shock.

Conclusion

The identified diagnostic gene biomarkers may be helpful in the diagnosis and therapy of patients with septic shock.

Background

Sepsis is an inflammatory response caused by infection with pathogenic microorganisms, and the circulatory shock that it causes is called septic shock. Sepsis is a reaction to systemic infection [1, 2]. Septic shock, which is associated with critical hypotension, is a common acute condition in the ICU (intensive care unit) [2, 3]. It is estimated that about 8 million people worldwide die from sepsis (usually septic shock) every year, and circulatory, cellular and metabolic abnormalities can substantially increase mortality [1, 4].

Most cases of septic shock are caused by microbial infections (bacteria, viruses, fungi, etc.) [5]. In early microbial infection, host immune responses are activated, in which immune cells (macrophages, neutrophils, etc.) recognize and destroy the invading organisms [6]. Reduced intravascular volume, cardiac dysfunction and peripheral vasodilation are major causes of septic shock [6, 7]. Accordingly, these patients receive active fluid resuscitation together with anti-infective and supportive treatment [8,9,10]. However, 28-day and in-hospital mortality remain very high [8]. Moreover, the probability of readmission after discharge is higher than that of other ICU patients, and a considerable proportion of patients develop cognitive dysfunction after treatment [11,12,13].

Molecular-level diagnosis and prognostic assessment are now a general trend and have been widely applied in sepsis research [14, 15]. Mohammed et al. used high-throughput sequencing technology to identify potential biomarkers and signaling pathways related to septic shock [16]. In addition, some researchers have used TSD (transcriptomic signature distance) and meta-analysis to analyze transcriptome data from patients with septic shock [17, 18]. Machine learning, a branch of computer science and statistics, plays an important role in the detection, diagnosis and treatment of diseases [19, 20], and it has also been applied to septic shock [21, 22]. However, most of these studies used machine learning to predict the progression of septic shock, and it has rarely been used to identify potential diagnostic and prognostic biomarkers of the disease. Therefore, in this study, a machine learning approach followed by prognostic analysis was used to identify potential diagnostic gene biomarkers of septic shock. Our study may help clarify the pathological mechanism of septic shock and provide novel diagnostic gene biomarkers for the diagnosis and therapy of the disease.

Methods

Database

The GEO (Gene Expression Omnibus) database [23], developed by NCBI (National Center for Biotechnology Information), mainly hosts microarray (chip) data. The GSE4607, GSE13904, GSE26378, GSE26440, GSE65682 and GSE95233 data sets were obtained (Table 1). The raw files were downloaded, and the RMA algorithm was used for background adjustment and normalization. When multiple probes corresponded to the same gene, their average value was taken. The GSE4607, GSE13904, GSE26378 and GSE26440 data sets were used for differential expression analysis and machine learning (test set), the GSE65682 data set was used for survival analysis, and the GSE95233 data set was used for electronic expression verification of the gene biomarkers (validation set). The GSE65682 data set was based on the GPL13667 platform, whereas the GSE4607, GSE13904, GSE26378, GSE26440 and GSE95233 data sets were based on the GPL570 platform. To avoid differences arising from the detection technologies of different platforms, the GSE65682 data set was not analyzed together with the other data sets. The GSE4607, GSE13904, GSE26378 and GSE26440 data sets all came from the GPL570 platform, and batch effect processing using the SVA package showed that the batch effect among these four data sets was not significant (Additional file 1: Fig. S1).
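The probe-to-gene step described above (averaging all probes that map to the same gene) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the input values are assumed to be RMA-normalized already (RMA outputs log2-scale values), and the dictionary layout, the function name `collapse_probes` and the probe IDs are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Collapse probe-level expression to gene level by averaging probes.

    probe_values:  {probe_id: [log2 expression per sample]}
    probe_to_gene: {probe_id: gene symbol}; unmapped probes are dropped.
    Returns {gene symbol: [mean expression per sample]}.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:  # skip probes with no gene annotation
            grouped[gene].append(values)
    # zip(*rows) walks sample by sample across all probes of one gene
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


if __name__ == "__main__":
    # Hypothetical probe IDs and values, for illustration only.
    probes = {"p1": [1.0, 2.0], "p2": [3.0, 4.0], "p3": [5.0, 6.0]}
    mapping = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
    print(collapse_probes(probes, mapping))  # {'GENE_A': [2.0, 3.0], 'GENE_B': [5.0, 6.0]}
```

Averaging is the rule stated in the text; other collapsing rules (for example, keeping the probe with the highest mean expression) exist but are not what was described here.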

Table 1 Data sets retrieved from the GEO database

Identification of DEmRNAs (differentially expressed mRNAs)

In this study, the limma and metaMA packages were used to identify the DEmRNAs. P values from the individual data sets were combined using the inverse normal method in the metaMA package. The FDR (false discovery rate) was obtained by correcting the raw P values for multiple testing with the Benjamini and Hochberg method [24, 25]. DEmRNAs were screened with the thresholds FDR < 0.01 and |Combined.ES (combined effect size)| > 1.5.
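The two statistical steps named above, combining per-study P values with the inverse normal method and converting P values to FDR values with the Benjamini and Hochberg procedure, can be sketched in Python as below. This is a hedged illustration, not metaMA's implementation: the per-study weights (often the square root of each study's sample size) and the use of one-sided P values are assumptions, and metaMA's internals may differ.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_normal_combine(pvals, weights):
    """Weighted inverse normal (Stouffer) combination of one-sided P values.

    Each P value must lie strictly between 0 and 1. The weights are an
    assumption (e.g. the square root of each study's sample size).
    """
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in pvals]  # z_i = Phi^-1(1 - p_i)
    z_comb = sum(w * z for w, z in zip(weights, z_scores)) / math.sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(z_comb)  # combined one-sided P value


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (FDR), returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Under the screening rule in the text, a gene would pass if its adjusted value is below 0.01 and its |Combined.ES| exceeds 1.5; the effect-size combination itself is not reproduced in this sketch.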

Functional enrichment

To characterize the biological functions of the DEmRNAs, GO (Gene Ontology, http://www.geneontology.org/) and KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/pathway.html) functional enrichment analyses were performed with the DAVID (Database for Annotation, Visualization and Integrated Discovery, https://david.ncifcrf.gov/) database [26,27,28]. GO and KEGG terms with P < 0.05 were considered significantly enriched.
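Enrichment P values of this kind come from a one-sided hypergeometric (Fisher exact) test of whether a GO or KEGG term contains more DEmRNAs than expected by chance. DAVID itself reports a more conservative modified Fisher exact P value (the EASE score). The sketch below is a simplified stand-in: the `ease_like` variant, which removes one term gene from the input list before testing, is our approximation of that idea rather than DAVID's code, and all function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_list, k_term, n_total):
    """One-sided enrichment P value: P(X >= k) for X ~ Hypergeometric.

    k:       DEmRNAs annotated to the term
    n_list:  DEmRNAs in the input list
    k_term:  background genes annotated to the term
    n_total: background genes
    """
    denom = comb(n_total, n_list)
    upper = min(k_term, n_list)
    # math.comb returns 0 when the lower argument exceeds the upper one
    return sum(
        comb(k_term, x) * comb(n_total - k_term, n_list - x)
        for x in range(k, upper + 1)
    ) / denom


def ease_like(k, n_list, k_term, n_total):
    """Conservative variant: drop one term gene from the list before testing.

    Approximates the idea behind DAVID's EASE score; not DAVID's code.
    """
    return hypergeom_upper_tail(max(k - 1, 0), n_list - 1, k_term, n_total)
```

With the cutoff stated above, a term would be reported if its P value is below 0.05.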

Identification of the superlative diagnostic gene biomarkers

First, the glmnet package in R, which efficiently fits lasso and elastic-net penalized regression models [29], was used to reduce data dimensions. Second, the random forest algorithm was used to rank the mRNAs by importance according to their Mean Decrease Accuracy values, from largest to smallest. Then, the optimal number of features was determined with a top-down forward-wrapper approach, adding one differentially expressed mRNA at a time. The superlative DEmRNAs with diagnostic value for septic shock were used to establish classification models, namely DT (decision tree), SVM (support vector machine) and RF (random forests). The 'rpart' package (https://cran.r-project.org/web/packages/rpart/), the 'e1071' package (https://cran.r-project.org/web/packages/e1071/index.html) and the 'randomForest' package (https://cran.r-project.org/web/packages/randomForest/) in R were used to establish the DT, SVM and RF models, respectively. Tenfold cross-validation was used to compare the average misclassification rates of the three models and to reduce overfitting [30, 31]. The diagnostic performance of each model was evaluated by accuracy, sensitivity, specificity and the AUC (area under the curve) of the ROC (receiver operating characteristic) curve. The Matthews correlation coefficient of each model was also calculated using the mcc function in the mltools package (https://pypi.org/project/mltools/1.0.2/).
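The top-down forward-wrapper selection described above can be sketched in Python. This is an illustration under assumptions, not the authors' R pipeline: `cv_score` is a hypothetical callback that would train a random forest on the given genes and return the mean tenfold cross-validated accuracy, and the folds are plain (unstratified) random splits. The stopping rule picks the smallest gene set at which the score first reaches its maximum.

```python
import random
from typing import Callable, List, Sequence, Tuple


def kfold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and deal them into k folds of nearly equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def forward_wrapper(
    ranked_genes: Sequence[str],
    cv_score: Callable[[Sequence[str]], float],
) -> Tuple[List[str], List[float]]:
    """Top-down forward wrapper over an importance-ranked gene list.

    Genes are added one at a time in ranked order; cv_score (a hypothetical
    callback) returns the cross-validated accuracy of a gene subset. The
    smallest subset at which the score first reaches its maximum is returned,
    together with the score for every subset size.
    """
    scores = [cv_score(ranked_genes[:m]) for m in range(1, len(ranked_genes) + 1)]
    best = max(scores)
    m_best = scores.index(best) + 1  # index() finds the first occurrence
    return list(ranked_genes[:m_best]), scores
```

In practice, `cv_score` would, for each fold returned by `kfold_indices`, train on the other nine folds, score the held-out fold and average the ten accuracies. Stratifying the folds by class would be a reasonable refinement for unbalanced case/control data.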

Electronic expression verification, diagnostic and prognostic analysis of superlative diagnostic gene biomarkers

The GSE95233 data set (124 blood samples from 102 cases and 22 normal controls) was used for electronic expression verification. The GSE65682 data set (521 blood samples from 479 cases and 42 normal controls) contains 28-day survival information for the patients and was used to further analyze the diagnostic and prognostic value of the key diagnostic gene biomarkers.

In vitro validation of identified DEmRNAs

Patients diagnosed with septic shock were eligible for inclusion. The detailed inclusion criteria were as follows: (1) body temperature > 38 ℃ or < 36 ℃; (2) heart rate > 90 beats per minute, or more than 2 standard deviations above the normal range for age; (3) respiratory rate > 20 breaths per minute or PaCO2 (arterial partial pressure of carbon dioxide) < 32 mmHg; (4) white blood cell count > 12.0 × 10⁹/L or < 4.0 × 10⁹/L, or more than 10% immature neutrophils; (5) first-onset septic shock; (6) cardiovascular organ dysfunction, acute respiratory distress syndrome, or dysfunction of two or more other organs; (7) complete clinical data, including gender, age, height, weight, etc. Patients with a history of cancer or other diseases, chemotherapy, radiotherapy, etc., or with incomplete clinical data were excluded. Individuals in the normal control group were gender- and age-matched with the case group and had no disease before sampling or within 2 weeks after sampling. Individuals who had taken glucocorticoids, or who had a history of febrile disease or of any chronic or acute disease even mildly associated with inflammation within 2 weeks of sampling, were excluded.

According to the above criteria, 16 blood samples were obtained from 8 patients with septic shock and 8 normal controls for RT-PCR (real-time polymerase chain reaction). Total RNA was extracted using the RNAliquid ultra-speed whole blood (liquid sample) kit (RN2602, Beijing Huitian Oriental Technology Co., Ltd.). The FastQuant cDNA synthesis kit (KR106, TIANGEN) was used to synthesize cDNA. RT-PCR was performed using SuperReal PreMix Plus (SYBR Green) (FP205, TIANGEN). Each experiment was repeated three times. GAPDH (glyceraldehyde-3-phosphate dehydrogenase) and ACTB (actin beta) were used as internal controls. Relative expression levels were calculated as fold changes using the 2^(-ΔΔCt) method [32].
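The 2^(-ΔΔCt) calculation can be illustrated with a short Python sketch. Two choices here are our assumptions, not statements from the source: the two reference genes (GAPDH and ACTB) are combined by taking the arithmetic mean of their Ct values, and technical replicate Ct values are averaged before the calculation. The function names and Ct numbers are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, ref_cts_by_gene):
    """ΔCt = mean target Ct minus the mean of the reference genes' mean Cts.

    target_cts:      replicate Ct values of the gene of interest
    ref_cts_by_gene: one list of replicate Ct values per reference gene
    """
    ref = mean(mean(cts) for cts in ref_cts_by_gene)  # assumed way to combine GAPDH and ACTB
    return mean(target_cts) - ref


def fold_change(sample_target, sample_refs, control_target, control_refs):
    """Relative expression of a sample versus a control by the 2^(-ΔΔCt) method."""
    ddct = delta_ct(sample_target, sample_refs) - delta_ct(control_target, control_refs)
    return 2 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical triplicate Ct values: target, then [GAPDH, ACTB].
    patient = ([25.0, 25.0, 25.0], [[18.0, 18.0, 18.0], [20.0, 20.0, 20.0]])
    control = ([27.0, 27.0, 27.0], [[18.0, 18.0, 18.0], [20.0, 20.0, 20.0]])
    print(fold_change(*patient, *control))  # 4.0
```

A fold change above 1 indicates higher expression in the sample than in the control; here the target reaches threshold two cycles earlier in the patient after normalization, giving 2^2 = 4.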

This study was approved by the ethics committee of the Second Affiliated Hospital of Shandong First Medical University (20200406).

Statistical analysis

GraphPad Prism was used for all statistical analyses. One-way ANOVA (analysis of variance) with orthogonal contrasts and mean comparison procedures (Duncan's multiple range test) was used to detect differences between cases and normal controls, and P < 0.05 was considered statistically significant for RT-PCR. Experiments were independently repeated at least 3 times.

Results

DEmRNAs

Using the screening criteria FDR < 0.01 and |Combined.ES| > 1.5, a total of 843 DEmRNAs were identified, of which 458 were up-regulated and 385 were down-regulated (Additional file 3: Table S1). A heat map of the top 100 DEmRNAs is shown in Fig. 1.

Fig. 1

Heat map of the top 100 DEmRNAs. The figure shows the bidirectional hierarchical clustering of the top 100 DEmRNAs and the samples. Clustering was performed using complete linkage with Euclidean distance (row: DEmRNA, column: sample). The color scale on the right indicates the relative expression level of each mRNA. Red indicates expression below the reference channel; blue indicates expression above the reference channel.

Functional enrichment analysis of DEmRNAs

To understand the potential biological functions of the DEmRNAs, GO and KEGG functional enrichment analyses were performed. In the GO BP (biological process) category, the DEmRNAs were mainly enriched in immune response, positive regulation of immune system process and leukocyte activation. In the GO CC (cellular component) category, they were mainly enriched in vesicle, cytoplasmic vesicle and nucleolus. In the GO MF (molecular function) category, they were mainly enriched in protein dimerization activity, cytokine binding and non-membrane spanning protein tyrosine kinase activity (Fig. 2A). The KEGG enrichment analysis identified several signaling pathways, such as the T cell receptor signaling pathway, primary immunodeficiency, the MAPK signaling pathway, the Jak–STAT signaling pathway and the Fc epsilon RI signaling pathway (Fig. 2B). Among the 15 superlative diagnostic gene biomarkers, CACNA2D3 and DUSP3 were annotated to the MAPK signaling pathway.

Fig. 2

Top 15 significantly enriched GO terms and top 13 significantly enriched KEGG terms of all DEmRNAs. A Top 15 significantly enriched GO terms of the DEmRNAs; the z-score clustering of the GO terms is shown below. Red represents mRNA up-regulation and blue represents mRNA down-regulation. GO Gene Ontology, BP biological process, CC cellular component, MF molecular function. B Top 13 significantly enriched KEGG terms of the DEmRNAs; different colors represent different signaling pathways [26,27,28]. KEGG Kyoto Encyclopedia of Genes and Genomes

Identification of superlative diagnostic gene biomarkers

After dimensionality reduction, a total of 28 DEmRNAs were retained (Table 2). These 28 DEmRNAs were ranked in order of importance according to their Mean Decrease Accuracy values (Fig. 3A). Following this RF importance ranking, mRNAs were added one at a time from top to bottom, the RF algorithm was used for classification, and tenfold cross-validation was used to obtain the accuracy and AUC (Fig. 3B, C). The accuracy first reached its maximum when 15 mRNAs were included. Therefore, the top 15 DEmRNAs (KLRF1, UPP1, RAB13, KIF1B, CLEC5A, NARF, DUSP3, FCER1A, CACNA2D3, HMGN3, ECRP, HDAC4, LHFPL2, MGST1 and ARHGEF18) were selected as the superlative diagnostic gene biomarkers. A heat map of the 15 superlative diagnostic gene biomarkers is shown in Fig. 4.

Table 2 The 28 differentially expressed mRNAs retained after dimensionality reduction
Fig. 3

Importance ranking of the 28 DEmRNAs and trends in accuracy and AUC as the number of mRNAs increases. A Importance ranking of the 28 DEmRNAs; B accuracy as a function of the number of mRNAs; C AUC as a function of the number of mRNAs. AUC area under the curve

Fig. 4

Heat map of the 15 superlative diagnostic gene biomarkers. Clustering was performed using complete linkage with Euclidean distance. Each row represents a diagnostic biomarker, and each column represents a sample. The color scale on the right indicates the relative expression level of each mRNA. Red indicates expression below the reference channel; blue indicates expression above the reference channel.

Classification models were constructed based on the 15 selected genes, and the RF model had the highest accuracy. The accuracy, sensitivity, specificity and AUC of each model from tenfold cross-validation are listed in Table 3. The ROC AUCs of the DT, RF and SVM models were 0.962, 0.993 and 0.991, respectively (Fig. 5). The diagnostic efficacy of the 15-gene model was also validated in the GSE95233 data set, where it likewise performed well (Additional file 2: Fig. S2B–D). The Matthews correlation coefficient also indicated high accuracy in the test set; although performance in the validation set was lower than in the test set, accuracy remained good (Additional file 4: Table S2). Notably, among the 15 superlative diagnostic gene biomarkers, CLEC5A, DUSP3, ECRP, HDAC4, KIF1B, KLRF1, NARF, RAB13 and UPP1 each had an AUC above 0.9, with sensitivity and specificity above 0.8, in the ROC curve analysis (Fig. 6).
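The evaluation metrics reported here can all be derived from a 2x2 confusion matrix and the classifiers' scores. The Python sketch below is illustrative only: it uses textbook formulas rather than the R packages used in the study, codes septic shock as 1 and controls as 0 (our assumption), computes AUC as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, and returns an MCC of 0 when its denominator is zero, which is a common convention.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and MCC for labels coded 1 (case) / 0 (control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # true negative rate
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,  # Matthews correlation coefficient
    }


def roc_auc(y_true, scores):
    """AUC as P(score of a random case > score of a random control); ties count 1/2."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for c in cases:
        for h in controls:
            wins += 1.0 if c > h else 0.5 if c == h else 0.0
    return wins / (len(cases) * len(controls))
```

Unlike accuracy, the MCC uses all four confusion-matrix cells, which is why it is a useful complement when case and control groups differ in size, as in the validation set.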

Table 3 Ten-fold cross-validation results of each model
Fig. 5

ROC curves of the DT, RF and SVM classifiers. AUC area under the curve, ROC receiver operating characteristic

Fig. 6

ROC curves of the 15 superlative diagnostic gene biomarkers. AUC area under the curve, ROC receiver operating characteristic

Electronic expression verification, diagnosis and prognostic analysis of superlative diagnostic gene biomarkers

To further verify the expression of the 15 diagnostic gene biomarkers, expression verification was performed using the GSE95233 data set. ARHGEF18, CACNA2D3, FCER1A, HMGN3 and KLRF1 were significantly down-regulated in the disease group, whereas CLEC5A, DUSP3, ECRP, HDAC4, KIF1B, LHFPL2, MGST1, NARF, RAB13 and UPP1 were significantly up-regulated compared with the normal control group (Additional file 2: Fig. S2A). These results were consistent with the previous analysis. The GSE65682 data set was then used for further diagnostic and prognostic analysis of the identified superlative diagnostic gene biomarkers (Fig. 7). Only ARHGEF18 and FCER1A were related to survival. The AUC, sensitivity and specificity were 0.997, 0.967 and 1.000 for ARHGEF18, and 0.985, 0.929 and 1.000 for FCER1A (Fig. 7A, B). Box plots showed the expression levels of ARHGEF18 and FCER1A in different populations (Fig. 7C, D). In survivors, the expression levels of ARHGEF18 and FCER1A were significantly down-regulated, which was consistent with the bioinformatics analysis, and expression was lowest among non-survivors. ARHGEF18 and FCER1A may therefore influence treatment response to a certain extent. The R survival package (https://cran.r-project.org/web/packages/survival/index.html) was then used to analyze the prognostic value of ARHGEF18 and FCER1A. The results showed that ARHGEF18 and FCER1A were significantly negatively correlated with survival (Fig. 7E, F).
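Survival comparisons of this type are typically made with Kaplan-Meier curves and a log-rank test, both of which the R survival package provides. The standard-library Python sketch below shows the two calculations. How samples were split into high- and low-expression groups is not stated in the text, so the grouping (for example, at the median) is an assumption left to the caller, and the function names are hypothetical. Events are coded 1 for death within 28 days and 0 for censoring.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns [(time, survival)] at each event time.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # pool ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic with 1 df, P value)."""
    all_obs = zip(list(times_a) + list(times_b), list(events_a) + list(events_b))
    event_times = sorted({t for t, e in all_obs if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Upper tail of chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

The sign of the observed-minus-expected total tells which group died earlier than expected, which is the information behind a statement that expression is positively or negatively associated with survival.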

Fig. 7

Diagnostic and prognostic analysis of ARHGEF18 and FCER1A in the GSE65682 data set. A ROC curve of ARHGEF18; B ROC curve of FCER1A; C Box plot of ARHGEF18; D Box plot of FCER1A; E Survival curve of ARHGEF18; F Survival curve of FCER1A. AUC area under the curve, ROC receiver operating characteristic

RT-PCR validation

The clinical information of the enrolled individuals is shown in Table 4. Based on the diagnostic analysis, prognostic analysis and literature reports, ARHGEF18, CLEC5A, FCER1A, HDAC4, KLRF1, DUSP3 and UPP1 were selected for RT-PCR verification. The primers are shown in Table 5. CLEC5A, DUSP3, HDAC4 and UPP1 showed a trend toward up-regulation, and FCER1A and KLRF1 showed a trend toward down-regulation (Fig. 8). Except for ARHGEF18, these expression trends were consistent with the bioinformatics analysis. The small sample size may account for this inconsistency, and further research is needed.

Table 4 Clinical information of patients and normal controls in the RT-PCR
Table 5 Primer sequence in the RT-PCR
Fig. 8

RT-PCR validation of ARHGEF18, CLEC5A, DUSP3, FCER1A, HDAC4, KLRF1 and UPP1 in blood samples. The y-axis represents relative gene expression and the x-axis represents sample type. Normal, normal controls; SIRS, septic shock patients

Discussion

Using machine learning, 15 DEmRNAs, such as HMGN3, CACNA2D3, DUSP3, MGST1, CLEC5A, KIF1B, RAB13, ARHGEF18 and FCER1A, were identified as the superlative diagnostic gene biomarkers. The survival analysis showed that only FCER1A and ARHGEF18 had clear prognostic value.

HMGN3 (high mobility group nucleosomal binding domain 3) plays an important regulatory role in pancreatic cells [33]. In patients with sepsis, hyperglycemia is a risk factor for poor prognosis. During sepsis, rapid changes in the skeletal muscle microcirculation severely hinder insulin delivery [34]. HMGN3 can reduce plasma glucagon levels [35] and thereby help maintain stable blood glucose. In this study, HMGN3 was down-regulated in patients, which provides a basis for further investigating its role in sepsis.

MGST1 (microsomal glutathione S-transferase 1), an important redox and detoxification enzyme, plays a crucial role in cell defense and hematopoiesis [36, 37]. CLEC5A (C-type lectin domain containing 5A) is a Syk (spleen tyrosine kinase)-coupled C-type lectin that is mainly expressed in myeloid cells, such as macrophages and neutrophils [38], and participates in host defense, inflammation, platelet activation and development [39]. KIF1B (kinesin family member 1B) belongs to the kinesin superfamily and encodes a motor protein that transports mitochondria and synaptic vesicle precursors within the cell [40]. In addition, KIF1B has been identified as a tumor suppressor gene [41, 42] with a potential role in mitochondrial morphological changes: KIF1B and the mitochondrial metalloproteinase YME1L1 (YME1 like 1 ATPase) coordinately regulate mitochondrial fission to induce mitochondrial apoptosis [43]. In the early stage of sepsis, released NO (nitric oxide) can directly block mitochondrial respiration and, when it accumulates sufficiently, cause shock [6]. The potential role of KIF1B in mitochondria suggests that it may play a role in septic shock. RAB13 (RAB13, member RAS oncogene family) is present in all macrophage-related cells [44]. In our study, MGST1, CLEC5A, KIF1B and RAB13 were all up-regulated in patients, suggesting that they may play a crucial role in septic shock.

KLRF1 (killer cell lectin like receptor F1) is an activating homodimeric C-type lectin-like receptor that plays an important role in regulating the activity of natural killer cells and monocytes [45]. UPP1 (uridine phosphorylase 1) has recently been reported to play an important role in immune and inflammatory processes [46,47,48], and previous studies have found that its expression is increased in the brains of septic rats [49]. HDAC4 (histone deacetylase 4) plays an important regulatory role in sepsis and may be an effective therapeutic target [50, 51]. The expression of NARF (nuclear prelamin A recognition factor) is increased in multiple sclerosis, a chronic neuroinflammatory disease [52]. So far, we have not found any studies on ECRP (ribonuclease A family member 2C, pseudogene) or LHFPL2 (LHFPL tetraspan subfamily member 2) in inflammatory or immune diseases, so this may be the first report suggesting that ECRP and LHFPL2 play a role in the progression of septic shock. In our study, KLRF1 (down-regulated) and UPP1, HDAC4, NARF, ECRP and LHFPL2 (all up-regulated) were abnormally expressed and could be considered potential diagnostic biomarkers. These results suggest that KLRF1, UPP1, HDAC4, NARF, ECRP and LHFPL2 may play key roles in septic shock, providing potential directions for further research.

The protein encoded by ARHGEF18 (Rho/Rac guanine nucleotide exchange factor 18) plays an important role in activating eosinophils and other white blood cells [53]. Sepsis is a life-threatening condition caused by a dysregulated host response to infection [54]. Eosinophils are white blood cells of the immune defense system and play a role in the evolution of inflammation and disease [55, 56]. FCER1A (Fc fragment of IgE receptor Ia) encodes a subunit of the high-affinity IgE receptor, which initiates allergic reactions and plays a role in allergic inflammation [57, 58]. The interaction between FCER1B and other immunoglobulin-related inflammatory genes increases the risk of asthma [59]. In this study, ARHGEF18 and FCER1A were related to survival. In the enriched GO terms, ARHGEF18 is mainly involved in regulating cell death and apoptosis, and FCER1A is mainly involved in immune regulation and metabolic processes. This further suggests that ARHGEF18 and FCER1A may be related to the survival of patients with septic shock.

The MAPK (mitogen-activated protein kinase) signaling pathway plays a crucial role in the regulation of disease processes, including inflammatory, analgesic and injury-protective responses [60]. The MAPK pathway comprises three sub-pathways: p38MAPK (p38 mitogen-activated protein kinase), ERK-1/2 (extracellular signal-regulated kinase 1/2) and JNK (c-Jun N-terminal kinase) [61, 62]. Among them, the p38MAPK and JNK signaling pathways are involved in growth inhibition, inflammation and pro-apoptotic signaling [60]. The MAPK pathway can be activated by extracellular signals, such as cytokines involved in the inflammatory response, growth factors that regulate growth and metabolism, and bacterial complexes [60]. Inhibiting activation of the MAPK pathway can reduce lung injury caused by septic shock [63]. In the KEGG enrichment analysis, CACNA2D3 and DUSP3 were annotated to the MAPK signaling pathway. CACNA2D3 (calcium voltage-gated channel auxiliary subunit alpha2delta3) plays an important role in carcinogenesis [64,65,66]. CACNA2D3 is expressed at low levels in endometrial cancer tissues and cells, and its overexpression in vitro significantly inhibits tumor cell proliferation and migration [64]. In clinical studies, CACNA2D3, a novel tumor suppressor gene, significantly inhibits lymph node metastasis of esophageal squamous cell carcinoma [67]. Because lymph nodes are key immune sites for lymphocytes, this provides a rationale for studying the role of CACNA2D3 in septic shock. DUSP3 (dual specificity phosphatase 3), also called VHR (vaccinia-H1 related phosphatase), is a founding member of the dual-specificity protein phosphatase family [68]. DUSP3 plays a role in Staphylococcus aureus infection [69], acts as a positive regulator of the innate immune response [70], and is the main protein tyrosine phosphatase in macrophages that mediates cellular processes, including immune responses [71]. These findings suggest that the MAPK signaling pathway may play an important role in septic shock by regulating related genes such as CACNA2D3 and DUSP3.

However, this study has certain limitations. First, the sample size of the RT-PCR experiment was small, which may have introduced some error; more blood samples from patients with septic shock are needed to verify the expression of the identified mRNAs. Second, the molecular mechanisms of the DEmRNAs in septic shock were not investigated, and further experiments are needed to explore the underlying mechanisms of the disease.

Conclusions

In this study, a machine learning approach followed by prognostic analysis was used to identify potential diagnostic gene biomarkers of septic shock. Fifteen superlative diagnostic gene biomarkers (KLRF1, UPP1, RAB13, KIF1B, CLEC5A, NARF, DUSP3, FCER1A, CACNA2D3, HMGN3, ECRP, HDAC4, LHFPL2, MGST1 and ARHGEF18) for septic shock were identified. Notably, ARHGEF18 and FCER1A were related to survival, and CACNA2D3 and DUSP3 participate in the MAPK signaling pathway and may thereby regulate septic shock. The identified diagnostic gene biomarkers may be helpful in the diagnosis and therapy of patients with septic shock, and this study provides a basis for further research on the disease.

Availability of data and materials

All data generated or analyzed during this study are included in this published article. The data sets analyzed during the current study (GSE4607, GSE13904, GSE26378, GSE26440, GSE65682 and GSE95233) are available in the GEO (Gene Expression Omnibus) database at https://www.ncbi.nlm.nih.gov/geo/.

References

  1. Fernando SM, Rochwerg B, Seely AJE. Clinical implications of the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). Can Med Assoc J. 2018;190(36):E1058–9.

  2. Fabri-Faja N, Calvo-Lozano O, Dey P, Terborg RA, Estevez MC, Belushkin A, et al. Early sepsis diagnosis via protein and miRNA biomarkers using a novel point-of-care photonic biosensor. Anal Chim Acta. 2019;1077:232–42.

  3. Essandoh K, Fan GC. Role of extracellular and intracellular microRNAs in sepsis. Biochim Biophys Acta. 2014;1842(11):2155–62.

  4. Shankar-Hari M, Phillips GS, Levy ML, Seymour CW, Liu VX, Deutschman CS, et al. Developing a new definition and assessing new clinical criteria for septic shock: for the third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. 2016;315(8):775–87.

  5. Jacob JA. New sepsis diagnostic guidelines shift focus to organ dysfunction. JAMA. 2016;315(8):739–40.

  6. Goodwin JK, Schaer M. Septic shock. Vet Clin N Am Small Anim Pract. 1989;19(6):1239–58.

  7. Hernandez G, Bruhn A, Castro R, Regueira T. The holistic view on perfusion monitoring in septic shock. Curr Opin Crit Care. 2012;18(3):280–6.

  8. Fang F, Zhang Y, Tang J, Lunsford LD, Li T, Tang R, et al. Association of corticosteroid treatment with outcomes in adult patients with sepsis: a systematic review and meta-analysis. JAMA Intern Med. 2019;179(2):213–23.

  9. Dellinger RP, Levy MM, Rhodes A, Annane D, Gerlach H, Opal SM, et al. Surviving Sepsis Campaign: international guidelines for management of severe sepsis and septic shock, 2012. Intensive Care Med. 2013;39(2):165–228.

  10. Coopersmith CM, De Backer D, Deutschman CS, Ferrer R, Lat I, Machado FR, et al. Surviving sepsis campaign: research priorities for sepsis and septic shock. Crit Care Med. 2018;46(8):1334–56.

  11. Shankar-Hari M, Ambler M, Mahalingasivam V, Jones A, Rowan K, Rubenfeld GD. Evidence for a causal link between sepsis and long-term mortality: a systematic review of epidemiologic studies. Crit Care (Lond, Engl). 2016;20:101.

  12. Norman BC, Cooke CR, Ely EW, Graves JA. Sepsis-associated 30-day risk-standardized readmissions: analysis of a nationwide medicare sample. Crit Care Med. 2017;45(7):1130–7.

  13. Venkatesh B, Finfer S, Myburgh J, Cohen J, Billot L. Long-term outcomes of the ADRENAL trial. N Engl J Med. 2018;378(18):1744–5.

  14. Ma J, Chen C, Barth AS, Cheadle C, Guan X. Lysosome and cytoskeleton pathways are robustly enriched in the blood of septic patients: a meta-analysis of transcriptomic data. Mediat Inflamm. 2015;2015:984825.

  15. Yang J, Zhang S, Zhang J, Dong J, Wu J, Zhang L, et al. Identification of key genes and pathways using bioinformatics analysis in septic shock children. Infect Drug Resist. 2018;11:1163–74.

  16. Mohammed A, Cui Y, Mas VR, Kamaleswaran R. Differential gene expression analysis reveals novel genes and pathways in pediatric septic shock patients. Sci Rep. 2019;9(1):11270.

  17. Manatakis DV, VanDevender A, Manolakos ES. An information-theoretic approach for measuring the distance of organ tissue samples using their transcriptomic signatures. Bioinformatics (Oxf, Engl). 2021;36(21):5194–204.

  18. Banerjee S, Mohammed A, Wong HR, Palaniyar N, Kamaleswaran R. Machine learning identifies complicated sepsis course and subsequent mortality based on 20 genes in peripheral blood immune cells at 24 h post-ICU admission. Front Immunol. 2021;12:592303.

  19. Peiffer-Smadja N, Rawson TM, Ahmad R, Buchard A, Georgiou P, Lescure FX, et al. Machine learning for clinical decision support in infectious diseases: a narrative review of current applications. Clin Microbiol Infect. 2020;26(5):584–95.

  20. Radakovich N, Nagy M, Nazha A. Machine learning in haematological malignancies. Lancet Haematol. 2020;7(7):e541–50.

  21. Kim J, Chang H, Kim D, Jang DH, Park I, Kim K. Machine learning for prediction of septic shock at initial triage in emergency department. J Crit Care. 2020;55:163–70.

  22. Dhungana P, Serafim LP, Ruiz AL, Bruns D, Weister TJ, Smischney NJ, et al. Machine learning in data abstraction: a computable phenotype for sepsis and septic shock diagnosis in the intensive care unit. World J Crit Care Med. 2019;8(7):120–6.

  23. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30(1):207–10.

  24. Reiner-Benaim A. FDR control by the BH procedure for two-sided correlated tests with implications to gene expression data analysis. Biom J. 2007;49(1):107–26.

  25. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodol). 1995;57(1):289–300.

  26. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.

  27. Kanehisa M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 2019;28(11):1947–51.

  28. Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021;49(D1):D545–51.

  29. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1–22.

  30. Wang Y, Chen L, Ju L, Xiao Y, Wang X. Tumor mutational burden related classifier is predictive of response to PD-L1 blockade in locally advanced and metastatic urothelial carcinoma. Int Immunopharmacol. 2020;87:106818.

  31. Zlobec I, Steele R, Nigam N, Compton CC. A predictive model of rectal tumor response to preoperative radiotherapy using classification and regression tree methods. Clin Cancer Res. 2005;11(15):5440–3.

  32. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods (San Diego, Calif). 2001;25(4):402–8.

  33. Ueda T, Furusawa T, Kurahashi T, Tessarollo L, Bustin M. The nucleosome binding protein HMGN3 modulates the transcription profile of pancreatic beta cells and affects insulin secretion. Mol Cell Biol. 2009;29(19):5264–76.

  34. Mignemi NA, McClatchey PM, Kilchrist KV, Williams IM, Millis BA, Syring KE, et al. Rapid changes in the microvascular circulation of skeletal muscle impair insulin delivery during sepsis. Am J Physiol Endocrinol Metab. 2019;316(6):E1012–23.

  35. Kurahashi T, Furusawa T, Ueda T, Bustin M. The nucleosome binding protein HMGN3 is expressed in pancreatic alpha-cells and affects plasma glucagon levels in mice. J Cell Biochem. 2010;109(1):49–57.

  36. Bräutigam L, Zhang J, Dreij K, Spahiu L, Holmgren A, Abe H, et al. MGST1, a GSH transferase/peroxidase essential for development and hematopoietic stem cell differentiation. Redox Biol. 2018;17:171–9.

  37. Björkhem-Bergman L, Johansson M, Morgenstern R, Rane A, Ekström L. Prenatal expression of thioredoxin reductase 1 (TRXR1) and microsomal glutathione transferase 1 (MGST1) in humans. FEBS Open Bio. 2014;4:886–91.

  38. Sung PS, Chang WC, Hsieh SL. CLEC5A: a promiscuous pattern recognition receptor to microbes and beyond. Adv Exp Med Biol. 2020;1204:57–73.

  39. Brown GD, Willment JA, Whitehead L. C-type lectins in immunity and homeostasis. Nat Rev Immunol. 2018;18(6):374–89.

  40. Nangaku M, Sato-Yoshitake R, Okada Y, Noda Y, Takemura R, Yamazaki H, et al. KIF1B, a novel microtubule plus end-directed monomeric motor protein for transport of mitochondria. Cell. 1994;79(7):1209–20.

  41. Munirajan AK, Ando K, Mukai A, Takahashi M, Suenaga Y, Ohira M, et al. KIF1Bbeta functions as a haploinsufficient tumor suppressor gene mapped to chromosome 1p36.2 by inducing apoptotic cell death. J Biol Chem. 2008;283(36):24426–34.

  42. Schlisio S, Kenchappa RS, Vredeveld LC, George RE, Stewart R, Greulich H, et al. The kinesin KIF1Bbeta acts downstream from EglN3 to induce apoptosis and is a potential 1p36 tumor suppressor. Genes Dev. 2008;22(7):884–93.

  43. Ando K, Yokochi T, Mukai A, Wei G, Li Y, Kramer S, et al. Tumor suppressor KIF1Bβ regulates mitochondrial apoptosis in collaboration with YME1L1. Mol Carcinog. 2019;58(7):1134–44.

  44. Hirvonen MJ, Mulari MT, Büki KG, Vihko P, Härkönen PL, Väänänen HK. Rab13 is upregulated during osteoclast differentiation and associates with small vesicles revealing polarized distribution in resorbing cells. J Histochem Cytochem. 2012;60(7):537–49.

  45. Roda-Navarro P, Arce I, Renedo M, Montgomery K, Kucherlapati R, Fernández-Ruiz E. Human KLRF1, a novel member of the killer cell lectin-like receptor gene family: molecular characterization, genomic structure, physical mapping to the NK gene complex and expression analysis. Eur J Immunol. 2000;30(2):568–76.

  46. Yang T, Wang R, Zhang J, Bao C, Zhang J, Li R, et al. Mechanism of berberine in treating Helicobacter pylori induced chronic atrophic gastritis through IRF8-IFN-γ signaling axis suppressing. Life Sci. 2020;248:117456.

  47. Remy S, Verstraelen S, Van Den Heuvel R, Nelissen I, Lambrechts N, Hooyberghs J, et al. Gene expressions changes in bronchial epithelial cells: markers for respiratory sensitizers and exploration of the NRF2 pathway. Toxicol In Vitro. 2014;28(2):209–17.

  48. Wang J, Xu S, Lv W, Shi F, Mei S, Shan A, et al. Uridine phosphorylase 1 is a novel immune-related target and predicts worse survival in brain glioma. Cancer Med. 2020;9(16):5940–7.

  49. Hamasaki MY, Severino P, Puga RD, Koike MK, Hernandes C, Barbeiro HV, et al. Short-term effects of sepsis and the impact of aging on the transcriptional profile of different brain regions. Inflammation. 2019;42(3):1023–31.

  50. Park EJ, Kim YM, Kim HJ, Chang KC. Degradation of histone deacetylase 4 via the TLR4/JAK/STAT1 signaling pathway promotes the acetylation of high mobility group box 1 (HMGB1) in lipopolysaccharide-activated macrophages. FEBS Open Bio. 2018;8(7):1119–26.

  51. Ha ZL, Yu ZY. Downregulation of miR-29b-3p aggravates podocyte injury by targeting HDAC4 in LPS-induced acute kidney injury. Kaohsiung J Med Sci. 2021;37:1069–76.

  52. Ding D, Valdivia AO, Bhattacharya SK. Nuclear prelamin A recognition factor and iron dysregulation in multiple sclerosis. Metab Brain Dis. 2020;35(2):275–82.

  53. Turton KB, Wilkerson EM, Hebert AS, Fogerty FJ, Schira HM, Botros FE, et al. Expression of novel “LOCGEF” isoforms of ARHGEF18 in eosinophils. J Leukoc Biol. 2018;104(1):135–45.

  54. Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. 2016;315(8):801–10.

  55. Oliveira TM, de Faria FR, de Faria ER, Pereira PF, Franceschini SC, Priore SE. Nutritional status, metabolic changes and white blood cells in adolescents. Rev Paul Pediatr. 2014;32(4):351–9.

  56. Weller PF, Spencer LA. Functions of tissue-resident eosinophils. Nat Rev Immunol. 2017;17(12):746–60.

  57. Baioumy SA, Esawy MM, Shabana MA. Assessment of circulating FCεRIa in Chronic Spontaneous Urticaria patients and its correlation with clinical and immunological variables. Immunobiology. 2018;223(12):807–11.

  58. Liao EC, Chang CY, Hsieh CW, Yu SJ, Yin SC, Tsai JJ. An exploratory pilot study of genetic marker for IgE-mediated allergic diseases with expressions of FcεR1α and Cε. Int J Mol Sci. 2015;16(5):9504–19.

  59. Hua L, Zuo XB, Bao YX, Liu QH, Li JY, Lv J, et al. Four-locus gene interaction between IL13, IL4, FCER1B, and ADRB2 for asthma in Chinese Han children. Pediatr Pulmonol. 2016;51(4):364–71.

  60. Du W, Hu H, Zhang J, Bao G, Chen R, Quan R. The mechanism of MAPK signal transduction pathway involved with electroacupuncture treatment for different diseases. Evid Based Complement Altern Med. 2019;2019:8138017.

  61. Tiano S, Zhong-Ren L. Acupuncture-moxibustion and mitogen-activated protein kinase signal transduction pathways. Zhongguo zhen jiu Chin Acupunct Moxibustion. 2012;32(3):284–8.

  62. Liang C, Wang S, Qin C, Bao M, Cheng G, Liu B, et al. TRIM36, a novel androgen-responsive gene, enhances anti-androgen efficacy against prostate cancer by inhibiting MAPK/ERK signaling pathways. Cell Death Dis. 2018;9(2):155.

  63. Pan W, Wei N, Xu W, Wang G, Gong F, Li N. MicroRNA-124 alleviates the lung injury in mice with septic shock through inhibiting the activation of the MAPK signaling pathway by downregulating MAPK14. Int Immunopharmacol. 2019;76:105835.

  64. Kong X, Li M, Shao K, Yang Y, Wang Q, Cai M. Progesterone induces cell apoptosis via the CACNA2D3/Ca2+/p38 MAPK pathway in endometrial cancer. Oncol Rep. 2020;43(1):121–32.

  65. Jin Y, Cui D, Ren J, Wang K, Zeng T, Gao L. CACNA2D3 is downregulated in gliomas and functions as a tumor suppressor. Mol Carcinog. 2017;56(3):945–59.

  66. Wong AM, Kong KL, Chen L, Liu M, Wong AM, Zhu C, et al. Characterization of CACNA2D3 as a putative tumor suppressor gene in the development and progression of nasopharyngeal carcinoma. Int J Cancer. 2013;133(10):2284–95.

  67. Li Y, Zhu CL, Nie CJ, Li JC, Zeng TT, Zhou J, et al. Investigation of tumor suppressing function of CACNA2D3 in esophageal squamous cell carcinoma. PLoS ONE. 2013;8(4):e60027.

  68. Ishibashi T, Bottaro DP, Chan A, Miki T, Aaronson SA. Expression cloning of a human dual-specificity phosphatase. Proc Natl Acad Sci USA. 1992;89(24):12170–4.

  69. Yan Q, Sharma-Kuinkel BK, Deshmukh H, Tsalik EL, Cyr DD, Lucas J, et al. Dusp3 and Psme3 are associated with murine susceptibility to Staphylococcus aureus infection and human sepsis. PLoS Pathog. 2014;10(6):e1004149.

  70. Singh P, Dejager L, Amand M, Theatre E, Vandereyken M, Zurashvili T, et al. DUSP3 genetic deletion confers M2-like macrophage-dependent tolerance to septic shock. J Immunol. 2015;194(10):4951–62.

  71. Amand M, Erpicum C, Bajou K, Cerignoli F, Blacher S, Martin M, et al. DUSP3/VHR is a pro-angiogenic atypical dual-specificity phosphatase. Mol Cancer. 2014;13:108.

Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Contributions

Conception and design: YF. Administrative support: HL. Provision of materials and samples: QH. Data collection and collation: XZ and JL. Data analysis and interpretation: TX and GY. All authors contributed to data analysis and to drafting or revising the article. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Huaqing Li.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the ethics committee of the Second Affiliated Hospital of Shandong First Medical University (20200406) and complied with the Declaration of Helsinki. All participants were informed of the purpose of the study, and informed consent was obtained from all participants.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1.

Batch effect processing between different data sets.

Additional file 2: Figure S2.

Validation analysis in the GSE95233 data set. A: Electronic expression validation of the 15 diagnostic gene biomarkers in the GSE95233 data set; **** represents P < 0.0001. B: ROC curve of the DT classifier; C: ROC curve of the RF classifier; D: ROC curve of the SVM classifier. AUC: area under the curve; ROC: receiver operating characteristic.

Additional file 3: Table S1.

All DEmRNAs.

Additional file 4: Table S2.

Calculation of the Matthews correlation coefficient.
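Table S2 is supplied as a separate file and is not reproduced here. For reference, the standard Matthews correlation coefficient (MCC) of a binary classifier is computed from its confusion-matrix counts as MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from −1 (total disagreement) through 0 (chance level) to +1 (perfect prediction). The Python sketch below implements that textbook formula; the function name `matthews_corrcoef` and the convention of returning 0 when any marginal total is zero are assumptions for illustration, not details taken from the authors' analysis.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the Matthews correlation coefficient for binary counts.

    tp, tn, fp, fn: true positives, true negatives, false positives and
    false negatives from a 2x2 confusion matrix.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is undefined when any marginal total is zero,
    # so report 0.0 (no better than chance) in that case.
    if denominator == 0:
        return 0.0
    return numerator / denominator
```

For example, `matthews_corrcoef(10, 10, 0, 0)` returns 1.0 and `matthews_corrcoef(0, 0, 10, 10)` returns -1.0; these counts are illustrative only and are not data from this study.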

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Fan, Y., Han, Q., Li, J. et al. Revealing potential diagnostic gene biomarkers of septic shock based on machine learning analysis. BMC Infect Dis 22, 65 (2022). https://doi.org/10.1186/s12879-022-07056-4

  • DOI: https://doi.org/10.1186/s12879-022-07056-4

Keywords