- Research article
- Open Access
Gene network in pulmonary tuberculosis based on bioinformatic analysis
BMC Infectious Diseases volume 20, Article number: 612 (2020)
Pulmonary tuberculosis (PTB) is one of the serious infectious diseases worldwide; however, the gene network involved in the host response remain largely unclear.
This study integrated two cohorts profile datasets GSE34608 and GSE83456 to elucidate the potential gene network and signaling pathways in PTB. Differentially expressed genes (DEGs) were obtained for Gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis using Metascape database. Protein-Protein Interaction (PPI) network of DEGs was constructed by the online database the Search Tool for the Retrieval of Interacting Genes (STRING). Modules were identified by the plug-in APP Molecular Complex Detection (MCODE) in Cytoscape. GO and KEGG pathway of Module 1 were further analyzed by STRING. Hub genes were selected for further expression validation in dataset GSE19439. The gene expression level was also investigated in the dataset GSE31348 to display the change pattern during the PTB treatment.
Totally, 180 shared DEGs were identified from two datasets. Gene function and KEGG pathway enrichment revealed that DEGs mainly enriched in defense response to other organism, response to bacterium, myeloid leukocyte activation, cytokine production, etc. Seven modules were clustered based on PPI network. Module 1 contained 35 genes related to cytokine associated functions, among which 14 genes, including chemokine receptors, interferon-induced proteins and Toll-like receptors, were identified as hub genes. Expression levels of the hub genes were validated with a third dataset GSE19439. The signature of this core gene network showed significant response to Mycobacterium tuberculosis (Mtb) infection, and correlated with the gene network pattern during anti-PTB therapy.
Our study unveils the coordination of causal genes during PTB infection, and provides a promising gene panel for PTB diagnosis. As major regulators of the host immune response to Mtb infection, the 14 hub genes are also potential molecular targets for developing PTB drugs.
Pulmonary tuberculosis (PTB) is one of the serious infectious diseases with high mortality in the world. PTB is caused by various strains of mycobacteria with Mycobacterium tuberculosis (Mtb) being mostly observed in human. According to the World Health Organization (WHO) report, there were 10 million new cases of PTB disease and 1.5 million deaths worldwide in 2017 (WHO, 2018). It has been estimated that one third of the world’s population are infected with Mtb as latent infections, among which 5 to 10% would develop into active tuberculosis (TB) [1, 2]. Quick diagnostic and efficient treatment are of great importance to control the spread of PTB and reduce its mortality [3, 4]. Despite accumulating evidence on the mechanism of PTB, the molecular processes and the specific gene regulations in the progression of PTB remain to be explored.
Omics approaches, like genomics, transcriptomics, proteomics and metabolomics, are high-throughput methods that provide an opportunity to investigate the global gene expression changes in PTB . Transcriptome profiling based on microarray or next-generation sequencing has been widely used for differentially expressed genes (DEGs) screening in human diseases. With the application of genechips, a large amount of data has been produced, most of which have been deposited in public databases. Integrating and re-analyzing these data provide valuable clues to advance our researches. In recently years, many microarray data profiling studies have been performed on PTB . Through bioinformatic analysis, a number of DEGs and functional pathways have been identified . However, these results are either inconsistent due to sample heterogeneity in individual studies, or limited by a single cohort study. So far, no reliable biomarkers are available for PTB diagnostics. Integrated bioinformatic analysis by combining these expression profiling data together would be a powerful approach to solve the disadvantages.
Here we analyzed two microarray datasets GSE34608 and GSE83456 from human whole blood samples including 53 health controls and 79 PTB samples. Multiple bioinformatics methods were employed to identify DEGs between the two datasets. Gene Ontology, pathway enrichment, Protein-Protein Interaction (PPI) network construction were performed to reveal the function of hub genes in PTB. Findings of this study might help to explore essential diagnostic signatures for PTB and shed a light on the molecular targets to treat PTB.
Gene expression microarray data acquisition
NCBI Gene Expression Omnibus database (GEO, http://www.ncbi.nlm.nih.gov/geo) is a public functional genomics database with high throughput gene expression sequencing data and microarrays data. Two gene expression datasets GSE34608  and GSE83456 , were downloaded from GEO. GSE34608 contained 8 PTB samples and 18 control samples, which is based on GPL6480 platform (Agilent-014850 Whole Human Genome Microarray 4x44K G4112F). The GSE83456 dataset contained 45 PTB tissue samples and 61 control samples. It is based on GPL10558 platform (Illumina HumanHT-12 V4.0 expression beadchip). Another two datasets GSE19439 and GSE31348 were used for hub gene validation. GSE19439 contained 12 health and 13 PTB samples were used as validation dataset . GSE19439 is based on GPL6947 platform (Illumina HumanHT-12 V3.0 expression beadchip). GSE31348 contained 27 subjects (135 samples) in five time point: diagnosis, treatment for 1, 2, 4 and 26 weeks, which is based on GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array) .
Identification of DEGs
Based on the microarray platform annotation, probe sets were converted into the corresponding gene symbol for the following analysis. Probe sets without corresponding gene symbols were removed. The DEGs between control and PTB samples were analyzed using limma (linear models for microarray data) package in R. |log2FC (fold change)| > 1 and adj. P-value < 0.05 were considered as statistically significant threshold for the DEGs selection of GSE34608. |log2FC| > 0.585 and adj. P-value < 0.05 were considered as statistically significant threshold for the DEGs selection of GSE83456. The co-existed DEGs were identified by drawing the venn diagram of DEGs of GSE34608 and GSE83456.
KEGG and GO enrichment analyses of DEGs
Metascape (a gene annotation & analysis resource; http://metascape.org/) is online gene functional annotation tool to provide a comprehensive set of biological information of genes and proteins . To understand the function of DEGs, Gene Oncology (GO) analysis, including biological process (BP), cellular components (CC), molecular function (MF), and KEGG pathway enrichment were performed using Metascape.
PPI network construction and module analysis
In the present study, the PPI network was predicted using the Search Tool for the Retrieval of Interacting Genes (STRING; http://string-db.org) (version 11.0) online database . The cut off value for STRING analysis is 0.04. Analyzing the functional interactions between proteins may provide insights into the biological mechanisms of action. PPI network were further visualized and analyzed with Cytoscape (version 3.4.0) plug-in APP Molecular Complex Detection (MCODE), which is used for clustering a given network based on topology. The most critical modules in the PPI network could be identified. The genes in top one module was displayed in this study. The hub gene selection criteria were as follows: MCODE scores > 10, degree > 20, neighborhood connectivity > 10.
Hub genes analysis
The GO function, pathway, and protein domains of the top module were analyzed using STRING. The expression levels of hub genes were further validated in datasets GSE19439 and GSE31348.
Identification of DEGs
Gene expression profile of GSE34608 and GSE83456 were downloaded from GEO database. The microarray data GSE34608 contains 18 control and 8 PTB patients. The GSE83456 data contains 61 control samples and 45 PTB samples. PCA plots of both datasets indicated the distinction expression of control and PTB samples (Fig. 1a and b). 2214 and 1025 DEGs were identified from GSE34608 and GSE83456 datasets, respectively (Fig. 2a). Venn diagram demonstrated that, among the 180 shared DEGs, 51 genes were down-regulated and 129 genes were upregulated in both datasets (Fig. 2a).
GO enrichment and KEGG pathway analyses
Candidate DEGs functional Gene Ontology (GO) and pathway enrichment analyses were performed with Metascape. The results showed that DEGs were significantly enriched in defense response to other organism, response to bacterium, myeloid leukocyte activation, cytokine production, positive regulation of defense responses, cytokine-mediated signaling pathway, interferon signaling, etc. (Fig. 2b).
The subset of representative terms of gene function analysis were converted into a network layout in Metascape, as shown in Fig. 2c. Based on gene function analysis, all the significant terms were hierarchically clustered into a tree based on Kappa-statistical similarities. Each term is represented by a circle node, where its size is proportional to the number of input genes fall into that term. The color represents its cluster identity (Fig. 2c). Terms with a kappa score > 0.3 are linked by an edge. The statistically significant range of the node is marked by color range (Fig. 2c).
PPI network construction and module analysis
The PPI network of 180 DEGs was constructed using the STRING online database, and further analyzed using app MCODE in Cytoscape software. Totally, seven modules were identified shown in Table 1. Module 1 from the PPI network complex contained 35 genes, indicating the core functional gene panel. GO analysis of 35 genes showed that their functions are related to defense response and cytokine related pathway (Fig. 3a). PPI network of module 1 was redrawn by STRING (Fig. 3b). The expression level of 35 genes in dataset GSE34608 were shown in Fig. 3c. Genes CD27, CCR7, CD19, and CXCR3 were significantly down-regulated in PTB samples, where other genes were upregulated. This result was consistent with the gene expression levels in GSE83456 (Fig. 3d). Furthermore, function analysis from STRING database were shown in Table 2. GO function was significantly related to defense response and immune system response (biological process function), chemokine binding and chemokine receptor activity (molecular function), and external side of plasma membrane and cell surface (cellular component). KEGG and Reactome pathways indicated that these 35 genes were involved in cytokine-cytokine receptor interaction, Toll-like receptor signaling pathway, immune system, and cytokine signaling in immune system. TIR domain, leucine rich repeat, and chemokine receptor family were the three important features revealed by PFAM and INTERPRO protein domains analysis (Table 2).
Hub genes analysis
A total of 14 genes were selected as hub genes based on criteria MCODE (scores > 10, degree > 20, neighborhood connectivity > 10) in Table 3. All the hub genes were belonging to the module 1. These hub genes were significantly associated with Toll-like receptors, interferon-induce proteins, and chemokine receptors (Table 3). Among them, two genes were upregulated, whereas others were downregulated (Fig. 3c). The expression levels were further validated in dataset GSE19439 (Fig. 4). The expression levels were also significantly different between health and PTB patients, except gene CD19 and CXCR3 (Fig. 4).
Gene expression level detection during PTB treatment
To figure out the expression level changes during PTB treatment process, dataset GSE31348 were used to evaluate the change level. GSE31348 contained the 27 PTB patients, including 135 samples from 5 time points. Heatmap showed that the expression level of genes related with the functions (Fig. 5a). The expression level of CCR7, CD19, and CXCR3 were significantly increased, whereas the expression level of Interferon-induced proteins, Toll-like receptors were decreased during the treatment (Fig. 5a). Among these 14 genes, the expression level of CXCR3 were significantly increased, and TLR2 and TLR5 were significantly decreased during the PTB treatment (Fig. 5b). These three genes might have a potential to evaluate PTB as a gene panel.
In this study, we analyzed two GEO datasets GSE34608 and GSE83456 to identify hub genes related to PTB disease. Totally, 180 DEGs were identified by combining these two GEO datasets (Fig. 2). With STRING protein-protein interaction data, 14 hub genes were identified (Figs. 3 and 4). The function of these 14 hub genes were chemokine receptors, interferon-induced proteins and Toll-like receptors (Tables 2 and 3). The signature of hub genes are significantly correlated with PTB infection as well as anti-PTB therapy (Fig. 5).
Our study reveals the core genes in response to Mtb infection. The gene expression profile in PTB patients conforms to the common inflammatory responses upon viral and bacterial infections . Chemokine receptors, interferon-induced proteins, and Toll-like receptors were involved in this core response profiling, and significantly changed following successful treatment.
Chemokines play a major role in the host response to Mtb infection as they contribute to the formation and maintenance of quiescent granulomas and the establishment of the TB granuloma. High concentrations of cytokines and chemokines are required for early protection against Mtb infections, but may also be involved in the over-response of host immune system . Plasma cytokines can serve as biomarkers for the disease severity, and function to relieve the mycobacterial burden in PTB disease . Consistent with our analyses, a previous study has reported that PTB patients displayed significantly elevated levels of CCL1, CCL3, CXCL1, CXCL10 and CXCL11 which were significantly reduced following successful treatment .
Type I interferon response pathway is a well-established pathway crucial for the defense against viral pathogens, but it could also be detrimental upon infection with mycobacteria . Although the signaling axis through this pathway is identical regardless of the type of infection, the outcome is substantially different, suggesting that the type I IFNs and the related IFN-inducible genes are able to create a favorable or unfavorable intracellular milieu to promote or disrupt the survival of invading pathogens . Changes in mRNA levels of IFIT1, IFIT2, CXCR3 and CD163 have been validated by qPCR in previous studies [17,18,19]. Moreover, Kim et al.  confirmed that five genes (IFN-γ, TNF-α, IL-2R, CXCL9, and CXCL10) could be used for the detection of Mtb infection, including active PTB disease and LTB with sensitivity of each gene above 80%. The gene panel revealed in this study provides a more comprehensive network for selecting diagnostic biomarkers. However, it needs to be further tested in other infectious diseases to figure out the transcriptional signature specific to PTB disease or shared with other types of infections.
The essential role of Toll-like receptors against mycobacterial infection has been revealed in vivo. Toll-like receptors play key roles in innate and adaptive immunity against Mtb, and are involved in the recognition of conserved microbial structures, leading to activation of an inflammatory response. Previous study showed that TLR3 and TLR5 were upregulated at 24, 48 and 72 h post-infection in A549 pulmonary epithelial cells treated with Mtb , and the expression of neutrophil TLR2 is also increased in PTB patients . Whole blood from patients had increased mRNA levels of TLR1 and TLR2 . TLR2-deficient mice showed increased subsequent progression to PTB disease, the rapid death and higher Mtb burden . TLR2 may function as a regulator of inflammation, and its absence exacerbated the detrimental inflammatory response. TLR1 rs5743551 and rs5743618 polymorphisms significantly increased under the Mtb infection in 203 PTB patients, compared to 203 healthy subjects . TLR8 polymorphisms rs3764879 and rs3764880 have also been reported to cause differential sensitivity to Mtb infection by specific strains .
Gene function can be regulated at multiple levels. Integrated multi-omics analysis provides a better approach to understand the comprehensive biological processes in human diseases. By integrating transcriptomics, proteomics and metabolomics, Zhao et al.  revealed the molecular link between lipid metabolism and inflammatory response in chronic obstructive pulmonary disease (COPD) treated with a Chinese medicine Bufei Jianpi Formula in a rat model. Similar strategies were also applied to study the Laser Printer-Emitted Nanoparticles (PEPs) inhalation exposure-induced disease risks to identify metabolite biomarkers . In Mtb research, metabolomics has been applied to investigate the metabolic traits in Mtb species [28, 29]. However, omics study other than transcriptomics in the host system under Mtb infection is limited. Changes in host proteome and metabolome after Mtb infection need to be further investigated to provide a more detailed landscape to understand the molecular mechanism of PTB disease.
In summary, we construct a refined gene network representing the transcriptome signature in response to Mtb infection and its treatment. The identified 14 hub genes are promising biomarkers for developing transcriptome-based PTB diagnostic or prognostic tests. As major regulators of the host immune response to Mtb infection, these genes are also potential molecular targets for developing drugs to treat PTB.
Availability of data and materials
Datasets including GSE34608, GSE83456, GSE19439 and GSE31348 were downloaded from NCBI Gene Expression Omnibus database (GEO, http://www.ncbi.nlm.nih.gov/geo).
Long B, Liang SY, Koyfman A, Gottlieb M. Tuberculosis: a focused review for the emergency medicine clinician. Am J Emerg Med. 2020;38(5):1014-22.
Tahan TT, Gabardo BMA, Rossoni AMO. Tuberculosis in childhood and adolescence: a view from different perspectives. J Pediatr (Rio J). 2020;96(Suppl 1):99-110.
Yong YK, Tan HY, Saeidi A, Wong WF, Vignesh R, Velu V, Eri R, Larsson M, Shankar EM. Immune biomarkers for diagnosis and treatment monitoring of tuberculosis: current developments and future prospects. Front Microbiol. 2019;10:2789.
Walzl G, McNerney R, du Plessis N, Bates M, McHugh TD, Chegou NN, Zumla A. Tuberculosis: advances and challenges in development of new diagnostics and biomarkers. Lancet Infect Dis. 2018;18(7):e199–210.
Qin XB, Zhang WJ, Zou L, Huang PJ, Sun BJ. Identification potential biomarkers in pulmonary tuberculosis and latent infection based on bioinformatics analysis. BMC Infect Dis. 2016;16(1):500.
Blankley S, Graham CM, Turner J, Berry MP, Bloom CI, Xu Z, Pascual V, Banchereau J, Chaussabel D, Breen R, et al. The transcriptional signature of active tuberculosis reflects symptom status in extra-pulmonary and pulmonary tuberculosis. PLoS One. 2016;11(10):e0162220.
Maertzdorf J, Weiner J 3rd, Mollenkopf HJ, Network TB, Bauer T, Prasse A, Muller-Quernheim J, Kaufmann SH. Common patterns and disease-related signatures in tuberculosis and sarcoidosis. Proc Natl Acad Sci U S A. 2012;109(20):7853–8.
Berry MP, Graham CM, McNab FW, Xu Z, Bloch SA, Oni T, Wilkinson KA, Banchereau R, Skinner J, Wilkinson RJ, et al. An interferon-inducible neutrophil-driven blood transcriptional signature in human tuberculosis. Nature. 2010;466(7309):973–7.
Cliff JM, Lee JS, Constantinou N, Cho JE, Clark TG, Ronacher K, King EC, Lukey PT, Duncan K, Van Helden PD, et al. Distinct phases of blood gene expression pattern through tuberculosis treatment reflect modulation of the humoral immune response. J Infect Dis. 2013;207(1):18–29.
Zhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, Benner C, Chanda SK. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10(1):1523.
Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, Santos A, Doncheva NT, Roth A, Bork P, et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 2017;45(D1):D362–8.
Shi C, Pamer EG. Monocyte recruitment during infection and inflammation. Nat Rev Immunol. 2011;11(11):762–74.
Slight SR, Khader SA. Chemokines shape the immune responses to tuberculosis. Cytokine Growth Factor Rev. 2013;24(2):105–13.
Kumar NP, Moideen K, Banurekha VV, Nair D, Babu S. Plasma proinflammatory cytokines are markers of disease severity and bacterial burden in pulmonary tuberculosis. Open Forum Infect Dis. 2019;6(7):ofz257.
McNab F, Mayer-Barber K, Sher A, Wack A, O'Garra A. Type I interferons in infectious disease. Nat Rev Immunol. 2015;15(2):87–103.
Desvignes L, Wolf AJ, Ernst JD. Dynamic roles of type I and type II IFNs in early infection with Mycobacterium tuberculosis. J Immunol. 2012;188(12):6205–15.
Hare NJ, Chan B, Chan E, Kaufman KL, Britton WJ, Saunders BM. Microparticles released from Mycobacterium tuberculosis-infected human macrophages contain increased levels of the type I interferon inducible proteins including ISG15. Proteomics. 2015;15(17):3020–9.
Liu Q, Ou Q, Chen H, Gao Y, Liu Y, Xu Y, Ruan Q, Zhang W, Shao L. Differential expression and predictive value of monocyte scavenger receptor CD163 in populations with different tuberculosis infection statuses. BMC Infect Dis. 2019;19(1):1006.
Schierloh P, Landoni V, Balboa L, Musella RM, Castagnino J, Morana E, de Casado GC, Palmero D, Sasiain MC. Human pleural B-cells regulate IFN-gamma production by local T-cells and NK cells in a Mycobacterium tuberculosis-induced delayed hypersensitivity reaction. Clin Sci (Lond). 2014;127(6):391–403.
Kim S, Lee H, Kim H, Kim Y, Cho JE, Jin H, Kim DY, Ha SJ, Kang YA, Cho SN, et al. Diagnostic performance of a cytokine and IFN-gamma-induced chemokine mRNA assay after Mycobacterium tuberculosis-specific antigen stimulation in whole blood from infected individuals. J Mol Diagn. 2015;17(1):90–9.
Mvubu NE, Pillay B, McKinnon LR, Pillay M. Mycobacterium tuberculosis strains induce strain-specific cytokine and chemokine response in pulmonary epithelial cells. Cytokine. 2018;104:53–64.
Nancy Hilda J, Das S. Neutrophil CD64, TLR2 and TLR4 expression increases but phagocytic potential decreases during tuberculosis. Tuberculosis (Edinb). 2018;111:135–42.
Chang JS, Huggett JF, Dheda K, Kim LU, Zumla A, Rook GA. Myobacterium tuberculosis induces selective up-regulation of TLRs in the mononuclear leukocytes of patients with active pulmonary tuberculosis. J Immunol. 2006;176(5):3010–8.
Drennan MB, Nicolle D, Quesniaux VJ, Jacobs M, Allie N, Mpagi J, Fremond C, Wagner H, Kirschning C, Ryffel B. Toll-like receptor 2-deficient mice succumb to Mycobacterium tuberculosis infection. Am J Pathol. 2004;164(1):49–57.
Salie M, Daya M, Lucas LA, Warren RM, van der Spuy GD, van Helden PD, Hoal EG, Moller M. Association of toll-like receptors with susceptibility to tuberculosis suggests sex-specific effects of TLR8 polymorphisms. Infect Genet Evol. 2015;34:221–9.
Zhao P, Li J, Li Y, Tian Y, Yang L, Li S. Integrating Transcriptomics, proteomics, and metabolomics profiling with system pharmacology for the delineation of Long-term therapeutic mechanisms of Bufei Jianpi formula in treating COPD. Biomed Res Int. 2017;2017:7091087.
Guo NL, Poh TY, Pirela S, Farcas MT, Chotirmall SH, Tham WK, Adav SS, Ye Q, Wei Y, Shen S, et al. Integrated transcriptomics, metabolomics, and lipidomics profiling in rat lung, blood, and serum for assessment of laser printer-emitted nanoparticle inhalation exposure-induced disease risks. Int J Mol Sci. 2019;20(24):6348.
Olivier I, du Loots T. A metabolomics approach to characterise and identify various Mycobacterium species. J Microbiol Methods. 2012;88(3):419–26.
du Preez I, du Loots T. Altered fatty acid metabolism due to rifampicin-resistance conferring mutations in the rpoB gene of Mycobacterium tuberculosis: mapping the potential of pharmaco-metabolomics for global health and personalized medicine. OMICS. 2012;16(11):596–603.
The study was supported by the National Natural Science Foundation of China (No. 81722007) and National Health Commission of China (No. 2017ZX10304402001–008), which both supported the methodology training and provided the stipend for the authors.
Ethics approval and consent to participate
This study conformed to ethical instructions of Renmin Hospital of Wuhan University.
Consent for publication
There is no confliction of interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Li, L., Lv, J., He, Y. et al. Gene network in pulmonary tuberculosis based on bioinformatic analysis. BMC Infect Dis 20, 612 (2020). https://doi.org/10.1186/s12879-020-05335-6
- Pulmonary tuberculosis
- Mycobacterium tuberculosis
- Protein-protein interaction
- Hub genes
- Bioinformatic analysis