Spoligotyping-based genetic diversity of Mycobacterium tuberculosis in Ethiopia: a systematic review

Background Understanding the strains and lineages of Mycobacterium tuberculosis (M. tuberculosis) circulating in a country is of paramount importance for that country's tuberculosis (TB) control program. The main aim of this study was to review and compile the results of studies conducted on strains and lineages of M. tuberculosis in Ethiopia. Methods A systematic search and review of articles published on M. tuberculosis strains and lineages in Ethiopia was carried out. The PubMed and Google Scholar databases were searched using the keywords M. tuberculosis, molecular epidemiology, molecular typing, spoligotyping and Ethiopia. Result Twenty-one studies were considered in this review, and a total of 3071 M. tuberculosis isolates and 3067 strains were included. These studies used spoligotyping and identified five lineages, namely Indo-Oceanic, East Asian/Beijing, East African-Indian, Euro-American and Ethiopian, in proportions of 7.1%, 0.2%, 23.0%, 64.8% and 4.1%, respectively. Thus, Euro-American was the most frequently occurring lineage (64.8%), while East Asian was the least frequently occurring lineage (0.2%) in the country. Notably, the Ethiopian lineage seemed to be localized to northeastern Ethiopia. In addition, the top five clades identified by this review were T, CAS, H, Manu and Ethiopian, comprising 48.0%, 23.0%, 11.0%, 6.0% and 4.1% of the strains, respectively. Furthermore, the predominant shared types (spoligotype patterns) identified were SIT149, SIT53, SIT25, SIT37 and SIT21, comprising 420, 343, 266, 162 and 102 isolates, respectively, whereas 15% of the strains were orphan. Conclusion According to the results summarized in this review, diverse strains and lineages of M. tuberculosis were found in Ethiopia, and the frequencies of these strains and lineages varied between regions of the country.
This systematic review is registered in the PRISMA under registration number 42017059263. Electronic supplementary material: The online version of this article (10.1186/s12879-018-3046-4) contains supplementary material, which is available to authorized users.


Background
M. tuberculosis, the causative agent of TB, is still a major health problem, especially in low-income countries, posing significant morbidity and mortality [1,2]. According to the 2016 World Health Organization (WHO) report, there were an estimated 10.4 million new TB cases and 1.4 million TB deaths globally in 2015. Of the people who developed TB, 11% were HIV positive, and 31% of TB cases in the African Region were estimated to be living with HIV. The report also indicated that the African Region had 26% of the world's TB cases but carried the most severe burden relative to population [3].
Ethiopia is among the 30 high burden countries (HBCs), and TB is one of the leading causes of mortality among the communicable diseases in the country. In 2015, the incidence of all forms of TB was 192 per 100,000 population. Moreover, Ethiopia is also one of the high TB/HIV and multidrug-resistant TB (MDR-TB) burden countries [3]. WHO estimated that the incidence of HIV/TB co-infection was 16 per 100,000 population [3]. The emergence of drug-resistant (DR), multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains is the greatest threat to the TB control program [4]. In Ethiopia, based on the latest national anti-TB drug susceptibility surveillance report, a significantly higher proportion of MDR-TB cases was reported among previously treated cases (17.8%) than among newly diagnosed TB cases (2.3%) [5].
In addition to the problems associated with HIV co-infection and the emergence of drug resistance, continued TB transmission is strongly linked to missed or undiagnosed TB cases resulting from inefficient case detection and/or poor diagnostic capacity [6], an ineffective vaccine [7] and prolonged anti-TB treatment [4]. On the other hand, it is becoming clear that the outcome of TB infection is highly diverse, ranging from active pulmonary TB to latent infection and disseminated extra-pulmonary TB [8]. This diversity of outcomes is attributed to the dynamics of host-pathogen-environment interaction. M. tuberculosis is transmitted by inhalation, and once the bacterium gains access to the lung it is phagocytosed by the predominant phagocytic cells, the resident alveolar macrophages. Phagocytosis of M. tuberculosis by macrophages triggers the initiation of an immune response, which ultimately helps contain the infection through granuloma formation [9,10].
About half of the individuals exposed to M. tuberculosis will be infected, but only 1 in 10 of those infected develops active pulmonary disease, suggesting that individuals differ in their susceptibility or resistance to disease development [11]. Certain clinical conditions, including immunodeficiency, co-infections, malnutrition, compromised lungs and other chronic illnesses, may increase susceptibility to TB disease compared to healthy individuals [12,13]. Moreover, the outcome of TB infection depends on the complex interaction between the host, the agent and the environment [14].

Genomics of M. tuberculosis
Human TB is predominantly caused by M. tuberculosis, a member of the M. tuberculosis complex (MTBC), whose members are characterized by identical 16S rRNA sequences and 99.9% similarity at the nucleotide level [15]. Briefly, the members of the MTBC include M. tuberculosis, M. africanum, M. canettii, M. microti and M. bovis. The members of the complex are also known for their slow growth, with a doubling time ranging from 12 to 24 h, which is affected by pathogen characteristics and environmental factors. Despite the genotypic similarity among the members of the MTBC, they differ greatly in their ability to cause disease, host preferences and phenotypic characteristics [15].
Based on complete genome analysis, M. tuberculosis was found to have a genome of close to 4.4 Mbp with a GC content of 65.5%. Deciphering the whole genome of M. tuberculosis has provided new insights into the properties and history of the bacillus [16]. Compared to other bacteria, M. tuberculosis is also known for being highly clonal, with no horizontal gene transfer and low mutation and recombination rates [16].
The tendency of M. tuberculosis to transmit and cause disease varies with its phylogeny, which comprises four major lineages (L-1 Indo-Oceanic, L-2 East-Asian, L-3 East-African-Indian and L-4 Euro-American) and the more recently described L-7 (Ethiopian) [9,17,18]. For instance, the Beijing family has received a great deal of attention compared to other families of M. tuberculosis because some studies demonstrated its association with drug resistance [19] and some experimental models have demonstrated its hypervirulence [20]. On the other hand, other human-associated M. tuberculosis lineages have been given less attention, partly because of the lack of a standardized and phylogenetically robust classification system and associated nomenclature [18].
Molecular typing of M. tuberculosis is applied to study the types of strains circulating, to distinguish relapse from re-infection, to detect laboratory cross-contamination of M. tuberculosis strains [18] and, overall, to evaluate TB control programmes. For these purposes, the currently available M. tuberculosis genotyping tools are not equally appropriate. For instance, restriction fragment length polymorphism (RFLP) analysis, which is based on the number of copies of the insertion sequence IS6110 in the chromosome, which varies among strains [21], and mycobacterial interspersed repetitive units-variable number tandem repeats (MIRU-VNTR) typing, which relies on measuring repetitive DNA elements [22], have relatively high discriminatory power compared to spoligotyping, which detects polymorphisms in the direct repeat (DR) locus [23]. Although spoligotyping is also prone to homoplasy, as individual spacers can be deleted independently in phylogenetically unrelated strains, it is supported by several large international databases that compile thousands of clinical isolates from many countries [18,24].
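As an illustration of how a spoligotype pattern is commonly recorded, the following minimal Python sketch converts the presence or absence of the 43 spacers of the DR locus into the 15-digit octal code used by international databases. The function names are hypothetical, and the example pattern (spacers 1-34 absent, spacers 35-43 present, the classical Beijing signature) is a well-known pattern used here for illustration only; it is not taken from the reviewed studies.

```python
def spacers_to_binary(present_spacers):
    """Return a 43-character string with '1' for each present spacer (1-based)."""
    return "".join("1" if i in present_spacers else "0" for i in range(1, 44))


def spoligo_binary_to_octal(binary):
    """Convert a 43-digit binary spoligotype into its 15-digit octal code.

    The first 42 digits are read as 14 triplets, each converted to one octal
    digit; the 43rd digit is appended unchanged (0 or 1).
    """
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected a string of exactly 43 characters, each 0 or 1")
    triplets = [binary[i:i + 3] for i in range(0, 42, 3)]
    digits = [str(int(triplet, 2)) for triplet in triplets]
    digits.append(binary[42])
    return "".join(digits)


# Illustrative pattern (assumption): only spacers 35-43 are present.
beijing_like = spacers_to_binary(set(range(35, 44)))
print(spoligo_binary_to_octal(beijing_like))
```

Because the code records only which spacers are present, two unrelated strains that independently lose the same spacers yield the same octal code, which is the homoplasy limitation noted above.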
Most recently, large sequence polymorphisms (LSPs) and single nucleotide polymorphisms (SNPs) have been introduced, primarily to study the phylogeny and strain classification of M. tuberculosis [24]. Above all, whole genome sequencing (WGS) is expected to become the next gold standard for molecular epidemiology of the MTBC because it has demonstrated a much higher discriminatory power than the standard genotyping tools, despite its high cost and the bioinformatics capacity required to analyze the data [18].
In Ethiopia, several studies have been conducted to understand the transmission dynamics and the types of strains involved in causing TB [17,. However, the majority of these reports used different methodologies, and as a result it is difficult to obtain a clear picture of the transmission dynamics and strains involved in causing TB in the country. In addition, a systematic review of current genotypes has not been performed before. Hence, in order to analyze the circulating M. tuberculosis strains from Ethiopia, we reviewed studies that determined the genetic diversity of M. tuberculosis strains in patients with either pulmonary or extrapulmonary TB from different regions of Ethiopia.

Methods
The PubMed and Google Scholar databases were used to identify studies, with no limitation on language or year of publication. The last search was conducted on March 27, 2017, using the terms: Mycobacterium tuberculosis AND molecular epidemiology OR molecular typing OR spoligotyping AND Ethiopia. The analysis was performed based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement [53].
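The search terms above mix AND and OR without parentheses, so their grouping can be read in more than one way. The following minimal Python sketch shows one plausible explicit grouping, in which the three method terms are alternatives and the organism and country are both required. The function name and the grouping itself are assumptions made for illustration; the original search string did not state the grouping.

```python
def build_query(organism, methods, country):
    """Combine search terms into one Boolean string with explicit grouping."""
    # The method terms are alternatives, so they are joined with OR
    # and wrapped in parentheses before being combined with AND.
    method_block = " OR ".join(f'"{method}"' for method in methods)
    return f'"{organism}" AND ({method_block}) AND "{country}"'


query = build_query(
    "Mycobacterium tuberculosis",
    ["molecular epidemiology", "molecular typing", "spoligotyping"],
    "Ethiopia",
)
print(query)
```

Writing the grouping out explicitly makes the search reproducible regardless of how a given database resolves operator precedence.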
Articles were screened for relevance on the basis of their title, abstract and full manuscript. In order to minimize the risk of bias, the data were extracted from the selected studies and entered into a data sheet by one reviewer, and genotype frequencies were compared across studies only when the same genotyping method had been used.
This review was conducted on studies published on the genetic diversity of M. tuberculosis isolates from different regions of Ethiopia. Ethiopia is situated at approximately 8°N and 38°E in the Horn of Africa. The country shares borders with Kenya to the south, Eritrea to the north and northeast, Djibouti and Somalia to the east, and Sudan and South Sudan to the west. It is one of the most populous landlocked countries in the world, as well as the second most populous country in Africa after Nigeria, with a population close to 100 million people. The country is also home to diverse nations and nationalities, with more than 80 ethnolinguistic groups. The three largest nations are Oromo, Amhara and Somali (https://en.wikipedia.org/wiki/Ethiopia).
The following major information was extracted from each study: 1) type of study participants (including pulmonary or extrapulmonary TB from Ethiopian regions, whether the participants were primary TB patients or re-treatment patients); 2) characteristics of genetic diversity (including region, year, number of strains and population); 3) type of intervention (type of genotyping method used: spoligotyping); 4) type of outcome measures (clustering rate, shared types, lineages, and frequency of novel genotypes).
Three different analyses were performed. First, we obtained the frequency of shared types found in each study and compared it across Ethiopian regions. Clustering rates were also determined (clustered: two or more identical shared types found within a study/region; unique: a single shared type found within a study/region). Moreover, a multi-marker database for M. tuberculosis (SITVITWEB) and TBinsight were used to supplement those studies that did not include spoligotype descriptions and frequencies.
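The clustered/unique distinction can be sketched in a few lines of Python. The sketch below assumes that the clustering rate is the proportion of isolates whose pattern is shared by at least one other isolate in the same study or region; the individual studies may have used other definitions (for example, subtracting one index case per cluster). The function name and the example isolate list are hypothetical and do not represent data from the reviewed studies.

```python
from collections import Counter


def clustering_summary(patterns):
    """Summarize clustered and unique spoligotype patterns in one study/region.

    `patterns` holds one pattern label per isolate. A pattern is clustered if
    two or more isolates carry it, and unique if exactly one isolate does.
    """
    counts = Counter(patterns)
    total = len(patterns)
    clustered_isolates = sum(n for n in counts.values() if n >= 2)
    return {
        "total_isolates": total,
        "clusters": sum(1 for n in counts.values() if n >= 2),
        "clustered_isolates": clustered_isolates,
        "unique_isolates": total - clustered_isolates,
        # Assumed definition: clustered isolates divided by all isolates.
        "clustering_rate": clustered_isolates / total if total else 0.0,
    }


# Hypothetical isolate list, for illustration only.
example = ["SIT149", "SIT149", "SIT53", "SIT53", "SIT53", "SIT25", "orphan_a"]
summary = clustering_summary(example)
print(summary)
```

In this hypothetical list, five of the seven isolates fall into two clusters and two isolates are unique.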

Result
We performed a qualitative synthesis of the genotyping results and their outcome measures because the studies we found varied significantly in study design, types of study participants and genotyping methods. As a result, this review focuses primarily on the results of the studies, their applicability and their limitations rather than on meta-analysis. The main inclusion criterion was that a study reported genetic diversity analysis of strains from Ethiopia. We selected studies published in English from Ethiopia without any time limitations. A total of 133 citations were identified from the PubMed and Google Scholar databases. Of these, 30 studies that used genotyping analysis by spoligotyping were selected after reviewing the titles and abstracts. Additional screening was conducted on the full text, with journal quality also considered, and nine studies were excluded because of repeated publication or because data were not available. A total of 21 studies were therefore selected for the purpose of this review (Fig. 1). In these studies, a total of 11,365 TB patients were involved and 3067 M. tuberculosis strains were obtained.
For understanding the strain diversity of M. tuberculosis, whole genome sequencing is believed to offer many advantages over classical methods such as RFLP, spoligotyping and MIRU-VNTR [18]. Nevertheless, whole genome sequencing remains relatively costly and requires advanced capacity ranging from the wet lab to big data analysis [18]. In line with this, only a few studies have reported genome sequencing of M. tuberculosis, or even MIRU-VNTR typing, in Ethiopia [17,41,50]. Recently, Firdessa et al. sequenced four of 36 M. tuberculosis isolates with an unusual pattern (missing spacers 4-24) using an Illumina platform (Illumina Inc., San Diego, CA, USA) and found them to be members of L-7, which is positioned between the ancient lineage 1 and the modern lineages 2, 3 and 4 in the M. tuberculosis phylogeny [17]. Similarly, sequencing of 30 L-7 strains from the northern part of Ethiopia identified over 800 mutations specific to the lineage, with deletions totalling 22,346 bp [50].
On the other hand, the highest proportion of strains not registered in the global database (new/orphan) was reported from a study in Oromia (45.8%) [36]. High clustering rates have been reported from Ethiopia, ranging from 31.2% [49] to 85.9% [32]. Consistently, two studies on drug-resistant strains from the Addis Ababa city administration also reported very high clustering rates of 80.4% [28] and 85.9% [32].

Spoligotyping results of M. tuberculosis strains from Ethiopia
Only very few studies reported using methods such as RFLP-IS6110 (2 studies), WGS (3 studies) and MIRU-VNTR (3 studies). As a result, for the purpose of this review, we included only studies that reported results based on spoligotyping (Table 1). A total of 21 studies used spoligotyping for genetic diversity analysis (n = 3071). Additional file 1 lists the M. tuberculosis shared types (n = 2596) reported so far from the nine regional states and two federal cities of Ethiopia. Regarding the diversity and proportions of shared strains reported from the regions, the highest proportion of strains was reported from the Amhara region (AM) (36.0%), followed by Oromia (OR) (28.5%), the Addis Ababa (AA) city administration (21.0%) and the Southern Nations, Nationalities and Peoples' Region (SNNPR) (10.0%). Only four strains from Tigray (TG) and one strain each from Dire Dawa (DD) and Harari (HR) were reported among the shared spoligotyping patterns. No shared strains were reported from the Gambella region. The largest number of shared spoligotyping patterns came from the Amhara (AM) region, followed by the Oromia (OR) and Addis Ababa (AA) regions (Figs. 2 and 3). On the other hand, a total of 15.0% of the reported strains were found to be orphan. According to the spoligotyping analysis, the identified shared types of M. tuberculosis were classified into five lineages, namely L-1 or Indo-Oceanic, L-2 or East Asian, L-3 or East African-Indian, L-4 or Euro-American and L-7 or Ethiopian (Fig. 3 and Table 2).
The predominant clades identified to date include the T clade, the Central-Asian (CAS) clade, the Haarlem (H) clade, the Manu clade, the recently investigated Ethiopian clade (Lineage 7) and the Latin American-Mediterranean (LAM) clade, which together make up about 96% of the clades reported so far. The T and CAS genotypes were the major clades in almost all parts of the country; no information was available from the Gambella region (Fig. 4).
Table 3 compares the dominant isolates (> 2.0%) in our study with those previously reported in the SITVIT2 database from Ethiopia and from neighboring countries in the regions and sub-regions. The result showed that SIT53 (T1) and SIT26 (CAS-Delhi) were the most widely distributed isolates in the regions. On the other hand, SIT910 (ETH) and SIT289 (CAS1-Delhi) have limited distribution in the African region (Table 3).
a Clade designations according to the SITVIT2 and TBinsight databases: Beijing clade, East African-Indian (EAI) clade and its sub-lineages, Haarlem (H) clade and its sub-lineages, Latin American-Mediterranean (LAM) clade and its sub-lineages, the ancestral "Manu" family and 3 sub-lineages, the IS6110-low-banding X clade and its sub-lineage, and an ill-defined T clade with its sub-lineages; U: unknown patterns.

Discussion
M. tuberculosis genotypes reported from Ethiopia to date primarily belong to four major lineages, namely L-1, associated with populations living around the Indian Ocean; L-3, associated with populations from Central Asia and also common in East Africa; L-4, the widespread Euro-American lineage [41]; and, more recently identified, the new L-7, associated with the Ethiopian population [17]. The investigation of the new L-7 has strongly supported the existence of TB in the area, or on the continent, long before European colonial contact. Moreover, the existence of TB in the area has also contributed to the rejection of the "virgin soil" hypothesis of TB in Sub-Saharan Africa. According to the "virgin soil" hypothesis, the TB epidemic in the African region was due to European contact during the colonial period, as the region was originally free of TB [41].
In this review, only very few studies were found on RFLP-IS6110 [25,26], MIRU-VNTR [35,40,48,49] and genome sequencing [17,41,50]. As a result, the evidence generated is sufficient only for spoligotyping. Moreover, of the strains reported so far in the country, very few (< 5%) were from the Benishangul-Gumuz, Harari and Tigray regions and from pastoral communities of the Ethiopian Somali, Oromia and SNNPR regions. This might also compromise a complete understanding of the strain diversity and transmission dynamics of M. tuberculosis in Ethiopia. These regions are also known for active cross-border movement and as a source of refugees, which could play a role in the dynamics of TB transmission in the country. Thus, this highlights the need for further systematic investigation of the epidemiology of M. tuberculosis in these regions. Nevertheless, the spoligotyping results show that Ethiopia is home to a complex genetic diversity of TB, ranging from ancient to modern TB lineages. The presence of high clustering rates for spoligotypes suggests a high rate of transmission of M. tuberculosis clones in the country. However, a study supported by Geographic Information System (GIS) mapping of the positions of clustered strains reported that the majority of the clustered strains were far apart [44]. Thus, cluster studies supported by GIS mapping are recommended in order to gain a clear understanding of the ongoing M. tuberculosis transmission dynamics. The detection of a high number of strains previously unreported in the global TB databases indicates that further genotyping analysis needs to be conducted in Ethiopia.
Even though its prevalence is very low, the SIT1 genotype, which belongs to the Beijing genotype of the East-Asian lineage, has been reported from major regions of the country, including Amhara, Oromia and SNNPR, and from the capital, Addis Ababa. Besides their worldwide distribution, Beijing strains have been linked to increased virulence, drug resistance and ability to spread. Thus, it is important to monitor their distribution through continuous surveillance and reporting.

Strength and limitations
This review explored all published articles on the genotypes of M. tuberculosis identified in the Ethiopian population, without time or language limitation. As a result, the findings give a good picture of the distribution of lineages and strains across the country. However, in some regions only very few strains were reported, which made it difficult to generalize. The review also focuses primarily on the spoligotyping results of the studies, and hence its applicability is limited to the spoligotyping method.

Conclusion
This is the first systematic review of the genetic diversity of M. tuberculosis strains from Ethiopia. The lack of genotyping information for clinical isolates of M. tuberculosis from the developing regions (Benishangul-Gumuz, Gambella, Afar, Ethiopian Somali) and pastoral areas of the country, specifically using MIRU-VNTR genotyping and/or genome sequencing, requires due attention in further analysis. The detection of a high number of spoligotypes previously unreported in the global database highlights the need for additional analysis to be conducted in Ethiopia. Most importantly, the risk of clustering of strains, assessed using GIS-supported information, and the status of Beijing strains in Ethiopia need to be monitored. Some strains are also known to be associated with drug resistance, and this also requires attention in future analysis. Furthermore, additional studies are needed to evaluate the transmission dynamics and drug-resistant TB in the country.

Additional file
Additional file 1: Description of M. tuberculosis shared types reported for TB strains isolated in Ethiopia from nine regional states and two city administrations (n = 2596). a In the SITVIT2 database, the spoligo international type (SIT) numbers designate spoligotypes shared by two or more patient isolates. In contrast, "orphan" designates patterns reported for a single isolate. b Clade designations according to SITVIT2 database: Beijing clade, East African-Indian (EAI) clade and 9 sub-lineages, Haarlem