Molecular epidemiology and transmission dynamics of Mycobacterium tuberculosis in Northwest Ethiopia: new phylogenetic lineages found in Northwest Ethiopia

Background Although Ethiopia ranks seventh among the world’s 22 high-burden tuberculosis (TB) countries, little is known about strain diversity and transmission. In this study, we present the first in-depth analysis of the population structure and transmission dynamics of Mycobacterium tuberculosis strains from Northwest Ethiopia. Methods In the present study, 244 M. tuberculosis isolates were analysed by mycobacterial interspersed repetitive unit - variable number tandem repeat 24-loci typing and spoligotyping methods to determine phylogenetic lineages and perform cluster analysis. Clusters of strains with identical genotyping patterns were considered an indicator of recent transmission. Results Of 244 isolates, 59.0% were classified into nine previously described lineages: Dehli/CAS (38.9%), Haarlem (8.6%), Ural (3.3%), LAM (3.3%), TUR (2.0%), X-type (1.2%), S-type (0.8%), Beijing (0.4%) and Uganda II (0.4%). Interestingly, 31.6% of the strains were grouped into four new lineages, which were named Ethiopia_3 (13.1%), Ethiopia_1 (7.8%), Ethiopia_H37Rv like (7.0%) and Ethiopia_2 (3.7%). The remaining 9.4% of the isolates could not be assigned to the known or new lineages. Overall, 45.1% of the isolates were grouped in clusters, indicating a high rate of recent transmission. Conclusions This study confirms a highly diverse M. tuberculosis population structure, the presence of new phylogenetic lineages and a predominance of the Dehli/CAS lineage in Northwest Ethiopia. The high rate of recent transmission indicates defects of the TB control program in Northwest Ethiopia. This emphasizes the importance of strengthening laboratory diagnosis of TB, intensified case finding and treatment of TB patients to interrupt the chain of transmission.


Background
Despite the existence of anti-tuberculosis drugs for the last 60 years, tuberculosis (TB) continues to be a major threat worldwide. In 2009, WHO estimated the global incidence of TB at 9.4 million cases. Most of the estimated TB cases occurred in Asia (55%) and Africa (30%). The 22 high-burden tuberculosis countries account for 81% of all estimated cases worldwide [1]. Ethiopia ranks seventh among the world's 22 high-burden tuberculosis countries. The country had 314,267 TB cases in 2007, with an estimated incidence rate of 378 cases per 100,000 population [2]. According to the Ministry of Health hospital statistics data, tuberculosis is one of the leading causes of morbidity, the fourth most common cause of hospital admission, and the second most common cause of hospital death in Ethiopia [3].
Additionally, the countrywide anti-TB drug resistance survey conducted in 2005 showed that the prevalence of multidrug resistant TB (MDR, resistance to at least isoniazid [INH] and rifampicin [RMP]) was 1.6% and 11.8% among new cases and previously treated TB cases, respectively [4]. These data show that the TB epidemic is a significant public health threat in Ethiopia.
Molecular strain typing (genotyping) has contributed significantly to the understanding of TB epidemiology and has helped to improve TB control by providing information on transmission dynamics [5], determining the importance of reactivation versus exogenous reinfection [6], investigating and confirming outbreaks [7], confirming laboratory cross-contamination [8] and identifying the clonal spread of successful strains, including multidrug-resistant ones [9]. Furthermore, molecular typing has revealed that the M. tuberculosis complex (MTBC) has a diverse population structure with manifold lineages that show large differences in their geographical occurrence and also in their pathobiological properties, e.g. development and spread of drug resistance [10].
In Ethiopia, only a few molecular epidemiological studies have been done so far, all of them in the capital city, Addis Ababa [11][12][13]. Recent data are available only from an MDR strain-targeted study from 2006. However, the strains were investigated by spoligotyping only, allowing neither for high-resolution phylogenetic strain classification nor for analysis of transmission dynamics [11].
In this study, we used a combination of mycobacterial interspersed repetitive unit-variable number tandem repeat (MIRU-VNTR) typing and spoligotyping methods to investigate a large collection of MTBC strains isolated from patients living in Amhara region, Northwest Ethiopia. In contrast to classical molecular typing methods such as IS6110 DNA fingerprint and spoligotyping, 24-loci MIRU-VNTR genotyping allows for a high-resolution discrimination of isolates for epidemiological studies and a valid phylogenetic strain classification [14]. The data obtained allow for new insights into population structure and transmission dynamics, thus also revealing urgently needed data to improve TB control in Ethiopia.

Study design, area and study period
A total of 260 smear-positive pulmonary tuberculosis patients diagnosed at Gondar Hospital, Gondar Health Center, Metemma Hospital, Bahir Dar Hospital and Debre Markos Hospital between March 2009 and July 2009 were included in this study. For all study subjects, information on socio-demographic data, history of previous tuberculosis treatment, HIV status and the drug susceptibility patterns of the M. tuberculosis isolates was available. A single morning sputum sample and a 5 ml venous blood sample were collected prior to commencing TB treatment.
A structured questionnaire was used to classify patients into new and previously treated tuberculosis cases and to collect socio-demographic data of the study subjects. Specimens were stored and transported to the Institute of Medical Microbiology and Epidemiology of Infectious Diseases, University Hospital of Leipzig, Germany as described previously [15] for culture and drug susceptibility testing. The study was approved by the research and publication committee of University of Gondar, Ethiopia. Written informed consent was obtained from all study subjects.

DNA extraction
DNA was extracted from all isolates by heating mycobacterial pellets obtained from liquid culture, suspended in 200 μL 10 mM Tris-HCl, 1 mM EDTA (pH 7.0) buffer at 95°C for 20 minutes followed by 15 minutes sonication in a sonicating water bath. The suspension was centrifuged at 15,000 rpm for 1 minute, and the supernatant was stored at -20°C until used.

Genotyping
All isolates were analyzed by the spoligotyping technique as described previously [20] and by the 24-loci MIRU-VNTR genotyping technique as described previously [14]. Briefly, for MIRU-VNTR genotyping, 24 loci were amplified by using the MIRU-VNTR typing kit (Genoscreen, Lille, France). Analyses of the PCR products were performed by using the Rox-labeled MapMarker 1,500 The MIRU-VNTR 24-loci profiles and spoligotyping patterns were used to classify the strains into main phylogenetic lineages by using the reference strain collection and identification tools available online at www.miru-vntrplus.org [21]. Briefly, a stepwise identification procedure was carried out as follows. The strains were first classified by the simple match approach, which is based on the best match with strains of the reference database. The cut-off distance for lineage assignment was set to 0.17. In a second step, phylogenetic tree identification was carried out. Additionally, for each MIRU-VNTR 24-loci pattern a unique MLVA 15-9 code was assigned by using the MIRU-VNTRplus nomenclature.
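The simple match step can be illustrated with a minimal sketch. It assumes that the distance is the categorical distance (the fraction of jointly typed loci at which two profiles differ) and that a strain is assigned the lineage of its nearest reference strain only if that distance does not exceed the 0.17 cut-off. The function names, the six-locus profiles and the lineage labels are invented for illustration; they are not taken from the MIRU-VNTRplus database or its tools.

```python
from typing import Dict, List, Optional, Tuple

Profile = List[Optional[int]]  # repeat count per locus; None = no amplicon


def categorical_distance(a: Profile, b: Profile) -> float:
    """Fraction of loci typed in both profiles at which the alleles differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no locus is typed in both profiles")
    return sum(x != y for x, y in pairs) / len(pairs)


def best_match(
    query: Profile,
    reference: Dict[str, Tuple[str, Profile]],
    cutoff: float = 0.17,
) -> Optional[Tuple[str, str, float]]:
    """Return (lineage, reference strain, distance) or None if beyond the cut-off."""
    strain, (lineage, profile) = min(
        reference.items(),
        key=lambda item: categorical_distance(query, item[1][1]),
    )
    distance = categorical_distance(query, profile)
    return (lineage, strain, distance) if distance <= cutoff else None


# Hypothetical toy reference set (six loci instead of 24, made-up values).
toy_reference = {
    "ref_A": ("lineage_A", [2, 3, 4, 2, 5, 3]),
    "ref_B": ("lineage_B", [4, 2, 4, 4, 3, 5]),
}

close_hit = best_match([2, 3, 4, 2, 5, 2], toy_reference)  # differs from ref_A at 1 of 6 loci
no_hit = best_match([1, 1, 1, 1, 1, 1], toy_reference)     # differs everywhere
```

With 24 loci, a cut-off of 0.17 under this assumed distance tolerates differences at up to four loci (4/24 ≈ 0.167).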
Cluster analyses of molecular typing data were performed with the Bionumerics software (version 6.6; Applied Maths, Sint-Martens-Latem, Belgium) according to the manufacturer's instructions. Similarities of genotyping patterns among strains were calculated by using the categorical coefficient. A dendrogram was generated by using the unweighted pair group method with arithmetic averages (UPGMA). Minimum spanning tree analysis was done based on MIRU-VNTR typing data by using the categorical coefficient. For the cluster analysis, a cluster was defined as a minimum of two strains harbouring identical DNA genotyping patterns (using composite data, MIRU-VNTR 24-loci and spoligotyping) from different patients belonging to the study subjects. The recent transmission index (RTI) was calculated as (number of clustered patients - number of clusters)/total number of patients. The discriminatory power of the genotyping methods (MIRU-VNTR 24-loci typing and spoligotyping) was calculated using the Hunter-Gaston Discriminatory Index (HGDI) as previously described [22].
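The cluster definition, the RTI and the HGDI can be sketched in a few lines. The RTI follows the formula given above; the HGDI uses the standard Hunter-Gaston expression, D = 1 - [1/(N(N-1))] Σ n_j(n_j-1), where n_j is the number of strains sharing the j-th genotype and N is the total number of strains. The function names and the toy patterns are illustrative assumptions; in practice each pattern would be the composite of a MIRU-VNTR 24-loci profile and a spoligotype.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def genotype_group_sizes(patterns: Iterable[Hashable]) -> List[int]:
    """Number of strains per distinct genotyping pattern (1 = unique pattern)."""
    return sorted(Counter(patterns).values(), reverse=True)


def recent_transmission_index(sizes: List[int]) -> float:
    """(clustered strains - number of clusters) / total strains."""
    clusters = [n for n in sizes if n >= 2]  # a cluster needs at least two strains
    return (sum(clusters) - len(clusters)) / sum(sizes)


def hgdi(sizes: List[int]) -> float:
    """Hunter-Gaston Discriminatory Index for a typing method."""
    total = sum(sizes)
    return 1.0 - sum(n * (n - 1) for n in sizes) / (total * (total - 1))


# Toy example: seven strains, two clusters (sizes 3 and 2), two unique patterns.
toy_patterns = ["A", "A", "A", "B", "B", "C", "D"]
sizes = genotype_group_sizes(toy_patterns)   # [3, 2, 1, 1]
rti = recent_transmission_index(sizes)       # (5 - 2) / 7
index = hgdi(sizes)                          # 1 - 8 / 42
```

Applied to the counts reported in the Results (110 clustered strains in 36 clusters among 244 strains), the same formula gives (110 - 36)/244 ≈ 0.303.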

Statistical analysis
All laboratory data were entered, cleaned and analyzed using the SPSS version 13 statistical software package (SPSS Inc., Chicago, IL). Categorical data were compared by the chi-square test, or by Fisher's exact test when expected cell sizes (n) were smaller than 5. Two models were constructed in a logistic regression analysis using clusters and anti-TB drug resistance as the respective outcome variables. In order to determine independent risk factors, odds ratios (OR) and 95% confidence intervals (CI) were calculated by using logistic regression analysis for demographic (gender, age, address and religion), epidemiologic (previous treatment and HIV status), and microbiological variables (drug resistance and infection by M. tuberculosis lineages). P-values less than 0.05 were considered statistically significant.

Demographic characteristics
A total of 260 M. tuberculosis isolates were subjected to MIRU-VNTR 24-loci and spoligotyping analysis. Of these, 16 isolates were excluded from the final analysis: for 15, no PCR amplicon was obtained at two or more loci, and one multidrug resistant isolate was identified as a mixture of two independent strains during MIRU-VNTR typing. An occasional lack of PCR amplification of some loci has been reported in a previous study [14]. This might be explained by chromosomal deletion, nucleotide polymorphisms in the sequences complementary to PCR primers [23], or insufficient DNA quality. A mixture of two independent strains was defined by the presence of double alleles at two or more loci [14,24]. Isolates with no PCR amplicon at only one locus were treated as missing data at the respective locus and included in the analysis. These observations remained the same even after repeated testing with freshly prepared materials. For the remaining 244 isolates, valid genotyping data were obtained and used for further analyses.
Some demographic data of the study subjects and drug susceptibility test results for the isolates used herein were included in our previous report [16]. Briefly, the mean age ± standard deviation of the 244 study subjects was 31.6 ± 12.5 years (range, 68 years), and 58.2% of patients were male. Nearly all patients (98.8%) were Amhara by ethnicity, and 97.5% were Christian by religion. Of all study subjects, 17.6% were previously treated cases and 25.4% were HIV co-infected (Table 1).
Seventy-seven (31.6%) isolates form four previously undefined lineages: the largest, comprising 32 (13.1%) isolates, was named Ethiopia_3; a branch with 19 (7.8%) isolates was named Ethiopia_1; a third branch with 17 (7.0%) isolates that were closely related to the laboratory strain H37Rv was named Ethiopia_H37Rv like; and the fourth branch, with 9 (3.7%) strains, was named Ethiopia_2. The remaining 23 (9.4%) isolates could not be assigned to a known phylogenetic lineage or a new lineage (Figure 1, Table 1, and Additional file 1: Figure S1). To confirm this strain classification, we calculated a minimum-spanning tree (MST) based on the MIRU-VNTR 24-loci data. The MST (Figure 2) confirmed the classification according to UPGMA tree-based analysis (Figure 1) and by comparison with the MIRU-VNTRplus reference database. All lineages suspected from dendrogram-based analysis were also detected as clonal complexes in the MST, including the newly described lineages (Figure 2).
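The inclusion rules described above can be written down as a small sketch. The data representation is an assumption made for illustration: each locus holds a repeat count, `None` when no PCR amplicon was obtained, or a tuple when two alleles were detected. How an isolate with a double allele at a single locus was handled is not stated in the text; the sketch keeps it, which is an assumption.

```python
from typing import List, Optional, Tuple, Union

# Repeat count, double allele (two repeat counts), or None for no amplicon.
Allele = Union[int, Tuple[int, int], None]


def classify_isolate(profile: List[Allele]) -> str:
    """Apply the exclusion rules to one 24-loci MIRU-VNTR profile."""
    missing_loci = sum(allele is None for allele in profile)
    double_allele_loci = sum(isinstance(allele, tuple) for allele in profile)
    if double_allele_loci >= 2:
        return "excluded: mixture of two strains"
    if missing_loci >= 2:
        return "excluded: no amplicon at two or more loci"
    # A single missing locus is kept and treated as missing data.
    return "included"


# Hypothetical profiles with made-up repeat counts.
complete = [2] * 24
one_missing = [None] + [2] * 23
two_missing = [None, None] + [2] * 22
mixed = [(2, 3), (1, 4)] + [2] * 22

results = [classify_isolate(p) for p in (complete, one_missing, two_missing, mixed)]
```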

Population structure and cluster analysis
Cluster analysis based on composite data of MIRU-VNTR 24-loci profiles and spoligotyping patterns revealed that 110 of the 244 strains (45.1%) shared a genotyping pattern with at least one other isolate and were grouped in 36 clusters ranging in size from 2 to 13 strains, resulting in a recent transmission index (RTI) of 30.3%. The remaining strains were discriminated into 134 unique genotypes (Table 2). Strains were also assigned to MLVA MtbC15-9 types. The largest cluster (n = 13; cluster 5: MLVA MtbC15-9 type 594-15) was formed by strains of the Ethiopia_3 lineage, followed by the second largest cluster, formed by strains of the Dehli/CAS lineage (n = 8; cluster 19: MLVA MtbC15-9 type 1557-32), indicating ongoing transmission of these strains.
The odds of clustering were also 2-fold higher among strains from previously treated cases (26 out of 43 strains) (P=0.026) compared to strains from new cases, nearly 3-fold higher among INH resistant strains (22 out of 34 strains) compared to INH susceptible strains (P=0.013), nearly 4-fold higher among STM resistant strains (19 out of 26 strains) (P=0.002) compared to STM susceptible strains, and nearly 5-fold higher among EMB resistant strains (14 out of 18 strains) (P=0.004) compared to EMB susceptible strains. Additionally, strains that were resistant to one or more first-line anti-TB drugs had 2-fold higher odds of clustering (25 out of 40 strains) (P=0.015) compared to fully susceptible strains, and strains that were resistant to all first-line anti-TB drugs had 10-fold higher odds of clustering (8 out of 9 strains) (P=0.012) compared to strains that were susceptible to at least one first-line anti-TB drug. However, multidrug resistance was not a significant risk factor for clustering (8 out of 12 strains) (P = 0.123) compared to non-multidrug resistant strains (Table 1).
Interestingly, more than 50% of MDR strains were classified as the Haarlem lineage (Table 3). Although the numbers are small, the risk of having multidrug resistance was 22-fold higher among patients with a Haarlem strain (P<0.001) compared to patients with non-Haarlem strains, and the odds of resistance to all first-line anti-TB drugs were 10-fold higher among patients with a Haarlem strain (P=0.004) compared to patients with non-Haarlem strains. Similarly, a significantly higher risk of resistance to INH (P=0.017), RMP (P<0.001), STM (P=0.015), EMB (P=0.002) or PZA (P=0.002) was observed among patients with a Haarlem strain compared to patients with non-Haarlem strains (Table 4).
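The fold differences quoted above can be checked with a crude (unadjusted) odds ratio from the 2x2 table. The sketch below uses the cross-product ratio with a Wald 95% confidence interval, which is what a logistic regression with a single binary predictor yields; it is not the authors' SPSS output. The counts for the comparison group (84 clustered and 117 non-clustered new cases) are derived from the overall figures of 110 clustered strains among 244 and are therefore an assumption.

```python
from math import exp, log, sqrt
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio of the table [[a, b], [c, d]] with a Wald confidence interval.

    a, b: exposed group (clustered, not clustered)
    c, d: comparison group (clustered, not clustered)
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = exp(log(odds_ratio) - z * standard_error)
    upper = exp(log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Previously treated cases: 26 of 43 clustered; new cases (derived): 84 of 201 clustered.
previous_treatment = odds_ratio_ci(26, 17, 84, 117)  # odds ratio of about 2.13
```

The same derivation for STM resistance (19 of 26 clustered versus 91 of 218) gives an odds ratio of about 3.8, in line with the "nearly 4-fold" figure in the text.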

Discussion
Recent advances in molecular strain typing, such as the development of 24-loci MIRU-VNTR typing, provide a powerful tool to analyze MTBC population structure and transmission dynamics locally and at the global level, which provides valuable information for the development of effective tuberculosis control policy. In this study, we present the first in-depth analysis of the population structure of M. tuberculosis strains in Northwest Ethiopia based on high-resolution MIRU-VNTR 24-loci typing and spoligotyping. Our data confirm a highly diverse population structure that comprises thirteen phylogenetic lineages, four of which were not described before. Furthermore, our data indicate a high rate of recent transmission, of which the spread of resistant and MDR strains is of special importance. While homoplasy is a true phenomenon within the evolution of TB, spoligotyping has been shown to provide invalid phylogenetic classifications by suggesting homoplasy too often [25]. In contrast, the MIRU-VNTR 24-loci typing method applied in our study has the advantage of allowing for the high-resolution genotyping needed for molecular epidemiological studies and, simultaneously, for valid phylogenetic strain classification enabling screening for new phylogenetic lineages/clonal complexes [14].
Using this method, 90.6% of the strains investigated were classified into various M. tuberculosis complex lineages, of which 59.0% were described before and 31.6% were newly described in this study. We documented that M. tuberculosis Dehli/CAS is the predominant phylogenetic lineage in Northwest Ethiopia, accounting for 39% of investigated strains. Similarly, a previously published study from the capital city of Ethiopia showed that 43.5% of the strains were of the CAS lineage [11], and a study from Sudan [26] also showed that M. tuberculosis Dehli/CAS is the predominant lineage (49% of investigated strains). The Dehli/CAS lineage is essentially localized in Central Asia and the Middle East, and more specifically in India [27]. Two hypotheses could explain the high prevalence of the Dehli/CAS lineage in Ethiopia: (i) the large Indian and Chinese communities in Ethiopia, resulting from the growing economic partnerships between Ethiopia and these two Asian countries, may have contributed to the introduction of this lineage; or (ii) this lineage could have emerged in Ethiopia and migrated through Asia; this hypothesis is in agreement with the suggestion that East Africa is the origin of M. tuberculosis complex species [28]. Additionally, we confirmed the presence of previously undefined phylogenetic lineages, named Ethiopia_3, Ethiopia_1, Ethiopia_H37Rv like and Ethiopia_2, that were clearly defined by tree-based as well as by minimum spanning tree-based analysis. However, comparison with other studies is hampered by the fact that they are mainly based on IS6110 DNA fingerprint and/or spoligotyping analysis, which hinders a valid analysis of the population structure and standardized comparisons based on MIRU-VNTR nomenclature. Thus, the actual picture of M. tuberculosis population diversity in African high-incidence settings is largely incomplete and needs a systematic investigation with phylogenetically useful genotyping methods.
This study also showed a significant association between infection with strains of the Haarlem lineage and multidrug resistance, resistance to all first-line anti-TB drugs and resistance to each of the first-line anti-TB drugs, INH, RMP, STM, EMB and PZA. Similarly, a previous study from Tunisia showed that the Haarlem family genotype has a similar relationship with drug resistance and rapid clonal expansion [29]. From a TB control point of view, it is relevant to understand whether specific genotype families are overrepresented among drug-resistant cases and, in particular, whether these resistant strains are successfully transmitted within the community. In this study, HIV infection was not significantly associated with resistance to anti-TB drugs. The high HIV prevalence in the study subjects did not appear to be a significant risk factor selectively driving drug resistance development and transmission. This might be due to the fact that HIV infection increases the susceptibility of the population to both drug-susceptible and drug-resistant M. tuberculosis strains.
Clustering is a marker for recent transmission [30][31][32]. By using the degree of recent TB transmission in a study population, one can estimate the efficacy of the TB control program [30]. Both the high TB incidence and the current drug-resistance rates in Ethiopia are indicative of defects of the TB control program [2,4,16]. Supporting this suggestion, we found a high rate of clustering, 45.1% of the total strains investigated. This is in agreement with previous reports from the capital city of Ethiopia that showed clustering rates of 41.2% [13] and 48.1% [12].
Even more important, we confirm an elevated clustering rate in drug resistant strains in general as well as for MDR strains. Similarly, there was a significant association between recent transmission and a history of previous TB treatment, infection with INH resistant strains, STM resistant strains, EMB resistant strains, strains resistant to one or more first-line anti-TB drugs and strains resistant to all first-line anti-TB drugs. This might be due to the fact that in Ethiopia there is no culture and drug susceptibility testing facility for routine diagnosis of drug resistance; thus, drug-resistant TB is only diagnosed after prolonged treatment with first-line anti-TB drugs and clinical recognition that treatment has failed. Treatment of drug-resistant TB with standard first-line drugs, instead of a regimen designed according to the resistance pattern, has several potential adverse consequences: patients remain on inadequate treatment longer, increasing the risk of treatment failure or death; drug-resistant strains are selected; and patients remain infectious, promoting transmission to close contacts [33]. These data indicate successful transmission of drug resistant and MDR strains in the community, a situation that needs to be carefully monitored in the future to detect extensive transmission of resistant strains early enough to avoid more significant problems for TB control, as already evident in several parts of Eastern Europe and South Africa [34,35].
Interestingly, we present evidence of significant association between recent transmission and the Dehli/CAS, Ethiopia_3, TUR and Ethiopia_H37Rv like strain infections. Similarly, Gagneux et al. have recently proposed that the major M. tuberculosis lineages have evolved so as to become adapted to specific host genetic backgrounds and are much more likely to transmit and cause disease among patients of the same ethnicity [36].

Conclusion
In conclusion, our study confirms a highly diverse population structure of M. tuberculosis, the presence of phylogenetic lineages that were not described before and a predominance of the Dehli/CAS lineage in the Amhara region, Northwest Ethiopia. Our study also showed a significant association between Haarlem strain infection and resistance to first-line anti-TB drugs, including multidrug resistance. The high rate of recent transmission underlines active transmission of M. tuberculosis, including drug-resistant strains, and consequently the inefficacy of the TB control program in the study area. This emphasizes the importance of strengthening laboratory diagnosis of TB, including culture and drug susceptibility testing, and of intensified case finding and treatment of TB patients according to the ongoing DOTS program to interrupt the chain of transmission within the community. The continued development of new high-resolution methodologies (e.g. whole genome sequencing) is still crucial.

Additional file
Additional file 1: Figure S1. Classification of the strains based on the MIRU-VNTR 24-loci and spoligotype patterns.