Distinct genotypic profiles of the two major clades of Mycobacterium africanum

Background Mycobacterium tuberculosis is the principal etiologic agent of human tuberculosis (TB) and a member of the M. tuberculosis complex (MTC). Additional MTC species that cause TB in humans and other mammals include Mycobacterium africanum and Mycobacterium bovis. One result of studies interrogating recently identified MTC phylogenetic markers has been the recognition of at least two distinct lineages of M. africanum, known as West African-1 and West African-2. Methods We screened a blinded non-random set of MTC strains isolated from TB patients in Ghana (n = 47) for known chromosomal region-of-difference (RD) loci and single nucleotide polymorphisms (SNPs). An MTC PCR-typing panel, single-target standard PCR, multi-primer PCR, PCR-restriction fragment analysis, and sequence analysis of amplified products were among the methods utilized for the comparative evaluation of targets and identification systems. The MTC distributions of novel SNPs were characterized in both the Ghana collection and two other diverse collections of MTC strains (n = 175 in total). Results The utility of various polymorphisms as species-, lineage-, and sublineage-defining phylogenetic markers for M. africanum was determined. Novel SNPs were also identified and found to be specific to either M. africanum West African-1 (Rv1332 523; n = 32) or M. africanum West African-2 (nat 751; n = 27). In the final analysis, a strain identification approach that combined multi-primer PCR targeting of the RD loci RD9, RD10, and RD702 was the simplest, most straightforward, and most definitive means of distinguishing the two clades of M. africanum from one another and from other MTC species. Conclusion With this study, we have organized a series of consistent phylogenetically relevant markers for each of the distinct MTC lineages that share the M. africanum designation. A differential distribution of each M. africanum clade in Western Africa is described.


Background
Mycobacteria that cause human and/or animal tuberculosis (TB) are grouped together within the Mycobacterium tuberculosis complex (MTC). The MTC comprises the classical species M. tuberculosis, Mycobacterium africanum, Mycobacterium microti, and Mycobacterium bovis (along with the widely used vaccine strain M. bovis bacillus Calmette-Guérin [BCG]) [1][2][3], as well as the newly recognized additions Mycobacterium caprae and Mycobacterium pinnipedii [4,5]. Although they are not presently officially described microorganisms, "Mycobacterium canettii" (proposed name), the oryx bacillus, and the dassie bacillus are additional widely accepted members of the MTC [6][7][8]. M. tuberculosis is the predominant cause of human TB worldwide, but M. africanum and M. bovis remain important agents of human disease in certain geographical regions. Of note, M. bovis is naturally resistant to pyrazinamide, a first-line anti-TB drug [9], and so treatment of human TB caused by M. bovis should not include pyrazinamide. Therefore, the correct identification of MTC isolates to the species level is important to ensure appropriate patient treatment, as well as for the collection of epidemiological information and for implementing necessary public health interventions.
Mycobacteriological laboratory methods have traditionally utilized a series of tests based upon growth, microscopic, phenotypic, and biochemical properties in order to segregate the classical members of the MTC [10]. However, these tests can be slow, cumbersome, imprecise, and non-reproducible, may not give an unambiguous result in every case, and may not be performed by every clinical microbiology laboratory. The relatively recent identification of DNA sequence-level differences amongst the species of the MTC has greatly improved our capacity for performing molecular epidemiology, phylogenetic structuring of the MTC evolutionary tree, and MTC species determination. Molecular techniques, such as PCR, either alone or followed by sequence analysis or restriction fragment analysis (RFA), have proven particularly useful for the characterization of single nucleotide polymorphisms (SNPs) and/or chromosomal region-of-difference (RD) loci (such as insertions, deletions, and rearrangements) that are either lineage-, species-, or strain-specific [7]. Several groups have reported on the development of molecular protocols for the definitive identification of unknown MTC isolates to the species level by RD and/or SNP analysis [2,7,[11][12][13]], and clinical laboratories are now beginning to integrate such home-brew protocols into their routine identification workflows for acid-fast bacilli. The only currently available commercial protocol for MTC species identification is the GenoType MTBC® assay (Hain Lifescience, Nehren, Germany), which can differentiate M. tuberculosis from M. africanum, M. microti, M. caprae, M. bovis, and M. bovis BCG [14][15][16]. However, this test is limited in that it cannot differentiate all species of the MTC and it is not commercially available for diagnostic purposes in the USA.
In the past, M. africanum strains were generally identified by default, having first ruled out both M. tuberculosis and M. bovis by the traditional battery of tests. Two biovars of M. africanum were commonly described that lay along the phenotypic continuum between M. tuberculosis and M. bovis [17]. We now understand that most strains formerly designated as M. africanum subtype II were actually M. tuberculosis [1,2,7,[18][19][20][21][22][23]], while strains formerly characterized as M. africanum subtype I can be segregated into two distinct genealogical clades on the basis of multiple genome sequence-level differences [1,2,7,23]. Several names have been given to each of the subtype I lineages in order to distinguish them. In this report we refer to the subtype I groupings as M. africanum West African-1 and M. africanum West African-2 [24,25]. For reference, as first described by Mostowy et al. [23], strains of M. africanum West African-1 (also known as clade 1 [26]) uniquely possess the long sequence polymorphism (LSP) RD713, while M. africanum West African-2 (also known as clade 2 [26]) carries the defining LSPs RD701 and RD702. Huard et al. [7] recently confirmed the clade specificity of these RDs, identified and validated the first SNPs restricted to either M. africanum West African-1 or M. africanum West African-2, and placed several additional previously known and novel polymorphisms into a unified phylogenetic context vis à vis M. africanum West African-1 and M. africanum West African-2.
In the present study, we characterized the content of known phylogenetically relevant RDs and SNPs in a blinded, M. africanum-enriched set of MTC strains isolated from TB patients in Ghana. The results of this evaluation established the utility of several consistent RD and SNP markers for M. africanum identification and clade differentiation and allowed us to settle upon a focused approach for future evaluations. In addition, novel SNPs were identified and validated against a large and diverse collection of MTC species and found to be specific to either M. africanum West African-1 (Rv1332 523) or M. africanum West African-2 (nat 751), thereby further expanding the limited number of genetic markers that can be used to unambiguously differentiate the two M. africanum lineages.
(This study contributed to the fulfillment of the Master's degree requirements by S.E.G.V.)

MTC strains analyzed
A total of 175 unique isolates that represent all of the presently described members of the MTC were included in the analysis and were derived from three strain collections, maintained at different institutions. One set of strains (n = 47) came from the National Reference Center for Mycobacteria in Forschungszentrum, Borstel, Germany and was collected in 2001-2003 from patients with pulmonary TB in Ghana. This set of Ghana strains was provided in a non-random blinded fashion but was known to contain both M. africanum and M. tuberculosis (as controls). All strains were previously characterized using the GenoType MTBC® assay, as per the manufacturer's instructions, and these results were provided subsequent to the derivation of species identity using RD markers. A complete listing of the Ghana collection isolates by strain number accompanies a recent article by Wirth et al. [24] (excepting all M. bovis from Ghana and the non-M. bovis strains 10514/01, 1473/02, and 5357/02) and was recently made available as part of the MIRU-VNTRplus database http://www.miruvntrplus.org/MIRU/index.faces [27]. Another 124 isolates were of a well-described strain collection from the Weill Medical College of Cornell University, New York. The extensive molecular characterization of the Cornell collection, and a complete listing by MTC species, unique identifier, and origin, was previously reported [7]. Only one isolate from that collection (M. tuberculosis strain W) was not included in the current evaluation. This sampling was composed of "M. canettii" (n = 5), M. tuberculosis (n = 44), M. africanum West African-1 (n = 12) (note: given previously as M. africanum subtype Ib), M. africanum West African-2 (n = 18) (note: given previously as M. africanum subtype Ia), the dassie bacillus (n = 4), the oryx bacillus (n = 2), M. microti (n = 10), M. pinnipedii (n = 7), M. caprae (n = 1), M. bovis (n = 14), and M. bovis BCG (n = 8). 
Lastly, 15 DNA samples were provided from the collection of the National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands [28]. These  [2,22]). The 4 strains underlined in the above were unique and the remaining 11 were also included in the Cornell collection [7]. All strains from the Ghana collection were screened for every marker of interest while strains of the Cornell and RIVM collections were screened selectively, as described in each respective section of the Results.

PGG Analysis
Frequently observed SNPs in the genes katG 463 and gyrA 95 are routinely assessed in order to broadly categorize isolates into defined MTC phylogenies, known as principal genetic groups (PGG) [29]. The distribution of SNPs in katG 463 and gyrA 95 suggests that PGG1 M. tuberculosis strains more closely resemble the most recent common ancestor of all M. tuberculosis strains than PGG2 strains, and PGG2 strains more so than PGG3 strains. MTC species along the M. africanum → M. bovis evolutionary track are also PGG1 [1,2]. SNP analysis of katG 203 was used to further segregate PGG1a isolates from PGG1b strains [7,30]. Representatives of each PGG were included in the Cornell collection of MTC strains.
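The PGG assignment described above amounts to a small lookup on codon states, which can be sketched as follows. This is only an illustrative sketch: the katG 463 (CTG/CGG) and katG 203 (ACC/ACT) states follow the text of this report, whereas the gyrA 95 codons (ACC/AGC) are an assumption drawn from the commonly cited PGG scheme [29] and are not stated in this section. The function name assign_pgg is hypothetical.

```python
from typing import Optional


def assign_pgg(katg463: str, gyra95: str, katg203: Optional[str] = None) -> str:
    """Assign a principal genetic group (PGG) from codon states.

    katG 463 CTG/CGG and katG 203 ACC/ACT are taken from the text;
    the gyrA 95 codons ACC/AGC are an assumption (not given in this section).
    """
    katg463 = katg463.upper()
    gyra95 = gyra95.upper()

    if katg463 == "CTG" and gyra95 == "ACC":
        # PGG1; katG 203 optionally splits PGG1a (ACT) from PGG1b (ACC).
        if katg203 is None:
            return "PGG1"
        katg203 = katg203.upper()
        if katg203 == "ACT":
            return "PGG1a"
        if katg203 == "ACC":
            return "PGG1b"
        return "PGG1 (katG 203 unresolved)"
    if katg463 == "CGG" and gyra95 == "ACC":
        return "PGG2"
    if katg463 == "CGG" and gyra95 == "AGC":
        return "PGG3"
    # Any other combination is left uncalled rather than guessed.
    return "unassigned"


if __name__ == "__main__":
    print(assign_pgg("CTG", "ACC", "ACT"))
    print(assign_pgg("CGG", "ACC"))
```

Unrecognized codon combinations are returned as "unassigned" so that sequencing errors or novel alleles are flagged for review instead of being forced into a group.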

MTC PCR-typing Panel
In previous reports we described [2], and then expanded upon [7], a PCR-based protocol for the differentiation of the various MTC species on the basis of genomic deletions. This MTC PCR-typing panel targets eight independent loci for amplification (16S rRNA, cfp32 The resulting band-pattern that is observed following agarose gel electrophoresis is indicative of MTC species identity. Of note, the RD12 target region in M. bovis and M. caprae overlaps a specific LSP in "M. canettii" (RD12 can), while the RD1 target region in M. bovis BCG overlaps a specific LSP in the dassie bacillus (RD1 das). With this protocol, the band patterns for M. microti and M. pinnipedii are identical, while the band pattern for the oryx bacillus is the same as that of M. africanum West African-2. The MTC PCR-typing panel has been successfully applied to collections of MTC strains from Rio de Janeiro, Brazil, and Kampala, Uganda, in order to characterize the diversity of MTC species within these locales [21,31].

PCR amplification primers and conditions
Purified DNA was prepared for PCR as previously described [2]. For some strains, culture thermolysates (80°C for 30 min) were used as the source of DNA in PCR amplifications. The primers used for the MTC PCR-typing panel, the RD Rio flank multiplex, RD174, RD701, RD702, RD711, and RD713, in addition to targets containing the pks15/1 micro-deletions and SNPs at aroA 285, 3'cfp32 311, gyrA 95, gyrB 1450, hsp65 540, katG 203, katG 463, PPE55 2148, PPE55 2154, narGHJI -215, RD13 174, rpoB 1049, rpoB 1163, Rv1510 1129, and TbD1 197, were the same as described earlier [7,32,33]. For analysis of the loci RD8, RD9, RD10, RD701, and TbD1, additional new site-specific 3-primer combinations were designed for each, similar to those previously detailed [32]; each included two deletion-flanking primers and one primer internal to the deletion. The 3-primer PCRs were each designed to amplify a product of one size when the target locus is intact or to produce a different band size when a known LSP is present. New primers were also designed to amplify a 1069-bp nat gene fragment and the SNP-containing targets in nat 751 and Rv1332 523. New primers, along with expected band sizes and the PCR program used for amplification, are listed in Table 1. The general PCR protocol was identical to that used previously [2,7]. PCR amplification from purified DNA was performed using the following cycling conditions: program 1a (an initial denaturation step of 5 min at 94°C, followed by 45 cycles of 1 min at 94°C, 1 min at 60°C, and 1 min at 72°C, and ending with a final elongation step of 10 min at 72°C) or program 2a (similar to program 1a but with an annealing temperature of 65°C).
PCR testing of DNA thermolysates was performed in a similar manner using the following cycling conditions: Program 1b (with an initial denaturation step of 5 min at 94°C, followed by 45 cycles of 1 min at 94°C, 1 min at 60°C, and 4 min at 72°C, and ending with a final elongation step for 10 min at 72°C) or program 2b (similar to program 1b but with an annealing temperature of 65°C). Programs 1b and 2b were also used to amplify from purified DNA when potential target PCR fragments were greater than 1,250 bp. PCR products were visualized as previously described by agarose gel electrophoresis [2]. Negative or unexpected positive PCR results were repeated at least once for confirmation. Importantly, all PCR tests included parallel samples containing DNA of M. tuberculosis strain H37Rv (ATCC 27294 T ) and either M. africanum West African-1 strain Percy16, M. africanum West African-2 strain ATCC 25420 T , or M. bovis strain ATCC 19210 T , where appropriate, as controls. All controls consistently provided the expected results for each particular marker screened. Negative control PCRs, lacking input DNA, were also included to control for DNA contamination.
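For readers who wish to encode the four cycling programs, the following minimal sketch captures the parameters stated above and the stated rule for choosing the long-extension ("b") programs (thermolysate template, or a potential product greater than 1,250 bp). The class and function names are hypothetical, and the annealing temperature for a given target must still be taken from Table 1, which is not reproduced here. The computed duration counts programmed hold times only and ignores ramping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    name: str
    anneal_c: int            # annealing temperature, degrees C
    extend_min: int          # per-cycle extension at 72 C, minutes
    cycles: int = 45
    initial_denature_min: int = 5   # at 94 C
    denature_min: int = 1           # per cycle, at 94 C
    anneal_min: int = 1             # per cycle
    final_extend_min: int = 10      # at 72 C

    def hold_minutes(self) -> int:
        """Sum of programmed hold times (ramping time not included)."""
        per_cycle = self.denature_min + self.anneal_min + self.extend_min
        return self.initial_denature_min + self.cycles * per_cycle + self.final_extend_min


PROGRAMS = {
    "1a": PcrProgram("1a", anneal_c=60, extend_min=1),
    "2a": PcrProgram("2a", anneal_c=65, extend_min=1),
    "1b": PcrProgram("1b", anneal_c=60, extend_min=4),
    "2b": PcrProgram("2b", anneal_c=65, extend_min=4),
}


def choose_program(anneal_c: int, thermolysate: bool, max_amplicon_bp: int) -> PcrProgram:
    """Pick a program from the annealing temperature and template type."""
    if anneal_c == 60:
        series = "1"
    elif anneal_c == 65:
        series = "2"
    else:
        raise ValueError("only 60 C and 65 C annealing are described")
    # Long extension for thermolysates or potential products > 1,250 bp.
    suffix = "b" if (thermolysate or max_amplicon_bp > 1250) else "a"
    return PROGRAMS[series + suffix]


if __name__ == "__main__":
    program = choose_program(60, thermolysate=False, max_amplicon_bp=2798)
    print(program.name, program.hold_minutes(), "min of programmed holds")
```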
It should be noted that the M. africanum West African-1-and M. africanum West African-2-restricted LSPs were amplified by RD flanking primers [23] and analyzed as previously described [7] with the results based upon a size estimation of the PCR products on agarose gel. PCR amplification of RD713 in M. africanum clade 1 strains typically yields a 2,798 bp amplicon, while amplification of this locus in other MTC strains either results in a 4,248 bp product (PGG2 and PGG3 M. tuberculosis) or no PCR product (PGG1a MTC species with the partially overlapping RD7 deletion and PGG1b M. tuberculosis which possess additional genomic content at this locus [7]). In PCR amplification of RD711, most, but not all, M. africanum clade 1 strains are expected to yield a 944 bp amplicon while the remaining M. africanum West African-1 strains and
All unexpected digestion results were repeated at least once for confirmation. For each PCR-RFA evaluation, the PCR fragments from at least one strain of each digest pattern were sequenced in order to confirm the presence or absence of the target SNP.
Because it was not possible to develop a PCR-RFA-based approach for characterization of the SNPs at PPE55 2148, PPE55 2154, RD13 174, and Rv1332 523, SNP analysis for these markers was performed by direct sequencing of the PCR products. The same procedure was used for verification of micro-deletions in the pks15/1 locus [34]. In most cases, the primers used for PCR amplification were also used for sequencing, as previously described [2,7], with the exception of the 1069-bp nat fragment, which was also sequenced using internal primers (Table 1). Sequencing was performed using the BigDye Terminator kit (PE Applied Biosystems) on an ABI 3730 DNA Analyzer, either at the Cornell University BioResource Center (Ithaca, NY; http://www.brc.cornell.edu) or at the Oswaldo Cruz Foundation (PDTIS DNA Sequencing Platform/FIOCRUZ, Rio de Janeiro, RJ; http://www.dbbm.fiocruz.br/PDTIS_Genomica/), and the results were analysed as previously described [2,7].

Genetic characterization of MTC isolates by PCR deletion analysis
For this study we applied the MTC PCR-typing panel to a blinded, M. africanum-enriched challenge collection of MTC strains isolated from patients with TB in Ghana (n = 47). As a result, 18 M. tuberculosis isolates, 20 strains of M. africanum West African-1, and 9 M. africanum West African-2 strains were putatively differentiated [7]. Strains were identified as M. tuberculosis by the successful amplification of targets internal to the RD9 and RD12/RD12 can loci. Strains were identified as M. africanum West African-1 on the basis of failure of amplification of the RD9 locus but successful amplification of the RD7 target region, while M. africanum West African-2 strains were putatively identified on the basis of failure of amplification of the RD9 and RD7 loci but successful amplification of regions within the RD1 bcg /RD1 das, RD4, and RD12 loci. No M. bovis strains (which would have shown a pattern lacking amplicons for RD4, RD7, RD9, and RD12) or other MTC species were identified (see ref. 7 for the expected MTC PCR-typing panel patterns of "M. canettii", M. microti, M. pinnipedii, and the dassie bacillus). Of note, all strains amplified for the cfp32 (Rv0577) gene, a target that has been previously proposed to be MTC-restricted and may be necessary for pathogenesis [2,7,35]. The segregation of M. tuberculosis from M. africanum in this collection by the MTC PCR-typing panel paralleled the results derived from the GenoType MTBC® assay, which assigned these isolates as either M. tuberculosis (n = 18) or M. africanum subtype I (n = 29). These identifications were consistent with independently derived data for this strain set [24]. Fig. 1 illustrates a typical MTC PCR-typing panel profile for M. tuberculosis, M. africanum West African-1, M. africanum West African-2, and M. bovis. A summary of all molecular test results derived in this study is provided in Table 3 and illustrated schematically in Fig. 2.
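The identification rules stated in the preceding paragraph can be written as a short decision function. This sketch is limited to the four patterns spelled out in the text; it is not the full interpretive scheme of the panel (see ref. 7), and the function name and the dictionary keys ("RD1", "RD4", "RD7", "RD9", "RD12") are hypothetical labels for the corresponding panel targets. Note that, as stated in the Methods, the oryx bacillus gives the same band pattern as M. africanum West African-2, so that call remains putative.

```python
from typing import Mapping


def call_species(amplified: Mapping[str, bool]) -> str:
    """Putative species call from presence/absence of panel amplicons.

    `amplified[target]` is True when the internal target amplified.
    Only patterns described in the text are recognized.
    """
    rd1 = amplified["RD1"]
    rd4 = amplified["RD4"]
    rd7 = amplified["RD7"]
    rd9 = amplified["RD9"]
    rd12 = amplified["RD12"]

    if rd9 and rd12:
        return "M. tuberculosis"
    if not rd9 and rd7:
        return "M. africanum West African-1"
    if not rd9 and not rd7 and rd1 and rd4 and rd12:
        # The oryx bacillus shares this band pattern.
        return "M. africanum West African-2"
    if rd1 and not (rd4 or rd7 or rd9 or rd12):
        return "M. bovis"
    # Other MTC members (e.g. "M. canettii", BCG, dassie bacillus) are
    # not resolved by the rules given in this section.
    return "unresolved"


if __name__ == "__main__":
    pattern = {"RD1": True, "RD4": True, "RD7": False, "RD9": False, "RD12": True}
    print(call_species(pattern))
```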
With respect to the RD markers interrogated above, note their phylogenetic positions in Fig. 2 at nodes 1, 6, 9, 14, and 16-19. An exception to the common M. tuberculosis MTC PCR-typing panel profile occurred with 9 M. tuberculosis strains from Ghana, which failed to amplify the IS1561' target (see Fig. 1B). Previously, strains with this particular band pattern were found to share a clonal deletion called RD Rio that defines a major, newly recognized lineage of M. tuberculosis that is the predominant cause of TB in Rio de Janeiro, Brazil, and that has disseminated to many countries around the world [7,31,32]. However, multiplex PCRs for both the RD Rio LSP and the coincident RD174 deletion [32] showed that these Ghanaian strains were not RD Rio genotype M. tuberculosis. Rather, data from the MIRU-VNTRplus website identified these strains as being of the RD726-harboring Cameroon genotype (ST61 and variants) and list the strains as lacking IS1561' [27]. The Cameroon genotype therefore appears to possess an undefined LSP of IS1561' that overlaps RD Rio (Fig. 2; see node 4) and the MiD3 locus in M. microti and M. pinnipedii (Fig. 2; see node 16) [7,31].
Table 3 (excerpt, RD loci). Each row gives the marker, its Fig. 2 node(s), and the result in M. tuberculosis / M. africanum West African-1 / M. africanum West African-2, with the number of strains in parentheses.
[...] (18) / intact (20) / intact (9)
3'cfp32; node 1; intact (18) / intact (20) / intact (9)
TbD1; node 2; deleted (18) / intact (20) / intact (9)
pks15/1; nodes 3, 9; 7-bp deletion (18) / intact (20) / 6-bp deletion (9)
PPE55; nodes 4, 16; intact (9), fail (9) / intact (20) / intact (9)
MiD3/RD Rio (IS1561' + PPE55); nodes 4, 16; intact (9), fail (9) / intact (20) / intact (9)
RD9; node 6; intact (18) / deleted (20) / deleted (9)
RD713; node 7; fail (18) / deleted (20) / fail (9)
RD711; node 8; intact (18) / deleted (20) / intact (9)
RD7; node 9; intact (18) / intact (20) / deleted (9)
RD8; node 9; intact (18) / intact (20) / deleted (9)
RD10; node 9; intact (18) / intact (20) / deleted (9)
RD701; node 11; intact (18) / intact (20) / deleted (9)
RD702; node 11; intact (18) / intact (20) / deleted (9)
RD1 das /RD1 bcg; nodes 14, 19; intact (18) / intact (20) / intact (9)
RD13; node 17; intact (18) / intact (20) / intact (9)
RD4; node 18; intact (18) / intact (20) / intact (9)
In addition to the MTC PCR-typing panel, some PCR targets used in SNP analysis, as will be described below, amplify from genomic regions that are deleted in some MTC species or lineages [7]. The successful amplification of the 3'cfp32 and RD13 loci in all the strains of the Ghana collection confirmed the species distribution obtained using the MTC PCR-typing panel, as these targets are deleted in either "M. canettii" (Fig. 2; see node 1) or both M. caprae and M. bovis (Fig. 2; see node 17), respectively [7]. Furthermore, PPE55 is located proximal to IS1561' and so the failure to amplify PPE55 from the 9 Cameroon genotype M. tuberculosis isolates is consistent with a single genomic deletion in the region of IS1561' (Fig. 2; see node 4). Lastly, TbD1 is an important phylogenetic marker that categorically divides M. tuberculosis into two major lineages [1]. All M. tuberculosis isolates in the Ghana collection failed to amplify from targets internal to TbD1 (Fig. 2; see node 2), while all M. africanum clade 1 and clade 2 strains yielded an amplicon of the correct size, consistent with the previous finding that isolates from the M. africanum → M. bovis evolutionary track are all TbD1-positive and likely  [1,24].
We next evaluated the Ghana strain collection by PCR (using LSP flanking primers) for RDs that have been described previously as being either specific to M. africanum West African-1 (RD713), restricted to a subgroup of M. africanum West African-1 (RD711), or specific to M. africanum West African-2 (RD701 and RD702) [7,23]. All M. africanum West African-1 strains (n = 20) yielded amplification products for RD711 and RD713 of shorter band sizes that were consistent with amplicons that bridge a deletion ( Fig. 2; see nodes 7 and 8). All M. tuberculosis strains (n = 18) contained the RD711 and RD713 regions, while each M. africanum West African-2 strain (n = 9) yielded PCR fragments suggestive of intact RD711. Each M. africanum West African-2 strain (n = 9) also failed to produce any amplification products from the RD713 locus region, as expected, owing to the overlapping RD7 [7]. Likewise, all M. africanum West African-2 strains produced shortened RD701 and RD702 amplicons ( Fig. 2; see node 11), while each M. tuberculosis and M. africanum West African-1 strain exhibited PCR fragments representative of intact sequences within these loci. The M. africanum clade-specific bridge-deletion PCR results were therefore congruent with the MTC PCR-typing panel data.
A drawback, however, of the MTC PCR-typing assay as it was designed is that overlapping polymorphisms may occur in the target regions of the panel. Such hypothetical LSPs would therefore have the potential to cause a failure in amplification and to confuse the interpretation of banding patterns which may, in turn, lead to erroneous species determinations. To begin to address this issue, with respect to loci relevant to the species within the current Ghana collection, we developed new 3-primer combination sets for RD8, RD9, RD10, RD701, and TbD1 (Table 1). As was expected from previous phylogenetic evaluations [1,3,7], each of the test loci was found to be intact in the Ghana collection PGG2 M. tuberculosis strains, excepting TbD1. Moreover, excepting RD9, each of the studied RDs was intact in the M. africanum West African-1 strains, while in the M. africanum West African-2 strains only TbD1 remained intact, i.e., RD8, RD9, RD10, and RD701 were deleted. Overall, no inconsistencies were observed with respect to species identification within the Ghana MTC strain collection across the different strategies for PCR deletion analysis that were employed.
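The advantage of the 3-primer design is that every locus gives a positive band in either state, so a failed reaction is distinguishable from a deletion. A minimal sketch of how such results could be interpreted is given below. The band sizes passed to call_locus are placeholders (the actual expected sizes are listed in Table 1 and are not reproduced here), the 10% sizing tolerance is an assumption about gel resolution, and all function names are hypothetical. The expected profiles are those reported in the paragraph above for the Ghana collection.

```python
from typing import Dict, List, Optional


def call_locus(observed_bp: Optional[int], intact_bp: int, deleted_bp: int,
               tolerance: float = 0.10) -> str:
    """Interpret one 3-primer PCR band against its two expected sizes.

    The tolerance is an assumed allowance for sizing error on agarose gel;
    the intact size is checked first if the two windows overlap.
    """
    if observed_bp is None:
        return "no product"
    if abs(observed_bp - intact_bp) <= tolerance * intact_bp:
        return "intact"
    if abs(observed_bp - deleted_bp) <= tolerance * deleted_bp:
        return "deleted"
    return "unexpected"


# Locus states reported in the text for the Ghana collection.
EXPECTED_PROFILES: Dict[str, Dict[str, str]] = {
    "M. tuberculosis (PGG2)": {
        "RD8": "intact", "RD9": "intact", "RD10": "intact",
        "RD701": "intact", "TbD1": "deleted",
    },
    "M. africanum West African-1": {
        "RD8": "intact", "RD9": "deleted", "RD10": "intact",
        "RD701": "intact", "TbD1": "intact",
    },
    "M. africanum West African-2": {
        "RD8": "deleted", "RD9": "deleted", "RD10": "deleted",
        "RD701": "deleted", "TbD1": "intact",
    },
}


def matching_groups(calls: Dict[str, str]) -> List[str]:
    """Return every expected profile that matches the observed calls exactly."""
    return [name for name, profile in EXPECTED_PROFILES.items() if profile == calls]


if __name__ == "__main__":
    # Placeholder sizes: 300 bp if intact, 500 bp if deleted.
    print(call_locus(480, intact_bp=300, deleted_bp=500))
```

An empty list from matching_groups signals a profile not seen in this collection and would warrant repeat testing or sequencing.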

Genetic characterization of MTC isolates by SNP analysis
For the second stage of this study we screened the Ghana MTC collection for known phylogenetically relevant SNPs. With respect to the M. tuberculosis strains, we determined that all were PGG2 (n = 18) (Fig. 2; see nodes 3 and 5). Consistent with this determination, the 7-bp pks15/1 micro-deletion was observed in all the M. tuberculosis strains; this polymorphism is positioned at the same point along the MTC evolutionary tree as the katG 463 CTG → CGG SNP that marks PGG2 M. tuberculosis strains (Fig. 2; see node 3). Likewise, an SNP in the narGHJI operon promoter (-215 C → T) that is phylogenetically coincident with TbD1 [33] was also present in all of the Ghanaian M. tuberculosis isolates evaluated (Fig. 2; see node 2). Lastly, the gyrB 1450 G → T polymorphism (also a target of the GenoType MTBC® assay [14][15][16]) is known to coincide with the RD9 deletion and likewise segregated the M. tuberculosis isolates from the M. africanum strains (Fig. 2; see node 6).
The following considers SNPs that inform the phylogenetic interrelationships among most of the non-M. tuberculosis MTC species. First, all the M. africanum strains (n = 29) were PGG1. Previously, an ACC → ACT SNP at katG 203 has been used to segregate PGG1 strains into PGG1a and PGG1b [30]. Huard et al. [7] reported that this SNP is present in M. africanum West African-2 and all downstream species in the MTC evolutionary tree (Fig. 2; see node 9). As expected, the Ghana collection M. africanum West African-1 strains were determined to be PGG1b, while the M. africanum West African-2 strains were PGG1a by katG 203 analysis. Additional inter-species-specific SNPs that colocalize with the katG 203 SNP and segregate the M. africanum clades (and are also notably coincident with RD7, RD8, and RD10) have also been reported at 3'cfp32 311 (G → A), PPE55 2148 (A → G), PPE55 2154 (A → G), and RD13 174 (G → A), in addition to a 6-bp pks15/1 micro-deletion (Fig. 2; see node 9) [7,34]. These loci were interrogated and indeed found to partition the M. africanum West African-2 strains from the M. africanum West African-1 and M. tuberculosis strains of the Ghana collection, consistent with previous reports [7,34]. Lastly, we also screened for an inter-species-specific SNP in mmpL6 551 (AAC → AAG) [1,7] that is not observed in M. africanum West African-1, M. africanum West African-2, or the dassie bacillus, but is present in all of the remaining distal species along the oryx bacillus → M. bovis evolutionary track of the MTC phylogenetic tree [1,7,26]. As was expected, we found mmpL6 551 to be unaltered in the M. africanum West African-1 and West African-2 strains of the Ghana MTC collection (Fig. 2; see node 15). The mmpL6 551 SNP occurs within a TbD1 locus gene and was thus deleted in the TbD1-negative M. tuberculosis strains of the Ghana collection.
We then investigated SNPs that have been previously described to be restricted to either M. africanum West African-1 or M. africanum West African-2 within the MTC [7]. SNPs at aroA 285 (G → A) and TbD1 197 (C → T) were found to be limited to the M. africanum West African-1 strains of the Ghana MTC collection, thereby coinciding with the M. africanum West African-1-specific LSP RD713 (Fig. 2; see node 7). Point mutations at Rv1510 1129 (G → A), hsp65 540 (C → G), and rpoB 1163 (C → T) were also screened and found to be restricted to the M. africanum West African-2 strains (Fig. 2; see nodes 10-12); a previously noted sublineage-specific SNP at rpoB 1049 (C → T) was not observed (Fig. 2; see node 13). However, from previous data [7], only hsp65 540 has been shown to be truly M. africanum West African-2-specific and to associate phylogenetically with RD701 and RD702. In fact, Rv1510 1129 was previously found to be an inter-species-specific SNP that M. africanum West African-2 shares with the dassie bacillus, and is indicative of a common ancestor between these species, while not all M. africanum West African-2 strains possess the rpoB 1163 and rpoB 1049 SNPs [7]. These latter point mutations appear to have been acquired in a stepwise sequential order and to define the branch points of sublineages within the M. africanum West African-2 species. All Ghana M. africanum West African-2 strains evaluated in this study therefore fell into the second of three potential rpoB sequence-based sublineage branches. Overall, each of the known MTC inter-species-specific, species-specific, and sublineage-specific SNPs for which the Ghana MTC collection was evaluated was entirely consistent with the current RD analyses and showed a species distribution that paralleled previous descriptions [7].
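The stepwise acquisition of the two rpoB SNPs implies only three permissible genotypes among M. africanum West African-2 strains, which can be sketched as below. The branch numbering (1 to 3) is our reading of "the second of three potential branches" and is an assumption; the function name is hypothetical.

```python
def rpob_sublineage(has_rpob1163: bool, has_rpob1049: bool) -> int:
    """Return the rpoB-based sublineage branch (1, 2, or 3).

    Assumes the order implied by the text: rpoB 1163 (C -> T) arose
    first (Fig. 2, nodes 10-12), followed by rpoB 1049 (C -> T) (node 13).
    """
    if not has_rpob1163 and not has_rpob1049:
        return 1  # neither SNP
    if has_rpob1163 and not has_rpob1049:
        return 2  # rpoB 1163 only (all Ghana West African-2 strains)
    if has_rpob1163 and has_rpob1049:
        return 3  # both SNPs
    # rpoB 1049 without rpoB 1163 would contradict the stepwise order.
    raise ValueError("rpoB 1049 SNP observed without rpoB 1163 SNP")


if __name__ == "__main__":
    print(rpob_sublineage(True, False))
```

Raising an error for the fourth combination makes a genotype that violates the proposed phylogeny visible instead of silently assigning it to a branch.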

Identification of a novel Mycobacterium africanum West African-1-specific Rv1332 523 SNP
In the process of sequencing the RD711 bridge amplicon to confirm its correct amplification in an M. africanum West African-1 strain, we noted a nonsynonymous G → T SNP in the region 5' of the RD711 deletion breakpoint and within the Rv1332 gene, affecting nucleotide 523 (Rv1332 523; V175L). To investigate the distribution of this Rv1332 523 SNP amongst the MTC species, we generated a new primer pair to amplify the SNP-containing region upstream of RD711. We then performed PCR and sequence analysis of the amplified products upon samples from select MTC strains of the Cornell collection representing each of the MTC species and major M. tuberculosis lineages, i.e., "M. africanum West African-1 strains. In total, 85 MTC isolates were screened, 32 of which were M. africanum West African-1. The data thus supported that the Rv1332 523 SNP is a specific marker for M. africanum West African-1 and is only the third such polymorphism reported to date (Fig. 2; see node 7) [7].

Identification of a novel Mycobacterium africanum West African-2-specific nat 751 SNP
Previously, the nat (Rv3566c) gene product arylamine N-acetyltransferase has been investigated as a potential contributor to reduced isoniazid susceptibility in M. tuberculosis [36]. In the course of those investigations, SNPs were identified in the nat gene that were restricted to different M. tuberculosis lineages. We found a novel nonsynonymous G → A SNP in two M. africanum West African-2 strains at nat nucleotide 751 (nat 751; E251K) upon amplification and sequencing of a 1069-bp nat fragment using samples from a subset of MTC representative strains (RIVM collection; n = 15). Test sequencing of the 1069-bp nat amplicon from 16 MTC strains from the Cornell collection supported the limited distribution of the nat 751 SNP. We then developed a PCR-RFA protocol for the nat 751 SNP, amplifying a shorter product using new primers and employing the restriction enzyme BcgI, and applied the protocol to all strains of both the Cornell (n = 124) and Ghana (n = 47) collections. Consistent with the preliminary test results, all MTC isolates amplified nat successfully. However, only the 27 M. africanum West African-2 strains possessed the nat 751 polymorphism, as determined by PCR-RFA. The West African-2 strains showed a 4-band digest pattern on agarose gel electrophoresis, as opposed to the remaining MTC strains, which showed a 3-band digest pattern (see Table 2). Thus, this SNP appears to be a specific marker for M. africanum West African-2 (n = 175 unique MTC strains evaluated in total) and is only the second SNP reported to be restricted to this clade (Fig. 2; see node 11) [7]. Of note, both the nat 751 and hsp65 540 M. africanum West African-2-specific SNPs are present in the genomic sequencing project of M. africanum strain GM041182 that is currently nearing assembly completion (http://www.sanger.ac.uk/sequencing/Mycobacterium/africanum/).
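The SNP names used here (nat 751; E251K and Rv1332 523; V175L) can be checked against the standard genetic code with a few lines of arithmetic. The sketch below assumes that nucleotide positions are numbered from the first base of the start codon in the coding orientation of each gene; the actual reference codons are not given in this report, so the code only enumerates which codons would be compatible with each reported change. Function names are hypothetical.

```python
from typing import List, Tuple

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO))


def codon_position(nt_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding nucleotide position to (codon number, base 1-3)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def compatible_codons(ref_aa: str, alt_aa: str, base_in_codon: int,
                      ref_base: str, alt_base: str) -> List[str]:
    """List reference codons that encode ref_aa and give alt_aa after the change."""
    i = base_in_codon - 1
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[i] != ref_base:
            continue
        mutant = codon[:i] + alt_base + codon[i + 1:]
        if CODON_TABLE[mutant] == alt_aa:
            hits.append(codon)
    return hits


if __name__ == "__main__":
    for name, pos, ref, alt, ref_aa, alt_aa in [
        ("nat", 751, "G", "A", "E", "K"),
        ("Rv1332", 523, "G", "T", "V", "L"),
    ]:
        codon, offset = codon_position(pos)
        print(name, pos, "-> codon", codon, "base", offset,
              compatible_codons(ref_aa, alt_aa, offset, ref, alt))
```

Both positions fall on the first base of their codons (751 on codon 251, 523 on codon 175), which agrees with the residue numbers in E251K and V175L. For V175L, only two of the four valine codons are compatible with a first-base G → T change yielding leucine, since the other two would give phenylalanine.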

Discussion
M. africanum has been reported to be an important cause of TB in the West African countries of Guinea-Bissau (52%) [37], The Gambia (38%) [38], Sierra Leone (24%) [39], Senegal (20%) [17], Burkina Faso (18.4%) [40], Cameroon (9%) [41], Nigeria (8%) [42], and Côte D'Ivoire (5% of cases) [22]. M. africanum has also been identified in the West African countries of Benin, Mauritania, and Niger [7,43]. Many of the previous M. africanum reports appeared, however, before molecular markers distinguished two different clades within this species [1,7,23,25,26]. Therefore, this study is one of the few to use clade-specific molecular markers to investigate the diversity of M. africanum strains causing TB within a specific African locale. Previous MTC species surveys that characterized strains using truly informative phylogenetic markers identified M. africanum West African-1, but not West African-2, in Cameroon and Nigeria [41,42] or M. africanum West African-2, but not West African-1, in The Gambia [38,44] and Guinea-Bissau [23,45]. In contrast, with this study, we highlight the fact that both clades of M. africanum are contributing to the TB burden in Ghana [24]. However, because the Ghana MTC collection was not representative, the current study does not allow us to estimate the proportion of TB caused by the various MTC clades in this country. Such a systematic survey of MTC population structure in Ghana is currently in progress.
In actuality, few reports have definitively shown an overlap in the geographic ranges of M. africanum West African-1 and M. africanum West African-2. Previously, Huard et al. [7] studied isolates derived from patients in Niger that constituted both M. africanum clades; both lineages were likewise found to coexist in Sierra Leone [39]. In the absence of a molecular analysis similar to that presented herein, it is not known for certain which M. africanum clade predominates in many of the other M. africanum-endemic West African countries or if their ranges coincide elsewhere. However, a cross-comparison of molecular epidemiologic evidence presented in some earlier reports [17,46] and more recent data [7,41,43] does suggest that M. africanum clades 1 and 2 may both occur in at least Côte D'Ivoire, a country that borders Ghana. The picture that emerges from the combined studies [7,17,22-24,30,37-48] is of a differential geographic distribution of the M. africanum lineages, with West African-1 predominating in Eastern-West Africa (Cameroon, Nigeria), West African-2 in Western-West Africa (The Gambia, Guinea-Bissau, Senegal), and the two clades overlapping in Central-West Africa (Côte D'Ivoire, Ghana, Niger, Sierra Leone) (Fig. 3). A conceptually similar gradient of M. africanum prevalence across Western Africa was recently hypothesized by de Jong et al., but their analysis did not make a distinction between the two M. africanum clades [48]. Lastly, although TB caused by M. africanum is concentrated in sub-Saharan West African countries, with immigration and international travel, sporadic cases have also been reported in the USA, the Caribbean, and Europe [28,43,49], including one outbreak of multi-drug resistant M. africanum at a Parisian hospital [17,50]. With improved molecular methods of identification, we expect that further cases of infection will be identified outside of the traditional endemic areas of M. africanum.
Figure 3. Map of sub-Saharan West Africa illustrating the differential geographic distribution by country of the M. africanum clades. Current evidence suggests that only M. africanum West African-1 is found in Eastern-West Africa (Cameroon and Nigeria; black) and M. africanum West African-2 alone is found in Western-West Africa (The Gambia, Guinea-Bissau, and Senegal; speckled), but that the two clades overlap in Central-West Africa (Côte D'Ivoire, Ghana, Niger, and Sierra Leone; grey).
Molecular systems are preferred for the differentiation of M. africanum from M. tuberculosis and M. bovis given the heterogeneous phenotypic patterns among M. africanum strains, and the prolonged time-to-results and subjectivity inherent to the interpretation of some tests. Importantly, previous data indicate that there are no definitive phenotypic characteristics that can be exploited to differentiate the individual M. africanum clades [17,22,45]. In this study, we identified novel M. africanum clade-defining SNPs and confirmed the distribution of several other phylogenetically relevant markers among the MTC. Multiple validated intra-species-specific molecular markers are important because they cross-corroborate each other and increase confidence in a given MTC species identification. By the markers described herein, M. africanum West African-1 would be defined genotypically as possessing RD713 and SNPs at aroA 285, Rv1332 523, and TbD1 197, while M. africanum West African-2 would be defined genotypically by RD701 and RD702, as well as the intra-species-specific SNPs at hsp65 540 and nat 751. Other SNPs and RDs that mark particular branches of the MTC phylogenetic tree, such as gyrB 1450, Rv1510 1129, RD9, and RD10, are also informative of M. africanum clade identity and provide further cross-referencing options.
However, a streamlined protocol that employs 3-primer PCRs for RD9, RD10, and RD702 was the most rapid, simple, straight-forward and definitive means of differentiating the two clades of M. africanum from one another and from other MTC species. This approach limits the number of individual PCR reactions required for identification and eliminates the need for secondary procedures, such as restriction digestion, sequence analysis, or hybridization. Of note, some methods, such as the GenoType MTBC line-probe assay, cannot distinguish the two clades of M. africanum [14-16]. Because PCR-RFA for SNPs specific to one of the M. africanum clades, as described herein, is a relatively simple approach, it may be of benefit for confirmation of species identification in laboratories with limited access to more advanced molecular methods. Other methods for M. africanum identification, such as real-time PCR, microarray analysis, and spoligotyping (a DNA typing method), may also present advantages to laboratories with these capabilities, but these modalities were not evaluated in the current study.
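The decision logic implied by a three-locus RD panel can be sketched as follows. This is an assumption-laden illustration rather than the authors' protocol: it presumes each 3-primer PCR yields an unambiguous "intact" or "deleted" call, and it relies on the generally reported MTC deletion pattern (RD9 intact in M. tuberculosis; RD9 deleted but RD10 intact in West African-1; RD9, RD10, and the West African-2-specific locus all deleted in West African-2), which is not spelled out in this section. The function and parameter names are hypothetical.

```python
def classify_by_rd(rd9_deleted, rd10_deleted, wa2_rd_deleted):
    """Assign a presumptive MTC group from three deletion calls.

    wa2_rd_deleted refers to the West African-2-specific RD locus in the panel.
    The expected patterns are an assumption based on the general MTC
    deletion phylogeny, not on data given in this section.
    """
    unexpected = "unexpected pattern; repeat or corroborate with other markers"
    if not rd9_deleted:
        # RD9-intact strains should retain the downstream loci as well.
        if rd10_deleted or wa2_rd_deleted:
            return unexpected
        return "RD9 intact: not M. africanum (e.g. M. tuberculosis)"
    if not rd10_deleted:
        if wa2_rd_deleted:
            return unexpected
        return "M. africanum West African-1"
    if wa2_rd_deleted:
        return "M. africanum West African-2"
    return "other RD9/RD10-deleted MTC species (e.g. M. bovis)"


print(classify_by_rd(True, False, False))
print(classify_by_rd(True, True, True))
```

A West African-1 call from this panel rests on RD9 being deleted while RD10 remains intact, so a clade-specific marker such as RD713 or the Rv1332 523 SNP remains a useful cross-check.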
Indeed, all strains of M. africanum are also known to lack spacers 9 and 39 in their spoligotype profile, similar to M. bovis, but possess one or more spacers that are consistently absent in certain other MTC species [7,25]. Previous data [17,23,37,46] suggest that many, but not all, M. africanum West African-1 strains demonstrate an absence of spacer 8 in addition to 9 and 39 (known as spoligotype signature AFRI_2) [43], while M. africanum West African-2 strains may further uniformly lack spacers 7-9 and 39 (known as spoligotype signature AFRI_1). As provided on the MIRU-VNTRplus website, all M. africanum West African-1 strains from the Ghana collection lacked spacers 8, 9, and 39, while each M. africanum West African-2 strain from the Ghana collection lacked spacers 7-9 and 39 [27]. Spoligotyping may therefore provide a preliminary indicator for each M. africanum clade [51,52]; however, the validity of these associations remains to be conclusively determined using a sample set of isolates with diverse geographical origins.
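The spacer rules described above can be expressed as a small screening function. This is a sketch under stated assumptions: the spoligotype is taken to be a 43-character binary string ("1" for spacer present, "0" for absent, spacer 1 first), the function names are hypothetical, and the result is only a preliminary indicator, as the text cautions.

```python
def make_pattern(absent_spacers):
    """Build a 43-character spoligotype string with the given spacers absent."""
    return "".join("0" if i in absent_spacers else "1" for i in range(1, 44))


def spoligo_signature(pattern):
    """Report which M. africanum spoligotype signature, if any, a pattern matches."""
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected a 43-character string of 0s and 1s")
    absent = {i + 1 for i, c in enumerate(pattern) if c == "0"}
    # All M. africanum strains are described as lacking spacers 9 and 39.
    if not {9, 39} <= absent:
        return "no M. africanum signature"
    # Check the broader deletion (7-9) before the narrower one (8-9).
    if {7, 8} <= absent:
        return "AFRI_1 (suggests West African-2)"
    if 8 in absent:
        return "AFRI_2 (suggests West African-1)"
    return "spacers 9 and 39 absent only; clade not indicated"


print(spoligo_signature(make_pattern({8, 9, 39})))
print(spoligo_signature(make_pattern({7, 8, 9, 39})))
```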
In addition to identification, MTC species and sublineage specific markers are of importance for genealogical purposes, as they allow the construction of more accurate phylogenetic trees. In recent years, SNP typing has been used to group strains of M. tuberculosis [53,54], while LSP analyses and DNA sequencing approaches have been used to establish congruent phylogenies for the M. tuberculosis complex [25,51,55]. The species- and sublineage-specific polymorphisms examined in this study for the M. africanum clades may therefore be of benefit when characterizing the evolutionary history of MTC strain sets in the future. SNPs in rpoB, for instance, demarcate the sequential divergence of sublineages within M. africanum West African-2 [7]. Similarly, we previously highlighted that RD711 is deleted in most, but not all, of the RD713-harboring M. africanum West African-1 strains that were evaluated [7], and so defines a major sublineage within this species. (Studies that would use deletion of RD711 as the single marker to define M. africanum West African-1 strains may therefore risk mis-categorizing some isolates.) Nonetheless, all the M. africanum West African-1 strains in the Ghana strain collection had RD711 deleted and, as part of another study [24], could be further subdivided phylogenetically based upon differences in mycobacterial tandem repeat numbers. Although not evaluated in this study, Mostowy et al. [23] recently reported that RD742 was also variably distributed among M. africanum West African-2 strains, and a set of phylogenetically informative SNPs for M. africanum, different from those screened herein, has been published [51]. Overall, the combined data illustrate the continued evolutionary diversification of the M. africanum clades and advance the process of organizing a set of variable markers that may be used to construct meaningful phylogenetic trees for M. africanum. To this end, RD715 and RD743 were identified within M. africanum West African-1 strains [23] and single nucleotide changes located within the RD1 locus of M. africanum West African-2 strains were recently noted in select strains [38], but the utility of these polymorphisms as phylogenetic markers remains to be determined. It should also be mentioned that at least one M. africanum-like strain has been described with RD9 deleted, but RD7, RD10, RD702, RD711, and RD713 intact [56]. Combined, these data indicate that there is greater M. africanum/MTC diversity yet to be characterized.
Our understanding of the nature of M. africanum as a species and its position within the MTC has evolved considerably in recent years. Based upon hard genome-level sequence evidence, the name M. africanum subtype II is no longer applied [2,7,20,22,23], while strains denoted as M. africanum subtype I are now, ironically, recognized to constitute two relatively genetically distinct lineages emerging from separate nodes along the MTC evolutionary tree [1,7,25,26]. This opinion is reinforced by the data provided in the current report. Interestingly, the above-mentioned unique M. africanum-like strain was isolated from a patient originating from the Democratic Republic of Congo, a central African country [56]. As it has been postulated that the MTC originated near the Horn of Africa [57], this strain may therefore be a remnant M. africanum precursor that evolved from M. tuberculosis as humans migrated from Eastern to Western Africa [55]. Indeed, the M. africanum clades possess the phenotypic and genotypic characteristics of sequential intermediary genotypes in the evolution of M. bovis from M. tuberculosis [1,7,24,26]. Accordingly, it has been suggested that an M. africanum transmission cycle may exist between humans and an unknown animal reservoir [23]. Reports of M. africanum isolation from a bovine source in Nigeria and from a goat in Guinea-Bissau support this hypothesis [37,42]. Therefore, a study of animal MTC isolates employing genetic markers, such as those we have organized herein, should be made a priority in order to rule out M. africanum as an important source of zoonotic and/or anthropozoonotic TB in Western Africa.

Conclusions
With this study, we have organized a series of consistent phylogenetically-relevant markers for each of the distinct MTC lineages that share the M. africanum designation, highlighting those polymorphisms that can be used for specific clade identification. A review of molecular studies of M. africanum reveals a differential distribution of each M. africanum clade in Western Africa. Because M. africanum continues to be an important agent of disease, more M. africanum-focused studies are needed to increase our understanding of MTC pathobiology, epidemiology, and evolutionary history, all of which could lead to new strategies for TB prevention.