Machine-learning (ML) reveals that Mycobacterium tuberculosis genotypes and anatomic disease site impacts drug resistance and disease transmission among patients with proven extra-pulmonary tuberculosis

Background There is a general dearth of information on extrapulmonary tuberculosis (EPTB). We investigated Mycobacterium tuberculosis (Mtb) drug resistance and transmission patterns in EPTB patients treated in the Tshwane metropolitan, South Africa. Methods Consecutive Mtb culture-positive non-pulmonary samples from unique EPTB patients underwent further mycobacterial genotyping and were assigned to phylogenetic lineages and transmission clusters based on spoligotypes. MTBDR plus assay was used to test for isoniazid and rifampin susceptibility. ML algorithms were used to identify clinically meaningful patterns in data. We computed odds ratio (OR), attributable risk (AR) and corresponding 95% confidence intervals (CI). Results Of the 70 isolates examined, the largest cluster comprised of 25 (36%) Mtb strains that belonged to the East Asian lineage. East Asian lineage was significantly more likely to occur within chains of transmission when compared to the Euro-American and East-African Indian lineages: OR= 10.11 (95% CI: 1.56-116). Lymphadenitis, meningitis and skin TB, were significantly more likely to be associated with drug resistance: OR=12.69 (95% CI: 1.82-141.60) and AR = 0.25 (95% CI: 0.06-0.43) when compared with other EPTB sites, which suggests that poor rifampin penetration might be a contributing factor. Conclusions Majority of Mtb strains circulating in the Tshwane metropolis belongs to East Asian, Euro-American and East-African Indian lineages. Each of these are likely to be clustered, suggesting on-going EPTB transmission. Since 25% of the drug resistance was attributable to sanctuary EPTB sites notorious for poor rifampin penetration, we hypothesize that poor anti-tuberculosis drug dosing might have a role in the resistance.

test for isoniazid and rifampin susceptibility. ML algorithms were used to identify clinically meaningful patterns in data. We computed odds ratio (OR), attributable risk (AR) and corresponding 95% confidence intervals (CI). Results Of the 70 isolates examined, the largest cluster comprised of 25 (36%) Mtb strains that belonged to the East Asian lineage. East Asian lineage was significantly more likely to occur within chains of transmission when compared to the Euro-American and East-African Indian lineages: OR= 10.11 (95% CI: 1.56-116). Lymphadenitis, meningitis and skin TB, were significantly more likely to be associated with drug resistance: OR=12.69 (95% CI: 1.82-141.60) and AR = 0.25 (95% CI: 0.06-0.43) when compared with other EPTB sites, which suggests that poor rifampin penetration might be a contributing factor. Conclusions Majority of Mtb strains circulating in the Tshwane metropolis belongs to East Asian, Euro-American and East-African Indian lineages. Each of these are likely to be clustered, suggesting on-going EPTB transmission. Since 25% of the drug resistance was attributable to sanctuary EPTB sites notorious for poor rifampin penetration, we hypothesize that poor anti-tuberculosis drug dosing might have a role in the resistance.
Background South Africa has one of the highest TB/HIV incidence rate per capita, with 520 confirmed 3 pulmonary TB (PTB) cases per 100, 000 population reported in 2015 [1]. It is estimated that 15% to 20% of all TB notified cases might have disease restricted to extra-pulmonary sites (EPTB), such as meningeal, lymphatic, skin or pericardial space. However, the true proportion of proven TB at such anatomical sites is not well described [2,3]. In order to meet World Health Organization (WHO) tuberculosis (TB) elimination milestones and targets which are: 10 new TB cases per million people per year by the year 2035 and target goal of 1 case per million people per year by the year 2050, respectively, South Africa will need to undertake more vigorous TB surveillance and direct more resources towards EPTB efforts. However, there are still misinformed beliefs among some public health practitioners and TB programs that EPTB, including childhood TB, do not constitute public health threats because EPTB is less likely to transmit between persons. It is for these and other reasons that childhood TB was not reportable and therefore not formally captured in public records by many national TB programs until 2012 [4]. The net effect has been to regard all TB lesions; except for those from the bronchus, lung parenchyma and bronchopulmonary lymph nodes, in one obscure category called EPTB and devote even fewer resources to the disease [5][6][7]. As a result, the Mtb genotypes, drug resistance patterns and temporal trends associated with EPTB are not well described in South Africa, or in the Tshwane Municipality [3].
The city of Tshwane, which is in Gauteng province, is the financial hub and most densely populated municipality in South Africa. The Tshwane Metropolitan covers a region of 6368 km 2 and is supported by a sophisticated network of >25 directly observed treatment strategy (DOTS)/TB centres and tertiary level teaching facilities that serve a multi-ethnic and diverse population in excess of 3 million people, including migrants from across sub-Saharan Africa (Figure 1). This makes Tshwane an ideal place to obtain generalizable findings on transmission dynamics of different Mtb genotypes with/out drug resistance 4 within diverse populations. The predominant strains associated with the PTB epidemic in the Gauteng province of South Africa are the globally prevalent, modern and re-imported Lineages 4, particularly the so-called Latin-American (LAM) sublineages and the East Asian Lineage 2, which has the moniker "Beijing" strains [2,[8][9][10][11]. Both lineages affect geographically diverse human populations worldwide, have been associated with rapid human-to-human transmission and hence greater propensity for acquired drug resistance [9,12,13]. On the other hand, the same cannot be said about the more ancient East-African-Indian Lineage 3, such as EAI1_SOM families which have also been isolated with equal frequency in PTB patients from Gauteng province [8,10,14]. Here, we wanted to identify factors that predict drug resistance in EPTB patients in Tshwane. In order to inform policy, we specifically wanted to examine the interaction of drug resistance and TB transmission among EPTB patients in the Tshwane metropolis.
There are several molecular tools that promote the rapid identification of drug resistance patterns of Mycobacterium tuberculosis (Mtb) complex directly from clinical specimen; and, allow analyses of molecular clock within and against reference [2,15]. These genotyping technologies can assign the Mtb isolates into distinct clusters or groups based on their relatedness with respect to time (phylogeny), geography and other characteristics, in order to ascertain and compare diseases transmission between groups.
Mtb genotyping is relatively simple, readily available and now affordable. It includes the analyses of IS6100 DNA fingerprints, large sequence polymorphisms, spacer oligonucleotide typing (spoligotyping), Mycobacterial Interspersed Repetitive Units/Variable-Number Tandem-Repeats (MIRU/VNTR), single nucleotide polymorphism (SNP) and more recently, whole-genome sequencing (WGS). Of these, MIRU/VNTR and spoligotyping are the most readily accessible and widely used in developing countries because both have large global reference databases and computational tools that allow 5 assignment of isolates in the major Mtb complex lineages [12,13,[16][17][18]. When combined with spatio-temporal data, MIRU/VNTR and spoligotyping are considered the gold standard for identifying and tracking chains of Mtb transmission. More importantly, both have become indispensable for tuberculosis (TB) epidemiology studies of drug resistance within communities and across the world [5,7,14,16,19,20]. The optimal TB treatment duration for the different anatomic sites, including skin, meninges and pericardial spaces, is unknown [5]. Furthermore, there are very few data on the effect of the standard six-tonine-month combination therapy on acquired drug resistance and patients' outcomes for these different anatomic sites constituting EPTB. Here we use spoligotyping to identify TB Random forests, when coupled with stochastic gradient boosting allows identification of weak base learners and detection of nonlinear dependencies in data, including in pharmacometrics [21][22][23][24]. We hypothesized that the majority of EPTB cases will be clustered, with significant proportions bearing drug resistance, which would indicate high rates of transmission of drug resistance in Tshwane. Alternatively, high drug resistance without relationships to clustering would support the 'pharmacokinetic variability driven de novo acquired resistance' hypotheses [5,7,25,26], where inadequate drug exposures at sites of TB infection lead to selection for drug resistant and drug tolerant strains which eventually leads to therapy failure [20,21]. Here we used a combination of ML to examine patterns that were predictive of drug resistance. We specifically examined whether the 6 different EPTB disease sites, the geographic areas of where the patients came from and time when the patients had disease were also predictive of TB transmission in Tshwane.

Study design
Eligible consecutive clinical samples from EPTB anatomical sites were submitted for diagnosis confirmation by respective DOTS/TB facilities to the National Health Laboratory demographics and clinical data were collected from the specimen request forms. Since the analyzed isolates were de-identified and constituted of routinely collected data, the study was not considered human subject research (UP ethical inquiry 143/2015).

Definition of terms
The report of verified case of tuberculosis (RVCT) nomenclature and approach used by the US Centers for Disease Control and Prevention (CDC) was used to assign collected specimen isolates to EPTB anatomic sites for comparison purposes (https://www.cdc.gov/tb/programs/rvct/rvct-form.pdf). According to RVCT, EPTB comprises of pleural, peritoneal, lymphatic, meningeal, genitourinary, laryngeal and unclassified group called 'other'. We modified the RVCT and replaced laryngeal with skin and added a category 'disseminated', given the high incidence of those sites in our cohort. Disseminated TB denotes isolates from blood or bone marrow specimens, while patients with a positive isolate from sputum and an EPTB site were grouped separately and denoted 'PTB/EPTB'. If isolates were obtained from more than one EPTB sites, only one dominant site was recorded. Number needed to screen (NNS) was defined as the number of people that needed to be screened to prevent one TB transmission event or one drug resistance occurrence, based on the assumption that all drug resistance events in the study were acquired during therapy. Isolates that were resistant only to either rifampicin or isoniazid were categorized as mono-resistant, while those resistant to both were categorized as multi drug resistant (MDR) TB.
Spoligotyping examines 43 unique spacer sequences that are interspaced between repetitive sequences in a specific region of difference within the M. tb genome. The presence or absence of each of the 43 variable spacers generates strain-specific fingerprints. In this study, isolates were considered to be clustered if there was an exact match in all 43 spacers. Cluster name and isolate lineage assignments were made by comparison of fingerprints to international databases: https://www.miruvntrplus.org/MIRU/index.faces, http://www.pasteur-guadeloupe.fr:8081/SITVIT2/index.jsp [18]. The isolates with unmatched genetic profiles were considered nonclustered or orphan strains. The clustering rate was calculated using the following formula: (nc − c)/n, where nc is the total number of clustered isolates, c is the number of isolate clusters, and n is the total number of isolates. Recent transmission of TB infection was presumed to have occurred when a case had an identical spoligotyping pattern to another case in the cohort during six-month study period.

Laboratory testing methods
Isolates identified as M.tb were recovered by sub culturing 0.1 ml of the MGIT culture and on Lowenstein Jansen (LJ) slants to rule out contamination. The slants were placed on their sides, and left at room temperature for 24 hours and thereafter incubated at 37ºC for 6 weeks. For genomic DNA extraction from M.tb, colonies were swept off from LJ slants and centrifuged at 10,000 g for 15 minutes. The supernatant was discarded and the pellet was re-suspended in 100 ml of sterile distilled water. The specimen was then heat-killed at 95ºC for 20 minutes in a water bath. This was followed by sonication for 15 minutes and centrifugation at 13,000 g for 8 minutes. The DNA supernatant was used for spoligotyping and the genotype MTBDRplus line-probe assay while the remainder was

Classification and regression tree (CART) modeling with the stochastic gradient boosting
Stochastic gradient boosting was used to identify weak base learners, determine meaningful pairwise interactions and the percent of variance attributable to those interactions, variable important scores for those variables and identify thresholds for decision-making [29][30][31][32][33]. Important variables define the most influential predictors, including both linear and nonlinear rules that appear in the predictive model [33]. We used the methods of the late Leo Breiman [34] a pioneer in machine learning (ML) algorithms and artificial intelligence, and Jerome Friedman [30]. Multivariate adaptive regression spline (MARS) models for binary targets in classification problems implemented in TreeNet version 8.3 software were also used for graphic visualization. Details of the modeling approaches that use these ML algorithms and tools for pharmacokinetics and pharmacodynamics (PK/PD) analyses, pharmacometrics and for general decision-making purposes in the clinic has been published before and reviewed within [21-23, 25, 35,36].
Optimal CART for the drug resistance and clustering outcomes were also graphically depicted for illustrative purposes. The variable importance scores from Random forest were used to rank and identify variables most predictive of acquired drug resistance or clustering. CART and MARS in TreeNet were used to identify thresholds for continuous variables applied to clinical decision-making, as we have done in the past [21]. Similarly, both algorithms were used to group categorical variable that were considered similar, based on GINI criteria. Five-fold cross validation was used with all models which included all patients' clinical characteristics shown in Table 1 and isolates' spoligotypes. The area under the receiver characteristic, misclassification rates and the F1-statistics were used for model comparison. Parsimony was also used in identify the final models.

Statistical Analysis
Output from the gradient boosting ML were used to calculate attributable risk (AR), NNS and in multivariate logistic regression models. Newcombe/Wilson scores with continuity correction were used in computing AR 95% confidence intervals (CI) [37], otherwise exact binomial methods of Klopper-Pearson were employed. The STATA (College Station, Texas) and GraphPad software (San Diego, California) were used for statistical analysis. Fisher's exact test was used to compare proportions, while the Kruskal-Wallis test compared median values between groups. All tests were two-sided and set at an alpha of 5%.

Results
Of the 75 unique and consecutive isolates submitted by only 8 out of the 25 DOTS/TB facilities in Tshwane, we excluded from further analysis 5 (7%) isolates because they grew Mycobacterium bovis. All excluded isolates were from children <16 years. Of the remaining 70 (93%) isolates, females contributed 28 (40%). The overall proven EPTB incidence was 4.43 (95% CI: 3.72-5.23) per 100,000 population per year in Tshwane ( Figure 2A). Even though the ages varied widely from 1 year to 85 years, only 5 (7.14%) samples were from children <16 years ( Table 1). Detailed demographic, clinical and genotyping data in Table 1 show that women were significantly overrepresented among patients with the East-Africa-Indian genotypes or lineage 3. Figure 2B shows that the most frequently encountered proven EPTB disease sites were pleural and lymphatic each accounting for 29% (95% CI: 18-41%), and skin which accounted for 14% (95% CI: 7-25%), while peritoneal and meningeal each accounted for only 7% (95% CI: 2-16%). However, associations between major M. tb lineages, EPTB disease sites or DOTS/TB facilities were not statistically significant.

Mycobacterium tuberculosis spoligopatterns clustering and inferred transmission
Majority of Mtb isolates, 57/70 (81%), occurred in clusters that varied in size from 2 to 25 isolates. The largest cluster comprised of 25 (36%) isolates that all belonged to the Beijing clade, which is an East Asian lineage also called lineage 2 ( Figure 3A). Mtb isolates from the three major lineages 2, 3 and 4 were in chain of transmission for 98%, 67% and 70% of the isolates, respectively ( Figure 3B). Thus, the Beijing strains were significantly more likely to occur within a chain of EPTB transmission when compared to the Euro-American strain: odds ratio (OR)= 10.11 (95% confidence interval [CI] 1.56-116).
On the other hand, 12/13 (92%) of unclustered isolates were unnamed orphans in the international spoligotyping databases, while the other lone isolate belonged to the X2 clade, which is of the Euro-American lineage. Table 2 shows that there was no significant association between clustering and variety of demographic and clinical factors, including notably drug resistance and DOTS/TB facilities, in bivariate analyses based on standard statistical tests.
Next, we applied stochastic gradient ML algorithms to identify the important variables that predicted clustering and to determine if there were nonlinear associations that could explain genotypic clustering patterns (Figure 3C-D). The results shown in Figure 3C revealed that specific EPTB disease site and DOTS/TB facilities as well as any drug resistance were ranked important variables and that nonlinear interaction between these accounted for almost 45% of clustering variance. The usual factors described in TB epidemiology such as age were either less prominent or scored zero (in the case of patients' gender). The pooled isolates from disseminated TB, lymphadenitis, meningitis, EPTB/PTB and skin TB disease site were less likely to be clustered compared to those not from the same TB disease sites: OR=0.23, 95% CI: 0.10-0.99 and the attributable risk (AR) was 0.18 (95% CI: 0.01-0.40). The OR and AR for clustering improved to 0 (95% CI: 0-0.45) and 0.26 (95% CI: 0.10-0.47), respectively, if DOTS/TB facilities were also used in the combination screening ( Figure 3D). If these two factors are used as screening tools the overall NNS to prevent transmission of one TB case would be 3.91 (95% CI: 2.14-10.32. The sensitivity for the CART shown in Figure 3D was 0.56 (95% 0.43-0.68), while the positive predictive value was 0.74 (95% CI 0.60-0.85). However, both the specificity and negative predictive values were poor. Nonetheless, when combined these data show that disease site and DOTS/TB facilities; i.e., geographic information systems, can be used in combination with isolates genotypes to identify situations where TB transmission is taking place, even for paucibacillary diseases such as EPTB.

Predictors of EPTB drug resistance
Majority, 59/70 (84%), of isolates were susceptible to both rifampin and isoniazid, while 2 (3%) isolates were MDR-TB. However, rifampin resistance was observed in disproportionately large proportions of isolates, 8/70 (11%), which is rather unusual, since isoniazid mono-resistance was observed in only 3/70 (4%) isolates ( Figure 2C). Table 3 shows that there was no association between drug resistance and most demographic and clinical factors examined, including clustering (p=0.419) or Mtb spoligotypes (p=0.737) for any resistance, based on straightforward frequentist tests. The exception was between rifampin resistance and disease site: p=0.036.

ML analyses revealed the differential impact of the interactions between disease site and
Mtb genotypes on any drug resistance and especially MDR-TB/rifampin monoresistance ( Figure 4A/B). Firstly, disease sites characterized by sanctuary states, i.e., lymphadenitis, meningitis and skin TB, were significantly more likely to associate with any drug resistance: OR=12.69 (95% CI: 1.82-141.60) and AR=0.25 (95% CI: 0.06-0.43), when compared to EPTB in other sites. Secondly, with regards to MDR-TB/Rifampin monoresistance the top predictor was lymphadenitis and skin TB disease sites, which means that meningitis was excluded. This is not surprising since rifampin does not penetrate the blood-brain barrier well and the current doses given are so low that virtually none gets into cerebrospinal spaces. The sensitivity and specificity of using disease site as proxy to identify isolates likely to MDR-TB/rifampin resistant are 1.00 (95% CI: 0.68-1.00) and 0.64 (95% CI: 0.52-0.75), respectively. When information about likely Mtb genotypes is added as shown in Figure 4D, the specificity improves to 0.84 (95% CI: 0.71-0.92). These data suggest that for every four patients (95% CI: 2.11-10.64) with TB lymphadenitis or skin we would expect one or more to have MDR-TB/Rifampin resistance 13 when compared to those with TB in other sites. This means that screening with drug susceptibility tests and changing the treatment regimens would prevent therapy failure and further transmission of drug resistant TB.

Discussion
The study focused on characterizing clinical Mtb isolates in real-world settings and hence has limitations related to such observational studies, including inadequate sample size, imprecise and some missing information (in this case HIV infection). First, we only used spoligotypes to assign clusters, which limits and biases the clustering resolution and might potentially over-estimate clustering rates. The second limitation relates to a small sample size and misclassification of EPTB diseases sites, which has notoriously confounded comparison of EPTB incidences between studies [38]. Previous studies have identified disease site-specific risk factors, including those with certain Mtb genotypes, drug resistance and meningeal TB which we could not reproduce in our study, since only 5 (7%) meningeal TB and 1 (1%) pericardial TB isolates were enrolled [39][40][41][42][43]. Nonetheless, we used validated RVCT methods for ease of temporal and between studies comparisons.
Third, the Mtb isolates were not serialized, and information on drug therapy received, TB drug doses and timing of the isolates collected to TB therapy, was not readily available.
This made inference and distinction of primary transmitted resistance from acquired resistance in our study difficult [44]. Incomplete medical history on the laboratory request form also made it difficult to determine which isolates were from patients immunosuppressed from HIV or concomitant immunosuppressive agents for rheumatological diseases. Nonetheless, with ML modeling, which is best suited for modeling missing data and highly complex data structure, we were able to demonstrate that routinely collected laboratory and clinical data can be used to screen patients and identify risk groups where acquired drug resistance is most likely to occur. Gradient 14 boosting allows identification of weak predictors, nonlinear relationships and thresholds in the data space, [32] which is like the proverbial "finding a needle in a haystack", but in this case one uses giant magnets to find that needle. Sensitivities and specificities >84% are reasonable and actually acceptable, given that the information required for initial screening (i.e., identifying disease site as lymphadenitis, skin TB or meningitis) can be ascertained by history and clinical examination. Moreover, ROC values ~70% with cross validation somehow reassures us that results such as these are likely to be reproducible with similar populations. Unlike most EPTB studies performed at large specialized hospitals [45,46] our study has minimal referral bias, hence the other strength of this study is that the isolates were from primary DOTS/TB facilities and not from patients treated at tertiary specialized facilities.
There are three notable findings from our study with important public health policy and TB control efforts that target reduction of both disease transmission and drug resistance. The findings are certainly applicable in the Tshwane metropolitan and have potential relevance across similar urban populations in South Africa and across the sub-Sahara African metropoli. The key finding is the hierarchical and nonlinear association between key EPTB disease sites (mainly lymphatic, skin and meningeal) and spoligotypes (mainly to impact both disease transmission and drug resistance. The association between Beijing strains and both TB disease transmission and drug resistance has been well described in South Africa and across the world and the results have been mixed [9-11, 15, 47-49]. Our study demonstrates that those relationships are conditional, complex and characterized by several nonlinear interactions (Figures 3 and 4). For example, two-way interactions between EPTB disease site and another variable explained >20% of the variance in clustering and almost 10% of drug resistance. This means that unless those nonlinear relationships are fully examined, the purported factors driving either transmission or drug 15 resistance will be highly biased or wrong. In fact, for both clustering and drug resistance, the impact of Mtb genotypes is of second-order, which means that bacterial genotypes only acted on some EPTB disease sites and not others. The differential impact of EPTB disease site on any drug resistance (shown in Figure 4C) and on MDR-TB/Rifampin monoresistance (shown in Figure 4D), is actually revealing and consistent with standard PK/PD principles underlying drug resistance emergence [25, [50][51][52]. PK variability between individuals means that some patients will have faster drug clearances than others when given the same drug dose. Therefore, inadequate drug exposures at site of infection, which occurs because of PK variability or suboptimal drug doses or poor drug penetration into protected sites such as meningeal or pericardial spaces, leads to selection of drug resistant or drug tolerant isolates. The selected mutants eventually acquire putative mutations in time. In other words, ADR occurs de novo during therapy primarily because of inadequate dosing or with unoptimized therapy regimens. The WHO recommends the same standardized and uniform therapy regimens and doses used for PTB for EPTB, with the caveat of experts' opinions that varying longer therapy durations be given for meningeal and bone/joints disease sites [53]. Indeed, these same guidelines are used in Tshwane and as shown in Figures 4C-D is associated with drug resistance in certain EPTB sites such as lymphatic, skin and meningeal site. In this study, the AR for both any resistance and MDR-TB/monoresistance were substantial; 0.25 and 0.64, respectively. The corollary suggestion from this specific finding is that the majority resistance observed in our study are more likely acquired during therapy rather than being 'pre-existing' or primary. The NNS for targeted screening among EPTB patients based on disease sites for any resistance was 4 and for MDR-TB/monoresistance 2, which is even more efficient and effective than widely recommended population screenings for active TB in congregate settings or among select risk groups, such as diabetes mellitus or HIV [54]. For example, the NNS HIV infected patients to find one active TB case in low TB incidences places is 25 (ranges 11-144) and high TB incidence places is 10 (ranges 5-22), while that for prisoners was 520 (ranges 69-427) and 43 (ranges 21-123), respectively.
Secondly, even though the proportion of EPTB disease sites were similar to previous observations, the overall incidence of proven EPTB of 4.43 per 100,000 populations was lower than expected. There were 8,034 microbiologically confirmed PTB cases in Tshwane in 2015, an estimated incidence rate of 254 (95% CI: 249-260) cases per 100,000 population [1]. Confirmed PTB status was based on positive GeneXpert MTB/RIF assay, cultures, line probe assays and microscopy smears, which probably overestimated confirmed PTB cases by accepting nontuberculous cases based microscopy smears.
Hoogendoorn et al reviewed charts of patients treated and notified for clinical EPTB in the predominantly rural Limpopo province in ten months of 2013 [3]. Of the 336 patients diagnosed, only 57% had good evidence for stated diagnoses. Nonetheless, the overall estimated incidence of clinical EPTB in that study was 27.92 (95% CI: 24.80-31.23) and that for clinical meningeal TB was 2.56 (1.70-3.70) per 100,000 populations per year.
Meningeal TB comprised 9.82% (95% CI: 6.86-13.52) of EPTB in Limpopo and 9.04% (95% CI: 6.94-11.54) in Soweto, per year [3,46]. Our estimates of EPTB incidence in Tshwane is six folds lower than those reported from Limpopo; however, proven meningeal TB comprised 7.14% (2.36-15.89) of cases in Tshwane, suggesting that the meningeal TB proportions were similar between these disparate South African studies. In the US, EPTB as a proportion of total TB cases has been steadily increasing as TB elimination efforts are accelerated and the WHO TB elimination targets getting realized. From 7.6% in 1962 at the peak of the epidemic when TB incidence was 28.6 per 100, 000 population, EPTB increased to 15.7% in 1993 with the HIV resurgence and was 30.9% % in 2017 when the reported TB was 2.8 per 100, 000 population [55]. Contrary to explanations given by Hoogendoorn and others, we actually hypothesize that with widespread use of methods to identify proven EPTB, the incidence will increase consistent with observations in the US, where majority of EPTB reported are proven TB. We argue that several cases currently notoriously low (about 15% in high TB burden areas) and inconsistent when compared against clinically suspected TB cases. Investments in improved diagnostics to confirm proven EPTB or ML algorithms that are trained on large clinical data to predict EPTB, will not only save lives by reducing unnecessary TB treatments, but will also be cost-effective because of reduction TB diseases transmission and ADR. Both interventions will accelerate meeting TB elimination targets. Moreover, several other epidemiological data suggest that certain M.tb genotypes, such as the W-Beijing genotypes, are more transmissible than others [20][21][22] proportion of drug resistance, especially rifampin resistance, attributable to certain sanctuary EPTB sites, including lymph nodes, meninges and skin, was significantly higher, 25% (95% CI: 0.06-0.), when compared with other EPTB sites. This observation suggests that low rifampin exposures, due to poor penetration into those sites or inadequate rifampin doses, significantly contribute to acquired drug resistance, which is also consistent with PK/PD principles of PK variability. Moreover, the significant nonlinear relationship between EPTB sites, Mtb genotypes and drug resistance (particularly MDR-TB/Rifampin monoresistance) observed, is consistent with prior clinical observations. Together, these data suggest that inadequately treated EPTB is contributing to drug resistance and overall poorer outcomes.  Population estimates of extrapulmonary tuberculosis (EPTB) and proportion with drug resistance in Tshwane by age group and sex in 2015: Figure 2A shows that the estimated EPTB incidence stratified by age and gender. As shown the estimates in females was 3.54 (95% CI: 2.68-4.60), while that in males was 5.31 (95% CI: 4.24-6.58) per 100, 000 populations for the year 2015. Figure 2B shows the proportion of total isolates (N=70) by anatomic EPTB and within each category the percent of isolates with resistance to either rifampin or isoniazid or both. As shown, none of the isolates from peritoneal specimen, disseminated (i.e., blood or bone marrow) and other specimen samples were drug resistant. Figure   2C stratifies drug resistance by age, and as shown one out of the 5 isolates from children were drug resistant, and that same isolate was also rifampin resistance.  Figure 3C, while the optimal and sample tree from those models is shown in Figure 3D. Disease site was the most important variable at the apex with 100%, while DOTS/TB Facility was second with 92% relative to disease 32 site. However, between variables interactions explained 21% of the variance for disease site and 19% for DOTS/TB Facility ( Figure 2C) which means that there are important nonlinear interactions accounting clustering variance. Figure 3D shows disease site and DOTS/TB Facility interactions significantly influence clustering, even though each individual variable was not statistically significant in Table 2 based on Fischer's exact test. As shown in, isolates from disseminated diseases,  Predictors of drug resistance in Mycobacterium tuberculosis isolates from extrapulmonary sites: The variable importance scores and proportion of the variance explained by interactions between the variables that were obtained from stochastic gradient modeling for any drug resistance are shown in Figure 4A, while those for MDR-TB/Rifampin monoresistance are shown in Figure 4B.
Multivariate adaptive regression trees (MARS) for binary outcomes with two-way interactions detection were made in the TreeNet software. The optimal representative classification and regression trees (CART) are shown in Figure 4C 34 for any resistance and in Figure 4D for MDR-TB/Rifampin monoresistance. The primary node (disease site) for any drug in Figure 4C is almost identical to that for MDR-TB/Rifampin monoresistance in Figure 4D, the difference being addition of meninges to the former group.  Table 3.pdf