Fig. 3 | BMC Infectious Diseases

Fig. 3

From: Machine learning reveals that Mycobacterium tuberculosis genotypes and anatomic disease site impacts drug resistance and disease transmission among patients with proven extra-pulmonary tuberculosis

Clustering and chains of Mycobacterium tuberculosis transmission. The number of clusters and the size of each cluster are shown in Fig. 3a, while the proportions of patients from each of the major genotype lineages (2, 3 and 4) in a chain of transmission are depicted in Fig. 3b (no isolates from lineage 1 were enrolled in the study). Variable importance scores and the proportion of the variance explained by interactions between variables, obtained from stochastic gradient modeling of between 200 and 2000 classification and regression trees (CART), are shown in Fig. 3c, while a sample optimal tree from those models is shown in Fig. 3d. Disease site was the most important variable, at the apex with a score of 100%, while DOTS/TB Facility was second with 92% relative to disease site. However, between-variable interactions explained 21% of the variance for disease site and 19% for DOTS/TB Facility (Fig. 3c), which means that important nonlinear interactions account for part of the clustering variance. Figure 3d shows that interactions between disease site and DOTS/TB Facility significantly influence clustering, even though neither variable on its own was statistically significant in Table 2 based on Fisher’s exact test. As shown, isolates from disseminated disease, lymph nodes, meninges, EPTB/PTB and skin were significantly less likely to be clustered than isolates from the remaining disease sites: 32/43 (74%) versus 25/27 (93%). The area under the receiver operating characteristic (ROC) curve for this single node is 0.744 (95% confidence interval [CI] 0.590–0.991). The model is reproducible, as demonstrated by the test ROC of 0.688 and an error rate of < 3% on the training model.
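
As a reading aid, the two clustering proportions quoted in the caption can be recomputed, and the corresponding 2x2 table passed through a two-sided Fisher's exact test, with a short Python sketch. This is not the authors' analysis (the caption's results come from tree-based models); the function and variable names are illustrative assumptions, and only the counts 32/43 and 25/27 are taken from the caption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (small tolerance guards against float round-off).
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


# Counts from the caption: clustered isolates / all isolates in each group.
clustered_a, total_a = 32, 43  # disseminated, lymph node, meninges, EPTB/PTB, skin
clustered_b, total_b = 25, 27  # remaining disease sites

prop_a = clustered_a / total_a
prop_b = clustered_b / total_b
p_value = fisher_exact_two_sided(
    clustered_a, total_a - clustered_a,
    clustered_b, total_b - clustered_b,
)

print(f"Group A clustered: {prop_a:.0%}")
print(f"Group B clustered: {prop_b:.0%}")
print(f"Two-sided Fisher exact p-value: {p_value:.4f}")
```

The test here is applied to a single split of disease sites, so its p-value need not match any figure reported in the article.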
