Risk factors associated with cluster size of Mycobacterium tuberculosis (Mtb) of different RFLP lineages in Brazil

Background Tuberculosis (TB) transmission is influenced by patient-related, environmental and bacteriological factors. We determined the risk factors associated with the cluster size of IS6110 RFLP-based genotypes of Mycobacterium tuberculosis (Mtb) isolates from Vitoria, Espirito Santo, Brazil.
Methods Cross-sectional study of new TB cases identified in the metropolitan area of Vitoria, Brazil between 2000 and 2010. Mtb isolates were genotyped by IS6110 RFLP, spoligotyping and RDRio. The isolates were classified according to genotype cluster size by the three genotyping methods and associated patient epidemiologic characteristics. A regression model was used to identify factors associated with cluster size.
Results Among 959 Mtb isolates, 461 (48%) cases had an isolate that belonged to an RFLP cluster, and six clusters with ten or more isolates were identified. Of the isolates spoligotyped, 448 (52%) were classified as LAM and 412 (48%) as non-LAM. Our regression model found that isolates in the 6–9 isolates/RFLP cluster group were more likely than those in the other groups to belong to the LAM family, to have the RDRio genotype and to be smear-positive (adjusted OR = 1.17, 95% CI 1.08–1.26; adjusted OR = 1.25, 95% CI 1.14–1.37; crude OR = 2.68, 95% CI 1.13–6.34, respectively), and that living in a Serra city neighborhood decreased the risk of being in the 6–9 isolates/RFLP cluster group (adjusted OR = 0.29, 95% CI 0.10–0.84). Individuals aged 21 to 30, 31 to 40 and > 50 years were less likely to belong to the 2–5 isolates/RFLP cluster group than to have unique patterns, compared with individuals < 20 years of age (adjusted OR = 0.49, 95% CI 0.28–0.85; OR = 0.43, 95% CI 0.24–0.77; and OR = 0.49, 95% CI 0.26–0.91, respectively). Extrapulmonary disease was less likely to occur in those infected with strains in the 2–5 isolates/cluster group than in those with unique patterns (adjusted OR = 0.45, 95% CI 0.24–0.85).
Conclusions We found that a large proportion of new TB infections in Vitoria is caused by prevalent Mtb genotypes belonging to the LAM family and RDRio genotypes. Such information demonstrates that some genotypes are more likely to cause recent transmission. Targeted interventions, such as screening in specific areas and social risk groups, should be a priority for reducing transmission.


Background
Tuberculosis (TB) continues to be a challenge to control. Although widespread efforts have contributed to a decline in global incidence for the first time in history, TB still accounts for 10.4 million new cases and 1.4 million deaths per year worldwide [1].
Brazil ranks sixteenth among the world's 22 countries with high TB burdens; in 2016, it had 63,189 incident TB cases, an incidence rate of 30.9 per 100,000 per year and a mortality rate of 2.2 per 100,000, according to World Health Organization estimates [1]. The state of Espírito Santo has one of the lowest incidences of tuberculosis (28.6/100,000/year) in Brazil [2]. Vitoria, a large urban setting and the capital of Espírito Santo state, reports over 279 cases of TB each year, which is among the highest incidences in the country (40.2/100,000/year) [3].
Understanding how TB transmission occurs is key to managing TB strategically from a public health perspective. Mycobacterium tuberculosis complex (MTBC) genotyping methods have been widely used in molecular epidemiological studies [4]. These methods help to detect the spread of the disease, understand its dynamics, and develop TB control strategies to minimize TB expansion locally and globally.
Studies have suggested that identical IS6110 RFLP patterns of M. tuberculosis (Mtb) isolates from epidemiologically linked patients reflect TB resulting from recent transmission [5,6]. Subsequent cases in transmission chains result in "clusters" of patients who share Mtb strains of the same genotype [7]. If a large proportion of new TB cases in a given community are due to recent transmissions, this is a reflection of an inadequate TB control program.
Many studies have investigated risk factors for clustering, suggesting that patient-related risk factors are important for TB transmission [5,6,8]. There is substantial evidence, however, that bacterial factors also contribute to variability in cluster size and the extent of transmission of TB in a community [8]. Indeed, molecular epidemiologic studies have suggested that some strains are more successfully transmitted than others [9][10][11].
We genotyped a large collection of Mtb strains that had been collected over an 11-year period in the metropolitan area of Vitoria, Espirito Santo state, Brazil, and evaluated the relationships among strain genotype, clustering, and epidemiologic, clinical, and demographic characteristics. Our hypothesis was that this approach might help to better understand risk factors for recent TB transmission in this particular setting.

Study population
This cross-sectional study examined all TB patients newly diagnosed in the metropolitan area of Vitoria, Brazil between 2000 and 2010. The metropolitan area comprises four municipalities (Vitória, Cariacica, Serra and Vila Velha) with about 1,200,000 inhabitants. The study sample included isolates from all patients with positive culture results. The isolates were classified according to the cluster size of Mtb strains, and their associations with molecular and epidemiologic features were assessed.

IS6110 restriction fragment length polymorphism (RFLP) analysis
Sputum cultures for TB diagnosis are done routinely by the reference Mycobacteriology Laboratory at the Núcleo de Doenças Infecciosas of the Federal University of Espirito Santo (NDI-UFES). For genotype analysis, we analyzed all available stored Mtb isolates that had been consecutively obtained at the reference laboratory.
We used the standard IS6110 RFLP protocol [12] to genotype the isolates. Briefly, the genomic mycobacterial DNA was extracted, digested, and separated by gel electrophoresis. The DNA fragments resolved in agarose gel were transferred to a Hybond N-Plus membrane (GE Healthcare Life Sciences) and were hybridized with a probe made from a PCR product of the 3′ part of the PvuII fragment of IS6110. The IS6110-containing fragments on the membrane were detected by chemiluminescence (ECL direct™ nucleic acid labeling and detection system, GE Healthcare Limited, UK) and exposure to an X-ray film (A Hyperfilm™ ECL, GE Healthcare Limited, UK). The Mt14323 strain was used as a reference strain for comparison of the RFLP patterns.
The IS6110 RFLP band patterns were analyzed with the BioNumerics software, version 6.5 (Applied Maths, Belgium). A dendrogram was constructed to show the degree of similarity among the isolates, using the unweighted pair group method with arithmetic averages (UPGMA) and the Dice index (1.0% tolerance, 1.5% optimization).
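To make the Dice index concrete, the following minimal sketch computes it for two band patterns. It assumes that bands have already been matched to shared integer positions (the hypothetical `pattern_x` and `pattern_y` below); it does not reproduce the position tolerance or optimization settings applied by BioNumerics, nor the UPGMA tree building.

```python
def dice_similarity(bands_a, bands_b):
    """Dice coefficient: 2 * shared bands / (bands in A + bands in B)."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        # Two empty patterns are treated as identical by convention here.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical patterns: an eight-band pattern and the same pattern
# with one additional band.
pattern_x = {1, 2, 3, 4, 5, 6, 7, 8}
pattern_y = {1, 2, 3, 4, 5, 6, 7, 8, 9}

similarity = dice_similarity(pattern_x, pattern_y)
print(f"Dice similarity: {similarity:.3f}")
```

In this illustration the two patterns share eight bands out of 8 + 9, so the coefficient is 16/17, which would place them well above a 70% similarity threshold.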
Two or more isolates with identical RFLP patterns (fingerprints) were defined as belonging to a cluster, while strains with RFLP patterns of at least 70% similarity were considered members of the same "family". As described in other studies, isolates belonging to a cluster were considered to result from recent infections. Isolates whose RFLP patterns were distinctly different from any other pattern identified among the isolates studied were considered unique or non-cluster patterns and were considered to represent reactivation of an old infection. Clusters composed of patterns with fewer than six bands were tested by spoligotyping, as this increases cluster reliability [13][14][15][16]. We named the clusters in our study with the abbreviation ES, for Espírito Santo State.
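The cluster definition above can be sketched in a few lines of Python. This is an illustrative sketch only: the isolate identifiers and band sets are hypothetical, patterns are assumed to be already reduced to comparable sets of band positions, and "identical" is taken to mean exact set equality.

```python
from collections import defaultdict


def split_clusters(isolates, min_bands=6):
    """Group isolates by identical band pattern.

    Returns (clusters, unique, needs_spoligotyping):
    - clusters: lists of isolate IDs sharing one pattern (two or more isolates)
    - unique: isolate IDs whose pattern is shared with no other isolate
    - needs_spoligotyping: clusters whose pattern has fewer than min_bands bands
    """
    groups = defaultdict(list)
    for isolate_id, bands in isolates.items():
        groups[frozenset(bands)].append(isolate_id)

    clusters, unique, needs_spoligotyping = [], [], []
    for pattern, ids in groups.items():
        if len(ids) >= 2:
            clusters.append(sorted(ids))
            if len(pattern) < min_bands:
                needs_spoligotyping.append(sorted(ids))
        else:
            unique.extend(ids)
    return clusters, sorted(unique), needs_spoligotyping


# Hypothetical isolates and band positions, for illustration only.
isolates = {
    "iso1": {1, 2, 3, 4, 5, 6, 7, 8},
    "iso2": {1, 2, 3, 4, 5, 6, 7, 8},
    "iso3": {1, 2, 3},
    "iso4": {1, 2, 3},
    "iso5": {2, 4, 6, 8, 10, 12},
}
clusters, unique, needs_spoligotyping = split_clusters(isolates)
```

Here `iso1`/`iso2` and `iso3`/`iso4` form two clusters, `iso5` is a unique pattern, and the three-band cluster is flagged for confirmation by spoligotyping.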
Spoligotyping
Isolates were also submitted to spoligotyping with a commercial kit (Ocimum Biosolutions Inc., India) according to a standard protocol [17,18], allowing the classification of strains into spoligotype-based families, based on the presence or absence of spacer regions. Results were recorded in a 43-digit binary format and compared with an updated version of the SpolDB4 database [18], SITVITWEB [19], of the Pasteur Institute of Guadeloupe (available at http://www.pasteur-guadeloupe.fr:8081/SITVITDemo/), which provides information on Mtb spoligotypes worldwide. The orphan patterns were entered into SPOTCLUST [20] in order to estimate the probability that a strain belongs to a certain family.

Long sequence polymorphism (LSP)
A multiplex PCR adapted from Gibson et al. [21] was performed to identify isolates of the RD Rio genotype. RD Rio was differentiated from non-RD Rio according to the PCR product band size; the presence of a band of 1175 bp indicated RD Rio, while a band of 530 bp identified non-RD Rio strains.

Epidemiological, clinical and molecular characteristics
We obtained general epidemiologic characteristics, including gender, age, race, schooling (years), and previous history of TB, from the Brazilian national surveillance system (SINAN) and from laboratory records maintained at the NDI-UFES. SINAN is the Brazilian Information System for notifiable diseases, and its data are publicly accessible via the website of the Data Processing Department of the Brazilian Ministry of Health (DATASUS) [22].

Statistical analysis
To identify risk factors for clustering, we performed univariate analysis using the t-test for continuous variables and the chi-square or Fisher's exact test for categorical variables. Factors that were significantly associated with clustering were entered into a stepwise multiple logistic regression to identify independent predictors of clustering.
We analyzed the isolates' distribution according to cluster size and observed that the sample was not normally distributed. Thus, we divided cluster size into four categories (unique patterns and clusters with 2-5, 6-9 or ≥10 isolates/cluster). Descriptive analysis of molecular and epidemiologic data was performed according to the cluster size classification. Based on a theoretical model for the study of TB determinants [23], we performed crude analyses and built one hierarchical polytomous regression model to identify factors associated with cluster size. Although the cluster size categories imply an order, they did not meet the assumptions of an ordered logistic regression. Thus, we chose polytomous regression, which allows these multiple categories to be modeled simultaneously without the order assumption.
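As an illustration of the four-category outcome, the sketch below maps the number of isolates sharing a pattern to the categories named in the text. The function name and category labels are hypothetical conveniences; the example counts are the sizes of the six largest clusters reported in the Results, plus one unique pattern and two invented smaller clusters.

```python
from collections import Counter


def cluster_size_category(n_isolates):
    """Map the number of isolates sharing a pattern to a cluster size category."""
    if n_isolates < 1:
        raise ValueError("a pattern must contain at least one isolate")
    if n_isolates == 1:
        return "unique"
    if n_isolates <= 5:
        return "2-5"
    if n_isolates <= 9:
        return "6-9"
    return ">=10"


# 34, 20, 16, 15, 13 and 10 are the largest cluster sizes reported in the Results;
# 1, 3 and 7 are hypothetical values added to exercise the other categories.
pattern_sizes = [34, 20, 16, 15, 13, 10, 1, 3, 7]
category_counts = Counter(cluster_size_category(n) for n in pattern_sizes)
print(category_counts)
```

In the regression described above, the "unique" category serves as the reference against which the other three are compared.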
In the model we included all isolates analyzed. Unique pattern was defined as the reference group and was compared with the three cluster size categories.
The hierarchical levels for the model were defined as follows: level 1: the molecular variables (spoligotype and RD Rio genotype); level 2: the variables of level 1 and demographic variables (municipality of residence); level 3: the variables of level 2 and socio-demographic variables (age, gender, skin color and schooling); and level 4: the variables of level 3 and clinical variables (X-ray suspicious for TB, result of initial sputum smear, and TB clinical form). Therefore, the total effect of each variable is adjusted for the variables at the same level and the levels above. Descriptive data are shown as absolute and relative frequencies or as mean values and standard deviations. Results of the association analysis are presented as odds ratios (OR) with 95% confidence intervals (95% CI). All analyses were conducted with the Stata® statistical package, version 13.0 (StataCorp LP, College Station, TX, USA).
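For readers less familiar with how an odds ratio and its 95% CI are read, the following minimal sketch computes a crude OR with a Wald interval from a 2×2 table. It is not the hierarchical polytomous model fitted in Stata for this study; the counts are hypothetical and the function name is an assumption made for illustration.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald 95% CI from a 2x2 table.

    a: exposed, in the cluster group      b: exposed, reference group
    c: unexposed, in the cluster group    d: unexposed, reference group
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
odds_ratio, lower, upper = odds_ratio_wald_ci(a=20, b=10, c=30, d=40)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 indicates an association at the conventional 5% level, whereas an interval that includes 1 does not.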

Results
Between January 2000 and December 2010, 5470 TB patients were diagnosed in the metropolitan area of Vitoria. Among these, 1320 (24%) had culture performed, and we obtained good quality RFLP patterns from 959 (72.6%) of them.
All clusters were grouped into 30 RFLP families, six of which comprised 24.1% of the clustered isolates, while 108 isolates (11.2%) belonged to the six largest clusters (≥ 10 isolates).
The cluster with the highest number of isolates was ES14, containing 34 isolates, followed by ES1b, ES8, ES14o, ES19h and ES25, which included 20, 16, 15, 13 and 10 isolates, respectively (Fig. 1). The ES14 genotype has an eight-band pattern and is a member of the largest family (n = 86), whose members share this pattern with one to three additional bands. In addition, the ES14 cluster was present throughout the 11-year period. In 2003, 68% of all TB cases belonging to the largest clusters were caused by three clonal groups: ES14, ES19h and ES25. In 2007, however, the clusters ES14o, ES1b and ES8 accounted for 88% of all TB cases belonging to the largest clusters.
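As a quick arithmetic check on the figures above, the short sketch below sums the reported sizes of the six largest clusters and expresses the total as a share of the 959 genotyped isolates. Only numbers stated in the text are used; the variable names are illustrative.

```python
# Isolate counts of the six largest clusters, as reported in the text.
largest_clusters = {
    "ES14": 34,
    "ES1b": 20,
    "ES8": 16,
    "ES14o": 15,
    "ES19h": 13,
    "ES25": 10,
}
genotyped_isolates = 959  # isolates with good quality RFLP patterns

total_in_largest = sum(largest_clusters.values())
share_percent = 100 * total_in_largest / genotyped_isolates
print(total_in_largest, round(share_percent, 1))
```

The total is 108 isolates, and the share comes out at a little over 11%, in line with the proportion reported for the six largest clusters.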
We found that all isolates of the ES14 cluster were of the RD Rio genotype while the other large clusters ES1b (20 isolates) and ES8 (16 isolates) were exclusively non-RD Rio (WT).
The hierarchical polytomous regression model (Table 3) showed that, at the first level, patients in the 6-9 and ≥10 isolates/cluster groups were more likely to belong to the LAM lineage (adjusted OR = 1.17, 95% CI 1.08-1.26; adjusted OR = 1.25, 95% CI 1.14-1.37, respectively), using unique patterns as the reference.
On the other hand, subjects in the 2-5 isolates/cluster group were less likely to have the RD Rio genotype (adjusted OR = 0.91, 95% CI 0.80-1.03), although this confidence interval includes 1. At the second level, living in a specific neighborhood (Serra city) in the metropolitan area of Vitória decreased the risk of being in the 6-9 isolates/cluster group (adjusted OR = 0.29, 95% CI 0.10-0.84), using unique patterns as the reference group. In addition, at the third level, individuals aged 21 to 30 years, 31 to 40 years and > 50 years were less likely to belong to the 2-5 isolates/cluster group than to have unique patterns, compared with individuals < 20 years of age (adjusted OR = 0.49, 95% CI 0.28-0.85; OR = 0.43, 95% CI 0.24-0.77; and OR = 0.49, 95% CI 0.26-0.91, respectively).
Suspicion of TB on chest X-ray was less likely to be observed in those infected with strains in the 2-5 isolates/cluster group than in those with unique patterns (adjusted OR = 0.35, 95% CI 0.15-0.79). Interestingly, extrapulmonary disease was less likely to occur in those infected with strains in the 2-5 isolates/cluster group than in those with unique patterns (adjusted OR = 0.45, 95% CI 0.24-0.85). Furthermore, TB patients whose isolates were included in the 6-9 isolates/cluster group were more likely to be smear-positive (crude OR = 2.68, 95% CI 1.13-6.34), using unique patterns as the reference. The difference was not statistically significant, however, after regression analysis.

Discussion
Over the past two decades, many studies on the transmission of TB have been complemented by genotyping techniques. The IS6110 RFLP test has been used to distinguish patients with TB due to recent transmission from those with reactivation disease [24]. In the present study, we evaluated the transmission dynamics of TB in Vitoria, Brazil, during an 11-year period, comparing demographic, clinical and epidemiologic characteristics with Mtb genotypes and genotype clustering. We observed that a large proportion of recently transmitted TB was due to a limited set of Mtb genotypes and that certain cluster sizes were associated with patient demographic, clinical, or epidemiological characteristics.
A limitation of our study is that our patient data were derived from the SINAN secondary database, which contains limited data. Data such as smoking, HIV status, drug abuse and drug susceptibility testing were not regularly reported by SINAN at the time of the study. On the other hand, the same database has been the basis for disease surveillance studies, as described earlier [11,22,25,26]. A strength of the study is the large sample size, which offers higher statistical power than most other studies. In addition, the long study period increases the chance of finding epidemiological links through genotyping of the Mtb strains, whereas a small sample size and a poorly defined area can lead to underestimation of clustering proportions [27]. We provide evidence that six Mtb cluster strains consistently contributed to the high burden of recent-transmission TB from 2000 to 2010 in the metropolitan area of Vitória-ES. They accounted for 12% of all culture-confirmed TB cases in this area during this period. This observation was probably possible because of the high case coverage and the length of the study period, which increase the chance of detecting genotype clusters, as reported by van Soolingen et al. [28]. This finding is consistent with a TB control program that fails to stop disease transmission in this area, and a better approach should be implemented. The failure may be related to a variety of factors, including delay in diagnosis and poor contact investigation strategies. Maciel and colleagues recently suggested the possibility of implementing a new case-finding strategy based on screening populations in neighborhoods with high-density recent-transmission TB and on social network analyses [29].
In the present study, we found that isolates of the ES14 family accounted for the largest proportion of recently transmitted TB cases, which suggests that these strains are either more transmissible or more likely to cause disease after infection. Strains of this family belonged to the LAM family (mostly LAM9) and the RD Rio genotype. Isolates with the particular eight-band pattern that is the basis of this family have also been reported as predominant in studies conducted in Rio de Janeiro, São Paulo and Rio Grande do Sul in Brazil, and are frequently encountered in a database of isolates originating from other regions, such as the Caribbean, Europe, Africa and other countries in South America [21,30,31,32]. These findings suggest that the incidence of TB in this region may be strongly influenced by a relatively small subset of actively circulating strains. It is known that in areas with a higher incidence of TB, RFLP patterns are often less variable than in low-incidence areas [33]. Recently, Ribeiro et al. (2015) showed that new TB cases do not just cluster in space, but that certain M. tuberculosis lineages tend to cluster even after controlling for known individual and socioeconomic factors that can influence transmission [29].
In our study, the clinical manifestation of TB (pulmonary or extrapulmonary) was associated with a particular cluster size (2-5 isolates/cluster). Although Gomes and colleagues showed no association between the clinical manifestation of TB and clustering rates, interestingly, when stratified by cluster size, a larger proportion of extrapulmonary TB (EPTB) cases were in a particular cluster size [25]. Our earlier studies also demonstrated that RD Rio strains are less likely to cause extrapulmonary disease than non-RD Rio strains [11], and the high prevalence of RD Rio strains in the present study could be associated with the levels of clustering of EPTB cases. However, many other factors influence the clinical manifestations of TB, including the duration of illness before diagnosis and underlying host factors.
Our findings are consistent with several studies that have demonstrated the predominance of isolates of the LAM family and of the RD Rio lineage in TB cases in Brazil [9,11,34,35,36]. Lazzarini and colleagues showed that the LAM1 and LAM2 sublineages belonged exclusively to the RD Rio genotype, while the sublineages LAM4, LAM5, LAM6 and LAM9 included both RD Rio and non-RD Rio genotypes, and LAM3 isolates were all non-RD Rio [9], although some exceptions to this rule were presented recently [32]. Indeed, previous studies showed that the RD Rio genotype is significantly more often associated with cluster groups (an indication of recent transmission) than non-RD Rio strains, in both Brazilian and non-Brazilian populations [36]. These data corroborate the findings of our study, which showed that isolates of the RD Rio genotype belong to the 6-9 isolates/cluster group. Although the proportion of cases with 6-9 isolates/cluster patterns among RD Rio strains was significantly greater than that among non-RD Rio strains, it is not clear whether this difference can be attributed to enhanced virulence and transmissibility of the RD Rio strains. Previous studies suggested that these strains were recently introduced into some regions of Brazil and evolved after their introduction, or that the RD Rio strains are more biologically "fit" [11,36]. Glynn and colleagues suggested that these cluster strains are particularly transmissible or particularly likely to cause disease after infection [36]. Other possible explanations for their predominance are that they have been present in a geographic setting longer than others and have had more time to become widespread, or that we are seeing a founder effect in some populations with subsequent spread following human migration patterns [33].

Conclusions
Our findings suggest that strains belonging to the LAM family and the RD Rio genotype are more likely to be in the largest clusters (6-9 and ≥10 isolates/cluster). We confirmed that the ES14 family is the most prevalent genotype of Mtb in Vitória-ES, Brazil, which suggests that a large proportion of TB cases in one city can be caused by a small set of lineages circulating in the city. This provides an opportunity to characterize factors other than host factors that affect transmission. These data suggest that, once an M. tuberculosis lineage enters a community, that particular strain and its related family strains are more likely to propagate than outside strains.

Ethics approval and consent to participate
Sputum culturing of individuals suspected to have TB is done routinely at clinics in the Vitória metropolitan area and at the TB Reference Laboratory located at the Infectious Diseases Laboratory of the Federal University of Espirito Santo (UFES). The Mtb isolates are routinely stored by this laboratory for use in outbreak investigations and epidemiologic surveillance. This study was a retrospective analysis of data collected routinely during activities of the state TB control program. Patients were not contacted to request additional information. The study was reviewed and approved by the institutional review board of the Universidade Federal do Espírito Santo (UFES; under number 121/06), which granted permission for use of the Mtb isolates and clinical data for the purposes of the study and waived the need for written informed consent from participants, as the study involved no more than minimal risk and was done with existing microbiology specimens. Patients had an identification number for clinical purposes, while cultures had a different accession number for laboratory purposes. To protect patient confidentiality, only one investigator (ELM) had access to both de-identification codes and was the person who linked the clinical and culture databases for this study. After linkage, a new code number was created for each record for use in the study analysis.

Competing interests
The author(s) declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.