Validation of genotype cluster investigations for Mycobacterium tuberculosis: application results for 44 clusters from four heterogeneous United States jurisdictions

Background Tracking the dissemination of specific Mycobacterium tuberculosis (Mtb) strains using genotyped Mtb isolates from tuberculosis patients is a routine public health practice in the United States. The present study proposes a standardized cluster investigation method to identify epidemiologic-linked patients in Mtb genotype clusters. The study also attempts to determine the proportion of epidemiologic-linked patients that the proposed method identifies beyond those found by conventional contact investigation. Methods The study population included Mtb culture-positive patients from Georgia, Maryland, Massachusetts and Houston, Texas, whose isolates were genotyped by CDC's National TB Genotyping Service (NTGS) from January 2006 to October 2010. Mtb cluster investigations (CLIs) were conducted for patients whose isolates matched exactly by spoligotyping and 12-locus MIRU-VNTR. CLIs were carried out in four sequential steps: (1) Public Health Worker (PHW) Interview, (2) Contact Investigation (CI) Evaluation, (3) Public Health Records Review, and (4) CLI TB Patient Interviews. Patients whose links were identified through the study's CLI interviews (Step 4) were compared with patients whose links were identified earlier in the CLI (Steps 1–3) using logistic regression. Results Forty-four clusters (401 patients in total) were randomly selected from the four study sites. Epidemiologic links were identified for 189/401 (47 %) study patients in a total of 201 linked patient-pairs. The numbers of linked patients identified in each CLI step were: Step 1 - 105/401 (26.2 %), Step 2 - 15/388 (3.9 %), Step 3 - 41/281 (14.6 %), and Step 4 - 28/119 (23.5 %). Among the 189 linked patients, 28 (14.8 %) had not been identified in the previous CI. No epidemiologic links were identified in 13/44 (30 %) clusters.
Conclusions We validated a standardized and practical method to systematically identify epidemiologic links among patients in Mtb genotype clusters, which can be integrated into the TB control and prevention programs in public health settings. The CLI interview identified additional epidemiologic links that were not identified in previous CI. One-third of the clusters showed no epidemiologic links despite being extensively investigated, suggesting that some improvement in the interviewing methods is still needed. Electronic supplementary material The online version of this article (doi:10.1186/s12879-016-1937-9) contains supplementary material, which is available to authorized users.


Keywords: Tuberculosis, Epidemiology, Genotype, Cluster investigation, MIRU-VNTR, Spoligotype, Contact investigation, Surveillance

Background
Tuberculosis (TB) contact investigation (CI) is a disease control strategy that plays a crucial role in understanding the epidemiologic factors most relevant to TB transmission between individuals [1]. In addition to CI, tracking the dissemination of specific Mycobacterium tuberculosis (Mtb) strains in populations is an important tool for understanding TB transmission dynamics [2]. For over 20 years, investigators have been discovering and using genetic elements of the Mtb genome as molecular genotype markers [3]. Mtb genotyping methods include spacer oligonucleotide typing (spoligotyping) of the direct repeat locus [4,5] and mycobacterial interspersed repetitive unit-variable number of tandem repeat (MIRU-VNTR) typing [6]. The Centers for Disease Control and Prevention (CDC) has used these genotyping techniques routinely since 2004 [7].
United States (US) public health departments evaluate persons who have had contact with infectious TB patients in order to identify and treat those in whom TB transmission has resulted in active TB disease or latent TB infection (LTBI). Because it is difficult to identify and assess all individuals potentially infected by a given TB patient, CIs provide an incomplete picture of TB transmission. The application of Mtb genotyping has enhanced the investigation of TB transmission [8]. When genotyping is conducted routinely on all or nearly all Mtb isolates from a given jurisdiction, persons whose isolates share the same genotype are termed "clustered", and their disease is suspected to have resulted from recent transmission. Individuals whose Mtb isolates have unique genotypes are termed "non-clustered". TB in these persons is considered to result from reactivation of previously acquired LTBI, recent transmission from someone who was not genotyped, transmission from a person outside the 3-year surveillance time window or geographic area, or relapse of a prior episode of TB disease [9].
Genotypic data can facilitate the detection of previously unsuspected transmission [10-12]. Furthermore, when TB patients are identified as epidemiologic-linked through CI, TB transmission can be confirmed or refuted by matching (concordant) or non-matching (discordant) genotypes, respectively [13]. Because of limits on the discriminatory power of the genotyping techniques used [8,14,15], as well as the endemic prevalence of particular genotypes in a jurisdiction [16], it cannot be assumed that TB patients with matching genotypes belong to the same chain of transmission. However, transmission between TB patients with matching genotypes can be verified by detecting epidemiologic links, such as timing, interactions, or relationships among the persons [17]. Epidemiologic investigations of TB patients with genotypically matched Mtb isolates can uncover transmission venues and epidemiologic links between persons not identified by routine CI [11]. Public health investigators refer to these additional efforts as cluster investigations (CLIs). The current study implements a standardized process for conducting CLIs systematically and validates the application of this process to a set of randomly selected Mtb clusters identified in public health settings.

Population
Mtb culture-positive patients from four study sites reported to the CDC from January 2006 to October 2010, whose Mtb isolates were genotyped by CDC's National TB Genotyping Service (NTGS), were evaluated for Mtb clustering. Study sites, Georgia (GA), Maryland (MD), Massachusetts (MA) and Houston (HOU), Texas, were members of the Tuberculosis Epidemiologic Studies Consortium, a consortium of US sites funded by CDC to conduct TB epidemiologic research [18]. All sites except Texas evaluated TB patients for Mtb clustering in counties throughout the state. In Texas, only TB patients reported in the City of Houston jurisdiction (HOU) were evaluated. The study was approved by Institutional Review Boards at CDC and each study site.

TB cluster selection process
Mtb isolates from all patients were characterized by NTGS using spoligotyping and 12-locus MIRU-VNTR (MIRU12). Each unique combination of spoligotype and MIRU12 results is assigned a "PCRType" [19]. Clusters were defined as two or more TB patients with the same PCRType in a given public health jurisdiction (county or HOU) during the study period. A cluster was eligible for random selection if it consisted of at least three TB patients residing in the same public health jurisdiction whose TB was reported between January 1, 2006 and the time of cluster evaluation. Eligible clusters at each of the four sites were assigned to three priority groups (low, medium, and high) based on their calculated log-likelihood ratio (LLR: <1.00, 1.00-5.79, and ≥5.80, respectively), a measure associated with public health cluster priorities [20,21]. The CDC statistician and the expert panel determined these LLR cut-points, as the values associated with high-, medium-, and low-priority clusters in our surveillance data, after reviewing the geospatial scores [21], initial expert panel rankings, and cluster investigation findings. Clusters were then randomly selected from each group. In total, 44 clusters (11 per site) were selected for further investigation. Details of sample size considerations and the sample selection strategy are provided in Additional file 1.
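The cluster definition, eligibility rule, and LLR-based priority grouping can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's selection code: the LLR for each cluster is taken as a precomputed input (its calculation follows [20,21] and is not reproduced), the reporting-date window is assumed to have been applied beforehand, selection is run separately for each site, and the per-group allocation `n_per_group` and all function names are hypothetical (the actual allocation is described in Additional file 1).

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

ClusterKey = Tuple[str, str]  # (jurisdiction, PCRType)


def build_clusters(patients: List[dict]) -> Dict[ClusterKey, List[int]]:
    """Group patients by (jurisdiction, PCRType); two or more patients form a cluster."""
    groups: Dict[ClusterKey, List[int]] = defaultdict(list)
    for p in patients:
        groups[(p["jurisdiction"], p["pcrtype"])].append(p["id"])
    return {key: ids for key, ids in groups.items() if len(ids) >= 2}


def eligible_clusters(clusters: Dict[ClusterKey, List[int]], min_size: int = 3):
    """Keep clusters with at least `min_size` patients (the study used three)."""
    return {key: ids for key, ids in clusters.items() if len(ids) >= min_size}


def priority_group(llr: float) -> str:
    """Map a cluster's log-likelihood ratio to the study's priority groups."""
    if llr < 1.00:
        return "low"
    if llr < 5.80:  # reported as 1.00-5.79 in the text
        return "medium"
    return "high"


def select_clusters(eligible, llr_by_cluster, n_per_group, seed=0):
    """Randomly sample clusters within each priority group (one site at a time).

    `n_per_group` is a hypothetical allocation, e.g. {"high": 4, "medium": 4, "low": 3}.
    """
    rng = random.Random(seed)
    by_group: Dict[str, List[ClusterKey]] = defaultdict(list)
    for key in sorted(eligible):  # sort so a fixed seed gives reproducible draws
        by_group[priority_group(llr_by_cluster[key])].append(key)
    return {
        group: rng.sample(keys, min(n_per_group.get(group, 0), len(keys)))
        for group, keys in by_group.items()
    }
```

Seeding a dedicated `random.Random` instance keeps the draw reproducible, which is useful when a selection has to be documented and audited later.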

Epidemiologic links
An epidemiologic link was defined as a relationship between two TB cases within a cluster who were determined to have likely shared air space while at least one of them had active TB disease. Epidemiologic links were considered definite if the two cases named each other as a contact or were identified as having been in the same place at the same time; probable if the cases were in the same place within the same timeframe (same week); and possible if the cases were in the same place possibly at the same time (same month or season). A homogeneous attribute was defined as a single epidemiologic characteristic shared by all patients in a given cluster.
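The three-level link strength definition can be expressed as a small decision function. This is a sketch under assumptions: the `time_overlap` encoding ("exact", "week", "month_or_season") and the function name are hypothetical, and the requirement that at least one case had active disease while sharing air space is assumed to be checked before this function is called.

```python
from enum import Enum
from typing import Optional


class LinkStrength(Enum):
    DEFINITE = "definite"
    PROBABLE = "probable"
    POSSIBLE = "possible"


def classify_link(named_each_other: bool, same_place: bool,
                  time_overlap: Optional[str]) -> Optional[LinkStrength]:
    """Grade a candidate link between two cases in the same cluster.

    time_overlap is a hypothetical encoding of how closely the two cases'
    presence at the shared place coincided: "exact", "week",
    "month_or_season", or None when unknown.
    """
    if named_each_other:
        return LinkStrength.DEFINITE  # mutual naming as contacts
    if not same_place:
        return None  # no shared place, so no link under this definition
    if time_overlap == "exact":
        return LinkStrength.DEFINITE
    if time_overlap == "week":
        return LinkStrength.PROBABLE
    if time_overlap == "month_or_season":
        return LinkStrength.POSSIBLE
    return None
```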

Cluster investigation
Beginning in late 2009, TB surveillance data for all subjects in the selected clusters were obtained, through collaboration with local public health staff, from the Report of Verified Case of Tuberculosis (RVCT) [22] data routinely collected by CDC's National Tuberculosis Surveillance System.
In coordination with local TB programs, CLIs were conducted for each selected cluster to determine whether TB patients in the cluster had epidemiologic links. Patients added to selected clusters after cluster selection were also investigated. A study protocol was developed whereby CLIs were carried out in a stepwise fashion (Fig. 1 and Additional file 2-CLI Instruments):

Fig. 1 Steps for cluster investigation

Step 1: Public health worker interview
Public health workers (public health supervisors, case managers, disease intervention specialists or contact investigators) for each clustered TB patient were contacted by study staff and asked whether they were aware of any epidemiologic links between the patients. If epidemiologic links were identified between two or more patients in the cluster, that information was documented. The public health workers could use any available documents as memory aids during the interview.
Step 2: Contact investigation evaluation
In coordination with local TB program staff, contact investigation records of clustered patients were collected and reviewed to determine whether epidemiologic links to other TB patients had been identified during the routine CI. For each patient evaluated, the number of contacts evaluated and the number of contacts with newly identified LTBI, previously diagnosed LTBI, and active TB disease were documented. Public health worker interviews and contact investigation evaluations were carried out for every study patient except when no public health worker could be contacted or no contact investigation had been done. After each epidemiologic link was established, investigators routinely continued to explore additional epidemiologic links between that patient and other patients in the cluster.

Step 3: Review of public health records
If no epidemiologic links were identified in Steps 1 and 2, the patient's public health records, which document any intake or follow-up interviews conducted by the health department, were reviewed to determine whether there were documented epidemiologic links to other TB patients and whether location-based relationships (e.g., residential, social, or medical settings) existed between patients in the same cluster.
Step 4: CLI TB patient interviews
If no epidemiologic links were identified between patients in a cluster in CLI Steps 1-3, patients were contacted and, after verbal consent was obtained, interviewed with a pilot-tested interview instrument (Additional file 2), beginning with the most recently diagnosed patient. The instrument was designed to facilitate identification of epidemiologic links to other TB patients. For every epidemiologic link identified, the estimated dates of symptom onset, the relationship between the patients, the most frequent patient-pair setting where transmission may have occurred, and the CLI step at which the link was identified were documented. Epidemiologic links were investigated only if the isolates from both patients had been genotyped. CLI study instruments (Additional file 2) contained items designed to collect details of the study patients' frequently visited locations, which could be evaluated as possible transmission venues.
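The stepwise escalation can be sketched as a loop that records, for each linked pair, the first step that found it. This reflects one reading of the protocol, offered as an assumption: Steps 1 and 2 are applied to every patient, while Steps 3 and 4 are applied only to patients still without a link. That reading is consistent with the Step 3 denominator in the abstract (401 − 105 − 15 = 281). The step callables, their signature, and `run_cli` are hypothetical stand-ins for the interviews and record reviews.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]
# Hypothetical step signature: (patient_id, cluster_ids) -> ids of patients linked to patient_id
Step = Callable[[str, List[str]], Set[str]]


def run_cli(cluster: List[str], steps: List[Tuple[int, Step]]) -> Dict[Pair, int]:
    """Apply CLI steps in order and record the first step that found each linked pair.

    Steps 1 and 2 are applied to every patient; Steps 3 and 4 only to patients
    with no link found so far. `cluster` is assumed to be ordered from most to
    least recently diagnosed, matching the Step 4 interview order.
    """
    first_step: Dict[Pair, int] = {}
    linked: Set[str] = set()
    members = set(cluster)
    for step_no, step in steps:
        if step_no <= 2:
            candidates = list(cluster)
        else:
            candidates = [p for p in cluster if p not in linked]
        for patient in candidates:
            for other in step(patient, cluster):
                if other == patient or other not in members:
                    continue  # ignore self-links and patients outside the cluster
                a, b = sorted((patient, other))
                first_step.setdefault((a, b), step_no)  # keep the earliest step
                linked.update((a, b))
    return first_step
```

Recording the first step per pair is what makes the Step 4 versus Steps 1-3 comparison in the analysis possible.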

Data management and analysis
Study data were entered into a Microsoft Access 2003 (Redmond, WA) database by site staff and merged for analysis by the data coordinating center at the Texas site. National summary data on study PCRTypes (number of patients and number of states reporting the given genotype) were provided by CDC. To summarize the characteristics of study clusters, patients in each cluster were compared with all other study patients on selected demographic and behavioral characteristics, and two-sided P values were calculated. Clusters with at least one epidemiologic link were compared with those without an identified epidemiologic link on demographic, behavioral, clinical and genotypic variables.
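The text does not name the test behind these two-sided P values. For 2x2 comparisons with small counts, Fisher's exact test is a common choice, so the sketch below assumes it; it is not necessarily the test the authors used. It uses only the standard library (`math.comb`), and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups compared (e.g. patients in a given cluster vs. all
    other study patients); columns are characteristic present / absent. The
    P value sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x: int) -> float:
        # probability that the top-left cell equals x given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # the small tolerance keeps tables tied with the observed one in the sum
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For example, the Results comparison of all-black clusters (2 of 13 clusters without links vs. 15 of 31 with links) would be `fisher_exact_two_sided(2, 11, 15, 16)`; whether this reproduces the reported p = 0.040 depends on the test actually used.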
Comparison between patients whose linkage was identified through the study's CLI interviews (Step 4) and patients whose linkage was identified earlier in CLI (Steps 1-3) was conducted using univariate and multiple logistic regression. Statistical analyses were conducted using Stata/SE 13.1 (StataCorp LP, College Station, TX).
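For a single binary factor, the univariate logistic regression has a closed form: the odds ratio equals the 2x2 cross-product ratio, and the Wald standard error of its logarithm is the square root of the summed reciprocal cell counts. The sketch below shows only that univariate case; the multiple regression requires an iterative fit (the study used Stata/SE 13.1). The cell layout and function name are illustrative assumptions.

```python
from math import exp, log, sqrt
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a = factor present, link found at Step 4   b = factor present, found at Steps 1-3
    c = factor absent,  link found at Step 4   d = factor absent,  found at Steps 1-3

    For a single binary covariate, exp(beta) from a univariate logistic
    regression equals this cross-product ratio, and the Wald standard error
    of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d).
    """
    if 0 in (a, b, c, d):
        # e.g. a factor never seen at Step 4: the maximum-likelihood OR does not exist
        raise ValueError("zero cell: use an exact or penalized method instead")
    log_or = log(a * d / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return exp(log_or), exp(log_or - z * se), exp(log_or + z * se)
```

A zero cell, as with household links (all identified before Step 4), makes the ordinary maximum-likelihood odds ratio undefined, which is why the function refuses such tables rather than returning 0 or infinity.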

Results
From 2006 to 2010, 62,642 TB cases were reported in the US, a TB case rate of 4.1 cases/100,000 [23]. During the same period, study site jurisdictions reported the following numbers of TB cases: MD 1239, MA 1208, GA 2291 and HOU 1315, corresponding to average TB rates of 4.4, 3.7, 4.8 and 12.5 cases/100,000, respectively [23]. The proportion of Mtb culture-positive patients genotyped during the study period was 82.1 % for the US and 98.5, 89.9, 84.4 and 85.8 % for MD, MA, GA and HOU, respectively [19]. From a pool of 132 eligible clusters (MD 25, MA 23, GA 35, HOU 49), 44 clusters (11 from each site) corresponding to 38 distinct PCRTypes were randomly selected for investigation. Three PCRTypes (PCR00002, PCR00016 and PCR00017) were investigated in more than one study jurisdiction (Table 1). Most PCRTypes were of Euro-American (L4) or East Asian (L2) lineage (n = 29 and n = 7, respectively); one PCRType each was of East African-Indian (L3) and Indo-Oceanic (L1) lineage. Nationally, the study PCRTypes were reported from between one and 46 states. Three PCRTypes were reported from no US state other than the one containing the study site during the study period: PCR06732 (GA), PCR04837 (TX) and PCR04846 (TX).
A total of 401 study patients in the 44 selected clusters were evaluated by the CLI method. Median cluster size was six patients (range 3-33); HOU clusters tended to be larger than those from other sites (median 10 vs. 6, p = 0.024). Nineteen clusters (43 %) had only US-born patients and eight clusters (18 %) contained only foreign-born patients (Table 2). For some clusters, a single epidemiologic attribute described all patients in the cluster (Table 2, "Homogeneous attribute" column).
Of the 401 study patients, 189 (47 %) had epidemiologic links, in a total of 201 linked patient-pairs (Fig. 2); of these pairs, 132 (66 %) were definite, 27 (13 %) probable and 42 (21 %) possible epidemiologic links. Screening by a PHW (Step 1) identified 105/401 (26.2 %) linked patients.  while seven also had discordant spoligotypes. These 13 genotypically discordant but epidemiologic-linked patient-pairs were excluded from further consideration because their high level of discordance suggested that they were not part of the same transmission chain. The 188 linked patient-pairs with concordant PCRTypes corresponded to only 179 of the 401 study patients (45 %) having epidemiologic links, because 75 patients had more than one link identified.
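The gap between the number of concordant pairs (188) and the number of linked patients (179) arises because a patient can appear in several pairs. The sketch below shows that bookkeeping on toy data; the tuple layout and function name are hypothetical, pairs are assumed to be listed once each, and the study's own counts cannot be reproduced without its pair list.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_pairs(pairs: Iterable[Tuple[str, str, bool]]) -> Dict[str, int]:
    """Summarize linked patient-pairs.

    Each item is (patient_a, patient_b, pcrtype_concordant). Discordant pairs
    are dropped, mirroring the exclusion described in the text; pairs are
    assumed to be listed once each.
    """
    concordant = [(a, b) for a, b, ok in pairs if ok]
    links_per_patient: Counter = Counter()
    for a, b in concordant:
        links_per_patient[a] += 1
        links_per_patient[b] += 1
    return {
        "concordant_pairs": len(concordant),
        "linked_patients": len(links_per_patient),
        "patients_with_multiple_links": sum(1 for n in links_per_patient.values() if n > 1),
    }
```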
Specific transmission venues were identified for some clusters. Among the 19 clusters with at least three epidemiologic-linked patient-pairs (Clusters 01, 02, 08, 10, 11, 12, 13, 14, 16, 20, 23, 24, 28, 29, 30, 31, 36, 39 and 42), 11 (57.9 %) had at least 50 % of their epidemiologic links associated with a specific venue: four with homeless shelters (Clusters 01, 12, 23 and 36), three with drug houses (Clusters 16, 30 and 42), two with churches (Clusters 14 and 31), one with a bar (Cluster 39) and one with a social club (Cluster 24). Over 90 % of the epidemiologic links identified for Clusters 01, 23 and 36 were associated with homeless shelters. All epidemiologic links identified for Cluster 42 were associated with a drug house, and seven of the eight (88 %) epidemiologic links identified for Cluster 24 were associated with a social club. Seven (37 %) of the 19 clusters were mainly (≥50 %) associated with household or non-household close social transmission venues (Clusters 02, 08, 10, 13, 20, 28, and 29). Among 16 epidemiologic linked pairs of the remaining

There was substantial variability among clusters in the proportion of patients with identified epidemiologic links (Table 2, "Epi-linked" column), ranging from 0 to 100 %. No epidemiologic links were identified for patients in 13/44 (30 %) clusters, despite all four CLI steps having been completed for 36 % of the 77 patients in these clusters. The proportion of clusters consisting entirely of black patients was significantly lower among the 13 clusters without epidemiologic links than among those with epidemiologic links [2 (15.4 %) versus 15 (48.4 %), p = 0.040]. No difference in the number of clusters with 100 % foreign-born patients was seen between the two groups (data not shown).
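The venue criterion used above (at least 50 % of a cluster's links associated with one venue, applied to clusters with at least three linked pairs) can be sketched as follows. The venue labels and function name are illustrative assumptions; each list entry stands for one epidemiologic link.

```python
from collections import Counter
from typing import List, Optional, Tuple


def dominant_venue(link_venues: List[str], threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (venue, share) if one venue accounts for >= threshold of a cluster's links.

    link_venues holds one venue label per epidemiologic link in the cluster.
    With an exact 50/50 tie, Counter.most_common returns the label seen first.
    """
    if not link_venues:
        return None
    venue, count = Counter(link_venues).most_common(1)[0]
    share = count / len(link_venues)
    return (venue, share) if share >= threshold else None
```

With seven of eight links at a social club, as reported for Cluster 24, `dominant_venue(["social club"] * 7 + ["household"])` returns `("social club", 0.875)`.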
Twenty-five percent of epidemiologic links from HOU were identified through CLI. By contrast, epidemiologic links from MA and MD had lower odds of being identified by CLI TB patient interviews than links from other sites (Table 3; p = 0.004 and p = 0.036, respectively). All epidemiologic links with a household transmission setting and/or involving relatives were identified before Step 4, whereas workplace and church transmission settings were associated with identification through CLI TB patient interviews in Step 4 (p = 0.032 and p = 0.046, respectively). Epidemiologic links involving a black TB patient had higher odds of being identified in earlier investigation steps (p = 0.036). Epidemiologic links involving Asian patients or patients with extrapulmonary TB were associated with identification through CLI TB patient interviews in univariate analysis (p < 0.001 and p = 0.033, respectively), but these associations were not significant in the multivariable analysis. Definite epidemiologic links had decreased odds of identification through interviews (p < 0.007) (Table 3). All epidemiologic links for Clusters 14, 19, and 43, and over 50 % of links for Clusters 31 and 39, were identified by CLI TB patient interviews (data not shown).

Discussion
Contact investigation of individuals who had contact with TB patients is a cornerstone of public health TB control [1]. However, reports of TB transmission not found through traditional contact investigation methods have highlighted limitations of the concentric circle approach to contact investigations [11,23-26]. In our study, a considerable number of additional linked patients (n = 28; 14.8 % of all patients with at least one link to another person in the cluster) found through the CLI interview had not been identified through the previous CI (Fig. 2).
Molecular epidemiologic data suggest that routine contact investigations, which target household, work, and school contacts, commonly miss other locations where infectious TB patients spend time and transmit disease, especially leisure or social settings [11,17,27,28]. The CDC contact investigation guidelines [1] recommend collecting information on potential transmission settings during the CI patient interview. In the absence of named TB patient contacts, location-based information on possible transmission venues collected routinely during patient interviews can help establish relationships between genotypically matched TB patients [11].
By looking for homogeneity within clusters using routinely collected surveillance data, we were able to generate characteristic profiles for many clusters. These cluster-specific epidemiologic profiles suggested potential transmission venue types for given clusters and informed which questions to ask, and which locations to look for, while seeking epidemiologic links during CLI steps.
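Screening a cluster for homogeneous attributes amounts to finding every field whose value is identical, and not missing, across all of its patients. The sketch below assumes patient records as dictionaries; the field names are hypothetical stand-ins for RVCT variables, which the text does not list.

```python
from typing import Any, Dict, List


def homogeneous_attributes(patients: List[Dict[str, Any]], fields: List[str]) -> Dict[str, Any]:
    """Return each field whose non-missing value is identical for all patients in a cluster."""
    shared: Dict[str, Any] = {}
    for field in fields:
        values = {p.get(field) for p in patients}  # values must be hashable
        if len(values) == 1 and None not in values:
            shared[field] = next(iter(values))
    return shared
```

Treating a missing value as breaking homogeneity is a conservative choice; an analyst could instead ignore missing values, at the cost of profiles that rest on fewer patients.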
CLI steps were ordered to minimize the resources required to uncover epidemiologic links, beginning by asking health department staff who were directly involved in the TB patient's care whether they were aware of links to other patients (Step 1). In a local health department, existing knowledge of clusters or patient relationships is available through communication with a case manager, disease intervention specialist, or contact investigator (public health workers). Existing contact investigation records were then reviewed for documented links (Step 2). The next step, review and evaluation of public health records, added a more time-intensive and analytic component to investigations (Step 3). Finally, the most resource-intensive step was patient re-interviews (Step 4). The analysis of the CLI step at which epidemiologic links were determined (Table 3) demonstrated various scenarios in which CLI interviews had added utility compared with earlier investigative steps. Higher odds of epidemiologic links were found in association with workplace settings, with patient-pairs residing in different zip codes within the same jurisdiction, and with Asian or African American patients (Table 3). Although contact investigation review (Step 2) identified 11 study participants with previously unknown epidemiologic links, we do not know how many contacts with active TB had epidemiologic links, because a contact with active TB might be involved in more than one epidemiologic link.
Limitations of this study include the possibility that not all patients in a potential genotype cluster were included, given genotyping coverage during the study period (especially for GA and HOU); the inability to locate some patients and obtain their consent for re-interviews; and the exclusion of clinically defined and non-genotyped culture-positive patients with epidemiologic links to patients in study clusters. In addition, the infectious period of each patient was not considered. Although beyond the scope of this study, including non-genotyped patients might give a more complete picture of cluster transmission dynamics. Furthermore, NTGS transitioned from spoligotype and 12-locus MIRU-VNTR (MIRU12) to spoligotype and 24-locus MIRU-VNTR (MIRU24) in 2009 to increase the discriminatory power of MIRU-VNTR [8,14,15]. Because this study was initiated in 2009 and included cases from previous years, clusters had to be defined and selected by spoligotype and MIRU12. Given the varying number of epidemiologic links identified at different study sites, interview style (although standardized) may have influenced subjective and qualitative outcomes. Despite this between-site variation, one study outcome was the identification of additional high-risk TB contacts through CLIs. In resource-limited jurisdictions where local funding and resources may not be sufficient to launch cluster investigations, information on high-risk contacts missed by the initial CIs is still helpful for evaluation purposes and can help TB programs improve their conventional CI techniques. Lastly, recall bias could not be ruled out, especially for patients who were diagnosed with TB many years before their cluster investigation interview.
Public health departments need to develop strategies and focus resources to prioritize and investigate clusters that may be of public health concern. An initial step in these investigations should be to evaluate clusters using readily available data. Many data elements needed to investigate clusters in specific jurisdictions are now available to TB control personnel routinely and electronically through the Tuberculosis Genotyping Information Management System [19]. Additionally, the 2009 expansion of the RVCT includes up to two state case numbers for TB patients epidemiologic-linked to the reported patient [22], so health departments can readily assess clusters for epidemiologic links. If the transmission dynamics are poorly understood and the cluster continues to grow, additional resources should be devoted to the CLI, including abstracting public health records of clustered patients and interviewing TB patients to find epidemiologic links beyond those identified by the health department. As we found in this study, re-interviewing patients in a cluster (Step 4), especially when no epidemiologic links have been identified, can facilitate the identification of transmission venues and locations, which is crucial for interrupting ongoing transmission and cluster growth. Further study on improving interviewing methods may be needed to increase the detection rate of epidemiologic links in Mtb genotype clusters. In addition, CI record review (Step 2) identified 15 (3.9 %) linked patients, exemplifying a need for better tools and training for contact investigations, which are an essential component of TB control programs.
Although the continuing decline in US TB rates has led to decreased funding for public health TB control activities, the TB elimination goals established in 1989 [29] remain unmet. With TB rates recently leveling off [30], interrupting Mtb transmission through expanded and efficient CIs and CLIs will be critical to the success of TB control and prevention programs in the US.

Conclusion
We validated a practical method to systematically identify tuberculosis epidemiologic links that can be integrated into routine TB control and prevention programs in public health settings. Re-interviewing patients in a cluster can identify additional epidemiologic links that were not found in the previous CLI steps. Improved interview methods and effective contact investigation training may be needed, as no epidemiologic links were identified in one-third of the Mtb genotype clusters.