Neighborhood socioeconomic position and tuberculosis transmission: a retrospective cohort study

Background: Current understanding of tuberculosis (TB) genotype clustering in the US is based on individual risk factors. This study sought to identify whether area-based socioeconomic status (SES) was associated with genotypic clustering among culture-confirmed TB cases.
Methods: A retrospective cohort analysis was performed on data collected on persons with incident TB in King County, Washington, 2004–2008. Multilevel models were used to identify the relationship between area-level SES at the block group level and clustering, using a socioeconomic position (SEP) index.
Results: Of 519 patients with a known genotyping result and block group, 212 (41%) had isolates that clustered genotypically. Analyses suggested an association between lower area-based SES and increased recent TB transmission, particularly among US-born populations. Models in which community characteristics were measured at the block group level demonstrated that lower area-based SEP was positively associated with genotypic clustering after controlling for individual covariates. However, the trend of higher clustering odds with lower SEP index quartile diminished when additional block-group covariates were added.
Conclusions: Results stress the need for TB control interventions that take area-based measures into account, with particular focus on poor neighborhoods. Interventions based on area-based characteristics, such as improving case finding strategies, utilizing location-based screening and addressing social inequalities, could reduce recent rates of transmission.


Background
Although tuberculosis (TB) incidence continues to decline in the United States, studies have revealed that intense TB transmission continues to occur in low-incidence countries [1,2]. To assess transmission dynamics, molecular techniques are used to identify genetic clusters of isolates of Mycobacterium tuberculosis with identical genotypes. Those isolates with identical genotypes are thought to indicate recent transmission and a possible continuing transmission chain, while a predominance of unique 'non-clustered' isolates implies that most TB cases are caused by reactivation of remote infection [3,4].
Studies have shown that lower socioeconomic status (SES) neighborhoods are correlated with greater clustering among TB strains [5,6], with associations shown between homelessness, unemployment and TB clusters [7][8][9], yet the association between area-based socioeconomic measures and clustering has not been well assessed. Better knowledge of area-based risk factors for clustering could help develop more effective targeted prevention strategies, and the joint effect of both individual- and community-level measures of SES might help distinguish compositional and contextual effects of socioeconomic factors on TB transmission.
The population of King County, Washington, is highly diverse in terms of birth origin as well as socioeconomic status. TB genotypic clustering is therefore likely to vary significantly, with increased clustering caused either by recent transmission or by strains that commonly circulate within some populations. Residence in block groups with greater socioeconomic disadvantage was hypothesized to be associated with increased TB transmission, as assessed using genotypically defined TB clusters [8,10].

Study population and setting
The study population consisted of all reported incident culture-positive TB cases with available genotyping and block group-level geocodes recorded in King County, Washington between January 1, 2004 and December 31, 2008. An incident case of TB was defined according to Centers for Disease Control and Prevention (CDC) surveillance criteria, where TB was either diagnosed for the first time or more than 12 months had elapsed since the patient previously completed TB therapy [11]. A culture-positive sample was defined as isolation of M. tuberculosis from a clinical specimen. Patients who did not have both spoligotyping and mycobacterial interspersed repetitive unit-variable-number tandem repeat (MIRU-VNTR) analysis performed on their isolate, or who did not live in King County at the time of specimen collection, were excluded from the analysis. The analysis merged reporting, medical record and genotyping data for TB cases with US census data. Subsequently, only cases with available genotyping results and geocoded addresses were included in the final study population. Approval for this study was granted in May 2009 by the University of Washington and Washington State Institutional Review Boards, and the final project analysis was completed in October 2010.

Data sources
Individual-level case variables were collected at the local level from the Tuberculosis Information Management System (TIMS) and follow standard surveillance definitions [10]. Individual-level variables were subsequently aggregated by block group. Residential address at the time of diagnosis was obtained from patient medical records. Using a geographic information system and latitude/longitude coordinate data, TB cases were geocoded to the corresponding block group of residence. Only block groups with diagnosed TB cases were included in the analyses.
SES was defined at the block group level using census-based indicators of socioeconomic disadvantage. A socioeconomic position (SEP) index was constructed as a standardized z-score combining data on percent working class, unemployed, poverty, high school, expensive homes and median household income. To construct the index, each variable was standardized across all block groups with SEP data (n = 1,576) by subtracting the mean and dividing by the standard deviation, and the resulting individual z-scores were then summed. High inter-correlations and reliability were noted (Cronbach's α coefficient 0.78). These measures, along with the index, have previously been used to assess US small area differences in health, with the latter developed based on a factor analysis of eleven single SES factors using rank values of the census data [12]. All socioeconomic data as well as area-based data were derived from the US Population Census 2000, SF1 and SF3 [13,14].
Isolates from all culture-positive patients were genotyped using spoligotyping and 12-locus MIRU-VNTR, with genotype results obtained through the National TB Genotyping Service. Genotype results were subsequently linked to National TB Surveillance System data using a standardized state case identification number. A cluster was defined as two or more patients with identical TB genotypes within King County. Given the study scope, cases that were part of a Washington cluster designation but unique within King County were considered to have a unique TB genotype.
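The construction of the SEP index described above can be illustrated with a minimal sketch. It is not the study's code: the indicator names, the toy values, the use of the population standard deviation, and the sign convention (disadvantage indicators are reversed so that a higher index means higher SEP) are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize one indicator across block groups (mean 0, SD 1).
    Assumption: population SD; the paper does not say which SD was used."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]

def sep_index(indicators, higher_is_better):
    """Sum per-indicator z-scores into one index per block group.
    Assumption: indicators of disadvantage are sign-reversed so that
    a higher index corresponds to higher socioeconomic position."""
    n = len(next(iter(indicators.values())))
    index = [0.0] * n
    for name, values in indicators.items():
        sign = 1.0 if higher_is_better[name] else -1.0
        for i, z in enumerate(zscores(values)):
            index[i] += sign * z
    return index

# Hypothetical data for four block groups (not from the study).
indicators = {
    "pct_poverty": [5.0, 10.0, 20.0, 25.0],
    "median_income_k": [90.0, 70.0, 50.0, 30.0],
}
higher_is_better = {"pct_poverty": False, "median_income_k": True}

index = sep_index(indicators, higher_is_better)
print([round(x, 2) for x in index])
```

In this toy example the first block group receives the highest index and the last the lowest; block groups would then be ranked on the index and divided into quartiles.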

Statistical analysis
Descriptive statistics were applied to included block groups to assess poverty distributions as well as deviation from King County as a whole. The proportion of TB patients considered to belong to a chain of recent transmission was calculated as the number of subjects belonging to a cluster divided by the total number of individuals genotyped [15]. Additionally, the proportion of cases caused by ongoing transmission was estimated using the n-1 method, where the source case of each cluster was not considered to have recent disease [16]. Incidence rates over time were calculated for both clustered and non-clustered (unique genotype) patients. Univariate associations of independent variables and genotype clustering were assessed using Pearson χ² tests. SaTScan was used to generate a spatial scan statistic identifying geographic areas with a higher-than-expected clustering rate. TB incidence rates were calculated for each SEP stratum by dividing the number of TB cases in a particular quartile by the product of the corresponding stratum population and the five years in the reporting period. Cuzick's nonparametric test for trend across ordered SEP groups was used as a summary test of statistical significance [17].
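The stratum-specific incidence rate calculation described above can be sketched as follows. The function name and the case and population counts are hypothetical and serve only to show the arithmetic; they are not study data.

```python
def incidence_rate_per_100k(cases, population, years=5):
    """Average annual incidence per 100,000: cases divided by
    person-years (stratum population times years of reporting)."""
    person_years = population * years
    return cases / person_years * 100_000

# Hypothetical SEP strata: (TB cases over 5 years, stratum population).
strata = {
    "high": (30, 100_000),
    "very_low": (90, 100_000),
}
rates = {name: incidence_rate_per_100k(c, p) for name, (c, p) in strata.items()}
print(rates)
```

With these invented counts, the very low SEP stratum has three times the rate of the high SEP stratum because the populations are equal and the case count is tripled.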
To examine area-level influences on disease clustering in addition to individual attributes, multilevel regression models were used to assess the association between SEP and TB clustering. A two-level hierarchical model with binary clustering outcome was estimated with the high SEP quartile serving as the referent. Hierarchical models have the advantage of yielding accurate parameter estimates and sampling variances in the presence of correlated errors [18]. Prevalence ratios and 95% confidence intervals were estimated by binomial regression with the log link function [19]. Model 1 consisted of an empty two-level model to examine log-odds of genotypic clustering in an 'average' block group and to quantify blockgroup-level variance. Model 2 added socioeconomic quartiles as exposure variables. Model 3 controlled for the individual demographic variables of age, race (modeled as dummy variables with white serving as referent), sex (males as referent) and country of origin (US-born as referent) in addition to SEP index. Model 4 included individual socioeconomic variables (homelessness with non-homeless referent, employment with employed referent, provider type modeled with dummy variables with public service provision as referent) in addition to demographics and SEP index. Model 5 added area-level variables of race, ethnicity and foreign birth in addition to individual-level variables and SEP index. Complete case analysis was used such that the number of patients with missing covariates (n = 12) excluded from each model was the same.
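One way to write the two-level model sequence described above is given below. This is a sketch rather than the authors' stated specification: the notation is introduced here for illustration, and a logit link is assumed because the text refers to log-odds of clustering (a log link, as mentioned for prevalence ratios, would replace the logit on the left-hand side).

```latex
% Patient i lives in block group j; Y_{ij} = 1 if the isolate is clustered.
\begin{align}
\operatorname{logit}\Pr(Y_{ij} = 1)
  &= \beta_0
   + \sum_{q=1}^{3} \beta_q \,\mathrm{SEP}_{qj}
   + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij}
   + \boldsymbol{\delta}^{\top}\mathbf{w}_{j}
   + u_j, \\
u_j &\sim \mathcal{N}\!\left(0, \sigma_u^2\right).
\end{align}
% SEP_{qj}: indicators for the three lower SEP quartiles (high SEP is the referent).
% x_{ij}: individual-level covariates (demographic in model 3, plus socioeconomic in model 4).
% w_j: block-group-level race, ethnicity and foreign birth (model 5).
% u_j: block-group random intercept with variance \sigma_u^2.
```

Under this notation, model 1 retains only the intercept and the random intercept, model 2 adds the SEP quartile terms, models 3 and 4 add the individual-level covariates, and model 5 adds the block-group covariates.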

Block group demographics
The study included 327 block groups in King County with at least one case residing in each (20.7% of block groups with SES data) (Table 1). Block groups included in the study were largely of white (60%), US-born (78%) composition. Individuals of Hispanic ethnicity made up approximately 8% of the population, about 10% of individuals were under the federal poverty line and 4% were unemployed. The average five-year incidence rate of TB was 15.6 per 100 000 across all included block groups. In comparison to other block groups in King County (N = 1,249), those included in the study were more likely to contain individuals reporting black or Asian race as well as Hispanic ethnicity. Additionally, the median proportion foreign-born in these block groups was almost twice as high as that of King County.

TB patients
Of 686 incident TB cases reported in King County from 2004-2008, 577 (84%) were culture positive, excluding relapses, interjurisdictional transfers, and individuals with missing TB treatment date. Of reported culture-positive cases, 547 (95%) had a reported genotype, and 519 (95%) of these cases had both genotyping and block group geocoding available and were therefore included in the analysis. TB patients were mostly of Asian (44%) and black (28%) race, and were largely (81%) foreign-born. Approximately one third of foreign-born patients were identified within five years of arrival in the US.

Genotype clustering
Of those with a known genotyping result, 212 (41%) had isolates that clustered genotypically. Forty-six distinct clusters were identified. The number of patients per cluster ranged from 2 to 32 (Figure 1). A median of 3 and mean of 7 patients were identified per cluster. Fifty-two clustered patients (25%) belonged to 2-case clusters and 160 (75%) belonged to clusters with 3 patients or more. Individual clusters ranged in duration from 1 year to the full 5 years of the study period. Based on spoligotype/MIRU match, 336 unique TB genotype strains were identified in King County during this time period. Assuming that 1 patient per cluster resulted from reactivation of remote infection and that the remainder resulted from the spread of recently transmitted disease (n-1 method), 166 (32%) of isolates could be defined as recently transmitted tuberculosis. Further analysis showed that of patients identified after subtracting out the index case and unique genotypes, 134 (83%) matched the isolate of a patient identified within the 1-year period prior to diagnosis date, suggesting potential recent transmission from one individual to another. Clustered TB disease was not spatially homogeneously distributed throughout the included block groups, with significant spatial aggregation of the clustered patients (P = 0.047 for most likely cluster, Figure 2). In unadjusted clustering analyses, patients with unique genotyping results were compared to those patients in clusters (Table 2). Clustering was positively associated with female gender, non-Hispanic ethnicity, US birth, homelessness and substance abuse and with indicators of patient infectivity, including pulmonary TB and cavitary TB disease, although not with HIV infection. On average, patients were identified 397 days apart in 2-person clusters, compared with 155 days apart in clusters of 3 or more persons (P < 0.001).
Among foreign-born patients, average clustered patient incidence rates (5.10/100 000) were lower than average non-clustered rates (8.93/100 000). The reverse was true among US-born patients, where clustered rates were almost twice as high as non-clustered rates (7.04/100 000 vs. 4.81/100 000). Greater proportions of foreign-born patients clustered as time between arrival and diagnosis increased (data not shown).
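The clustering proportion and the n-1 estimate reported above can be reproduced from the stated counts (212 clustered patients, 46 clusters, 519 genotyped patients). The sketch below is illustrative only: the function names and the toy genotype labels are hypothetical, while the three counts are taken from the text.

```python
from collections import Counter

def cluster_counts(genotypes):
    """Return (patients in clusters, number of clusters), where a
    cluster is any genotype shared by two or more patients."""
    sizes = Counter(genotypes)
    clustered = [k for k in sizes.values() if k >= 2]
    return sum(clustered), len(clustered)

def clustering_proportions(n_clustered, n_clusters, n_genotyped):
    """n method: all clustered patients count as recent transmission.
    n-1 method: one presumed source case per cluster is subtracted."""
    n_method = n_clustered / n_genotyped
    n_minus_1 = (n_clustered - n_clusters) / n_genotyped
    return n_method, n_minus_1

# Toy genotype labels (hypothetical): two clusters, two unique isolates.
toy = ["A", "A", "A", "B", "B", "C", "D"]
toy_clustered, toy_clusters = cluster_counts(toy)

# Counts reported in the text.
n_prop, n1_prop = clustering_proportions(212, 46, 519)
print(f"n method: {n_prop:.1%}, n-1 method: {n1_prop:.1%}")
```

The reported counts give about 41% by the n method and about 32% by the n-1 method, matching the percentages stated in the text.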

Socioeconomic trends
In unadjusted analyses, the proportion of patients clustering increased as SEP decreased. A significant linear trend for increased clustering occurred from high to low SEP quartiles (P = 0.001) (Table 3). Clustered case incidence rates increased with lower SEP index, with the greatest increases in rates when going from low to very low SEP quartiles among both clustered and non-clustered cases and with low incidence rates observed among clustered patients living in the highest SEP quartile. Clustered rates were lower than non-clustered rates for all quartiles, but the two became more alike in each progressively lower SEP quartile. Unadjusted fitted log odds of clustering for the continuous SES z-score are shown in Figure 3. Patients residing in block groups in the lowest 10% of all z-scores were even more likely to cluster (56%).
The majority (73%) of US-born patients clustered at the lowest socioeconomic quartile. Within the low and lowest SEP index quartile block groups, US-born patients were significantly more likely to cluster than foreign-born. Clustering increased significantly with residence in progressively lower SEP block groups among both US-(P-trend 0.005) and foreign-born TB patients (P-trend 0.016).
When stratified by SEP index quartiles, the only significant difference between patients stratified by time from arrival to TB diagnosis was seen among those living in the highest SEP group, where clustering peaked among individuals who had been in the US for 10-19 years from arrival to TB diagnosis (data not shown). Individuals who arrived more recently (0-4 years) were more likely to cluster if they lived in lower SES quartile block groups (P-trend 0.035).
Multilevel models in which community-level characteristics were measured at the block group level demonstrated that lower SEP index was positively associated with TB genotypic clustering after controlling for individual covariates, but the trend of higher clustering risk with lower SEP quartile was diminished when adding additional block-group covariates. In an unadjusted model, a large change in between-community variance (25% decrease) suggested the distribution of SEP quartiles was different across block groups. With progressively lower SEP index quartiles, odds of TB clustering increased compared to the next highest quartile (Table 4, model 2). A positive linear trend was observed (P = 0.005). Once individual demographic variables were included in the model (model 3), the association between SEP and TB clustering did not change. Foreign-born patients were significantly less likely to have clustered disease when compared to US-born patients. Addition of individual-level SES measures did not affect the SEP-clustering association (model 4).
When area-level demographic variables were added, SEP-TB clustering odds ratios decreased in the lowest SEP quartile and the significant linear trend of increasing clustering with decreasing SEP disappeared (P = 0.244). Areas with larger proportions of black inhabitants were more likely to have TB clusters (Adjusted OR = 1.25; 95% CI: 1.01, 1.29) (model 5). In this multilevel analysis, the only individual-level variables to remain independently associated with TB clustering after inclusion of all covariates were foreign birth and race. These findings suggest that area-level demographic measures, and hence factors related to the area of residence, may substantially affect genotypic clustering among TB patients in the lowest SEP quartile.

Discussion
In this study, TB genotype clustering was common and closely linked to lower block group socioeconomic status. These findings were novel, in use of a validated SEP index and in showing the explicit association between SES and transmission across areas using a multilevel framework. Both clustered and non-clustered case incidence rates were seen to increase with lower SES quartile, with those patients living in the lowest SEP quartile at measurably higher risk for clustering. The analysis confirmed previous molecular epidemiologic investigations identifying patients of US birth, Hispanic ethnicity, homelessness and higher frequencies of substance use as at greater odds for clustering [3,7,20]. As in previous work, there was less evidence of genotypic clustering among foreign-born persons, and genotyping clusters indicated almost no transmission between US and foreign-born groups [20][21][22]. These findings also confirm similar multilevel analyses that found a positive association between low SES and TB burden and incidence [23,24]. Previous ecologic studies have observed that clustering is greater in poorer areas [5,6,10] and associations have been demonstrated between homelessness or unemployment and TB clusters [20][21][22]. Clustering by restriction fragment length polymorphism insertion sequence 6110 (RFLP-IS6110) has also previously been shown to correlate with individual markers of low SES, such as homelessness and low income clustering [3,25]. In this study, while individual-level SES measures were crudely associated with clustering, and likely mediate the relationship between SEP and clustering, these measures may have been too crude to pick up the association in the multivariate analyses. Living in a poorer neighborhood may result in higher rates of recent TB transmission because of shared airspace through population density and lack of ventilation [26]. 
Additionally, contextual effects such as health care availability, or the natural or structural environment may influence transmission [27]. Several studies have also shown that residents of neighborhoods with higher poverty rates encounter environments conducive to stressors and riskier behavior [28][29][30].
In this study, clustered TB genotypes were spatially aggregated, confirming previous findings that utilized different genotypic and spatial methods [6,31]. In multivariate analyses, neighborhoods which had lower socioeconomic status exhibited greater odds of genotypic clustering. Block-group level race, ethnicity and foreign birth measures attenuated observed associations in the lowest SEP quartile, and may indicate that the effect of neighborhood disadvantage does not dominate that of population demographic characteristics in that area. On the other hand, collinearity between degree of poverty and predominantly minority neighborhoods may make it difficult to disentangle these variables at the block group level. Race, rather than class, has controversially been hypothesized to be the main driving factor in the spatial organization of urban areas [32]. However, race may have less of an effect on clustering and ongoing transmission than it does on baseline incidence. SES has been shown to account for much of the increased TB risk attributed to particular races. It is also possible that low SES may not capture all differences in socioeconomic conditions across neighborhoods that also differ in racial/ethnic composition [33]. Previous US-based studies have shown only 25-42% of patients in genotypic clusters to have known epidemiologic links [25,34]. Thus, certain shared genotypes may represent older, endemic strains that are dispersed widely in the US or countries of origin, and clustering may be a result of common contact from circulating strains within a community rather than ongoing active transmission [9]. Spatial variations of unique TB strains by zip code suggest that immigrant neighborhoods have higher rates of unique isolates, suggestive of remote transmission [35].
Figure 3. Predicted log-odds of TB clustering by z-score in unadjusted model.
Some groups of immigrants might share strains acquired in high incidence settings, where one predominant strain type exists. Within each quartile of SEP index, as proportion of foreign-birth in the block group increased, so clustering decreased, perhaps because of higher likelihood of remote TB, or because of decreased stressors as a result of social status, social networks and cohesion [36].
Even if clustering does not indicate an ongoing contagious process, immigrants from areas with known common strains are more likely to be poor and to settle in poorer neighborhoods [37]. Poverty is likely to result in inadequate access to health care and TB treatment [38]. Nevertheless, poverty rates among immigrant groups decline quickly with time in the US [39]. Lower clustering rates among recent foreign-born arrivals in the United States reflect a lack of ongoing transmission regardless of SES group. Among foreign-born persons, within the recent arrival group, clustering seemed to increase with lower socioeconomic quartile, but this trend was not observed among those who had been in the US longer. Genotyping has previously indicated ongoing transmission among the foreign-born within the largest high-incidence zone in Montreal, correlating with lower SES neighborhoods, consistent with the present findings [40]. Previous research has also suggested that new transmission could be expected to cause more active TB in "TB-naïve" neighborhoods, as high prevalence of latent TB infection among foreign-born patients is protective against recurrent TB exposure [41]. Multivariate findings were consistent with this hypothesis. One might also expect less clustering in an area with high migration and strain diversity since isolates not truly linked by new transmission are likely to be distinct [42].
Estimates of the degree and size of clustering are likely to be conservative because individuals with the same genotype are potentially present outside of the study area [43]. Substantial challenges also remain in interpreting the extent of recent transmission, given the background heterogeneity of genotypes, strain evolution over time, and the choice of criteria used to infer transmission. Authors have previously evaluated various transmission indices in this evolving field of study [44]. Additionally, although spoligotyping and MIRU techniques are currently used by the CDC to determine recent transmission, their low calculated specificities compared to RFLP-IS6110 have been shown to lead to misclassification of patients, inflated estimates of TB transmission, and low positive predictive values [45]. Since 2009, 24-locus MIRU-VNTR has been used in the US and may reduce this misclassification [46]. Finally, some strains may be more transmissible than others, giving rise to sputum smear-positive disease, slower onset of clinical symptoms even as the patient is infectious, or leading to more virulent disease [47].

Conclusions
Further investigation is needed to show how risk factors for clustering are associated with poverty in underlying communities at risk. Substance abuse and homelessness were associated with clustering in this study in unadjusted analyses. Clustering was not associated with HIV infection, in contrast to other recent findings [48], which may indicate that in this population co-infected cases were more likely due to reactivation of latent infection than to re-infection. These findings may also have occurred because HIV-infected TB patients are on average less likely to be the source of transmission, or because of differing demographic profiles, a masking effect due to low force of infection, or the small sample and low prevalence of HIV-infected persons in this study population [49].
Future studies might incorporate other evidence to determine the effect of area-based socioeconomic status on transmission patterns, such as investigating drug susceptibilities and epidemiological linkages that include spatial and temporal associations [48,50]. Since patient residence at diagnosis seems to be a factor in determining clustering, it would be useful to determine whether clusters are proximal to homeless facilities, bars, or other historically important sites of tuberculosis transmission [51].
The findings reported here suggest the importance of understanding not only individual characteristics of patients leading to clustering but also contextual characteristics of neighborhoods. Results of this study stress the need for TB control interventions that focus on high-risk groups within poor neighborhoods. Recently transmitted disease is most likely propagated among a core of hard-to-reach patients in these areas [5,51]. Poverty is likely to concentrate risk factors for TB and limit access to adequate care, fueling transmission. Interventions based on area-based characteristics, such as improving case finding strategies, utilizing location-based screening and addressing social inequalities, could reduce recent rates of transmission.