 Research article
 Open Access
 Published:
A novel Bayesian geospatial method for estimating tuberculosis incidence reveals many missed TB cases in Ethiopia
BMC Infectious Diseases volumeÂ 17, ArticleÂ number:Â 662 (2017)
Abstract
Background
Reported tuberculosis (TB) incidence globally continues to be heavily influenced by expert opinion of case detection rates and ecological estimates of disease duration. Both approaches are recognised as having substantial variability and inaccuracy, leading to uncertainty in true TB incidence and other such derived statistics.
Methods
We developed Bayesian binomial mixture geospatial models to estimate TB incidence and case detection rate (CDR) in Ethiopia. In these models the underlying true incidence was formulated as a partially observed Markovian process following a mixed Poisson distribution and the detected (observed) TB cases as a binomial distribution, conditional on CDR and true incidence. The models use notification data from multiple areas over several years and account for the existence of undetected TB cases and variability in true underlying incidence and CDR. Deviance information criteria (DIC) were used to select the best performing model.
Results
A geospatial model was the best fitting approach. This model estimated that TB incidence in Sheka Zone increased from 198 (95% Credible Interval (CrI) 187, 233) per 100,000 population in 2010 to 232 (95% CrI 212, 253) per 100,000 population in 2014. The model revealed a wide discrepancy between the estimated incidence rate and notification rate, with the estimated incidence ranging from 1.4 (in 2014) to 1.7 (in 2010) times the notification rate (CDR of 71% and 60% respectively). Population density and TB incidence in neighbouring locations (spatial lag) predicted the underlying TB incidence, while health facility availability predicted higher CDR.
Conclusion
Our model estimated trends in underlying TB incidence while accounting for undetected cases and revealed significant discrepancies between incidence and notification rates in rural Ethiopia. This approach provides an alternative approach to estimating incidence, entirely independent of the methods involved in current estimates and is feasible to perform from routinely collected surveillance data.
Background
Population level tuberculosis (TB) prevalence and incidence studies are resource and timeintensive, and so impractical for regular evaluation of TB trends in most settings. Almost universally, data acquired from routine programmatic TB notifications are adjusted by a number of methods, including expert opinion regarding case detection rates (CDR) in the local context, to estimate and report TB incidence [1]. In the few countries with recent and wellconducted prevalence surveys, incidence is calculated from a combination of these survey findings and estimates of the duration of disease, with the latter derived from the prechemotherapy era or from mathematical models [2, 3]. Because of uncertainties in the duration of disease, incidence estimates from these methods differ and optimal methods remain elusive [2]. As a result, significant uncertainty exists regarding the true TB incidence and other programmatically important unobserved values, such as CDR [4].
Given these uncertainties, notification data with imputation of estimated missing cases have been relied on in many epidemiological TB studies, including in spatiotemporal characterization studies [5,6,7,8]. Failure to appropriately account for missed cases could reduce the power of such studies to detect factors affecting populationlevel TB dynamics, and introduce systematic bias of estimates of the effect of covariates towards the null [9]. Both factors could obscure important populationlevel patterns. Similarly, interpretation of maps from these studies is problematic, as spatial patterns in notification data could reflect spatial dynamics in the true underlying incidence or case detection performance. Thus these maps could be biased to areas with better case detection performance, leading to resource misallocation. One such cause of bias is systematic underreporting which could also vary depending on the availability of health care facilities.
To address this bias, we aimed to estimate TB incidence and case detection in Ethiopia using an alternative Bayesian methods based on routinely collected surveillance data, without the need for expert opinion regarding programmatic performance or estimates of duration of disease.
Methods
We collected data on TB patients diagnosed between 2010 and 2014 in all 66 kebeles (the smallest geographical administrative unit in Ethiopia) of Sheka Zone (a remote zone in the country). These TB cases were pooled based on kebele of residence and year. We produced population density using census data and kebele shape files obtained from the Central Statistical Agency (CSA). Data on health facility availability were obtained from the Zonal health department. At the time of data collection, close to 50% of health facilities in the Zone did not have sputum microscopy, and there was no access to chest xray and culture facilities. Further details of the data and the study area are presented elsewhere [10].
Model development
Hidden Markov models (Bayesian geospatial binomial mixture models) were developed to estimate the true underlying but unknown TB incidence and case detection rates. Hidden Markov models(HMMs) are flexible time series models for sequences of observations that are known to be driven by an underlying state which is hidden from the observer yet is in some way predictable, such as being serially autocorrelated, geospatially correlated or through predictor variables that are observable. The value of HMMs is their ability to predict the underlying process; which they do provided the second process (the relationship between hidden and observed data) is one that itself follows rules that can be defined and in turn estimated.
Here, the underlying process is the factors that drive TB incidence in the study region. We have information including the population density, serial data and geospatial position of the kebele. The process that relates incidence to notification is the case detection rate or the proportion of cases that are notified. Given that (by definition) notification is the product of incidence and case detection rate, the natural model choice is the binomial relationship between the observation (notification) and hidden state (incidence). We also allow for kebelespecific effects on incidence, in effect leading to a (higher variance) betabinomial distribution for observed notifications in each kebele.
Such a model when challenged by data can lead to issues of identifiability in that both high incidence/low case detection and low incidence/high case detection can explain the same notification rate. However, with sufficient information (e.g. predictors of incidence, changes over time and further information to assist in the observation model, such as presence of a health centre), the precision of model estimates can increase.
The models are informed by spatially and temporally replicated TB case counts and yield estimates of the true incidence and case detection. The parameters of this state process describe the spatiotemporal variation in incidence, which is considered as a latent variable by the model and is our key output of interest.
The statespace (the true underlying incidence) model
The number of incident TB cases (the latent state) in site i and year j, conditional on the expected mean Î»_{ij} is a realisation of a Poisson distribution, where the expected number Î»_{ij} is a product of the per capita TB rate (Ï€_{ij}, measured as a probability between 0 and 1) and the susceptible population size in year j at site i:
The site index i runs from 1 to 66, representing the 66 kebeles and the year index j runs from 1 to 5. The logit transformed probability of incident TB (Ï€ _{ ij } ) is, in turn, a logitlinear function of the site and yearspecific covariates where population density (X _{ i }), average incidence rate in kebeles that share a border with the index kebele (Z _{ ij } ), and logit transformed incidence rate at a temporal lag of one year (Ï€ _{ ij1 }) with intercept Î² _{ 0 } and slopes Î² _{ 1 } , Î² _{ 2 } and Î² _{ 3 } (eq. 3) were fitted as fixed effects. ExtraPoisson dispersion in the incidence is accounted by specifying two types of random effects: a spatially correlated random effect (É›) and a nonspatially correlated random effect (Î½).
Spatial dependence was introduced into regression in two ways: by introducing spatially structured random error and spatial lag. The spatially structured random effect, É›Â =Â (É›_{1}, â€¦ É›_{66}), accounts for the effect of spatial proximity, with the prior distribution taken as a Gaussian Conditional Autoregressive function (CAR), in which the prior probability distribution of the value of É›_{i} has a mean equal to the weighted average of the neighbouring random effects [11, 12] and variance following an inverse gamma distribution (shapeÂ =Â 0.5, scaleÂ =Â 0.0005). Average TB notification rate in the kebeles adjacent to an index kebele was used to define spatial lag.
The case detection model
Notified TB cases have two sources of variation: variation in the true underlying incidence (Incidence _{ ij }) and variation in the case detection process (P _{ ij } ,). As a description of the detection process giving rise to detected (observed) cases, we assumed detected cases (Y _{ ij }) by health facilities are realisations of a binomial process conditional on the underlying true incidence and case detection probability. The logit transformed case detection probability (P _{ ij }) is a linear function of the covariate of health facility (H) availability.
Hence Î¸_{1} is the log(odds ratio) of detection in the presences of a health centre compared with no health centre. To account for additional heterogeneity in CDR not captured by this covariate, we fitted a normally distributed random effect that differed by kebele and year (É· _{ ij }). The prior distribution of this error term had a mean of zero and a standard deviation drawn from the uniform (0, 5) distribution.
A noninformative uniform prior distribution (âˆ’10, 10) was chosen for all regression coefficients and intercept parameters (other than those already specified) to express the absence of prior information about model parameters.
We used WinBUGS 1.4.3 to fit the model using Markov chain Monte Carlo (MCMC). MCMC is a simulation tool to draw large samples from the Bayesian posterior distribution of parameters that is typically analytically intractable [13]. The model was executed in R version 3.3.1 using the R_{2}WinBUGS library. To enhance convergence of the MCMC sampler, we standardized population density by dividing the difference between each observation and the group mean by their respective standard deviations and also truncated the normal distributions for over dispersion effects to within (âˆ’16, 16) by multiplying with an indicator uniform prior distribution [14]. We ran the model for 250,000 iterations and discarded the first 50,000. We checked whether the priors were too restrictive, by inspecting a histogram of the posterior [15], with no such evidence found.
As we were executing two interdependent logit models, the model initially did not appear to find realistic parameter space for incidence rate and case detection rates (with estimated incidence rates far exceeding observed rates). Thus the model was forced into realistic parameter space [16] by restricting the maximum incidence rate to 1%, corresponding to 1000 per 100,000 per year (this is a realistic upper limit, being five times that of current notification rates and only reported in one country in the world in 2015).
Four candidate Bayesian geospatial models were considered:
Model 1 Covariate only; logit(Ï€ _{ ij })â€‰=â€‰Î²â€‰+â€‰Î² _{ 1 } X _{ i }â€‰+â€‰Î² _{ 2 } Z _{ iÂ j }â€‰+ Î² _{ 3 } logit(Ï€ _{ ijâ€‰âˆ’â€‰1 }).
Model 2 Covariates and nonspatially correlated random effect; logit(Ï€ _{ ij })â€‰=â€‰Î²â€‰+â€‰Î² _{ 1 } X _{ i }â€‰+â€‰Î² _{ 2 } Z _{ iÂ j }â€‰+â€‰Î² _{ 3 } logit(Ï€ _{ ijâ€‰âˆ’â€‰1 })â€‰+â€‰Î½ _{ ij } .
Model 3 Covariate and spatially correlated random effect; logit(Ï€ _{ ij })_{=}Â Î²â€‰+â€‰Î² _{ 1 } X _{ i }â€‰+â€‰Î² _{ 2 } Z _{ iÂ j }â€‰+â€‰Î² _{ 3 } logit(Ï€ _{ ijâ€‰âˆ’â€‰1 })â€‰+â€‰É› _{ i } .
Model 4 Full Model with covariates and all random effects included logit(Ï€ _{ ij })â€‰=â€‰Î²â€‰+â€‰Î² _{ 1 } X _{ i }â€‰+â€‰Î² _{ 2 } Z _{ iÂ j }â€‰+â€‰Î² _{ 3 } logit(Ï€ _{ ijâ€‰âˆ’â€‰1 })â€‰+ Î½ _{ ij }â€‰+â€‰É› _{ i } .
Covariates and random effects were included/excluded to determine if this had an effect on model fit and to determine the extent to which they accounted for the spatial correlation. The effect of spatially correlated random effects was assessed by examining the credible intervals (CrI) of the coefficients of the selected covariates and incidence and case detection estimates.
The deviance information criterion (DIC) statistic was calculated for the models with or without random effect terms to determine if the addition of the geospatial component improved model fit. Maps of the summary statistics of the posterior distributions of predicted incidence and the notified cases were constructed using R.
Goodnessoffit
We conducted posterior predictive checks to evaluate whether the models considered could likely have generated datasets that are similar to our observed dataset. This procedure uses parameter values estimated by the model using observed data to generate simulated data sets. Chisquare fit statistic was calculated to quantify the lack of fit both for the observed data and for the simulated data sets, and a Bayesian Pvalue was calculated to quantify the ratio between the fit statistic for the observed data and that of the simulated (perfect datasets under model assumptions) (AdditionalÂ fileÂ 1). A Bayesian Pvalue close to 0.5 indicates a model fits the data, while a Pvalue close to zero or one suggests poor fit [15, 17, 18].
Results
Model selection
We found that a binomial mixture model containing the spatially structured random effect and covariates was the best fitting model, and that the model with no random effects included was the poorest fitting. When a spatially structured random effect was included into the covariate model, the DIC dropped by more than 230 whereas inclusion of the nonspatially correlated random effect reduced the DIC by only 123 (AdditionalÂ fileÂ 2).
The role of the spatially correlated random effect
As well as the marked reduction in the DIC, the coefficient for the temporal lag in the statespace model became nonsignificant while the credible intervals surrounding other coefficients widened, but remained significant, after inclusion of the geospatial component in the statespace model. Similarly, the inclusion of nonspatially structured random effects resulted in nonsignificant temporal lag. By contrast, the credible intervals surrounding the predicted case detection and incidence rates became more precise when the geospatial component was added to the model (Additional file 2). In the covariate only model, the coefficient for the spatial lag covariate was not statistically significant, but became significant with the inclusion of both random effects.
Incidence and case detection rates
Outputs from the best fitting model are presented in TableÂ 1 and show the estimated TB incidence in Sheka Zone was 198 (95% CrI: 187, 233) per 100,000 per year in 2010 and 232 (95% CrI: 212, 253) per 100,000 per year in 2014.
The model demonstrated a wide discrepancy between the estimated incidence rate and the notification rate, with estimated incidence 1.4 (2014) to 1.7 (2010) times the notification rate, placing CDR at 60% (95% CrI: 49, 72) in 2010 to 71% (95% CrI: 60, 81) in 2014 (Fig.Â 1).
Covariate effects on incidence and detectability
Coefficients of the covariates included in the four candidate hierarchical binomial mixture models are presented in the Additional file and the coefficients for the best fitting model are presented in TableÂ 2. Our methodological approach enabled us to present predictors of incidence and CDR separately.
As shown in Table 2, TB incidence rate was positively associated with population density and spatial lag in the geographically adjacent sites. An increase of 10/100,000/year in average TB incidence in adjacent kebeles predicted a 5.0/100,000/year increment in TB incidence in an index kebele. Similarly, an increase of population size by 10 per square kilometre predicted an increase in TB incidence by 1/100,000/year. Population density remained a significant predictor of TB incidence in all candidate models (Additional file 2). All the models except the covariate only model demonstrated the statistically significant effect of incidence at neighbouring locations. In the best fitting model, TB incidence was not significantly related to incidence at a temporal lag of one year (Table 2).
On the other hand, CDR was related to the presence of a health facility in the best fitting model, as well as in all other candidate models considered in this study. The estimated case detection rate in kebeles with no health facility was 59% (95% CrI: 55, 63), while the rate was 69% (95% CrI: 62, 77) in kebeles with a health facility.
Spatial distribution
Maps of the spatial distribution of estimated incidence and notification rates of TB presented in Fig. 2 revealed the presence of undetected TB cases at kebele level in Sheka Zone. The patterns observed in the maps of incidence and notifications appeared broadly correlated. However, the incidence map shows areas of considerably greater burden and identifies new, previously unrecognised areas of high burden. The incidence map identified many rural kebeles without a health facility and urban kebeles as high burden locations, in contrast to the notification map that identified mainly urban kebeles. These locations corresponded to kebeles with high population density and were surrounded by high incidence kebeles. The maps illustrate that estimated incidence rates are higher than notification rates, highlighting that notification data markedly underestimate incidence.
Goodness of fit of the model
We calculated a Bayesian Pvalue by comparing a Pearson chisquare for both the simulated datasets and the actual dataset. The proportion of times the discrepancy measure for simulated data is greater than the actual data (a Bayesian Pvalue) from a posterior predictive check was 0.40. The Bayesian Pvalue suggests that the model fits the observed data satisfactorily [15, 18]. As an internal validity check, we simulated using posterior parameter values whether our model could reproduce the notification data that were previously used to estimate the model parameters. Comparison of notification data with simulated data indicated a close fit (Fig.Â 3). In addition, comparison of priors with their posteriors also indicated that our estimates were entirely datadriven (Fig.Â 4).
Discussion
The Bayesian hidden Markov model approach used in this study provided a means to estimate both TB incidence and case detection rate, and identified previously unrecognised TB hotspots in rural Ethiopia. To our knowledge, this is the first report applying spatially explicit binomial mixture methods to estimate TB incidence and case detection using only case notifications distributed over space and time.
By using Bayesian inference, we were able to incorporate the spatial dependence structure, random effects and a hidden state (true TB incidence) into our model. The improved fit achieved by including a geospatial term highlights the importance of accounting for spatial correlation in TB studies [8]. In this study, incorporation of the spatial dependence structure changed estimated relationships between covariates and incidence: some relationships changed from significant to nonsignificant, while others remained significant but with reduced precision. Like others [19, 20], we conclude that failing to account for spatial effects may lead to inflated regression coefficients and spuriously narrow credible intervals.
In contrast to previous analyses that described patterns in the notified cases (as a proxy for incidence) [7, 8], our model explicitly separated the true incidence (hidden state) and observation process (case detection). A high or increasing notification rate could reflect either an efficient health system rapidly diagnosing all incident cases or a poor health system failing to detect patients quickly enough to gain control of the epidemic. As these two phenomena lie at opposite poles of programmatic TB control and would be associated with profoundly different case detection rates, it is critical to distinguish them objectively and without the need for opinionbased estimates of case detection [21].
Applied to five years of TB surveillance data in Sheka Zone Ethiopia, our model demonstrated a wide discrepancy between the estimated incidence rate and notification rate in areas with no health centres, which is consistent with WHO estimates [22, 23], highlighting the importance of strengthening surveillance systems to reduce missed cases [23, 24]. Unlike the WHO estimates, however, we estimated that the incidence of tuberculosis is increasing, and that increased notification rates reflect both increased detection and increased incidence.
The estimated TB incidence in this study is highly spatially heterogeneous, replicating previous reports [10, 20, 25], and associated with average incidence in the neghbouring kebeles (spatial lag) and population density, reflecting an extended duration of contact between individuals as a driver of TB epidemiology in highburden settings [26, 27]. In contrast, TB incidence at a temporal lag of one year in this study failed to replicate the statistically significant association observed in the previous analysis using GLMs [10].
In our study, health facility availability predicted high TB case detection. Higher notification rates in some of the studied kebeles were attributable to both underlying higher incidence and higher case detection in the setting of health facility availability.
In contrast to methods used by WHO to estimate TB incidence, our approach uses only routinely collected surveillance data and does not incorporate costly prevalence surveys and expert opinion regarding CDR, which is criticized for its insensitivity to recent changes and the potential for bias [1]. Expert opinion is a potentially important strategy to use in a Bayesian models where information is lacking, but has a recognised limitations in accurate adjustment of disease estimates [3] and is widely considered the lowest standard of empirical evidence [28]. In addition, it does not exist at fine geographical levels, such as the one we are assessing here, and so is impractical to use in this context. Hence, our approach uses expert opinion only for the purpose of developing a vague prior probability distribution. We aim to improve on expert opinion through the use of the structured hidden Markov Model and so to develop a feasible alternative approach that can be used for regular monitoring of TB incidence and is reactive to recent changes.
Our case detection model assumes that individual TB cases are detected at a fixed rate and independently (conditional on a given incidence). This assumption may not be valid in the case of concerted efforts at contact tracing. However, as contact tracing is not systematically implemented in Ethiopia, the proportion of cases in our study arising from contact tracing would be small and this assumption would be a minor consideration. Moreover, we included a random effect term (É·_{ij}) in our model to allow for changes in detection rate over place and time. Sparse data were accounted for by including random effects to explain extramodel variability. Further work is in progress to build models that account for over dispersion, which may arise from nonindependent detections of individuals.
Conclusions
By using Binomial mixture models, we were able to investigate different epidemiological questions related to the size, trend and predictors of TB incidence and CDR. Our model demonstrated a wide discrepancy between incidence rate and notification rates, and identified previously unrecognised TB hotspots in rural Ethiopia, but was broadly consistent with official estimates. This approach provides an alternative approach to estimating incidence, entirely independent of the methods involved in current case detection rate estimates and is feasible to perform from routinely collected surveillance data.
Abbreviations
 CDR:

Case detection rate
 CrI:

Credible Interval
 DIC:

Deviance information criteria
 TB:

Tuberculosis
References
Avilov KK, Romanyukha AA, Borisov SE, Belilovsky EM, Nechaeva OB, Karkach AS. An approach to estimating tuberculosis incidence and case detection rate from routine notification data. Int J Tuberc Lung Dis. 2015;19(3):288â€“94.
Glaziou P, Sismanidis C, Zignol M, Floyd K: Methods used by WHO to estimate the global burden of TB disease. 2016.
World Health Organization. Global tuberculosis report 2016. Geneva: World Health Organization; 2016.s
Borgdorff MW. New Measurable Indicator for Tuberculosis Case Detection. Emerg Infect Dis. 2004;10(9):1523â€“8.
Touray K, Adetifa I, Jallow A, Rigby J, Jeffries D, Cheung Y, Donkor S, Adegbola R, Hill P. Spatial analysis of tuberculosis in an urban west African setting: is there evidence of clustering? Tropical Med Int Health. 2010;15(6):664â€“72.
Maciel ELN, Pan W, Dietze R, Peres RL, Vinhas SA, Ribeiro FK, Palaci M, Rodrigues RR, Zandonade E, Golub JE. Spatial patterns of pulmonary tuberculosis incidence and their relationship to socioeconomic status in Vitoria, Brazil. Int J Tuberc Lung Dis. 2010;14(11):1395â€“402.
Randremanana RV, Richard V, Rakotomanana F, Sabatier P, Bicout DJ. Bayesian mapping of pulmonary tuberculosis in Antananarivo, Madagascar. BMC Infect Dis. 2010;10:21â€“21.
Souza WV, Carvalho MS, Albuquerque Mde F, Barcellos CC, Ximenes RA. Tuberculosis in intraurban settings: a Bayesian approach. Tropical Med Int Health. 2007;12(3):323â€“30.
KÃ©ry M, Schaub M. Bayesian population analysis using WinBUGS: a hierarchical perspective. Oxford: Academic Press; 2012.
Shaweno D, Shaweno T, Trauer J, Denholm J, McBryde E. Heterogeneity of distribution of tuberculosis in Sheka Zone, Ethiopia: drivers and temporal trends. Int J Tuberc Lung Dis. 2017;21(1):79â€“85.
Huang X, Clements AC, Williams G, Mengersen K, Tong S, Hu W. Bayesian estimation of the dynamics of pandemic (H1N1) 2009 influenza transmission in Queensland: A spaceâ€“time SIRbased model. Environ Res. 2016;146:308â€“14.
Lawson AB. Statistical methods in spatial epidemiology. 2nd ed. England: Wiley; 2013.
KÃ©ry M, Royle JA. Hierarchical Bayes estimation of species richness and occupancy in spatially replicated surveys. J Appl Ecol. 2008;45(2):589â€“98.
Kâ€™ery M, Royle A. Inference About Species Richness and Community Structure Using SpeciesSpecific Occupancy Models in the National Swiss Breeding Bird Survey MHB. In: Modeling Demographic Processes in Marked Populations. Volume 3, edn. Edited by Patil GP, Gregoire TG, Lawson AB, Nussbaum BD. Newyork, NY: Springer; 2009:639â€“656.
KÃ©ry M. Introduction to WinBUGS for ecologists: Bayesian approach to regression, ANOVA, mixed models and related analyses. Oxford: Academic Press; 2010.
World Health Organization. Global tuberculosis report 2015. Geneva: World Health Organization; 2015.
Martin J, Royle JA, Mackenzie DI, Edwards HH, Kery M, Gardner B. Accounting for nonindependent detection when estimating abundance of organisms with a Bayesian approach. Methods Ecol Evol. 2011;2(6):595â€“601.
Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis, vol. vol. 2. FL, USA: Chapman & Hall/CRC Boca Raton; 2014.
Clements ACA, Lwambo NJS, Blair L, Nyandindi U, Kaatano G, Kinung'hi S, Webster JP, Fenwick A, Brooker S. Bayesian spatial analysis and disease mapping: tools to enhance planning and implementation of a schistosomiasis control programme in Tanzania. Tropical Med Int Health. 2006;11(4):490â€“503.
Congdon P. Spatiotemporal Frameworks for Infectious Disease Diffusion and Epidemiology. Int J Environ Res Public Health. 2016;13:1261. doi:10.3390/ijerph13121261.
Strengthening Health systems to stop TB: A Peoplecentered Approach. https://www.msh.org/resources/strengtheninghealthsystemstostoptbapeoplecenteredapproach. Accessed 3 June 2016.
World Health Organization. Global tuberculosis report 2013. Geneva: World Health Organization; 2013.
Gains in tuberculosis control at risk due to 3 million missed patients and drug resistance. http://www.who.int/mediacentre/news/releases/2013/tuberculosisreport2013/en/. Accessed 2 Feb 2017.
Claassens M, Jacobs E, Cyster E, Jennings K, James A, Dunbar R, Enarson D, Borgdorff M, Beyers N. Tuberculosis cases missed in primary health care facilities: should we redefine case finding? Int J Tuberc Lung Dis. 2013;17(5):608â€“14.
Dowdy DW, Golub JE, Chaisson RE, Saraceni V. Heterogeneity in tuberculosis transmission and the role of geographic hotspots in propagating epidemics. Proc Natl Acad Sci U S A. 2012;109(24):9557â€“62.
World Health Organization. WHO policy on TB infection control in healthcare facilities, congregate settings and households. Geneva: World Health Organization; 2009.
Classen CN, Warren R, Richardson M, Hauman JH, Gie RP, Ellis JH, van Helden PD, Beyers N. Impact of social interactions in the community on the transmission of tuberculosis in a high incidence area. Thorax. 1999;54(2):136â€“40.
Donaldson SI, Christie CA, Mark MM. What counts as credible evidence in applied research and evaluation practice?. USA: Sage; 2009.
Acknowledgements
The authors thank Sheka Zone Health Department, Ethiopia, and all health centres for permission to access the data; and the Ethiopian Central Statistical Agency, Ethiopia, for the provision of kebele polygons. The authors are grateful to Mr.Tamrat Shaweno, Mr. Elesa Moges and Mr. Sewnet Dasho for data collection and coordination.
Availability of data and materials
The spatial data that support the findings of this study were made available by the Central Statistical Agency (CSA) of Ethiopia, and so are not publicly available. The code required to run the above analysis using R2WinBUGS library is provided as an additional file.
Author information
Authors and Affiliations
Contributions
DS and EM wrote the code in WinBUGS, DS drafted the initial study concept and EM, JD & JT refined this further. DS wrote the initial draft of this manuscript, and all authors provided input into revisions and approved the final draft and submission for publication.
Corresponding author
Ethics declarations
Ethical approval and consent to participate
The study was approved by the Melbourne University Health Sciences Human Ethics Subcommittee, Melbourne, VIC, Australia, and the Zonal Health Department, Sheka Zone, Masha, Ethiopia.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisherâ€™s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
Model code. (DOCX 14Â kb)
Additional file 2:
Outputs from Candidate Models. (DOCX 21Â kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Shaweno, D., Trauer, J.M., Denholm, J.T. et al. A novel Bayesian geospatial method for estimating tuberculosis incidence reveals many missed TB cases in Ethiopia. BMC Infect Dis 17, 662 (2017). https://doi.org/10.1186/s1287901727590
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1287901727590
Keywords
 Tuberculosis
 Incidence
 Spatial analysis
 Binomial mixture models