We collected data on TB patients diagnosed between 2010 and 2014 in all 66 *kebeles* (the smallest geographical administrative unit in Ethiopia) of Sheka Zone (a remote zone in the country). These TB cases were pooled based on *kebele* of residence and year. We produced population density using census data and *kebele* shape files obtained from the Central Statistical Agency (CSA). Data on health facility availability were obtained from the Zonal health department. At the time of data collection, close to 50% of health facilities in the Zone did not have sputum microscopy, and there was no access to chest x-ray and culture facilities. Further details of the data and the study area are presented elsewhere [10].

### Model development

Hidden Markov models (Bayesian geospatial binomial mixture models) were developed to estimate the true underlying but unknown TB incidence and case detection rates. Hidden Markov models(HMMs) are flexible time series models for sequences of observations that are known to be driven by an underlying state which is hidden from the observer yet is in some way predictable, such as being serially auto-correlated, geospatially correlated or through predictor variables that are observable. The value of HMMs is their ability to predict the underlying process; which they do provided the second process (the relationship between hidden and observed data) is one that itself follows rules that can be defined and in turn estimated.

Here, the underlying process is the factors that drive TB incidence in the study region. We have information including the population density, serial data and geospatial position of the *kebele*. The process that relates incidence to notification is the case detection rate or the proportion of cases that are notified. Given that (by definition) notification is the product of incidence and case detection rate, the natural model choice is the binomial relationship between the observation (notification) and hidden state (incidence). We also allow for *kebele*-specific effects on incidence, in effect leading to a (higher variance) beta-binomial distribution for observed notifications in each *kebele*.

Such a model when challenged by data can lead to issues of identifiability in that both high incidence/low case detection and low incidence/high case detection can explain the same notification rate. However, with sufficient information (e.g. predictors of incidence, changes over time and further information to assist in the observation model, such as presence of a health centre), the precision of model estimates can increase.

The models are informed by spatially and temporally replicated TB case counts and yield estimates of the true incidence and case detection. The parameters of this state process describe the spatiotemporal variation in incidence, which is considered as a latent variable by the model and is our key output of interest.

### The state-space (the true underlying incidence) model

The number of incident TB cases (the latent state) in site *i* and year *j,* conditional on the expected mean λ_{ij} is a realisation of a Poisson distribution, where the expected number λ_{ij} is a product of the per capita TB rate (π_{ij}, measured as a probability between 0 and 1) and the susceptible population size in year *j* at site *i*:

$$ {Incidence}_{ij}\sim Poisson\ \left({\lambda}_{ij}\right) $$

(1)

$$ {\lambda}_{ij}={\pi}_{ij}\times {Population}_{ij}, $$

(2)

The site index *i* runs from 1 to 66, representing the 66 *kebeles* and the year index *j* runs from 1 to 5. The logit transformed probability of incident TB (*π*
_{
ij
}
*)* is, in turn, a logit-linear function of the site- and year-specific covariates where population density (*X*
_{
i
}), average incidence rate in *kebeles* that share a border with the index *kebele (Z*
_{
ij
}
*)*, and logit transformed incidence rate at a temporal lag of one year *(π*
_{
ij-1
}) with intercept *β*
_{
0
} and slopes *β*
_{
1
}
*, β*
_{
2
} and *β*
_{
3
} (eq. 3) were fitted as fixed effects. Extra-Poisson dispersion in the incidence is accounted by specifying two types of random effects: a spatially correlated random effect (ɛ) and a non-spatially correlated random effect (ν).

$$ \log it\left({\pi}_{ij}\right)=\beta +{\beta}_1{X}_i+{\beta}_2{Z}_{ij}+{\beta}_3\log it\left({\pi}_{ij-1}\right)+{v}_{ij}+{\varepsilon}_i $$

(3)

Spatial dependence was introduced into regression in two ways: by introducing spatially structured random error and spatial lag. The spatially structured random effect, **ɛ** = (ɛ_{1}, … ɛ_{66}), accounts for the effect of spatial proximity, with the prior distribution taken as a Gaussian Conditional Autoregressive function (CAR), in which the prior probability distribution of the value of ɛ_{i} has a mean equal to the weighted average of the neighbouring random effects [11, 12] and variance following an inverse gamma distribution (shape = 0.5, scale = 0.0005). Average TB notification rate in the *kebeles* adjacent to an index *kebele* was used to define spatial lag.

### The case detection model

Notified TB cases have two sources of variation: variation in the true underlying incidence (*Incidence*
_{
ij
}) and variation in the case detection process (*P*
_{
ij
}
*,*). As a description of the detection process giving rise to detected (observed) cases, we assumed detected cases (*Y*
_{
ij
}) by health facilities are realisations of a binomial process conditional on the underlying true incidence and case detection probability. The logit transformed case detection probability (*P*
_{
ij
}) is a linear function of the covariate of health facility (*H*) availability.

$$ {Y}_{ij}\sim Binomial\left({P}_{ij},{Incidence}_{ij}\right), $$

(4)

$$ \log it\left({P}_{ij}\right)=\theta +{\theta}_1{H}_i+{\omega}_{ij} $$

(5)

Hence θ_{1} is the log(odds ratio) of detection in the presences of a health centre compared with no health centre. To account for additional heterogeneity in CDR not captured by this covariate, we fitted a normally distributed random effect that differed by *kebele* and year (*ɷ*
_{
ij
}). The prior distribution of this error term had a mean of zero and a standard deviation drawn from the uniform (0, 5) distribution.

A non-informative uniform prior distribution (−10, 10) was chosen for all regression coefficients and intercept parameters (other than those already specified) to express the absence of prior information about model parameters.

We used WinBUGS 1.4.3 to fit the model using Markov chain Monte Carlo (MCMC). MCMC is a simulation tool to draw large samples from the Bayesian posterior distribution of parameters that is typically analytically intractable [13]. The model was executed in R version 3.3.1 using the R_{2}WinBUGS library. To enhance convergence of the MCMC sampler, we standardized population density by dividing the difference between each observation and the group mean by their respective standard deviations and also truncated the normal distributions for over dispersion effects to within (−16, 16) by multiplying with an indicator uniform prior distribution [14]. We ran the model for 250,000 iterations and discarded the first 50,000. We checked whether the priors were too restrictive, by inspecting a histogram of the posterior [15], with no such evidence found.

As we were executing two interdependent logit models, the model initially did not appear to find realistic parameter space for incidence rate and case detection rates (with estimated incidence rates far exceeding observed rates). Thus the model was forced into realistic parameter space [16] by restricting the maximum incidence rate to 1%, corresponding to 1000 per 100,000 per year (this is a realistic upper limit, being five times that of current notification rates and only reported in one country in the world in 2015).

Four candidate Bayesian geospatial models were considered:

Model 1 Covariate only; *logit*(*π*
_{
ij
}) = *β* + *β*
_{
1
}
*X*
_{
i
} + *β*
_{
2
}
*Z*
_{
i j
} + *β*
_{
3
}
*logit*(*π*
_{
ij − 1
})*.*

Model 2 Covariates and non-spatially correlated random effect; *logit*(*π*
_{
ij
}) = *β* + *β*
_{
1
}
*X*
_{
i
} + *β*
_{
2
}
*Z*
_{
i j
} + *β*
_{
3
}
*logit*(*π*
_{
ij − 1
}) + *ν*
_{
ij
}
*.*

Model 3 Covariate and spatially correlated random effect; *logit*(*π*
_{
ij
})_{=} *β* + *β*
_{
1
}
*X*
_{
i
} + *β*
_{
2
}
*Z*
_{
i j
} + *β*
_{
3
}
*logit*(*π*
_{
ij − 1
}) + *ɛ*
_{
i
}
*.*

Model 4 Full Model with covariates and all random effects included *logit*(*π*
_{
ij
}) = *β* + *β*
_{
1
}
*X*
_{
i
} + *β*
_{
2
}
*Z*
_{
i j
} + *β*
_{
3
}
*logit*(*π*
_{
ij − 1
}) + *ν*
_{
ij
} + *ɛ*
_{
i
}
*.*

Covariates and random effects were included/excluded to determine if this had an effect on model fit and to determine the extent to which they accounted for the spatial correlation. The effect of spatially correlated random effects was assessed by examining the credible intervals (CrI) of the coefficients of the selected covariates and incidence and case detection estimates.

The deviance information criterion (DIC) statistic was calculated for the models with or without random effect terms to determine if the addition of the geospatial component improved model fit. Maps of the summary statistics of the posterior distributions of predicted incidence and the notified cases were constructed using R.

### Goodness-of-fit

We conducted posterior predictive checks to evaluate whether the models considered could likely have generated datasets that are similar to our observed dataset. This procedure uses parameter values estimated by the model using observed data to generate simulated data sets. Chi-square fit statistic was calculated to quantify the lack of fit both for the observed data and for the simulated data sets, and a Bayesian *P*-value was calculated to quantify the ratio between the fit statistic for the observed data and that of the simulated (perfect datasets under model assumptions) (Additional file 1). A Bayesian P-value close to 0.5 indicates a model fits the data, while a P-value close to zero or one suggests poor fit [15, 17, 18].