We first perform a simple exploratory data analysis, to investigate visually relations between new malaria cases, rain and mosquito abundance data. We then study the association between the longitudinal measurements and the event of interest through a Bayesian joint model. The mosquito abundance represents the longitudinal component, while the time to malaria is the survival part of the joint model. Further explanations on the derivation of the joint models can be found in the book [13].

### Longitudinal component for anopheles mosquitoes abundance

The time unit in our models is the week *t*. Let *x*
_{
i
}(*t*) be the measured abundance of mosquitoes at week *t* for child *i*, as measured in her/his village. As abundance data were measured once every month, *x*
_{
i
}(*t*) is constant over four week periods. Let *k*(*i*)∈1,2,…,16 indicate the village of child *i*. We assume

$$ x_{i}(t) = N_{k(i)}(t) + \epsilon_{i}(t), $$

(1)

where *N*
_{
k(i)}(*t*) is a real valued latent process describing a regularized version of the mosquito abundance at time *t* in the village *k*(*i*) of child *i*, once a measurement error \(\epsilon _{i}(t) \sim {\mathcal {N}} (0, \sigma ^{2})\), independent of the other variables, is subtracted from the actual counts.

Next we define a useful weather related covariate. In Ethiopia there are three yearly seasons, namely Bega (dry season), Kiremt (long rainy season) and Belg (short rainy season), based on the magnitude and distribution of rainfall across the 12 months of the year [11]. In [10] these three seasons where identified for the two years and for the region of southwest Ethiopia in our study as the following seven climate periods:

$$ {\begin{aligned} \text{period}(t)= \left\{\begin{array}{ll} 1, &\text{when t} \in \text{[weeks in July] in 2008}\\ 2, & \text{when t} \in \text{[weeks from August to November] in 2008} \\ 3, &\text{when t} \in \text{[weeks from December to March] in 2009}\\ 4, &\text{when t} \in \text{[weeks from April to July] in 2009 }\\ 5, &\text{when t} \in \text{[weeks from August to November] in 2009} \\ 6, &\text{when t} \in \text{[weeks from December to March] in 2010} \\ 7, &\text{when t} \in \text{[weeks from April to June] in 2010}. \\ \end{array}\right. \end{aligned}} $$

(2)

The first and last periods are shorter than others, which last four months. See also [11] about this definition of the seasonal periods.

The daily rainfall (in mm/day) precip(*s*) is available for each day *s*, but we found it useful to use smoothed versions. In our model the best fit, based on comparison of the Deviance Information Criterion (DIC) [14], was obtained when averaging over the above climate periods: we define

$$rain(t) = \frac{1}{\text{number of days in period}(t)} \sum_{s \in \text{period}(t)} \text{precip}(s) $$

In order to smooth the step function *r*
*a*
*i*
*n*(*t*), we approximated it with a natural cubic spline with three degrees of freedom and denote as *S*
_{1}(*r*
*a*
*i*
*n*(*t*)) and *S*
_{2}(*r*
*a*
*i*
*n*(*t*)) the two B-spline base function components of *r*
*a*
*i*
*n*(*t*), see [12] and [15] for details. In our data there is one global measure of rainfall shared by all children in all villages, which explains the seasonal pattern of the mosquito abundance. Let *t*
*e*
*m*
*p*(*t*) be the monthly average temperature and *h*
*u*
*m*
*i*
*d*(*t*) the monthly average humidity of the month of week *t*. We denote by *c*
*o*
*r*
*r*
*u*
*g*
*a*
*t*
*e*
_{
i
} the variable taking value 1 if the house of child *i* had a corrugate iron roof, 0 otherwise. Finally, let *d*
*i*
*s*
*t*
_{
k(i)} be the distance from the center of the village *k*(*i*) of child *i* to the closest dam shore.

For the latent abundance *N*
_{
k(i)}(*t*) we assume the following random effect model:

$$ \begin{array}{r} N_{k(i)}(t) = \beta_{0} + \beta_{1} S_{1}(rain(t)) + \beta_{2}S_{2}(rain(t))+\beta_{3} dist_{k(i)} \\ \beta_{4} temp(t) + \beta_{5} humid(t) + \beta_{6} corrugate_{i} \\ + w_{i0} + w_{i1} S_{1}(rain(t)) + w_{i2}S_{2}(rain(t)), \end{array} $$

(3)

where *β*=(*β*
_{0},*β*
_{1},*β*
_{2},*β*
_{3},*β*
_{4},*β*
_{5},*β*
_{6}) is the vector of fixed effects and *w*
_{
i
}=(*w*
_{
i0},*w*
_{
i1},*w*
_{
i2}) is the vector of random coefficients with three component. We assume that *w*
_{
i
} is a priori normally distributed with zero means and covariance matrix *D*

$$w_{i} = (w_{i0}, w_{i1}, w_{i2}) \sim \mathcal{N}_{3} (0, D), $$

where *D* is specified below. We also tested a model without the distance as covariate.

### Time to event model component for malaria contraction

Let *T*
_{
i
} denote the time (in weeks) to malaria infection for child *i*, which can be censored at the study end point in week 100. We used a Cox proportional hazard model [16], where the hazard of child *i* at time *t* is

$$ \begin{aligned} h_{i}(t)&=h_{0}(t)\exp \left\{ \theta_{1} age_{i}+ \theta_{2} gender_{i}+\alpha_{1} { N_{k(i)}(t)}\right.\\ &\qquad\qquad\qquad\left.+ \alpha_{2} N_{k(i)}(t)I_{k(i)}(t) \right\}. \end{aligned} $$

(4)

Here *a*
*g*
*e*
_{
i
} is the age of child *i* at week one, *g*
*e*
*n*
*d*
*e*
*r*
_{
i
} the gender of the child and the covariate *I*
_{
k(i)}(*t*) is the total number of new malaria cases in the village *k*(*i*) of child *i* in the last three weeks before and including week *t*. This is a proxy for a local incidence of malaria because the infectious period of malaria infected children lasts approximately 3 weeks. The term *N*
_{
k(i)}(*t*)*I*
_{
k(i)}(*t*) represents the interaction between mosquito abundance and incidence. We also tested separately the incidence with a two weeks windows. As suggested in [12], we model the baseline logarithm of the hazard *h*
_{0}(*t*) with a B-spline with a quite large number of knots, 15 in our analysis, in order to allow for flexibility. Knots are equally spaced according to the percentiles of the observed event times. We assume

$$\log h_{0}(t) = \phi_{0} + \sum_{z=1}^{15}\phi_{z}B_{z}(t,\nu) $$

where *B*
_{
z
}(*t*,*ν*) denotes the *z*
^{th} basis function of a B-spline with knots *ν*=*ν*
_{1},*ν*
_{2},...,*ν*
_{15} and *ϕ* is the vector of all spline coefficients, which are then penalised for smoothness, penalising with the integral of the second derivative of the fitted log hazard, as explained in [12], using a penalty coefficient denoted by *λ*.

### Priors

The prior probability distribution for the fixed effects (*β*
_{0},*β*
_{1},*β*
_{2},*β*
_{3},*β*
_{4},*β*
_{5},*β*
_{6}) of the linear mixed effects model is assumed to be normal with means equal to zero and a seven by seven precision matrix, i.e. the inverse of the covariance matrix, with diagonal elements equal to 0.01 and zero otherwise.

The prior probability distribution for the regression coefficients (*θ*
_{1},*θ*
_{2}) of the survival model in (4) is assumed normal with mean zero and a 2 by 2 precision matrix with diagonal 0.01 and zero otherwise.

A normal probability distribution prior is also assumed for the association parameters (*α*
_{1},*α*
_{2}) in the survival model, again with means equal to zero and a 2 by 2 precision matrix with diagonal elements 0.1 and zero otherwise.

For the parameters of the log baseline hazard function we follow [17] and assume for the coefficients *ϕ* of the B-spline an improper prior and for the smoothing parameter *λ* a Gamma hyperprior as follows:

$$P(\phi| \lambda) \propto \lambda^{\frac{r(A)}{2}}\exp\bigg(-\frac{\lambda}{2}\phi^{T}A\phi \bigg) $$

where \(A = \Delta _{2}^{T}\Delta _{2}\), where *Δ*
_{2} is the order two difference penalty matrix and *r*(*A*) denotes the rank of *A*. For the smoothing parameter *λ* we assume a Gamma (1,0.005) prior, which leads to a proper posterior for *ϕ* ([17]). More detailed information can be found in [12, 15].

The prior for the precision matrix *D*
^{−1} of the the random effects in the linear mixed model is a Wishart distribution, because it is the conjugate prior for the precision matrix in Gaussian models [18]. For the hyperprior we performed a preliminary run of the *lme*-function in *R* for model (1) and (3) to obtain an estimate of *D*
^{−1}, which we used to fix the scale matrix in the prior as follows:

$$D^{-1} \sim Wish\left(\left[\begin{array}{lll} 53903.4 & 1980.7 & -1096.7\\ 1980.7 & 75.9 & -42.1\\ -1096.7 & -42.1 & 25.6\end{array}\right], 3\right).$$

We also used this scale matrix as starting point of the MCMC, to speed up convergence.

A similar technique was used for the prior probability distribution of the precision parameter (*σ*
^{2})^{−1}. This was assumed to be Gamma distributed with hyper-parameters obtained by running the *lme*-function for the mixed effect model (1) and (3). The idea of using a preliminary run in order to fix hyperparameters in the prior in an empirical Bayesian fashion, has been suggested in [12], where also a further general discussion on the choice of priors in the Bayesian joint model is given. See also [19].

### Implementation

We used the R-package JMBayes [12] for our analysis. It implements a full scale MCMC algorithm to sample all parameters in the posterior model. Briefly, the first step is to run the linear mixed effect model on its own and the Cox proportional hazard model with only gender and age as covariates, in order to obtain good starting values for most parameters in the MCMC. Then a Metropolis Hasting algorithm is started, with random walk proposals. Convergence of the MCMC was tested using trace plots and checking robustness of estimates when repeating the runs with different random starting points. We concluded that it was appropriate to run 200,000 iterations in our analysis, after a burn-in of 120,000 iterations. For further details on the algorithm see [12]. Credible intervals are used to describe the variability of the posterior around the posterior mean.