Seasonality and the effects of weather on Campylobacter infections

Background Campylobacteriosis is a major public health concern. The weather factors that influence spatial and seasonal distributions are not fully understood. Methods To investigate the impacts of temperature and rainfall on Campylobacter infections in England and Wales, cases of Campylobacter were linked to local temperature and rainfall at laboratory postcodes in the 30 days before the specimen date. Methods for investigation included a comparative conditional incidence, wavelet, clustering, and time series analyses. Results The increase of Campylobacter infections in the late spring was significantly linked to temperature two weeks before, with an increase in conditional incidence of 0.175 cases per 100,000 per week for weeks 17 to 24; the relationship to temperature was not linear. Generalized structural time series model revealed that changes in temperature accounted for 33.3% of the expected cases of Campylobacteriosis, with an indication of the direction and relevant temperature range. Wavelet analysis showed a strong annual cycle with additional harmonics at four and six months. Cluster analysis showed three clusters of seasonality with geographic similarities representing metropolitan, rural, and other areas. Conclusions The association of Campylobacteriosis with temperature is likely to be indirect. High-resolution spatial temporal linkage of weather parameters and cases is important in improving weather associations with infectious diseases. The primary driver of Campylobacter incidence remains to be determined; other avenues, such as insect contamination of chicken flocks through poor biosecurity should be explored. Electronic supplementary material The online version of this article (10.1186/s12879-019-3840-7) contains supplementary material, which is available to authorized users.


S1 GEST model
The generalized structural time series (GEST) model assumes that, conditional on the past, the response variable Y t comes from a parametric distribution with probability (density) function f Yt (y t |θ t ), where θ t is a vector of unknown distribution parameters. Here θ t is restricted to two parameters: θ t = (µ t , σ t ), where µ t is in general a location parameter, σ t a scale parameter. Each parameter (µ t , σ t ) is modelled by a structural time series model and/or linear, non-linear or smooth non-parametric models to account for explanatory variables. Each structural model is a random walk or autoregressive model, and a random seasonality.
Definition: Let Y t be the dependent response variable for t = 1, 2, . . . , T then the GEST model of two parameters (µ t , σ t ), defined as: s 1,t−m + w 1,t g 2 (σ t ) = x 2,t β 2 + γ 2,t + s 2,t γ 2,t = J1 j=1 φ 2,j γ 2,t−j + b 2,t s 2,t = − M −1 m=1 s 2,t−m + w 2,t (S1) where D represents the conditional distribution of the response variable, for k = 1, 2, g k is a known link function (e.g., identity or log link function), β k is a parameter vector of length p k , x k,t are explanatory variable vectors, γ k,t is the unobserved autoregressive (or random walk) trend of order J k , s k,t is the time-varying seasonality, b k,t and w k,t are independently distributed disturbance terms with mean zero and variances

S2 Application of GEST to Campylobacter infections
Here Y t is the weekly number of cases of Campylobacter infections in England and Wales from February 2005 to December 2009, D = N BI(µ t , σ t ) is a negative binomial type I distribution, the explanatory variables are maximum, minimum, and average temperature and the total rainfall of, one, two, three and four weeks before diagnosis, the dispersion parameter is assumed to be constant.
S2.1 GEST with fixed coefficient of temperature and rainfall where ζ t = 1, 2, . . . , T is a fixed linear trend of time, s 1,t is the random seasonality with frequency M = 52 and a random error term w 1,t ∼ N T 0, σ 2 w I T , h and l are the number of weeks before diagnosis. Hence, the predictive mean number of cases of Campylobacter infection per week is: The coefficient of average temperature of two weeks before is positive and highly statistically significant, indicating an increase in the mean number of cases of Campylobacter due to the temperature of two weeks before. The coefficient of the total rainfall is negative and statistically significant, indicating a decrease in the mean number of cases of Campylobacter due to the total rainfall of one week before.  .000 .000 .000 .000 .000 .000 .000 .000 Ave temperature 2 weeks before .000 .049 .000 .000 .180 .000 .000 .069 .964 .000 .375 Total rainfall 1 week before .009 .563 .428 .085 .232 .396 .342 .483 .032 .009 .059 Trend .000 .000 .358 .000 .000 .000 .000 .000 .000 .000 .000 Max temperature 2 weeks before .000 .000 .000 .000 .000 .002 .000 .002 .222 .000 .188 Total rainfall 1 week before .020 .599 .607 .103 .210 .334 .480 .535 .030 .028 .068 Trend .000 .000 .544 .000 .000 .000 .000 .000 .000 .000 .000 Min temperature 2 weeks before .000 .048 .000 .043 .017 .001 .000 .213 .777 .000 .005 Total rainfall 1 week before .038 Table S2 gives the p values at 5% significance level of the estimated parameters of the GEST with fixed coefficient of temperature and rainfall for national and regional weekly cases of Campylobacter in England and Wales. Three models of GEST were fitted with different measures of temperatures (average, maximum and minimum) of two weeks before. The table shows a strong statistical significance of the temperature in the prediction of Campylobacter cases in England and Wales, both at a regional and national level.
is the interaction of a varying coefficient term, weeks, with the temperature plus a nonparametric function of the temperature, where weeks is a factor with 13 levels, each level has four weeks period, weeks = (1, 1, 1, 1, 2, 2, 2, 2, ...., 13, 13, 13, 13), and the nonlinear function f is modelled with P-spline, using "pb" and/or "s" routines from the packages gamlss and mgcv in R . ζ t = 1, 2, . . . , T is a fixed linear trend of time, s 1,t is the time-varying seasonality with frequency M = 52 and a random error term w 1,t ∼ N T 0, σ 2 w I T . S3 The contribution of temperature, rainfall, trend and seasonal terms to the burden of Campylobacter The contribution of temperature, rainfall, trend and seasonal terms to the burden of Campylobacter was estimated by calculating the relative change in the predictive mean caused by each factor and averaging across the entire timedomain T . We compared between the GEST with fixed coefficient of temperature and rainfall model and the GEST with varying coefficient of temperature and fixed coefficient of rainfall. The purpose of this comparison is the see whether the interaction between temperature and weeks has an effect on the seasonality in the fitted mean number of Campylobacter cases.
In formulae: Relative change in the mean number of cases due to the fixed effect of temperature: whereμ t is given by equation S2. Relative change in the mean number of cases due to the varying effect of temperature: whereμ t is given by equation S3. Relative change in the mean number of cases due to the effect of rainfall: whereμ t is given by equation S2, (S3).
Relative change in the mean number of cases due to the effect of trend: whereμ t is given by equation S2, (S3). Relative change in the mean number of cases due to the effect of seasonality: whereμ t is given by equation S2, (S3).  Table S4 shows that the relative change associated with the seasonality is much smaller in the model with varying coefficients, also the relative change associated with the temperature is much higher in the model with varying coefficients. This suggests that, in contrast with the model with fixed coefficients, the model with varying coefficients is able to capture most of the variability of Campylobacter cases (resulting in a smaller seasonal component), although understanding the underlying bio-physical mechanism requires further research.

S4 Hierarchical clustering analysis
Let Y t be the weekly number of cases of Campylobacter infection in thirty three Strategic Health Authorities in England and Wales from January 1989 to December 2009, and consider the conditional negative binomial type I distribution N BI(µ t , σ t ) of the response variable Y t in the GEST model, without explanatory variables and a fixed dispersion parameter. The fitted seasonality of the GEST for Campylobacter infections were clustered using the Ward's minimum variance algorithm in R.

S4.1 Avon, Gloucestershire and Wiltshire SHA
Here is an example of GEST decomposition of Campylobacter cases in Avon, Gloucestershire and Wiltshire Strategic Health Authority with a conditional negative binomial distribution. The trend was fitted with a random walk order two and the seasonality was fitted with random seasonality of frequency 52 weeks.

S5 Wavelet analysis
Wavelet analysis decomposes the time series into a sum of wavelets, i.e. basic, time-localized, small, waves of a particular frequency (Cazelles 2007). Accordingly, the (continuous) wavelet transformation is the convolution: where x(t) is the signal (e.g. the time series of Campylobacter cases) to be transformed, t is the time,ψ ( t− a ) represents the wavelet (in general is a complexvalued function), the subscripts x refer to the time series under study, and the superscript ' * denote the complex conjugate form. Here we used a wellestablished functional forms for the wavelet (the Morlet wavelet) for which the scale a is almost equal to the period of the wavelet (Torrence 1998). The squared wavelet transform, W x (a, τ )W * x (a, τ ) is referred as power spectrum which is usually displayed as a contour plot on a time-period plane. The power spectrum quantifies the contribution of the various components of the time series x(t) of different periods (also denoted as 'harmonics) at different times (Cazelles 2007). The power spectrum averaged over the entire time domain is referred to as 'global power spectrum.
Wavelet analysis can be applied to compare two different non-stationary time series, e.g. disease incidence and weather time series. This is usually done by studying the wavelet cross-spectrum and wavelet coherence. The cross-spectrum is defined as the product of the single wavelet transforms the two different time series x(t) and y(t) i.e. W x (a, )W * y (a, ); the wavelet coherency is defined as the cross-spectrum normalized by the spectrum of each time series after smoothing in both time and scale. The wavelet coherency, which is bounded between 0 and 1, quantifies the strength of a linear relationship between the various corresponding harmonics of two non-stationary time series at different times. Finally, possible time-lags between the harmonics of the two time series can be detected by studying the phase difference which can be obtained by appropriate relationship between the imaginary and the real part of the wavelet cross-spectrum (Cazelles 2007).
The analysis of reported Campylobacter infections was strongly affected by under-reporting during weekend and Bank Holidays (the reported cases per day of the week consistently dropped on Saturdays and Sundays). The wavelet analysis of the power spectrum and the global spectrum exhibited a strong weekly seasonality, with an additional 3.5 day harmonic ( Figure S5). The strong peaks at 3.5 and 7 days were removed by using adjusted data with a seven day rolling mean to correct for day of week and bank holidays (Nichols 2012) Figure S5 shows: a) Average weekly reported Campylobacter cases averaged over 20 years (from 1989 to 2009). All time series were square root transformed and then normalised to sum to unity. c)wavelet power spectrum of the transformed time-series of Campylobacter. Low values of the power spectrum are shown in dark blue, and high values in dark red. The dotted white lines show the maxima of the undulations of the wavelet power spectrum and the dotted-dashed black lines show the 5% significant levels computed based on 100 bootstrapped series. The light blue shaded areas identify the region subjected to errors arising from dealing with a finite-length time series (edge effect). e) global average wavelet power spectrum g) original and reconstructed time-series according to all harmonics and the selected first 3 harmonics only. b), d), f) h) As in figures a),c) e) and g) but after the time-series of Campylobacter cases were adjusted using a seven day rolling mean, removal of bank holiday artefacts and adjusted for long term trend.