An evidence synthesis approach to estimating the incidence of symptomatic pertussis infection in the Netherlands, 2005–2011

Background Despite high vaccination coverage, infection with Bordetella pertussis is a current public health concern in the Netherlands and other European Union member states. Because surveillance data are subject to extensive under-ascertainment and under-reporting, incidence is difficult to determine. Our objective was to estimate the age-group specific incidence of symptomatic pertussis infection in the Netherlands over the period 2005–2011, using multi-parameter evidence synthesis. Methods Age-specific seroconversion probabilities were estimated for 2007 using Netherlands population data stratified by age-group and cross-sectional population-wide serosurvey (PIENTER-2) data, with a sero-diagnostic cut-off of 125 EU/ml as a proxy for recent infection. Symptomatic probabilities were derived from a study of household contacts and from PIENTER-2. The annual number of symptomatic infected (SI) persons was estimated using evidence synthesis methods in a Bayesian framework, by combining the estimated incidence of infection with notification data and symptomatic probabilities. Results An incidence rate of 128 SI cases per 10,000 population (95 % credible interval [CrI]: 110–150) was estimated for 2005, which decreased to 107 per 10,000 (95 % CrI: 91–126) for 2011. The degree of underestimation in statutory notified cases was age-dependent, ranging from 10-fold (10–19 years) to 69-fold (60+ years). The largest annual decreases in SI incidence rate over the study period were in the 1–4 and 5–9 years age-groups (24.3 %, 15.9 % per year, respectively). Conclusions By synthesising all available data, the incidence of symptomatic pertussis and the extent to which SI is underrepresented by notification data can be estimated. Such estimates are essential for disease burden computation and for informing public health priority-setting. Electronic supplementary material The online version of this article (doi:10.1186/s12879-015-1324-y) contains supplementary material, which is available to authorized users.


Background
Infection with Bordetella pertussis is recognised as a current public health concern as it is endemic in the Netherlands and other European Union member states, despite relatively high vaccination coverage. The Dutch National Immunization Program has led to a high vaccination coverage for infants (~96 %) for more than 50 years. A two-fold rise in seroprevalence has been reported between 1995/6 and 2006/7 for persons aged >9 years, from 4.0 to 9.3 % [1], and an increasing trend in notified cases was reported over the period 1996-2012 [2]. For incidence estimation, notified and laboratory-confirmed case counts are of limited value due to extensive under-ascertainment and under-reporting. Indeed, pertussis surveillance data in most countries are inadequate for the accurate estimation of disease burden [3].
Mathematical modelling methods based on the kinetics of IgG-Ptx antibody titres [4] estimated the seroincidence of pertussis infection in the Netherlands to lie between 1 and 6 % per year [5]. Based on prevalence data from a population-wide serosurvey (PIENTER-1), the estimated incidence of infection in 1995/96 for persons aged 3-79 years was estimated at 6.6 % per year [6]. Comparison with the number of notified cases suggested the presence of 100-fold under-estimation of symptomatic infections. The results of similar methods applied to Danish data [7] indicate an even greater extent to which notification data under-estimate the number of infections. Although pertussis seroincidence is demonstrably high in the Netherlands, it is of considerable value to estimate the age-specific incidence of symptomatic infection to obtain an accurate picture of the current disease burden associated with pertussis infection, and of the variation in burden between agegroups.
Our goal was to estimate the unknown annual numbers of persons symptomatically infected (SI) with pertussis in the Netherlands in the period 2005-2011, and thus estimate the degree to which the incidence of SI is under-represented by notification data. Knowledge regarding these quantities will help inform decisionmaking regarding vaccination and other prevention initiatives [3]. In the Netherlands a number of indirect sources of data exist that may be useful for estimating SI. We employed multi-parameter evidence synthesis (MPES) to make optimal use of these available data. MPES is an established methodology for integrating various sources of data to estimate a quantity of interest for which there are no or limited direct data, and has been recently applied to estimating the prevalence or incidence of HIV, hepatitis C, and influenza virus infection [8][9][10][11].

Methods
The study period chosen was 2005-2011. The start year of 2005 was selected because from the beginning of this year the acellular vaccine replaced the whole-cell vaccine administered to babies (the acellular vaccine as 4-years booster had been administered since 2001) [12].

Data sources
Age-specific population data for the Netherlands for 2005 through 2011 were used to define the base population for each study year; these data were obtained from the website of Statistics Netherlands (http://statline. cbs.nl). We defined six age-groups: <1 year, 1-4 years, 5-9 years, 10-19 years, 20-59 years and 60+ years. PIENTER-2, a cross-sectional seroprevalence survey conducted in the Netherlands in 2006/2007 provided age-group specific data on seroprevalence [13]. Based on a sero-diagnostic cut-off level for IgG pertussis toxin of 125 EU/ml [14], overall seroprevalence in the Netherlands general population was estimated at 3.4 % for 2006/2007 [1].
Since 1976 notification of pertussis to the Inspectorate of Health Care has been obligatory by law in the Netherlands. Notification data covering the study period were obtained from the Dutch online registration system for infectious diseases. The case definition for pertussis infection includes laboratory confirmation (or close contact with a person with laboratory-confirmed pertussis), and a clinical picture compatible with pertussis (i.e., serious cough with a duration of more than two weeks and/or coughing attacks and/or cough followed by vomiting).
To estimate the proportion of infected persons who are symptomatic, we used two data sources. First, the PIENTER-2 serosurvey [13] recorded the prevalence of coughing symptoms in the past year among infected individuals (defined according to a IgG-Ptx threshold of 62.5 EU/ml). Limited symptom data were available from PIENTER-2 for individuals aged 10 years and older (i.e., applicable to our 10-19, 20-59 and 60+ years agegroups only) [1]. These data were supplemented by data from the BINKI study of household contacts of pertussis-infected infants 6 months old or younger who had been hospitalised [15]. The BINKI study provided sufficient numbers of laboratory-confirmed infected contacts and the numbers reporting typical disease manifestation (defined as at least 2 weeks of coughing and one or more of the following: paroxysmal coughing, posttussive vomiting, inspiratory 'whooping'), for the age-groups 1-4, 5-9, 10-19, and 20-59 years only.
As this modelling study used fully anonymised statutory notification and survey data, formal ethical approval from a medical ethical committee was not required.

Evidence synthesis
We applied multi-parameter evidence synthesis to combine estimates of the incidence of infection (both symptomatic and asymptomatic) derived from the PIENTER-2 seroprevalence survey, annual notified pertussis case numbers, and the data on age-group specific symptomatic proportions, in order to estimate the critical model parameter: the number of SI cases per year, stratified by age-group. In this approach, annual numbers of SI cases are informed by indirect evidence from other model parameters for which observed data and/or prior information is available.
The evidence synthesis approach was a logical choice for addressing our research question, because SI incidence cannot be measured directly but can be inferred from other, existing data. A simple point estimate of SI incidence (i.e., SI = seroprevalence × symptomatic proportion) ignores information on the uncertainty associated with each component, and in the event that a component can be taken from different data sources/studies, the analyst is forced to either select a single source or to combine sources by averaging. The same multiplicative definition underlies our model; however, the evidence synthesis uses all the relevant data, allows several data sources for a given indicator to be integrated in a statistically sound way, and takes into account the uncertainty inherent in all data sources.
Estimation was conducted in a Bayesian framework, which is advantageous for model formulation and the easy incorporation of prior knowledge. The Bayesian framework ensures the correct propagation of uncertainty regarding model parameters (where 'parameter' includes subpopulation sizes, the proportion symptomatic, etc.), and permits any available prior information on these parameters to be flexibly combined with observed data (if available), to produce a posterior distribution. Uncertainty associated with a parameter value is expressed as 95 % credible intervals (CIs) around the median posterior estimates. Figure 1 shows the relationship between the actual sizes of the subpopulations of interest, the observed numbers in each subpopulation (i.e., the true number of seroconversions per year, the number of pertussis cases with symptomatic infection), the conditional probabilities linking the true numbers corresponding to each subpopulation, and the sources of direct evidence (data) informing the model parameters.

Model specification
We distinguish basic from functional parameters within an evidence synthesis model. Basic parameters can be assigned a prior distribution, whereas functional parameters are defined as functions of basic parameters. In the below, the notation N, O, and c a|b indicate the actual number of persons in a subpopulation (the number to be estimated), the observed number, and a generic conditional probability of a given b, respectively. For instance, the subpopulation of individuals with evidence of pertussis infection (seroconversion) in a given year t, N t,S , is related to the total population, N t,Pop , by the conditional probability c t,S|Pop . In Fig. 1, the conditional probabilities c S|Pop are informed by seroconversion data from PIENTER-2; the conditional probabilities c SI|I are informed by data on coughing symptoms among infected persons, from PIENTER-2 and the BINKI study. Fig. 1 Directed acyclic graph of the relationship between model parameters and observed data; for clarity, only one age-group of the stratified model is shown. Distributional and functional relationships are indicated by solid and dashed lines, respectively. Circles indicate model parameters; double circles indicate parameters for which priors (either informative or vague) are applied. N t and O t refer to unobserved and observed numbers of individuals at time t, respectively; c t are conditional probabilities, SIAR t is the symptomatic infection attack rate, and d refers to detection probability Parameters for six separate age-groups (<1 year, 1-4, 5-9, 10-19, 20-59, and 60+ years) were estimated whenever possible; i.e., if age-group specific data and/ or prior information were available. The model computes the posterior distribution over all parameters and Table 1 lists all model parameters and the prior distributions adopted. The model specification ensures that the probability of seroconversion for a given agegroup is correlated across the years of the study period. Through the links between subpopulations (i.e., N I and N SI ), variability in notified case numbers across years indirectly influences temporal variability in the seroconversion probability, c t,S|Pop (and via a functional relation in which vaccination-related seroconversion is adjusted for, the temporal variation in infection probability). Intuitively, years with relatively high number of notified cases would be associated with a stronger infection pressure than years with fewer notified cases.

Model parameters
Basic parameters are those model parameters to which a prior distribution is assigned (see Table 1), and include age-group dependent conditional probabilities and detection probabilities. The probability of seroconversion, c S|Pop , is also time-dependent, because the force of infection is assumed to vary across years. c a,t,S|Pop , Probability of seroconversion in the population, for age-group a in year t c a,SI|I , Probability of being symptomatic given infected, for age group a d a,SI , Proportion of actual SI cases that are observed (i.e., notified), by age-group The probability of infection in the population is functionally related to the probability of seroconversion, by adjusting for the proportion of seroconversions estimated to occur due to previous, recent vaccination (see Additional file 1). The proportion of vaccination-related seroconversions is zero for age-groups older than 1-4 The following parameters relate the numbers of persons between subpopulations: The symptomatic infection attack rate (SIAR) is a functional parameter expressed as the product of agegroup specific conditional probabilities:

Relevant subpopulations and distributional assumptions
The subpopulations of interest are N I and N SI (Fig. 1). Because N SI cannot be observed, the 'detection probability' parameter d SI relates the observed values (i.e., number of notified cases) to the true, or actual numbers (see below).

Infected (both symptomatic and asymptomatic)
The probability of seroconversion was informed by cross-sectional seroprevalence survey data from 2006/7 (PIENTER-2) and a pre-determined sero-diagnostic cutoff level of IgG pertussis toxin. A titre of >125 EU/ml is a highly specific indicator of recent infection (within the previous 6 months) [14]; thus a titre exceeding this threshold served as a proxy for seroincidence [1]. Because of a lack of direct data for all years of the period modelleddata on this parameter were only available from the PIENTER-2 serosurvey carried out from February 2006 until June 2007 [13] this data source was assumed to inform the year 2007 only. Data on age-group specific number of seroconversions (y) and number of persons tested (n) were therefore used directly to inform the prior distribution for the year 2007:

Symptomatic infected
The number of notified (observed) SI cases, O a,SI , was assumed to be binomially distributed, given the true (unobserved) number of symptomatic infections, N a,SI . The detection probability, d a,SI , was given a vague Beta prior distribution since it is unknown. The detection probability varied by age-group but was invariant across years; this encodes the belief that the likelihood of notification is constant over time, and entails that variability in notified cases between years drives variability in the actual number of symptomatic infections. O a;t;SI ∼ Binomial N a;t;SI ; d a;SI À Á The conditional probability of exhibiting symptoms given infection with pertussis was estimated as timeindependent (this parameter was assumed to be biological, and so not be affected by study year), but was stratified by age-group. This parameter was informed by two studies. The first study was based on data from the PIENTER-2 serosurvey conducted in 2005/6 [1]. We constructed Beta priors from the reported prevalence of coughing in the past year among persons with presumptive infection (defined using a 62.5 EU/ml threshold). For the age-groups 10-19 and 20-59, the coughing prevalence was 24 and 22 %, respectively. We extrapolated the value of 36 % (95 % CI: 27-44 %) cited for persons aged 65-79 years in this study to the 60+ years age-group. For age-groups under 10 years, we specified vague Beta priors as no data were available.

Correlation in seroconversion probability across time
Variability and dependence in the force of infection across time were represented by specifying random-walk priors for the probability of seroconversion, c a,t,S|Pop (see Fig. 1). The probability of seroconversion for a given age-group, c a,t,S|Pop , was allowed to vary across time but to be correlated with the probability of seroconversion in previous year(s), through specification of a randomwalk prior for this parameter. The precision of the (logit of the) probability of seroconversion parameter was assigned the vague prior distribution Gamma(0.001,0.001).
The random-walk prior on seroconversion probability, by "borrowing strength" across time, effectively allows for autocorrelation in seroconversion prevalence across successive years. Thus, the posterior probability c a,t,S|Pop for each agegroup a could vary between years, but only to the extent that is determined by prior assumptions and controlled by annual variation in the total notifications. Because of the chain of relationships specified between the estimated number of seroconversions, N a,t,S , and the observed case data informing N a,t,SI (see Fig. 1), the number of notifed cases in a given year influences the posteior probability of seroconversion for that year.

Model inference
For each parameter, sampling of the posterior distributions was carried out via Markov-chain Monte-Carlo methods using OpenBUGS version 3.2.1 [16] and the BRugs package [17] for the R statistical programming environment [18]. BUGS code is provided in Additional file 1. Two independent chains were run for 230,000 iterations, with the first 150,000 iterations treated as burn-in and discarded. Brooks-Gelman-Rubin diagnostic plots were checked to establish that convergence of the chains was satisfactorily achieved.
The presence and magnitude of temporal trends in posterior median estimated SI incidence rates were evaluated using Poisson regression, also using R. Tables 2 and 3 show, for each year of the study period, the posterior summaries of the parameters of interest, and the estimated incidence rates for each subpopulation, aggregating over all age-groups. Both the estimated overall incidence of infection (i.e., N t,I , including both symptomatic and asymptomatic cases) and the overall   (Fig. 2). The extent of uncertainty in the agegroup and year-specific estimates for these parameters means that apparent differences within a particular agegroup or year should only be interpreted considering the precision of the estimates (Fig. 2). We note that agegroup variation within a given year is related to variation in seroconversion probabilities, and temporal variation for a given age-group is related to temporal variation in notified cases.

Results
The overall SI incidence rate was estimated at 128 per 10,000 population in 2005, decreasing to 107 per 10,000 in 2011. Temporal trends in SI incidence rates over the study period varied by age; annual SI incidence rates significantly decreased by 7.8 %, 24.3 % per year and 15.9 % per year on average, for the <1 years, 1-4 years, and 5-9 years age-groups respectively, but increased by 1.7 % per year on average for the 20-59 years age-group (Table 4 and Fig. 3).
Multiplication factors (MFs) to convert notified case numbers to the true number of SI cases were derived as (1/d a,SI ), for each age-group separately (Table 4), and so were not assumed to vary smoothly with age. Under-estimation was most pronounced for We additionally estimated the extent to which the Markov model-estimated proportions of vaccinationrelated high IgG-Ptx titres for the 1-4 and 5-9 years age-groups were influenced by the value assumed for the waning rate parameter. When specifying either a 10 % higher or a 10 % lower value for this parameter, there were only small differences in posterior median SI incidence rate for these two age-groups (Additional file 1: Table S2), which indicates that our prinicipal results were not unduly sensitive to one assumption inherent in the estimation of vaccination-related seroconversions. Finally, prior and posterior distributions for the parameter c a,SI|I are graphically compared in (Additional file 1: Figure S2).

Discussion
Using evidence-synthesis methods, we estimated the incidence of symptomatic pertussis infection over the period 2005-2011 in the Netherlands. Aggregating over age, the estimated SI incidence rate ranged from a low of 76 per 10,000 (95 % CrI: 64-89) in 2010 to a high of 173 (95 % CrI: 147-203) in 2008.
Decreasing trends in estimated SI incidence rates were apparent over study period for all age-groups except for the 20-59 years age-group; the greatest annual average decreases were observed for the 1-4 years and 5-9 years age-groups (−24.3 %, and −15.9 % per annum, respectively), consistent with the replacement of the whole cell by the acellular vaccine (i.e., improving vaccine effectiveness) in the infant vaccination in 2005 (and in the booster since 2001) [1,2]. Previous research has shown that within the period 1996-2011, a declining trend in notification rates was associated with vaccination measures for only those age-groups eligible for vaccination, i.e., up to ages 4-6 years [2]. Our SI incidence rate trends largely concur; however, the weak decreasing trend in SI incidence rate (−1.8 %) that we estimated for the 60+ years age-group appears inconsistent with reported increases in seroprevalence in >9 year-olds over time (between 1995/6 and 2006/7 [1]). In general,   [2]. The age-dependent pattern of SI trends suggests that although the acellular vaccine has prevented more infections in children, circulation has not been greatly affected. We could also estimate the extent to which the national notification system under-represented the numbers of individuals with SI. The derived multiplication factors varied with age-group, from 10 (95 % CrI: [6][7][8][9][10][11][12][13][14][15] to 69 (95 % CrI: 49-96), for the age-groups 10-19 years and 60+ years, respectively. This age-dependent variation in the estimated degree of under-notification may be due to (a combination of ) age differences in disease severity, the likelihood of contacting primary health care, diagnostic practice, and/or reporting bias. Although the estimated MF for the <1 year age-group was also high (41; 95 % CrI: 23-66), suggesting a large number of unreported symptomatic infections in young infants, no correction for vaccination-related high IgG-Ptx titre was attempted for this age-group.
Previous research in the Netherlands has estimated the incidence of either symptomatic or asymptomatic infection from the prevalence of serological markers. Based on an anti-Ptx IgG concentration threshold of 125 EU/ml, de Greeff and colleagues [1] estimated that 3.4 % (95 % CI: 2.8-3.9 %) of the Dutch population aged >9 years had had a pertussis infection within the six months prior to their PIENTER-2 sample date in 2006/ 2007. Synthesising all the evidence available, we estimated the posterior median incidence of seroconversion (i.e., both symptomatic and asymptomatic infection) for persons aged 10+ years at 240,100 and 508,400 persons in 2006 and 2007, respectively, which corresponds to 1.7 % (95 % CI 1.4-1.9 %) and 3.5 % (95 % CI 3.1-4.0 %) of the population aged 10+ years in each of these two years, respectively. These estimates differ from the priors specified for the seroconversion probability, because the posteriors for these parameters are also informed by indirect evidence from the rest of the model.
There has been substantial recent progress in using mathematical modelling methods to understand aspects of the epidemiology of pertussis infection [19][20][21][22], including how natural immunity may underlie long-term trends [19], and how changes in the duration of vaccineinduced and natural immunity can account for trends in incidence [20]. Such dynamic modelling approaches are useful both for explaining historical incidence patterns and for forecasting. Unlike these approaches, our study derives the national-level incidence of symptomatic pertussis infection using statistical modelling. Advantages of the Bayesian evidence synthesis approach adopted here include provision of a coherent, flexible framework in which diverse sources of information can be combined and the correct propagation of uncertainty associated with all model parameters to the final estimates.
Limitations to the current approach should also be noted. First, the validity of the current SI estimates is dependent on the assumptions made when specifying relationships between evidence sources and on the quality (representativeness and bias) of the observed data and prior information. For instance, if bias is present in the data sources used to inform the symptomatic proportion parameters, SI incidence would be affected. Symptomatic proportions for several age-groups were derived from PIENTER-2, which may biased downward because recall of coughing symptoms in the previous year was required. In contrast, the values for this parameter derived from the BINKI study may be upwardly biased, because household contacts of infected infants may have more severe disease compared with community study participants. A second limitation concerns the interpretation of seroconversion. For children under 5 years of age, of whom more than 90 % will have been vaccinated [12], a single sample with a high IgG-Ptx titre cannot distinguish between infection and previous vaccination. The Dutch National Immunisation Programme recommendations stipulate administration of a pertussis booster at four years of age; given that high vaccination-related IgG-Ptx levels wane rapidly [23,24], a proportion of high IgG-Ptx titres in 5-year-olds and a smaller proportion in 6 through 9 year olds are likely attributable to the 4-years booster. Although we corrected PIENTER-2 seroprevalence in the 1-4 years and 5-9 years age-groups for the model-estimated proportion of vaccination-related high titres (Additional file 1: Table S1), our estimates should still be interpreted with caution, as this adjustment depends on the model parameters and other assumptions. It was not feasible to adjust the <1 years age-group due to the granularity of the seroprevalence and vaccination coverage data.
Pertussis incidence displays long-term periodicity [25]. We recognise that if a different study period was chosen for instance if the outbreak year 2012 had been included in the analysisreported trends in annual SI incidence, and their interpretation, may differ.
Finally, the degree of under-representation of the number of 'true' SI cases by the statutory notification system depends on several factors, such as health-care seeking behaviour, diagnostic accuracy, reporting bias, and severity of disease. All of these factors are plausibly dependent on age, but could also vary over time. Although we have estimated MFs separately by age-group, we assumed that MFs were constant over time to aid identifiability of the statistical model. For the same reason, we constrained the probability of developing symptoms to be time-independent. If symptom severity is associated with temporal changes in transmission [26], then model outcomes may be oversimplified.

Conclusions
In summary, by applying Bayesian evidence synthesis methodology to a variety of national data sources, we have derived robust age group-specific estimates of the incidence of symptomatic pertussis infection in the Netherlands. This information is essential for determining the pertussis disease burden, and together with modelling and other studies [27], can assist in informing policy decisions regarding the design and improvement of preventive measures.

Availability of data and materials
For access to data used in this study that are not publicly available, please get in contact with the first author.

Additional file
Additional file 1: OpenBUGS code, description of the Markov model for estimating vaccination-related high titres, and additional figures. (PDF 379 kb)