### Statistical model

Full details of the model and fitting procedure, and detailed sensitivity analyses, have previously been published [7–12]. In brief, the probability that an individual dies from clinical disease at time *u* and age *a* is given by

where *S*(*u*,*a*) is the survival probability (estimated from UK census data), *f*(*u*) is the incubation period distribution and β is the transmission coefficient. The time- and age-dependent exposure hazard *I*(*t*,*a*) is given by

*I*(*t*, *a*) = *v*(*t*) *g*(*a*) ∫ Ω(*z*) *w*(*z*, *t*) *dz*. (2)

Here ν(*t*) is the effectiveness of control measures limiting the bovine tissues allowed into the human food supply (introduced mid-1989, and allowed to vary between 0 and 100% effective at reducing infectivity from this point onwards), *g*(*a*) an age-dependent susceptibility/exposure function, Ω(*z*) the relative infectiousness of bovines over the course of their incubation period and *w*(*z*, *t*) the proportion of cattle slaughtered at time *t* and time *z* away from disease onset. The expected number of cases at time u and age a is given by

*c*(*u*, *a*) = *B*(*u* - *a*) *p*(*u*, *a*) (3)

where *B*(*u* - *a*) is the number of individuals born at time *u* - *a*.

The model is fitted using maximum likelihood methods to the time- and age-stratified vCJD deaths to the end of 2002 assuming that the data arise from a Poisson distribution. The individual cases are categorised into cells with at least 5 observations in each cell. The best-fit point is that which maximises the log likelihood (LnL), or equivalently minimises

where *x*(*u*,*a*)are the observed cases at time *u* and age *a*. For any model parameter, maximum likelihood confidence intervals are obtained using likelihood ratio tests. Non-linear optimisation techniques are used to fit the model [20] using custom-written code. An intensive search of parameter space was performed, fitting from multiple starting points and restricting parameter bounds, to ensure that the best-fitting models were obtained.

Confidence intervals for the expected number of cases are obtained by re-parameterising the model so that the expected number of cases between times *u*
_{1} and *u*
_{2}, *T*(*u*
_{1},*u*
_{2}), is a model parameter. Combining equations 1–3, this is given by:

For each proposed parameter set, equation (5) was solved numerically to calculate β. If no solution was possible (for example, if the proposed parameter set generated fewer infections than the required epidemic size) then the proposed parameter set was rejected. Prediction intervals are obtained by adding Poisson variability to the confidence intervals on the mean.

The model is additionally fitted to the prevalence data obtained from a recent survey of appendix tissues. As the study was interrupted (although not stopped) due to the observation of a positive appendix, the 95% confidence interval for the prevalence of infection in the population is obtained from the likelihood term for a negative binomial distribution. The reported results arise from 9 batches with the positive result arising in the sixth batch. The likelihood for this result (assuming a binomial distribution for the sixth batch conditional on having observed a positive result and for the final three batches) is therefore given by

where *p*
_{
i
}is the expected prevalence in batch *i*, *n*
_{
i
}is the number of samples in batch *i* and *x*
_{
i
}is the number of positive appendixes in batch *i.* The expected prevalence at time *u* in age-group *a* is given by

The expected prevalence in batch *i* is calculated from this expression as the average prevalence in the age-group over the period of time from which the samples were collected. The model fit is then judged by the combination of the Poisson likelihood fit to the cases and the likelihood fit to the prevalence data given by equation (6).

Both the sensitivity and specificity of the diagnostic tests at different incubation stages are unknown. We include an additional parameter which specifies the proportion of the incubation period for which the test is assumed to be fully sensitive (working backwards from death), and assume that the test has no sensitivity prior to this time. We also assume the test is fully specific. Throughout this work we assume that the positive appendix is from an MM-homozygous individual, but that the 8318 tissues are representative of the general population (and hence 40% of these tissues are from MM-homozygous individuals).

### Parameters

Predictions from the model are sensitive to the estimates of exposure to BSE-infected material, *w*(*z*,*t*) [9]. As in previous work, our analyses rely on estimates of exposure using data relating to BSE in cattle in Great Britain. However, in our most recent analyses estimating exposure to pre-clinical infected animals and unreported clinical cases we have integrated the analysis of BSE clinical case data (up to October 2002), screening data from apparently healthy cattle and screening data from high risk' (casualty/fallen stock) cattle. Previous work [21] details the methodological extensions required to integrate the analysis of data on clinical case incidence and results from the screening of apparently healthy cattle to obtain estimates of case ascertainment rates. Key to this is the incorporation of a mechanism underlying the apparent underascertainment of cases indicated by the screening data. Two different possible mechanisms were examined: differential mortality of BSE-infected animals and underreporting of clinical cases [21]. The former is better able to explain the observed data and involves animals close to clinical onset being more likely than their age-matched peers to die on farm or be slaughtered. Thus, differential mortality leads to a higher infection prevalence in slaughtered animals than would be expected naively from the clinical case incidence data alone. Furthermore, animals slaughtered as a result of the excess risk imposed by differential mortality, all in the latest stages of incubating the disease, are allowed via a fitted parameter to be more likely to be categorized as a high-risk animal (either as fallen stock found dead on farms or casualty animals). If the parameter is significant, then the resulting prevalence of infection is greater in high-risk animals than in other animals slaughtered as apparently healthy.

The exposure estimates used here were obtained assuming that diagnostic test sensitivity was 100% for the last 3 months of the incubation period but virtually 0% before this and assuming that deaths due to differential mortality were distributed over the 3 months of the incubation period of infected animals (this is the differential mortality pattern assumed in Donnelly *et al*. 2002 [21]). Furthermore, the diagnostic test was assumed to be 100% specific in all animals.

Pathogenesis experiments in cattle have indicated that BSE infectivity is most widely distributed in the carcass towards the end of the incubation period [22, 23]. We therefore model the relative infectiousness of cattle at different stages of incubation assuming that infection increases exponentially towards the end of the incubation period [7]. Clinical BSE cases (including those reported prior to mid-1988) are assumed to be as infectious as those pre-clinical animals within 6 months of onset of disease. Predictions are fairly insensitive to the parameters for this parametric form.

In previous work, we explored a number of different parametric forms for the age-dependent susceptibility/exposure function, *g*(*a*) [9]. The best fitting distributional form is a 3-parameter combined uniform distribution with gamma-distributed tails, which in its limits is either the uniform or gamma distribution.

The functional form of the incubation period distribution is informed by patterns observed for other TSEs. In particular, for other human TSEs it is known that the incubation period is long and highly variable [16, 24, 25]. For many TSEs there is a minimum incubation period so that functional forms must allow for the potential for a substantial delay following infection until the onset of clinical symptoms of vCJD [26, 27]. We use a modified form of the four-parameter Generalised Lambda distribution [28] (with inverse CDF
). This distribution is more flexible than distributions based on standard functional forms (such as the Weibull, LogNormal or Gamma distributions), and incorporates all of these standard distributions as special cases.