- Research
- Open access
- Published:
Predicting subnational incidence of COVID-19 cases and deaths in EU countries
BMC Infectious Diseases volume 24, Article number: 204 (2024)
Abstract
Background
Recurring COVID-19 waves highlight the need for tools able to quantify transmission risk, and identify geographical areas at risk of outbreaks. Local outbreak risk depends on complex immunity patterns resulting from previous infections, vaccination, waning and immune escape, alongside other factors (population density, social contact patterns). Immunity patterns are spatially and demographically heterogeneous, and are challenging to capture in country-level forecast models.
Methods
We used a spatiotemporal regression model to forecast subnational case and death counts and applied it to three EU countries as test cases: France, Czechia, and Italy. Cases in local regions arise from importations or local transmission. Our model produces age-stratified forecasts given age-stratified data, and links reported case counts to routinely collected covariates (e.g. test number, vaccine coverage). We assessed the predictive performance of our model up to four weeks ahead using proper scoring rules and compared it to the European COVID-19 Forecast Hub ensemble model. Using simulations, we evaluated the impact of variations in transmission on the forecasts. We developed an open-source RShiny App to visualise the forecasts and scenarios.
Results
At a national level, the median relative difference between our median weekly case forecasts and the data up to four weeks ahead was 25% (IQR: 12–50%) over the prediction period. The accuracy decreased as the forecast horizon increased (on average 24% increase in the median ranked probability score per added week), while the accuracy of death forecasts was more stable. Beyond two weeks, the model generated a narrow range of likely transmission dynamics. The median national case forecasts showed similar accuracy to forecasts from the European COVID-19 Forecast Hub ensemble model, but the prediction interval was narrower in our model. Generating forecasts under alternative transmission scenarios was therefore key to capturing the range of possible short-term transmission dynamics.
Discussion
Our model captures changes in local COVID-19 outbreak dynamics, and enables quantification of short-term transmission risk at a subnational level. The outputs of the model improve our ability to identify areas where outbreaks are most likely, and are available to a wide range of public health professionals through the Shiny App we developed.
Introduction
Local dynamics of COVID-19 depend on a combination of the level of immunity in a population, which in turn depends on previous incidence, vaccine uptake, and the immune escape properties of the currently circulating SARS-CoV-2 variants, and other factors such as population density, social contact patterns, migration patterns and adherence to public health measures. Most countries in Europe have been affected by repeated COVID-19 waves since March 2020, and have implemented extensive vaccination campaigns in order to reduce the health impact of these waves. This led to high national levels of both natural and vaccine-induced immunity [1]. Before the emergence of the Delta and Omicron variants of concern (VOCs), this immunity provided considerable and durable protection against severe outcomes (hospitalisation and death) and some transient protection against infection [2]. The emergence of new VOCs, and the extent of their immune escape, along with the waning of efficacy observed for a two-dose vaccine course, precipitated the expansion and acceleration of “booster” vaccination campaigns, further shifting the immune landscape of the population. Overall immunity to the Omicron BA.1 variant rose quickly after its emergence due to the unprecedentedly large wave of BA.1 cases that occurred throughout Europe in late 2021-early 2022, and vaccine booster campaigns. Since then, repeated emergence of Omicron subvariants with high levels of immune escape has led to multiple waves of infections, with lower case burden than the first BA.1 wave.
Currently, the overall level of immunity against infection is high compared to the early phase of the COVID-19 pandemic. The immune landscape of the population is spatially heterogeneous due to considerable variation in infection histories (e.g. multiple reinfections with different variants) and differences in vaccine uptake and timing across the population. If overall immunity rises, this immune landscape should result in the risk of outbreaks becoming more spatially and demographically heterogeneous, with the potential for distinct identifiable outbreaks. Therefore, there is interest from public health agencies in forecasting short-term incidence at a subnational level, and in different age groups, to anticipate large localised spikes in case numbers and local pressure on health care systems. Predicting such spikes in health care demand will become increasingly relevant as time moves forward if COVID-19 becomes more seasonal and influenza-like in its dynamics. This project was developed in collaboration with the European Centre for Disease Prevention and Control (ECDC), with the objective of developing a statistical framework for forecasting local case and death incidence in a range of European countries, and visualisation tools to communicate the predictions. The forecasts and scenario analysis can then be used to optimise planning and allocation of resources.
There have been numerous attempts to model subnational incidence of COVID-19 over the course of the pandemic. These have tended to fall into two broad categories: mechanistic “spatial” susceptible-exposed-infectious-recovered transmission models and statistical time series/spatiotemporal models. Most time series and transmission models have treated subnational regions as independent (fitted the model and made predictions separately for each region) without accounting for spatiotemporal correlations in incidence between regions [3,4,5,6,7], despite these being strong [8,9,10]. A limited number of transmission models have instead treated subnational regions as connected sub-populations via a metapopulation approach, and used geographical distance or mobility/commuting data to parametrise connectivity between regions [11,12,13,14,15,16]. These models become increasingly complex and hence slow to generate predictions as the number of affected regions increases, and have therefore received limited use for real-time forecasting during the pandemic. Time series models and spatiotemporal statistical models, on the other hand, have been used extensively and successfully to forecast future incidence at national and subnational levels [6, 7, 17]. In the latter category, we focus on Endemic-Epidemic models, a flexible class of spatiotemporal statistical models that can be used to link changes in incidence to recent case numbers and the effect of various different covariates, on which the framework used in this paper is based. Endemic-Epidemic models have already been employed during the pandemic to understand and forecast spatiotemporal spread of COVID-19 at a subnational level [18,19,20,21] and at a national level in Africa [22], and to assess the impact of non-pharmaceutical interventions (NPIs), including lockdowns and border closures [23,24,25,26] (see [27] for a review). Their superiority to time series models that assume independence between regions has been demonstrated on data from northern Italy [28]. However, they have not, as yet, been applied to forecast subnational incidence across multiple countries.
In this paper, we present a flexible modelling framework to capture subnational, age-stratified case dynamics of COVID-19, alongside a publicly available RShiny App to visualise the results and forecasts generated. The framework is used to predict subnational incidence of COVID-19 cases and deaths from routinely collected, publicly available surveillance data, and forecast the impact of changes in transmission on short-term dynamics (e.g. due to changes in behaviour or transmissibility). The framework can be run with an age-stratified model if local, age-stratified data is available. Public health professionals can use the RShiny App to visualise the forecasts of reported cases and deaths, and the predicted incidence under different scenarios, to get a full picture of the anticipated short-term burden of COVID-19 in local areas. We apply the framework to forecast cases and deaths at a NUTS-3 (Nomenclature of Territorial Units for Statistics 3) level in France, Czechia and Italy as test cases, and evaluate its ability to predict subnational incidence up to four weeks ahead.
Methods
Spatiotemporal modelling framework for reported cases
We model subnational COVID-19 case counts using the Endemic-Epidemic spatiotemporal modelling framework [29], as implemented in the surveillance R package [30]. Endemic-Epidemic models provide a flexible framework for relating current incidence of cases to recent case incidence and to imported cases, accounting for the influence of other factors (i.e. covariates) on these relationships. They decompose local incidence in region \(i\) at time \(t\) into three components:
-
An autoregressive component that quantifies the number of new infections expected from cases reported in region \(i\) at previous time steps.
-
A neighbourhood component that quantifies the number of new infections expected from cases reported in regions around region \(i\) in previous time steps, which depends on the connectivity (or the human mobility) between regions.
-
An endemic component that quantifies the background number of new infections occurring in region \(i\) at time \(t\), independent of the current level of transmission, representing importations from regions outside those included in the study (or cases that could not be linked to the mechanistic components).
Endemic-Epidemic models are able to integrate multiple data sources from disease surveillance activity to link case incidence and various covariates. Each component is independently impacted by each covariate. Since we introduce various covariates and controls in our model, we merge the autoregressive and neighbourhood component into an epidemic component to reduce the number of parameters estimated, and avoid identifiability and convergence issues.
Early versions of the Endemic-Epidemic models only included dependence of current cases on cases in the previous time step, but they have since been extended to account for distributed lags via the hhh4addon R package [31,32,33]. This allows for a more faithful representation of the impact of recent incidence on current case number, for instance by aligning the lag distribution to the serial interval of the pathogen.
The link between indicators of immunity and risk of SARS-CoV-2 infection is complex and unstable, due to waning of vaccine and infection-induced immunity, and emergence of variants able to escape immunity. The flexibility of the Endemic-Epidemic framework therefore makes it well suited to capture local dynamics of COVID-19 cases, while fully mechanistic frameworks may be too complex to parametrise. Depending on data availability, the model can be stratified by age.
Non-age-stratified model
The total number of cases (over all age groups) in region \(i\) at time \(t\), \({Y}_{it}\), conditional on the number of cases in the same region and neighbouring regions in the previous \(p\) time steps, is modelled as negative binomial:
with conditional mean \({\mu }_{it}\) and dispersion parameter \({\psi }_{i}\) such that \(Var\;\left(Y_{it}\left|\left(Y_{i,t-1},\;\dots\;,\;Y_{i,\;t-p}\right)\right.\right)\;=\;\mu_{it}\;+\;\psi_i\;\mu_{it}^2\).
The mean is given by
where \(Y^{\prime }_{j,t-1}= {\sum\nolimits_{d=1}^{p}\left[{u}_{d}\right]{Y}_{j,t-d}}\) is the transmission potential from recent cases in region \(j\), with \(\left[{u}_{d}\right]\) the normalised lag weight for cases \(d\) days ago. We set \(p=20\) days and use a custom composite serial interval distribution for \({u}_{d}\) that accounts for missing infection generations [34], with a mean and standard deviation for the first infection generation of 5 days and 1.5 days respectively, based on the estimated serial interval for SARS-CoV-2 for pre-Omicron variants [35,36,37], and 80% of the composite serial interval assumed to reflect direct transmission (without missing infections) (Supplementary Fig. 1). The predictors for the epidemic (combined autoregressive and neighbourhood) and endemic components, \({\varphi }_{it}\) and \({\nu }_{it}\), determine the number of cases stemming from each component and are assumed to depend on log-linear component-specific predictors:
The predictors are independently impacted by different covariates, \({z}_{it}^{\left(\varphi \right)}\) and \({z}_{it}^{\left(\nu \right)}\), i.e. a covariate may be associated with fewer imported cases (endemic component \({\nu }_{it}\)), but have little impact on the spread of the virus in and between the regions (epidemic component \({\varphi }_{it}\)). The association between the covariates and the local number of cases expected is quantified by regression coefficients, \({\beta }^{\left(\varphi \right)}, {\beta }^{\left(\nu \right)}\), estimated through a maximum-likelihood approach. The weights \({w}_{ji}\) in the neighbourhood component quantify the degree of connectivity between region \(i\) and surrounding regions \(j\). We use a power-law model for the neighbourhood weights:
where \({o}_{ji}\) is the adjacency order between regions \(i\) and \(j\) and \(\rho\) is a decay parameter to be estimated. The adjacency order is defined as the minimum number of borders that must be crossed to get from \(i\) to \(j\): the adjacency degree is equal to 1 between neighbours, 2 between neighbours of neighbours, and so on. The maximum adjacency order we consider is 5. We normalise the weights, such that \(\sum\nolimits_{i=1}^{I}\left[{w}_{ji}\right]=1\), with \(I\) the number of regions in the country.
Age-stratified model
For countries in which age-stratified subnational case count data is available, we implement an age-stratified version of the model above using modified code from the hhh4contacts R package [38, 39]. For this model, the number of cases in age group \(a\) in region \(i\) at time \(t\), \({Y}_{ait},\) conditional on the numbers of cases in the previous \(p\) time steps in the same and other age groups in region \(i\) and the surrounding regions is modelled as negative binomial:
with conditional mean
where\(Y^{\prime }_{b,j,t-1}= {\sum\nolimits_{d=1}^{p}\left[{u}_{d}\right]{Y}_{b,j,t-d}}\)
and overdispersion parameter \({\psi }_{i}>0\) such that \({Var(Y}_{ait}|{({\rm Y}}_{t-1},\;\dots\;,{{\rm Y}}_{t-p}))={\mu }_{ait}+{\psi }_{i}{\mu }_{ait}^{2}\), where \({{\rm Y}}_{t-1}=\{{Y}_{a,i,t-1}{\}}_{a=1,\;\dots\;,{n}_{a},i=1,\;\dots\;,{n}_{i}}\) with \({n}_{a}\) age groups and \({n}_{i}\) regions, and \({c}_{ba}\) is the mean number of daily contacts in age group \(a\) of an individual in age group \(b\). We stratify the population into \({n}_{a}=9\) age groups (0–9, 10–19,…, 70–79, 80+ years). We use age-structured contact matrices from country-specific pre-pandemic contact surveys for \({c}_{ba}\) where available, e.g. for France [40], and synthetic contact matrices estimated from contact survey and demographic data for countries in which no nationally representative contact surveys have been conducted, e.g. for Czechia [41]. We used age-stratified intercepts in the epidemic component.
Covariates
We incorporate various covariates in \({z}_{it}^{\left(\varphi \right)}\) and \({z}_{it}^{\left(\nu \right)}\) in the log-linear predictors for \({\varphi }_{it}\) and \({\nu }_{it}\). The full model equations are given in the Supplementary Material (see Model equation for each country and Supplementary Table 1). The covariates were picked based on potentially having had an effect on transmission and importation risk, or having an effect in future. We compare the forecasts obtained with the full model with covariates against a baseline model without covariates in the Supplement.
We included the same set of covariates in the endemic component in the age-stratified and non-age-stratified models:
-
population: the total population of region \(i\)
-
urban/rural status: binary covariates for whether the NUTS-3 region \(i\) is classified as urban, intermediate urban, intermediate rural or rural.
-
seasonality: sinusoidal terms with annual periodicity to account for seasonal effects on importations. The amplitude and offset of the seasonality function are estimated by the model.
-
number of cases in the WHO European region over the last month.
These covariates cover the impact of demographic characteristics and transmission in neighbouring countries on the background number of cases in each region (and age group in the age-stratified version of the model). The covariate specifications of the epidemic component depend on the availability of age-stratified data as follows:
-
population:
-
age-stratified: two covariates:
-
total population of region \(i\)
-
proportion of the population of age group \(a\) in region \(i\).
-
-
non age-stratified: one covariate: the total population of region \(i\).
-
-
testing:
-
age-stratified: two covariates:
-
proportion of the population in the country tested in the last 2 weeks.
-
if local age stratified testing data is available: proportion of population of age group \(a\) in region \(i\) (if local testing data is available, national otherwise) tested in the last two weeks. Otherwise, proportion of population of age group \(a\) in the country tested in the last two weeks.
-
-
non age-stratified: one covariate: the proportion of the population in the country tested in the last 2 weeks.
-
-
vaccination coverage: the proportion of the population in region \(i\) who received their second dose in the last 120 days, or who have received three or more doses (for each age group in the age-stratified model)
-
cumulative incidence: the cumulative incidence of cases between the start of the fitting period and a month ago in region \(i\) (for each age group in the age-stratified model)
-
recent incidence: the cumulative incidence of cases in the past month in region \(i\) (for each age group in the age-stratified model)
-
variant: two binary indicator variables for whether the proportion of sequenced cases that were Delta or an Omicron variant was higher than 30% (in all three countries, the Delta covariate is equal to one in late 2021, and the Omicron covariate is equal to one from January 2022 onwards).
-
day-of-the-week: indicator variables to account for day-of-the-week reporting effects in the numbers of cases.
-
urban/rural status: binary covariates for whether the NUTS-3 region \(i\) is classified as urban, intermediate urban, intermediate rural or rural.
-
seasonality: sinusoidal terms with annual periodicity to account for seasonal effects on local transmission. The amplitude and offset of the seasonality function are estimated by the model.
The covariates above cover the impact of local immunity (due to vaccine or previous infection), testing patterns, variants and seasonality on the risks of reported cases. We included covariates that quantify the association between number of tests and incidence so that the model may capture changes in surveillance and reporting patterns, whereby drops in testing can be associated with changes in the number of new cases. For instance, in age groups with a lower proportion of hospitalised cases (e.g. younger age groups) the reporting rate may vary greatly if only severe cases are tested and reported. The covariates quantifying the association between infection or vaccine-acquired immunity and transmission risk depend on various thresholds (e.g. recent incidence corresponds to cases reported in the last month; second vaccine doses are only taken into account for the last 120 days). These thresholds can easily be changed in the code to generate sensitivity analyses, or to adapt to the characteristics of new variants or vaccines. Other covariates that were considered for inclusion in the model were population mobility indices (such as the Google mobility index [42]) and binary covariates for various NPIs (such as the Oxford Stringency Index [43]), but these were found either to lead to parameter identifiability issues or not to improve predictive performance, and their data sources have been discontinued.
Model fitting
To make forecasts of future cases and deaths, we fit the model to case data reported between September 2020 and the latest reported date for each country (currently end of April 2023) to estimate the regression coefficients \({\beta }^{\left(\varphi \right)}\)and \({\beta }^{\left(\nu \right)}\). The model is fit separately for each country. We do this via maximum-likelihood estimation, as implemented in the surveillance and hhh4addon R packages.
Case forecasts
We use the parameter estimates from the model fitting to generate four-week-ahead forecasts of daily case numbers in each region and age group by simulating the model forward 28 days with projected values of the covariates. We used the latest value of the covariates describing vaccine coverage and testing, and the mean value over the past 30 days for the number of cases in the rest of Europe in the past month. We run 100 simulations for each country to account for stochasticity (from the negative binomial draws for the number of cases) and parameter uncertainty (from model fitting). We do this by drawing ten sets of parameter values using the parameter covariance matrix estimated from the model fitting, and running ten simulations for each parameter set. We output the median, 2.5th, 25th, 75th, and 97.5th percentiles of the resulting predicted distribution for each date and region (and age group for the age-stratified model) for visualisation in the RShiny app.
Death forecasts
We use simple linear regression models to generate four-week-ahead forecasts of weekly numbers of reported deaths. To do so, we estimate the Case Fatality Rate (CFR), and combine it with the recent number of reported cases to forecast the number of deaths, We first aggregate the number of reported cases and deaths by week, and compute a proxy for the age-stratified CFR assuming a three-week delay between weekly reported cases and reported deaths [44,45,46]. Specifically, we compute the CFR for age group \(ai\) in region \(i\) at week \(w\) as:
where \({N}_{aiw}\) and \({D}_{aiw}\) are the numbers of cases and deaths in age group \(a\), in NUTS-1/NUTS-2 region \(i\), in week \(w\). The death forecast model was implemented at a NUTS-1/NUTS-2 geographical scale since death data was not always available at a NUTS-3 level.
We then calculate \(\varDelta CF{R}_{aiwx}\), the change in CFR, and \(\varDelta {N}_{aiwx}\), the change in the number of cases, between the prediction date and the forecast horizon:
where \(x\) is the forecast horizon (1, 2, or 3 weeks). We implement a linear regression model for each forecast horizon and age group, with \(\varDelta CFR\) as outcome, and \(\varDelta N\) as explanatory variable, again using a three-week delay:
We then use the estimates of \({\alpha }_{ax}\) and \({\beta }_{ax}\) to predict \(\varDelta CFR\) between the last week of reported case data (\({w}_{pred}\)) and the forecast dates (\({w}_{pred}+1\), \({w}_{pred}+2\), \({w}_{pred}+3\)), and calculate the predicted CFR as:
(using a sample of 10 values of \(\varDelta CF{R}_{aiwx}\), computed from the mean estimate and the prediction interval of the linear regression). Finally, we draw the number of new weekly deaths each week using a binomial distribution, from the estimated CFR, and the number of cases reported three weeks before:
We draw 500 forecasts for \({D}_{a,i,{w}_{pred}+x}\) in total (50 per value of \(CF{R}_{a,i,{w}_{pred}+x}\)). Given the three-week delay between cases and deaths, one-, two-, and three-week-ahead death forecasts are generated using already reported case data. In contrast, four-week-ahead death forecasts require one-week-ahead case forecasts. We estimate the change in CFR using the regression parameters from the three-week-ahead forecasts and the four-week change in number of cases in the one-week-ahead case forecasts
Scenario simulations
To explore the impact of variations in transmission intensity or implementation of non-pharmaceutical interventions (NPIs), we generate 28-day forecasts under combinations of the following scenarios, by changing the values of the epidemic predictor \({\varphi }_{ait}\), or the endemic predictor \({\nu }_{ait}\):
-
Moderate (20%) or large (40%) increase in transmission intensity (\({\varphi }_{ait}\)), due to inherent properties of the pathogen (i.e. emergence of a new, more transmissible variant), or to changes in behaviour.
-
Moderate (20%) or large (40%) decrease in transmission intensity (\({\varphi }_{ait}\)), due to changes in human behaviour or NPIs. Furthermore, for countries where an age-stratified model was implemented, this decrease can be targeted at a certain age group (children and teenagers below 20 years old, adults between 20 and 60 years old, or older inhabitants), and implemented one week after the current date.
-
Removal of all importations, for instance due to border closure (i.e. \({\nu }_{ait}\)= 0).
All scenarios affect every region in the same way (i.e. we do not consider local NPIs). We generate 100 simulations under each scenario.
Data
Several publicly available data sources are used to implement the model and are summarised in Table 1. Since compilation of COVID-19 surveillance reports in centralised databases was interrupted in early 2022 in many countries, the majority of the data, including the case and death data, is imported from country-specific sources.
Case and death data
Local case data at a NUTS-3 level was used in all countries (age-stratified in France and Czechia). For Czechia, death data at NUTS-3 level was used, but for Italy and France, death data was only available at NUTS-1/NUTS-2 level. Numbers of deaths were taken to be those reported in the national databases for each country, regardless of potential differences in how COVID-19 deaths were defined between countries.
Covariate data
Where available, daily age-stratified (for the age-stratified model) vaccination data at NUTS-3 regional level is used, though most sources provide weekly data, and a two-week delay for protection from each dose to develop is assumed. Publicly available testing data varies considerably in spatial resolution (for some countries only national data is available) and age stratification (for some countries only total data is available), so age-stratified national testing data is used as a default and subnational data used when available. Age-stratified regional population data is drawn from a different source for each country (Table 1). Data on the proportion of different variants among sequenced cases is drawn from the ECDC variant database [47]. The urban-rural status of each NUTS-3 region is taken from the Eurostat database (database labelled “urban-rural remoteness”) [48]. Daily numbers of cases in the rest of Europe are taken from the World Health Organisation database of daily numbers of cases and deaths in different countries [49].
Calibration analysis
We evaluate the ability of the model to generate accurate (“well-calibrated”) and reliable case and death forecasts in each country by fitting the model up to a set of dates in a calibration period, generating 28-day-ahead forecasts from these dates using the model fits, and comparing these forecasts with the data. The forecasts are generated by re-fitting the model for each calibration date (i.e. we do not use the data posterior to the calibration date to generate the forecasts). We used weekly dates from 29th October 2022 to 22nd April 2023 (i.e. 25 calibration dates), as the calibration period. A model is deemed well-calibrated if it can identify its own uncertainty in making predictions, i.e. if the data points are evenly distributed across the prediction intervals generated by the model. Since the reliability of our forecasts is likely to depend on the prediction horizon, we compare the performance of the model for one-, two-, three-, and four-week-ahead forecasts, to determine how quickly the quality of the forecasts declines.
We use various metrics and figures to evaluate the calibration. Firstly, we visually compare our weekly national-level case and death forecasts, obtained by summing up the age-stratified and local forecasts, with the ensemble forecasts produced by the European COVID-19 Forecast Hub, which serves as a benchmark of short-term forecasts of COVID-19 incidence. We also compare our national-level forecasts with the ensemble forecasts quantitatively via the weighted interval score (WIS) [31], a proper scoring rule (i.e. one that measures both calibration – how accurate the forecasts are – and sharpness – how precise the forecasts are) for quantile forecasts, and the squared error of the median. This comparison assesses whether the overall performance of our model is in line with other COVID-19 prediction models.
Secondly, we evaluate our local forecasts against those of a baseline model (chosen as the Endemic-Epidemic model without transmission between regions, covariates or seasonality, i.e. the model in Eqs. (1)- (4) but with \({w}_{ji}=1\) when \(j=i\) and \({w}_{ji}=0\) otherwise and \(log\left({\varphi }_{it}\right)={\alpha }^{\left(\varphi \right)}\) and \(log\left({\nu }_{it}\right)={\alpha }^{\left(\nu \right)}\)), using proper scoring rules, namely the ranked probability score (RPS), Dawid-Sebastiani score (DSS) and squared error score (SES) (see Supplementary Material for definitions). We also generate the predictive probability distribution of the local, age-stratified (if available) forecasts at each calibration date. We then use the Probability Integral Transform (PIT) histogram to assess the calibration of the model: in models with good calibration, the data should follow the predictive probability distribution, and the PIT histogram should be uniform. We computed a non-randomised yet uniform version of the PIT histogram, to correct for the use of discrete values, as described in Czado et al. [58].
Comparison of age-stratified and non-age-stratified model
As age-stratified case and death data is only available for certain countries, we explore the impact of fitting the model to subnational total case counts for France and Czechia on the ability of the model to predict subnational total numbers of cases and deaths. We do this by fitting the non-age-stratified equivalent of the age-stratified model in Eqs. (3)- (4) and (A1)-(A2) with the age-stratified covariates removed, i.e. Equations (1)- (2) and (A4) (see Supplementary Material), and re-running the calibration analysis of the predictive performance of the model described above.
RShiny application
All forecasts and predictors generated by the model are available in an R Shiny Application (https://github.com/EU-ECDC/RShinyCovidApp). The users can use the application to see the latest case and death forecasts of the model in each country, compare past forecasts to recent data points, generate different transmission and NPI scenarios, and observe the regions most at risk of transmission according to the model. The forecasts contained in the application are updated weekly.
Results
Calibration analysis: assessing the performance of the model
Case forecasts calibration
Firstly, we compute weekly case forecasts across all regions and age groups, by summing up all local age stratified forecasts, across the calibration period (last six months of data). We then compare the one, two, three, and four-week ahead forecasts with the observed data, and the weekly forecasts from the European COVID-19 Forecast Hub.
The comparison of the forecasts is shown in Fig. 1; Table 2. It demonstrates that our model was able to reliably capture the dynamics observed in the data up to two weeks ahead. The magnitude of the peak of the different outbreaks is accurately estimated in all three countries, and the trend forecasted by our model corresponds to the data. Beyond a two-week forecast horizon, the difference between the forecasts and the data becomes starker, in particular during outbreaks (for instance around December 2022 in France), with changes in case numbers being captured with a bigger lag in the forecasts. Based on the squared error of the median, the median forecasts generated by the model are on par with the ensemble forecasts from the European COVID-19 Forecast Hub across all forecast horizons (Table 2). There are periods where our model outperforms the ensemble model (e.g. two-week ahead forecasts between November and December 2022 in Italy), and others where the ensemble model outperforms ours (January-March 2023 in Czechia across all forecast horizons).
However, the dispersion of our forecasts is considerably narrower than that of the ensemble forecasts, particularly for France. The data points are therefore more often outside the 95% prediction interval of our forecasts, despite the median forecasts being close to the data points, and the median and 95% interval of the weighted interval score is higher for France for our model than the ensemble model (Table 2). The narrow prediction interval is a consequence of aggregating all local and age-stratified forecasts to get the national-level estimate rather than directly estimating the national number of new cases: since the dispersion of a sum of negative binomial random variables is much smaller than that of each individual negative binomial random variable. This is also why the prediction interval is narrowest in France, where the number of groups is highest (846 groups = 94 regions times 9 age groups).
We now assess whether the dispersion of local estimates is in line with the data (i.e. whether the prediction intervals from the model include the observed data points). To do so, we compute the case PIT histogram for each country and forecast horizon (Fig. 2). We also generate PIT histograms stratified by broad age groups for France and Czechia (Supplementary Figs. 2 and 3). In one-week-ahead forecasts, we observe an inverse U-shaped histogram in Italy, indicating that the forecasts are slightly overdispersed. The performance of the age-stratified model varies in the different age groups. The model tends to overestimate the number of cases in younger age groups (Supplementary Figs. 2 and 3), while the histograms in age groups between 20 and 80 years old were flat, indicating good calibration. The two-week ahead PIT histogram for Italy also shows very good performance, with little sign of bias, although the model tends to slightly overestimate the number of cases (few observations fall into the highest categories). Over longer time horizons, the PIT histogram becomes U-shaped, i.e. skewed towards extreme values, showing that the forecasts generated by the model are under-dispersed.
The histograms for Czechia and France are U-shaped, indicating that the model is overly confident (i.e. the prediction interval is too narrow). The overestimation of cases in younger age groups may be due to case incidence being determined by the age-structured contact rates used in the model (derived from contact surveys), but cases in this age group may also be under-reported as they typically have milder symptoms, and so are harder to spot without active case finding campaigns. Forecasts past two weeks ahead are more strongly biased, and underdispersed, but better performance is still observed for Italy, where the model is not age-stratified. Similar underdispersion was observed in Czechia and France when a non-age-stratified model was implemented (Supplementary Figs. 6 and 7).
We also generated local forecasts using a baseline model (with no transmission between regions, covariates, seasonality, or age-specific intercepts), and compared them to the forecasts from the full model (Supplementary Table 2). The addition of transmission between regions, covariates and seasonality to the model substantially improved the predictive performance up to 2 weeks ahead for all countries, with an average 37% improvement across the three countries in the median RPS at a 2-week forecast horizon. This was especially true in Czechia and France, where the age-specific dynamics were hard to capture without covariates.
Death forecasts calibration
The death forecast comparison is shown in Fig. 3; Table 3. The performance of the model is not homogeneous across countries. The model performs better in Italy and Czechia, where the median estimate is closer to the observed data, and the 95% prediction interval includes the data more often than in France. In France, the model was not able to accurately capture the drop in number of deaths following the outbreak in December 2022. For most dates, the median estimate of the number of deaths is similar in our model and the European COVID-19 Forecast Hub ensemble model (Fig. 3), and the median of the squared errors of the median predictions across all time points in the prediction period is similar (Table 3). The predictive accuracy and coverage of our model is closer to that of the ensemble model for the death forecasts than the case forecasts (the WIS is closer, Table 3), and the impact of forecast horizon on the performance is not as severe as for the case forecasts, for instance for Italy four-week-ahead forecasts are still in line with the number of reported deaths (Fig. 3). This is due to the larger prediction intervals generated with the death model relative to the number of deaths.
For Czechia and France, the PIT histograms generated for the local age-stratified weekly death forecasts follow much more uniform distributions than those for the case forecasts, for all forecast horizons (Fig. 4). Since the model was age stratified for both countries, the majority of weekly death counts in the data were 0, which was easier for the model to capture and forecast. In Italy, the prediction intervals tend to be too low, since few observed weekly death counts fall into the lower categories of the PIT histograms. Therefore, our model tends to underestimate the number of weekly deaths per region in Italy. The age-stratified PIT histograms in Czechia and France (Supplementary Figs. 4 and 5) are uniform for younger age groups (where almost all observations are 0), and follow an inverse U-shape in older age groups, indicating that the forecasts are slightly overdispersed.
Visualising the predictions in the R Shiny application
The RShiny app is designed to make four features accessible to the user:
-
1)
Forecasts: Current and previous four-week ahead forecasts of the number of cases and deaths in each region (and age group when available).
-
2)
Predictors: The risks of secondary transmission and importations estimated by the model
-
3)
Scenarios: The impact of changes in transmission, be it due to more infectious variants, changes in behaviour, or NPIs on case and death forecasts. These changes in transmission represent the potential impact of new variants and/or control measures on transmission, and should be interpreted with caution in a constantly changing epidemiological situation (impacted for example by behaviour, adherence and other factors). Similarly, this model only considers the epidemiological impact of NPIs; the social or economic costs of different control measures is not considered in this analysis.
-
4)
Replicate calibration: The forecasts generated over the calibration period, at a local and age-stratified level.
Previous and current forecasts
Case and death forecasts can be viewed via a map of the predicted median case/death incidence over the next two weeks, or via time series plots by region (and age group when age-stratified data is available) of the median numbers of daily cases/weekly deaths over the next four weeks (with 50% and 95% prediction intervals) (Fig. 5). Forecasts can be visualised at different geographical scales, either at NUTS-3 level or NUTS-1/NUTS-2 level. The accuracy of past forecasts at different geographical scales and over different time horizons can be visually assessed by varying the date from which to predict (to a date in the past) and comparing past predictions to the observed data.
Local predictors of transmission and importations
Spatial heterogeneity in case incidence, transmission and importation risk at the latest date of the forecasts is displayed in the app via three different maps showing the spatial distribution of cases at a NUTS-3 or NUTS-1/NUTS-2 level, the local risk of transmission (the population-weighted average value of the epidemic component for each region across all age groups), and the local risk of importation (the endemic component for each region summed across age groups) (Fig. 6). The local risk of transmission and importation are shown as a percentage of the highest value of the predictor in the country, and give insight into the spatial heterogeneity in risk in the country. In all three countries, the risk of transmission is more homogeneous across the regions than the risk of importation. The risk of importation, quantified by the endemic predictor, is heavily influenced by the number of inhabitants in a region, so the regions gathering most of the importation risks are regions containing the major cities in all three countries.
Short-term transmission scenarios
Finally, simulations showing the impact of various changes in transmission (either increases due to changes in behaviour or new variants, or decreases caused by targeted or global NPIs) on predicted numbers of cases and deaths are shown in the app via similar figures to those used to display the case and death forecasts (Fig. 7). Options for exploring the impact of changes in transmission include combinations of:
-
Increasing transmission by 0, 20, or 40% to represent different properties of an emerging variant, or changes in behaviour that lead to increased risks of spread.
-
Dropping transmission by 0, 20, or 40% to represent the impact of increased NPIs.
-
Targeting NPIs at the whole population or a specific age group.
-
Removing importations from outside the selected country to represent stringent border closures.
Throughout all three countries, removing importations has very little impact on the transmission dynamics in the case forecasts. Local transmission is sufficient to maintain transmission without new cases being added through the endemic component. On the other hand, the moderate and large changes in transmission through NPIs or changes in behaviour have a large impact on the forecasted number of cases and deaths.
Discussion
We have developed a framework to forecast subnational COVID-19 case and death incidence up to 4 weeks ahead, and explore the potential impact of changes in transmission on reported incidence. The framework has been applied to France, Czechia and Italy. The model outputs are based on routinely collected, publicly available surveillance data. We have also developed a RShiny app, where users can visualise the 4-week-ahead forecasts of both reported cases and deaths, and the predicted impact of changes in transmission. The code we developed to implement the model and the RShiny App is publicly available in two Github repositories: https://github.com/EU-ECDC/RShinyCovidApp and https://github.com/EU-ECDC/BackendCovidApp. The model fits and scenario simulations are updated automatically every week.
The main aim of the framework developed here was to evaluate the local risk of transmission in the presence of complex immunity patterns. As more data sources became available, we selected the covariates that were associated with changes in level of immunity (recent incidence, vaccine coverage), or with changes in behaviour and transmission risk (variant, testing). As outbreaks unfold, the inclusion of new covariates during real-time forecasting can stem from new datasets becoming available (e.g. testing data, number of additional vaccine doses), or changes in transmission patterns, such as the emergence of new variants. Since such changes may take several weeks to identify and include in the model, the scenario simulation may highlight that recent dynamics would be in line with changes in transmission.
Case and death forecasts aggregated at a country level perform comparably with the European COVID-19 Forecast Hub ensemble model, a benchmark in epidemic forecasting built as an ensemble of forecasts from many independent models. In addition, our framework provides NUTS-3-region-level (i.e. much more highly spatially resolved) and, if age-stratified data is available, age-stratified forecasts. Therefore, it can be used for more targeted policy making and planning at a local level. Given the significant spatiotemporal heterogeneity in COVID-19 incidence, and changes in age patterns, the detailed local-level visualisation provided by our model is particularly useful to evaluate targeted control measures. This study highlights the need for surveillance systems that gather accurate, timely, age-stratified data, and the value of making such data publicly available to improve understanding and prediction of local transmission and outbreak response planning.
As demonstrated by our calibration analysis, the subnational and age-stratified case and death forecasts are accurate up to 2 weeks ahead: the case PIT histograms are flat for Italy and for most age groups for Czechia and France, while the median country-level forecast is closer to the data than the European COVID-19 Forecast Hub ensemble forecast. However, the forecasts become less reliable beyond a 2-week horizon. This may be due to fundamental predictability limits; the difficulty of forecasting changes in behaviour and/or sudden changes in transmission, e.g. due to the emergence of a new highly transmissible variant, more than a couple of weeks into the future; or other factors that cause the model to be misspecified, and reflects similar findings from other forecasting efforts [7, 17, 59]. If COVID-19 transmission dynamics become similar to those of seasonal influenza, sudden unexpected changes in transmission may become rarer, which would improve the performance of the model. The age-stratified PIT histograms in both Czechia and France show that calibration is especially difficult in younger age groups, where changes in case-finding strategy had a big impact that could not be fully captured by the testing covariates [60, 61].
The model was built to provide accurate predictions of case and death incidence at a regional level. Our results show that generating reliable case forecasts several weeks ahead is challenging, even when using age-stratified local data. However, the accuracy of death forecasts decreased less rapidly as the forecast horizon increased. The quality of the case and death forecasts was robust to the large reporting changes observed throughout the fitting period, and improved at the latest calibration dates where incidence was low in all three countries. This may be because local outbreaks in groups at risk in low-incidence settings are easier to forecast, rather than outbreaks where all regions are equally vulnerable. Past two weeks, forecasts generated by the model appear to be overconfident, and underestimate the uncertainty in the potential level of transmissions. The alternative transmission scenarios in the Shiny App therefore help to illustrate the full range of variability in short-term regional transmission dynamics that is possible. The local risk of outbreaks and importations is also represented in the Shiny App, using the local predictors of the Endemic-Epidemic model. The forecasts, scenarios, and risk map constitute a reliable, thorough representation of the local risk of transmission, and thus provide a useful aid for planning control measures.
The model fits and simulations highlight several features of COVID-19 dynamics common to all three countries: the risk of background importation of cases (i.e. new cases that were not linked to recent local infections) is always very strongly associated with the number of inhabitants in the region. Hence, more populated and urban areas are at higher risk of further importations of SARS-CoV-2 infections. On the other hand, the risk of transmission is relatively homogeneous across regions, with no regions where the number of secondary cases expected is less than half the region most at risk (Fig. 6B). This could change in future COVID-19 dynamics, where the local level of immunity may be sufficient to completely avoid transmission in certain regions, as is currently observed for pathogens such as measles [34]. The vast majority of cases stem from the epidemic component of the Endemic-Epidemic model, showing that the local transmissibility is sufficient to maintain transmission, even without input from background importations. Therefore, the removal of importations (e.g., by border closures) consistently resulted in a minimal impact on the expected number of case numbers up to four weeks ahead. In contrast, altering the transmission risk within the regions (e.g. via interventions targeted at specific age groups or encompassing the entire population) can substantially change the forecasted case dynamics. Even with strategies targeting a particular age group, we observe indirect effects on incidence in all age groups. We did not observe a strong effect of changes in transmission risk on the four-week-ahead death forecasts. However, we emphasise that we considered a three-week delay between cases and deaths and, hence, the impact of changes in transmission on deaths would be visible for longer-term death forecasts (beyond four weeks ahead).
Including other countries
The framework is currently implemented for France, Czechia and Italy, but other countries can be straightforwardly incorporated, provided the data required for making the subnational daily case and weekly death forecasts (namely NUTS-3-level daily case counts, subnational daily death counts, NUTS-3-/NUTS-2-level daily/weekly vaccinations administered by dose, and NUTS-3-/NUTS-2-/national-level daily/weekly numbers tested) for those countries is available and up-to-date. Full instructions for incorporating a new country are available in the public GitHub repository for the model code: https://github.com/EU-ECDC/BackendCovidApp.
Age stratification
Our comparison of forecasted and observed numbers of cases and deaths over all age groups for France and Czechia indicates that the non-age-stratified model has similar predictive performance in terms of overall cases/deaths as the age-stratified model (Supplementary Figs. 6–9), albeit with slightly worse prediction of deaths (Supplementary Fig. 9). We believe the loss in performance observed in death forecasts to be due to changes in the age structure of reported cases from changes in case detection. Indeed, active case finding may lead to milder cases being reported, while interruption of such a strategy would mean that a larger proportion of the cases are severe. Such changes are likely to be reflected in the age structure of the cases and such information is lost in the non-age-stratified version of the model.
The overall good performance of the non-age-stratified model is encouraging for the application of the model to other countries, very few of which report age-stratified case and death data. While we did not observe a significant improvement by age-stratifying the model, we emphasise that age-stratified data and models are crucial for evaluating the impact of targeted (age-specific) non-pharmaceutical and pharmaceutical interventions, as well as for health-economic analyses (e.g., the computation of DALYs).
Limitations
Our forecasting framework does have some limitations. First, as for all forecasting studies, the forecasts are only as reliable as the input data, and we cannot guarantee the accuracy of the data imported from public data sources, which may have reporting errors or biases that we are not able to account for. In addition, the definition of a COVID-19 death may vary across countries. The framework will only work as long as subnational (and age-stratified, for France and Czechia) case and death data continues to be reported online in the same location and format as it is currently, but several countries have ceased reporting subnational case and death data or changed the format or location of their reporting since we started developing the framework. Changes in data availability could potentially be addressed via changing the spatial resolution of the model, but this would involve substantial modification of the model structure. We only forecast reported cases, which reflect a combination of underlying incidence of infections and reporting, rather than the “true” number of cases. Thus, if testing levels are very low, forecasted case numbers could be low even if circulation is high. However, testing level is adjusted for in the fitting of the model, so forecasts reflect the expected incidence given current testing levels, and still provide a useful indication of changes in transmission. The model for the death forecasts, which uses an estimate of the recent CFR (with uncertainty) to predict deaths, is relatively simple and assumes a constant relationship between changes in case numbers and changes in the CFR over time. However, the calibration analysis shows it provides a relatively straightforward and reliable means of translating case forecasts into death forecasts.
The number of daily cases in each region and age group is drawn from a negative binomial distribution. In this model framework, the overdispersion parameter is constant throughout the fitting period, which implies that the variability around the mean estimate is not affected by the epidemiological situation. This is a strong assumption, and extensions accounting for time-varying overdispersion, or association between overdispersion and covariates would improve the Endemic-Epidemic framework. Furthermore, because of the large number of regions and age groups per country, the national-level estimates of the number of cases were computed as a sum of hundreds of negative binomial draws. This leads to very narrow prediction intervals, especially in France, which do not fully capture the uncertainty around the national estimates.
The contact matrices used to fit the model for France and Czechia were taken from pre-pandemic studies, and may not reflect the contact patterns between age groups in 2020 and 2021. Ideally, time-varying matrices would have been used to estimate age-stratified contact, but such matrices were not available or straightforward to implement in the Epidemic-Endemic framework [62]. The addition of an age-specific intercept in the epidemic component allowed the model to modify the risk of transmission in each group where the number of contacts was not in line with the case counts. However, since these coefficients were not time-dependent, the model considers this age-stratified risk of new cases to be constant (if all other covariates do not vary).
We do not include information on NPIs or population mobility in the covariates in the model despite the improvements in predictions these may yield, since centralised databases of these covariates (such as Google Mobility data [42], the Oxford Stringency Index [43], and the ECDC-JRC Response Measures Database [2]) are no longer being updated, and binary covariates made the model less consistent and comparable between countries, while potentially not being specific enough (i.e. they incorporated the impact of other parameters associated with transmission). Including more NPI covariates also led to issues with identifying the effects of different interventions as many were implemented at the same or overlapping times.
Generalising beyond COVID-19
The flexible framework developed in this paper could be used or readily adapted to model incidence of both novel and seasonal pathogens of public health importance, such as influenza, in order to predict local health burden and inform outbreak response. The Endemic-Epidemic framework underlying our model has already been applied to a variety of other pathogens including measles, cholera, leishmaniasis and pertussis [34, 63,64,65]. However, the number of model parameters that can be estimated directly depends on the amount of data available. Therefore, in its current specification (with a large number of parameters), the model may not be suitable for the early stages of an outbreak (except if the pathogen is seasonal, with available data on previous outbreaks). However, it can be used to estimate the impact of various covariates (from different data sources) on transmission risk.
Availability of data and materials
All source data used in the study were openly available. The code used to develop the analysis and links to the data sources are available on Github (back-end code: https://github.com/EU-ECDC/BackendCovidApp, code for the RShiny App: https://github.com/EU-ECDC/RShinyCovidApp).
References
Chapman LAC, Barnard RC, Russell TW, Abbott S, van Zandvoort K, Davies NG, et al. Unexposed populations and potential COVID-19 hospitalisations and deaths in European countries as per data up to 21 November 2021. Eurosurveillance. 2022;27(1):2101038.
Response Measures Database (RMD) [Internet]. European Centre for Disease Prevention and Control. 2022 . Available from: https://www.ecdc.europa.eu/en/publications-data/response-measures-database-rmd. Cited 2023 May 19.
Davies NG, Abbott S, Barnard RC, Jarvis CI, Kucharski AJ, Munday JD, et al. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science. 2021;372(6538):3055. https://doi.org/10.1126/science.abg3055.
Sonabend R, Whittles LK, Imai N, Perez-Guzman PN, Knock ES, Rawson T, et al. Non-pharmaceutical interventions, vaccination, and the SARS-CoV-2 delta variant in England: a mathematical modelling study. Lancet. 2021;398(10313):1825–35.
Birrell P, Blake J, van Leeuwen E, Gent N, De Angelis D. Real-time nowcasting and forecasting of COVID-19 dynamics in England: the first wave. Philos Trans R Soc Lond B Biol Sci. 2021;376(1829):20200279.
Friedman J, Liu P, Troeger CE, Carter A, Reiner RC, Barber RM, et al. Predictive performance of international COVID-19 mortality forecasting models. Nat Commun. 2021;12(1):1–13.
Funk S, Abbott S, Atkins BD, Baguelin M, Baillie JK, Birrell P et al. Short-term forecasts to inform the response to the Covid-19 epidemic in the UK. medRxiv. 2020. 2020.11.11.20220962. Available from: https://www.medrxiv.org/content/https://doi.org/10.1101/2020.11.11.20220962v2. Cited 2023 May 19.
Fonseca-Rodríguez O, Gustafsson PE, San Sebastián M, Connolly A-MF. Spatial clustering and contextual factors associated with hospitalisation and deaths due to COVID-19 in Sweden: a geospatial nationwide ecological study. BMJ Glob Health. 2021;6(7):e006247. https://doi.org/10.1136/bmjgh-2021-006247.
Sartorius B, Lawson AB, Pullan RL. Modelling and predicting the spatio-temporal spread of COVID-19, associated deaths and impact of key risk factors in England. Sci Rep. 2021;11(1):5378.
Wang Q, Dong W, Yang K, Ren Z, Huang D, Zhang P, et al. Temporal and spatial analysis of COVID-19 transmission in China and its influencing factors. Int J Infect Dis. 2021;105:675–85.
Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, Merler S, et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020;368(6489):395–400.
Danon L, Brooks-Pollock E, Bailey M, Keeling M. A spatial model of COVID-19 transmission in England and Wales: early spread, peak timing and the impact of seasonality. Philos Trans R Soc Lond B Biol Sci. 2021;376(1829):20200272.
Gatto M, Bertuzzo E, Mari L, Miccoli S, Carraro L, Casagrandi R, et al. Spread and dynamics of the COVID-19 epidemic in Italy: effects of emergency containment measures. Proc Natl Acad Sci U S A. 2020;117(19):10484–91.
Pei S, Kandula S, Shaman J. Differential effects of intervention timing on COVID-19 spread in the United States. Sci Adv. 2020;6(49):eabd6370. https://doi.org/10.1126/sciadv.abd6370.
Read JM, Bridgen JRE, Cummings DAT, Ho A, Jewell CP. Novel coronavirus 2019-nCoV (COVID-19): early estimation of epidemiological parameters and epidemic size estimates. Philos Trans R Soc Lond B Biol Sci. 2021;376(1829):20200265.
Jewell CP, Hale AC, Rowlingson BS, Suter C, Read JM, Roberts GO. Bayesian inference for high-dimensional discrete-time epidemic models: spatial dynamics of the UK COVID-19 outbreak. arXiv. 2023. 2306.07987. Available from: https://doi.org/10.48550/arXiv.2306.07987.
Cramer EY, Ray EL, Lopez VK, Bracher J, Brennen A, Castro Rivadeneira AJ, et al. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States. Proc Natl Acad Sci U S A. 2022;119(15): e2113561119.
stsmodel. GitLab. Available from: https://gitlab.com/chicas-covid19/stsmodel. Cited 2023 May 19.
Giuliani D, Dickson MM, Espa G, Santi F. Modelling and predicting the spatio-temporal spread of cOVID-19 in Italy. BMC Infect Dis. 2020;20(1):700.
Douwes-Schultz D, Sun S, Schmidt AM, Moodie EEM. Extended bayesian endemic-epidemic models to incorporate mobility data into COVID-19 forecasting. Can J Stat. 2022;50(3):713–33.
Fronterre C, Read JM, Rowlingson B, Alderton S, Bridgen J, Diggle PJ, Jewell CP. COVID-19 in England: spatial patterns and regional outbreaks. medRxiv. 2020.05.15.20102715. https://doi.org/10.1101/2020.05.15.20102715.
Ssentongo P, Fronterre C, Geronimo A, Greybush SJ, Mbabazi PK, Muvawala J, et al. Pan-African evolution of within- and between-country COVID-19 dynamics. Proc Natl Acad Sci U S A. 2021;118(28):e2026664118. https://doi.org/10.1073/pnas.2026664118.
Dickson MM, Espa G, Giuliani D, Santi F, Savadori L. Assessing the effect of containment measures on the spatio-temporal dynamic of COVID-19 in Italy. Nonlinear Dyn. 2020;101(3):1833–46.
Grimée M, Bekker-Nielsen Dunbar M, Hofmann F, Held L. SUSPend modelling consortium. Modelling the effect of a border closure between Switzerland and Italy on the spatiotemporal spread of COVID-19 in Switzerland. Spat Stat. 2022;49:100552.
Berlemann M, Haustein E. Right and Yet Wrong: A Spatio-Temporal Evaluation of Germany’s COVID-19 Containment Policy. 2020 . Available from: https://papers.ssrn.com/abstract=3662054. Cited 2023 May 22.
Fritz C, Kauermann G. On the interplay of regional mobility, social connectedness and the spread of COVID-19 in Germany. J R Stat Soc Ser a Stat Soc. 2022;185(1):400–24.
Bekker-Nielsen Dunbar M, Held L. Endemic-epidemic Framework used in Covid-19 modelling. Revstat Stat J. 2020;18(5):565–74.
Celani A, Giudici P. Endemic-epidemic models to understand COVID-19 spatio-temporal evolution. Spat Stat. 2022;49:100528.
Meyer S, Held L, Höhle M. Spatio-temporal analysis of epidemic phenomena using the R package surveillance. J Stat Softw. 2017;77(11). Available from: http://www.jstatsoft.org/v77/i11/.
Temporal. and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena [R package surveillance version 1.21.1]. 2023 May 19 ; Available from: https://CRAN.R-project.org/package=surveillance. Cited 2023 May 19.
Bracher J, Ray EL, Gneiting T, Reich NG. Evaluating epidemic forecasts in an interval format. PLoS Comput Biol. 2021;17(2): e1008618.
Bracher J, Held L. hhh4addon: extending the functionality of surveillance: hhh4. R package. 2019.
Bracher J, Held L. Endemic-epidemic models with discrete-time serial interval distributions for infectious disease prediction. Int J Forecast. 2022;38(3):1221–33.
Robert A, Kucharski AJ, Funk S. The impact of local vaccine coverage and recent incidence on measles transmission in France between 2009 and 2018. BMC Med. 2022;20(1):77.
Alene M, Yismaw L, Assemie MA, Ketema DB, Gietaneh W, Birhan TY. Serial interval and incubation period of COVID-19: a systematic review and meta-analysis. BMC Infect Dis. 2021;21(1):257.
Pung R, Mak TM, CMMID COVID-19 working group, Kucharski AJ, Lee VJ. Serial intervals in SARS-CoV-2 B.1.617.2 variant cases. Lancet. 2021;398(10303):837–8.
Nishiura H, Linton NM, Akhmetzhanov AR. Serial interval of novel coronavirus (COVID-19) infections. Int J Infect Dis. 2020;93:284–6. https://doi.org/10.1016/j.ijid.2020.02.060.
Meyer S. Age-Structured Spatio-Temporal Models for Infectious Disease Counts [R package hhh4contacts version 0.13.1]. 2020 Mar 20 ; Available from: https://CRAN.R-project.org/package=hhh4contacts. Cited 2023 May 19.
Meyer S, Held L. Incorporating social contact data in spatio-temporal models for infectious disease spread. Biostatistics. 2017;18(2):338–51.
Béraud G, Kazmercziak S, Beutels P, Levy-Bruhl D, Lenne X, Mielcarek N, et al. The French connection: the first large Population-based contact survey in France relevant for the spread of Infectious diseases. PLoS ONE. 2015;10(7): e0133203.
Prem K, van Zandvoort K, Klepac P, Eggo RM, Davies NG, Centre for the Mathematical Modelling of Infectious Diseases COVID-19 Working Group. Projecting contact matrices in 177 geographical regions: an update and comparison with empirical data for the COVID-19 era. PLoS Comput Biol. 2021;17(7): e1009098.
Google LLC, Google. COVID-19 Community Mobility Reports. Available from: https://www.google.com/covid19/mobility. Cited 2023 Apr 12.
Hale T, Angrist N, Goldszmidt R, Kira B, Petherick A, Phillips T, et al. A global panel database of pandemic policies (Oxford COVID-19 Government Response Tracker). Nat Hum Behav. 2021;5(4):529–38.
Linton NM, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov AR, Jung S-m, Yuan B, Kinoshita R, Nishiura H. Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data. J Clin Med. 2020;9:538. https://doi.org/10.3390/jcm9020538.
Khalili M, Karamouzian M, Nasiri N, Javadi S, Mirzazadeh A, Sharifi H. Epidemiological characteristics of COVID-19: a systematic review and meta-analysis. Epidemiol Infect. 2020;148:e130.
Challen R, Brooks-Pollock E, Tsaneva-Atanasova K, Danon L. Meta-analysis of the severe acute respiratory syndrome coronavirus 2 serial intervals and the impact of parameter uncertainty on the coronavirus disease 2019 reproduction number. Stat Methods Med Res. 2022;31(9):1686–703.
European Centre for Disease Prevention and Control. Data on SARS-CoV-2 variants in the EU/EEA. Available from: https://www.ecdc.europa.eu/en/publications-data/data-virus-variants-covid-19-eueea. Cited 2022 Jul 26.
Eurostat. Methodology - Rural development. Available from: https://ec.europa.eu/eurostat/web/rural-development/methodology. Cited 2022 Jul 26.
World Health Organization. Daily cases and deaths by date reported to WHO. 2023. Available from: https://covid19.who.int/WHO-COVID-19-global-data.csv. Cited 2023 Jan 30.
Santé publique France. Données de laboratoires pour le dépistage. Available from: https://www.data.gouv.fr/fr/datasets/donnees-de-laboratoires-pour-le-depistage-a-compter-du-18-05-2022-si-dep/. Cited 2022 Jul 26.
Ministerstvo Zdravotnictví České republiky. COVID-19 in the Czech Republic: Open data sets and downloadable sets. Available from: https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19. Cited 2022 Jul 26.
Dipartimento della Protezione Civile. Dati COVID-19 Italia. Available from: https://github.com/pcm-dpc/COVID-19. Cited 2022 Jul 26.
datavaccin-covid. Données vaccination par tranche d’âge, type de vaccin et département / région. Available from: https://datavaccin-covid.ameli.fr/explore/dataset/donnees-vaccination-par-tranche-dage-type-de-vaccin-et-departement/information/. Cited 2022 Jul 26.
European Centre for Disease Prevention and Control. Data on COVID-19 vaccination in the EU/EEA. Available from: https://www.ecdc.europa.eu/en/publications-data/data-covid-19-vaccination-eu-eea. Cited 2022 Jul 26.
Insee. Estimation de la population au 1er janvier 2022. Available from: https://www.insee.fr/fr/statistiques/1893198. Cited 2022 Jul 26.
Eurostat. Population on 1 January by age group, sex and NUTS 3 region. Available from: https://ec.europa.eu/eurostat/web/products-datasets/product?code=demo_r_pjangrp3. Cited 2023 Jan 2.
Wahltinez O. and others. COVID-19 Open-Data: curating a fine-grained, global-scale data repository for SARS-CoV-2. 2020; Available from: https://goo.gle/covid-19-open-data.
Czado C, Gneiting T, Held L. Predictive model assessment for count data. Biometrics. 2009;65(4):1254–61.
Sherratt K, Gruson H, Grah R, Johnson H, Niehus R, Prasse B et al. Predictive performance of multi-model ensemble forecasts of COVID-19 across European nations. Elife. 2023;12. https://doi.org/10.7554/eLife.81916.
Leng T, Hill EM, Holmes A, Southall E, Thompson RN, Tildesley MJ, et al. Quantifying pupil-to-pupil SARS-CoV-2 transmission and the impact of lateral flow testing in English secondary schools. Nat Commun. 2022;13(1):1106.
Paltiel AD, Zheng A, Walensky RP. Assessment of SARS-CoV-2 screening strategies to Permit the Safe Reopening of College Campuses in the United States. JAMA Netw Open. 2020;3(7):e2016818.
Munday JD, Abbott S, Meakin S, Funk S. Evaluating the use of social contact data to produce age-specific short-term forecasts of SARS-CoV-2 incidence in England. PLoS Comput Biol. 2023;19(9):e1011453. Available from: https://doi.org/10.1371/journal.pcbi.1011453.
Nightingale ES, Chapman LAC, Srikantiah S, Subramanian S, Jambulingam P, Bracher J, et al. A spatio-temporal approach to short-term prediction of visceral leishmaniasis diagnoses in India. PLoS Negl Trop Dis. 2020;14(7): e0008422.
Munro AD, Smallman-Raynor M, Algar AC. Long-term changes in endemic threshold populations for pertussis in England and Wales: a spatiotemporal analysis of Lancashire and South Wales, 1940-69. Soc Sci Med. 2021;288: 113295.
Jeandron A. Tap water access and its relationship with cholera and other diarrhoeal diseases in an urban, cholera-endemic setting in the Democratic Republic of the Congo [doctoral]. London School of Hygiene & Tropical Medicine; 2020. Available from: https://researchonline.lshtm.ac.uk/id/eprint/4659288/. Cited 2023 May 22.
Acknowledgements
We acknowledge Dr Naomi Waterlow (London School of Hygiene and Tropical Medicine) and Dr Sophie Meakin (Epicentre, Paris) for their feedback on early versions of the RShiny App, which helped us clarify the presentation of the results. We also acknowledge Dr Johannes Bracher (Karlsruhe Institute of Technology) for helpful discussions regarding the hhh4addon and hhh4contacts R packages.
Funding
The work received funding from the European Centre for Disease Prevention and Control (ECDC) via the specific contract “Analysis of COVID-19 outbreak risk at subnational level in the vaccine era: development of an RShiny app” (REOP/2021/SMS/13060) under the framework contract “Mathematical Modelling & Economic Evaluation” (OJ-2019-SRS-10962). AR was supported by the National Institute for Health Research (NIHR200908), LACC was supported by the Wellcome Trust (grant: 206250/Z/17/Z) and NIHR (NIHR200908). SF was supported by a Wellcome Trust Senior Research Fellowship in Basic Biomedical Science (210758/Z/18/Z), AJK was supported by a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (206250/Z/17/Z).
Author information
Authors and Affiliations
Contributions
AR, LACC, AJK and SF led on conceptualisation and methodology. AR and LACC were in charge of software, validation, formal analysis, data curation, visualisation, writing - original draft preparation, and writing - review and editing. BP, RN, FS, and RG were in charge of project administration, funding acquisition, and writing - review and editing. AJK and SF were responsible for supervision and writing - review and editing.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable: this study only involved secondary analysis of publicly available aggregated surveillance data.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1:
Supplementary Figure 1. Lag distribution used in the epidemic component in the model. Supplementary Table 1. Definitions and sources of covariates included in the model. Supplementary Table 2. Median and 95% interval of ranked probability score (RPS), Dawid-Sebastiani score (DSS) and squared error score (SES) of one-, two-, three- and four-week-ahead local age-stratified case forecasts for Czechia and France and non-age-stratified forecasts for Italy for the baseline Endemic-Epidemic model (without transmission between regions, covariates or seasonality) and full Endemic-Epidemic model (with covariates and seasonality) across all time points in the prediction period from 29th October 2022 to 22nd April 2023. Supplementary Figure 2. PIT histograms showing the calibration of the daily case forecasts from the model for Czechia for 0-19 year-olds, 20-59 year-olds, 60-79 year-olds and 80+ year-olds (rows, top to bottom) for one-, two-, three- and four-week-ahead forecast horizons (columns, left to right) for 29th October 2022 to 22nd April 2023. Supplementary Figure 3. PIT histograms showing the calibration of the daily case forecasts for France for 0-19 year-olds, 20-59 year-olds, 60-79 year-olds and 80+ year-olds (rows, top to bottom) for one-, two-, three- and four-week-ahead forecast horizons (columns, left to right) for 29th October 2022 to 22nd April 2023. Supplementary Figure 4. PIT histograms showing the calibration of the weekly death forecasts from the model for Czechia for 0-19 year-olds, 20-59 year-olds, 60-79 year-olds and 80+ year-olds (rows, top to bottom) for one-, two-, three- and four-week-ahead forecast horizons (columns, left to right) for 29th October 2022 to 22nd April 2023. Supplementary Figure 5. PIT histograms showing the calibration of the weekly death forecasts from the model for France for 0-19 year-olds, 20-59 year-olds, 60-79 year-olds and 80+ year-olds (rows, top to bottom) for one-, two-, three- and four-week-ahead forecast horizons (columns, left to right) for 29th October 2022 to 22nd April 2023. Supplementary Figure 6. Comparison of one-, two-, three- and four-week-ahead national-level case forecasts (rows, top to bottom) for Italy, Czechia and France (columns, left to right) from non-age-stratified version of our model (Endemic-Epidemic (EE) model) with ensemble forecasts from the European COVID-19 Forecast Hub (ensemble) and observed cases (data) for 29th October 2022 to 22nd April 2023. Supplementary Figure 7. PIT histograms showing the calibration of the daily case forecasts from the non-age-stratified version of the model for Italy, Czechia and France (columns, left to right) for one-, two-, three- and four-week-ahead forecast horizons (rows, top to bottom) for 29th October 2022 to 22nd April 2023. Supplementary Figure 8. Comparison of one-, two-, three- and four-week-ahead national-level death forecasts (rows, top to bottom) for Italy, Czechia and France (columns, left to right) from non-age-stratified version of our model (Endemic-Epidemic (EE) model) with ensemble forecasts from the European COVID-19 Forecast Hub (ensemble) and observed cases (data) for 29th October 2022 to 22nd April 2023. Dashed lines show median forecasts, shaded regions the 95% prediction interval for the forecasts. Supplementary Figure 9. PIT histograms showing the calibration of the weekly death forecasts from the non-age-stratified version of the model for Italy, Czechia and France (columns, left to right) for one-, two-, three- and four-week-ahead forecast horizons (rows, top to bottom) for 29th October 2022 to 22nd April 2023.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Robert, A., Chapman, L.A., Grah, R. et al. Predicting subnational incidence of COVID-19 cases and deaths in EU countries. BMC Infect Dis 24, 204 (2024). https://doi.org/10.1186/s12879-024-08986-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12879-024-08986-x