Predicting subnational incidence of COVID-19 cases and deaths in EU countries

Background Recurring COVID-19 waves highlight the need for tools able to quantify transmission risk, and identify geographical areas at risk of outbreaks. Local outbreak risk depends on complex immunity patterns resulting from previous infections, vaccination, waning and immune escape, alongside other factors (population density, social contact patterns). Immunity patterns are spatially and demographically heterogeneous, and are challenging to capture in country-level forecast models. Methods We used a spatiotemporal regression model to forecast subnational case and death counts and applied it to three EU countries as test cases: France, Czechia, and Italy. Cases in local regions arise from importations or local transmission. Our model produces age-stratified forecasts given age-stratified data, and links reported case counts to routinely collected covariates (e.g. test number, vaccine coverage). We assessed the predictive performance of our model up to four weeks ahead using proper scoring rules and compared it to the European COVID-19 Forecast Hub ensemble model. Using simulations, we evaluated the impact of variations in transmission on the forecasts. We developed an open-source RShiny App to visualise the forecasts and scenarios. Results At a national level, the median relative difference between our median weekly case forecasts and the data up to four weeks ahead was 25% (IQR: 12–50%) over the prediction period. The accuracy decreased as the forecast horizon increased (on average 24% increase in the median ranked probability score per added week), while the accuracy of death forecasts was more stable. Beyond two weeks, the model generated a narrow range of likely transmission dynamics. The median national case forecasts showed similar accuracy to forecasts from the European COVID-19 Forecast Hub ensemble model, but the prediction interval was narrower in our model. Generating forecasts under alternative transmission scenarios was therefore key to capturing the range of possible short-term transmission dynamics. Discussion Our model captures changes in local COVID-19 outbreak dynamics, and enables quantification of short-term transmission risk at a subnational level. The outputs of the model improve our ability to identify areas where outbreaks are most likely, and are available to a wide range of public health professionals through the Shiny App we developed. Supplementary Information The online version contains supplementary material available at 10.1186/s12879-024-08986-x.


Introduction
Local dynamics of COVID-19 depend on a combination of the level of immunity in a population, which in turn depends on previous incidence, vaccine uptake, and the immune escape properties of the currently circulating SARS-CoV-2 variants, and other factors such as population density, social contact patterns, migration patterns and adherence to public health measures.Most countries in Europe have been affected by repeated COVID-19 waves since March 2020, and have implemented extensive vaccination campaigns in order to reduce the health impact of these waves.This led to high national levels of both natural and vaccine-induced immunity [1].Before the emergence of the Delta and Omicron variants of concern (VOCs), this immunity provided considerable and durable protection against severe outcomes (hospitalisation and death) and some transient protection against infection [2].The emergence of new VOCs, and the extent of their immune escape, along with the waning of efficacy observed for a two-dose vaccine course, precipitated the expansion and acceleration of "booster" vaccination campaigns, further shifting the immune landscape of the population.Overall immunity to the Omicron BA.1 variant rose quickly after its emergence due to the unprecedentedly large wave of BA.1 cases that occurred throughout Europe in late 2021-early 2022, and vaccine booster campaigns.Since then, repeated emergence of Omicron subvariants with high levels of immune escape has led to multiple waves of infections, with lower case burden than the first BA.1 wave.
Currently, the overall level of immunity against infection is high compared to the early phase of the COVID-19 pandemic.The immune landscape of the population is spatially heterogeneous due to considerable variation in infection histories (e.g.multiple reinfections with different variants) and differences in vaccine uptake and timing across the population.If overall immunity rises, this immune landscape should result in the risk of outbreaks becoming more spatially and demographically heterogeneous, with the potential for distinct identifiable outbreaks.Therefore, there is interest from public health agencies in forecasting short-term incidence at a subnational level, and in different age groups, to anticipate large localised spikes in case numbers and local pressure on health care systems.Predicting such spikes in health care demand will become increasingly relevant as time moves forward if COVID-19 becomes more seasonal and influenza-like in its dynamics.This project was developed in collaboration with the European Centre for Disease Prevention and Control (ECDC), with the objective of developing a statistical framework for forecasting local case and death incidence in a range of European countries, and visualisation tools to communicate the predictions.The forecasts and scenario analysis can then be used to optimise planning and allocation of resources.
There have been numerous attempts to model subnational incidence of COVID-19 over the course of the pandemic.These have tended to fall into two broad categories: mechanistic "spatial" susceptible-exposedinfectious-recovered transmission models and statistical time series/spatiotemporal models.Most time series and transmission models have treated subnational regions as independent (fitted the model and made predictions separately for each region) without accounting for spatiotemporal correlations in incidence between regions [3][4][5][6][7], despite these being strong [8][9][10].A limited number of transmission models have instead treated subnational regions as connected sub-populations via a metapopulation approach, and used geographical distance or mobility/commuting data to parametrise connectivity between regions [11][12][13][14][15][16].These models become increasingly complex and hence slow to generate predictions as the number of affected regions increases, and have therefore received limited use for real-time forecasting during the pandemic.Time series models and spatiotemporal statistical models, on the other hand, have been used extensively and successfully to forecast future incidence at national and subnational levels [6,7,17].In the latter category, we focus on Endemic-Epidemic models, a flexible class of spatiotemporal statistical models that can be used to link changes in incidence to recent case numbers and the effect of various different covariates, on which the framework used in this paper is based.Endemic-Epidemic models have already been employed during the pandemic to understand and forecast spatiotemporal spread of COVID-19 at a subnational level [18][19][20][21] and at a national level in Africa [22], and to assess the impact of non-pharmaceutical interventions (NPIs), including lockdowns and border closures [23][24][25][26] (see [27] for a review).Their superiority to time series models that assume independence between regions has been demonstrated on data from northern Italy [28].However, they have not, as yet, been applied to forecast subnational incidence across multiple countries.
In this paper, we present a flexible modelling framework to capture subnational, age-stratified case dynamics of COVID-19, alongside a publicly available RShiny App to visualise the results and forecasts generated.The framework is used to predict subnational incidence of COVID-19 cases and deaths from routinely collected, publicly available surveillance data, and forecast the impact of changes in transmission on short-term dynamics (e.g.due to changes in behaviour or transmissibility).The framework can be run with an age-stratified model if local, age-stratified data is available.Public health professionals can use the RShiny App to visualise the forecasts of reported cases and deaths, and the predicted incidence under different scenarios, to get a full picture of the anticipated short-term burden of COVID-19 in local areas.We apply the framework to forecast cases and deaths at a NUTS-3 (Nomenclature of Territorial Units for Statistics 3) level in France, Czechia and Italy as test cases, and evaluate its ability to predict subnational incidence up to four weeks ahead.

Spatiotemporal modelling framework for reported cases
We model subnational COVID-19 case counts using the Endemic-Epidemic spatiotemporal modelling framework [29], as implemented in the surveillance R package [30].Endemic-Epidemic models provide a flexible framework for relating current incidence of cases to recent case incidence and to imported cases, accounting for the influence of other factors (i.e.covariates) on these relationships.They decompose local incidence in region i at time t into three components: -An autoregressive component that quantifies the number of new infections expected from cases reported in region i at previous time steps.-A neighbourhood component that quantifies the number of new infections expected from cases reported in regions around region i in previous time steps, which depends on the connectivity (or the human mobility) between regions.-An endemic component that quantifies the background number of new infections occurring in region i at time t , independent of the current level of trans- mission, representing importations from regions outside those included in the study (or cases that could not be linked to the mechanistic components).
Endemic-Epidemic models are able to integrate multiple data sources from disease surveillance activity to link case incidence and various covariates.Each component is independently impacted by each covariate.Since we introduce various covariates and controls in our model, we merge the autoregressive and neighbourhood component into an epidemic component to reduce the number of parameters estimated, and avoid identifiability and convergence issues.
Early versions of the Endemic-Epidemic models only included dependence of current cases on cases in the previous time step, but they have since been extended to account for distributed lags via the hhh4addon R package [31][32][33].This allows for a more faithful representation of the impact of recent incidence on current case number, for instance by aligning the lag distribution to the serial interval of the pathogen.
The link between indicators of immunity and risk of SARS-CoV-2 infection is complex and unstable, due to waning of vaccine and infection-induced immunity, and emergence of variants able to escape immunity.The flexibility of the Endemic-Epidemic framework therefore makes it well suited to capture local dynamics of COVID-19 cases, while fully mechanistic frameworks may be too complex to parametrise.Depending on data availability, the model can be stratified by age.

Non-age-stratified model
The total number of cases (over all age groups) in region i at time t , Y it , conditional on the number of cases in the same region and neighbouring regions in the previous p time steps, is modelled as negative binomial: is the transmission poten- tial from recent cases in region j , with [u d ] the normal- ised lag weight for cases d days ago.We set p = 20 days and use a custom composite serial interval distribution for u d that accounts for missing infection generations [34], with a mean and standard deviation for the first infection generation of 5 days and 1.5 days respectively, based on the estimated serial interval for SARS-CoV-2 for pre-Omicron variants [35][36][37], and 80% of the composite serial interval assumed to reflect direct transmission (without missing infections) (Supplementary Fig. 1).The predictors for the epidemic (combined autoregressive and neighbourhood) and endemic components, ϕ it and ν it , determine the number of cases stemming from each component and are assumed to depend on log-linear component-specific predictors: The predictors are independently impacted by different covariates, z (ϕ) it and z (ν) it , i.e. a covariate may be associated with fewer imported cases (endemic component ν it ), but have little impact on the spread of the virus in and between the regions (epidemic component ϕ it ).The association between the covariates and the local number of cases expected is quantified by regression coefficients, (1) , estimated through a maximum-likelihood approach.The weights w ji in the neighbourhood compo- nent quantify the degree of connectivity between region i and surrounding regions j .We use a power-law model for the neighbourhood weights: where o ji is the adjacency order between regions i and j and ρ is a decay parameter to be estimated.The adjacency order is defined as the minimum number of borders that must be crossed to get from i to j : the adjacency degree is equal to 1 between neighbours, 2 between neighbours of neighbours, and so on.The maximum adjacency order we consider is 5.We normalise the weights, such that I i=1 w ji = 1 , with I the number of regions in the country.

Age-stratified model
For countries in which age-stratified subnational case count data is available, we implement an age-stratified version of the model above using modified code from the hhh4contacts R package [38,39].For this model, the number of cases in age group a in region i at time t , Y ait , conditional on the numbers of cases in the previous p time steps in the same and other age groups in region i and the surrounding regions is modelled as negative binomial: ait , where Y t−1 = {Y a,i,t−1 } a=1, ... ,n a ,i=1, ... ,n i with n a age groups and n i regions, and c ba is the mean number of daily contacts in age group a of an individual in age group b .We stratify the population into n a = 9 age groups (0-9, 10-19,…, 70-79, 80+ years).We use age-structured contact matrices from country-specific pre-pandemic contact surveys for c ba where available, e.g. for France [40], and synthetic contact matrices estimated from contact survey and demographic data for countries in which no nationally representative contact surveys have been conducted, e.g. for Czechia [41].We used age-stratified intercepts in the epidemic component.

Covariates
We incorporate various covariates in z it in the log-linear predictors for ϕ it and ν it .The full model equa- tions are given in the Supplementary Material (see Model equation for each country and Supplementary Table 1).The covariates were picked based on potentially having had an effect on transmission and importation risk, or having an effect in future.We compare the forecasts obtained with the full model with covariates against a baseline model without covariates in the Supplement.
We included the same set of covariates in the endemic component in the age-stratified and non-age-stratified models: -population: the total population of region i -urban/rural status: binary covariates for whether the NUTS- These covariates cover the impact of demographic characteristics and transmission in neighbouring countries on the background number of cases in each region (and age group in the age-stratified version of the model).The covariate specifications of the epidemic component depend on the availability of age-stratified data as follows: -population: -age-stratified: two covariates: -total population of region i -proportion of the population of age group a in region i.
-non age-stratified: one covariate: the total population of region i. -testing: -age-stratified: two covariates: - The covariates above cover the impact of local immunity (due to vaccine or previous infection), testing patterns, variants and seasonality on the risks of reported cases.We included covariates that quantify the association between number of tests and incidence so that the model may capture changes in surveillance and reporting patterns, whereby drops in testing can be associated with changes in the number of new cases.For instance, in age groups with a lower proportion of hospitalised cases (e.g.younger age groups) the reporting rate may vary greatly if only severe cases are tested and reported.The covariates quantifying the association between infection or vaccine-acquired immunity and transmission risk depend on various thresholds (e.g.recent incidence corresponds to cases reported in the last month; second vaccine doses are only taken into account for the last 120 days).These thresholds can easily be changed in the code to generate sensitivity analyses, or to adapt to the characteristics of new variants or vaccines.Other covariates that were considered for inclusion in the model were population mobility indices (such as the Google mobility index [42]) and binary covariates for various NPIs (such as the Oxford Stringency Index [43]), but these were found either to lead to parameter identifiability issues or not to improve predictive performance, and their data sources have been discontinued.

Model fitting
To make forecasts of future cases and deaths, we fit the model to case data reported between September 2020 and the latest reported date for each country (currently end of April 2023) to estimate the regression coefficients β (ϕ) and β (ν) .The model is fit separately for each country.We do this via maximum-likelihood estimation, as implemented in the surveillance and hhh4addon R packages.

Case forecasts
We use the parameter estimates from the model fitting to generate four-week-ahead forecasts of daily case numbers in each region and age group by simulating the model forward 28 days with projected values of the covariates.We used the latest value of the covariates describing vaccine coverage and testing, and the mean value over the past 30 days for the number of cases in the rest of Europe in the past month.We run 100 simulations for each country to account for stochasticity (from the negative binomial draws for the number of cases) and parameter uncertainty (from model fitting).We do this by drawing ten sets of parameter values using the parameter covariance matrix estimated from the model fitting, and running ten simulations for each parameter set.We output the median, 2.5th, 25th, 75th, and 97.5th percentiles of the resulting predicted distribution for each date and region (and age group for the age-stratified model) for visualisation in the RShiny app.

Death forecasts
We use simple linear regression models to generate four-week-ahead forecasts of weekly numbers of reported deaths.To do so, we estimate the Case Fatality Rate (CFR), and combine it with the recent number of reported cases to forecast the number of deaths, We first aggregate the number of reported cases and deaths by week, and compute a proxy for the age-stratified CFR assuming a three-week delay between weekly reported cases and reported deaths [44][45][46].Specifically, we compute the CFR for age group ai in region i at week w as: where N aiw and D aiw are the numbers of cases and deaths in age group a , in NUTS-1/NUTS-2 region i , in week w .The death forecast model was implemented at a NUTS-1/NUTS-2 geographical scale since death data was not always available at a NUTS-3 level.
We then calculate ∆CFR aiwx , the change in CFR, and ∆N aiwx , the change in the number of cases, between the prediction date and the forecast horizon: where x is the forecast horizon (1, 2, or 3 weeks).We implement a linear regression model for each forecast horizon and age group, with ∆CFR as outcome, and ∆N as explanatory variable, again using a three-week delay: We then use the estimates of α ax and β ax to predict ∆CFR between the last week of reported case data ( w pred ) and the forecast dates ( w pred + 1 , w pred + 2 , w pred + 3 ), and calculate the predicted CFR as: (using a sample of 10 values of ∆CFR aiwx , computed from the mean estimate and the prediction interval of the linear regression).Finally, we draw the number of new weekly deaths each week using a binomial distribution, from the estimated CFR, and the number of cases reported three weeks before: We draw 500 forecasts for D a,i,w pred +x in total (50 per value of CFR a,i,w pred +x ).Given the three-week delay between cases and deaths, one-, two-, and three-weekahead death forecasts are generated using already reported case data.In contrast, four-week-ahead death forecasts require one-week-ahead case forecasts.We estimate the change in CFR using the regression parameters from the three-week-ahead forecasts and the fourweek change in number of cases in the one-week-ahead case forecasts

Scenario simulations
To explore the impact of variations in transmission intensity or implementation of non-pharmaceutical interventions (NPIs), we generate 28-day forecasts under combinations of the following scenarios, by changing the CFR a,i,w pred +x = CFR aiw pred + ∆CFR aiw pred x D a,i,w pred +x ∼ Binomial(N a,i,w pred +x−3 , CFR a,i,w pred +x ) All scenarios affect every region in the same way (i.e.we do not consider local NPIs).We generate 100 simulations under each scenario.

Data
Several publicly available data sources are used to implement the model and are summarised in Table 1.Since compilation of COVID-19 surveillance reports in centralised databases was interrupted in early 2022 in many countries, the majority of the data, including the case and death data, is imported from country-specific sources.

Case and death data
Local case data at a NUTS-3 level was used in all countries (age-stratified in France and Czechia).For Czechia, death data at NUTS-3 level was used, but for Italy and France, death data was only available at NUTS-1/ NUTS-2 level.Numbers of deaths were taken to be those reported in the national databases for each country, regardless of potential differences in how COVID-19 deaths were defined between countries.

Covariate data
Where available, daily age-stratified (for the age-stratified model) vaccination data at NUTS-3 regional level is used, though most sources provide weekly data, and a two-week delay for protection from each dose to develop is assumed.Publicly available testing data varies considerably in spatial resolution (for some countries only national data is available) and age stratification (for some countries only total data is available), so age-stratified national testing data is used as a default and subnational data used when available.Age-stratified regional population data is drawn from a different source for each country (Table 1).Data on the proportion of different variants among sequenced cases is drawn from the ECDC variant database [47].The urban-rural status of each NUTS-3 region is taken from the Eurostat database (database labelled "urban-rural remoteness") [48].Daily numbers of cases in the rest of Europe are taken from the World Health Organisation database of daily numbers of cases and deaths in different countries [49].

Calibration analysis
We evaluate the ability of the model to generate accurate ("well-calibrated") and reliable case and death forecasts in each country by fitting the model up to a set of dates in a calibration period, generating 28-day-ahead forecasts from these dates using the model fits, and comparing these forecasts with the data.The forecasts are generated by re-fitting the model for each calibration date (i.e.we do not use the data posterior to the calibration date to generate the forecasts).We used weekly dates from 29th October 2022 to 22nd April 2023 (i.e. 25 calibration dates), as the calibration period.A model is deemed wellcalibrated if it can identify its own uncertainty in making predictions, i.e. if the data points are evenly distributed across the prediction intervals generated by the model.Since the reliability of our forecasts is likely to depend on the prediction horizon, we compare the performance of the model for one-, two-, three-, and four-week-ahead forecasts, to determine how quickly the quality of the forecasts declines.We use various metrics and figures to evaluate the calibration.Firstly, we visually compare our weekly nationallevel case and death forecasts, obtained by summing up the age-stratified and local forecasts, with the ensemble forecasts produced by the European COVID-19 Forecast Hub, which serves as a benchmark of short-term forecasts of COVID-19 incidence.We also compare our national-level forecasts with the ensemble forecasts quantitatively via the weighted interval score (WIS) [31], a proper scoring rule (i.e. one that measures both calibration -how accurate the forecasts are -and sharpness -how precise the forecasts are) for quantile forecasts, and the squared error of the median.This comparison assesses whether the overall performance of our model is in line with other COVID-19 prediction models.
Secondly, we evaluate our local forecasts against those of a baseline model (chosen as the Endemic-Epidemic model without transmission between regions, covariates or seasonality, i.e. the model in Eqs. ( 1)-( 4) but with w ji = 1 when j = i and w ji = 0 otherwise and log(ϕ it ) = α (ϕ) and log(ν it ) = α (ν) ), using proper scoring rules, namely the ranked probability score (RPS), Dawid-Sebastiani score (DSS) and squared error score (SES) (see Supplementary Material for definitions).We also generate the predictive probability distribution of the local, age-stratified (if available) forecasts at each calibration date.We then use the Probability Integral Transform (PIT) histogram to assess the calibration of the model: in models with good calibration, the data should follow the predictive probability distribution, and the PIT histogram should be uniform.We computed a non-randomised yet uniform version of the PIT histogram, to correct for the use of discrete values, as described in Czado et al. [58].

Comparison of age-stratified and non-age-stratified model
As age-stratified case and death data is only available for certain countries, we explore the impact of fitting the model to subnational total case counts for France and Czechia on the ability of the model to predict subnational total numbers of cases and deaths.We do this by fitting the non-age-stratified equivalent of the age-stratified model in Eqs. ( 3)-( 4) and (A1)-(A2) with the agestratified covariates removed, i.e.Equations ( 1)-( 2) and (A4) (see Supplementary Material), and re-running the calibration analysis of the predictive performance of the model described above.

RShiny application
All forecasts and predictors generated by the model are available in an R Shiny Application (https:// github.com/ EU-ECDC/ RShin yCovi dApp).The users can use the application to see the latest case and death forecasts of the model in each country, compare past forecasts to recent data points, generate different transmission and NPI scenarios, and observe the regions most at risk of transmission according to the model.The forecasts contained in the application are updated weekly.

Case forecasts calibration
Firstly, we compute weekly case forecasts across all regions and age groups, by summing up all local age stratified forecasts, across the calibration period (last six months of data).We then compare the one, two, three, and four-week ahead forecasts with the observed data, and the weekly forecasts from the European COVID-19 Forecast Hub.
The comparison of the forecasts is shown in Fig. 1; Table 2.It demonstrates that our model was able to reliably capture the dynamics observed in the data up to two weeks ahead.The magnitude of the peak of the different outbreaks is accurately estimated in all three countries, and the trend forecasted by our model corresponds to the data.Beyond a two-week forecast horizon, the difference between the forecasts and the data becomes starker, in particular during outbreaks (for instance around  However, the dispersion of our forecasts is considerably narrower than that of the ensemble forecasts, particularly for France.The data points are therefore more often outside the 95% prediction interval of our forecasts, despite the median forecasts being close to the data points, and the median and 95% interval of the weighted interval score is higher for France for our model than the ensemble model (Table 2).The narrow prediction interval is a consequence of aggregating all local and age-stratified forecasts to get the national-level estimate rather than directly estimating the national number of new cases: since the dispersion of a sum of negative binomial random variables is much smaller than that of each individual negative binomial random variable.This is also why the prediction interval is narrowest in France, where the number of groups is highest (846 groups = 94 regions times 9 age groups).
We now assess whether the dispersion of local estimates is in line with the data (i.e.whether the prediction intervals from the model include the observed data points).To do so, we compute the case PIT histogram for each country and forecast horizon (Fig. 2).We also generate PIT histograms stratified by broad age groups for France and Czechia (Supplementary Figs. 2 and 3).In one-week-ahead forecasts, we observe an inverse U-shaped histogram in Italy, indicating that the forecasts are slightly overdispersed.The performance of the age-stratified model varies in the different age groups.The model tends to overestimate the number of cases in younger age groups (Supplementary Figs. 2  and 3), while the histograms in age groups between 20 and 80 years old were flat, indicating good calibration.The two-week ahead PIT histogram for Italy also shows very good performance, with little sign of bias, although the model tends to slightly overestimate the number of cases (few observations fall into the highest categories).Over longer time horizons, the PIT histogram becomes U-shaped, i.e. skewed towards extreme values, showing that the forecasts generated by the model are under-dispersed.
The histograms for Czechia and France are U-shaped, indicating that the model is overly confident (i.e. the prediction interval is too narrow).The overestimation of cases in younger age groups may be due to case incidence being determined by the age-structured contact rates used in the model (derived from contact surveys), but cases in this age group may also be under-reported as they typically have milder symptoms, and so are harder to spot without active case finding campaigns.Forecasts past two weeks ahead are more strongly biased, and underdispersed, but better performance is still observed for Italy, where the model is not age-stratified.Similar underdispersion was observed in Czechia and France when a non-age-stratified model was implemented (Supplementary Figs. 6 and 7).
We also generated local forecasts using a baseline model (with no transmission between regions, covariates, seasonality, or age-specific intercepts), and compared them to the forecasts from the full model (Supplementary Table 2).The addition of transmission between regions, covariates and seasonality to the model substantially improved the predictive performance up to 2 weeks ahead for all countries, with an average 37% improvement across the three countries in the median RPS at a 2-week forecast horizon.This was especially true in Czechia and France, where the age-specific dynamics were hard to capture without covariates.

Death forecasts calibration
The death forecast comparison is shown in Fig. 3; Table 3.The performance of the model is not homogeneous across countries.The model performs better in Italy and Czechia, where the median estimate is closer to the observed data, and the 95% prediction interval includes the data more often than in France.In France, the model was not able to accurately capture the drop in number of deaths following the outbreak in December 2022.For most dates, the median estimate of the number of deaths is similar in our model and the European COVID-19 Forecast Hub ensemble model (Fig. 3), and the median of the squared errors of the median predictions across all time points in the prediction period is similar (Table 3).The predictive accuracy and coverage of our model is closer to that of the ensemble model for the death forecasts than the case forecasts (the WIS is closer, Table 3), and the impact of forecast horizon on the performance is not as severe as for the case forecasts, for instance for Italy four-week-ahead forecasts are still in line with the number of reported deaths (Fig. 3).This is due to the larger prediction intervals generated with the death model relative to the number of deaths.
For Czechia and France, the PIT histograms generated for the local age-stratified weekly death forecasts follow much more uniform distributions than those for the case forecasts, for all forecast horizons (Fig. 4).Since the model was age stratified for both countries, the majority of weekly death counts in the data were 0, which was easier for the model to capture and forecast.In Italy, the prediction intervals tend to be too low, since few observed weekly death counts fall into the lower categories of the PIT histograms.Therefore, our model tends to underestimate the number of weekly deaths per region in Italy.The age-stratified PIT histograms in Czechia and France (Supplementary Figs. 4 and 5) are uniform for younger age groups (where almost all observations are 0), and follow an inverse U-shape in older age groups, indicating that the forecasts are slightly overdispersed.

Visualising the predictions in the R Shiny application
The RShiny app is designed to make four features accessible to the user: 1) Forecasts: Current and previous four-week ahead forecasts of the number of cases and deaths in each region (and age group when available). 2) Predictors: The risks of secondary transmission and importations estimated by the model 3) Scenarios: The impact of changes in transmission, be it due to more infectious variants, changes in behaviour, or NPIs on case and death forecasts.These changes in transmission represent the potential impact of new variants and/or control measures on transmission, and should be interpreted with caution in a constantly changing epidemiological situation (impacted for example by behaviour, adherence and other factors).Similarly, this model only considers the epidemiological impact of NPIs; the social or economic costs of different control measures is not considered in this analysis.4) Replicate calibration: The forecasts generated over the calibration period, at a local and age-stratified level.

Previous and current forecasts
Case and death forecasts can be viewed via a map of the predicted median case/death incidence over the next two weeks, or via time series plots by region (and age group when age-stratified data is available) of the median numbers of daily cases/weekly deaths over the next four weeks (with 50% and 95% prediction intervals) (Fig. 5).
Forecasts can be visualised at different geographical scales, either at NUTS-3 level or NUTS-1/NUTS-2 level.The accuracy of past forecasts at different geographical scales and over different time horizons can be visually assessed by varying the date from which to predict (to a date in the past) and comparing past predictions to the observed data.

Local predictors of transmission and importations
Spatial heterogeneity in case incidence, transmission and importation risk at the latest date of the forecasts is displayed in the app via three different maps showing the spatial distribution of cases at a NUTS-3 or NUTS-1/NUTS-2 level, the local risk of transmission (the populationweighted average value of the epidemic component for each region across all age groups), and the local risk of importation (the endemic component for each region summed across age groups) (Fig. 6).The local risk of transmission and importation are shown as a percentage of the highest value of the predictor in the country, and give insight into the spatial heterogeneity in risk in the country.In all three countries, the risk of transmission is more homogeneous across the regions than the risk of importation.The risk of importation, quantified by the endemic predictor, is heavily influenced by the number of inhabitants in a region, so the regions gathering most of the importation risks are regions containing the major cities in all three countries.Red dashed lines at relative frequencies of 0.5 and 1.5 show reasonable bounds for calibration compared to desired relative frequency of 1 (blue dashed line)

Short-term transmission scenarios
Finally, simulations showing the impact of various changes in transmission (either increases due to changes in behaviour or new variants, or decreases caused by targeted or global NPIs) on predicted numbers of cases and deaths are shown in the app via similar figures to those used to display the case and death forecasts (Fig. 7).
Options for exploring the impact of changes in transmission include combinations of: Fig. 5 Visualisation of COVID-19 death forecasts for a region of Czechia in the RShiny app.A Map of forecasted 14-day incidence of deaths per 100,000 inhabitants from 9th to 23rd January 2023 at NUTS-3 level.B Age-stratified 28-day-ahead death forecasts for the region outlined in red in (A).Since the prediction date is 4 weeks in the past, the observed number of deaths is also plotted to allow assessment of the accuracy of the forecasts.The closeness of the median forecast to the observed data and the fact that the 95% prediction intervals cover nearly all of the observed data points indicate the good predictive performance of the model for this region.Time series forecast plots for other regions can be viewed in the app by clicking on those regions in the map.Together, panels (A) and (B) can be used to identify which regions appear to be at risk of higher burden Throughout all three countries, removing importations has very little impact on the transmission dynamics in the case forecasts.Local transmission is sufficient to maintain transmission without new cases being added through the endemic component.On the other hand, the moderate and large changes in transmission through NPIs or changes in behaviour have a large impact on the forecasted number of cases and deaths.

Discussion
We have developed a framework to forecast subnational COVID-19 case and death incidence up to 4 weeks ahead, and explore the potential impact of changes in transmission on reported incidence.The framework has been applied to France, Czechia and Italy.The model outputs are based on routinely collected, publicly available surveillance data.We have also developed a RShiny app, where users can visualise the 4-week-ahead forecasts of both reported cases and deaths, and the predicted impact of changes in transmission.The code we developed to implement the model and the RShiny App is publicly available in two Github repositories: https:// github.com/ EU-ECDC/ RShin yCovi dApp and https:// github.com/ EU-ECDC/ The main aim of the framework developed here was to evaluate the local risk of transmission in the presence of complex immunity patterns.As more data sources became available, we selected the covariates that were associated with changes in level of immunity (recent incidence, vaccine coverage), or with changes in behaviour and transmission risk (variant, testing).As outbreaks unfold, the inclusion of new covariates during real-time forecasting can stem from new datasets becoming available (e.g.testing data, number of additional vaccine doses), or changes in transmission patterns, such as the emergence of new variants.Since such changes may take several weeks to identify and include in the model, the scenario simulation may highlight that recent dynamics would be in line with changes in transmission.
Case and death forecasts aggregated at a country level perform comparably with the European COVID-19 Forecast Hub ensemble model, a benchmark in epidemic forecasting built as an ensemble of forecasts from many independent models.In addition, our framework provides NUTS-3-region-level (i.e.much more highly spatially resolved) and, if age-stratified data is available, age-stratified forecasts.Therefore, it can be used for more targeted policy making and planning at a local level.Given the significant spatiotemporal heterogeneity in COVID-19 incidence, and changes in age patterns, the detailed local-level visualisation provided by our model is particularly useful to evaluate targeted control measures.This study highlights the need for surveillance systems that gather accurate, timely, age-stratified data, and the value of making such data publicly available to improve understanding and prediction of local transmission and outbreak response planning.
As demonstrated by our calibration analysis, the subnational and age-stratified case and death forecasts are accurate up to 2 weeks ahead: the case PIT histograms are flat for Italy and for most age groups for Czechia and France, while the median country-level forecast is closer to the data than the European COVID-19 Forecast Hub ensemble forecast.However, the forecasts become less reliable beyond a 2-week horizon.This may be due to fundamental predictability limits; the difficulty of forecasting changes in behaviour and/or sudden changes in transmission, e.g.due to the emergence of a new highly transmissible variant, more than a couple of weeks into the future; or other factors that cause the model to be misspecified, and reflects similar findings from other forecasting efforts [7,17,59].If COVID-19 transmission dynamics become similar to those of seasonal influenza, sudden unexpected changes in transmission may become rarer, which would improve the performance of the model.The age-stratified PIT histograms in both Czechia and France show that calibration is especially difficult in younger age groups, where changes in case-finding strategy had a big impact that could not be fully captured by the testing covariates [60,61].
The model was built to provide accurate predictions of case and death incidence at a regional level.Our results show that generating reliable case forecasts several weeks ahead is challenging, even when using age-stratified local data.However, the accuracy of death forecasts decreased less rapidly as the forecast horizon increased.The quality of the case and death forecasts was robust to the large reporting changes observed throughout the fitting period, and improved at the latest calibration dates where incidence was low in all three countries.This may be because local outbreaks in groups at risk in lowincidence settings are easier to forecast, rather than outbreaks where all regions are equally vulnerable.Past two weeks, forecasts generated by the model appear to be overconfident, and underestimate the uncertainty in the potential level of transmissions.The alternative transmission scenarios in the Shiny App therefore help to illustrate the full range of variability in short-term regional transmission dynamics that is possible.The local risk of outbreaks and importations is also represented in the Shiny App, using the local predictors of the Endemic-Epidemic model.The forecasts, scenarios, and risk map constitute a reliable, thorough representation of the local risk of transmission, and thus provide a useful aid for planning control measures.
The model fits and simulations highlight several features of COVID-19 dynamics common to all three countries: the risk of background importation of cases (i.e.new cases that were not linked to recent local infections) is always very strongly associated with the number of inhabitants in the region.Hence, more populated and urban areas are at higher risk of further importations of SARS-CoV-2 infections.On the other hand, the risk of transmission is relatively homogeneous across regions, with no regions where the number of secondary cases expected is less than half the region most at risk (Fig. 6B).This could change in future COVID-19 dynamics, where the local level of immunity may be sufficient to completely avoid transmission in certain regions, as is currently observed for pathogens such as measles [34].The vast majority of cases stem from the epidemic component of the Endemic-Epidemic model, showing that the local transmissibility is sufficient to maintain transmission, even without input from background importations.Therefore, the removal of importations (e.g., by border closures) consistently resulted in a minimal impact on the expected number of case numbers up to four weeks ahead.In contrast, altering the transmission risk within the regions (e.g. via interventions targeted at specific age groups or encompassing the entire population) can substantially change the forecasted case dynamics.Even with strategies targeting a particular age group, we observe indirect effects on incidence in all age groups.We did not observe a strong effect of changes in transmission risk on the four-week-ahead death forecasts.However, we emphasise that we considered a three-week delay between cases and deaths and, hence, the impact of changes in transmission on deaths would be visible for longer-term death forecasts (beyond four weeks ahead).

Including other countries
The framework is currently implemented for France, Czechia and Italy, but other countries can be straightforwardly incorporated, provided the data required for making the subnational daily case and weekly death forecasts (namely NUTS-3-level daily case counts, subnational daily death counts, NUTS-3-/NUTS-2-level daily/weekly vaccinations administered by dose, and NUTS-3-/ NUTS-2-/national-level daily/weekly numbers tested) for those countries is available and up-to-date.Full instructions for incorporating a new country are available in the public GitHub repository for the model code: https:// github.com/ EU-ECDC/ Backe ndCov idApp.

Age stratification
Our comparison of forecasted and observed numbers of cases and deaths over all age groups for France and Czechia indicates that the non-age-stratified model has similar predictive performance in terms of overall cases/deaths as the age-stratified model (Supplementary Figs.6-9), albeit with slightly worse prediction of deaths (Supplementary Fig. 9).We believe the loss in performance observed in death forecasts to be due to changes in the age structure of reported cases from changes in case detection.Indeed, active case finding may lead to milder cases being reported, while interruption of such a strategy would mean that a larger proportion of the cases are severe.Such changes are likely to be reflected in the age structure of the cases and such information is lost in the non-age-stratified version of the model.
The overall good performance of the non-age-stratified model is encouraging for the application of the model to other countries, very few of which report age-stratified case and death data.While we did not observe a significant improvement by age-stratifying the model, we emphasise that age-stratified data and models are crucial for evaluating the impact of targeted (age-specific) nonpharmaceutical and pharmaceutical interventions, as well as for health-economic analyses (e.g., the computation of DALYs).

Limitations
Our forecasting framework does have some limitations.First, as for all forecasting studies, the forecasts are only as reliable as the input data, and we cannot guarantee the accuracy of the data imported from public data sources, which may have reporting errors or biases that we are not able to account for.In addition, the definition of a COVID-19 death may vary across countries.The framework will only work as long as subnational (and agestratified, for France and Czechia) case and death data continues to be reported online in the same location and format as it is currently, but several countries have ceased reporting subnational case and death data or changed the format or location of their reporting since we started developing the framework.Changes in data availability could potentially be addressed via changing the spatial resolution of the model, but this would involve substantial modification of the model structure.We only forecast reported cases, which reflect a combination of underlying incidence of infections and reporting, rather than the "true" number of cases.Thus, if testing levels are very low, forecasted case numbers could be low even if circulation is high.However, testing level is adjusted for in the fitting of the model, so forecasts reflect the expected incidence given current testing levels, and still provide a useful indication of changes in transmission.The model for the death forecasts, which uses an estimate of the recent CFR (with uncertainty) to predict deaths, is relatively simple and assumes a constant relationship between changes in case numbers and changes in the CFR over time.However, the calibration analysis shows it provides a relatively straightforward and reliable means of translating case forecasts into death forecasts.
The number of daily cases in each region and age group is drawn from a negative binomial distribution.In this model framework, the overdispersion parameter is constant throughout the fitting period, which implies that the variability around the mean estimate is not affected by the epidemiological situation.This is a strong assumption, and extensions accounting for time-varying overdispersion, or association between overdispersion and covariates would improve the Endemic-Epidemic framework.Furthermore, because of the large number of regions and age groups per country, the national-level estimates of the number of cases were computed as a sum of hundreds of negative binomial draws.This leads to very narrow prediction intervals, especially in France, which do not fully capture the uncertainty around the national estimates.
The contact matrices used to fit the model for France and Czechia were taken from pre-pandemic studies, and may not reflect the contact patterns between age groups in 2020 and 2021.Ideally, time-varying matrices would have been used to estimate age-stratified contact, but such matrices were not available or straightforward to implement in the Epidemic-Endemic framework [62].The addition of an age-specific intercept in the epidemic component allowed the model to modify the risk of transmission in each group where the number of contacts was not in line with the case counts.However, since these coefficients were not time-dependent, the model considers this age-stratified risk of new cases to be constant (if all other covariates do not vary).
We do not include information on NPIs or population mobility in the covariates in the model despite the improvements in predictions these may yield, since centralised databases of these covariates (such as Google Mobility data [42], the Oxford Stringency Index [43], and the ECDC-JRC Response Measures Database [2]) are no longer being updated, and binary covariates made the model less consistent and comparable between countries, while potentially not being specific enough (i.e. they incorporated the impact of other parameters associated with transmission).Including more NPI covariates also led to issues with identifying the effects of different interventions as many were implemented at the same or overlapping times.

Generalising beyond COVID-19
The flexible framework developed in this paper could be used or readily adapted to model incidence of both novel and seasonal pathogens of public health importance, such as influenza, in order to predict local health burden and inform outbreak response.The Endemic-Epidemic framework underlying our model has already been applied to a variety of other pathogens including measles, cholera, leishmaniasis and pertussis [34,[63][64][65].However, the number of model parameters that can be estimated directly depends on the amount of data available.Therefore, in its current specification (with a large number of parameters), the model may not be suitable for the early stages of an outbreak (except if the pathogen is seasonal, with available data on previous outbreaks).However, it can be used to estimate the impact of various covariates (from different data sources) on transmission risk.

Fig. 1
Fig. 1 Comparison of one-, two-, three-and four-week-ahead national-level case forecasts (rows, top to bottom) for Italy, Czechia and France (columns, left to right) from our model (Endemic-Epidemic (EE) model) with ensemble forecasts from the European COVID-19 Forecast Hub (ensemble) and observed cases (data) for a calibration period from 29th October 2022 to 22nd April 2023.Dashed lines show median forecasts, shaded regions the 95% prediction interval (97.5th -2.5th percentile) for the forecasts.The gaps in the European COVID-19 Forecast Hub forecasts correspond to periods in which no ensemble forecasts were produced.Prediction intervals for the EE model are calculated from 100 simulations of the model accounting for uncertainty in the fitted parameter values and stochastic variations.The weekly number of cases is shown using a logged axis

Fig. 2
Fig. 2 Probability integral transform histograms showing the calibration of the daily case forecasts from the model for Italy, Czechia and France (columns, left to right) for one-, two-, three-and four-week-ahead forecast horizons (rows, top to bottom) for 29th October 2022 to 22nd April 2023.Uniform histograms indicate well-calibrated forecasts, while U-and inverse U-shaped histograms indicate underdispersed and overdispersed forecasts respectively.Red dashed lines at relative frequencies of 0.5 and 1.5 show reasonable bounds for calibration compared to desired relative frequency of 1 (blue dashed line)

Fig. 3
Fig. 3 Comparison of one-, two-, three-and four-week-ahead death forecasts (rows, top to bottom) for Italy, Czechia and France (columns, left to right) from our model (Endemic-Epidemic (EE) model) with ensemble forecasts from the European COVID-19 Forecast Hub (ensemble) and observed deaths (data) for a calibration period from 29th October 2022 to 22nd April 2023.The gaps in the European COVID-19 Forecasts Hub forecasts correspond to periods in which no ensemble forecasts were produced.Dashed lines show median forecasts, shaded regions the 95% prediction interval for the forecasts

Fig. 4
Fig.4 Probability integral transform histograms showing the calibration of the weekly death forecasts from the model for Italy, Czechia and France (columns, left to right) for one-, two-, three-and four-week-ahead forecast horizons (rows, top to bottom) for 29th October 2022 to 22nd April 2023.Red dashed lines at relative frequencies of 0.5 and 1.5 show reasonable bounds for calibration compared to desired relative frequency of 1 (blue dashed line)

Fig. 6
Fig. 6 Visualisation of spatial heterogeneity in case incidence and risks of transmission and importations in Italy in the RShiny app.A Map of forecasted percentage change in cases in next week compared to the last week of data in Italy.B Map of local risk of transmission (as quantified by the estimated epidemic predictor in the model).C Map of local risk of importation (as quantified by the estimated endemic predictor in the model).All maps are at NUTS-2 level and show values on 7th March 2023

Fig. 7
Fig. 7 Visualisation of impact of different scenarios for changes in transmission (due to changes in behaviour, a new variant, or a change in NPIs) on forecasted cases in France in the RShiny app.Map of forecasted 14-day case incidence from 7th to 21st March 2023 at NUTS-3 level and age-stratified time series plots of national four-week-ahead case forecasts for (A) no change in transmission, and (B) a 40% decrease in transmission among 20-60-year-olds due to NPIs targeted at 20-60 year-olds with a one-week delay to effect w+1,4values of the epidemic predictor ϕ ait , or the endemic pre- dictor ν ait : -Removal of all importations, for instance due to border closure (i.e.ν ait = 0).

Table 1
Case, death, vaccination, test and population data sources used in each country and their spatial and temporal resolution and age stratification

Table 2
Median and 95% interval of weighted interval score (WIS) and squared error of the median of one-, two-, three-and fourweek-ahead national-level case forecasts for Czechia, France and Italy for our model (Endemic-Epidemic, EE) and European COVID-19 Forecast Hub ensemble model (Ensemble) across all time points in the prediction period from 29th October 2022 to 22nd April

Table 3
Median and 95% interval of weighted interval score (WIS) and squared error of the median of one-, two-, three-and fourweek-ahead national-level death forecasts for Czechia, France and Italy for our model (Endemic-Epidemic, EE) and European COVID-19 Forecast Hub ensemble model (Ensemble) across all time points in the prediction period from 29th October 2022 to 22nd April 2023