Age-specific rate of severe and critical SARS-CoV-2 infections estimated with multi-country seroprevalence studies

Background: Knowing the age-specific rates at which individuals infected with SARS-CoV-2 develop severe and critical disease is essential for designing public policy, for infectious disease modeling, and for individual risk evaluation.
Methods: In this study, we present the first estimates of these rates using multi-country serology studies, and public data on hospital admissions and mortality from early to mid-2020. We combine these under a Bayesian framework that accounts for the high heterogeneity between data sources and their respective uncertainties. We also validate our results using an indirect method based on infection fatality rates and hospital mortality data.
Results: Our results show that the risk of severe and critical disease increases exponentially with age, but much less steeply than the risk of fatal illness. We also show that our results are consistent across several robustness checks.
Conclusion: A complete evaluation of the risks of SARS-CoV-2 for health must take non-fatal disease outcomes into account, particularly in young populations where they can be 2 orders of magnitude more frequent than deaths.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12879-022-07262-0.


Background
The SARS-CoV-2 pandemic has had impacts of historic proportions on both public health and society. Remarkably, there is considerable uncertainty regarding the full spectrum of health effects of SARS-CoV-2 infection. On the one hand, some of the effects of SARS-CoV-2 are relatively well understood, such as the infection fatality rate (IFR) and its dependence on age. Three different meta-analyses now exist that estimate the age-stratified IFR of SARS-CoV-2 using multi-country seroprevalence studies [1][2][3]. These studies document an exponential increase of the IFR with age and show considerable agreement on their estimated IFRs by age-stratum. On the other hand, the rate of less extreme infection outcomes, and their dependence on age, remains uncertain despite being similarly important for public health. Examples of this are the rate of severe infections (Infection-severe rate, ISR), which we define as infections resulting in hospitalization or out-of-hospital death, and the rate of critical infections (Infection-critical rate, ICR), which we define as infections resulting in admission to an intensive care unit (ICU) or out-of-ICU death.
Despite their relevance for analyzing the development of the pandemic and for future planning, age-specific estimates of ISR and ICR based on multi-country data are still missing from the literature (see estimates of the infection-hospitalization rate for France [4,5], Denmark [6], Indiana, USA [7], Connecticut, USA [8], and Qatar [9], and a model-based analysis with early non-serological pandemic data [10]). To fill this gap, we present a meta-analysis of the age-stratified rates of severe and critical disease of SARS-CoV-2 across several locations, combining seroprevalence studies from early to mid-2020 with public data on the numbers of age-stratified hospitalizations, ICU admissions, and deaths.

Results
We analyzed locations with seroprevalence studies that were either listed in the meta-analysis of Levin et al. [1], to which we refer for further details, or used in studies providing their own age-stratified estimates of the infection-hospitalization rate. We included in the analysis 15 locations with serosurveys (11 using representative samples and 4 using convenience samples), and 2 locations with comprehensive testing and contact tracing (see methods section M1 for a description of this classification of locations). Together, these locations represent 5% of the world's population. Furthermore, to account for out-of-hospital and out-of-ICU deaths, which are common among the elderly, we computed the number of severe cases as the number of hospitalizations plus out-of-hospital deaths, and the number of critical cases as the number of ICU admissions plus out-of-ICU deaths.
The estimated probabilities of severe, critical, and fatal disease outcomes (ISR, ICR, and IFR, respectively) are shown for each age and location as colored points in Fig. 1, using a log-transformed vertical axis. As expected from the reports in previous analyses of IFR [1][2][3] and of the infection-hospitalization ratio [4], the three outcome ratios show an approximately exponential increase in risk with respect to age, which becomes a homogeneous linear effect on the log-scale. Thus, for each outcome rate (ISR, ICR, and IFR) we fitted Bayesian logistic regression models with a linear age effect on the logit scale (this effect becomes non-linear in the risk scale). We used logistic regression because it is a commonly used model for disease outcomes that approximates the log-linear rate-age patterns observed in our study. The logistic regression model had an intercept and age-slope shared across locations, plus location-specific random effects on both the slope and the intercept of the regression to account for the heterogeneity between locations. We also accounted for the uncertainty of the seroprevalence estimates (through the specification of the prior distribution), and the sampling (binomial) variability of the observed outcomes.
The logistic regression models fitted the data well for the three outcome rates (Fig. 1, although see the discussion of section S1 in Additional file 1 about a possible deviation from the trend for the youngest ages), and the patterns observed were similar across locations. Importantly, the slope of IFR with respect to age (0.133, 95% credibility interval: [0.123-0.143]) was higher than the slope of ICR (0.099, [0.089-0.108]) and of ISR (0.076, [0.067-0.083]), indicating that the risks of severe and critical disease are more evenly distributed across ages than the risk of death (see Table S1 for finer age stratification, and Table S2 for the estimated model parameters in Additional file 1). Also, we verified that our estimates are robust to the correction for out-of-hospital and out-of-ICU deaths (Additional file 1: Figs. S1, S2), the ages used to fit the model (Additional file 1: Fig. S3), the method of estimating SARS-CoV-2 infections (Additional file 1: Fig. S4), the date of outcome data collection (Additional file 1: Figs. S5, S6), and the delay between the epidemic wave and the seroprevalence study (Additional file 1: Fig. S7, to control for seroreversion). More details about these comprehensive robustness analyses are provided in Additional file 1.
Fig. 1 Rates of severe and critical SARS-CoV-2 outcomes (ISR and ICR, respectively) and death rates (IFR) estimated with seroprevalence data from 2020. The colored points show the proportion of individuals infected with SARS-CoV-2 that develop severe disease (left), critical disease (center), or fatal disease (right) (in logarithmic scale) for each location and age-stratum used in our analysis. Color indicates whether the number of infections was obtained from a representative serosurvey, a convenience serosurvey, or from comprehensive testing corrected for under-ascertainment. Data points coming from a given location are joined by colored lines. The black line shows the outcome rate estimated using a hierarchical Bayesian logistic regression model, and the shaded regions show the 95% credibility intervals. We used 105 data points from 16 locations for the estimation of ISR, 78 data points from 11 locations for ICR, and 119 data points from 17 locations for IFR.
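To make the reported age slopes easier to interpret, the following minimal sketch converts each slope on the logit scale into the factor by which the odds of the outcome grow per decade of age, and into the age increment over which the odds double. The function names are illustrative assumptions, and the calculation uses only the posterior mean slopes quoted in the text (no location effects or uncertainty). At the low risks typical of younger ages, odds and probabilities are nearly equal, so these factors roughly describe the growth of the rates themselves.

```python
import math

# Posterior mean age slopes on the logit scale, as quoted in the text.
slopes = {"ISR": 0.076, "ICR": 0.099, "IFR": 0.133}

def odds_multiplier(slope, years=10.0):
    """Factor by which the odds of the outcome grow over `years` of age."""
    return math.exp(slope * years)

def doubling_years(slope):
    """Age increment over which the odds of the outcome double."""
    return math.log(2.0) / slope

for name, slope in slopes.items():
    print(f"{name}: odds x{odds_multiplier(slope):.2f} per decade, "
          f"doubling every {doubling_years(slope):.1f} years")
```

Under these slopes, the odds of death grow faster with age than the odds of severe or critical disease, which is the pattern described above.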
Next, we validated our estimates by estimating the ISR and ICR of SARS-CoV-2 indirectly using a novel ratio-of-ratios approach. We start from the age-specific IFR reported in the three different meta-analyses [1][2][3], which were not used in the analysis of Fig. 1. Because the IFR is the expected ratio between deaths and infections, we can estimate the ISR as the ratio IFR/SFR, where SFR is the ratio between deaths and severe infections (severe fatality rate). We approximated the age-specific SFR by fitting a Bayesian logistic regression model to published data of COVID-19 hospital mortality (Fig. 2A, data sources listed in Table 4), which is the ratio between in-hospital deaths and hospitalizations. The approximation of SFR by hospital mortality assumes that all deaths occur in hospitals, which is expected to hold well for all but the oldest age bins (see Additional file 1: Section S2, Fig. S2). Then, we estimated the age-specific ISR by taking the ratio between the IFRs and the SFRs. We applied the same procedure to estimate ICR, using the ICU mortality of COVID-19 patients. The values of the parameters obtained by fitting the model to hospital and ICU mortality are shown in Additional file 1: Table S3, and the age-specific estimates are shown in Additional file 1: Table S4.
We note that we used the same hospital and ICU mortality data for this analysis and for correcting for out-of-hospital and out-of-ICU deaths in Fig. 1. Although this means that the two analyses share some data, the results of the regression of Fig. 1 are similar when performed on the uncorrected data (Additional file 1: Figs. S1, S2), supporting the use of this validation method.
The estimates from the indirect method are shown by the colored points in Fig. 2B. To aid comparison, we show in black the lines obtained from the fit to serology data of Fig. 1. First, we see that our IFR estimates and the IFR estimates from the three meta-analyses are very similar (Fig. 2B, right). Second, we note that the estimates of ISR and ICR obtained from seroprevalence and disease outcome data with the direct method (Fig. 1) are in close agreement with the estimates obtained with the indirect method (with the largest differences being with Brazeau et al. [3] for the younger ages; however, this study also reports estimates different from those reported in the other two studies).
Table 1 Estimated age-specific frequencies of severe disease (ISR), critical disease (ICR), and fatal disease (IFR) among infected individuals. The estimates are obtained from the fits to the serology data from 2020 shown in Fig. 1. Numbers in parentheses indicate 95% credibility intervals of the estimates, obtained by taking the 2.5% and 97.5% quantiles of the posterior distribution of the Bayesian fit. a Estimates for the youngest ages may be underestimated by the assumption of a logistic relation between age and severity; see section S1 in Additional file 1 for further discussion and complementary estimates.

Conclusions
In conclusion, we present estimates of the rates of severe and critical SARS-CoV-2 infections during the first half of 2020. These are the first estimates based on multi-country seroprevalence data, which we combine in a rigorous way using Bayesian methods to account for the uncertainty of each study as well as temporal and geographical heterogeneity. We find that while young and middle-aged individuals had low rates of fatal infection, they had much higher rates of severe and critical infection, emphasizing the need to consider these disease outcomes in these populations. The estimates presented here are an important reference for the health impacts of COVID-19 during 2020, as well as an important baseline on which to build updated estimates, by combining them with estimates of the relative change in risk across locations and time.

Methods
The data sets included in this study come from locations where age-stratified seroprevalence studies have been performed (see M1), plus locations with age-stratified prevalence coming from exhaustive contact tracing (see M2). For each of these locations, we searched for age-stratified data on hospitalizations and ICU admissions (see M3). We used these two sources of data to estimate the age-stratified rates of severe (ISR) and critical (ICR) SARS-CoV-2 infections for each of the locations. We used this data to fit Bayesian random-effects logistic regression models for each of the outcomes (see M4). We also searched for studies reporting age-stratified mortality for COVID-19 patients admitted to the hospital or to the ICU (see M5). We used this data to fit a Bayesian random-effects logistic regression model to obtain the age-specific hospital and ICU mortality for COVID-19. This regression was then combined with estimates of age-specific IFR extracted from the literature, to estimate the ISR and ICR through a ratio-of-ratios method (see M6). The regression of hospital and ICU mortality was also used to correct the hospital and ICU data described in M3 for out-of-hospital and out-of-ICU deaths (see M7). All data and code are available online (see M8).

(M1) Data from seroprevalence studies
We used a curated list of seroprevalence studies released prior to 18 September 2020 that is presented in Levin et al. [1], a systematic review and meta-analysis. The list is restricted to developed countries; we refer the reader to Levin et al. [1] for an exhaustive list of other existing studies, and the criteria used for excluding seroprevalence studies from their final analysis. The locations used by Levin et al. [1] can be divided into three groups: those with representative seroprevalence studies, those with convenience seroprevalence studies, and those with comprehensive testing and tracing. Representative seroprevalence studies are those in which the population included in the serosurvey aims to be a representative sample of the population. Convenience seroprevalence studies are those that perform the serosurvey over samples that are conveniently available but not necessarily representative of the population, like blood donor samples. Locations with comprehensive testing and tracing are defined by Levin et al. [1] as locations that, up to the date of interest, had over 300 tests performed for each detected case (in these locations, we corrected for under-ascertainment following Levin et al. [1]). We then searched for age-stratified hospitalization and ICU data to match the representative seroprevalence studies listed in Additional file 1: Appendix Tables I1 and I3, and the convenience seroprevalence studies listed in Appendix Table I2 from Levin et al. [1]. From the 11 representative seroprevalence studies included in those lists, we were able to find age-stratified hospitalization or ICU data for 7 locations (England; France; Ireland; Netherlands; Spain; Atlanta, USA; Geneva, Switzerland) and we failed to find such data for the 4 remaining locations (Italy; Portugal; Indiana, USA; Salt Lake City, USA). 
From the 4 convenience seroprevalence studies listed, we were able to find hospitalization or ICU data for all four locations (Ontario, Canada; Sweden; Belgium; New York, USA).
In addition to the seroprevalence studies used in Levin et al. [1], we included the seroprevalence study carried out in Iceland up to April 4 2020, which reports the results of a representative sample of the population [46], and the representative seroprevalence studies from Indiana, Connecticut and Denmark, which were used to estimate the ISRs in the existing literature [6][7][8].
Furthermore, in three cases we also changed the use of some seroprevalence studies with respect to Levin et al. to match them to the available hospitalization and ICU data. The first case is the seroprevalence study from France, which offers data by region. We were only able to find the age-stratified hospitalization data for the region of Île-de-France; therefore, we only used seroprevalence data from this region. The second case is the New York seroprevalence study, where we could only find hospitalization data for New York City but not for New York State; thus, we only used the seroprevalence for New York City.
The third case is Ontario, Canada, where we could only find age-stratified hospitalization and ICU data up to July 31 2020, and so we used the seroprevalence report for this date, which is different from the report date used by Levin et al. [1]. Table 2 summarizes the final list of seroprevalence studies included in our analysis.

Matching age-bins
The age-bins reported by each of the studies did not always match the age-bins in the corresponding hospitalization and ICU reports. Therefore, in some cases we extrapolated or interpolated the seroprevalence estimates obtained for a given age-bin into a different age-bin. For example, for New York City, seroprevalence was reported for the 18-34 year age range, but hospitalization data was reported for the 18-44 year age range. Therefore, to make use of this hospitalization data, we assumed that the proportion of seropositive individuals in the 18-44 year age range is the same as the proportion in the 18-34 year age range. All such decisions were checked against other available data, and agreed upon by the two authors. Furthermore, these assumptions are all documented in the publicly available analysis code.

Correcting for test characteristics
The positive rate of a test depends on disease prevalence and on the test characteristics. Most of the seroprevalence estimates used were already corrected for test characteristics. For the results that were not corrected

(M3) Hospitalizations, ICU admissions, and deaths data
We obtained the age-stratified hospitalization, ICU, and death data from relevant government websites of the locations, through Google searches, and by looking for relevant region-wide studies. We selected the data reports that were closest to the end date of the serosurvey. The list of data sources and the end dates for their cumulative outcome numbers are shown in Table 3. Also, as described above for the serology data, in some cases we interpolated or extrapolated some data for these disease outcomes, or we combined different data sources with incomplete data (e.g., age-distribution of an outcome from one source, with the total count of the outcome from another source) to obtain the data for these outcomes with the appropriate age bins. For example, for Belgium, we were only able to find the unstratified number of cumulative ICU admissions at the desired date of May 8th, 2020, but we were able to find the age distribution for cumulative ICU admissions up to June 14th, 2020. Therefore, we distributed the cumulative ICU admissions of May 8th across age strata, following the distribution from June 14th. As mentioned above, all these decisions were agreed upon by the authors, and they are thoroughly documented in the publicly available analysis code.
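The redistribution described for Belgium, where an unstratified total from one date is split across age strata following the age distribution observed at a later date, can be sketched as below. The function name and all numbers are hypothetical and chosen only for illustration; they are not the Belgian data, and the actual decisions are those documented in the analysis code.

```python
def distribute_total(total, reference_counts):
    """Split `total` across strata in proportion to `reference_counts`."""
    ref_sum = sum(reference_counts.values())
    if ref_sum <= 0:
        raise ValueError("reference counts must sum to a positive number")
    return {stratum: total * count / ref_sum
            for stratum, count in reference_counts.items()}

# Hypothetical age distribution of cumulative ICU admissions at the later date.
later_icu_by_age = {"0-39": 50, "40-59": 300, "60-79": 550, "80+": 100}
# Hypothetical unstratified cumulative total at the earlier date.
earlier_icu_total = 800

earlier_icu_by_age = distribute_total(earlier_icu_total, later_icu_by_age)
print(earlier_icu_by_age)
```

This keeps the total at the earlier date fixed and assumes that the age distribution of admissions did not change between the two dates.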

(M4) Estimation of outcome probabilities with serology and outcome data
We fitted Bayesian logistic regression models to the serology and outcome data. We describe the model for severe SARS-CoV-2 outcome; the same model was fitted to severe, critical, and fatal disease outcomes.
$$\text{prevalence} = \frac{\text{test positive rate} + \text{specificity} - 1}{\text{sensitivity} + \text{specificity} - 1}.$$
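The prevalence formula above, which corrects an observed test-positive rate for the sensitivity and specificity of the test (commonly known as the Rogan-Gladen estimator), can be sketched as follows. The function name, the example numbers, and the clipping of the result to the interval [0, 1] are assumptions made for illustration and are not described in the text.

```python
def corrected_prevalence(test_positive_rate, sensitivity, specificity):
    """Prevalence corrected for test characteristics.

    prevalence = (positive rate + specificity - 1) / (sensitivity + specificity - 1)
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must be informative: sensitivity + specificity > 1")
    prevalence = (test_positive_rate + specificity - 1.0) / denom
    # Assumption: clip to [0, 1], since sampling noise can push the raw value outside.
    return min(1.0, max(0.0, prevalence))

# Hypothetical example: 5% raw positives, 90% sensitivity, 99% specificity.
print(corrected_prevalence(0.05, 0.90, 0.99))
```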
Let $\{y_{la}, x_{la}\}$ represent the number of severe SARS-CoV-2 infections ($y_{la}$) experienced among $x_{la}$ individuals infected with SARS-CoV-2, at location $l$ ($l = 1, \dots, L$), for age stratum $a$ ($a = 1, \dots, A_l$) of the $l$-th location. The ISR for this location-stratum is defined as $\theta_{R,la} = E\left[\frac{y_{la}}{x_{la}}\right]$.

Modeling the number of SARS-CoV-2 cases from seroprevalence data
The selected seroprevalence studies provide age-stratified estimates (and SE) of disease prevalence. Rather than assuming that prevalence was known with complete certainty, we used the reported point estimates and SE to specify a Beta prior for prevalence for each location. Specifically, we used the reported prevalence and its SE to estimate (through first and second moments matching) the shape parameters of the Beta distribution used for each location. Then, prevalence was modeled as $\theta_{P,la} \sim \text{Beta}(\alpha_{1,la}, \alpha_{2,la})$, where $\alpha_{1,la}$ and $\alpha_{2,la}$ are shape parameters specific to each location and age stratum. Then, the number of cases was defined as $x_{la} = N_{la} \times \theta_{P,la}$, where $N_{la}$ is the size of the population at location $l$ and age stratum $a$.
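As an illustration of the moment matching described above, the sketch below converts a reported prevalence and its SE into the two shape parameters of a Beta distribution, and then draws a number of infections as population size times sampled prevalence. The function names and the example numbers are hypothetical; the formulas are the standard expressions obtained by equating the mean and variance of a Beta distribution to the reported values.

```python
import random

def beta_from_mean_se(mean, se):
    """Shape parameters of a Beta distribution with the given mean and SE."""
    var = se ** 2
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if var >= mean * (1.0 - mean):
        raise ValueError("SE is too large for a Beta distribution with this mean")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

def sample_infections(population, alpha1, alpha2, rng):
    """One draw of the number of infections: population x sampled prevalence."""
    return population * rng.betavariate(alpha1, alpha2)

# Hypothetical stratum: 10% seroprevalence with SE 2%, population of 100,000.
alpha1, alpha2 = beta_from_mean_se(0.10, 0.02)
rng = random.Random(0)
draws = [sample_infections(100_000, alpha1, alpha2, rng) for _ in range(5)]
print(alpha1, alpha2, draws)
```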

Modeling severity rates using random-effects logistic regression
Infection-severity rates were modeled using a logit link of the form
$$\log\left(\frac{\theta_{R,la}}{1 - \theta_{R,la}}\right) = [\mu + u_l] + [\beta + b_l] \times \text{age}_{la}.$$
Therefore, $\theta_{R,la} = \frac{e^{\eta_{la}}}{1 + e^{\eta_{la}}}$, where $\eta_{la} = [\mu + u_l] + [\beta + b_l] \times \text{age}_{la}$. Above, $\mu$ and $\beta$ are the average intercept and slope across locations, and $u_l$ and $b_l$ are location-specific random effects on the intercept and the slope, respectively, with prior distributions $u_l \sim N(0, \sigma_u^2)$ and $b_l \sim N(0, \sigma_b^2)$. The shared intercept ($\mu$) and regression coefficient ($\beta$) were assigned flat priors, and the standard deviations of the random effects, $\sigma_u$ and $\sigma_b$, were assigned gamma priors with shape and rate parameters equal to 4.
For each age stratum $a$ at location $l$, the value of $\text{age}_{la}$ used for fitting corresponded to the median age of the stratum. For age strata with an open upper bound (e.g. 70+), we used 90 years as the upper bound of the stratum.
The posterior distribution of the model described above does not have a closed form; therefore, we used Markov chain Monte Carlo (MCMC) methods to generate samples from the posterior distribution for all the model unknowns $\{\mu, \beta, x, u, b, \sigma_u^2, \sigma_b^2\}$. We used 4 chains with 2500 iterations each. A script that implements the above model in Stan [64] is available in the online code.
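The following sketch illustrates two pieces of the model described above: the assignment of a single age to each stratum, and the mapping from the linear predictor to an outcome rate through the inverse logit. It is not the Stan implementation. The function names are hypothetical, the stratum midpoint is used as a stand-in for the median age (which assumes ages are roughly uniform within the stratum), and the intercept in the example is an arbitrary illustrative value.

```python
import math

OPEN_UPPER_BOUND = 90  # upper bound used in the text for open-ended strata

def stratum_age(lower, upper=None):
    """Representative age of a stratum (midpoint; open strata are capped at 90)."""
    if upper is None:
        upper = OPEN_UPPER_BOUND
    return (lower + upper) / 2.0

def outcome_rate(age, mu, beta, u_l=0.0, b_l=0.0):
    """Inverse logit of the linear predictor [mu + u_l] + [beta + b_l] * age."""
    eta = (mu + u_l) + (beta + b_l) * age
    return 1.0 / (1.0 + math.exp(-eta))

# Illustrative values: slope 0.076 is the ISR slope quoted in the text;
# the intercept -9.0 is an arbitrary assumption, not a fitted value.
age_70_plus = stratum_age(70)            # open-ended stratum "70+"
rate = outcome_rate(age_70_plus, mu=-9.0, beta=0.076)
print(age_70_plus, rate)
```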

Prediction of outcome rates
We used the samples of the posterior distribution to generate posterior samples for the infection severity rates for specific ages using the inverse-logit function: $\theta_R(\text{age})_s = \frac{e^{\mu_s + \text{age} \times \beta_s}}{1 + e^{\mu_s + \text{age} \times \beta_s}}$, where $s$ is an index for the sample from the posterior distribution. We then used these samples to estimate the posterior means and posterior credibility regions reported in Figs. 1 and 2. We report the severity rates for age intervals by estimating the rate at the mean age of the interval.
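The prediction step described above can be sketched as follows: each posterior draw of the intercept and slope is passed through the inverse logit at the age of interest, and the resulting rates are summarized by their mean and their 2.5% and 97.5% empirical quantiles. The posterior draws used here are synthetic stand-ins generated from arbitrary normal distributions (hypothetical values, not the fitted posterior), and the function names and the simple index-based quantile rule are illustrative assumptions.

```python
import math
import random
import statistics

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def predict_rate(age, mu_samples, beta_samples):
    """Posterior mean and 95% interval of the outcome rate at `age`."""
    rates = sorted(inv_logit(m + age * b)
                   for m, b in zip(mu_samples, beta_samples))
    lower = rates[int(0.025 * (len(rates) - 1))]
    upper = rates[int(0.975 * (len(rates) - 1))]
    return statistics.fmean(rates), lower, upper

# Synthetic stand-ins for posterior draws (hypothetical, for illustration only).
rng = random.Random(0)
mu_samples = [rng.gauss(-9.0, 0.2) for _ in range(4000)]
beta_samples = [rng.gauss(0.076, 0.004) for _ in range(4000)]

mean_40, lower_40, upper_40 = predict_rate(40, mu_samples, beta_samples)
print(mean_40, lower_40, upper_40)
```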
The predicted outcome rates obtained from the model fit are shown in Additional file 1: Table S1, and the mean and credible intervals for the main model parameters are shown in Additional file 1: Table S2.

(M5) Hospital and ICU mortality data
Our robustness analysis was based on an indirect estimator (a ratio-of-ratios) of ISR and ICR. To derive this estimator, we used mortality data from hospitalized and ICU SARS-CoV-2 patients. We searched the literature for reports on age-stratified mortality of patients admitted to the hospital or the ICU with a COVID-19 diagnosis. We also used the data sources from Table 3 that provided mortality numbers for hospitalized or ICU patients. We identified 8 ICU mortality reports and 8 hospital mortality reports with age-stratified data, which were either published studies in the literature or public reports from official organisations. The reports used are listed in Table 4.

(M6) Indirect estimation of ISR and ICR using IFR and hospital mortality data
To validate the estimates obtained with the data and methods described above, we used an alternative source of data and a different estimation method to obtain age-specific ISR and ICR. Specifically, we combined age-specific reports of IFRs from the literature with the hospital and ICU mortality data listed in Table 4 to obtain the ISR and ICR using a ratio-of-ratios method, as explained below.
Let $\mathrm{IFR}_{sa}$ be the expected ratio between deaths and infections estimated in a study $s$ ($s = 1, \dots, S$) for age stratum $a$ ($a = 1, \dots, A_s$) and let $\mathrm{SFR}_a$ be the expected ratio between deaths and severe COVID-19 cases for age stratum $a$ ($a = 1, \dots, A_s$). Then, the ISR for age stratum $a$ estimated from study $s$ is $\mathrm{ISR}_{sa} = \frac{\mathrm{IFR}_{sa}}{\mathrm{SFR}_a}$. Thus, by estimating the values of SFR for different ages, we can use age-specific IFR values reported in the literature to obtain estimates of age-specific ISR.
To approximate the age-specific SFR, we fitted a Bayesian logistic regression to age-stratified hospital death data for COVID-19 patients. Let $\{d_{la}, h_{la}\}$ represent the number of deaths ($d_{la}$) among $h_{la}$ individuals hospitalized with COVID-19 for age stratum $a$ ($a = 1, \dots, A_l$) in location $l$. The hospital mortality for this location-stratum is defined as $\theta_{HM,la} = E\left[\frac{d_{la}}{h_{la}}\right]$. To estimate $\theta_{HM}(\text{age})$, we fitted Bayesian random-effects logistic regressions, like the one described in section M4, to the hospital death data. The only difference with the procedure in M4 is that, in this case, the denominators $h_{la}$ were known, and thus we directly used these fixed $h_{la}$ values (unlike the $x_{la}$ from M4, for which a distribution over possible values was obtained using seroprevalence estimates). We use $\theta_{HM}(a)$ as our estimate of $\mathrm{SFR}_a$. These two quantities are equal if we assume that all deaths occur in the hospital (note that our definition of severe case, stated in the main text, is a case that results in either hospital admission or out-of-hospital death). As discussed in section S3 and shown in Additional file 1: Figs. S1, S2, out-of-hospital deaths make up only a very small fraction of severe cases for all but the oldest age-strata. Also, we find that out-of-hospital deaths make up a minority of the deaths for all but the oldest ages (analysis not shown).
The date up to which the cumulative numbers for these outcomes were reported is shown in the second column.
Then, to account for the uncertainty of the $\mathrm{IFR}_{sa}$ estimates in our estimations, we fitted a Beta distribution to the mean and credible interval of each $\mathrm{IFR}_{sa}$ through moment matching, to obtain $\mathrm{IFR}_{sa} \sim \text{Beta}(\alpha_{1,sa}, \alpha_{2,sa})$ (for Brazeau et al. [3] we only used the point estimates, since credible intervals on the mean estimates are not reported).
Finally, we estimated $\mathrm{ISR}_{sa} = \frac{\mathrm{IFR}_{sa}}{\mathrm{SFR}_a}$ by generating samples from the posterior distribution of each $\mathrm{SFR}_a$ (obtained from the Bayesian logistic regression model) and from the Beta distribution fitted for each $\mathrm{IFR}_{sa}$. In total, we generated 50,000 samples of this ratio for each $\mathrm{ISR}_{sa}$.
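A minimal sketch of this ratio-of-ratios sampling step is given below: an IFR value is drawn from its fitted Beta distribution, an SFR value is drawn from a set of posterior samples, and their ratio is one sample of the ISR. All inputs are synthetic and hypothetical (the Beta shape parameters and the stand-in SFR draws are arbitrary illustrative values, not the fitted ones), and the function name is an assumption.

```python
import random
import statistics

def isr_samples(ifr_alpha, ifr_beta, sfr_posterior, n=50_000, seed=0):
    """Monte Carlo samples of ISR = IFR / SFR for one study and age stratum."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        ifr = rng.betavariate(ifr_alpha, ifr_beta)  # draw from the fitted Beta
        sfr = rng.choice(sfr_posterior)             # draw from posterior samples
        samples.append(ifr / sfr)
    return samples

# Hypothetical inputs: IFR centered near 0.1%, hospital mortality near 5%.
rng = random.Random(1)
sfr_posterior = [max(1e-6, rng.gauss(0.05, 0.005)) for _ in range(4000)]
isr = isr_samples(20.0, 19_980.0, sfr_posterior)

isr_sorted = sorted(isr)
print(statistics.median(isr),
      isr_sorted[int(0.025 * len(isr))],
      isr_sorted[int(0.975 * len(isr))])
```

Drawing the two quantities independently, as done here, propagates the uncertainty of both the IFR and the SFR into the resulting interval for the ISR.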
The same procedure was applied to estimate the $\mathrm{ICR}_{sa}$, by fitting the model to ICU death data. The estimated hospital and ICU mortality rates obtained from these models are shown in Additional file 1: Table S3, and the parameters obtained from fitting the model are shown in Additional file 1: Table S4.

(M7) Correction for out-of-hospital and out-of-ICU deaths
Some COVID-19 deaths occur outside of the ICU, or outside of the hospital. This happens when the patient prognosis is poor, such as in elderly and frail patients, and it may be accentuated when health systems are operating at high occupancy. This phenomenon is particularly notable in our data for some locations and ages, where the number of reported deaths is larger than the number of reported ICU admissions (in some cases by more than one order of magnitude).
Our definitions of severe and critical COVID-19 outcomes include these out-of-hospital and out-of-ICU deaths, besides hospitalizations and ICU admissions. Therefore, we obtained the number of severe cases by adding to hospitalizations the number of out-of-hospital deaths. Likewise, we obtained the number of critical cases by adding to the number of ICU patients the number of out-of-ICU deaths. For some locations, we could obtain data on the out-of-hospital and out-of-ICU deaths, but for other locations this data was absent, and so we estimated it using the death data.
Let $y_{la}$ be the cumulative number of hospitalizations for a location $l$ and age stratum $a$, for which no out-of-hospital death data is available. Also, let $m^{tot}_{la}$ be the total number of deaths reported for this location and age stratum. First, we obtained the expected number of in-hospital deaths, $m^{h}_{la}$, by combining the number of hospitalizations with the expected hospital mortality for this age, $\theta_{HM}(a)$ (fitted as described in section M6): $m^{h}_{la} = y_{la} \times \theta_{HM}(a)$. Then, we obtained the expected number of out-of-hospital deaths, $m^{ooh}_{la}$, by subtracting this from the total number of deaths, $m^{ooh}_{la} = m^{tot}_{la} - m^{h}_{la}$ (setting $m^{ooh}_{la}$ to 0 if the result is negative).
The same procedure is performed for the ICU data to obtain the number of critical cases.
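The correction described above can be sketched in a few lines: the expected number of in-hospital deaths is the number of hospitalizations times the hospital mortality, the out-of-hospital deaths are the remainder of the total deaths (floored at 0), and the number of severe cases is hospitalizations plus out-of-hospital deaths. The function name and the example numbers are hypothetical.

```python
def severe_cases(hospitalizations, total_deaths, hospital_mortality):
    """Hospitalizations plus estimated out-of-hospital deaths."""
    expected_in_hospital_deaths = hospitalizations * hospital_mortality
    # Out-of-hospital deaths cannot be negative, so floor the difference at 0.
    out_of_hospital_deaths = max(0.0, total_deaths - expected_in_hospital_deaths)
    return hospitalizations + out_of_hospital_deaths

# Hypothetical stratum: 1000 hospitalizations, 30% hospital mortality.
print(severe_cases(1000, 400, 0.30))  # 300 expected in hospital, 100 outside
print(severe_cases(1000, 100, 0.30))  # fewer deaths than expected: no correction
```

The same function applies to critical cases by passing ICU admissions and ICU mortality instead.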
See Additional file 1: Section S3 and Figs. S1, S2 for an analysis showing the effect of this correction method on the data, and the robustness of the results to the removal of this correction.

Discussion
In this work, we present the first estimates of ISR and ICR of SARS-CoV-2 obtained through a meta-analysis of serology studies from early to mid-2020. Our estimates show that, like the IFR, the ISR and ICR increase exponentially with age; however, the rate of increase in the risk of severe and critical disease outcomes with age is smaller than the rate of increase in lethality, which is in agreement with previous studies [5][6][7][8][9][10][11]. However, previous studies show considerable variability, probably due to the uncertainty in serology estimates, differences in local reporting protocols, and geographical variability in the impacts of COVID-19. Thus, this analysis presents the most up-to-date estimation and comparison of these rates, summarizing the best available evidence from several locations with a Bayesian approach. Our simple Bayesian regression analysis, which takes into account several sources of uncertainty, and the novel ratio-of-ratios method constitute two complementary approaches that may aid future work on estimating these parameters under the changing nature of the pandemic. For example, these methods can be extended to estimate the outcome rates for the new variants of SARS-CoV-2.
Furthermore, we provide extensive validation of our estimates (see Additional file 1). First, we performed several robustness analyses, controlling for various potential sources of bias in estimates. In Additional file 1: Fig. S3 we show that despite adverse outcomes concentrating in the older ages (giving them more statistical weight), the estimates for younger ages are robust to excluding older ages from the regression. In Additional file 1: Fig. S4 we show that our estimates are also robust to excluding the locations where prevalence was estimated from non-representative samples, which have increased risk of bias. In Additional file 1: Fig. S6, we show that our estimates are robust to excluding the locations with the fastest changing epidemics at the time of data collection, and are thus robust to the choice of dates for outcome data collection.
Then, in Additional file 1: Fig. S7 we show that our results are robust to excluding the locations with the longest delays between epidemic wave and seroprevalence study, which are the most susceptible to seroreversion, and thus our estimates are not strongly affected by seroreversion. Finally, besides these comprehensive robustness analyses, a highlight of our study is the validation of our results with an independent estimation method, based on the ratio-of-ratios approach (Fig. 2).
Our results are highly relevant for several aspects of COVID-19 modeling, such as estimating the number of unreported infections from hospital and ICU data, which allows better estimates of the present levels of natural immunity [4,12,13]; predicting the effects of public-health policies implemented during the pandemic [14]; evaluating policy decisions such as vaccine allocation [15,16]; and predicting health outcomes in countries with high or low vaccination rates while accounting for the age distribution of each country [17,18]. In particular, our estimates are important for analyzing the risk of COVID-19 for younger populations. These populations have very low risks of death, but as seen in our estimates, the risk of severe disease can be 2 orders of magnitude larger than the risk of death, and thus severe and critical outcomes are essential to properly characterize the risk of these populations. One illustrative example is the discussion around vaccination of young individuals against COVID-19. The FDA estimates that the rate of mRNA-vaccine-induced myocarditis, a side effect which is mild in some cases but which can result in severe outcomes, is 1/5000 for males aged 16-17, the population at highest risk [19] (in line with reports from other locations such as Israel [20]). Although doing a risk-benefit analysis for adolescent vaccination is very complicated, and outside the scope of this work, it is notable that while our estimated IFR is 0.12 [0.07-0.20] times the rate of vaccine-induced myocarditis (i.e. 8 times smaller), our estimated ISR is 12 [7-19] times larger than this rate. Thus, although such a direct comparison has many limitations, and should not be taken as a risk-benefit analysis, it shows how radically the conclusions of risk evaluation for young individuals depend on the disease outcome being considered.
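The arithmetic behind this comparison can be checked with the short sketch below. It uses only the point values quoted in the text (a myocarditis rate of 1/5000, and ratios of 0.12 for the IFR and 12 for the ISR); the absolute rates it prints are back-calculated from those ratios for illustration and are not values copied from the tables.

```python
import math

myocarditis_rate = 1 / 5000          # rate quoted in the text for males aged 16-17

ifr_ratio = 0.12                     # IFR relative to the myocarditis rate
isr_ratio = 12                       # ISR relative to the myocarditis rate

implied_ifr = ifr_ratio * myocarditis_rate   # about 2.4 deaths per 100,000 infections
implied_isr = isr_ratio * myocarditis_rate   # about 2.4 severe cases per 1,000 infections

isr_over_ifr = implied_isr / implied_ifr
print(f"implied IFR: {implied_ifr:.2e}, implied ISR: {implied_isr:.2e}")
print(f"ISR / IFR = {isr_over_ifr:.0f} "
      f"({math.log10(isr_over_ifr):.1f} orders of magnitude)")
```

The ratio of 100 between the two outcomes matches the statement that severe disease can be 2 orders of magnitude more frequent than death in young populations.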
Importantly, we note that the dynamic nature of the COVID-19 pandemic makes any estimates of outcome rates transient, since those rates are expected to change in space and time as new variants emerge and as social behavior and medical practices change. As such, generalization of our estimates across time and space requires caution. For example, substantial drops in hospital and ICU mortality between the first and second waves have been reported in developed countries [21][22][23][24][25][26], partly due to improvements in care, although this is accompanied by considerable geographical heterogeneity [27]. On the other hand, the emergence of variants of concern was associated with increased rates of hospital mortality [28][29][30] and of severe, critical and fatal cases [31][32][33][34][35][36][37] for the early variants (Alpha, Gamma, Delta), and with a decrease in disease severity for the later Omicron variant [38]. Recent changes in disease severity are also due to the introduction of effective vaccines, with reductions in the rates of hospitalization and death of over 90% for the BNT162b2 vaccine [39]. More recently, oral antiviral medications have become available that can reduce the risk of hospitalization from COVID-19 by close to 90% in patients at high risk of developing severe COVID-19 [40]. However, the effects of all of these changes on disease severity have been estimated in relative terms, rather than as absolute changes in risk. The estimates of severe and critical disease that we provide in this work can thus serve as the baseline to estimate the absolute risks of COVID-19 after such changes (e.g. the risk of severe disease for vaccinated individuals, or for unvaccinated individuals in the presence of Omicron).
One particularly important application of such absolute risk calculations is anticipating the effects of variants with potential for immune escape [41], or the effects of waning vaccine effectiveness [42].
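As a minimal sketch of how a baseline outcome rate could be combined with a relative effect reported in the literature to obtain an absolute risk, consider the following. The function name `absolute_risk` and the numerical inputs are hypothetical placeholders chosen for illustration. They are not estimates from this study, and the simple multiplication assumes that the relative effect applies uniformly to the baseline rate.

```python
def absolute_risk(baseline_rate: float, relative_risk: float) -> float:
    """Scale a baseline outcome rate by a relative risk (1.0 means no change)."""
    if not 0.0 <= baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be a probability in [0, 1]")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    # Cap at 1.0 so the result remains a valid probability
    return min(1.0, baseline_rate * relative_risk)


# Hypothetical placeholder inputs, not estimates from this study
baseline_isr = 0.002       # assumed baseline infection-severe rate for some age group
vaccine_reduction = 0.90   # assumed relative reduction in severe outcomes

# A 90% relative reduction corresponds to a relative risk of 0.10
risk_vaccinated = absolute_risk(baseline_isr, 1.0 - vaccine_reduction)
print(f"Absolute risk of severe disease after vaccination: {risk_vaccinated:.1e}")
```

A relative risk above 1.0 could represent a more severe variant in the same way, although in practice the uncertainty of both the baseline estimate and the relative effect would need to be propagated.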
Finally, we note that the current work has some limitations that should be considered. One limitation is that our estimates rely on publicly available data on the number of hospitalizations, ICU admissions, and deaths, which may not exactly match the real number of severe and critical cases. Hospitalizations can underestimate the number of severe cases if a health system is overwhelmed and some severe cases are not admitted or are not reported properly. On the other hand, the number of COVID-19-related hospitalizations may overestimate the number of severe cases if some people are admitted without severe COVID-19 infection, for example due to an abundance of caution in low-occupancy situations, or due to an incidental positive test at the time of admission in high-prevalence situations. Another limitation stems from the fact that the protocols for testing, admitting, and treating patients can vary between locations, and may depend on the strain on the healthcare system. For example, a large COVID-19 wave may lead to a higher rate of severe disease, due to limited treatment capacity, and conversely to a lower rate of hospital admissions, due to limited resources. The underlying health of each population and the quality of care will also determine the outcome rates at specific locations. For example, race and socioeconomic status have been reported as risk factors for critical COVID-19 [43,44], and striking differences in COVID-19 IFR between developed and developing countries have been reported [45]. In line with this, our statistical analysis shows that there is significant variability in the adverse outcome rates between locations (in our models this is captured by the standard deviation of the intercept and slope of the log-ISR-age and log-ICR-age regressions, see Additional file 1: Table S2).
Thus, although we rigorously take variability between countries into account in our estimates, caution is required when extrapolating our estimates of the average ISR and ICR to specific locations.