Estimation of the Reproduction Number of Inﬂuenza A(H1N1)pdm2009 in South Korea Using Heterogeneous Models

Background: The reproduction number is one of the most crucial parameters in determining disease dynamics, providing a summary measure of the transmission potential. However, estimating this value is particularly challenging owing to the characteristics of epidemic data, including non-reproducibility and incompleteness. Results: In this study, we propose mathematical models with diﬀerent population structures; each of these models can produce data on the number of cases of the inﬂuenza A(H1N1)pdm09 epidemic in South Korea. These structured models incorporating the heterogeneity of age and region are used to estimate the reproduction numbers at various terminal times. Subsequently, the age- and region-speciﬁc reproduction numbers are also computed to analyze the diﬀerences illustrated in the incidence data. Conclusions: Numerical results reveal that the introduction of heterogeneity into the population to represent the general characteristics of dynamics is essential for the robust estimation of parameters.


Background
The reproduction number is defined as the average number of secondary cases generated by a typical primary case. It is a measure of the transmission potential associated with the contact rate, duration of infectivity, and probability of transmission per contact. The maximum reproduction number is attained when an infectious person is introduced into a totally susceptible population and is called the basic likelihood estimations have been used to analyze this number [1][2][3][4].
When an infection spreads throughout a population, the time-dependent effective reproduction number, R t , is often more useful for assessing the transmission potential throughout a pandemic, especially during the period with the highest level of activity. Real-time estimation continues to track the number of secondary infections caused by a single infective, providing a quantitative measure of the time evolution of the epidemic force. Cruz-Pacheco et al. demonstrated the manner in which sanitary measures reduce the prevalence of an infected population [1]. Estimates of the reproduction number were shown to decrease from 1.4-1.5 initially to 1.1-1.2 later in the summer, which was most likely because of the vacation period and the seasonality of influenza transmission [5]. In addition to capturing temporal dynamics, it is important to consider heterogeneous patterns of the transmission.
It is well known that school-age children are disproportionately responsible for influenza transmissions. Estimates of the age-specific reproduction number help with our understanding of the role of each group in the transmission dynamics and with devising effective targeting mitigation strategies.
If all incident cases could be traced back to their index cases, estimating the reproduction number would simply be a matter of counting the number of secondary cases. However, with most epidemics, only the epidemic curve is observed, and there is no available information regarding who infected whom. To appropriately estimate the reproduction number from the influenza outbreak data, it is essential that the selected model capture the underlying dynamics embedded in the data.
The objective of this study is to estimate the reproduction numbers based on the incidence data.
When the World Health Organization announced the emergence of influenza A(H1N1)pdm09 (pH1N1) in 2009 [6], the first probable patient in South Korea was identified on April 28. A total of 763,759 confirmed cases, of which 270 were fatal, were reported by the end of August 2010 [7]. During the initial epidemic phase, the main control measure was containment through quarantine and isolation. Surveillance programs in schools and medical facilities were implemented, and all confirmed cases were investigated. However, when community outbreaks were  Figure 1 shows the temporal incidence distribution of pH1N1 in South Korea. The amount of antiviral agents prescribed and the number of incident patients soared from mid-October, reached its peak at the end of October, and started declining in mid-November. Demographic and regional characteristics are illustrated in Table 1 and Figure   2. The incidence rate is higher in children and students than in other age group individuals ( Table 1). The rate is higher in urban areas than in rural areas and is the highest in the national capital and the south-eastern region ( Figure 2).
In this paper, two different structured models are proposed to estimate reproduction numbers on the basis of the epidemic curve. We begin by introducing a basic SIR model to describe a single outbreak and build age-and region-structured models by incorporating population heterogeneity. Numerical simulations are conducted to analyze the impact of terminal time and the effect of heterogeneous structures on the estimation of parameters. Finally, the proposed models are applied to the 2009 incidence data of novel pH1N1 in South Korea to compute the age-and regionspecific reproduction numbers.

Basic SIR model
We consider the standard SIR model to represent single-outbreak influenza dynamics. The model classifies individuals into three key compartments: susceptible, infected, and recovered. The nonlinear system of differential equations describing S (t) = −βS(t)I(t)

Age-structured model
An age-structured model is employed to estimate the reproduction number of pH1N1 because the transmission rate is higher in preschool and schoolchildren than in other age group individuals, in general. We consider a subgroup SIR model where the population is divided into n a age groups with different transmission dynamics.
We denote the number of susceptible and infected individuals within the i th age group by S i and I i , respectively. Let β ij refer to the transmission from the j th age group to the i th age group, and β= [β ij ] denote the transmission matrix, also known as Who-Acquires-Infection-From-Whom matrix. Putting these elements together, we have the following system of differential equations.
In a general structured model of the form (1) with n a distinct classes, n 2 a transmission terms are required. However, one transmission term is available at most for each class. The typical way to address this lack of specificity is to constrain the structure of the transmission matrix and/or to use prior knowledge of social mixing behavior. For an age-structured model, we assume that the transmission rates are proportional to the rates of social contact, which can be estimated from contact patterns. A large multi-country population-based survey conducted in Europe as a part of the POLYMOD [9] enables us to implement this approach. The transmission is modeled as the product of the contact rate in the survey and an age-specific proportionality factor to account for characteristics related to susceptibility and infectiousness, which are not captured by contact rates. This leads to where c ij is the contact rate and q i and σ are proportionality factors.
Based on the age-structured SIR model (1), the reproduction number can be calculated by following Driessche and Watmough [10]. It is the spectral radius of the next generation matrix M where The details are given in Appendix.

Region-structured model
The second mechanism incorporates a heterogeneous population based on regions to account for the wave of the pH1N1 pandemic. We denote the number of susceptible and infected individuals within the i th region by S i and I i , respectively. Let β ij refer to the transmission from the j th subgroup to the i th subgroup and β= [β ij ] denote the transmission matrix. In the same manner as the age-structured model, we have the following system of differential equations. We assume that transmission rates between distinct regions in the regionstructured model can be expressed as the frequency of transportations multiplied by a region-specific proportionality factor. The transportation information was extracted from the highway portal site and Kakao map for number of buses and highway traffic, respectively [11,12]. Let the number of buses and highway traffic from region j to region i be denoted by w i,j and W i,j , respectively. Note that w is symmetric because the bus route is circular, although W is not necessarily. The transmission rate can be written as where q i is the proportionality factor, and σ l are σ g can be chosen such that they balance the weight between different types of transportations.
The same argument as that presented in the age-structured model gives the expression of the effective reproduction number R t , which is the spectral radius of the following next generation matrix

Study subjects and parameter estimation
Study subjects were patients who were prescribed antiviral agents from the national stockpile from August 21, 2009 to April 30, 2010. Because of mandatory antiviral agent management program during study period, all patients who were prescribed antiviral agent were included in this study. The data employed to estimate the parameters are the daily number of incident patients in figure 1. It was estimated based on the aggregation of prescribed antiviral agents from deidentified database.
This study was approved by the Institutional Review Board (IRB) of Yonsei University Health System. Since this study used retrospective data and the study subjects were anonymized, the IRB waived the requirement for written consent from the patients.
Our goal is to estimate the optimal parameters that provide the states that are best fit to the given data. This section briefly reviews the parameter estimation technique of the least squares method. In general, parameter estimation is conducted by minimizing the cost function, which measures the difference between the model prediction and observation. Let θ be the parameter set and time points t j (j = 1, . . . , N ) are uniformly distributed with daily time step. The data vector y j (j = 1, . . . , N ) denotes the number of cases at time t j . It is a scalar for basic SIR model, but it is a vector structured by age and region for age-stratified and regionstratified models, respectively. For example, the y j is a column vector of length 15 for the age-structured model. We recast the mathematical model as and assume a statistical model for measurement of the form where f (t j ; θ) is the model prediction at time t j with parameter θ and the measurement error ε j ∼ N (0, a 2 ). The least squares estimator can be obtained by minimizing the following cost function over the given parameter space Ω θ [13]: The parameter sets to be estimated for the basic SIR model, age-structured model and region-structured model, are where n a and n r denote the number of age groups and regions, respectively.

Time-dependent reproduction number
We illustrate the proposed methodology and investigate its performance by applying it to 2009 incidence data of pH1N1 in South Korea. This section presents the results of the estimation obtained by applying the least squares method to basic SIR, age-and region-structured models. In each experiment, data with different  Table 1). Estimated parameters are possible indicators to determine the feasibility of models. Incorporation of the The general characteristics of regional difference led us to consider a second type of heterogeneity in the model. The nation is split into 252 in the region-structured model (2), where the transmission rates are implemented based on transportation patterns. Figure 5 compares the predicted cases based on the region-structured SIR model with the observed data over the course of the epidemic. As it was discussed in the previous experiment, it is not earlier than the epidemic peak for estimation to start adjusting to outbreak data. Since this outbreak, the incidence data is well described in the form of the characteristic exponential rise, turnover, and decline pattern predicted by the process model. The estimated time-dependent effective reproduction numbers are illustrated in figure 5, which demonstrates a patterns similar to that obtained using the age-structured SIR model in figure 4.

Age-specific and region-specific reproduction numbers
It is widely known that the transmission is considerably different among various age groups. We also observe from the pH1N1 epidemic data that the incidence rate is higher in children and students than in other age groups ( Table 1). Estimates of the age-specific reproduction number help in clarifying the role of each age group in the transmission dynamics and in suggesting guidelines for effective targeting intervention strategies. The estimated age-specific reproduction numbers are displayed in figure 6. The result is closely related to the cumulative incidence for each age group because it is often the contact rate within the same age group is higher than with other groups.
The incidence rate is higher in urban areas than in rural areas, and the highest in the national capital and the south-eastern region, as shown in figure 2. We estimated the region-specific reproduction number and observed that it is more than two in some areas and less than one in the others (Figure 7). This is consistent with regions having larger cumulative incidence with a similar argument regarding contact patterns to age-specific cases. We discussed parameter estimation methodologies based on deterministic SIR models incorporating population heterogeneity. Age-structured and region-structured models were employed to describe the underlying epidemic process. The proposed mechanisms were applied to influenza A(H1N1)pdm09 in South Korea to compute the time-dependent effective reproduction numbers. We found that the introduction of heterogeneity into the population and sufficient data to represent general characteristics of dynamics are essential to the robust estimation of parameters.
Real-time estimation showed that the reproduction number started increasing in early October, peaked at 2.5 on October 17, and then decreased to unity at the end of October. The effective number rose sharply during the mid-October when the number of patients increased dramatically. The reproduction number fell below unity at the end of October and remained lower that unity, indicating that the epidemic starts decreasing, which is consistent with the incidence data.
Subsequently, age-specific and region-specific basic reproduction numbers were estimated to account for the differences of incidence. We observe from the pH1N1 epidemic data that the incidence rate is higher in children and students than in other age groups. The estimated age-specific reproduction numbers agree with the cumu- lative incidence for each age group because the mixing is assortative. The incidence rate is higher in urban areas than in rural areas, highest in the national capital and in the south-eastern region. We estimated the region-specific reproduction number whose trend is similar to the number of cases in each region. Estimates of the agespecific and region-specific reproduction number help to predict the transmission dynamics, and to suggest guidelines for effective targeting intervention strategies.
This study has both limitations and strengths. The first limitation is that the number of cases is estimated from the amount of prescribed antiviral agents assuming the time lag between symptom onset and antiviral agent prescription, the proportion of prescription and the proportion of pH1N1 confirmation among AFRI patients. The effective contacts were employed from POLYMOD contact survey, which possibly yields discrepancy in mixing pattern of Korea. The structured models may not be enough to account for some underlying mechanism of dynamics due to those factors. However, they provide feasible estimates of reproduction number despite the limitations, which was not possible using simple SIR model. We will be able to improve the outcome as we gather more information, because additional knowledge is required to achieve a better result. Another limitation is that

Appendix
The reproduction number for the age-structured SIR model (1), can be calculated following the approach of Driessche and Watmough [10]. Let F i be the new infections and V i be the transitions of i th compartment, then for i = 1, · · · , n a .
Subsequently, the derivatives of respectively, and the next generation matrix is