### Basic SIR model

We consider the standard SIR model to represent single-outbreak influenza dynamics. The model classifies individuals into three key compartments: susceptible, infected, and recovered. The nonlinear system of differential equations describing the dynamics is given by the following equation.

$$ \begin{aligned} S^{\prime}(t) &= -\beta S(t)I(t) \\ I^{\prime}(t) &= \beta S(t)I(t) - \gamma I(t) \\ R^{\prime}(t) &= \gamma I(t). \end{aligned} $$

The state variables *S*(*t*),*I*(*t*), and *R*(*t*) denote the number of individuals who are susceptible, infected, and recovered, respectively, at time *t*. The number of contact events sufficient for transmitting an infection during unit time per individual is *β**N* based on the mass action incidence. Infective individuals leave the compartment at the recovery rate *γ*, thereby acquiring immunity to the disease. We can drop the equation for *R*(*t*) as it has no effect on the dynamics of others and hence is determined once *S*(*t*) and *I*(*t*) are known. Based on the SIR model, we have the time-dependent net reproduction number *R*_{t}=*β**S*(*t*)/*γ*, which quantifies the level of transmission at time *t*. Note that *R*_{t} is the per-infective rate at which new infections occur within the average duration of infection at time *t*.

### Age-structured model

An age-structured model is employed to estimate the reproduction number of pH1N1 because the transmission rate is higher in preschool and schoolchildren than in other age group individuals, in general. We consider a subgroup SIR model where the population is divided into *n*_{a} age groups with different transmission dynamics. We denote the number of susceptible and infected individuals within the *i*^{th} age group by *S*_{i} and *I*_{i}, respectively. Let *β*_{ij} refer to the transmission from the *j*^{th} age group to the *i*^{th} age group, and *β* =[*β*_{ij}] denote the transmission matrix, also known as Who-Acquires-Infection-From-Whom matrix. Putting these elements together, we have the following system of differential equations.

$$ \begin{aligned} S_{i}^{\prime}(t) &= -\sum_{j=1}^{n_{a}} \beta_{{ij}} S_{i}(t)I_{j}(t) \\ I_{i}^{\prime}(t) &= \sum_{j=1}^{n_{a}} \beta_{{ij}} S_{i}(t)I_{j}(t) - \gamma_{i} I_{i}(t). \end{aligned} $$

(1)

In a general structured model of the form (1) with *n*_{a} distinct classes, \(n_{a}^{2}\) transmission terms are required. However, one transmission term is available at most for each class. The typical way to address this lack of specificity is to constrain the structure of the transmission matrix and/or to use prior knowledge of social mixing behavior. For an age-structured model, we assume that the transmission rates are proportional to the rates of social contact, which can be estimated from contact patterns. A large multi-country population-based survey conducted in Europe as a part of the POLYMOD [9] enables us to implement this approach. The transmission is modeled as the product of the contact rate in the survey and an age-specific proportionality factor to account for characteristics related to susceptibility and infectiousness, which are not captured by contact rates. This leads to

$$ \beta_{{ij}}=\left\{\begin{array}{ll} q_{i} c_{{ij}} & \quad i=j \\ \sigma c_{{ij}} & \quad i\neq j \end{array}\right. $$

where *c*_{ij} is the contact rate and *q*_{i} and *σ* are proportionality factors.

Based on the age-structured SIR model (1), the reproduction number can be calculated by following Driessche and Watmough [10]. It is the spectral radius of the next generation matrix *M* where

$$M_{{ij}} =\frac{\beta_{{ij}} S_{i}(t)}{\gamma_{i}}. $$

The details are given in Appendix.

### Region-structured model

The second mechanism incorporates a heterogeneous population based on regions to account for the wave of the pH1N1 pandemic. We denote the number of susceptible and infected individuals within the *i*^{th} region by *S*_{i} and *I*_{i}, respectively. Let *β*_{ij} refer to the transmission from the *j*^{th} subgroup to the *i*^{th} subgroup and *β* =[*β*_{ij}] denote the transmission matrix. In the same manner as the age-structured model, we have the following system of differential equations.

$$ \begin{aligned} S_{i}^{\prime}(t) &= -\sum_{j=1}^{n_{r}} \beta_{{ij}} S_{i}(t)I_{j}(t) \\ I_{i}^{\prime}(t) &= \sum_{j=1}^{n_{r}} \beta_{{ij}} S_{i}(t)I_{j}(t) - \gamma_{i} I_{i}(t). \end{aligned} $$

(2)

We assume that transmission rates between distinct regions in the region-structured model can be expressed as the frequency of transportations multiplied by a region-specific proportionality factor. The transportation information was extracted from the highway portal site and Kakao map for number of buses and highway traffic, respectively [11, 12]. Let the number of buses and highway traffic from region *j* to region *i* be denoted by *w*_{i,j} and *W*_{i,j}, respectively. Note that *w* is symmetric because the bus route is circular, although *W* is not necessarily. The transmission rate can be written as

$$ \beta_{{ij}}=\left\{\begin{array}{ll} q_{i} & \quad i=j \\ q_{i} \sigma_{l} w_{{ij}} + q_{i} \sigma_{g} W_{{ij}} & \quad i\neq j \end{array}\right. $$

where *q*_{i} is the proportionality factor, and *σ*_{l} are *σ*_{g} can be chosen such that they balance the weight between different types of transportations.

The same argument as that presented in the age-structured model gives the expression of the effective reproduction number *R*_{t}, which is the spectral radius of the following next generation matrix

$$\left[ \frac{\beta_{{ij}} S_{i}(t)}{\gamma_{i}}\right]. $$

### Study subjects and parameter estimation

Study subjects were patients who were prescribed antiviral agents from the national stockpile from August 21, 2009 to April 30, 2010. Because of mandatory antiviral agent management program during study period, all patients who were prescribed antiviral agent were included in this study. The data employed to estimate the parameters are the daily number of incident patients in Fig. 1. It was estimated based on the aggregation of prescribed antiviral agents from deidentified database. This study was approved by the Institutional Review Board (IRB) of Yonsei University Health System. Since this study used retrospective data and the study subjects were anonymized, the IRB waived the requirement for written consent from the patients.

Our goal is to estimate the optimal parameters that provide the states that are best fit to the given data. This section briefly reviews the parameter estimation technique of the least squares method. In general, parameter estimation is conducted by minimizing the cost function, which measures the difference between the model prediction and observation. The simplex algorithm proposed by John Nelder and Roger Mead is applied to solve the optimization problem [13]. Let *θ* be the parameter set and time points *t*_{j}(*j*=1,...,*N*) are uniformly distributed with daily time step. The data vector *y*_{j}(*j*=1,...,*N*) denotes the number of cases at time *t*_{j}. It is a scalar for basic SIR model, but it is a vector structured by age and region for age-stratified and region-stratified models, respectively. For example, the *y*_{j} is a column vector of length 15 for the age-structured model. We recast the mathematical model as

$$ \boldsymbol{x}^{\prime}(t)= \boldsymbol{g}\left(t, \boldsymbol{x}(t), \boldsymbol{\theta}\right), $$

and assume a statistical model for measurement of the form

$$ \boldsymbol{y}_{j} = \boldsymbol{f}\left(t_{j};\boldsymbol{\theta}\right) + \varepsilon_{j}, \qquad j=1,\cdots,N $$

where *f*(*t*_{j};*θ*) is the model prediction at time *t*_{j} with parameter *θ* and the measurement error \(\varepsilon _{j} \sim \mathcal {N}\left (0, a^{2}\right)\). The least squares estimator can be obtained by minimizing the following cost function over the given parameter space *Ω*_{θ} [14]:

$$ \sum_{j=1}^{N} \left[\boldsymbol{y}_{j} - \boldsymbol{f}\left(t_{j};\boldsymbol{\theta}\right)\right]^{T}\left[\boldsymbol{y}_{j} - \boldsymbol{f}\left(t_{j};\boldsymbol{\theta}\right)\right] $$

(3)

The parameter sets to be estimated for the basic SIR model, age-structured model and region-structured model, are

$$\begin{aligned} \boldsymbol{\theta}_{\text{basic}} &= \left\{\beta, \gamma\right\} \text{and initial values of {S} and {I},}\\ \boldsymbol{\theta}_{\text{age}} &= \left\{q_{i}, \sigma, \gamma_{i} \text{ for} i=1, 2,\cdots, n_{a}\right\} \text{and initial values of \(S_{i}\) and \(I_{i}\),} \\ \boldsymbol{\theta}_{\text{region}} &= \left\{q_{i}, \sigma_{l}, \sigma_{g}, \gamma_{i} \text{ for} i=1, 2,\cdots, n_{r} \right\} \text{and initial values of \(S_{i}\) and \(I_{i}\)}\\ \end{aligned} $$

where *n*_{a} and *n*_{r} denote the number of age groups and regions, respectively.