Study area and ILI surveillance data
Japan is a bow-shaped strip of islands, stretching from 24°N to 46°N for approximately 2400 km. At its widest point, Japan is no more than 230 km across. Japan is divided into 47 prefectures for local administration. Hokkaido is the northernmost prefecture; Okinawa is the southernmost prefecture. Most regions of Japan lie in the temperate zone with humid subtropical climate. However, Japan’s climate varies from a cool humid continental climate in the north, such as in northern Hokkaido, to a warm tropical rainforest climate in the south, such as in Ishigaki, Okinawa.
Influenza (excluding avian influenza and pandemic influenza, e.g. novel influenza or re-emerging influenza) is subject to sentinel surveillance under the National Epidemiological Surveillance for Infectious Disease in Japan. The number of patients diagnosed with ILI is reported from approximately 5000 sentinel medical institutions (SMIs) (3000 for pediatrics and 2000 for internal medicine) across Japan on a weekly basis (ISO 8601 week date system according to the Weeks Ending Log [36]). The criteria for reporting ILI used by SMIs have been previously described elsewhere [37]. The data are aggregated at the National Institute of Infectious Diseases into weekly total number of cases and weekly average number of cases per sentinel for both the national and prefectural levels [37]. The surveillance data tables are published on the website of the Infectious Disease Weekly Report (IDWR) [38] every Tuesday. A detailed description of infectious diseases surveillance system in Japan has been made available [39].
In our study, an influenza season was defined to range anywhere from week 35 in September of each year up to week 34 in August of the following year. We downloaded IDWR surveillance data tables from week 35 of 2012 to week 34 of 2018 (from 2012-09-02 to 2018-08-26 in terms of week ending date). Our study period covered six influenza seasons from 2012/2013 through 2017/2018 (Additional file 1: Fig. S1). Only the weekly number of ILI cases per sentinel was used in the following estimation of epidemic onsets, so as to be compatible with the empirical epidemic threshold.
Methods for estimating epidemic onset
We estimated the onset time of influenza epidemics in each prefecture for each of the six influenza seasons from 2012/2013 to 2017/2018 using three methods: the ETM, SRM, and MCM. The epidemic end is equivalent to the epidemic onset in reverse chronological order. The duration of an epidemic is defined as the period from its onset time to its ending time. Therefore, we focused on describing the algorithm for estimating epidemic onset.
The empirical threshold method (ETM)
The ETM defines an epidemic as occurring when the weekly number of ILI cases per sentinel has been reported to exceed a prespecified threshold Y0 for three consecutive weeks [40]. The first week of the three consecutive weeks corresponds to the epidemic onset. We used the criterion Y0 = 1.0 C/S/W, which is the threshold for the nationwide onset of an influenza epidemic in Japan. This threshold was empirically defined in the year 2000 based on more than 10 years of observations from sentinel surveillance of influenza in Japan [34]. The details of implementing the ETM are described in the Additional file 1: Text S1 and Fig. S2.
The segmented regression method (SRM)
Different from the above threshold-based method, the SRM fits piecewise linear models to determine the breakpoint in the first half of the epidemic curve, which corresponds to the epidemic onset. In other words, the breakpoint is the optimal knot location with the maximal difference-in-slope between the two fitted straight lines (Additional file 1: Figure S3). To find the optimal breakpoint, the log-likelihood function for the breakpoint is maximized. Further details of using the SRM to determine epidemic onset refer to [7, 31]. We implemented the SRM using the R package segmented [41], and the procedure is summarized in the Additional file 1: Text S2. An illustration of the SRM is shown in Additional file 1: Figure S3.
The maximum curvature method (MCM)
Given the unique feature of the epidemic curve in Japan, it may be more appropriate to identify the epidemic onset in terms of curvature. Therefore, we developed the MCM to detect epidemic onset and end. Inspired by the SRM definition of epidemic onset as the point of maximum change in the slope, the MCM defines epidemic onset as the point of maximum curvature located within the increasing phase of the epidemic curve. Likewise, epidemic end is defined as the point of maximum curvature located within the decreasing phase of the epidemic curve. To reduce the effect of small fluctuations in the epidemic curve, instead of directly calculating the osculating circle at each point on the curve, the MCM fits a least-squares circle to the n points around it. n ≥ 3 because three points are required to determine a circle and n is odd for the sake of symmetry. The curvature of the fitted circle only measures how fast the epidemic curve is changing direction at a given point. We further used the directional angle of the tangent vector at the given point to indicate its changing direction. In the first half of the epidemic curve, the point with maximum curvature and a directional angle between [0°, 90°] is defined as the epidemic onset; in the second half, the point with maximum curvature and a directional angle between [270°, 360°] is determined as the epidemic end. Any possible points that occur above an upper threshold, h C/S/W, are eliminated, because they are already in an epidemic state.
Let {yt, t = 1, 2, … , T} denote the weekly epidemic curve of an influenza season with T weeks, where yt is the number of ILI cases per sentinel reported at week t, which is referred to as intensity hereafter, for the sake of simplicity. The steps for using the MCM to detect epidemic onset and end are as follows.
Step 1. At a given point K (t, yt)(t = 1, 2, … , T), a circle with center \( O\ \left({t}_{\mathrm{c}},{y}_{t_{\mathrm{c}}}\right) \) and radius r is determined by least-squares fitting to n points \( \left(t-\frac{n-1}{2},{y}_{t-\frac{n-1}{2}}\right),\dots, \left(t+\frac{n-1}{2},{y}_{t+\frac{n-1}{2}}\right) \) surrounding K, using the algorithm proposed by Pratt [42]. When K is at the edge of the epidemic curve (\( t=1,\dots, \frac{n-1}{2}\ \mathrm{or}\ t=T-\frac{n-3}{2},\dots, T \)), the first (or last) two points of the epidemic curve are linearly extrapolated to pad the curve with \( \frac{n-1}{2} \) extra points. The raw curvature Ct at K is the reciprocal of the radius r.
Step 2. The tangent point \( P\ \left(\widehat{t},\widehat{y_t}\right) \) closest to K, is determined by intersecting the line OK with the fitted circle. The directional angle θt (in degrees) of the tangent vector \( \overrightarrow{PQ} \) is then calculated.
Step 3. The raw curvature Ct is filtered based on the directional angle θt and the upper threshold h.
$$ {C}_t^{\prime }=\left\{\begin{array}{c}{C}_tI\left({0}^{{}^{\circ}}\le {\theta}_t\le {90}^{{}^{\circ}}\right)I\left({y}_t\le h\right),\mathrm{if}\ t\le {t}_p,\\ {}{C}_tI\left({270}^{{}^{\circ}}\le {\theta}_t\le {360}^{{}^{\circ}}\right)I\left({y}_t\le h\right),\mathrm{otherwise}\end{array}\right. $$
where I is an indicator function, \( {t}_p=\underset{t=1,\dots, T}{\arg \max}\left\{{y}_t\right\} \) is the peak timing.
Step 4. Find the points with the maximum filtered curvature \( {t}_o=\underset{t=1,\dots, {t}_p}{\arg \max}\left\{{C}_t^{\prime}\right\} \) and \( {t}_e=\underset{t={t}_p,\dots, T}{\arg \max}\left\{{C}_t^{\prime}\right\} \) for each half of the epidemic curve.
Step 5. The coordinates of the tangent point at \( \left(\widehat{t_o},\widehat{y_{t_o}}\right) \) correspond to the epidemic onset and the epidemic onset intensity. Likewise, the coordinates of the tangent point at \( \left(\widehat{t_e},\widehat{y_{t_e}}\right) \) correspond to the epidemic end and the epidemic ending intensity.
In our study, n = 5 and h = 5.0 were used for estimating epidemic onsets, ends, and their intensities. The MCM is illustrated in Figs. 1 and 2 with an animation of fitting least-squares circles provided in Additional file 2: Movie S1.
Comparison of epidemic characteristic parameters derived by different methods
For each season, epidemic characteristic parameters including epidemic onset, end, duration, and intensities at epidemic onset and end were estimated using the above ETM, SRM, and MCM, nationally and for each prefecture. The threshold for the nationwide onset of an influenza epidemic in Japan has been empirically defined as 1.0 C/S/W since 2000 [34]. However, the prefecture-specific thresholds for epidemic onsets have yet to be determined. We presumed that the epidemic onset thresholds at prefecture level would be similar to the national threshold and thus specified Y0 to be 1.0 C/S/W when using the ETM to estimate epidemic characteristic parameters for each prefecture. Owing to the continued success of the nationwide epidemic onset indicator in Japan, estimates of the ETM using this indicator were used as the reference standard, against which epidemic characteristic parameter estimates using the other two methods were compared. A sensitivity analysis varying n (3, 5, and 7) and h (4.0, 6.0, 8.0, and 10.0) was performed to examine the MCM’s robustness. For each combination of n and h, epidemic characteristic parameters estimated by the MCM were also compared with those from the ETM.
Establishment of prefecture-specific thresholds for epidemic onset and end
With the epidemic characteristic parameters estimated by the MCM (n = 5, h = 5.0) in hand, the prefecture-specific thresholds for epidemic onset were calculated by averaging the epidemic onset intensities over the six available seasons, 2012/2013 to 2017/2018. The prefecture-specific epidemic ending thresholds were also calculated using the same procedure.
All methods and analyses were implemented in R 3.4.2 [43]. The datasets and codes are available under MIT license at the GitHub repository [44].