Recognizing spatial and temporal clustering patterns of dengue outbreaks in Taiwan

Background Dengue fever is the most common arboviral infection in humans, with viral transmissions occurring in more than 100 countries in tropical regions. A global strategy for dengue prevention and control was established more than 10 years ago. However, the factors that drive the transmission of the dengue virus and subsequent viral infection continue unabated. The largest dengue outbreaks in Taiwan since World War II occurred in two recent successive years: 2014 and 2015. Methods We performed a systematic analysis to detect and recognize spatial and temporal clustering patterns of dengue incidence in geographical areas of Taiwan, using the map-based pattern recognition procedure and scan test. Our aim was to recognize geographical heterogeneity patterns of varying dengue incidence intensity and detect hierarchical incidence intensity clusters. Results Using the map-based pattern recognition procedure, we identified and delineated two separate hierarchical dengue incidence intensity clusters that comprise multiple mutually adjacent geographical units with high dengue incidence rates. We also found that that dengue incidence tends to peak simultaneously and homogeneously among the neighboring geographic units with high rates in the same cluster. Conclusion Beyond significance testing, this study is particularly desired by and useful for health authorities who require optimal characteristics of disease incidence patterns on maps and over time. Among the integrated components for effective prevention and control of dengue and dengue hemorrhagic fever are active surveillance and community-based integrated mosquito control, for which this study provides valuable inferences. Effective dengue prevention and control programs in Taiwan are critical, and have the added benefit of controlling the potential emergence of Zika. Electronic supplementary material The online version of this article (10.1186/s12879-018-3159-9) contains supplementary material, which is available to authorized users.


Background
The global emergence and resurgence of epidemic arboviruses such as dengue and Zika have been dramatic in recent years. Dengue fever is the most common arboviral infection in humans, with viral transmission occurring in more than 100 countries in tropical regions. It is estimated that 390 million dengue infections occur annually, of which 50-100 million cases have apparent clinical manifestations [1][2][3]. The geographical areas in which transmission of the dengue virus is common have been expanding over the past few decades and all four dengue virus serotypes (DENV1-DENV4) now circulate in Asia, the Americas, and Africa [4]. Compared with other tropical infectious diseases, dengue has a relatively low mortality; however the large scale of human suffering and economic resources used for dengue prevention and control makes it a major global public health problem [1,5,6]. There are several factors that contribute to the increased frequency and magnitude of dengue fever and the emergence of dengue hemorrhagic fever, a severe form of the disease. The most important factors are unprecedented growth of human population, unplanned and uncontrolled urbanization, a lack of effective vector control, and globalization [7,8].
The geographical extension of dengue viral transmission has followed the increased geographic distribution and population densities of Aedes aegypti, the principal mosquito vector, which transmits dengue viruses in urban areas of the tropics [7]. Even though great progress has been made in dengue research, particularly in identifying and treating dengue and understanding the structure and replication of the virus, we still do not fully understand why most individuals do not have complications while others experience a severe and fatal hemorrhagic disease. Many unanswered questions remain regarding the virus-host interaction, immune pathology, and influence of genetic variation in the host and virus [9].
Taiwan is infested with both Ae. aegypti and Aedes albopictus (a secondary mosquito vector), which transmit dengue viruses. The two largest dengue outbreaks in Taiwan since World War II occurred recently, with 15,492 autochthonous cases confirmed in 2014 and 43,419 cases confirmed in 2015. Dengue cases nearly disappeared from the island of Taiwan for 40 years until an outbreak of 4389 cases occurred in 1988. In addition to an outbreak of 5336 cases in 2002, a few small outbreaks occurred between 1989 and 2013. Before World War II, large dengue outbreaks were reported in 1915 and 1931 [10,11].
Accurately recognizing geographical discrepancies and heterogeneity in dengue incidence patterns and detecting the geographical areas in which the exposure to environmental or viral agents may be responsible for intense dengue incidence will inform disease control and prevention efforts and provide important insights into the etiology of this disease. In this study, we used the map-based pattern recognition procedure and scan test to systematically explore geographical and temporal clustering patterns of dengue incidence in an analysis of Taiwan's dengue outbreaks in 2014 and 2015. The map-based pattern recognition procedure is designed to recognize hierarchical incidence intensity patterns for some disease over geographical spaces by searching for hierarchical (in intensity) clusters of mutually adjacent areas with high rates [12]. The procedure incorporates information about the intensity rank order into the ordinary adjacency-based test statistic [13], which is designed to analyze data from irregularly arranged and shaped geographic units like the irregular county boundaries within a US state.
Our analysis of the largest Taiwan dengue outbreak in 2015 showed that multiple geographic units with the highest rates of dengue incidence significantly aggregated into 2 separate geographical areas located in Tainan and Kaohsiung in southern Taiwan. More importantly, we determined 3 distinct groups within these geographic units that had the highest dengue incidence rates according to their intensity and delineated 2 separate clusters of hierarchical dengue incidence intensity.
Using the scan test, we found that dengue incidence tended to peak simultaneously and homogeneously among the neighboring geographic units with high rates in the same cluster [14].

Study population
Dengue fever is a notifiable communicable disease in Taiwan. Information on dengue cases collected in Taiwan since 1988 is publicly available through the Taiwan Centers for Disease Control (http://www.cdc.gov.tw/english/ index.aspx) and the Taiwan Government Open Data website (http://data.gov.tw/en). This information includes the date an individual was diagnosed with dengue infection and his or her residence at diagnosis, place of infection, gender, and age. The study population used for this investigation is patients with laboratory confirmed autochthonous dengue infection, which thus excludes imported cases of dengue. The spectrum of clinical presentations of dengue infection with any one of the 4 viral types is broad. Thus, laboratory confirmation of dengue infection is crucial. Confirmed dengue viral infection in Taiwan is based on a positive diagnosis from any one of 4 laboratory tests: virus isolation, nucleic acid amplification tests, antigen detection, and serological tests. Data on the place where the infection occurred are used in the analysis. If they are unavailable, the individual's residence at diagnosis is used. Information on the daily local climate variables, including temperature, rainfall, and relative humidity, is available from Taiwan's Central Weather Bureau (https://www.cwb.gov.tw/eng/index.htm).
Tainan and Kaohsiung are the two largest cities in the southern, tropical region of Taiwan. Kaohsiung is bigger than Tainan in population and area, with 2.78 million residents and 2952 km 2 . Tainan has a population of 1.89 million and 2192 km 2 . Ae. aegypti, is dispersed primarily in Tainan, Kaohsiung, and the area to the south of these cities. Ae. albopictus, has a widespread distribution throughout most of Taiwan. Figure 1 shows the yearly frequency distribution of confirmed autochthonous dengue cases that occurred between 1987 and 2016. In 2014, there were 15,492 confirmed autochthonous cases in Taiwan, among which 97% (15,034 cases) occurred in Kaohsiung and < 1% (150 cases) occurred in Tainan. In 2015, 98% of the total 43,419 confirmed autochthonous dengue cases occurred in Tainan and Kaohsiung combined, with 22,842 in Tainan and 19,746 in Kaohsiung. The 2015 Taiwan city (and county)-specific dengue incidence intensity distribution is presented in Fig. 2. In 2015, the 3 places with the highest numbers of dengue cases per 100,000 persons were Tainan, with a rate of 1212, Kaohsiung, with 711, and Pingtung County, with 48. All other cities (and counties) had numbers less than 8. In 2016, there were 380 confirmed autochthonous cases in Taiwan.

Map-based pattern recognition procedure for hierarchical clusters of disease
The method developed by Mantel [15] was generalized by Cliff and Ord, who proposed the test statistic B = (1/ 2) Σ ω ij x i x j where x i = 1 if area i is a high-risk area for some disease and 0 otherwise, and where ω ij = 1 if areas i and j are mutually adjacent geographically and 0 otherwise, ω ij = ω ji , ω ii = 0 [16]. The sum ranges over all pairs of areas. It is an adjacency-based test statistic that measures spatial autocorrelation for binary data and uses the distribution of the number of adjacencies of geographic units. When high-risk areas tend to be geographically adjacent to each other, the value of B tends to be large. Using the test statistic B, one can test the null hypothesis of the random allocation of high-risk areas over the geographical region; that is, high-risk areas do not cluster. Cliff and Ord derived the expressions for the mean and variance of B under the assumptions of binomial and hypergeometric distributions [13].
Instead of selecting a specific threshold rate of incidence, the map-based pattern recognition procedure proposes to first list the areas under study in rank order based on the disease intensity rates [12]. It starts with classifying the 2 top ranking areas as high-risk areas and calculates the value of B. Subsequently, the procedure includes the area with the 3rd highest rate and the other 2 areas with higher rates as high-risk areas and calculates the corresponding value of B. The p-value is the probability that B is equal to or higher than the observed number of adjacencies involved between these 3 areas with the highest disease intensity rates. The procedure proceeds successively, including exactly one area with high rate according to the rank order and the other areas with higher rates as high-risk areas at each step with the use of B.
Therefore, the procedure provides the p-value of B when the k top ranking areas among all areas under study are classified as high-risk areas for each k where k = 2, 3, 4….. The procedure can classify as many areas as high-risk areas as possible; however, it is unlikely that one would inquire about the possibility of clusters of more than 20% high-risk areas. The main feature of the procedure is to determine the hierarchical incidence intensity pattern through the distribution of p-values for k = 2, 3, 4…, which will be illustrated in Results.
Instead of relying on the assumptions associated with the asymptotically normal distribution [13], we propose to use simulation-based permutations using 1 million replicates based on the exact district boundary map under study to obtain the null distribution of B. The basic geographic unit used in this report is a "district", which is the administratively defined subdivision of a city in Taiwan, and which regularly reports health-related information to the city government through its health department. There are 37 and 38 districts in Tainan and Kaohsiung, respectively. The distribution was simulated by randomly selecting exactly k districts among the 75 districts of Tainan and Kaohsiung combined 1 million times and counting the number of the adjacent pairs appearing among the k districts for each of the 1 million replicates. This process was applied for k = 2, 3, 4… 14. Each of the 13 distributions of B for k = 2, 3, 4… 14 is given in Table 1. In this setting, at most 19% (14/75) of the 75 total districts are high-risk districts. With Table 1, we do not require the assumption of asymptotically normal distributions for B. The distribution of B would closely approximate a Poisson  Table 1.

Scan test
The scan test is frequently used to detect disease clustering over a temporal series and is structured to test for the largest cluster. The scan test employs a moving window of pre-determined length and finds the maximum number of cases of disease revealed through the window as it slides over the entire period. The scan statistic is the maximum number of events in a window (t, t + w), where w is the pre-determined window size as t takes on all values in a certain time frame. The model of the scan test that we applied here is based on the assumption of a uniform distribution of events [14]. Here, the scan test was used to test for clustering of dengue incidence and detect the date of the occurrence of maximum dengue incidence in a district.

Tainan and Kaohsiung dengue outbreak
Tainan and Kaohsiung are geographically adjacent to each other and, therefore, are considered as a single geographical region. The rates, which were the numbers of dengue cases per 100,000 persons, ranged from 0 to 4497 among the 75 districts. Fig. 3a presents the district-specific dengue incidence intensity. We used a cut-off point of 908 in the rates to dichotomize because the difference of 152 between 908 and the next lowest rate of 756 is largest in comparison with the difference between the rates of any other two districts among the upper spectrum of rates, except for the top 5 rates. There were exactly 14 districts with a rate equal to 908 or higher. In this analysis, at most 14 districts with the highest rates were classified as high-risk districts using the map-based pattern recognition procedure. Fifty-five districts or 73% (= 55/75) had rates of 480 or lower. The 14 top ranking districts are listed by rank according to their rates in Table 2. With each district with a high rate, Table 2 gives the rate, the observed value of B (= the observed number of adjacencies involved between the district and districts with higher rates), and the p-value (determined from the distribution in Table 1). When the (k =) 2 top ranking districts in Table 2 (West Central and North districts) are classified as the high-risk districts and the remaining 73 districts are not, the observed value of B is equal to 1, giving a p-value = Pr(B ≥ 1│k = 2) = 0.066183 (= 66,183/1 million from Table 1), shown in the 3rd row of Table 2. Table 2 gives the p-value when the k top ranking districts are classified as high-risk districts for k = 2, 3, 4… 14. Low p-values were found for each high-risk district listed in Table 2, except for the North district shown in the 3rd row. The null hypothesis of the randomization of k districts with the highest rates was rejected at a nominal significance level of 0.05 for 3 ≤ k ≤ 14; that is, the k districts with the highest rates showed significant clustering based on B. We also provide the p-values in Table 2 based on the Poisson approximation for k ≤ 7.
We note that lower p-values of B indicate high degrees of clustering, which conform to the adjacency-based definition of a cluster [12,13]. In Table 2, the p-values of the 14 high-risk districts appear to be cycling over the rates and are at their relative lowest at the points where the East and Anping districts enter the ranking. We observed a relatively low p-value of 0.000121 when we included the East district (4th in rank) and the other 3 districts with higher rates. In this scenario, the number of high-risk districts was 4 and the observed value of B was 5, giving a p-value = Pr(B ≥ 5│k = 4) = 0.000121 (= 121/1 million from Table 1), shown in the 5th row. The p-value jumped to 0.000706 when we included the Qianzhen district (5th in rank) as a high-risk district because the number of high-risk districts became 5 and the observed value of B remained 5, as shown in the 6th row. The 4 top ranking districts are located in Tainan while the Qianzhen district is in Kaohsiung. The next relatively lower p-value of Pr(B ≥ 7│k = 6) = 0.000077 (= (66 + 10 + 1)/1 million from Table 1) occurred by including the Anping district (6th in rank) and the other 5 districts with higher rates, leading to the number of high-risk districts = 6 and the observed value of B = 7, as shown in the 7th row of Table 2.
Correspondingly, we determined the 3 groups of districts to use in constructing hierarchical clusters of mutually neighboring high-risk districts with different levels of intensity using the map-based pattern recognition method [12]. Level-1 districts are the 4 top ranking districts in Table 2 (West Central, North, South, and East districts). Level-2 districts are Qianzhen and Anping, which are respectively the 5th and 6th by rank. Level-3 districts are the 8 districts that rank from 7 to 14. When the level-specific intensity is placed on the map, 2 hierarchical dengue incidence intensity clusters clearly emerge and are located in the urban areas of Tainan and Kaohsiung, respectively, as shown in Fig. 3b. The first cluster geographically expands from the 4 Level-1 districts to 7 mutually adjacent high-risk districts. This geographical area displays the highest dengue incidence intensity, accounting for 50% of dengue cases. In comparison, the second cluster that consists of the other 7 high-risk districts explains 28% of dengue incidence. The scan test was used not only to determine the clustering of the dengue incidence but also to identify the date of the occurrence of maximum incidence in a high-risk district. Using a window width of 7 days and the time period from week 24 to week 52 of 2015, the scan test was applied to each of the 14 top ranking districts. A very small p-value for the scan test was obtained for each of the 7 high-risk districts in Tainan, as shown in Table 3. More importantly, the date of occurrence of the district-specific largest dengue cluster (maximum dengue cases over any 7 consecutive days in a high-risk district) overlapped on 09/18-09/20 among 6 of the 7 adjacent high-risk districts, indicating that the 2015 Tainan dengue outbreak peaked almost simultaneously and homogeneously among the geographically neighboring high-risk districts. A similar phenomenon was observed in Kaohsiung, as shown in Table 3. The peaks of incidence for 5 high-risk districts were during a 2-week period of 11/12-11/25.

Kaohsiung dengue outbreak
We used the same approaches to study the pattern and distribution of dengue incidence in 2014 Kaohsiung alone. Each of the 6 distributions of B for k = 2, 3… 7 is shown in Additional file 1: Table S1. Correspondingly, Table 4 gives the rate, observed value of B, and p-value (determined from the distribution in Additional file 1: Table S1) for each of the 7 top ranking districts. Small p-values were found by including 4 or more districts with the highest rates. These 7 top ranking districts account for 81% of the 2014 Kaohsiung dengue incidence. The weekly distributions of the frequency and rate in 2014 and 2015 in Kaohsiung are given in Additional file 2: Figure S1A. Applying the scan test to each of the 7 high-risk district in the same setting, Additional file 1: Table S2 presents the results, indicating that 4 of the 7 high-risk districts had the date of occurrence of the district-specific largest dengue cluster during a period of 10/15-10/25.

Effects of temperature, rainfall, and relative humidity
Local weather affects dengue viral transmission and infection [17]. The effects of local climate variables on dengue incidence were considered, including temperature, rainfall, and relative humidity. All 3 local climate variables were similar in Tainan and Kaohsiung. This  finding is not surprising as the 2 cities are mutually adjacent. As seen in Fig. 4a, b, relative humidity was consistently higher in Tainan than in Kaohsiung by at most 10% over the outbreak period, and no appreciable difference in temperature and rainfall was observed in 2015. The local weather did not seem to have an effect on the difference in intensity between the 2 distinct hierarchical clusters. The Tainan dengue outbreak peaked on week Fig. 4 (A, B) Figure S1B, poor environmental management for effective integrated vector controls may be responsible for the worse dengue outbreak and later date of peak incidence occurred in 2015 Kaohsiung.

Discussion
Historically, Tainan and Kaohsiung experienced the worst dengue incidence in large dengue outbreaks in Taiwan [10,11]. One major reason is that most areas of Tainan and Kaohsiung are infested with the principal vector, Ae. aegypti. In the analysis of the 2015 dengue outbreak, the 4 Level-1 districts had high population density, respectively the 2nd, 1st, 6th, and 3rd by rank in population density in Tainan. This small area experienced extraordinarily high dengue incidence, explaining 37% of the dengue incidence in Tainan and Kaohsiung combined. The 7 districts with the highest dengue incidence rates in 2015 Kaohsiung were also among the districts with the highest population density in Kaohsiung. This indicates that dengue viruses have adapted to the domesticated Ae. Aegypti and most transmission occurs in and around the domestic environment in Tainan and Kaohsiung. In addition, it is possible that poor physical environments in Tainan and Kaohsiung could be contributing factors for recent dengue outbreaks and our results of analysis call for better environmental management for integrated vector controls to reduce chance of dengue outbreaks in these regions. We note that the dengue incidence outbreak appeared to be initiated earlier and more concentrated geographically in Tainan than in Kaohsiung, shown in Figs. 4a and 3a, b. They may explain why the dengue outbreak occurred earlier and appeared to rise and fall more rapidly in Tainan. Similar climate might also be yet another reason for the dengue outbreaks in Tainan and Kaohsiung. Because dengue incidence rates vary substantially by districts and because we attempt to accurately recognize geographical heterogeneity patterns of varying dengue incidence intensity, the map-based pattern recognition procedure is used and provides important epidemiologic pattern analysis. Our investigation exactly delineates the 2 tight clusters, which are distinct in location, intensity, and date of peak incidence. As stressed by Kulldorff (2001), p-values should be used as an indicator concerning the evidence for true spatial or space-time clusters rather than maintaining a strict cut-off for the p-value to decide whether to investigate detected clusters or not.
The amount of efforts for the investigation should depend on this evidence [18].
A global strategy for dengue prevention and control was established more than 10 years ago, and many efforts have been made to focus on 3 fundamental objectives: surveillance for planning and response, reducing the disease burden and changing behaviors to improve vector control [4]. However, the factors that drive dengue viral transmission and infection continue unabated, and effective vector control remains elusive [6].
In addition, the US Centers for Disease Control (1990) issued a set of guidelines for investigating clusters of health events. According to the guidelines, the four stages are (1) initial contact with and response to the individual who reported the cluster; (2) a preliminary assessment, including evaluations of whether an excess has occurred; (3) a formal feasibility study; and (4) a full etiologic investigation [19]. This study provides valuable information and inference in the second and third stage of the guidelines. We acknowledge some limitations of this study, including (1) this investigation is observational by nature and the exact cause effects cannot be concluded, and (2) the data on the location at which the infection occurred are missing for many individuals, for those the individual's residence at diagnosis is used in the analysis.
The Zika virus essentially has the same epidemiology and mosquito vectors in urban areas as dengue and is following the same path of global spread via competent mosquito vectors [20]. The potential for the Zika virus to emerge in Taiwan is great due to increased air travel. Thirteen imported cases were reported in Taiwan in 2016.

Conclusion
Beyond significance testing for disease clustering, our investigation of dengue incidence distribution over spatial and temporal series is desired by and useful for health authorities who require optimal characteristics and patterns of disease incidence on maps and in a temporal series for effective prevention and control programs. Effective prevention and control programs for dengue in Taiwan are critical, and have the added benefit of controlling the potential emergence of Zika.

Additional files
Additional file 1