Spatio-temporal clustering analysis and its determinants of hand, foot and mouth disease in Hunan, China, 2009–2015

Background Hand, foot and mouth disease (HFMD) is one of the highest reported infectious diseases with several outbreaks across the world. This study aimed at describing epidemiological characteristics, investigating spatio-temporal clustering changes, and identifying determinant factors in different clustering areas of HFMD. Methods Descriptive statistics was used to evaluate the epidemic characteristics of HFMD from 2009 to 2015. Spatial autocorrelation and spatio-temporal cluster analysis were used to explore the spatial temporal patterns. An autologistic regression model was employed to explore determinants of HFMD clustering. Results The incidence rates of HFMD ranged from 54.31/10 million to 318.06/10 million between 2009 and 2015 in Hunan. Cases were mainly prevalent in children aged 5 years and even younger, with an average male-to-female sex ratio of 1.66, and two epidemic periods in each year. Clustering areas gathered in the northern regions in 2009 and in the central regions from 2010 to 2012. They moved to central-southern regions in 2013 and 2014 and central-western regions in 2015. The significant risk factors of HFMD clusters were rainfall (OR = 2.187), temperature (OR = 4.329) and humidity (OR = 2.070). The protect factor was wind speed (OR = 0.258). Conclusions The HFMD incidence from 2009 to 2015 in Hunan showed a new spatiotemporal clustering tendency, with the shifting trend of clustering areas toward south and west. Meteorological factors showed a strong association with HFMD clustering, which may assist in predicting future spatial-temporal clusters. Electronic supplementary material The online version of this article (10.1186/s12879-017-2742-9) contains supplementary material, which is available to authorized users.


Background
Hand, foot and mouth disease (HFMD) is a highly contagious disease caused mainly by human enterovirus 71 (EV71) and coxsackievirus A16 (CoxA16), [1]. Although the infection is typically mild and self-limiting, infants and children, especially those under 5 years old are at greatest risk of severe disease affecting the central nervous system [2]. Currently, there are no available vaccines, chemoprophylaxis and effective anti-virus therapies for dealing with HFMD [3,4].
The past decades have witnessed several outbreaks of HFMD across the world, affecting millions of people in countries such as Bulgaria, Australia, Japan, and Vietnam [5][6][7][8]. In March 2008, a nation-wide epidemic of HFMD started in Fuyang City, Anhui Province, China, recording 6049 cases and 22 deaths [9]. In view of these influences, HFMD has been listed as a reportable disease by the Chinese Ministry of Health to facilitate better disease prevention and control. Surveillance data shows that over 7.2 million cases and 2457 fatal cases were reported in China between 2009 and 2012, making HFMD the top reportable disease in China during the period under review [10].
Many studies showed that some climatic factors were associated with HFMD infection. By using a time series analysis, Huang et al. found that temperature and relative humidity were statistically related to HFMD occurrence [11]. However, a study carried out by Li et al. showed atmospheric pressure having a negative relationship with HFMD incidence in Guangzhou city [12]. Although several spatial analyses on HFMD prevalence have been conducted, only few studies have focused their analyses on determinant factors which attributed to HFMD spatio-temporal clusters [13][14][15]. Most studies only described the spatial patterns and cluster locations of cases. The objective of our study was to map county-level epidemiological characteristics and spatio-temporal distribution of HFMD incidence and to explore the determinants of HFMD in different clustering areas. Findings of the study were to guide the allocation of public health resources to maximize cost-effectiveness in the prevention and control of HFMD epidemics.

Study area
Hunan province is one of the central inland provinces of China situated between 108°47′ to 114°15′ east longitude and 24°38′ to 30°08′ north latitude with a typically subtropical monsoon climate. Hunan, with a population of approximately 67.8 million, covers an area of about 211,800 km 2 , and has 14 districts and 124 counties. High humidity, abundant heat, high population density and high levels of migration promote the development and spread of enteroviruses within the Province.

Data description
Surveillance data on HFMD in Hunan was obtained from the China Information System for Disease Control and Prevention (http://www.chinacdc.cn). This data covered the study period (2009 to 2015), and included case types as well as pathogen types (only for laboratory diagnosed cases) for all the new cases identified during the study period. The diagnostic criteria were based on the 2010 HFMD Diagnosis and Treatment Guidelines by the National Health and Family Planning Commission of the People's Republic of China. Cases without home addresses, clinical or laboratory diagnosis information were excluded from the study. Meteorological data was obtained from the Hunan Meteorological Administration. The Hunan map at county level was also provided by the Information department of the China Disease Control and Prevention Center.

Spatial autocorrelation analysis
Spatial autocorrelation is defined as the spatial dependence among the given attribute value of one geographic unit and its neighboring units, which lies in almost all spatial observations [16]. Global spatial autocorrelation is used to measure the overall clustering tendency in the study region while the local one can be further used to clarify the patterns and the exact location of the clusters among local counties [17][18][19]. In this study, we used global Moran's I statistic and local indicators of spatial autocorrelation (LISA) to investigate spatial association regarding incidence of HFMD in the Hunan province [20].
The global Moran's I statistic ranges from −1 to 1. An I > 0 indicates positive autocorrelation and the closer it is to 1, the more aggregated the distribution is. An I < 0 indicates negative autocorrelation and the closer it is to −1, the more dispersed the distribution is. An I = 0 indicates no autocorrelation, suggesting that the disease is randomly distributed. The significance of Moran's I is assessed by Monte Carlo test using Z statistic and P value (Z ≥ 1.96 or Z ≤ −1.96 indicates statistical significance at the 5% level) [21]. By using the LISA map, four patterns of spatial correlation can be detected: high-high clusters; low-low clusters; high-low clusters; and lowhigh clusters [22]. In reality, the high-high clusters (i.e. high-incidence districts surrounded by other highincidence districts) are the most important patterns for the purpose of disease prevention and control.
The formula for global or local Moran's I is: Where n is the number of districts or counties; x i and x j are the incidence rates in districts or counties i and j; x is the average incidence rate in the entire study area; and w ij is the spatial weight between districts or counties i and j. In this study, by choosing the binary spatial weight matrix in Rook contiguity rule [23], Geoda 1.6 software was used for the spatial autocorrelation analysis whilst ArcGis 10.2 software was used to visualize the results.

Spatio-temporal cluster analysis
Kulldorff's method of retrospective space-time scan statistic was used to detect the geographical clusters of HFMD cases at the county level based on a discrete Poisson model [24]. In this approach, a dynamic cylindrical window was used to scan for different time and geographic area to detect all possible clusters. The circular geographic base moves with the radius varying with the population range of the area, while the height varying with the defined time interval [25]. For each scanning, the relative risk (RR) is calculated using the ratio of the observed case number to the expected case number within and outside the windows as well as the log likelihood ratio (LRR). The P values for clusters detected are calculated by Monte Carlo randomization method [26,27]. The window with the maximum LLR is assumed to be the most likely cluster, and other windows with a statistically significant LLR are defined as secondary clusters ranked according to their LLR value.
The LLR for a given window can be calculated as follows: Where N is the total number of cases; n is the observed number of cases within the scan window; E(n) and N − E(n) are the expected number of cases within and outside the scan window under the null hypothesis (H 0 : The spatio-temporal clustering of the study area are caused by random factors.), respectively; and I ′ is an indicator function. It is equal to 1 when the window has more cases than expected under the null hypothesis, otherwise it is 0. In space-time scan analysis, the spatial unit is county, with 124 counties in Hunan Province; the temporal unit is month, with 84 months from 2009 to 2015. The spatial size of scanning window was limited to 30% of the total population at risk whilst the temporal size was set to 30% of the total study period in order to scan for small to large clusters. The number of Monte Carlo randomization was set at 999. By setting the time frame of the scan analysis to be one month, we can control the time trends and observe the cluster changes in the entire study. SaTScan 9.4 software was used to perform the spatiotemporal cluster analysis of HFMD and ArcGis 10.2 software was used to visualize the results.

Autologistic regression model
Spatial autocorrelation is frequently present in the observations and it appears that neighboring units tend to have more similar value than those that are distributed randomly [28]. In considering spatial dependence, Besag et al. present the autologistic regression model which is a special type of ordinary logistic model introducing a spatial autocorrelation term (SAuto cov i ) in the form of weighting coefficients [29]. Temporal autocorrelation commonly exists in infectious disease. In our study, we modified the autologistic regression by introducing TAuto cov i as the temporal autocovariate, which is set with the incidence cases of last month in the same county. Meanwhile, many previous studies have found that population density is significantly associated with HFMD occurrence [30,31], so we introduced the population density cov i as the covariate to adjust for population density in different counties.
For the modified autologistic regression model, the form of the model is given by equation: The probability of the occurrence of the outcome can be illustrated as follows: where y i is the dependent variable which refers to the probability of spatiotemporal clusters occurrence, P i is the mean of y i , or equivalently, the probability of y i =1, and x denotes the exposure factors in meteorology (the temporal resolution of independent variables is one month which coincide with the time frame of the SaTScan analysis). cov i is the population density covariate. SAuto cov i , TAuto cov i are the spatial and temporal autocovariate, respectively. β, p, q and r are the regression coefficient. i is the index of the county.
SAuto cov i is the weighted average of the probabilities of the geographic unit i which is surrounded by other k i geographic units. w ij is the spatial weight between county i and j, w ij ¼ 1 and h ij is the Euclidean distance between the centroids of county i and j. P j is the probability of the event occurring in the neighbors of county i, which is equal to 1 when the event occurs, otherwise it is 0. In this study, P j refers to the probability of spatiotemporal clusters occurrence in neighboring county j of county i. A county with the distance within 31,250 m from county i was defined as the neighbors of county i [31]. (see Additional file 1: Table S1-1, Table 2).
In this study, after calculating the variance inflation factor (VIF) and tolerance to decrease multicollinearity (see Additional file 1: Table S3-1), five independent variables were selected: monthly average rainfall, monthly average temperature, monthly average wind speed, monthly average relative humidity, monthly total sunshine. We also introduced the population density as the covariate. All variables were standardized. A forward stepwise method was used in the ordinary logistic regression model. We then introduced the SAuto cov i and TAuto cov i until each variable was statistically significant. Cox & Snell R 2 and Nagelkerke R 2 were used to test the goodness of fit of logistic or autologistic model. The higher the statistic value are, the better the model fits the observations. We used the Relative Operating Characteristic (ROC) to compare the probability predicted by the model with the reality (the occurrence of the spatiotemporal clusters). Closer to 1 means a better fit, which rages from 0.5 to 1. SPSS 22.0 software was used to conduct the autologistc regression model and ArcGIS 10.2 software was used to perform spatial analysis.

Epidemiological characteristics
There were 895,429 HFMD cases reported in Hunan Province from 2009 to 2015, with the average annual incidence rate of 194.57 per 100,000 (ranged from 54.31 per 100,000 in 2009 to 318.05 per 100,000 in 2014). Of the total cases, 8263 (0.92%) were severe and 352 (0.39‰) were death.
A total of 559,033 male cases and 336,396 female cases were reported during the study period and the male-tofemale incidence ratio was 1.66:1 (ranged from 2.01:1 in 2009 to 1.53:1 in 2014). Most patients were younger than 5 years old, accounting for more than 90% of all reported cases. Scattered children have the highest incidence rate, followed by nursery children and school students. Among 30,377 laboratory confirmed cases, CoxA16, EV71 and other EV accounted for 16.67%, 47.62% and 35.72%, respectively. The predominant pathogen was EV71 in most years except for 2013 and 2015 (Table 1).
The monthly distribution of HFMD cases in Hunan is shown in Fig. 1, which indicates that the major peak appeared between April and July whilst the minor peak observed between September and November. The occurrence of HFMD shows significant seasonality with the wave-like increasing tendency.

Spatial autocorrelation analysis
Global Moran's I statistics was performed to investigate the presence of global autocorrelation, which were listed in Table 2. It clearly shows that the HFMD cases were not randomly distributed and that there was high global autocorrelation among HFMD number at county level in Hunan Province. The results of local autocorrelation analysis were mapped in Fig. 2. The LISA map shows that the high-high clusters (dark red color) were mostly in central and northern districts including Changsha, Yiyang, Loudi, Xiangtan, while southwestern districts of Hunan Province were low-low clusters (dark blue color). Nevertheless, contrary to other cities in western districts, Xiangxi showed a high-high pattern in the LISA map in recent years. Table 3 and Fig. 3 show the scanning results of statistically significant spatial-temporal clusters of HFMD from 2009 to 2015. Seventeen (17) spatial-temporal clusters were detected each year during the study period, including 7 most likely clusters and 10 secondary clusters. All the clusters occurred in the period of the major peak of

Autologistic regression model
Considering the spatial and temporal autocorrelation which existed in Hunan HFMD cases, this study introduced the auto covariates SAuto cov i and TAuto cov i on the basis of the ordinary logistic regression model (see Additional file 1: Table S1-4). The occurrence of the HFMD spatio-temporal cluster in the given county was the dependent variable (It is equal to 1 when the spatiotemporal cluster occurs, otherwise it is 0), while the selected exposure factors were the independent variables (they were continuous variables so should not to be assigned). Table 4 shows the results of the autologistic regression. The OR values for monthly average rainfall, monthly average temperature and monthly average relative humidity are all greater than 1, which indicates that these variables are risk factors that are positively related to the occurrence of HFMD spatial-temporal clusters. On the contrary, the OR value for monthly average wind speed is less than 1.
In addition, the goodness of the model was measured by the Cox & Snell R 2 and Nagelkerke R 2 statistics and the ROC value was used to compare the reality with the predicted outcome. All values were listed in Table 5 which indicated that the autologistic regression model had a better goodness of fit than the ordinary one.

Discussion
In this study, we confirmed that Hunan Province was one of the most serious HFMD epidemic areas in China with the average incidence rate of 194.57 per 100,000, which was much higher than the national average of 139.78 per 100,000 [32]. Furthermore, the incidence rate showed a rapidly increasing trend within the recent 4-year period, suggesting that the prevention and control of HFMD was still a major public health problem in the province. Scattered children under 5 years old accounted for more than 85% of all cases, and the incidence rate was especially high in the 1-year-old group, comparing well with previous reports [33,34]. Poor immunity as well as a lack of available vaccines made it easier for children to succumb to HFMD viruses. Consistent with Deng's reports [3], our study found that incidence rate among males was about 1.66 times greater than females, and this could be attributed to the fact that males spent more time engaged in outdoor activities thereby exposing them to pathogens.
Although EV71 and CoxA16 were still predominant in newly occurring HFMD cases around the world [35,36], our study showed that other EV pathogens ranked first in the recent three-year period. Notably, some other reports have suggested that CoxA6, ECHO30 and CoxA10 were going to be the important causative agents for HFMD [37][38][39]. This means that surveillance systems should look out for new pathogenic strains alongside the regular EV61 and CoxA16 virus to prevent possible outbreaks associated with other new enteroviruses.
Our study showed significant seasonality in the incidence of HFMD (it peaked in April to July and September to November), similar to other studies conducted in Singapore [40] and Taiwan [41]. The first peak might be the result of the warm temperature, abundant rainfall and high atmospheric pressure during the season. Many children enter kindergarten in September thereby increasing the occurrence of the clustering of cases associated with the second minor peak. Thus, measures such as morning checks, case isolation and school closure should be implemented to reduce incidence and spread of HFMD during the high incidence period.  The global spatial autocorrelation analysis showed that the incidence of HFMD in Hunan during the 2009-2015 period was not randomly distributed at county level. The high-high clusters were aggregated in the provincial capital city, Changsha, and its neighboring areas, whereas some counties in rural areas such as Huaihua, Chenzhou and Yongzhou were low-low clusters in the LISA map. These hot spots were mainly located in areas with highly developed economy, high population density, and good health similar to the observations by Deng et al. [3]. Furthermore, spatiotemporal clusters of HFMD cases appeared annually during the study periods. Nearly all the clusters occurred in April to July, which was the same as the major peak of incidence of HFMD in Hunan. Consistent with most previous researches, the most likely clusters stayed almost the same in the urban areas [3,42,43], nevertheless the secondary clusters were diverse during the 2009-2015 period with a shifting trend toward the south and west. According to our study, the occurrence of HFMD spatiotemporal clusters is showing a new trend  which needs further research to provide early warning signals for HFMD epidemics. Although the effect of climatic factors on HFMD incidence has been revealed in many previous researches, there is few supporting studies about the effect of weather conditions on HFMD spatiotemporal clusters. Our autologistic regression analysis results showed that monthly average rainfall (OR = 2.187), monthly average temperature (OR = 4.329) and monthly relative humidity(OR = 2.070) were risk factors for HFMD spatiotemporal clusters, which were in agreement with some HFMD incidence reports [12,44]. Warm weather, abundant rainfall and high humidity may facilitated the reproduction and transmission of enteroviruses, and facilitated the HFMD clusters. We also found that wind speed (OR = 0.258) has negative correlation with HFMD spatiotemporal clusters, and this compares well with observations by Gui et al. [42]. Wind speed may reduce the concentration of viruses per unit volume and could prevent HFMD virus spread through air and direct contact.
Our research showed that autologistic regression model has a higher goodness of fit than the ordinary one. The value of the constant is the prediction residual error of the regression model. The smaller the value is, the better the interpretation function of the model. By introducing the covariates SAuto cov i and TAuto cov i , the constant decreased significantly whilst the p-value of two covariates were less than 0.001,which indicated the existence of spatial and temporal autocorrelation. Thus, autologistic regression model could reduce the inherent residuals and bias of the model and improve the accuracy in evaluating the effect of determinants in spatiotemporal clustering of HFMD.
There are some limitations in this study. Firstly, the quality of case reports in the surveillance system might vary from area to area due to differences in the availability of medical resources [4]. Some mild cases not going to hospitals also resulted in the underreporting of HFMD cases. Secondly, we chose the county as the least spatial unit which might lose some detailed information.  It would have been a good idea to use a more finer areal unit scale such as village or community.

Conclusions
In summary, this study highlighted that the HFMD incidence from 2009 to 2015 in Hunan Province exhibited dynamic spatiotemporal distribution at county level, with the new shifting trend of clustering areas toward south and west. The autologistic regression model has a higher goodness of fit. After controlling for spatialtemporal heterogeneity, some meteorological parameters, especially rainfall, temperature and humidity might be associated with these clusters. Nevertheless, wind speed was the potential protect factors for the spatiotemporal patterns' change of HFMD. Our findings provide information for a better understanding of epidemic trends and subsequently the development of more effective HFMD prevention and control strategies.

Additional file
Additional file 1: Supplementary