Using the Kulldorff’s scan statistical analysis to detect spatio-temporal clusters of tuberculosis in Qinghai Province, China, 2009–2016

Background: Although the incidence of tuberculosis (TB) is now well under control in most parts of China, TB remains a major public health problem in less developed areas such as Qinghai. This study aims to reveal the spatio-temporal patterns of TB in Qinghai province, which could help in planning and implementing key preventive measures. Methods: We extracted data on reported TB cases in Qinghai province from the China Information System for Disease Control and Prevention (CISDCP) for January 2009 to December 2016. Kulldorff’s retrospective space-time scan statistic, calculated using the discrete Poisson probability model, was used to identify temporal, spatial, and spatio-temporal clusters of TB at the county level in Qinghai. Results: A total of 48,274 TB cases were reported in Qinghai from 2009 to 2016. Kulldorff’s scan revealed that TB cases in Qinghai were significantly clustered in their spatial, temporal, and spatio-temporal distributions. The most likely spatio-temporal cluster (LLR = 2547.64, RR = 4.21, P < 0.001) was concentrated mainly in the southwest of Qinghai, covered seven counties, and spanned September 2014 to December 2016. Conclusion: This study identified eight significant space-time clusters of TB in Qinghai from 2009 to 2016, which could help prioritize resource allocation to high-risk areas for future TB control and elimination. Electronic supplementary material: The online version of this article (doi:10.1186/s12879-017-2643-y) contains supplementary material, which is available to authorized users.


Background
Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis. Globally, over 80% of new TB cases were reported in developing countries. According to a World Health Organization report, the TB burden of China is the second largest in the world [1]. In recent years, although the Chinese Government has paid increasing attention to TB control, prevention measures are still insufficient, especially in areas with inadequate medical resources such as Qinghai, a province where most of the population is still at high risk of TB [2].
A large number of studies on the spatial and temporal distribution of TB have demonstrated that TB has highly complex dynamics and is spatially heterogeneous at the provincial, national, and international levels during certain periods of time; however, variations within small areas are often overlooked when a relatively large scale is used [3][4][5][6]. In our previous study, Moran's I spatial autocorrelation analysis was used to analyze TB incidence data from 2009 to 2013 in Qinghai and showed that the distribution of TB in this province was not random [7]. However, Global Moran's I spatial autocorrelation analysis evaluates the distribution characteristics of the disease only at several specific time points. Moreover, this method cannot estimate the risk level of high-risk cluster areas [8][9][10][11]. Time is a critical confounder that might directly bias the determination of high-risk regions of TB. Kulldorff's space-time scan statistic can detect distribution characteristics along both the temporal and spatial axes, bringing the results closer to real-world conditions [12][13][14][15]. Additionally, the relative risk of disease in a cluster area can be estimated by comparing it with the area outside the cluster. This method has been used widely in epidemiological studies of infectious diseases [16][17][18][19].
Analyzing and evaluating the spatio-temporal patterns and trends of TB in Qinghai is necessary for TB control and elimination. In this study, our aim was to use the Kulldorff's scan statistical analysis to explore the spatial, temporal, and space-time dynamics of TB at the county level in Qinghai.

Methods
Qinghai is located in northwest China, in the northeastern part of the Qinghai-Tibet Plateau. The average altitude is 3000-5000 m. Qinghai comprises eight prefectures with a total of 46 counties (Fig. 1). The province is comparatively less developed and has a high annual incidence of TB. The total population is about 5.7 million people.

TB data
We collected data on TB cases in Qinghai for 2009-2016 from the China Information System for Disease Control and Prevention (CISDCP). We also extracted demographic data for the 46 counties from Qinghai's statistical yearbooks (2010-2017). TB is one of the notifiable infectious diseases in China, and each case must be reported online within 24 h of diagnosis in a hospital. Cases of TB were diagnosed using radiography, pathogen detection, and pathologic diagnosis, based on the diagnostic criteria recommended by the National Health and Family Planning Commission of the People's Republic of China (2008).
A total of 48,274 incident cases of TB and 84 TB-related deaths were reported across 771 hospitals and medical institutions in Qinghai from January 2009 to December 2016. In this study, 48,165 cases, aggregated monthly at the county level, were analyzed to detect the spatio-temporal high-risk areas of TB. The remaining 109 cases, which lacked detailed residential address information, were excluded from the analysis. To check for missed TB reports, we randomly selected 261 of the 771 hospitals and medical institutions and double-checked all of their medical records. No missing cases or outbreaks of TB were found.

Statistical methods
We used Kulldorff's space-time scan statistical analysis to detect the temporal, spatial, and space-time clusters of TB, and to test whether the geographic clustering of TB could be explained by random variation [20]. Since the population in several areas was very small, we defined the scanning window by population coverage rather than by geographical radius. The discrete Poisson probability model was used for scanning since the TB incidence was not very high [18]. The window with the maximum likelihood was defined as the most likely cluster area, and other clusters with statistically significant log-likelihood ratios (LLR) were defined as secondary potential clusters. The P-values of the LLR were estimated through 9999 Monte Carlo simulations [16,18]. A P-value < 0.05 indicates a significantly high risk inside the scan window, which might be a potential high-risk cluster of TB. The relative risk (RR) of TB in each cluster was calculated to evaluate the risk of TB in the cluster areas [12,21,22].
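The quantities named above can be illustrated with a minimal sketch. It assumes the standard form of the discrete Poisson scan statistic for a single candidate window when scanning for high rates; it is not the SaTScan implementation, and the function names and the numbers in the example are hypothetical. The expected count is assumed to be positive and smaller than the total number of cases, that is, the window does not cover the whole population.

```python
import math


def poisson_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio of one candidate window under the discrete
    Poisson model, scanning for high rates only.

    Assumes 0 < expected_in < total_cases (hypothetical helper).
    """
    c, e, n = cases_in, expected_in, total_cases
    if c <= e:  # no excess of cases inside the window
        return 0.0
    inside = c * math.log(c / e)
    # The outside term vanishes in the limit when every case is inside.
    outside = (n - c) * math.log((n - c) / (n - e)) if n > c else 0.0
    return inside + outside


def relative_risk(cases_in, expected_in, total_cases):
    """Observed/expected ratio inside the window divided by the same
    ratio outside the window."""
    c, e, n = cases_in, expected_in, total_cases
    if n == c:  # no cases outside the window
        return math.inf
    return (c / e) / ((n - c) / (n - e))


def monte_carlo_p(observed_max_llr, simulated_max_llrs):
    """Rank-based P-value: rank of the observed maximum LLR among the
    maxima obtained from data sets simulated under the null hypothesis."""
    as_extreme = sum(1 for s in simulated_max_llrs if s >= observed_max_llr)
    return (as_extreme + 1) / (len(simulated_max_llrs) + 1)


# Hypothetical window: 10% of the population at risk, 200 of 1000 cases.
expected = 1000 * 0.10
llr = poisson_llr(200, expected, 1000)
rr = relative_risk(200, expected, 1000)
```

Under this rank-based definition, 9999 simulations give a smallest attainable P-value of 1/10000 = 0.0001.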
The results of a spatio-temporal scan are sensitive to several parameters, such as the maximum spatial and temporal cluster sizes. Thus, the selection of the maximum radius of the spatial scanning window and the maximum length of the temporal scanning window was very important [23]. To select optimal parameters, we analyzed the data of 2009 using maximum spatial cluster sizes from 4% to 50% of the total population at risk, in increments of 1%. A radius was considered optimal for analysis if there were fewer overlaps between the areas defined by the radius, and the largest area covered fewer than seven counties or 15% of all the counties [24,25]. Similarly, we found an optimal temporal cluster size by testing maximum temporal cluster sizes from 10% to 50% of the total study period, in increments of 1%, on the data of the first 5 years (2009-2013). Based on the optimal spatio-temporal parameters, retrospective space-time scanning analysis was applied to identify the geographic areas and time periods of potential clusters with significantly higher TB incidence than nearby areas.
We also used Global Moran's I spatial autocorrelation analysis to depict the spatial clustering of annual TB incidence at the county level. Moran's I values > 0, = 0, and < 0 indicate positive spatial autocorrelation, a random distribution, and negative spatial autocorrelation, respectively [12].
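As an illustration of the statistic, the following sketch computes Global Moran's I from its textbook definition. The spatial weights matrix, the four-area layout, and the values are hypothetical; the study itself used GeoDa, whose weighting options are not reproduced here.

```python
def global_morans_i(values, weights):
    """Global Moran's I for area values and a square spatial weights
    matrix, where weights[i][j] > 0 marks areas i and j as neighbours."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    ss = sum(d * d for d in dev)  # sum of squared deviations
    return (n / s0) * (cross / ss)


# Hypothetical example: four areas in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = global_morans_i([1, 2, 3, 4], chain)        # similar neighbours
alternating = global_morans_i([1, -1, 1, -1], chain)  # dissimilar neighbours
```

In this toy layout the smoothly increasing values give a positive I and the alternating values give a negative I. For a finite number of areas n, the expected value under no spatial autocorrelation is -1/(n - 1), which approaches 0 as n grows.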
Additionally, we conducted time series seasonal decomposition analysis to identify the seasonality of TB incidence in Qinghai province [26][27][28]. The seasonal index was also calculated to examine the seasonal pattern of TB. The index was calculated as the ratio of the average number of cases for a given calendar month to the average monthly number of cases over the 8 × 12 months (2009-2016). An index value close to 1.0 indicates no seasonal trend [29].
The SaTScan™ software (v 9.4.1, Kulldorff and Information Management Services, Inc.) was used for spatial, temporal, and spatio-temporal analyses. We then used ArcGIS (v10.2.2, ESRI Inc., Redlands, CA, USA) to visualize the relative risk of TB in high-risk cluster areas. Open GeoDa software (Arizona State University, AZ, USA) was used for Global Moran's I spatial autocorrelation analysis. P < 0.05 was considered statistically significant.
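The seasonal index described above can be sketched as follows. The function name and the input layout (one list of 12 monthly counts per year) are assumptions made for illustration, and the counts are invented.

```python
def seasonal_index(monthly_counts_by_year):
    """Seasonal index for each calendar month: mean count for that month
    across years divided by the mean monthly count over all months.

    monthly_counts_by_year: list of years, each a list of 12 counts.
    """
    n_years = len(monthly_counts_by_year)
    overall_mean = sum(sum(year) for year in monthly_counts_by_year) / (n_years * 12)
    index = []
    for month in range(12):
        month_mean = sum(year[month] for year in monthly_counts_by_year) / n_years
        index.append(month_mean / overall_mean)
    return index


# Hypothetical data: two years, with a peak in the first month only.
flat = seasonal_index([[10] * 12, [10] * 12])
peaked = seasonal_index([[24] + [12] * 11, [24] + [12] * 11])
```

With no seasonality every index equals 1.0; in the peaked example the first month has an index above 1.0 and the other months fall slightly below it. The 12 indices always sum to 12.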

Determination of the optimal time window for temporal scanning
We conducted temporal scanning using time windows covering 10-50% of the total study period, in increments of 1%. The scanning results indicated that the high-risk cluster of TB was predominantly concentrated in the period between January 2012 and May 2013 (Additional file 1: Table S1). Therefore, the maximum temporal cluster size was set to 30% in this study. For each year, the maximum scan time length was 3 months.

Determination of the optimal space window for spatial scanning
To find an appropriate space window for the spatial scan, we repeated the spatial scanning with different maximum circular spatial windows, from a radius covering 4% of the population up to 50%, increasing by 1% each time. The results are shown in Additional file 1: Fig. S1. When the maximum spatial scan size was set to 8-50%, the high-risk clustering areas overlapped, and the most likely cluster covered more than 15% of all the counties. For sizes of 4-7%, the same area was detected as the most likely cluster area, but the secondary clusters differed slightly. The cluster areas identified using windows of 6-7% covered the largest high-incidence areas. According to the Venn diagram (Fig. 2), we finally set 7% as the maximum circular spatial window for spatial scanning, covering a population of 0.392 million. Two counties, Huangzhong and Datong, were not included in the scanning window. Because the incidence rates of TB in these two counties were relatively low, their exclusion is unlikely to have influenced the results.

Distribution of TB temporal clustering
The time series seasonal decomposition analysis of TB cases showed a significant seasonal periodicity, although the seasonal trend was not obvious between 2014 and 2015 (Fig. 3a and b). This was consistent with the result of the seasonal index (Fig. 3e). The maximum seasonal index value was 1.23 in March, and the value was less than 1.0 after July. TB cases showed a slowly increasing trend from 2009 to 2016 (Fig. 3c). The temporal cluster analysis also showed that, each year, TB cases were concentrated mainly in spring and early summer, from January to May. Across all districts, the period of highest aggregation for TB was January 2015 to August 2016 (LLR = 353.28, P < 0.05). During this period, a total of 12,746 TB cases were reported, and the risk of TB was 32% higher (RR = 1.32) than in other time periods (Table 1).

Distribution of TB spatial clustering
Spatial clustering analysis of the entire 8 years identified a total of nine statistically significant high-risk areas, covering a total of 21 counties. Similarly, the Global Moran's I values for each year at the county level indicated positive spatial autocorrelation in Qinghai, ranging from 0.398 to 0.631 (all P < 0.05). The high-risk areas with a relative risk greater than three, comprising the most likely cluster area and two secondary cluster areas, were concentrated mainly in the southwest of Qinghai. The center of the most likely cluster area was located in Dari County, at 33.48°N and 99.41°E (LLR = 3225.58, P < 0.001). This circular area, with a radius of 184.91 km, covered six counties: Dari, Gande, Maqin, Banma, Jiuzhi, and Maduo. The total number of TB cases was 5408, and the risk of TB was 3.97 times that outside this area (RR = 3.97), that is, 2.97 times higher (Table 2 and Additional file 1: Fig. S2). In every year, the incidence of TB inside the cluster areas was significantly higher than in the areas outside (Table 3).

Distribution of TB spatio-temporal clustering
The results of spatio-temporal cluster analysis suggested a special characteristic in temporal and spatial distribution for TB incidents in Qinghai. We detected a most likely cluster area and seven secondary cluster areas by using temporal and spatial scanning (Fig. 4). The most likely spatio-temporal cluster area was located at the southwest of the province, and the high-risk period was from September 2014 to December 2016 (LLR = 2547.64, P < 0.001). The center of this area was in Yushu County, 32.91°N and 96.68°E, which was a circular area with a radius of 261.34 km, covering seven counties: Yushu, Nangqian, Chengduo, Zaduo, Maduo, Qumalai, and Dari. A total of 3916 TB cases were reported in this area during the high-risk period, and the RR was 4.21 (Table 4).

Fig. 2 Venn diagram of spatial clustering for TB incidents in Qinghai, China, 2009. The scan window used in this analysis was set to be 7% of population

Fig. 3 (caption fragment) A seasonal trend was decomposed from the time-series of TB incidents; c: A long-term trend was decomposed from the time-series of TB incidents; d: The residual data after excluding of seasonality and a long-term trend; e: Estimated seasonal index of 12 months ranged from 0.80 to 1.23, and the maximum value was recorded in March

Discussion
In this study, the spatial patterns and secular trends of TB in Qinghai from 2009 to 2016 were examined using Kulldorff's scan statistical analysis. To the best of our knowledge, no similar study has been done in this area. Our study demonstrated significant space-time clustering in the distribution of TB cases in Qinghai province. The high-risk areas were concentrated mainly in southwest Qinghai, and the temporal clusters were concentrated mainly in spring and early summer.
Kulldorff's retrospective scan statistic accounts for multiple testing and is regarded as the most powerful method for evaluating geographical and temporal distributions using routinely collected data [18]. This method has been used worldwide to detect clusters of diseases [2,4,5,30,31]. In temporal and spatial models, the selection of a suitable time window and spatial window is very important for model identification. Currently, there are two methods for selecting the size of the spatial window: one is based on geographical area, and the other is based on the population size covered by the scanning area [16]. In this study, we used the radius of population coverage, because the population varied greatly between counties. The size of the time scanning window is another important parameter. Generally, the time window is set to 50% of the entire study period. However, whether this window size is reasonable has been questioned [32]. Ma et al. conducted a simulation study to explore how to choose an appropriate scanning window. They found that a window covering 50% of the population might be so large that it includes low-risk areas [23], which might lead to a high false positive rate. Conversely, a window covering a smaller population might be too small to detect the real high-risk area, so that the high-risk area is split into several clusters and the false negative rate becomes high. Tango and Takahashi suggested that, when using the irregular scan statistic to detect the aggregated region, the coverage area of a single region should not be more than 15% of the whole study area [24,25]. In addition, several studies suggested choosing a window that identifies cluster areas with less overlap [4].
Based on Tango's criteria, we analyzed the data repeatedly, using one window value at a time. Finally, we selected a temporal window covering 30% of the whole study period and a spatial window covering 7% of the population at risk. With these settings, the identified high-risk clusters did not overlap.
Our temporal scanning results indicated a high-risk period for TB in every year and across the entire 8 years, which occurred mainly in spring and early summer, from January to May. During winter, reduced exposure to ultraviolet rays from sunlight and poor ventilation in indoor settings may increase the incidence of TB infection. Additionally, in the case of infectious diseases, time is needed for symptoms to develop, and patients may not know where to seek care for TB. All these factors may delay TB diagnosis and treatment [4]. Some studies reported that the average incubation period of TB infection ranges from 4 to 8 weeks, with a 2-month interval from the appearance of symptoms to medical diagnosis [33]. Therefore, the high-risk periods in spring and early summer are consistent with the characteristics of the disease. These seasonal patterns were also consistent with previous studies done in Yunnan province, where the registration of TB cases peaks in spring [34]. Additionally, spatial scanning identified five to seven statistically significant cluster areas for TB in Qinghai each year, which was similar to the results of our previous study [7]. TB incidence in the identified clusters was higher than 263/100,000, compared with a median incidence rate of 69/100,000 outside the cluster areas. This indirectly suggested a relatively high sensitivity of this scanning method. The spatio-temporal model used in this study considered the time and space distributions simultaneously. Compared with separate spatial and temporal scanning models, space-time scanning yields conclusions that are closer to the real-world situation. Using this model to detect the spatio-temporal distribution of TB in Qinghai from 2009 to 2016, we found that the high-risk counties were concentrated in southwest Qinghai, from September 2014 to December 2016.
During this period, the risk of TB infection in these areas was clearly higher than in other areas, especially in Zhiduo County. In these areas, the inhabitants have very low incomes, as well as poorer living conditions and sanitation than in the eastern region of Qinghai. Many studies have shown that poverty is one of the most important social factors responsible for the high prevalence of TB, and that socio-economic status may also contribute to the high risk of TB [4]. Our results indicate that further prevention measures and targeted TB control strategies should take into account the economic and sanitary conditions in the clustered areas.
Our study also demonstrated the usefulness of spatial and temporal clustering analysis using ArcGIS and SaTScan to identify significant space-time clusters of TB in Qinghai. This could be used to inform strategies for TB prevention at the county level. However, the study had limitations. First, the data were analyzed at the county level, which is not the smallest administrative unit. Thus, several critical factors may have been missed. Second, the influence of weather and socio-economic factors was not included in this study.

Conclusions
Our study analyzed the spatial, temporal, and space-time clusters of TB cases at the county level in Qinghai from 2009 to 2016, using Kulldorff's retrospective scan statistic. The spatial and temporal clusters were statistically significant every year, and the space-time scanning results indicated eight high-risk areas for TB, located predominantly in southwest Qinghai. These results suggest that the Qinghai government and the Center for Disease Control and Prevention urgently need to establish prevention and control strategies to decrease TB incidence in Qinghai.

Additional file
Additional file 1: Table S1. Monthly temporal clustering of TB cases in Qinghai, China, 2009-2013. We set the maximum size for temporal scanning to 18 months, nearly 30% of the total study period, because this setting gave the scan result that best fit the raw time series of TB cases.