Using the Kulldorff’s scan statistical analysis to detect spatio-temporal clusters of tuberculosis in Qinghai Province, China, 2009–2016
BMC Infectious Diseases volume 17, Article number: 578 (2017)
Although the incidence of tuberculosis (TB) is now well controlled in most parts of China, TB remains a major public health problem in less developed areas such as Qinghai. This study aimed to reveal the spatio-temporal patterns of TB in Qinghai Province, which could help in planning and implementing key preventive measures.
We extracted data on reported TB cases in Qinghai Province from the China Information System for Disease Control and Prevention (CISDCP) for January 2009 to December 2016. Kulldorff’s retrospective space-time scan statistic, based on the discrete Poisson probability model, was used to identify temporal, spatial, and spatio-temporal clusters of TB at the county level in Qinghai.
A total of 48,274 TB cases were reported in Qinghai from 2009 to 2016. Kulldorff’s scan statistic showed that TB cases in Qinghai were significantly clustered in space, in time, and in space-time. The most likely spatio-temporal cluster (LLR = 2547.64, RR = 4.21, P < 0.001) was located in southwestern Qinghai, covered seven counties, and spanned September 2014 to December 2016.
This study identified eight significant space-time clusters of TB in Qinghai from 2009 to 2016. These findings could help prioritize the allocation of resources to high-risk areas for future TB control and elimination.
Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis. More than 80% of new TB cases worldwide were reported in developing countries. According to a World Health Organization report, China has the second-largest TB burden in the world. In recent years, although the Chinese Government has paid increasing attention to TB control, prevention measures remain insufficient, especially in areas with inadequate medical resources such as Qinghai, a province where most of the population still faces a high risk of TB.
Many studies on the spatial and temporal distribution of TB have shown that TB has highly complex dynamics and is spatially heterogeneous at the provincial, national, and international levels during certain periods; however, variation within small areas is often masked when a relatively large spatial scale is used [3,4,5,6]. In our previous study, we applied Moran’s I spatial autocorrelation analysis to TB incidence data from 2009 to 2013 in Qinghai and found that the distribution of TB in this province was not random. However, global Moran’s I analysis only evaluates the distribution of a disease at a few specific time points, and it cannot estimate the risk level of high-risk cluster areas [8,9,10,11]. Time is a critical confounder that can directly bias the identification of high-risk TB regions. Kulldorff’s space-time scan statistic detects clusters along both the temporal and spatial axes, giving results closer to real-world conditions [12,13,14,15]. In addition, it estimates the relative risk of disease in a cluster area by comparison with the area outside the cluster. This method has been widely used in epidemiological studies of infectious diseases [16,17,18,19].
Analyzing and evaluating the spatio-temporal patterns and trends of TB in Qinghai is necessary for TB control and elimination. In this study, we aimed to use Kulldorff’s scan statistic to explore the spatial, temporal, and space-time dynamics of TB at the county level in Qinghai.
Qinghai is located in northwestern China, in the northeastern part of the Qinghai-Tibet Plateau. Its average altitude is 3000–5000 m. Qinghai comprises eight prefectures with a total of 46 counties (Fig. 1). The province is comparatively less developed and has a high annual incidence of TB. Its total population is about 5.7 million.
We collected data on TB cases in Qinghai for 2009–2016 from the China Information System for Disease Control and Prevention (CISDCP), and extracted demographic data for the 46 counties from Qinghai’s statistical yearbooks (2010–2017). TB is a notifiable infectious disease in China, and every TB case must be reported online within 24 h of diagnosis in a hospital. TB cases were diagnosed by radiography, pathogen detection, and pathological diagnosis, according to the diagnostic criteria recommended by the National Health and Family Planning Commission of the People’s Republic of China (2008).
A total of 48,274 incident TB cases and 84 TB-related deaths were reported by 771 hospitals and medical institutions in Qinghai from January 2009 to December 2016. After excluding 109 cases without detailed residential address information, 48,165 cases, aggregated monthly at the county level, were analyzed to detect spatio-temporal high-risk areas of TB. To check for under-reporting, we randomly selected 261 of the 771 hospitals and medical institutions and re-checked all medical records at these institutions; no unreported TB cases or outbreaks were found.
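The scan statistic takes a table of case counts per county and month as input. The sketch below shows one way to build such a table from line-listed records. It assumes hypothetical field names `county` and `date` (ISO `YYYY-MM-DD`); the actual CISDCP export format is not described here. Records without a county are counted as excluded, analogous to the 109 cases dropped for lacking a residential address.

```python
from collections import Counter


def aggregate_county_month(records):
    """Count cases per (county, 'YYYY-MM') and tally records that cannot be placed.

    records: iterable of dicts with hypothetical keys 'county' and 'date'
    ('YYYY-MM-DD'). A record with a missing or empty county is excluded,
    analogous to cases without a residential address.
    """
    counts = Counter()
    excluded = 0
    for rec in records:
        county = rec.get("county")
        if not county:
            excluded += 1
            continue
        month = rec["date"][:7]  # keep only 'YYYY-MM'
        counts[(county, month)] += 1
    return counts, excluded


# Toy records (invented, not study data).
toy_records = [
    {"county": "Dari", "date": "2015-03-02"},
    {"county": "Dari", "date": "2015-03-20"},
    {"county": "Yushu", "date": "2015-04-11"},
    {"county": None, "date": "2015-04-12"},
]
county_month, n_excluded = aggregate_county_month(toy_records)
```

Applied to the study totals, 48,274 reported cases minus 109 exclusions gives the 48,165 cases analyzed.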
We used Kulldorff’s space-time scan statistic to detect temporal, spatial, and space-time clusters of TB and to test whether geographic clustering of TB could be explained by random variation. Because the population of several areas was very small, we limited the scanning window by the proportion of the population it covers rather than by a geographical radius. The discrete Poisson probability model was used because TB incidence was relatively low. The window with the maximum likelihood was defined as the most likely cluster, and other windows with statistically significant log-likelihood ratios (LLR) were defined as secondary clusters. P-values for the LLR were estimated from 9999 Monte Carlo simulations [16, 18]; P < 0.05 indicates a significantly elevated risk inside the scanning window, i.e., a potential high-risk TB cluster. The relative risk (RR) of TB in each cluster was calculated to evaluate the risk of TB in the cluster areas [12, 21, 22].
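The core calculations of the discrete Poisson scan can be sketched as follows. This is an illustration of the statistic under simplifying assumptions, not SaTScan's implementation: candidate windows are plain lists of region indices, expected counts are proportional to population, and only purely spatial windows are scored (a space-time window would be a county set combined with a month interval, scored with the same likelihood). For a window with c observed and e expected cases out of C in total, LLR = c·ln(c/e) + (C − c)·ln((C − c)/(C − e)) when c > e, and 0 otherwise; RR = (c/e)/((C − c)/(C − e)). The p-value is the rank of the observed maximum LLR among maxima from data simulated under the null hypothesis.

```python
import numpy as np


def poisson_llr(c, e, total):
    """Log-likelihood ratio of one window under the discrete Poisson model.

    c: observed cases inside the window; e: expected cases inside it;
    total: all observed cases C. Only excess-risk windows (c > e) are scored.
    LLR = c*ln(c/e) + (C-c)*ln((C-c)/(C-e)).
    """
    if c <= e:
        return 0.0
    llr = c * np.log(c / e)
    if total > c:  # the outside term vanishes when every case lies inside
        llr += (total - c) * np.log((total - c) / (total - e))
    return float(llr)


def relative_risk(c, e, total):
    """Observed/expected inside the window divided by observed/expected outside."""
    return (c / e) / ((total - c) / (total - e))


def scan(cases, pop, windows):
    """Maximum LLR over candidate windows (lists of region indices) and its index."""
    total = cases.sum()
    expected = total * pop / pop.sum()  # H0: risk proportional to population
    scores = [poisson_llr(cases[w].sum(), expected[w].sum(), total) for w in windows]
    best = int(np.argmax(scores))
    return scores[best], best


def monte_carlo_p(cases, pop, windows, n_sim=9999, seed=0):
    """Most likely cluster, its LLR, and a rank-based Monte Carlo p-value."""
    rng = np.random.default_rng(seed)
    observed_max, best = scan(cases, pop, windows)
    probs = pop / pop.sum()
    exceed = 0
    for _ in range(n_sim):
        # Redistribute the same total number of cases under H0 and rescan.
        simulated = rng.multinomial(int(cases.sum()), probs)
        if scan(simulated, pop, windows)[0] >= observed_max:
            exceed += 1
    return best, observed_max, (exceed + 1) / (n_sim + 1)


# Toy example (invented counts): six regions of equal population.
cases = np.array([50, 10, 12, 8, 9, 11])
pop = np.full(6, 1000.0)
windows = [[0], [1], [2], [3], [4], [5], [0, 1], [1, 2], [2, 3]]
best, llr, p = monte_carlo_p(cases, pop, windows, n_sim=99, seed=1)
```

The default of 9999 simulations matches the number used in this study; the toy call uses 99 only to run quickly. In the toy data, region 0 has 50 observed against about 16.7 expected cases, so it is the most likely cluster with RR = (50/16.7)/(50/83.3) = 5.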
The results of spatio-temporal scanning are sensitive to parameters such as the maximum spatial and temporal cluster sizes, so the choice of the maximum radius of the spatial scanning window and the maximum length of the temporal scanning window is important. To select optimal parameters, we analyzed the 2009 data with maximum spatial cluster sizes from 4% to 50% of the total population at risk, in 1% increments. A radius was considered optimal if the clusters it produced showed little overlap and the largest cluster covered fewer than seven counties, i.e., less than 15% of all counties [24, 25]. Similarly, we determined the optimal temporal cluster size by testing maximum temporal cluster sizes from 10% to 50% of the study period, in 1% increments, on the data of the first 5 years (2009–2013). Using the optimal spatial and temporal parameters, we applied retrospective space-time scanning to identify geographic areas and time periods with significantly higher TB incidence than areas and periods outside the cluster.
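The selection rule for the maximum spatial window can be written as a small check applied to the output of each trial run. This is a simplified sketch: "fewer overlaps" is treated as "no overlap", the county threshold is 15% of 46 counties (6.9, i.e. at most six counties), and the per-size scan results are assumed to come from a hypothetical driver that runs the scan once for each size from 4% to 50%. It does not capture the additional judgment the authors applied about which sizes covered the largest high-incidence areas.

```python
from itertools import combinations


def passes_rule(clusters, n_counties=46, max_share=0.15):
    """Simplified check of the selection rule for one scan result.

    clusters: list of sets of county names, one per significant cluster.
    Passes when no two clusters share a county and the largest cluster
    covers less than max_share of all counties (0.15 * 46 = 6.9, i.e. at most 6).
    """
    overlap_free = all(a.isdisjoint(b) for a, b in combinations(clusters, 2))
    largest = max((len(c) for c in clusters), default=0)
    return overlap_free and largest < max_share * n_counties


def choose_max_spatial_size(results, n_counties=46):
    """Largest tested maximum size (% of population) whose result passes; None if none do.

    results: dict mapping each tested size in percent (e.g. 4..50) to the list of
    cluster sets produced by a scan at that size (hypothetical driver output).
    """
    passing = [size for size, clusters in results.items() if passes_rule(clusters, n_counties)]
    return max(passing) if passing else None


# Toy results with placeholder county labels (not study output).
toy_results = {
    4: [{"A", "B"}, {"H"}],
    7: [{"A", "B", "C"}, {"H", "I"}],
    8: [{"A", "B", "C", "D", "E", "F", "G"}, {"G", "H"}],
}
chosen = choose_max_spatial_size(toy_results)
```

In the toy results the 8% run fails on both counts (a seven-county cluster and a shared county), so 7% is chosen, mirroring the pattern reported for the real data.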
We also used global Moran’s I spatial autocorrelation analysis to describe spatial clustering of annual TB incidence at the county level. Moran’s I > 0, = 0, and < 0 indicate positive spatial autocorrelation, a random distribution, and negative spatial autocorrelation, respectively.
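Global Moran's I is I = (n / S0) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ is the deviation of county i's incidence from the mean and S0 is the sum of all weights. The sketch below computes it with NumPy and adds a permutation-based pseudo p-value. The weights matrix is an assumption (binary contiguity here; the weighting scheme used in the study is not stated), and this is not GeoDa's code, although GeoDa also reports pseudo p-values from random permutations.

```python
import numpy as np


def morans_i(x, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z = x - mean(x).

    x: values per region (e.g. annual incidence per county).
    w: n x n spatial weights with zero diagonal (binary contiguity assumed here).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    s0 = w.sum()
    return (x.size / s0) * (z @ w @ z) / (z @ z)


def morans_i_permutation_p(x, w, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation by random relabelling."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = morans_i(x, w)
    hits = sum(morans_i(rng.permutation(x), w) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: four regions in a line, each adjacent to its neighbours.
w_line = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
i_trend = morans_i([1, 2, 3, 4], w_line)         # smooth gradient: positive I
i_alternating = morans_i([1, 4, 1, 4], w_line)   # checkerboard: negative I
```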
In addition, we conducted seasonal decomposition of the time series to identify seasonality in TB incidence in Qinghai [26,27,28]. A seasonal index was also calculated to examine the seasonal pattern of TB: for each calendar month, the index is the average number of cases in that month divided by the average monthly number of cases over all 96 months (8 years × 12 months, 2009–2016). An index close to 1.0 indicates no seasonal effect.
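The seasonal index defined above takes only a few lines to compute. The sketch assumes the monthly counts are supplied in chronological order starting in January of the first year; the toy series is invented.

```python
def seasonal_index(monthly_counts, years=8):
    """Seasonal index per calendar month.

    monthly_counts: 12 * years counts in chronological order, starting in January.
    Index for month m = (mean count in month m across years) / (mean count over all months).
    """
    n = 12 * years
    if len(monthly_counts) != n:
        raise ValueError(f"expected {n} monthly counts, got {len(monthly_counts)}")
    overall_mean = sum(monthly_counts) / n
    return [(sum(monthly_counts[m::12]) / years) / overall_mean for m in range(12)]


# Toy series: two years with a March peak (invented numbers).
toy = ([100, 100, 200] + [100] * 9) * 2
idx = seasonal_index(toy, years=2)
```

Because every month's average enters the overall mean, the twelve indices always average to exactly 1.0; a value such as the March peak of 1.23 therefore means 23% more cases than an average month.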
SaTScan™ software (v9.4.1, Kulldorff and Information Management Services, Inc.) was used for spatial, temporal, and spatio-temporal analyses. ArcGIS (v10.2.2, ESRI Inc., Redlands, CA, USA) was used to map the relative risk of TB in high-risk cluster areas, and Open GeoDa software (Arizona State University, AZ, USA) was used for global Moran’s I analysis. P < 0.05 was considered statistically significant.
Determination of the optimal time window for temporal scanning
We ran temporal scans with maximum window lengths covering 10–50% of the study period, in 1% increments. The results showed that the high-risk TB cluster was concentrated between January 2012 and May 2013 (Additional file 1: Table S1), a 17-month period that a maximum window of 30% (18 of the 60 months of 2009–2013) can accommodate. The maximum temporal cluster size was therefore set to 30% in this study; for single-year analyses, this corresponds to a maximum scanning length of 3 months.
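The window lengths follow from simple arithmetic. The sketch below assumes the percentage is converted to whole months by truncation; this is an assumption, but it reproduces both the 18-month window listed for Table S1 and the 3-month single-year window given above. Integer arithmetic avoids floating-point rounding surprises.

```python
def max_window_months(percent, n_months):
    """Whole months allowed by a maximum temporal window of `percent`% (truncated)."""
    return percent * n_months // 100


def months_spanned(start, end):
    """Inclusive number of months from (year, month) start to (year, month) end."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1]) + 1


five_years = max_window_months(30, 60)   # 2009-2013
one_year = max_window_months(30, 12)     # a single calendar year
eight_years = max_window_months(30, 96)  # 2009-2016
cluster_2012 = months_spanned((2012, 1), (2013, 5))
```

Under the same assumption, 30% of the full 96-month period allows at most 28 months, which equals the length of the most likely space-time cluster (September 2014 to December 2016).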
Determination of the optimal space window for spatial scanning
To determine an appropriate spatial window, we ran the spatial scan repeatedly with different maximum circular window sizes, from windows covering 4% up to 50% of the population, in 1% increments. The results are shown in Additional file 1: Fig. S1. With maximum spatial window sizes of 8–50%, the high-risk cluster areas overlapped and the most likely cluster covered more than 15% of all counties. With sizes of 4–7%, the same area was detected as the most likely cluster while the secondary clusters differed slightly, and the windows of 6–7% captured the largest high-incidence areas. Based on the Venn diagram (Fig. 2), we set 7% as the maximum circular spatial window, corresponding to a population of 0.392 million. Two counties, Huangzhong and Datong, were not included in any scanning window; because TB incidence in these two counties was relatively low, their exclusion is unlikely to have influenced the results.
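Population-capped circular windows can be sketched as follows. Assumptions: county centroids are given as latitude/longitude, distances are great-circle distances on a sphere of radius 6371 km (SaTScan's internal constant may differ), and each window grows around a centroid in order of distance until the next county would push the covered population above the cap. County names, coordinates, and populations in the toy example are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not SaTScan's constant


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def circular_windows(centroids, pop, max_share):
    """Nested circular windows around each centroid, capped by population share.

    centroids: dict name -> (lat, lon); pop: dict name -> population.
    Returns a list of (centre, member tuple, radius_km), where the radius is the
    distance from the centre to the farthest member.
    """
    cap = max_share * sum(pop.values())
    windows = []
    for centre, (lat, lon) in centroids.items():
        ranked = sorted(centroids, key=lambda k: great_circle_km(lat, lon, *centroids[k]))
        members, covered = [], 0
        for name in ranked:
            covered += pop[name]
            if covered > cap:
                break  # a larger circle would exceed the population cap
            members.append(name)
            radius = great_circle_km(lat, lon, *centroids[name])
            windows.append((centre, tuple(members), radius))
    return windows


# Toy counties (invented): D alone holds 70% of the population.
toy_centroids = {"A": (33.0, 99.0), "B": (33.0, 99.5), "C": (33.0, 100.5), "D": (36.0, 101.0)}
toy_pop = {"A": 100, "B": 100, "C": 100, "D": 700}
toy_windows = circular_windows(toy_centroids, toy_pop, max_share=0.25)
```

In this sketch, a county whose own population exceeds the cap never enters any window, because every circle containing it would break the cap. The reported radius is the distance to the farthest member county, analogous to the cluster radii given in the Results.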
Distribution of TB temporal clustering
Seasonal decomposition of the TB time series showed clear seasonal periodicity, although the seasonal pattern was less evident in 2014–2015 (Fig. 3a and b), consistent with the seasonal index (Fig. 3e). The seasonal index peaked at 1.23 in March and fell below 1.0 after July. TB incidence increased slowly from 2009 to 2016 (Fig. 3c). Temporal cluster analysis also showed that TB cases were concentrated in spring and early summer (January to May) each year. Across the whole province, the most likely temporal cluster spanned January 2015 to August 2016 (LLR = 353.28, P < 0.05). During this period, 12,746 TB cases were reported, and the risk of TB was 32% higher than in other periods (RR = 1.32) (Table 1).
Distribution of TB spatial clustering
Spatial cluster analysis over the entire 8 years identified nine statistically significant high-risk areas covering 21 counties. Consistently, the annual global Moran’s I values at the county level indicated positive spatial autocorrelation in Qinghai, ranging from 0.398 to 0.631 (all P < 0.05). The high-risk areas with a relative risk greater than three, comprising the most likely cluster and two secondary clusters, were concentrated in southwestern Qinghai. The most likely cluster was centered on Dari County (33.48° N, 99.41° E; LLR = 3225.58, P < 0.001). This circular area, with a radius of 184.91 km, covered six counties: Dari, Gande, Maqin, Banma, Jiuzhi, and Maduo. It contained 5408 TB cases, and the risk of TB inside it was 3.97 times that outside (RR = 3.97) (Table 2 and Additional file 1: Fig. S2). TB incidence inside the cluster areas was significantly higher than outside in every year (Table 3).
Distribution of TB spatio-temporal clustering
Spatio-temporal cluster analysis revealed distinct space-time clustering of TB in Qinghai. We detected one most likely cluster and seven secondary clusters (Fig. 4). The most likely spatio-temporal cluster was located in the southwest of the province, with a high-risk period from September 2014 to December 2016 (LLR = 2547.64, P < 0.001). It was a circular area centered on Yushu County (32.91° N, 96.68° E) with a radius of 261.34 km, covering seven counties: Yushu, Nangqian, Chengduo, Zaduo, Maduo, Qumalai, and Dari. A total of 3916 TB cases were reported in this area during the high-risk period, and the RR was 4.21 (Table 4).
In this study, we examined the spatial patterns and secular trends of TB in Qinghai from 2009 to 2016 using Kulldorff’s scan statistic. To the best of our knowledge, no similar study has been conducted in this area. We found significant space-time clustering of TB cases in Qinghai Province: high-risk areas were concentrated in southwestern Qinghai, and temporal clusters occurred mainly in spring and early summer.
Kulldorff’s retrospective scan statistic accounts for multiple testing and is considered one of the most powerful methods for evaluating geographical and temporal distributions with routinely collected data. It has been used worldwide to detect disease clusters [2, 4, 5, 30, 31]. In space-time models, the choice of suitable temporal and spatial windows is critical. There are currently two ways to set the size of the spatial window: by geographical area, or by the population covered by the scanning area. We used population coverage because population size varied greatly among counties. The maximum temporal window is another important parameter; it is commonly set to 50% of the study period, but whether this size is appropriate has been questioned. Yue Ma et al. conducted a simulation study on choosing an appropriate scanning window and found that a window covering 50% of the population may be so large that it includes low-risk areas, which can lead to a high false-positive rate. Conversely, a window covering too small a population may fail to capture a true high-risk area in full, splitting it into separate clusters and raising the false-negative rate. Tango and Takahashi suggested that when a flexibly shaped (irregular) scan statistic is used to detect clusters, a single cluster should cover no more than 15% of the whole study area [24, 25]. Several studies have also suggested choosing a window that identifies clusters with little overlap. Based on Tango’s criterion, we reanalyzed the data repeatedly, changing one window value at a time.
We finally selected a temporal window covering 30% of the whole study period and a spatial window covering 7% of the population at risk; with these settings, the identified high-risk clusters did not overlap.
Our temporal scanning results showed a high-risk period for TB in every year and over the entire 8 years, mainly in spring and early summer (January to May). During winter, reduced exposure to ultraviolet radiation from sunlight and poor ventilation indoors may increase TB transmission. In addition, symptoms take time to develop after infection, and patients may not know where to seek care for TB; all of these factors may delay TB diagnosis and treatment. Some studies have reported that the average incubation period of TB ranges from 4 to 8 weeks, with a further 2-month interval from symptom onset to diagnosis. The high-risk period in spring and early summer is therefore consistent with the natural history of the disease. This seasonal pattern is also consistent with a previous study in Yunnan Province, where TB case registration peaked in spring.
In addition, spatial scanning identified 5–7 statistically significant TB clusters in Qinghai each year, similar to the results of our previous study. TB incidence in the identified clusters exceeded 263/100,000, compared with a median incidence of 69/100,000 outside the cluster areas, which indirectly suggests that the scanning method is relatively sensitive. The spatio-temporal model used in this study considers time and space simultaneously; compared with separate spatial and temporal scanning models, space-time scanning yields conclusions closer to the real-world situation. Using this model, we found that from September 2014 to December 2016 the high-risk counties were concentrated in southwestern Qinghai. During this period, the risk of TB in these areas was markedly higher than elsewhere, especially in Zhiduo County. Residents of these areas have very low incomes and poorer living conditions and sanitation than those in eastern Qinghai. Many studies have shown that poverty is one of the most important social factors behind high TB prevalence and that socio-economic status may contribute to a high risk of TB. Our results indicate that further prevention measures and targeted TB control strategies, tailored to the economic and sanitary conditions of the clustered areas, should be considered.
Our study also demonstrated the usefulness of spatial and temporal cluster analysis with ArcGIS and SaTScan for identifying significant space-time clusters of TB in Qinghai, which can inform county-level TB prevention strategies. However, this study has limitations. First, the data were analyzed at the county level, which is not the smallest administrative unit, so heterogeneity within counties and its determinants may have been missed. Second, the influence of weather and socio-economic factors was not examined.
Using Kulldorff’s retrospective scan statistic, we analyzed the spatial, temporal, and space-time clustering of TB at the county level in Qinghai from 2009 to 2016. Spatial and temporal clusters were statistically significant in every year, and space-time scanning identified eight high-risk areas, located predominantly in southwestern Qinghai. These results suggest an urgent need for the Qinghai government and the Center for Disease Control and Prevention to establish prevention and control strategies to reduce TB incidence in Qinghai.
Abbreviations
- CISDCP: China Information System for Disease Control and Prevention
- GIS: Geographical information system
- LLR: Log-likelihood ratio
- RR: Relative risk
Zumla A, George A, Sharma V, Herbert RH, Baroness Masham of I, Oxley A, et al. The WHO 2014 global tuberculosis report--further to go. Lancet Glob Health. 2015;3(1):e10–2.
Zhao F, Cheng S, He G, Huang F, Zhang H, Xu B, et al. Space-time clustering characteristics of tuberculosis in China, 2005-2011. PLoS One. 2013;8(12):e83605.
Cao K, Yang K, Wang C, Guo J, Tao L, Liu Q, et al. Spatial-Temporal Epidemiology of Tuberculosis in Mainland China: An Analysis Based on Bayesian Theory. Int J Environ Res Public Health. 2016;13(5):e469.
Ge E, Zhang X, Wang X, Wei X. Spatial and temporal analysis of tuberculosis in Zhejiang Province, China, 2009-2012. Infect Dis Poverty. 2016;5:11.
Areias C, Briz T, Nunes C. Pulmonary tuberculosis space-time clustering and spatial variation in temporal trends in Portugal, 2000-2010: an updated analysis. Epidemiol Infect. 2015;143(15):3211–9.
Middelkoop K, Bekker LG, Morrow C, Zwane E, Wood R. Childhood tuberculosis infection and disease: a spatial and temporal transmission analysis in a South African township. S Afr Med J. 2009;99(10):738–43.
Rao HX, Zhang X, Zhao L, Yu J, Ren W, Zhang XL, et al. Spatial transmission and meteorological determinants of tuberculosis incidence in Qinghai Province, China: a spatial clustering panel analysis. Infect Dis Poverty. 2016;5(1):45.
Sadeq M. Spatial patterns and secular trends in human leishmaniasis incidence in Morocco between 2003 and 2013. Infect Dis Poverty. 2016;5(1):48.
Varga C, Pearl DL, McEwen SA, Sargeant JM, Pollari F, Guerin MT. Area-level global and local clustering of human Salmonella Enteritidis infection rates in the city of Toronto, Canada, 2007-2009. BMC Infect Dis. 2015;15:359.
Lopez D, Gunasekaran M, Murugan BS, Kaur H, Abbas KM. Spatial Big Data Analytics of Influenza Epidemic in Vellore, India. Proc IEEE Int Conf Big Data. 2014;2014:19–24.
Zulu LC, Kalipeni E, Johannes E. Analyzing spatial clustering and the spatiotemporal nature and trends of HIV/AIDS prevalence using GIS: the case of Malawi, 1994-2010. BMC Infect Dis. 2014;14:285.
Xia J, Cai S, Zhang H, Lin W, Fan Y, Qiu J, et al. Spatial, temporal, and spatiotemporal analysis of malaria in Hubei Province, China from 2004-2011. Malar J. 2015;14:145.
Vieira CP, Oliveira AM, Rodas LA, Dibo MR, Guirado MM, Chiaravalloti NF. Temporal, spatial and spatiotemporal analysis of the occurrence of visceral leishmaniasis in humans in the City of Birigui, State of Sao Paulo, from 1999 to 2012. Rev Soc Bras Med Trop. 2014;47(3):350–8.
Qian H, Huo D, Wang X, Jia L, Li X, Li J, et al. Detecting spatial-temporal cluster of hand foot and mouth disease in Beijing, China, 2009-2014. BMC Infect Dis. 2016;16:206.
Kulldorff M, Nagarwalla N. Spatial disease clusters: detection and inference. Stat Med. 1995;14(8):799–810.
Abbas T, Younus M, Muhammad SA. Spatial cluster analysis of human cases of Crimean Congo hemorrhagic fever reported in Pakistan. Infect Dis Poverty. 2015;4:9.
Zhang Y, Shen Z, Ma C, Jiang C, Feng C, Shankar N, et al. Cluster of human infections with avian influenza A (H7N9) cases: a temporal and spatial analysis. Int J Environ Res Public Health. 2015;12(1):816–28.
Alemu K, Worku A, Berhane Y. Malaria infection has spatial, temporal, and spatiotemporal heterogeneity in unstable malaria transmission areas in northwest Ethiopia. PLoS One. 2013;8(11):e79966.
Coleman M, Coleman M, Mabuza AM, Kok G, Coetzee M, Durrheim DN. Using the SaTScan method to detect local malaria clusters for guiding malaria control programmes. Malar J. 2009;8:68.
Jones SG, Kulldorff M. Influence of spatial resolution on space-time disease cluster detection. PLoS One. 2012;7(10):e48036.
Hjalmars U, Kulldorff M, Gustafsson G, Nagarwalla N. Childhood leukaemia in Sweden: using GIS and a spatial scan statistic for cluster detection. Stat Med. 1996;15(7–9):707–15.
Huang L, Kulldorff M, Gregorio D. A spatial scan statistic for survival data. Biometrics. 2007;63(1):109–18.
Ma Y, Yin F, Zhang T, Zhou XA, Li X. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic. PLoS One. 2016;11(1):e0147918.
Tango T, Takahashi K. A flexibly shaped spatial scan statistic for detecting clusters. Int J Health Geogr. 2005;4:11.
Tango T, Takahashi K. A flexible spatial scan statistic with a restricted likelihood ratio for detecting disease clusters. Stat Med. 2012;31(30):4207–18.
Xu Z, Hu W, Zhang Y, Wang X, Zhou M, Su H, et al. Exploration of diarrhoea seasonality and its drivers in China. Sci Rep. 2015;5:8241.
Willis MD, Winston CA, Heilig CM, Cain KP, Walter ND, Mac Kenzie WR. Seasonality of tuberculosis in the United States, 1993-2008. Clin Infect Dis. 2012;54(11):1553–60.
Xu Z, Hu W, Su H, Turner LR, Ye X, Wang J, et al. Extreme temperatures and paediatric emergency department admissions. J Epidemiol Community Health. 2014;68(4):304–11.
Zhang Q, Lai S, Zheng C, Zhang H, Zhou S, Hu W, et al. The epidemiology of Plasmodium vivax and Plasmodium falciparum malaria in China, 2004-2012: from intensified control to elimination. Malar J. 2014;13:419.
Wang WL, Wang HJ, Deng Y, Song T, Lan JM, Wu GZ, et al. Serological Study of An Imported Case of Middle East Respiratory Syndrome and His Close Contacts in China, 2015. Biomed Environ Sci. 2016;29(3):219–23.
Zhang WY, Wang LY, Liu YX, Yin WW, Hu WB, Magalhaes RJ, et al. Spatiotemporal transmission dynamics of hemorrhagic fever with renal syndrome in China, 2005-2012. PLoS Negl Trop Dis. 2014;8(11):e3344.
Wang LY, Zhang WY, Ding F, Hu WB, Soares Magalhaes RJ, Sun HL, et al. Spatiotemporal patterns of Japanese encephalitis in China, 2002-2010. PLoS Negl Trop Dis. 2013;7(6):e2285.
Li XX, Wang LX, Zhang H, Du X, Jiang SW, Shen T, et al. Seasonal variations in notification of active tuberculosis cases in China, 2005-2012. PLoS One. 2013;8(7):e68102.
Huang L, Li XX, Abe EM, Xu L, Ruan Y, Cao CL, et al. Spatial-temporal analysis of pulmonary tuberculosis in the northeast of the Yunnan province, People's Republic of China. Infect Dis Poverty. 2017;6(1):53.
The authors have no external funding to report. This study was supported by the Qinghai Center for Disease Control and Prevention (CDC). The authors gratefully acknowledge the staff involved in TB surveillance at all levels in Qinghai Province, China.
The authors have no support or funding to report.
Availability of data and materials
The dataset analyzed during the current study, while not publicly available, is available from the corresponding author on reasonable request.
Ethics approval and consent to participate
The Ethics Review Board of the Qinghai Center for Disease Control and Prevention was consulted about this retrospective study. Ethics approval was not required because the study did not include any personal or health information on patients, such as names, identity information, addresses, or telephone numbers.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Huaxiang Rao and Xinyu Shi are co-first authors
Table S1. Monthly temporal clustering of TB cases in Qinghai, China, 2009–2013. The maximum temporal scanning window was set to 18 months (30% of the 60-month study period), which gave the scan result that best fit the raw time series of TB cases. Fig. S1. Spatial clustering of TB cases at the county level in Qinghai, China, 2009. Fig. S2. Spatial clustering of TB cases at the county level in Qinghai, China, 2009–2016. (DOCX 682 kb)