Investigating the effects of climate variations on bacillary dysentery incidence in northeast China using ridge regression and hierarchical cluster analysis

Background The effects of climate variations on bacillary dysentery incidence have gained more recent concern. However, the multi-collinearity among meteorological factors affects the accuracy of correlation with bacillary dysentery incidence. Methods As a remedy, a modified method to combine ridge regression and hierarchical cluster analysis was proposed for investigating the effects of climate variations on bacillary dysentery incidence in northeast China. Results All weather indicators, temperatures, precipitation, evaporation and relative humidity have shown positive correlation with the monthly incidence of bacillary dysentery, while air pressure had a negative correlation with the incidence. Ridge regression and hierarchical cluster analysis showed that during 1987–1996, relative humidity, temperatures and air pressure affected the transmission of the bacillary dysentery. During this period, all meteorological factors were divided into three categories. Relative humidity and precipitation belonged to one class, temperature indexes and evaporation belonged to another class, and air pressure was the third class. Conclusion Meteorological factors have affected the transmission of bacillary dysentery in northeast China. Bacillary dysentery prevention and control would benefit from by giving more consideration to local climate variations.


Background
Shigella infections remain an important public health problem in China, especially among children and old people [1,2]. Bacillary dysentery is an infectious disease of the intestinal tract caused by bacteria of the genus Shigella and is spread by contact with patients or carriers and though food or water contaminated by their feces. The increases in temperature have been observed in most countries and China is no exception [3]. The effects of climate variations on bacillary dysentery incidence have gained more concerns recently. Several studies have explored the association between diarrhoeal diseases and climate variation [4,5]. For example, temperature, rainfall and relative humidity directly affect the rate of replication of bacterial and protozoan pathogens, they may also have an impact on the environmental reservoirs. Many existing studies neglect the multi-collinearity among meteorological factors which affected the accuracy of correlation with bacillary dysentery incidence.
As a remedy, the improved method, ridge regression was suggested and applied. It uses a revised Least Square method to handle correlated predictors by allowing a small amount of bias in the estimates of the coefficients. And due to the fact that many climate factors may be unnecessary for predication, hierarchical cluster analysis was adopted to decompose correlations into different pieces for grouping variables. Based on the combination of these above two methods and existing surveillance data , the present study aims to investigate the effects of climate variations on bacillary dysentery incidence in northeast China.

Study area and data collection
Shenyang, capital city of Liaoning Province and also the largest city in northeast China, 41°N and 123°E ( Figure  1), was selected as the study area. It has a temperate climate and a population of about 7.6 million. Shenyang has a semiwetness continent climate of the north temperature zone as well as concentrated precipitation and distinct seasons because of the monsoon. Annual average temperature is about 8.1°C, the highest monthly average temperature is 24.0°C in July, and the lowest monthly average temperature is -8.5°C in January. Its annual rainfall is 501.5 mm and the non-frost period is 183 days. Four seasons are spring, March -May; summer, June -August; autumn, September -November; winter, December -February. Demographic information for Shenyang was collected from local government report.
Bacillary dysentery is a legally mandated notifiable disease in China, the Law on the Prevention and Control of Infectious Diseases [6] requires health-care staff to report any of the 37 infectious diseases, including bacillary dysentery, to the Center for Disease Control and Prevention (CDC) through the National Noticeable Infectious Disease Reporting system (NIDR). The data of monthly incidence of bacillary dysentery in Shenyang from 1950 to 1996 was obtained from Liaoning Center for Disease Control and Prevention. Because the study period covers a long time span, four time subgroups were obtained based on historical and economic development of China.
In order to control seasonal effects, two models were set up in each time subgroup, one is based on data from January to July, and the other is base on data from August to December. The meteorological information was collected from Shenyang Meteorological Bureau. Meteorological data consisted of the corresponding monthly air pressure, average temperature, maximum temperature, minimum temperature, precipitation, evaporation and relative humidity.

Data analysis Correlation analysis
The relationship between monthly mean meteorological factors and the monthly incidence of bacillary dysentery was examined. Spearman's correlation was performed to quantify the relationship between monthly weather variables and the monthly incidence of bacillary dysentery a lag of one to four months.

Multi-colliearity Diagnosis
The judgment of multi-colliearity was made by checking related statistics, such as tolerance value or variance inflation factor (VIF), and condition index. The ith tolerance value is defined as 1-, where is the coefficient of determination for regression of the ith independent variable on all the other independent variable. VIF is just the reciprocal of a tolerance value. With the recommendations of several statisticians [7], large VIF value greater than 10 and/or average VIF greater than 6 indicates strong collinearity, and that those variables are collinear if their condition indexes are more than 20 and corresponding variance proportions are more than 0.5.

Ridge regression analysis
There was collinearity among the meteorological factors, especially between constant and air pressure, average temperature and maximum temperature, and minimum temperature and relative humidity. To avoid the multicollinearity, ridge regression was used to quantify the relationship between weather variables and bacillary dysentery incidence. By using an improved least square method, ridge regression sought standardized coefficients (i = 1, 2,..., m), its formal equation was written as: Compared to Ordinary Least Square(OLS) linear regression, the basic principle of ridge regression is to artificially reduce correlation coefficient r ij of each pair of variables x i and x j (including response variable and independent variables) to r ij /(1+k), k is called ridge parameter, and usually 0 < k < 1. K value was selected when all the regression coefficients were relatively stable and the sign of the coefficients did not change.

Hierarchical cluster analysis
Hierarchical cluster analysis was adopted to group the weather variables. In this study, the agglomerative method with between-group average linkage algorithm was adopted and the measure for similarity was the Pearson correlation.
All the above statistical analyses were performed by Statistical Product and Service Solutions (SPSS 12.0 for windows, SPSS Inc., Chicago, IL, USA).

Ethical review
The present study was reviewed by research institutional review board of China Medical University and found to be utilization of disease surveillance data and meteorological data not requiring oversight by an ethics committee.

Statistical description of bacillary dysentery in Shenyang city
The distribution of bacillary dysentery incidence in Shenyang between 1950 and 1996 is shown in Figure 2. Figure  2a shows the long-term trend, which is that the incidence of bacillary dysentery in Shenyang rose over periods 1950-1966 and 1970-1981, and declined over periods 1966-1970 and 1981-1996. Figure 2b indicates that there was a seasonal distribution of bacillary dysentery in the study area, with about 85% of cases occurring in summer and autumn The incidence of bacillary dysentery increased from January to August and decreased from August to December.

Spearman's correlation analysis
The relationship between monthly mean meteorological factors and the monthly incidence of bacillary dysentery in the studied period was examined by Spearman's correlation analysis ( Table 1). Due to the fact that the history data covered more than 40 years, four time subgroups were set for analysis based on historical and economic development of China. Table 1 shows that temperatures, precipitation, evaporation and relative humidity were positively related to the monthly incidence of bacillary dysentery, while air pressure was inversely correlated with the incidence. Lagged effects existed in air pressure from 1987 to 1996 and in relative humidity from 1967 to 1976 and from 1987 to 1996.

Multi-collinearity Diagnosis
The diagnosis of multi-collinearity is shown in Table 2 and Table 3. In Table 2, the largest VIF is 164.187, and the average VIF is 54.574, therefore, climatic variables are strongly collinear. In addition, Table 3 shows that the largest condition index is 815.54 which also indicates strong multi-collinearity. With the help of condition index and variance proportion, we found that collinearity existed between the constant and air pressure, the average tem-perature and maximum temperature, as well as minimum temperature and relative humidity.

Ridge regression analysis
Ridge regression was adopted to examine the relationship between the monthly mean weather variables and the monthly incidence of bacillary dysentery. Ridge trace maps ( Figure 3) show that when ridge parameter k reached 0.5, ridge regression coefficients were stable. When ridge parameter was set as 0.5, the ridge regression coefficients were then obtained (Table 4). Different patterns for the analysis unit Sub-grouping improved the determinant coefficients of all models. However, determinant coefficients of each unit differ from 0.490 to 0.892. Most models give the following similar results: relative humidity and precipitation have a positive association with incidence from January to July, and have no association from August to December, while evaporation is the opposite. Some results vary different for the four time sub- A B  Figure 4 shows the effects of climate variations on bacillary dysentery incidence, the regression coefficients were listed above the arrows. Due to the long period, only the model for 1987 to 1996 was listed. During this period, all meteorological factors were divided into three categories. Relative humidity and precipitation belonged to one class, temperature indexes and evaporation belonged to another class, and air pressure was the third class ( Figure  4).

Discussion
Although incidence of intestinal infectious diseases has declined considerably in developed countries, they are still major public health problems in developing countries, especially in the economic developing regions [8,9]. The transmission of Shigella is determined by many factors, such as human dietary lifestyle, and its strains' susceptibility to drugs and climate variations. Climate variability that impacts the incubation rate of Shigella is considered one of the major environmental contributors to Shigella transmission.
Climate change, especially global warming, has already brought and will continue to bring about challenges to public health and communicable disease control and prevention. The effects of climate variations on bacillary dysentery incidence have gained more recent concern. The historical climatic data from 1950 to 1996 demonstrate a significant increase in temperatures in Shenyang city. It has been reported that temperature was the key climatic indicator in the transmission of bacillary dysentery, a potential 1°C increase in maximum or minimum temperature was associated with up to 12% increase in bacillary dysentery cases in the northern city if other affecting factors remain unchanged [10]. Public health interventions should be developed at this stage to adapt and mitigate the possible consequences of climate change in the future with consideration of local climatic conditions [11]. The model constructed in the present study showed that explanatory climatic variables for the incidence of bacillary dysentery could be divided into three categories, namely relative humidity and precipitation; temperature indexes and evaporation; and air pressure. Among them, only air pressure showed negative correlations, all other variables showed positive correlations. From the point of effect size, temperature indexes and precipitation had greater effect which is of part concordance with the results of other studies in China [12,13]. Precipitation, relative humidity and air pressure may affect the transmission of food-borne diseases by influencing the pathogens and then affect the incidence of bacillary dysentery. The impact of precipitation on diarrhoeal disease is far from conclusion. Different studies obtained inconsistent results [14][15][16]. The three categories of climatic variables could be used as predictors for the number of bacillary dysentery cases in Shenyang. The results of the present study may be used to assist public health decision making and community health education. Further studies about this issue are necessary.  Due to the multi-collinearity in independent variables, the regression coefficients are very unstable with excessive standard error. A general approach to treat the existence of variables that are non significant to the effect of response variables is to remove them one by one until all the variables reserved in the equation are significant. In the present study, high multi-collinearity existed between the constant and air pressure, average temperature and maximum temperature, minimum temperature and relative humidity. We proposed a modified method to combine ridge regression and cluster analysis to solve this problem. The adoption of ridge estimated method to construct the regression model improved the effect of multi-collinearity, reduced the standard error of regression coefficients and produced more stable results. The stable regression coefficients then made the estimation of effect more reliable.
Though the meteorological factors are nonmodifiable, we can modify our dietary lifestyle, such as reducing intake of raw vegetables or cleaning vegetables and fruits more thoroughly. Local weather forecasting may have policy implications for bacterial dysentery prevention and control. Strengthening the education of public health will help to eliminate or reduce the epidemic of bacterial dysentery.
One limitation of this study was that we only focused on the relationship between the meteorological factors and the incidence of bacillary dysentery. The socio-economic status, especially the awareness of health care has been improved in China. So in order to remove the effect of preventive measures, this study used only the historical data before 1996 and four time subgroups that were obtained based on historical and economic development of China. The changes in social and economic conditions need to be addressed in further studies. Bacillary dysentery is a legally mandated notifiable disease in China, as a provincial capital city, Liaoning provincial CDC and Shenyang municipal CDC have kept good records for Shenyang bacillary dysentery incidence. However, there is still another limitation of the present study, which is that under-reporting is possible in the disease surveillance system, especially before the computer and web were widely applied. An investigation of missing reports of notifiable diseases in China in 2005 indicated that there was much to be improved upon in Chinese medical facilities as far as reporting of infectious diseases [17]. So, the correlation between climatic variation and bacillary dysentery may be underestimated. However, there is no evidence to suggest there was any trend in under-reporting during the four time subgroups in Shenyang city. Therefore it is considered that the impact of under-reporting during each time sub group on the results is not significant.

Conclusion
In summary, relative humidity, temperature indexes and atmospheric pressure affected the transmission of the bacillary dysentery in Shenyang, located in the northeast of China. Climate variations have brought and will continue Ridge trace between climatic variables and logarithmic incidence of bacillary dysentery (1950-1996, Shenyang, China) to bring about challenges to communicable disease control and prevention. Bacillary dysentery prevention and control should be taken by giving more consideration to local climate variations and integrated modelling based on data systems is necessary to disease control and prevention.
Combination of cluster analysis and ridge regression analysis between climatic variables and logarithmic incidence of bacillary dysentery (1987-1996, Shenyang, China)