Early warning signal for dengue outbreaks and identification of high risk areas for dengue fever in Colombia using climate and non-climate datasets

Background Dengue has been prevalent in Colombia with high risk of outbreaks in various locations. While the prediction of dengue epidemics will bring significant benefits to the society, accurate forecasts have been a challenge. Given competing health demands in Colombia, it is critical to consider the effective use of the limited healthcare resources by identifying high risk areas for dengue fever. Methods The Climate Risk Factor (CRF) index was constructed based upon temperature, precipitation, and humidity. Considering the conditions necessary for vector survival and transmission behavior, elevation and population density were taken into account. An Early Warning Signal (EWS) model was developed by estimating the elasticity of the climate risk factor function to detect dengue epidemics. The climate risk factor index was further estimated at the smaller geographical unit (5 km by 5 km resolution) to identify populations at high risk. Results From January 2007 to December 2015, the Early Warning Signal model successfully detected 75% of the total number of outbreaks 1 ~ 5 months ahead of time, 12.5% in the same month, and missed 12.5% of all outbreaks. The climate risk factors showed that populations at high risk are concentrated in the Western part of Colombia where more suitable climate conditions for vector mosquitoes and the high population level were observed compared to the East. Conclusions This study concludes that it is possible to detect dengue outbreaks ahead of time and identify populations at high risk for various disease prevention activities based upon observed climate and non-climate information. The study outcomes can be used to minimize potential societal losses by prioritizing limited healthcare services and resources, as well as by conducting vector control activities prior to experiencing epidemics. Electronic supplementary material The online version of this article (doi:10.1186/s12879-017-2577-4) contains supplementary material, which is available to authorized users.


Background
Dengue is complicated. There are four serotypes of the dengue virus, and dengue infection occurs in almost all age groups [1,2]. Dengue is endemic in many parts of the tropics and subtropics, and dengue endemic countries are also exposed to the risk of periodic outbreaks [1,3]. In Colombia, dengue has been prevalent over the last 20 years with different degrees of incidence rates and epidemics in various geographical locations [4,5]. Due to the complexity of the disease, there are still large knowledge gaps regardingthe causes of dengue epidemics [6][7][8][9]. Infection with one serotype provides life-long immunity to that specific serotype. Therefore, subsequent introduction of the same serotype in a community would be less likely to cause the occurrence of a dengue epidemic if there were a small population of denguesusceptible individuals [7,8,10]. However, due to a high degree of antigenic cross-reactivity, sequential infection of two different serotypes can bring favorable or detrimental outcomes depending upon known and unknown factors including timing of infection [2,11,12]. For example, a primary infection may help slow the spread of secondary heterologous infection when some degrees of cross-protection are conferred [2,[12][13][14]. On the other hand, many studies have shown that subsequent heterologous infection would likely increase the probability of experiencing severe dengue fever [15][16][17][18]. One of the known mechanisms is the antibody dependent enhancement (ADE) during the second infection mediated by non-protective heterotypic antibodies arising from the primary infection [2,11,14,19]. In dengue endemic countries such as Colombia, the number of dengue cases is periodically reported to the upperlevel health management unit (i.e. provincial or Ministry of Health) from various health facilities at the municipality level [4,20]. In the case of dengue fever, like any other diseases, severe cases are detected more easily than mild symptoms, which in turn, leads to a higher volume of reported caseload [21]. Thus, having more severe cases is also related to the high likelihood of observing dengue epidemics when an epidemic is determined based on official statistics of reported cases.
While it is undeniable that all of these aspects would affect the occurrence of dengue epidemics directly and indirectly, it does not appear to be practical in proving the impacts of these factors on the occurrence of dengue epidemics due to the following reasons: (1) despite various efforts to disentangle the complexity of the disease [11], it is still uncertain to generalize how one serotype reacts with another in terms of cross-protection or ADE for all possible scenarios among four serotypes, as well as the duration of the interactions [22,23]; (2) even if this uncertainty is going to be uncovered in the near future, it would be very difficult to obtain the details of sero-prevalence history over a long period of time for each cohort in all specific locations. These limitations make it difficult to understand how much of each factor would contribute to the actual probability of a dengue epidemic occurrence [7,9,24].
A more practical way is to focus on the basic principle of the occurrence of a dengue epidemic. Simply put, a dengue epidemic occurs when a large number of people become infected within a short period of time [2,7]. It requires a large number of vector mosquitoes (Aedes aegypti), as well as high transmission probability, and frequent contact between people and the vectors (biting rate) to sustain transmission [2,3,7]. In other words, a dengue epidemic would more likely occur when vector mosquitoes increase within a short time period in a location where dengue viruses are currently circulating and population density with no immunity to one of the four serotypes is high during the same period [8,9,24]. Further, the importation of infected cases into a community where there is no immunity to that specific serotype would cause an epidemic as well.
Following this principle, the main concept of this study lies in the increase of vector mosquitoes as a primary factor of a dengue epidemic taking into account population density at different elevation levels. As a vector-borne viral disease, there is a wide range of factors that influence the spatial and temporal dynamics of mosquito populations: temperature, rainfall, and humidity, etc. [9,24,25]. There have been several efforts to understand the relationship between dengue epidemics and climate change. Juffrie and Focks used sea surface temperature anomalies to identify the occurrence of dengue epidemics in Yogyakarta, Indonesia and Bangkok, Thailand [26]. Lowe et al. developed an epidemic early warning system for Southeast Brazil using several climate and nonclimate datasets [27]. More recently, Huang et al. found that El Nino-Southern Oscillation climate cycles and temperature were important factors affecting the weekly occurrence of the four dengue serotypes in Cairns, Australia [23]. Adde et al. also identified summer Equatorial Pacific Ocean sea surface temperatures and Azore high sea-level pressure as significant indicators in predicting dengue epidemics in French Guiana [28]. While some of the climate factors were more commonly used due to the nature of a vectorborne disease, their applications varied and were geographically focused. These findings from previous literature showed that climate factors play a significant role in the occurrence of dengue epidemics.
This study first attempts to predict a dengue epidemic by developing an Early Warning Signal (EWS) model based upon the temporal relationship between the occurrence of dengue epidemics and climate variability which affects mosquito populations in Colombia. Furthermore, using climate data and topographical information, the study identifies population at high risk for dengue fever for efficient disease prevention activities.

Methods
Dengue Incidence Proxy (DIP) was created to observe the trend of the dengue incidence in Colombia. The number of dengue fever cases and population data were obtained from SIVIGILA and Departamento Administrativo Nacional de Estadistica (DANE) which are both official governmental programs in Colombia [4,29]. Dividing the dengue fever cases reported by population can be used as a good proxy to observe the overall trend of dengue fever. SIVIGILA also provides a weekly report on epidemiological events (Boletin Epidemiologico) which discloses the proportions of municipalities that were not responsive for each department [30]. Thus, the number of cases was adjusted by the proportions for underreporting by assuming that a non-responsive municipality would have the average number of cases per responsive municipality of that department: the reported cases by department was divided by the number of the responsive municipalities in that department, applied to non-responsive municipalities, and added to the reported cases by department. DIP was estimated by dividing the adjusted cases by population. While Boletin Epidemiologico was available over the study period, a more consistent pattern of the underreporting system was observed in the reports since 2011 after the large outbreak in 2010. Because a robust case reporting system is critical for determining the relationships between DIP and climate data, some departments out of 31 departments were excluded if over 20% of underreporting based on Boletin Epidemiologico occurred more than twice since 2011. An outbreak was defined as a relative term in this study. In other words, as long as an unusual peak in DIP was observed in a department, it was considered as an outbreak even if the DIP value in that department was relatively low compared to other departments where dengue is more prevalent. An unusual peak was marked by department if the slope of DIP over every six months fell into the highest 10% of the observations. Table 1 summarized the datasets used in this study. Considering the spatial and temporal dynamics of mosquito populations, three climate datasets and two non-climate datasets were selected as factors which can explain variation in DIP. The climate raster datasets include air temperature, precipitation, and specific humidity [31][32][33]. The monthly climate datasets were obtained from 2006 to 2015, and all the raster files were resampled into 0.008 by 0.008 degree resolution by taking the nearest neighbor assignments. It should be noted that the study presumed that it is critical to consider how long favorable conditions for vector mosquitoes persist [9,23]. In other words, a current epidemic is a result of the climate conditions consistently observed during the past months, rather than single temporal (monthly or daily) values at present. For example, if warm temperature and high humidity were observed only for a short time period of each year, these conditions would less likely affect the larval development or virus replication to cause an epidemic [25]. Thus, after checking crosscorrelograms to define a proper period, the 12-month moving average of the mean values of each climate data was estimated by department (Additional file 1).
In addition to the climate factors, night light data and elevation raster files were included [34,35]. Night lights data which is available by year was used to understand population density instead of conventional population statistics. The use of the night-lights data provides more flexibility to estimate population density at various levels of geographical units over time than the projected population data [36]. Prior to applying the night-lights data, correlations between night-lights data and population data were tested to ensure that the night-lights data can be used as an appropriate proxy (ρ = 0.94). The most recent nightlights data was for 2013 at the time of the research. As the population level does not change dramatically during a short period of time, the population level in 2013 was assumed to be consistent in 2014 and 2015. High population density would have two opposite effects in terms of transmission intensity depending upon the level of a reproduction number: (1) dilution of infectious individuals by having a large pool of host populations, (2) a large number of susceptible hosts to be infected, leading to the surge of infected cases. For the latter case, while transmission would be more intensive in a place where population density is high, holding other climate factors constant, it does not have to be necessarily true in areas at high elevations [9]. A previous study found that it is difficult for Aedes aegypti mosquitoes to survive at an elevation of 6000-8000 ft or even at lower elevations in temperate latitudes [37]. Because many people in Colombia live at high elevations (i.e. Bogota), the mean value of the night lights was used to estimate population density separately for people living under 1500 m and those living over 1500 m by department [38].
The three climate datasets are partially correlated but also have their own distinctive characteristics. In order to preserve all the information contained in each of the climate datasets, the Climate Risk Factor (CRF) index was created. The advantage of using a composite index is that it prevents multicollinearity when running regressions against independent variables with some level of the correlations among the variables. The three climate variables and population density under 1500 m were used by department. The precipitation variable, which has a negative relationship with DIP, was reversed, so all variables go towards the same underlying concept (the increase in DIP). The variables were first standardized individually by subtracting the mean and dividing by the standard deviation. The standardized values were then averaged across the variables [36,39]. The final values were converted into a range from zero (low risk) to one (high risk) and multiplied by 100 for an easier interpretation. It should be noted that the temperature and specific humidity data used in this study are measures at the surface level. More precisely, air temperature is at 2 m above the ground surface, and specific humidity is measured near surface at sea level with pressure level 1000 millibars. Thus, it would be desirable to adjust the CRF index by the risk proportion at low and high elevation. The proportion at risk was estimated by dividing the sum of the night lights observed under 1500 m elevation by the sum of the total night lights in each department. The final CRF index was the product of the raw CRF index and the proportion at risk. There were two dominant patterns observed during past dengue epidemics in Colombia: (1) rapid increase of the CRF index, (2) relatively steady increase of the CRF index at different levels of the CRF and DIP values. In other words, the slope of the CRF index curve at various levels of the CRF index and DIP values appeared to be critical in predicting the occurrence of dengue epidemics. In order to assess this combined relationship, the elasticity of the CRF index curve was estimated. This is defined as the percentage change in DIP in response to a 1 % change in the CRF index [40,41]. The stationarity of the dataset was tested to ensure that there were no trend and periodic seasonal effects. The augmented Dickey-Fuller (ADF) unit-root test was used to test whether the dataset is stationary by department [42,43]. DIP is non-negative integer values, and count models were used to fit DIP as a function of the CRF index (Additional file 1: Supplementary 2). The DIP dataset consists of two parts: (1) model dataset, (2) validation dataset. The model was constructed based on monthly DIP and the CRF index by department from January 2007 to December 2015. The validation dataset which was separated from the model dataset was established from January 2016 to April 2016 and used to validate the model performance. Overdispersion-where the variance is greater than the mean-was tested using the Zscore test at the 5% significant level [44][45][46]. In addition, the Akaike Information Criterion (AIC) fit test was used to compare the model fits between Poisson and negative binomial models. Being a non-linear model, the elasticity of the CRF function can be given as [46]: where exp x ′ i β À Á is the expected DIP values, β k is the coefficient of CRF, x is the explanatory, and y is the response.
As shown above, the main interest of the study lay in estimating elasticities, and count models were used as an intermediary step in calculating elasticities. Given the geographical variations of dengue outbreaks, it is critical to estimate the elasticities separately by department with varying coefficient values of CRF. In this context, the current model was preferred to non-linear mixed models with a fixed coefficient and random effects since the use of coefficients and the measure of marginal effects and elasticities were more straightforward, reducing any possibility of potential overspecification (i.e. multiple adjustments) [46,47]. Because the model was run separately for each department allowing variation in the CRF index by department, there is no concern about creating the effect of spatial autocorrelation. The elasticities were derived for every six months from January 2007 to December 2015. Early Warning Signal (EWS) was modeled such that dengue epidemics in Colombia can likely occur when the elasticity of the CRF index is maximized given the instantaneous slopes of DIP and the CRF index over time are positive minimizing the squared residuals. Maximize: Elasticity; E Subject to: where DIP − and CRF are the means of DIP and CRF, T is time (month). The elasticities were then categorized into three percentiles: low level warning (0-50%), medium level warning (50-75%), and high level warning (75-100%). As expressed by Adde et al., the hit rate (HR) and false alarm rate (FAR) were defined as below [28]: In addition, a sensitivity analysis was conducted with various moving average scenarios to make sure that the 12-month moving average is the most suitable period for the performance of the EWS model.
Given that the CRF index is statistically significant to explain variance of DIP for the departments where significant underreporting was not observed, the CRF index was further estimated at the smaller geographical level (5 km by 5 km resolution) for the entire country and used to identify high risk areas.

Results
During the period from January 2007 to December 2015, two major outbreaks were observed in many parts of Colombia. Figure 1 presents the overall trends of the three climate factors, as well as the DIP from 2007 to 2015 in Valle del Cauca, one of the departments where dengue fever is highly prevalent (see Additional file 1: Supplementary 3 for other departments). Looking at the bottom right panel in Fig. 1, there were two major outbreaks in 2010 and 2013 in the department. Comparing the trend of DIP with the climate factors, DIP appears to be positively correlated with temperature and humidity, but has a negative relationship with precipitation.
13 of 31 departments in Colombia were chosen after checking the robustness of the case reporting system. The ADF test showed that we reject the null hypothesis, which means that the dataset is stationary. As shown in Table 2, the CRF index is highly significant for all departments except Guaviare and Magdalena, thus 11 departments were selected for further analysis.
The CRF index and DIP were plotted over time to show the overall trend in Fig. 2 (see Additional file 1: Supplementary 4 for other departments). It is clear that the epidemic that occurred in 2010 was picked up by the steep increase of the CRF index. In 2013, another epidemic was observed. While there was no rapid change in terms of the CRF index during a short period in 2013, the CRF index reached its high level following the steady increase of the index since 2012. These provide an important point where an occurrence of future dengue epidemic can be related not only to the rapid increase of the CRF index, but also to the various levels of the CRF index and DIP. These combined relationships can be further explained by the elasticity of the CRF index which was used to develop an Early Warning Signal (EWS) model. In Fig. 3, the EWS based on the elasticity of the function was demonstrated for Valle del Cauca. In the department, the peak DIP was observed in March 2010, and the EWS signaled the high level warning sign two months before the peak (January 2010). Similarly, the second peak occurred in May 2013, and the EWS level went up from low to medium in January 2013 and remained at the same level until the end of the peak. It The EWS model predictability was examined with the validation data in 2016 which was separated from the model. It is interesting to see that the EWS already signaled the high level warning sign at the end of 2015, which accurately predicted another outbreak in two months (February 2016) that is out of the study period. Figure 4 further demonstrates the EWS model performance with the validation data for all 11 departments. 6 of 11 departments experienced outbreaks between January 2016 and April 2016. The EWS model successfully predicted these outbreaks 1~5 months ahead of AICs for non-selected count models were presented for comparison. The AIC fit test was consistent with the Z-score test in terms of choosing a better model fit except Boyaca and Cundinamarca. Since the AIC differences were trivial for the two departments, the Bayesian Information Criterion (BIC) was further assessed, and NB was preferred over Poisson * Significance at the 10% level, ** at the 5% level, *** at the 1% level In other words, sensitivity (HR), specificity, positive predictive value, and negative predictive value of the validation data were as follows: 83.3%, 100%, 100%, and 83.3%. The sensitivity analysis was performed with different moving average scenarios (12 months, 6 months, current value). As shown in Table 3, the hit rate was the highest with the 12-month moving average scenario, meaning that the current model produced the most accurate prediction compared to the 6-month and no-movingaverage scenarios. The false alarm rate increased as the moving average period was shortened. This is mainly because the index becomes too sensitive and changes quickly due to the short duration of the moving averages of the climate datasets. As a result, it does not distinguish between minor fluctuations and major outbreaks (Fig. 5). This sensitive behavior of the CRF index with the shorter term scenarios proves our presumption that a current dengue epidemic is a result of the consistent long-term patterns of the climate conditions.
Given that the CRF index explains variation in DIP reasonably well, the CRF index was estimated at 5 km by 5 km resolution, and the most recent time of the index (December 2015) was presented in Fig. 6 (see Additional file 1: Supplementary 5 for more details). As expected, populations at high risk are concentrated in the Western  part of the country due to more suitable climate conditions for vector mosquitoes and the high population level compared to the East. Using the geo-coordinates of the high risk areas at 5 km by 5 km resolution, it is possible to identify the locations for people at high risk more accurately for efficient disease prevention activities.

Discussion
This study confirms that dengue fever transmission is strongly related to climate factors as well as population density at different topographical conditions. One of the advantages of the CRF index is to prevent multicollinearity by combining all relevant climate indicators which may have some degrees of correlations with each other but have distinctive characteristics at the same time. During the study period from January 2007 to December 2015, the nationwide dengue epidemic occurred in 2010 was well explained by the rapid changes of the CRF index. Even if the CRF index increased steadily, the study found that it was still possible to detect an epidemic by adopting the elasticity of the function which takes into account not only the slopes but also the various levels of CRF and DIP. In 2015, some inconsistent patterns between CRF and DIP were observed for some departments (Additional file 1: Supplementary 7). This inconsistency may be related to the unexpected emergence of Zika, which started being reported in 2015. As shown in Fig. 2, the number of Zika cases has continuously increased since 2015. However, it is still premature to make any firm statements regarding the impact of Zika on dengue fever due to uncertainty of the diseases. Given that reported cases are mainly based on clinical symptoms, there may have been a chance of misdiagnosis between the two diseases. In addition, due to the surge of an unfamiliar disease (Zika) imposing more difficulties on resource allocation at the local health facility level, it would be difficult to keep a consistent pattern in the casereporting system from municipality-level heath facilities. Excluding 2015, a number of false alarms where EWS sends out the medium or high level signals but DIP remains low were only observed twice in Cauca (April and December 2014) during the study period.
Some areas of uncertainty deserve attention. While the CRF index performed well for 11 of 13 departments, the index was not statistically significant in Magdalena and Guaviare. This may have been caused partly by the inconsistent patterns of reported cases over time. Because the EWS was estimated based upon the most recent observed climate datasets, the EWS in this study is limited to issuing alerts with short-time intervals (1~5 months ahead). Given that, at present there are 1~2 month delays until the climate data become available, EWS with the short intervals (i.e. less than two months) may not, for now, be practical in operational modes. However, this limitation can be improved based upon the availability of the climate datasets in real-time in the future, and the 1~5 month intervals would provide enough room for public health officials to prepare for selected vector control activities and healthcare interventions (i.e. increase the number of beds at high risk areas) in the dengue-endemic setting [9,26]. It should be noted that the study did not attempt to produce any longer-term predictions due to chaos and uncertainty in climate forecasts in the long run. Considering that longterm climate forecasts could be variable depending upon assumptions (i.e. future CO 2 omission level), the method proposed in this study could minimize potential bias which may be caused by uncertainty in input datasets. The climate datasets have coarse resolutions. While the datasets were resampled using the nearest option in this study, the model outcomes can be further improved with finer scale resolutions. It is worth noting that the cycling of El Niño and La Niña, called El Niño Southern Oscillation (ENSO), may have indirect impacts on the occurrence of dengue epidemics in South America by changing the patterns of climate variables such as temperature, precipitation, and humidity [28]. While any unusual changes of the climate variables affected by such events were captured by using the 12month moving averages, further investigation would be needed to identify accurate impacts of El Niño on climate factors including its timing. Nonetheless, our model provided accurate forecasts for the validation period for 5 of 6 departments that experienced outbreaks in 2016. In addition, this study identified populations at high risk for dengue at 5 km by 5 km resolution. The study findings can be used to accelerate introduction of dengue prevention activities and prioritize alternative health interventions among competing health demands in Colombia.