Assessing the relationship between malaria incidence levels and meteorological factors using cluster-integrated regression
BMC Infectious Diseases volume 24, Article number: 664 (2024)
Abstract
This paper introduces a novel approach to modeling malaria incidence in Nigeria by integrating clustering strategies with regression modeling and leveraging meteorological data. By decomposing the datasets into multiple subsets using clustering techniques, we increase the number of explanatory variables and elucidate the role of weather in predicting different ranges of incidence data. Our clustering-integrated regression models, accompanied by optimal barriers, provide insights into the complex relationship between malaria incidence and well-established influencing weather factors such as rainfall and temperature.
We explore two models. The first model incorporates lagged incidence and individual-specific effects. The second model focuses solely on weather components. Model selection depends on decision-makers' priorities; the first model is recommended when higher predictive accuracy is the priority. Moreover, our findings reveal significant variability in malaria incidence, specific to certain geographic clusters and beyond what can be explained by observed weather variables alone.
Notably, rainfall and temperature exhibit varying marginal effects across incidence clusters, indicating their differential impact on malaria transmission. High rainfall correlates with lower incidence, possibly due to its role in flushing mosquito breeding sites. On the other hand, temperature could not predict high-incidence cases, suggesting that factors other than temperature contribute to high case counts.
Our study addresses the demand for comprehensive modeling of malaria incidence, particularly in regions like Nigeria where the disease remains prevalent. By integrating clustering techniques with regression analysis, we offer a nuanced understanding of how predetermined weather factors influence malaria transmission. This approach aids public health authorities in implementing targeted interventions. Our research underscores the importance of considering local contextual factors in malaria control efforts and highlights the potential of weather-based forecasting for proactive disease management.
Introduction
Malaria persists as a significant health concern globally, particularly in sub-Saharan Africa, posing a substantial threat to nearly half of the world’s inhabitants [1,2,3]. Annually, it contributes to a minimum of one million fatalities, with more than \(90\%\) of these occurrences transpiring in Africa [4, 5].
Malaria has a significant socioeconomic impact, making it not only a public health concern but also a major contributor to poverty and underdevelopment [6, 7]. Malaria vaccinations are likely insufficient, and despite more than 60 years of research, an effective vaccine that outperforms naturally acquired immunity has yet to be developed [8,9,10,11]. This underscores the importance of malaria research for socioeconomic advancement in Africa and globally. In Nigeria, where malaria transmission is widespread, 97% of the population is at risk. In 2019, Nigeria was one of six countries that accounted for 55% of global malaria cases [12]. Factors such as poor sanitation, overcrowded living conditions, and high population density contribute to the prevalence of malaria in Nigeria [13]. Malaria is actively transmitted across all 36 states of Nigeria [14]. Despite efforts to increase coverage, the proportion of Insecticide-Treated Net (ITN) usage remains low in many endemic regions [15,16,17], and the non-significant negative relationship between malaria transmission and ITN coverage is concerning. This appears to be due to the barriers to ITN or Long-Lasting Insecticidal Net (LLIN) utilization, which include heat, adverse reactions to the chemicals, unpleasant odors, and cost [18, 19].
Moreover, meteorological factors play a crucial role in driving malaria transmission, influencing the life cycles of both vectors and parasites [20]. Mosquito breeding, which relies on stagnant water accumulation, is directly affected by precipitation levels. While adequate rainfall can create breeding sites, intense rainfall can also impact mosquito habitats [21,22,23]. Additionally, ambient temperature is another key factor, as temperatures above 20\(^{\circ }\)C are essential for the development of the Plasmodium parasite. This is particularly true for Plasmodium falciparum, the most common malaria parasite in tropical regions [24].
Numerous studies have explored the intricate connection between malaria incidence and various meteorological factors [25,26,27,28,29]. For instance, Gunda et al. [28] conducted an assessment of malaria incidence and its correlation with weather variables in three rural districts in Sub-Saharan Africa (SSA), highlighting a significant association with precipitation and mean temperature at specific lag periods. Similarly, Akinbobola and Omotosho [29] examined the relationship between weather variables and reported malaria cases in two stations from different geopolitical zones in Nigeria. Their study revealed a notable increase in malaria cases associated with changes in weather variables. Specifically, they found that rainfall and humidity had positive associations with the incidence of malaria, while maximum temperature exhibited both inverse and direct relationships, depending on the region under consideration. Further, the study in [30] found that malaria incidence in Nigeria is significantly influenced by environmental factors such as rainfall, temperature, and proximity to water. The research found that higher rates of malaria are associated with increased rainfall and temperatures. The spatiotemporal study identified specific hotspots, facilitating the development of targeted intervention strategies. Additionally, the study indicates that malaria is more prevalent in the northern regions and rural areas compared to the southern and urban regions. Moreover, recent research in Uganda has demonstrated strong associations between malaria incidence and climate variables, indicating a positive correlation with rainfall as well as average temperature [29]. Similarly, a decade-long investigation of regional and temporal patterns of malaria incidence in Mozambique found a higher risk when maximum temperatures exceeded \(28^{\circ }\)C and humidity reached 95% [31].
In addition, regions such as Ethiopia and Senegal have shown similar spatial relationships between climatic variability, such as rainfall, and malaria occurrence [32]. These findings underscore the significant influence of climatic factors on malaria prevalence within endemic environments.
Driven by the evident seasonal fluctuations in malaria prevalence, particularly with a notable percentage of cases occurring during wet seasons [33,34,35] (see also Fig. 1b), this study investigates the relationship between panel malaria incidence data from Nigeria and meteorological variables. Our main objective is to discover the incidence ranges that are most efficiently predicted by meteorological factors, contrasting the traditional approach of predicting weather components based on incidence levels. To achieve this, we develop a clustering-integrated multiple regression model for monthly panel data, incorporating meteorological factors such as rainfall and average temperature. By categorizing incidence data into distinct clusters using a clustering method [36,37,38,39], we improve model fitting and identify clusters where meteorological factors inadequately predict incidence. By varying clustering barriers to minimize mean squared error, we aim to optimize model performance. This approach addresses challenges posed by limited data availability in developing countries, particularly regarding additional confounding factors, by proposing a clustering strategy to enhance modeling accuracy. Additionally, utilizing panel data from different states across Nigeria enriches our analysis.
Data and study area
The dataset used in this study comprises monthly reports of malaria cases over five years (2014–2018), obtained from the National Malaria Elimination Programme, Federal Ministry of Health, Nigeria. The data encompasses all six geopolitical zones in Nigeria, with two states representing each zone. This includes a total of twelve states, which collectively represent one-third of Nigeria’s regions: Anambra, Ebonyi, Bauchi, Gombe, Bayelsa, Delta, Kastina, Kebbi, Nassarawa, Niger, Ogun, and Ondo. The visualization of reported malaria cases is depicted in Fig. 1.
To address the disparity in data magnitude between meteorological and incidence data, which may stem from population variations among regions, we employ normalized density data to ensure interpretability and numerical stability. Population data required for normalization are sourced from the demographic statistics bulletin of Nigeria.
The corresponding mean monthly rainfall and temperature data were obtained for each of the aforementioned states from World Weather Online. Together, these states cover areas of both low and intense temperatures and rainfall. Plots of the climate data are given in Figs. 2 and 3.
In Nigeria, the climate exhibits significant diversity, ranging from tropical conditions in the south to semi-arid conditions in the far north. Precipitation patterns vary accordingly, with annual rainfall below 500 mm (20 in) in the extreme northeast, increasing to 1,000 to 1,500 mm (40 to 60 in) in the central region, and exceeding 2,000 mm (80 in) in the south, particularly in the far southeast. Temperatures also display notable fluctuations across different climatic zones.
In the northern regions, winters are warm and dry, with daytime temperatures soaring to uncomfortable levels of up to 40\(^{\circ }\)C (104\(^{\circ }\)F), while nights are generally cool. In hilly areas of the north, temperatures can drop to freezing (0\(^{\circ }\)C or 32\(^{\circ }\)F). From February onwards, temperatures rise across inland areas, reaching scorching levels from March to May, with temperatures often surpassing 40\(^{\circ }\)C (104\(^{\circ }\)F) in the center-north.
Conversely, in the southern regions, temperature increases are more moderate due to the proximity to the ocean and the onset of rain showers earlier in the year. Rainfall intensifies and becomes more frequent, gradually spreading northwards until it affects the entire country by June. This information is sourced from the Nigerian Meteorological Agency (https://nimet.gov.ng/).
Additionally, analysis of average monthly rainfall trends in Fig. 1c illustrates that the rainy season typically commences between March and April, although variations occur from year to year and from state to state. The peak of rainfall typically occurs in August, followed by a tapering off of the rainy season from September to October, with the dry season prevailing from November onwards. Notably, Fig. 1d highlights a two-month lag between the peak of rainfall and the peak in malaria cases.
Methods
Cross-correlations and autocorrelations
We determine the time delays between malaria incidence and meteorological factors prior to their integration into the modeling process. Spearman’s correlation coefficient is employed for this purpose because it is invariant under monotonic transformations of the data. Using a dataset comprising 60 data points, we fix the last 12 malaria incidence data points and shift 48-month windows of meteorological parameters backward in time for all states. This approach allows us to explore correlations over a full one-year period shift. For each fixed window, correlation coefficients are computed, and the maximum correlation is identified. The corresponding time lags associated with these maximal correlation coefficients are summarized in Table 1.
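The lag scan described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the function names (`ranks`, `spearman`, `best_lag`), the exact windowing (a fixed incidence window compared with an equally long weather window moved back by 0 to 12 months), the choice of the largest absolute coefficient as "maximal", and the synthetic data are all introduced here for illustration.

```python
import random


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            out[order[k]] = mean_rank
        pos = end + 1
    return out


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def best_lag(incidence, weather, max_lag=12):
    """Scan lags 0..max_lag; return (lag, rho) with the largest |rho| (assumed rule)."""
    n = len(incidence)
    fixed = incidence[max_lag:]  # incidence window that stays in place
    results = []
    for lag in range(max_lag + 1):
        shifted = weather[max_lag - lag : n - lag]  # weather moved back by `lag` months
        results.append((lag, spearman(fixed, shifted)))
    return max(results, key=lambda item: abs(item[1]))


# Synthetic example: incidence simply repeats the weather signal two months later.
rng = random.Random(0)
weather = [rng.random() for _ in range(60)]
incidence = [0.0, 0.0] + weather[:-2]
lag, rho = best_lag(incidence, weather)
```

On this synthetic series the scan recovers the two-month delay that was built in; with real data the coefficients would be far from perfect and the rule for choosing the "maximal" lag matters more.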
Furthermore, autocorrelation analysis of malaria cases is conducted by consolidating spatiotemporal data into a single time series, considering the relatively minor variations in normalized data. Lags up to 6 months from the present are selected, resulting in each covariate augmenting the dataset by 12 times 60 observations. Subsequently, Spearman rank correlation coefficients are computed, as illustrated in Table 2.
Panel regression
As mentioned in the Introduction, this study focuses on modeling malaria incidence in Nigeria using rainfall and temperature data obtained from various states across different periods. The data used in this study is categorized as panel data, encompassing both cross-sectional (across states) and time-series dimensions, as it provides insights into individual behavior over time.
Two primary models are commonly employed for panel data analysis: the fixed effects model and the random effects model. Consider the multiple linear regression model for individual \(i = 1, \ldots ,N\), observed at various time points \(t = 1, \ldots , T\):
\[y_{it} = \alpha + x^{\prime }_{it}\beta + z^{\prime }_i\gamma + c_i + u_{it}. \tag{1}\]
Here, \(y_{it}\) represents the dependent variable, \(x^{\prime }_{it}\) denotes a \(K\)-dimensional row vector of time-varying explanatory variables, \(z^{\prime }_i\) signifies an \(M\)-dimensional row vector of time-invariant explanatory variables (excluding the constant term), \(\alpha\) stands for the intercept, \(\beta\) represents a \(K\)-dimensional column vector of parameters, \(\gamma\) denotes an \(M\)-dimensional column vector of parameters, \(c_i\) denotes an individual-specific effect, and \(u_{it}\) signifies the idiosyncratic error term.
The fixed effects model accommodates individual-specific effects (\(c_i\)) that may be correlated with the regressors x. In contrast, the random effects model assumes that these individual-specific effects (\(c_i\)) are distributed independently of the regressors. The selection between the fixed and random effects models is determined using the Hausman test [40, 41]. This test evaluates whether a significant difference exists between the fixed and random effects estimators. Specifically, the test statistic is computed solely for the time-varying regressors. If the Hausman test yields an insignificant result, the random effects model is employed. Otherwise, the fixed effects model is preferred [42].
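For reference, the Hausman statistic in its standard textbook form compares the two estimators of the coefficients on the time-varying regressors. The subscripts FE and RE are notation introduced here, not taken from the paper:

\[H = \bigl(\hat{\beta }_{\text {FE}}-\hat{\beta }_{\text {RE}}\bigr)^{\prime }\Bigl[\widehat{\operatorname{Var}}\bigl(\hat{\beta }_{\text {FE}}\bigr)-\widehat{\operatorname{Var}}\bigl(\hat{\beta }_{\text {RE}}\bigr)\Bigr]^{-1}\bigl(\hat{\beta }_{\text {FE}}-\hat{\beta }_{\text {RE}}\bigr).\]

Under the null hypothesis that the individual-specific effects are uncorrelated with the regressors, \(H\) is asymptotically \(\chi ^2\)-distributed with \(K\) degrees of freedom, so a large value of \(H\) (small p-value) favors the fixed effects model.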
Given the limited availability of incidence data for modeling, utilizing panel data offers specific advantages in uncovering the correlations between malaria incidence and meteorological factors. This stems from several established benefits of employing panel data compared to relying solely on time series or cross-sectional data [40, 43].
One advantage lies in incorporating individual-specific components within the model, which enables addressing heterogeneity across individuals. Integrating this component elucidates correlations among observations over time that are not solely attributable to dynamic trends, thereby mitigating unexplained variability. Additionally, incorporating individual-specific effects helps mitigate the issue of omitted variable bias.
Another advantage, particularly in the time series analysis, is leveraging the available data across individuals to compensate for shorter series lengths, obviating the need for extensive longitudinal data. Consequently, constructing an accurate model becomes feasible by identifying commonalities among individuals.
Conversely, compared to cross-sectional data, panel data’s temporal dimension enhances estimation precision through additional temporal data points.
Incidence-weather relation (without clustering strategy)
Let i and j represent the state and time indices respectively, where \(i\in \{1,\cdots ,S=12\}\) and \(j\in \{1,\cdots ,N\}\). Our strategy for modeling the monthly malaria cases in the 12 chosen Nigerian states involves directly linking collected variables. These variables consist of current (lag-0) reported cases \(C =(c_{ij})\), cases reported in the preceding six months (lag-1, \(\cdots\), lag-6) from the current period \(C_{1} = (c_{i,j-1}),\cdots ,C_{6} =(c_{i,j-6})\), lagged monthly rainfall \(R =\mathbbm {1}_{S}\otimes (r_{j-\text {lag}(i)})\), and lagged monthly temperature \(T =\mathbbm {1}_{S}\otimes (t_{j-\text {lag}(i)})\), where lag(i) corresponds to the cross-correlation outcome in Table 1. The symbols \(\mathbbm {1}_{S}\) and \(\otimes\) denote the column vector of size S with all entries equal to 1 and the Kronecker product between two matrices, respectively. The total number of observations is the length of the entire time window minus the maximum autoregressive lag. Let \(\beta _0\) denote the intercept and \(\beta _{\text {ind}}=(\beta _1,\cdots ,\beta _{S-1})\) represent the individual-specific effects (reduced by one term to prevent linear dependence with the intercept). Further, \(\beta _{i}\) (for \(i=1,\cdots ,6\)) signify the marginal effects of the lagged incidence cases, while \(\beta _R\), \(\beta _T\) and \(\varepsilon =(\varepsilon _{ij})\) represent the marginal effect of rainfall, the marginal effect of temperature and the idiosyncratic error, respectively. The direct relationship among these covariates is represented by:
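Written element-wise, one form consistent with the definitions above is the following sketch. The superscript "ind" on the state effects, the index \(l\) for the autoregressive lags, the indicator \(\mathbbm {1}\{i=s\}\) and the state subscript on the weather terms are notational assumptions introduced here to keep the two families of coefficients apart:

\[c_{ij}=\beta _0+\sum _{s=1}^{S-1}\beta ^{\text {ind}}_{s}\,\mathbbm {1}\{i=s\}+\sum _{l=1}^{6}\beta _{l}\,c_{i,j-l}+\beta _R\,r_{i,j-\text {lag}(i)}+\beta _T\,t_{i,j-\text {lag}(i)}+\varepsilon _{ij}.\]

Each state other than the reference state shifts the intercept by its own \(\beta ^{\text {ind}}_{s}\), while the lagged cases and the lagged weather variables share common marginal effects across states.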
Incidence-weather relation (with clustering strategy)
Clustering is applied to the response data, while the associated explanatory variables are categorized based on the levels of the response data. This technique allows for the selective use of certain explanatory variables to predict a specific response variable, particularly when the number of explanatory variables is limited. The primary objective of clustering is to accurately allocate explanatory variables in scenarios where they may not predict a particular response variable effectively. Therefore, unlike conventional regression approaches, clustering-integrated regression aims to identify the ranges of the response variable that are well predicted by the available explanatory variables. By incorporating additional explanatory variables, this approach can enhance model fitting.
The clustering concept involves categorizing the incidence data into M clusters \((\Omega _{k})_{k=1}^M\) separated by barriers \(\theta :=(\theta _{k})_{k=1}^{M-1}\). In closed form, the clusters are defined as \(\Omega _k=\{c:\max \{0,\theta _{k-1}\}\le c<\min \{\theta _k,\max _{i,j}c_{ij}\}\}\). Let \(\delta _k(C;\theta ):=(\mathbbm {1}_{\Omega _k}(c_{ij}))\), where \(\mathbbm {1}_{\Omega _k}\) denotes the characteristic function, which assigns a value of 1 to \(c_{ij}\) belonging to \(\Omega _k\) and 0 otherwise. Define \(R^k=R^k(\theta ):=\delta _k(C;\theta )\circ R\) and \(T^k=T^k(\theta ):=\delta _k(C;\theta )\circ T\), where the Hadamard product \(\circ\) represents the element-wise multiplication between matrices. These matrices return the original entries of R and T if their corresponding incidence cases belong to the respective cluster and 0 otherwise. This decomposition ensures that \(\sum _kR^k=R\) and \(\sum _kT^k=T\). Incorporating clustering, the model (2) is modified as follows:
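A form consistent with these definitions replaces the single rainfall and temperature terms by one term per cluster. This is a sketch only: the entries \(r^{k}_{ij}\) and \(t^{k}_{ij}\) of \(R^k\) and \(T^k\), the cluster-specific coefficients \(\beta ^{k}_{R}\) and \(\beta ^{k}_{T}\), and the notation for the remaining terms are assumptions introduced here:

\[c_{ij}=\beta _0+\sum _{s=1}^{S-1}\beta ^{\text {ind}}_{s}\,\mathbbm {1}\{i=s\}+\sum _{l=1}^{6}\beta _{l}\,c_{i,j-l}+\sum _{k=1}^{M}\bigl(\beta ^{k}_{R}\,r^{k}_{ij}+\beta ^{k}_{T}\,t^{k}_{ij}\bigr)+\varepsilon _{ij}.\]

Because \(\sum _kR^k=R\) and \(\sum _kT^k=T\), setting all \(\beta ^{k}_{R}\) equal and all \(\beta ^{k}_{T}\) equal recovers the model without clustering.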
In theory, the number of specified clusters is not limited to a small number, as better fitting can be achieved with more explanatory factors. However, concerns about complexity and interpretability may arise when adopting a large number of clusters. For example, if \(R^{(2)}\) is deemed insignificant, it implies that rainfall fails to predict response cases within the range specified by the middle cluster \(\Omega _2\). This approach allows such cases to remain “unexplained by rainfall”.
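The decomposition of a weather matrix into cluster-specific matrices can be sketched as follows. The function names, the toy matrices and the barrier values used here are illustrative assumptions; in particular, this sketch assigns the largest incidence value to the top cluster, which is a simplification of the closed-form definition.

```python
def cluster_index(c, barriers):
    """Return the 0-based index k of the cluster whose range contains c."""
    k = 0
    for theta in barriers:  # barriers are assumed sorted in increasing order
        if c >= theta:
            k += 1
    return k


def decompose(C, X, barriers):
    """Split X into len(barriers) + 1 matrices, one per incidence cluster.

    Entry (i, j) of the k-th matrix equals X[i][j] when C[i][j] lies in
    cluster k and 0 otherwise (Hadamard product with the indicator matrix).
    """
    n_clusters = len(barriers) + 1
    parts = [[[0.0] * len(row) for row in X] for _ in range(n_clusters)]
    for i, row in enumerate(C):
        for j, c in enumerate(row):
            parts[cluster_index(c, barriers)][i][j] = X[i][j]
    return parts


# Toy example: 2 states, 3 months, 3 clusters (hypothetical values).
C = [[0.10, 0.50, 0.90],
     [0.45, 0.70, 0.20]]
R = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
parts = decompose(C, R, barriers=[0.4, 0.7])
```

Each entry of `R` lands in exactly one of the three matrices in `parts`, so summing them element-wise returns `R`.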
The pooled estimator \(\hat{\beta }\) changes with the lower and upper barriers \(\theta =(\theta _{\text {l}},\theta _{\text {u}})\), as do \(R^k\) and \(T^k\). Our objective is to determine the optimal barriers such that the squared error between the data \(C=(c_{ij})\) and the model approximation \(C[\hat{\beta }](\theta )\) is minimized. Mathematically, this translates to the optimization problem:
\[\min _{\theta =(\theta _{\text {l}},\theta _{\text {u}})}\ \sum _{i,j}\Bigl(c_{ij}-C[\hat{\beta }](\theta )_{ij}\Bigr)^2.\]
This problem can be solved using optimization techniques such as brute-force or particle swarm optimization methods [44].
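A brute-force version of this search can be sketched as below, assuming a deliberately reduced model with one intercept and one rainfall slope per cluster (no lagged cases, state effects or temperature). The helper names, the grid and the synthetic data are hypothetical, and the ordinary least squares fit is done with a small hand-written solver to keep the example dependency-free.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None  # singular system, e.g. an empty cluster
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def sse_for_barriers(c, r, lo, hi):
    """OLS fit of c on [1, r^1, r^2, r^3]; return the sum of squared errors."""
    rows = []
    for ci, ri in zip(c, r):
        k = 0 if ci < lo else (1 if ci < hi else 2)  # cluster of this observation
        x = [1.0, 0.0, 0.0, 0.0]
        x[1 + k] = ri
        rows.append(x)
    p = 4
    XtX = [[sum(row[a] * row[b] for row in rows) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * ci for row, ci in zip(rows, c)) for a in range(p)]
    beta = solve(XtX, Xty)
    if beta is None:
        return float("inf")
    return sum((ci - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, ci in zip(rows, c))


def brute_force(c, r, grid):
    """Try every ordered pair of barriers on the grid; keep the lowest SSE."""
    best_sse, best_pair = float("inf"), None
    for lo in grid:
        for hi in grid:
            if lo >= hi:
                continue
            sse = sse_for_barriers(c, r, lo, hi)
            if sse < best_sse:
                best_sse, best_pair = sse, (lo, hi)
    return best_pair, best_sse


# Synthetic data: three incidence ranges, each with its own rainfall slope.
r = [1.0, 2.0, 3.0, 1.8, 2.0, 2.4, 2.5, 3.0, 3.2]
slopes = [0.1, 0.1, 0.1, 0.25, 0.25, 0.25, 0.3, 0.3, 0.3]
c = [s * ri for s, ri in zip(slopes, r)]
pair, sse = brute_force(c, r, grid=[0.2, 0.4, 0.7, 0.9])
```

On this constructed example only the pair (0.4, 0.7) separates the three slopes, so it is the pair the search returns. The cost grows quadratically with the grid size, which is one reason a metaheuristic such as particle swarm optimization may be preferred on finer grids.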
Data management and statistical analysis
The malaria incidence data, being population-driven, underwent normalization to account for population variations across states. Specifically, normalization was performed to standardize the number of cases per 100,000 inhabitants based on the 2006 population census. This intentional normalization aimed to ensure comparability of malaria incidence rates across states, facilitating appropriate comparison with weather components unaffected by population dynamics. Because the datasets are skewed, and because the applicability of linear regression requires the data to be independent and identically distributed (i.i.d.), we normalize the incidence and weather datasets; this also enhances compatibility with dummy or coded variables. The incidence data transformation was done using the Johnson SU technique [45]. The transformation is defined as follows:
where:

\(\text {shift}\) is the shift parameter,

\(\text {grad}\) is the gradient parameter,

\(\text {div}\) is the divisor parameter,

\(\textbf{x}\) represents the data points.
For incidence data transformation, we use the parameters [shift, grad, div]= [0, 0.1, 0.000003]. For rainfall and temperature data, we employed the logarithmic transformation, defined as follows:
where:

\(\textbf{x}\) represents the data points.

\(\text {shift}\) is a shift parameter added to the data to ensure all values are positive before applying the logarithm.

\(\text {div}\) is a divisor that scales the data.
We used the parameter values for [shift, div] as [23, 2e2] and [10, 1] for rainfall and temperature data transformation respectively. The visualisation of the data transformation is presented in Fig. 4.
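The two transformations can be illustrated with a short sketch. The exact formulas are not reproduced in the text above, so the functional forms below are assumptions: a Johnson SU-type map written as `grad * asinh((x - shift) / div)` and a shifted logarithm written as `log((x + shift) / div)`. The function names and the sample input are likewise hypothetical.

```python
import math


def johnson_su(x, shift, grad, div):
    """Johnson SU-type transform (ASSUMED form): grad * asinh((x - shift) / div)."""
    return grad * math.asinh((x - shift) / div)


def log_transform(x, shift, div):
    """Shifted log transform (ASSUMED form): log((x + shift) / div)."""
    return math.log((x + shift) / div)


# Incidence parameters [shift, grad, div] as quoted in the text; the input
# value is a hypothetical incidence expressed as a fraction of the population.
y = johnson_su(0.005, shift=0.0, grad=0.1, div=0.000003)
```

Both maps are strictly increasing, so they preserve the ordering of the data (and hence Spearman rank correlations) while compressing the long right tail of skewed series.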
The primary statistical analysis, involving panel regression modeling (see Panel regression section), was conducted using STATA software. This analysis focused on exploring the relationship between malaria incidence levels and weather variables while accounting for panel data structure and individual-specific effects. The preliminary analyses, such as cross-correlation and autocorrelation assessments (see Cross-correlations and autocorrelations section), were performed using MATLAB software. These initial analyses helped identify correlations and patterns in the data, providing insights into the relationship between variables before proceeding with more advanced modeling techniques in STATA. Additionally, the PSO clustering technique used for clustering the incidence data was performed in MATLAB.
Results
Incidence-weather cross-correlations and incidence-specific autocorrelation
We observed that the correlations between malaria incidence and rainfall predominantly exhibit initial positive coefficients, which gradually transition to negative coefficients as the lag duration increases (see Fig. 5a–c). Conversely, the correlations between malaria incidence and temperature demonstrate an opposite trend (see Fig. 5d–f). The time lags associated with the highest correlation coefficients between malaria cases and rainfall typically range from 1 to 3 months, with a majority occurring at a 2-month lag (see Table 1). This conforms with previous studies that have found approximately a two-month lag between the peaks of rainfall and malaria incidence [46,47,48]. However, for the correlations between malaria incidence and temperature, the maximal correlation coefficients exhibit a wide variability, ranging from 0 to 7 months, and encompass both negative and positive values (see Table 1). This fluctuating pattern of cross-correlations may be attributed to the monthly data collection frequency, which represents a relatively large time scale for measurement.
In our study, lag values obtained from the cross-correlation analysis for rainfall are used to construct appropriate variables in the regression models, while those for temperature are excluded due to their unstable nature.
Table 2 shows the case-specific autocorrelations with the Spearman rank correlation coefficient. This information can be useful in understanding the temporal patterns and potential predictive power of past malaria incidence on current cases. The results in Table 2 suggest there is a positive correlation between malaria incidence in the current month and the incidence in the previous months, with varying strengths depending on the lag. The correlation coefficient of 0.63888 at lag 2 suggests a relatively strong positive correlation between malaria incidence in the current month and the incidence two months ago. The coefficient at lag 6 is 0.19333, indicating a relatively weak positive correlation between the current month and the incidence six months ago. The coefficient of 0.34347 at lag 4 likewise indicates a comparatively weak positive correlation between the current month and the incidence four months ago.
Clustering
The model featuring cluster-specific effects yields improved outcomes (in terms of increased \(R^2\) value and minimized RMSE) when utilizing the arbitrarily chosen cluster barriers, as opposed to when the clustering strategy is not employed [49]. Nevertheless, investigation is undertaken to ascertain whether clustering by dividing into tertiles represents the optimal approach, or if a different set of cluster barriers can outperform it in terms of minimizing the mean square error. Therefore, the optimal lower barrier \(b_l\) and upper barrier \(b_u\) are sought using the method of particle swarm optimization (PSO) [39, 44]. The PSO technique is a metaheuristic algorithm that does not rely on gradient information and utilizes a stochastic approach to converge toward optimal solutions. Specifically, the PSO variant employed in this study involves updating players’ positions (barriers) and velocities iteratively, with parameters such as self-confidence (1.0), global-best position attraction (1.0), inertia (1.0), and constriction factor (0.3) carefully tuned for optimal performance. The algorithm utilized 100 players to form a relatively large swarm, balancing between exploration and exploitation. Further details on the implementation can be found in [39].
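A generic PSO loop with the parameter values quoted above can be sketched as follows. This is not the implementation of [39]: the way the constriction factor multiplies the whole velocity update, the clamping of positions to the search box, the iteration count, the random seed and the toy objective (a quadratic bowl with a hypothetical minimum at (0.3, 0.8)) are all assumptions made for illustration.

```python
import random


def pso(objective, bounds, n_particles=100, n_iter=200,
        c_self=1.0, c_global=1.0, inertia=1.0, constriction=0.3, seed=0):
    """Minimize `objective` over a box; return (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # each player's best position so far
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Assumed update: constriction scales inertia plus both attractions.
                vel[i][d] = constriction * (
                    inertia * vel[i][d]
                    + c_self * r1 * (pbest[i][d] - pos[i][d])
                    + c_global * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy objective standing in for the regression MSE as a function of two barriers.
target = (0.3, 0.8)
best, best_val = pso(
    lambda p: (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2,
    bounds=[(0.0, 1.0), (0.0, 1.0)],
)
```

In the setting of this study the objective would be the mean squared error of the fitted panel regression for a given pair of barriers, which is why a gradient-free method is convenient: the error changes discontinuously whenever an observation moves from one cluster to another.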
The optimal barrier values are (\(b_l=0.4124,b_u=0.6990\)). The surface plot of the MSE with respect to the barriers \(b_l\) and \(b_u\) is presented in Fig. 6.
The malaria data and the meteorological data after clustering with optimal barriers are given in Fig. 7.
Panel regression models
We investigated variable selection for model specification, considering criteria such as fit, complexity, insignificance, negative marginal effects, and multicollinearity stemming from certain variables. For fit and complexity evaluation, we aimed to minimize the Bayesian Information Criterion (BIC) [43, 50]. The BIC incorporates a likelihood function L and penalizes the number of parameters (k) more heavily compared to the Akaike Information Criterion (AIC) [51], particularly for large observation sizes, by including a term proportional to \(\log (N)\), where N represents the sample size. Our objective was to reduce BIC by eliminating certain variables and addressing issues of insignificance and multicollinearity as well.
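For reference, the two criteria in their standard forms, with \(k\) parameters, sample size \(N\) and maximized likelihood \(L\), are

\[\text {BIC}=k\ln (N)-2\ln (L),\qquad \text {AIC}=2k-2\ln (L).\]

The per-parameter penalty is \(\ln (N)\) for BIC and 2 for AIC, so BIC penalizes additional variables more heavily as soon as \(\ln (N)>2\), that is, for \(N\ge 8\).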
Significance testing was performed using the standard t-test, while multicollinearity assessment involved computing the Inverse Variance Inflation Factor (1/VIF) for all explanatory variables except the constant term. A 1/VIF value below the threshold of 0.1 indicates multicollinearity associated with the tested variable [52]. Additionally, we monitored the p-value of the F-statistic, which indicates whether the overall set of variables is jointly significant; a p-value smaller than the significance level \(\alpha =0.05\) suggests significance. This approach not only evaluates the model’s adequacy beyond a constant term model but also helps diagnose multicollinearity.
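The threshold can be read through the standard definition of the variance inflation factor. Writing \(\rho ^2_m\) for the coefficient of determination obtained when the \(m\)-th explanatory variable is regressed on all the others (the symbol \(\rho ^2_m\) is introduced here to avoid confusion with the rainfall matrix \(R\)),

\[\text {VIF}_m=\frac{1}{1-\rho ^2_m},\qquad \frac{1}{\text {VIF}_m}=1-\rho ^2_m.\]

A value of 1/VIF below 0.1 therefore corresponds to \(\rho ^2_m>0.9\): more than 90% of the variation in that variable is already explained by the remaining regressors.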
In this study, two different models were examined: the first model (Model 1) featured incidence clustering with optimal barriers, while the second model (Model 2) incorporated lag incidence cases and individual-specific effects. The emergence of individual-specific effects automatically categorizes the model as either a fixed-effect or a random-effect model. We opted for a random-effect model to account for variability not explicitly addressed by the model variables, a choice also supported by a Durbin-Wu-Hausman test.
The model including lag incidence cases and individual-specific effects exhibited the highest Adjusted \(R^2\) Value (0.8036) and lowest BIC value (2222.5), outperforming the model solely based on incidence clustering, which had \(R^2\) and BIC values of 0.7125 and 1804.9 respectively. Model 2 highlights the significance of individual-specific effects on malaria incidence in five cities (Anambra, Ebonyi, Kastina, Nassarawa, and Ogun), indicating meaningful variability in malaria incidence outcomes specific to these cities beyond what can be explained by observed explanatory variables (rainfall and temperature).
Then, we checked whether certain marginal effects were consistent with our autocorrelation study. Table 2 shows that cases in the past 6 months positively predict present cases, with the weakest autocorrelations found for cases from the last 4 to 6 months. The case-specific autocorrelation supports the model specification where lag-1 to lag-3 incidence are significant predictors for present incidence, whereas lag-2 incidence was omitted due to negative marginal effects that may have resulted from certain model specifications.
In both models, all marginal effects corresponding to the rainfall and temperature matrices are positive, except for the effect of rain in the upper cluster, which exhibited a negative marginal effect. This suggests that higher rainfall leads to lower incidence in the upper cluster. In Model 2, both rainfall and temperature have the highest marginal effect in the lower cluster and the least effect in the upper cluster. This pattern is also observed for the marginal effects with respect to rainfall in Model 1, whereas temperature significantly predicts only incidence in the lower cluster. For both models, temperature could not significantly predict cases in the upper cluster.
We also checked that the residuals of the models follow a normal distribution, which is crucial for ensuring the validity, adequacy, and reliability of the associated inferences. It can be seen from Fig. 8 that the residuals of both models follow a normal distribution, with Model 2 conforming slightly better than the first model. The plot of the model fits with the data for Model 2 is given in Fig. 9.
Discussion
We employed a clustering approach to partition datasets into multiple subsets, not only enriching the explanatory variables but also accurately incorporating the role of weather in predicting specific ranges of incidence data. The clusteringintegrated regression models were complemented by optimal barriers. Given that varying the clustering barriers returns different modeling results, we aimed to identify optimal barriers that minimize the mean squared error.
In this study, insights from crosscorrelations and autocorrelations between weather factors (rainfall and temperature) and malaria incidence were utilized to incorporate suitable variables in the regression models.
Two models were deliberated: clusteringintegrated models with and without lag incidence and individualspecific effects. The selection of the model, along with its implications (marginal effects), hinges on the decisionmaker’s priorities. When \(R^2\) and BIC are of paramount importance, we advocated for the clusteringintegrated model with lag incidence cases and individualspecific effects. Notably, the significance of certain individualspecific effects suggests substantial variability in malaria incidence outcomes specific to these five cities, beyond the explanatory capacity of observed variables (rainfall and temperature). Indeed factors, such as mosquito breeding site availability and human behaviors (e.g., healthcareseeking practices, bed net usage), can influence these effects [53, 54].
In both models, all marginal effects related to the rainfall and temperature matrices are positive, except for rainfall’s effect in the upper cluster, which is negative. This suggests that higher rainfall correlates with lower incidence in the upper cluster, reflecting an intricate and context-dependent relationship. Indeed, high rainfall is known to eliminate mosquito breeding sites [21, 22]. The clustering-integrated model comprising only weather components is preferable when weather takes precedence over lagged incidence or when data on individual-specific effects are lacking.
In Model 2, both rainfall and temperature have the highest marginal effect in the lower cluster and the lowest in the upper cluster. This pattern is consistent with Model 1’s marginal effects for rainfall, whereas temperature significantly predicts incidence only in the lower cluster. Interestingly, temperature fails to significantly predict cases in the upper cluster, which suggests that rainfall can predict incidence consistently throughout the year, while temperature can predict only low-to-medium incidence scenarios. Thus, while temperature and rainfall may influence disease incidence under certain conditions, their predictive power varies with the severity of the outbreak and other contextual factors.
Incidence clustering mitigates the increasing demand for confounding factors to explain various incidence levels. This approach supports both the consideration of specific hypothetical factors for predicting malaria incidence and conventional regression modeling with limited explanatory variables [38, 39, 55]. Localizing the incidence ranges that weather components predict accurately has significant implications for public health authorities: the marginal effects indicate the extent of prediction, and the models facilitate proactive measures ahead of impending weather changes.
Ultimately, the present study highlights the importance of compiling data on additional confounding factors (e.g., other weather components, bed net availability and usage, presence of stagnant water bodies, etc.), which not only introduce more explanatory variables but also enhance the reliability of the analysis.
Availability of data and materials
All the data sources have been mentioned. The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.
References
Guinovart C, Navia MM, Tanner M, Alonso PL. Malaria: burden of disease. Curr Mol Med. 2006;6(2):137–40.
Bonner P, et al. Parasite burden and severity of malaria in Tanzanian children. N Engl J Med. 2014;370(19):1799–808.
Beier JC, Killeen GF, Githure JI. Entomologic inoculation rates and Plasmodium falciparum malaria prevalence in Africa. Am J Trop Med Hyg. 1999;61(1):109–13.
Emmanuel OE, Amzat J. Problems of malaria menace and behavioural intervention for its management in sub-Saharan Africa. J Hum Ecol. 2007;21(2):155–62.
World Health Organization. World malaria report, 2015. Geneva: WHO; 2015.
Sachs J, Malaney P. The economic and social burden of malaria. Nature. 2002;415(6872):680.
Scott N, Ataide R, Wilson DP, Hellard M, Price RN, Simpson JA, Fowkes FJ. Implications of population-level immunity for the emergence of artemisinin-resistant malaria: a mathematical model. Malar J. 2018;17(1):279.
Sherman IW. The elusive malaria vaccine: miracle or mirage? Washington, DC: ASM Press; 2009.
Matuschewski K. Vaccines against malaria still a long way to go. FEBS J. 2017;284(16):2560–8.
El-Moamly AA, El-Sweify MA. Malaria vaccines: the 60-year journey of hope and final success – lessons learned and future prospects. Trop Med Health. 2023;51(1):29.
Stanisic DI, Good MF. Malaria vaccines: progress to date. BioDrugs. 2023;37(6):737–56.
WHO. World malaria report. Geneva; 2020. https://www.who.int/teams/globalmalariaprogramme/reports/worldmalariareport2021. Accessed 9 Aug 2023.
De Silva PM, Marshall JM. Factors contributing to urban malaria transmission in sub-Saharan Africa: a systematic review. J Trop Med. 2012;2012(1):819563.
Okunlola OA, Oyeyemi OT, Lukman AF. Modeling the relationship between malaria prevalence and insecticide-treated bed net coverage in Nigeria using a Bayesian spatial generalized linear mixed model with a Leroux prior. Epidemiol Health. 2021;43:e2021041. https://doi.org/10.4178/epih.e2021041.
Govella NJ, Okumu FO, Killeen GF. Insecticide-treated nets can reduce malaria transmission by mosquitoes which feed outdoors. Am J Trop Med Hyg. 2010;82(3):415–9.
Killeen GF, et al. Preventing childhood malaria in Africa by protecting adults from mosquitoes with insecticide-treated nets. PLoS Med. 2007;4(7):e229.
Killeen GF, Smith TA. Exploring the contributions of bed nets, cattle, insecticides and excito-repellency to malaria control: a deterministic model of mosquito host-seeking behaviour and mortality. Trans R Soc Trop Med Hyg. 2007;101(9):867–80.
Konlan KD, Kossi Vivor N, Gegefe I, Hayford L. Factors associated with ownership and utilization of insecticide treated nets among children under five years in sub-Saharan Africa. BMC Public Health. 2022;22(1):940.
Israel OK, Fawole OI, Adebowale AS, Ajayi IO, Yusuf OB, Oladimeji A, Ajumobi O. Caregivers’ knowledge and utilization of long-lasting insecticidal nets among under-five children in Osun State, Southwest, Nigeria. Malar J. 2018;17:1–9.
Arab A, Jackson MC, Kongoli C. Modelling the effects of weather and climate on malaria distributions in West Africa. Malar J. 2014;13(1):126.
Eikenberry SE, Gumel AB. Mathematical modeling of climate change and malaria transmission dynamics: a historical review. J Math Biol. 2018;77:857–933.
Chaves LF, et al. Indian Ocean dipole and rainfall drive a Moran effect in East Africa malaria transmission. J Infect Dis. 2012;205(12):1885–91.
Parham PE, Michael E. Modeling the effects of weather and climate change on malaria transmission. Environ Health Perspect. 2009;118(5):620–6.
Kurup R, Deonarine G, Ansari AA. Malaria trend and effect of rainfall and temperature within Regions 7 and 8, Guyana. Int J Mosq Res. 2017;4(6):48–55.
Devi NP, Jauhari RK. Climatic variables and malaria incidence in Dehradun, Uttaranchal, India. J Vector-Borne Dis. 2006;43(1):21.
Evans OP, Adenomon MO. Modeling the prevalence of malaria in Niger State: An application of Poisson regression and negative binomial regression models. Int J Phys Sci. 2014;2:61–8.
Segun OE, Shohaimi S, Nallapan M, Lamidi-Sarumoh AA, Salari N. Statistical Modelling of the Effects of Weather Factors on Malaria Occurrence in Abuja, Nigeria. Int J Environ Res Public Health. 2020;17(10):3474. https://doi.org/10.3390/ijerph17103474.
Gunda R, Chimbari MJ, Shamu S, Sartorius B, Mukaratirwa S. Malaria incidence trends and their association with climatic variables in rural Gwanda, Zimbabwe, 2005–2015. Malar J. 2017;16(1):1–3.
Akinbobola A, Omotosho JB. Predicting Malaria occurrence in Southwest and North Central Nigeria using Meteorological parameters. Int J Biometeorol. 2013;57:721–8.
Okunlola OA, Oyeyemi OT. Spatiotemporal analysis of association between incidence of malaria and environmental predictors of malaria transmission in Nigeria. Sci Rep. 2019;9(1):17500.
Zacarias OP, Andersson M. Spatial and temporal patterns of malaria incidence in Mozambique. Malar J. 2011;10(1):189.
Alemu A, et al. Climatic variables and malaria transmission dynamics in Jimma town, South West Ethiopia. Parasites Vectors. 2011;4(1):30.
Roca-Feltrer A, Schellenberg JR, Smith L, Carneiro I. A simple method for defining malaria seasonality. Malar J. 2009;8(1):1–4.
Ibrahim OR, Lugga AS, Ibrahim N, Aladesua O, Ibrahim LM, Suleiman BA, Suleiman BM. Impact of climatic variables on childhood severe malaria in a tertiary health facility in northern Nigeria. Sudan J Paediatr. 2021;21(2):173–81. https://doi.org/10.24911/SJP.1061599226765.
Samdi LM, Ajayi JA, Oguche S, Ayanlade A. Seasonal variation of malaria parasite density in paediatric population of Northeastern Nigeria. Glob J Health Sci. 2012;4(2):103–9. https://doi.org/10.5539/gjhs.v4n2p103.
West BT, Welch KB, Galecki AT. Linear mixed models: a practical guide using statistical software. Boca Raton: Chapman & Hall/CRC; 2007.
Strand S, Cadwallader C, Firth D. Using statistical regression methods in education research. Southampton: The ReStore team, National Centre for Research Methods; 2011.
Ganegoda NC, et al. Interrelationship between daily COVID-19 cases and average temperature as well as relative humidity in Germany. Sci Rep. 2021;11(1):11302.
Wijaya KP, et al. Learning from panel data of dengue incidence and meteorological factors in Jakarta, Indonesia. Stoch Env Res Risk A. 2021;35:437–56.
Sheytanova T. The accuracy of the Hausman Test in panel data: A Monte Carlo study. Sweden: Örebro University; 2015. http://oru.divaportal.org/smash/get/diva2:805823/FULLTEXT01.pdf. Accessed 13 Dec 2023.
Baltagi BH, Liu L. Random effects, fixed effects and Hausman’s test for the generalized mixed regressive spatial autoregressive panel data model. Econ Rev. 2016;35(4):638–58.
Schmidheiny K, Universität Basel. Panel data: fixed and random effects. Short Guides Microeconometrics. 2011;7(1):2–7.
Frees EW. Longitudinal and panel data: analysis and applications in the social sciences. New York: Cambridge University Press; 2004.
Hu X, Eberhart R. Solving constrained nonlinear optimization problems with particle swarm optimization. In: Proceedings of the sixth world multiconference on systemics, cybernetics and informatics. Winter Garden: International Institute of Informatics and Systemics (IIIS); 2002.
Friebel L, Friebelová J. Transformation of an empirical distribution to normal distribution by the use of Johnson system of translation and symmetrical quantile method. Acta Univ Bohemiae Meridionales. 2006;9(1):75–9.
Krefis AC, Schwarz NG, Krüger A, Fobil J, Nkrumah B, Acquah S, Loag W, Sarpong N, Adu-Sarkodie Y, Ranft U, May J. Modeling the relationship between precipitation and malaria incidence in children from a holoendemic area in Ghana. Am J Trop Med Hyg. 2011;84(2):285.
Dlamini SN, Fall IS, Mabaso SD. Bayesian Geostatistical Modeling to Assess Malaria Seasonality and Monthly Incidence Risk in Eswatini. J Epidemiol Global Health. 2022;12(3):340–61.
Briët OJ, Vounatsou P, Gunawardena DM, Galappaththy GN, Amerasinghe PH. Temporal correlation between malaria and rainfall in Sri Lanka. Malar J. 2008;7:1–4.
Senvar O, Sennaroglu B. Comparing performances of Clements, Box-Cox, Johnson methods with Weibull distributions for assessing process capability. J Ind Eng Manag. 2016;9(3):634–56.
Raftery AE. Bayesian model selection in social research. Sociol Methodol. 1995;25:111–63.
Akaike H. Information theory and an extension of the maximum likelihood principle. In: Parzen E, Tanabe K, Kitagawa G, editors. Selected papers of Hirotugu Akaike. New York: Springer; 1998.
Mansfield ER, Helms BP. Detecting multicollinearity. Am Stat. 1982;36:158–60.
Singh R, Musa J, Singh S, Ebere UV. Knowledge, attitude and practices on malaria among the rural communities in Aliero, Northern Nigeria. J Fam Med Prim Care. 2014;3(1):39–44.
Fatunla OAT, Olatunya OS, Ogundare EO, Fatunla TO, Babatola AO, Adeniyi AT, Oyelami OA. Malaria prevention practices and malaria prevalence among children living in a rural community in Southwest Nigeria. J Infect Dev Ctries. 2022;16(2):352–61. https://doi.org/10.3855/jidc.14894.
Kuhn K, Campbell-Lendrum D, Haines A, Cox J, Corvalán C, Anker M. Using climate to predict infectious disease epidemics. Geneva: World Health Organization; 2005. pp. 16–20.
Funding
This work was supported by the Research Council of Finland.
Author information
Contributions
M.A. drafted the work and performed the computations; K.K.W.H.E. interpreted data and conducted preliminary analysis. All authors reviewed earlier drafts and approved the final version.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
All authors have given their consent for the publication of this manuscript.
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Amadi, M., Erandi, K. Assessing the relationship between malaria incidence levels and meteorological factors using cluster-integrated regression. BMC Infect Dis 24, 664 (2024). https://doi.org/10.1186/s1287902409570z