Study of the influence of meteorological factors on HFMD and prediction based on the LSTM algorithm in Fuzhou, China
BMC Infectious Diseases volume 23, Article number: 299 (2023)
This study adopted complete meteorological indicators, including eight items, to explore their impact on hand, foot, and mouth disease (HFMD) in Fuzhou, and predict the incidence of HFMD through the long short-term memory (LSTM) neural network algorithm of artificial intelligence.
A distributed lag nonlinear model (DLNM) was used to analyse the influence of meteorological factors on HFMD in Fuzhou from 2010 to 2021. Then, the numbers of HFMD cases in 2019, 2020 and 2021 were predicted using the LSTM model through multifactor single-step and multistep rolling methods. The root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and symmetric mean absolute percentage error (SMAPE) were used to evaluate the accuracy of the model predictions.
Overall, the effect of daily precipitation on HFMD was not significant. Low (4 hPa) and high (≥ 21 hPa) daily air pressure difference (PRSD) and low (< 7 °C) and high (> 12 °C) daily air temperature difference (TEMD) were risk factors for HFMD. From 2019 to 2021, the RMSE, MAE, MAPE and SMAPE obtained when weekly multifactor data were used to predict the cases of HFMD on the following day were lower than those obtained when daily multifactor data were used. In particular, the RMSE, MAE, MAPE and SMAPE of using weekly multifactor data to predict the following week's daily average cases of HFMD were much lower, and similar results were found in urban and rural areas, indicating that this approach was more accurate.
This study’s LSTM models combined with meteorological factors (excluding PRE) can be used to accurately predict HFMD in Fuzhou, especially the method of predicting the daily average cases of HFMD in the following week using weekly multifactor data.
Hand, foot, and mouth disease (HFMD) is a common infectious disease in children caused by enterovirus infection. Its symptoms are mainly oral pain, anorexia, fever, and minor herpes or ulcers in the hands, feet, mouth, and other body parts. It can lead to fatal complications in severe cases, such as myocarditis, pulmonary oedema, and aseptic meningoencephalitis [1, 2].
HFMD can be transmitted through contact with respiratory secretions, droplets, and pollutants from infected individuals or through the faecal-oral route, which can easily cause clustered outbreaks in schools, thus affecting children's everyday life and learning. HFMD has led to many outbreaks worldwide and has become a public health problem in Asia. In recent years, the reported incidence of HFMD has ranked second only to viral hepatitis among infectious diseases classified under the Infectious Disease Control and Prevention Act in Fujian Province, China, with a substantial social impact that has attracted considerable attention from relevant departments.
Meteorological factors have been recognized as risk factors associated with HFMD epidemics [4-7]. Researchers from various countries and regions have studied the impact of climate on HFMD, including air temperature, sunshine, relative humidity, wind speed, and precipitation. The findings, however, have not been entirely consistent. For instance, several studies have shown that the incidence of HFMD significantly increases as the air temperature increases. Nevertheless, some studies concluded that HFMD was not significantly affected by air temperature, and the air temperature range found to affect HFMD was not exactly the same across studies [7-14]. It has been reported that the impact of sunshine on HFMD increases with increasing sunshine intensity. However, another study showed a negative correlation between sunshine duration and the risk of HFMD infection [15-17]. The reasons for these differences include different analysis model schemes [e.g., generalized linear model (GLM), spatiotemporal zero-inflated negative binomial (ZINB) models, generalized additive mixed model (GAMM), distributed lag nonlinear models (DLNMs)], data types (e.g., daily data, weekly data and monthly data), and region-specific characteristics (e.g., socioeconomic factors, living environment, etc.) that may change the impact of meteorological factors on HFMD [7-18].
Fuzhou is the capital of Fujian Province in China and the province’s political, economic, and cultural centre (Fig. 1). It is an important city along the southeast coast of China and the gateway of the maritime Silk Road. Its climate is characterized by high wind, air pressure, and relative humidity. Fuzhou has the greatest incidence of HFMD among cities in Fujian Province. There has been no report on the influence of meteorological factors on HFMD and incidence prediction in Fuzhou. Therefore, considering the importance of Fuzhou and its representativeness in Fujian, it is necessary to understand the specific regional impact of meteorological factors on HFMD in Fuzhou.
The advantages of the DLNM include that it can handle nonlinear and delayed associations, such as exposure-lag-response relationships, through the cross-basis function, and that it works with the regression functions for the linear model (lm), glm and gam. Zero-inflated models cannot examine how or which covariates significantly affect non-occurrence in zero-inflated regions. In this study, DLNMs were used to analyse the relationship between the daily values of HFMD and meteorological factors in Fuzhou for 12 years from 2010 to 2021. There were eight meteorological indicators used in this study, including common indicators such as air temperature, relative humidity, precipitation, and sunshine, and other indicators that researchers do not commonly use. At present, there has been no research report on the impact of air pressure differences and air temperature differences on HFMD.
Compared with traditional machine learning methods, long short-term memory (LSTM), a deep learning model, produces better results [19-24]. Previous reports included comparisons between LSTM and other prediction methods, as well as between single-factor and multifactor LSTM predictions. To the best of our knowledge, no studies have compared the prediction accuracy for HFMD using different meteorological multifactor LSTM methods. In this study, the cases of HFMD were combined with meteorological variables and predicted using the LSTM model through multifactor single-step and multistep rolling methods, and the prediction effect was evaluated. The purpose was to provide a basis and technical support for constructing an HFMD prediction and early warning system in Fuzhou city and Fujian Province, and to help relevant departments detect and respond to possible HFMD outbreaks in advance.
Materials and methods
The HFMD and population data of Fuzhou from January 1, 2010, to December 31, 2021, were derived from the China Disease Prevention and Control Information System, and the daily meteorological data were derived from the meteorological data network of the China Meteorological Administration (http://data.cma.cn). The missing data were proofread and completed by the Fujian Climate Center. The population with HFMD was stratified by sex (male and female), age (0 ~ 3 years, 4 ~ 6 years, and ≥ 7 years) and area (urban and rural), of which the age-stratified population was divided according to the epidemiological characteristics of HFMD in Fuzhou. The meteorological factors in this study included 8 indicators: air pressure (PRS, hPa), air pressure difference (PRSD, hPa), relative humidity (RHU, %), precipitation (PRE, mm), air temperature (TEM, °C), air temperature difference (TEMD, °C), wind speed (WIN, m/s), and sunshine duration (SSD, h). PRS, RHU, TEM and WIN were measured as daily averages, PRE was measured as the daily cumulative precipitation, SSD was measured as the number of sunshine hours in one day, PRSD was defined as the difference between the maximum and minimum values of daily air pressure, and TEMD was defined as the difference between the highest and lowest values of daily air temperature. The number of lag days in this study was defined as the number of days delayed by the date of HFMD onset compared to the statistical date of the corresponding meteorological factors.
Statistical analysis of the data
The regional map of Fig. 1 was drawn using ArcGIS 10.2 software (ESRI, Redlands, CA, USA).
R 4.1.0 software (R Foundation for Statistical Computing, Vienna, Austria) was used to analyse the daily HFMD and meteorological data. First, a simple analysis of the HFMD and meteorological factors was conducted, and the time series for the variables were plotted. Then, a Spearman correlation analysis and correlation coefficient significance test map between the meteorological indicators and HFMD were generated, and differences with P < 0.05 were considered statistically significant. Finally, a DLNM was used to analyse the influence of meteorological factors on HFMD.
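The Spearman step can be illustrated with a short sketch. The study ran this analysis in R; the Python below is only an illustration of the technique, and the function names and toy numbers are hypothetical. It ranks both series (ties receive the average rank) and takes the Pearson correlation of the ranks; the significance test is not included.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman coefficient: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy series: daily mean temperature (°C) and daily case counts.
tem = [12.0, 15.5, 19.0, 24.5, 28.0, 30.5]
cases = [3, 5, 9, 20, 41, 38]
rho = spearman_rho(tem, cases)
```

Because only ranks enter the calculation, the coefficient captures monotonic association but carries no information about lag or confounding.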
The DLNM incorporates both nonlinear dependency and delay effects, with the essential goal of adding a lag dimension to the exposure–response relationship through a cross-basis function, thereby describing how the effect varies along both the exposure and lag dimensions. A cross-basis matrix for daily meteorological and HFMD data was established, and the quasi-Poisson link function was used for estimation. After controlling for the effects of day of the week, seasonality and long-term trends [26, 27], the relationship between meteorological factors and HFMD was fitted using the DLNM. The basic model is as follows:
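A form of the model consistent with the symbol definitions that follow can be written as below. It assumes the log link that is usual for quasi-Poisson regression and that each meteorological factor enters through its cross-basis; the exact layout of the published equation may differ.

```latex
\log\left[E(Y_t)\right] = \alpha + \sum_{i} \beta_i x_i + \sum_{j} NS(Z_j, df) + Dow
```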
Yt is the number of HFMD cases on day t, α is the constant term, xi is the influencing factor, βi is the coefficient, Zj is the potential confounding factor, Dow is the dummy variable for the effect of the day of the week, df is the degrees of freedom, and NS (⋯) is a natural spline function. Lag days and df were determined by minimizing the Akaike information criterion (AIC), which gave a df of 3 for every meteorological factor in this study. Accounting for the epidemic characteristics, incubation period, and pretest results of HFMD, the maximum lag was set to 14 days, and the cumulative effects of meteorological factors on the risk of HFMD in each population were measured with lags of 3 d, 7 d and 14 d. The average of each meteorological factor was used as a reference value.
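The lag structure can be made concrete with a minimal sketch. The Python below is an illustration, not the study's R code. It builds the lagged exposure matrix that a cross-basis is computed from, and shows how lag-specific relative risks combine into a cumulative one by summing on the log scale. It assumes that a cumulative effect "with a lag of k days" covers lags 0 to k; the function names and all numbers are hypothetical.

```python
from math import exp, log

MAX_LAG = 14  # maximum lag in days, as stated in the text

def lag_matrix(series, max_lag=MAX_LAG):
    """Row for day t holds the exposure at lags 0..max_lag before day t.

    Days without a full lag history are skipped, so the first row
    corresponds to day index max_lag.
    """
    rows = []
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in range(max_lag + 1)])
    return rows

def cumulative_rr(rr_by_lag, upto_lag):
    """Combine lag-specific RRs for lags 0..upto_lag (a sum on the log scale)."""
    return exp(sum(log(rr) for rr in rr_by_lag[: upto_lag + 1]))

# Hypothetical daily temperature series of 20 days.
tem = [float(20 + d % 5) for d in range(20)]
lags = lag_matrix(tem)

# Hypothetical lag-specific relative risks, NOT values from the study.
rr_by_lag = [1.02] * (MAX_LAG + 1)
cum_3, cum_7, cum_14 = (cumulative_rr(rr_by_lag, k) for k in (3, 7, 14))
```

In this toy case a small per-lag risk of 1.02 compounds as more lags are included, which is one reason cumulative estimates at a lag of 14 days can be much larger than single-day estimates.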
Python 3.8.13 software (Python Software Foundation, Delaware, USA) and TensorFlow 2.8.0 software (Google Brain Team, Mountain View, CA, USA) were used to predict the daily and weekly cases of HFMD through LSTM combined with meteorological factors, and the results were plotted.
LSTM is an artificial intelligence deep learning algorithm that is suitable for time series data analysis. Its key feature is the recurrent connection between neurons at successive time steps, which lets the network carry information along the time series. The neurons change their state information with the previous data flow, process the current input data according to the current state and output the results. This structure gives neurons a certain memory ability. LSTM has a well-designed structure called a gate to remove or add information to the neuron state to avoid the problem of long-term dependence and retain the long-term information in the sequence. Gates provide a means for information to be passed selectively. LSTM has three gates, a forgetting gate, an input gate and an output gate, to protect and control the state of neurons.
The core idea of LSTM is shown in Fig. 2.
The first step is to decide what information to discard from the neuron state, which is done through the sigmoid layer of the "forgetting gate". ht−1 represents the output of the previous neuron state, Xt represents the input of the current neuron state, and σ represents the sigmoid function. The sigmoid layer outputs a numeric value between 0 and 1, denoting how much of each part can pass through, with 0 representing complete discard and 1 representing complete retention. The expression is as follows:
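In the notation commonly used for LSTM, the forgetting gate can be written as below. The weight matrix W_f and bias b_f are assumed symbols that the text does not define, and [h_{t-1}, X_t] denotes the concatenation of the previous output and the current input.

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right)
```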
The second step is to determine what new information is stored in the neuron state. There are two parts: first, the sigmoid layer of the "input gate" determines which values will be updated; then, the tanh layer creates a new candidate value vector C̃t (with values between -1 and 1). The candidate is multiplied by the output of the input gate and added to the old state after that state has been scaled by the forgetting gate, so that Ct-1 is updated to Ct.
The expression is as follows:
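In the commonly used notation, the second step can be written as below. The weights W_i, W_C and biases b_i, b_C are assumed symbols, f_t is the output of the forgetting gate from the first step, and ⊙ denotes element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, X_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
\end{aligned}
```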
Finally, the output must be determined by the "output gate". First the sigmoid layer is run to determine which part of the neuron state to output; then, the neuron state is processed by tanh (given a value between -1 and 1) and multiplied by the output of the sigmoid gate, and finally, the determined part is output. The expression is as follows:
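In the commonly used notation, with W_o and b_o as assumed symbols, the output gate is o_t = σ(W_o · [h_{t-1}, X_t] + b_o) and the output is h_t = o_t ⊙ tanh(C_t). The three steps can be sketched together as one cell update. The Python below is a minimal single-unit illustration with scalar weights, not the TensorFlow implementation used in the study; the parameter layout and all values are hypothetical.

```python
from math import exp, tanh

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update for a single unit with scalar input and state.

    p maps each gate name to (w_h, w_x, b): hypothetical scalar weights on
    h_prev and x_t plus a bias.
    """
    def pre(name):
        w_h, w_x, b = p[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre("forget"))        # step 1: share of the old state to keep
    i_t = sigmoid(pre("input"))         # step 2a: share of the candidate to admit
    c_tilde = tanh(pre("candidate"))    # step 2b: candidate value in (-1, 1)
    c_t = f_t * c_prev + i_t * c_tilde  # updated neuron state
    o_t = sigmoid(pre("output"))        # step 3a: share of the state to expose
    h_t = o_t * tanh(c_t)               # step 3b: output in (-1, 1)
    return h_t, c_t

# Hypothetical parameters, chosen only to make the gates easy to read.
params = {
    "forget": (0.0, 0.0, 0.0),     # sigmoid(0) = 0.5: keep half of the old state
    "input": (0.0, 0.0, 0.0),      # sigmoid(0) = 0.5: admit half of the candidate
    "candidate": (0.0, 1.0, 0.0),  # candidate = tanh(x_t)
    "output": (0.0, 0.0, 0.0),     # sigmoid(0) = 0.5: expose half of tanh(c_t)
}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, p=params)
```

With these values the new state is half of the old state plus half of tanh(1), which shows how the forgetting and input gates blend memory with new input.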
The root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and symmetric mean absolute percentage error (SMAPE) were used to quantify the accuracy of the model's predictions, and the smaller the value was, the higher the prediction accuracy and the higher the confidence [28-31].
The RMSE calculation formula is as follows:
The MAE calculation formula is as follows:
The MAPE calculation formula is as follows:
The SMAPE calculation formula is as follows:
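Written in their commonly used forms, with Pi the observed value and Xi the predicted value on day i and n the number of days, the four measures are as follows. SMAPE has several variants in the literature; the one with the mean of |Pi| and |Xi| in the denominator is assumed here. MAPE is undefined on days with Pi = 0.

```latex
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - X_i\right)^2} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|P_i - X_i\right| \\
\mathrm{MAPE} &= \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{P_i - X_i}{P_i}\right| \\
\mathrm{SMAPE} &= \frac{100\%}{n}\sum_{i=1}^{n}\frac{\left|X_i - P_i\right|}{\left(\left|P_i\right| + \left|X_i\right|\right)/2}
\end{aligned}
```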
In the above formulas, Pi is the observed daily number of HFMD cases on day i, and Xi is the predicted daily number of HFMD cases on day i, where i = 1, …, n.
In this study, we designed a prediction algorithm based on LSTM to capture the temporal relationship in the sequence. The network model was trained with historical data until it converged. The historical time series data were multifactorial, including time, climate data and HFMD data. After encoding, the data were fed into the LSTM to capture the temporal relationship; the encoded outputs were then concatenated and passed to a fully connected layer, which output the time series prediction. A brief description of the operation is shown in Fig. 3.
First, the meteorological and HFMD data from 2010 to 2018 were trained and modelled to predict the cases of HFMD in 2019. Then, the data from 2010 to 2019 were trained and modelled to predict the cases of HFMD in 2020. Finally, the data from 2010 to 2020 were trained and modelled to predict the cases of HFMD in 2021. The prediction was realized by single-step and multistep rolling. Three schemes were adopted in this study. The first method was to input the multifactor value of 1 day to predict the cases of HFMD on the next day. The second method was to input the multifactor value of 7 days to predict the cases of HFMD on the next day. The third method was to input the multifactor value of 7 days to predict the daily average cases of HFMD in the next 7 days. These prediction methods required continuous rolling.
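The three schemes differ only in the length of the input window and of the target window, which a minimal sketch can make explicit. The Python below is an illustration under stated assumptions, not the study's code: the window is assumed to roll forward one day at a time, past case counts are assumed to be among the input features, and the function name and toy data are hypothetical.

```python
def make_windows(features, cases, n_in, n_out):
    """Build rolling (input, target) pairs.

    features: one feature vector per day (meteorology plus past cases).
    cases:    daily case counts aligned with features.
    n_in:     number of consecutive days fed to the model.
    n_out:    number of following days whose mean case count is the target.
    """
    samples = []
    last_start = len(cases) - n_in - n_out
    for start in range(last_start + 1):
        x = features[start : start + n_in]
        future = cases[start + n_in : start + n_in + n_out]
        samples.append((x, sum(future) / n_out))
    return samples

# Hypothetical 14-day toy data with two features per day.
cases = [4, 6, 5, 7, 9, 8, 10, 12, 11, 13, 15, 14, 16, 18]
features = [[20.0 + d, float(c)] for d, c in enumerate(cases)]

day_daily = make_windows(features, cases, n_in=1, n_out=1)    # first scheme
week_daily = make_windows(features, cases, n_in=7, n_out=1)   # second scheme
week_weekly = make_windows(features, cases, n_in=7, n_out=7)  # third scheme
```

Averaging the target over seven days smooths day-to-day noise in the case counts, which may be part of why the third scheme is easier to predict.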
In total, 161,477 HFMD cases were reported in Fuzhou over the study period, with an incidence rate of 187.42/100,000 people and 16 deaths. The incidence rates (1/100000) among males and females were 222.97 and 143.70, respectively. The incidence rates (1/100000) among children aged 0 ~ 3 years, children aged 4 ~ 6 years and children aged ≥ 7 years were 3321.40, 1030.99 and 7.19, respectively.
Table 1 shows significant differences between the sex, age, and area groups of the HFMD-affected population (P < 0.001). Table 2 reports the descriptive statistics for the daily cases of HFMD and meteorological variables.
Figure 4 shows the time series of HFMD and meteorological factors, with a specific seasonal periodicity that shows consistency in their fluctuations, thus indicating a correlation and lag between HFMD and meteorological factors.
The correlation analysis demonstrated a curved correlation between most meteorological factors and HFMD and between the meteorological factors (P < 0.05). RHU, PRE, TEM, WIN, and SSD were significantly positively correlated with HFMD (r > 0, P < 0.01), while PRS and PRSD were significantly negatively correlated with HFMD (r < 0, P < 0.01). Among them, TEM, PRS, and PRSD had the most significant relationship with HFMD, while the relationship between TEMD and HFMD was not noticeable. PRS, PRE, and TEM were significantly correlated with other meteorological factors (P < 0.05). The detailed correlation between HFMD and meteorological factors is presented in Fig. 5.
The risk effect of PRS on HFMD increased gradually in waves with the increase in PRS. Medium PRS (993–1005 hPa) and high PRS (> 1015 hPa) were risk factors for HFMD. The cumulative effect increased with the increase in lag days, and the risk peaks were at 998 hPa (lag 14 d, RR: 1.36, 95% CI: 1.24–1.48) and 1026 hPa (lag 14 d, RR: 7.59, 95% CI: 4.45–12.95), respectively. The cumulative effects of PRS on the risk of HFMD among males, children aged 4 ~ 6 years, and rural populations were more significant. However, the cumulative risk effect of PRS on HFMD among children aged 4 ~ 6 years first decreased and then increased with the increase in lag days, whereas the cumulative risk effect among children aged 0 ~ 3 years and ≥ 7 years continued to increase.
Low (4 hPa) and high (≥ 21 hPa) PRSDs were risk factors for HFMD, and the related peak existed at 24 hPa with a lag of 0 days (RR: 1.06, 95% CI: 0.77–1.45). With the increase in lag days, the cumulative risk effect of a low PRSD did not decrease significantly, while that of a high PRSD decreased rapidly. The cumulative effects of PRSD on the risk of HFMD among females, children aged 4 ~ 6 years, and urban populations were more significant. However, the cumulative risk effect of PRSD on HFMD in the ≥ 7-year-old population first decreased and then increased with the increase in lag days, and the RR rose to 17.12 after a lag of 14 days at 24 hPa.
Low (27–56%) and high (> 73%) RHU were risk factors for HFMD. With the increase in lag days, the cumulative effect of low RHU (< 35%) on the risk of HFMD increased rapidly (27%, lag 14 d, RR = 2.68, 95% CI: 1.44–4.99), while that of medium RHU (> 35%) decreased gradually, and that of RHU (41–56%) faded gradually. The cumulative effects of RHU on the risk of HFMD in female and rural populations were more significant.
Overall, the effect of PRE on HFMD was not significant, although high PRE (> 82 mm) had a significant effect on HFMD among males, children aged 4 ~ 6 years, and rural populations.
Low (≤ 3 °C) and high (> 21 °C) TEMs were risk factors for HFMD. With the increase in TEM and lag days, the cumulative effect of high TEM on the risk of HFMD increased rapidly (33 °C, lag 14 d, RR = 3.51, 95% CI: 2.84–4.34). The cumulative effects of TEM on the risk of HFMD among children aged 0 ~ 3 years and ≥ 7 years and rural populations were more significant. However, the risk of HFMD was not significantly different between men and women.
Low (< 7 °C) and high (> 12 °C) TEMDs were risk factors for HFMD. With the increase in TEMD and lag days, the cumulative effect of a low TEMD on the risk of HFMD decreased (1 °C, lag 3 d, RR = 1.27, 95% CI: 1.12–1.43), while the cumulative risk effect of a high TEMD continued to increase (17 °C, lag 14 d, RR = 2.04, 95% CI: 1.31–3.19). Compared with that observed in urban populations, the cumulative effect of TEMD on HFMD in rural populations was more prominent.
With the increase in WIN, its cumulative effect on the risk of HFMD first decreased and then increased. The RR of the cumulative effect of WIN on the risk of HFMD at a lag of 14 days decreased from 1.21 at 1 m/s to 0.93 at 3 m/s and then increased rapidly to 955.45 at 9 m/s. The cumulative effects of WIN on the risk of HFMD among males, the ≥ 7-year-old population, and urban populations were more significant.
Low (2–4 h) SSD was a risk factor for HFMD, and the cumulative effect increased with increasing lag days (3 h, lag 14 d, RR = 1.06, 95% CI: 1.01–1.12).
HFMD cases were predicted and evaluated separately for the whole city, rural areas and urban areas. Figure 8 shows that the cases of HFMD predicted by the three methods from 2019 to 2021 were in good agreement with the actual values and had high accuracy. Figure 9 shows that, from 2019 to 2021, the RMSE, MAE, MAPE and SMAPE obtained when weekly multifactor data were used to predict the cases of HFMD on the next day were lower than those obtained when daily multifactor data were used. In particular, the RMSE, MAE, MAPE and SMAPE of using weekly multifactor data to predict the next week's daily average cases of HFMD were much lower, and similar results were found in rural and urban areas, indicating that this approach was more accurate.
Figure 4 shows that the time series fluctuations of HFMD and PRE had obvious consistency. Nevertheless, the DLNM showed that, overall, the effect of PRE on the risk of HFMD was not significant, although high PRE (> 82 mm) significantly affected HFMD risk among males, children aged 4 ~ 6 years, and rural populations. These results seemed contradictory. However, Fig. 4 shows that the number of cases of HFMD in the peak period of PRE from 2020 to 2021 was substantially lower than in previous years. With the emergence of the coronavirus disease 2019 (COVID-19) pandemic, protective and control measures such as restricted movement, reduced physical contact in public places, frequent hand washing, appropriate ventilation, and disinfection of public areas were initiated. All these measures carried out to prevent coronavirus transmission reduced the probability of contracting HFMD. More importantly, the suspension of classes because of the COVID-19 pandemic substantially reduced the number of HFMD outbreaks, especially in 2020. Due to the impact of the COVID-19 pandemic, the incidence of HFMD decreased abnormally for two consecutive years, 2020 and 2021, which may have affected the accuracy of the assessment of the cumulative effect of PRE on HFMD through the DLNM.
Figure 5 shows the results of the Spearman rank correlation analysis, in which PRE was significantly positively correlated with the risk of HFMD, whereas the relationship between TEMD and HFMD risk was not apparent. Nevertheless, DLNM analysis showed that the effect of PRE on HFMD risk was not significant, while low (< 7 °C) and high (> 12 °C) TEMDs were risk factors for HFMD. The DLNM integrates nonlinear dependence and delay effects and controls for potential confounding factors, while Spearman rank correlation analysis lacks these functions, which means that using a DLNM to analyse the impact of meteorological factors on infectious diseases can yield more practical and specific results.
In this study, there was no significant correlation between PRE and HFMD risk. This result was consistent with those of some research reports [32, 33] but inconsistent with other research reports [13, 34]. This may be due to regional heterogeneity or the abnormal reduction in HFMD cases caused by COVID-19 prevention and control measures in recent years. Numerous studies have also emphasized the importance of temporal and spatial heterogeneity in meteorological impacts on infectious diseases [27, 35, 36]. In addition, this may be caused by differences in the analysis model scheme and data type. For example, in this study, RHU was significantly associated with HFMD risk, consistent with the findings of several daily value-based research reports [18, 37, 38], while some monthly value-based research reports showed no significant correlation between them [39, 40]. However, our results showed that the characteristics and value range of RHU affecting HFMD risk were different from those reported in other studies; it was not that the higher the humidity was, the more significant the impact. Moreover, this study showed that lower relative humidity is also a risk factor for HFMD. HFMD in Fuzhou has two peak outbreak periods in summer and autumn every year. Fuzhou has a typical subtropical monsoon climate. It is dominated by sunny, hot, and high-temperature weather in summer, with ample rainfall and high humidity. In autumn, the sky is clear and clouds are scarce, with sufficient sunshine, reduced humidity, and appropriate temperatures. During humid days, the HFMD virus could easily attach to small particles in the air or to toy surfaces; therefore, sharing toys and other supplies among children might promote the spread of the disease [41, 42]. 
In summer, increased RHU is usually accompanied by heavy rainfall in Fuzhou, so outdoor public facilities are frequently washed by rainwater, which reduces the attachment of pathogens, and children's outdoor activities are reduced when RHU is high. Thus, high (> 73%) RHU was a risk factor for HFMD in Fuzhou, and the cumulative risk effect increased first and then decreased with increasing RHU. Therefore, the risk associated with high (> 73%) RHU may mainly reflect the high incidence of HFMD in summer, and the risk associated with low RHU may mainly reflect the high incidence of HFMD in autumn.
There are few reports on the impact of PRS on HFMD. However, we found that the impact of PRS on the risk of HFMD increased gradually in waves with increasing PRS. Medium PRS (993–1005 hPa) and high PRS (> 1015 hPa) were risk factors for HFMD. In principle, the influencing factors of PRS include temperature, altitude, and air movement. PRS decreases with increasing TEM and increases with decreasing TEM. Fuzhou is mainly characterized by severe cold winters and a subtropical climate in summer and autumn. PRS increases the density of harmful gas and viruses floating in the air, allowing them to fall on the ground or objects quickly. For example, as one of the main pollutants, nitrogen dioxide (NO2) increases the risk of HFMD by affecting immunity, resulting in inflammation and weakening the body's resistance to viral infection. The peak of PRS in Fuzhou is distributed in winter. However, this peak is accompanied by low temperatures (the average temperature in winter is 11 °C), and low temperatures are not conducive to the growth and transmission of the HFMD virus in the external environment. Therefore, the incidence of HFMD in winter is not high. In addition, the cumulative effect of PRS on HFMD risk increased with the increase in lag days, showing that the impact did not easily subside and suggesting that PRS may play a chronic role.
This study showed that with the increase in TEM, the cumulative impact of high TEM (> 21 °C) on the risk of HFMD increases rapidly. Several studies have shown that the incidence of HFMD significantly increases as the temperature increases [33, 40, 43-46]. High temperatures can increase enterovirus growth and interfere with the inactivation and recovery of enteroviruses [47, 48]. Temperature can also affect the behavioural patterns of the host population; for instance, warm weather may encourage children to go out to public entertainment areas more often, thereby increasing their frequency of contact with each other and leading to more exposure to pathogens [49, 50]. In addition, the hands easily sweat in high temperatures, which is conducive to the breeding and cross-infection of viruses when in contact with the public. Children are even more active and sweat easily. However, in this study, we found that a low (≤ 3 °C) TEM was also a risk factor for HFMD. The possible underlying mechanism of HFMD can be explained by interactions of pathogens, host population structure, and environmental factors [34, 51, 52]. When the temperature is low, interactions often occur in relatively closed public places with poor ventilation and among people with poor handwashing habits. Moreover, in Fuzhou, RHU is usually very high in low-temperature seasons, such as the end of winter and early spring (not caused by heavy precipitation), which is conducive to the breeding and transmission of the virus.
The impact of the PRSD and TEMD on HFMD has not been reported, but in this study, we found that their low and high values were risk factors for HFMD. The cumulative effect of PRSD on the risk of HFMD among females was more significant, showing that immunity among females may be more susceptible to changes in air pressure. We also found that, compared with urban populations, the cumulative effect of TEMD on HFMD risk in rural populations was more prominent. Many rural areas in Fuzhou are distributed in mountainous regions, with apparent diurnal body temperature differences. In contrast, the differences in the TEMD and diurnal body temperature in urban areas are negligible due to the heat island effect. Nevertheless, the meteorological data measured in this study were from the same meteorological station. In other words, the meteorological values of urban and rural areas came from one meteorological station, and the measured values were the same, but the difference between the two meteorological environments was obvious, which may affect the meteorological evaluation of the HFMD risk effect.
To predict the incidence of HFMD more accurately through meteorological factors, we studied and evaluated various prediction methods. The methods commonly used in prediction, such as the susceptible-infectious-recovered (SIR) model, the autoregressive integrated moving average (ARIMA) model, and the recurrent neural network (RNN), have exhibited good performance, but they are still not satisfactory for the following reasons. The SIR model cannot fully use the information in multidimensional input data; ARIMA requires the time series to be stationary after differencing and can capture only linear relationships, not nonlinear ones. At the same time, vanishing gradients easily occur in RNNs, and long-distance dependence cannot be handled [22-24, 29, 53].
LSTM is an advanced RNN with the ability to learn time patterns and store valuable memories longer. Due to its unique design structure, LSTM can mitigate the vanishing gradient problem and model nonlinear relationships. In addition, it can incorporate meteorological factors and is also suitable for predicting important events with very long intervals and delays in time series. It has been reported that the accuracy of the LSTM model in predicting HFMD was better than that of other models [28, 54].
This study showed that the RMSE, MAE, MAPE and SMAPE values of the cases of HFMD predicted using the Day-Daily, Week-Daily, and Week-Weekly methods decreased in that order. This indicates that it was more accurate to predict HFMD cases using weekly multifactor data, especially to predict the daily average cases in the next week. The predictions for rural and urban areas showed a similar pattern, which further supports this result. Moreover, predicting the daily average cases of HFMD in the next week from weekly multifactor data was more in line with practical work. At the same time, this also indicates that the meteorological indicators in this study can accurately predict the incidence of HFMD through LSTM models.
However, overfitting should be avoided during modelling. LSTM models involve the risk of underfitting or overfitting, which often results in poor prediction performance [28, 55]. In addition, the model's performance deteriorates when the number of memory neurons is less than 32 or the number of training rounds is less than 250.
In summary, we introduced more abundant meteorological factors and screened out the common meteorological factor PRE in this study, which makes the multifactor parameter setting more comprehensive and reasonable. We also built a multifactor and multistep LSTM prediction model for infectious disease prevention that can flexibly adapt to the input parameters in different scenarios. In line with the prediction methods commonly used in infectious disease prevention and control, the LSTM model was adapted to the inputs of the three prediction methods, and the cases of HFMD in Fuzhou were predicted accurately. We also found that using weekly multifactor data to predict HFMD cases, especially the daily average cases in the next week, was the most accurate. According to the different needs of practical work, daily forecasts and weekly forecasts can be combined. These meteorological factors and prediction models can be incorporated into the HFMD early warning and prediction system of Fuzhou city and Fujian Province to provide a reference for formulating prevention strategies. They can also be used as risk predictions for adjusting people's lifestyles.
However, this study has some limitations. First, although meteorological factors are very important for the spread of HFMD, social behaviours, the economy, population mobility and air quality may also affect its occurrence and spread. In particular, when regions such as urban and rural areas are compared, the spread of infection is affected by differences in personal hygiene, including hand-washing, toileting habits, food handling habits and food handling personnel, although the predictions for urban and rural regions in this study were very accurate. Therefore, including more relevant influencing factors may make the prediction of HFMD more accurate. However, the influence of these factors is reflected to a certain extent in the number of HFMD cases. Therefore, when HFMD cases and meteorological factors are used as the multiple input factors, the latest HFMD and meteorological data should be incorporated into the prediction model at regular, short intervals, and HFMD cases should then be predicted again with the revised model; an interval of half a year or one year may be appropriate. Second, in the past two years, COVID-19 prevention and control measures, such as the suspension of classes and reduction in outdoor activities, have substantially reduced the incidence of HFMD; thus, the estimated impact of meteorology on HFMD and the prediction results may have been affected, and the degree of this effect needs to be further studied and evaluated. Third, pathogen-stratified analysis of HFMD was not carried out in this study because most cases were clinically diagnosed and lacked laboratory results. Because the HFMD cases used in this study were reported by medical and health institutions and laboratory-confirmed cases were scarce, restricting the meteorological impact assessment to cases with laboratory test results would have biased the analysis. Fourth, the topography and vertical structure of Fuzhou are complex, so meteorological conditions vary greatly across the city. However, the meteorological data in this study came from one station, while the HFMD cases came from various medical and health institutions in the city, which may have affected the research results. Therefore, data from more meteorological stations need to be included in future studies.
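The periodic model revision suggested under the first limitation could be organised as an expanding-window schedule. The sketch below only generates the week indices at which a model would be refitted and the block of weeks each fit would forecast. The function name, the expanding-window choice and the example sizes are assumptions; the default of 26 weeks corresponds to the half-year interval mentioned above.

```python
def refit_schedule(n_weeks, initial_train_weeks, refit_every_weeks=26):
    """Yield (train_end, forecast_end) week indices for periodic refitting.

    Each forecast block covers weeks [train_end, forecast_end) and is
    predicted by a model trained on weeks [0, train_end). After the block,
    the newly observed weeks are added to the training data and the model
    is refitted.
    """
    train_end = initial_train_weeks
    while train_end < n_weeks:
        forecast_end = min(train_end + refit_every_weeks, n_weeks)
        yield train_end, forecast_end
        train_end = forecast_end


# Hypothetical example: 3 years of weekly data, first 2 years used for the
# initial fit, refitting every half year (26 weeks).
schedule = list(refit_schedule(n_weeks=156, initial_train_weeks=104))
```

With these example sizes the model would be fitted twice, once on the first 104 weeks and once on the first 130 weeks.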
Meteorological factors such as PRS, PRSD, RHU, TEM, TEMD, WIN, and SSD significantly impact HFMD risk in Fuzhou. LSTM models combined with the meteorological factors in this study can accurately predict the risk of HFMD. It is more accurate to predict HFMD cases using weekly multifactor data, especially to predict the daily average cases in the next week. These meteorological factors and prediction models can be incorporated into an early warning and prediction system for HFMD in Fuzhou city and Fujian Province and could be used as a reference in other regions.
Availability of data and materials
The datasets that support the findings of this study are available from Fujian Provincial Centre of Disease Control and Prevention, Fujian Climate Center and meteorological data network of the China Meteorological Administration, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of these three institutions (E-mail: firstname.lastname@example.org).
Abbreviations
AIC: Akaike information criterion
ARIMA: Autoregressive integrated moving average
COVID-19: Coronavirus disease 2019
DLNM: Distributed lag nonlinear model
GAMM: Generalized additive mixed model
GLM: Generalized linear model
LSTM: Long short-term memory
MAE: Mean absolute error
MAPE: Mean absolute percentage error
NO2: Nitrogen dioxide
RMSE: Root mean squared error
RNN: Recurrent neural network
SIR: Susceptible-infectious-recovered
SMAPE: Symmetric mean absolute percentage error
STZINB: Spatiotemporal zero-inflated negative binomial
Cai K, Wang Y, Guo Z, Yu H, Li H, Zhang L, Xu S, Zhang Q. Clinical characteristics and managements of severe hand, foot and mouth disease caused by enterovirus A71 and coxsackievirus A16 in Shanghai, China. BMC Infect Dis. 2019;19(1):285. https://doi.org/10.1186/s12879-019-3878-6.
Ma T, Ji T, Yang G, Chen Y, Xu W, Liu H. Prediction of the incidence trend of hand, foot and mouth disease based on long-term memory neural network. Computer Applications. 2021;41(1):265–9. https://doi.org/10.11772/j.issn.1001-9081.2020060936.
Gu J, Liang L, Song H, Kong Y, Ma R, Hou Y, Zhao J, Liu J, He N, Zhang Y. A method for hand-foot-mouth disease prediction using GeoDetector and LSTM model in Guangxi, China. Sci Rep. 2019;9(1):17928. https://doi.org/10.1038/s41598-019-54495-2.
Song Y, Wang F, Wang B, Tao S, Zhang H, Liu S, Ramirez O, Zeng Q. Time series analyses of hand, foot and mouth disease integrating weather variables. PLoS One. 2015;10(3):e0117296. https://doi.org/10.1371/journal.pone.0117296.
Ma E, Lam T, Wong C, Chuang SK. Is hand, foot and mouth disease associated with meteorological parameters? Epidemiol Infect. 2010;138(12):1779–88. https://doi.org/10.1017/S0950268810002256.
Hii YL, Rocklöv J, Ng N. Short term effects of weather on hand, foot and mouth disease. PLoS One. 2011;6(2):e16796. https://doi.org/10.1371/journal.pone.0016796.
Sumi A, Toyoda S, Kanou K, Fujimoto T, Mise K, Kohei Y, Koyama A, Kobayashi N. Association between meteorological factors and reported cases of hand, foot, and mouth disease from 2000 to 2015 in Japan. Epidemiol Infect. 2017;145(14):2896–911. https://doi.org/10.1017/S0950268817001820.
Thanh TC. Effects of climate variations on hand-foot-mouth disease in Ho Chi Minh City. Vietnam J Sci Technol. 2016;54(2A):120. https://doi.org/10.15625/2525-2518/54/2A/11920.
Kim BI, Ki H, Park S, Cho E, Chun BC. Effect of Climatic Factors on Hand, Foot, and Mouth Disease in South Korea, 2010–2013. PLoS One. 2016;11(6):e0157500. https://doi.org/10.1371/journal.pone.0157500.
Abdul Wahid N, Suhaila J, Sulekan A. Temperature effect on HFMD transmission in Selangor, Malaysia. Sains Malaysiana. 2020;49(10):2587–97. https://doi.org/10.17576/jsm-2020-4910-24.
Chen S, Liu X, Wu Y, Xu G, Zhang X, Mei S, Zhang Z, O’Meara M, O’Gara MC, Tan X, Li L. The application of meteorological data and search index data in improving the prediction of HFMD: A study of two cities in Guangdong Province, China. Sci Total Environ. 2019;652:1013–21. https://doi.org/10.1016/j.scitotenv.2018.10.304.
Song C, He Y, Bo Y, Wang J, Ren Z, Yang H. Risk Assessment and Mapping of Hand, Foot, and Mouth Disease at the County Level in Mainland China Using Spatiotemporal Zero-Inflated Bayesian Hierarchical Models. Int J Environ Res Public Health. 2018;15(7):1476–91. https://doi.org/10.3390/ijerph15071476.
Abdul Wahid N, Suhaila J, Rahman HA. Effect of climate factors on the incidence of hand, foot, and mouth disease in Malaysia: A generalized additive mixed model. Infect Dis Model. 2021;6:997–1008. https://doi.org/10.1016/j.idm.2021.08.003.
Chen B, Sumi A, Toyoda S, Hu Q, Zhou D, Mise K, Zhao J, Kobayashi N. Time series analysis of reported cases of hand, foot, and mouth disease from 2010 to 2013 in Wuhan, China. BMC Infect Dis. 2015;15:495. https://doi.org/10.1186/s12879-015-1233-0.
Chang H, Chio C, Su H, Liao C, Lin C, Shau W, Chi Y, Cheng Y, Chou Y, Li C, Chen K, Chen K. The association between enterovirus 71 infections and meteorological parameters in Taiwan. PLoS One. 2012;7(10):e46845. https://doi.org/10.1371/journal.pone.0046845.
Huang X, Wei H, Wu S, Du Y, Liu L, Su J, Xu Y, Wang H, Li X, Wang Y, Liu G, Chen W, Klena JD, Xu B. Epidemiological and etiological characteristics of hand, foot, and mouth disease in Henan, China, 2008–2013. Sci Rep. 2015;5:8904. https://doi.org/10.1038/srep08904.
Gui J, Liu Z, Zhang T, Hua Q, Jiang Z, Chen B, Gu H, Lv H, Dong C. Epidemiological Characteristics and Spatial-Temporal Clusters of Hand, Foot, and Mouth Disease in Zhejiang Province, China, 2008–2012. PLoS One. 2015;10(9):e0139109. https://doi.org/10.1371/journal.pone.0139109.
Qi H, Chen Y, Xu D, Su H, Zhan L, Xu Z, Huang Y, He Q, Hu Y, Lynn H, Zhang Z. Impact of meteorological factors on the incidence of childhood hand, foot, and mouth disease (HFMD) analyzed by DLNMs-based time series approach. Infect Dis Poverty. 2018;7(1):7. https://doi.org/10.1186/s40249-018-0388-5.
Soebiyanto RP, Clara W, Jara J, Castillo L, Sorto OR, Marinero S, de Antinori ME, McCracken JP, Widdowson MA, Azziz-Baumgartner E, Kiang RK. The role of temperature and humidity on seasonal influenza in tropical areas: Guatemala, El Salvador and Panama, 2008–2013. PLoS One. 2014;9(6):e100659. https://doi.org/10.1371/journal.pone.0100659.
Soebiyanto RP, Clara WA, Jara J, Balmaseda A, Lara J, Lopez Moya M, Palekar R, Widdowson MA, Azziz-Baumgartner E, Kiang RK. Associations between seasonal influenza and meteorological parameters in Costa Rica, Honduras and Nicaragua. Geospat Health. 2015;10(2):372. https://doi.org/10.4081/gh.2015.372.
Polozov IV, Bezrukov L, Gawrisch K, Zimmerberg J. Progressive ordering with decreasing temperature of the phospholipids of influenza virus. Nat Chem Biol. 2008;4(4):248–55. https://doi.org/10.1038/nchembio.77.
Liu L, Luan RS, Yin F, Zhu XP, Lü Q. Predicting the incidence of hand, foot and mouth disease in Sichuan province, China using the ARIMA model - CORRIGENDUM. Epidemiol Infect. 2016;144(1):152. https://doi.org/10.1017/S0950268815001582.
Pons-Salort M, Grassly NC. Serotype-specific immunity explains the incidence of diseases caused by human enteroviruses. Science. 2018;361(6404):800–3. https://doi.org/10.1126/science.aat6777.
Li Z, Tao B, Zhan M, Wu Z, Wu J, Wang J. A comparative study of time series models in predicting COVID-19 cases. Chin J Epidemiol. 2021;42(3):421–6. https://doi.org/10.3760/cma.j.cn112338-20201116-01333.
Gasparrini A, Armstrong B, Kenward MG. Distributed lag non-linear models. Stat Med. 2010;29(21):2224–34. https://doi.org/10.1002/sim.3940.
Wang J, Li S, Ma H, Dong J, Wang Y, Zhang W, Zhang X, Li P, Li S. Research on the relationship between the daily mean temperature and the daily cases of varicella during 2008–2016 in Lanzhou, China. Chin J Prev Med. 2018;52(8):842–8. https://doi.org/10.3760/cma.j.issn.0253-9624.2018.08.013.
Gao J, Li L, Wang J, Liu X, Wu H, Li J, Liu Q. Progress of research in relation to the impact of climate change on children’s health status. Chin J Epidemiol. 2017;38(6):832–6. https://doi.org/10.3760/cma.j.issn.0254-6450.2017.06.028.
Zhang R, Guo Z, Meng Y, Wang S, Li S, Niu R, Wang Y, Guo Q, Li Y. Comparison of ARIMA and LSTM in Forecasting the Incidence of HFMD Combined and Uncombined with Exogenous Meteorological Variables in Ningbo, China. Int J Environ Res Public Health. 2021;18(11):6174–87. https://doi.org/10.3390/ijerph18116174.
Hu Y, Wang N, Liu S, Jiang Q, Zhang N. Research on Application of Time Series Model and LSTM Model in Water Quality Prediction. Journal of Chinese Computer Systems. 2021;42(8):1569–73. https://doi.org/10.3969/j.issn.1000-1220.2021.08.001.
Zhu H, Chen S, Lu W, Chen K, Feng Y, Xie Z, Zhang Z, Li L, Ou J, Chen G. Study on the influence of meteorological factors on influenza in different regions and predictions based on an LSTM algorithm. BMC Public Health. 2022;22(1):2335–51. https://doi.org/10.1186/s12889-023-15164-2.
Chicco D, Warrens MJ, Jurman G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. Peer J Comput Sci. 2021;7:e623. https://doi.org/10.7717/peerj-cs.623.
Onozuka D, Hashizume M. The influence of temperature and humidity on the incidence of hand, foot, and mouth disease in Japan. Sci Total Environ. 2011;410–411:119–25. https://doi.org/10.1016/j.scitotenv.2011.09.055.
Huang Y, Deng T, Yu S, Gu J, Huang C, Xiao G, Hao Y. Effect of meteorological variables on the incidence of hand, foot, and mouth disease in children: a time-series analysis in Guangzhou, China. BMC Infect Dis. 2013;13:134. https://doi.org/10.1186/1471-2334-13-134.
Nguyen H, Chu C, Nguyen HLT, Nguyen HT, Do CM, Rutherford S, Phung D. Temporal and spatial analysis of hand, foot, and mouth disease in relation to climate factors: A study in the Mekong Delta region, Vietnam. Sci Total Environ. 2017;581–582:766–72. https://doi.org/10.1016/j.scitotenv.2017.01.006.
Shi X. Air pollution, climate change and health: from evidence to action. Chin J Prev Med. 2019;53(1):1–3. https://doi.org/10.3760/cma.j.issn.0253-9624.2019.01.001. (PMID: 30605957).
Zhu L, Wang X, Guo Y, Xu J, Xue F, Liu Y. Assessment of temperature effect on childhood hand, foot and mouth disease incidence (0–5 years) and associated effect modifiers: A 17 cities study in Shandong Province, China, 2007–2012. Sci Total Environ. 2016;551–552:452–9. https://doi.org/10.1016/j.scitotenv.2016.01.173.
Yan S, Wei L, Duan Y, Li H, Liao Y, Lv Q, Zhu F, Wang Z, Lu W, Yin P, Cheng J, Jiang H. Short-Term Effects of Meteorological Factors and Air Pollutants on Hand, Foot and Mouth Disease among Children in Shenzhen, China, 2009–2017. Int J Environ Res Public Health. 2019;16(19):3639. https://doi.org/10.3390/ijerph16193639.
Zhang Q, Zhou M, Yang Y, You E, Wu J, Zhang W, Jin J, Huang F. Short-term effects of extreme meteorological factors on childhood hand, foot, and mouth disease reinfection in Hefei, China: A distributed lag non-linear analysis. Sci Total Environ. 2019;653:839–48. https://doi.org/10.1016/j.scitotenv.2018.10.349.
Wang Y, Feng Z, Yang Y, Self S, Gao Y, Longini IM, Wakefield J, Zhang J, Wang L, Chen X, Yao L, Stanaway JD, Wang Z, Yang W. Hand, foot, and mouth disease in China: patterns of spread and transmissibility. Epidemiology. 2011;22(6):781–92. https://doi.org/10.1097/EDE.0b013e318231d67a.
Van Pham H, Phan UTN, Pham ANQ. Meteorological factors associated with hand, foot and mouth disease in a Central Highlands province in Viet Nam: an ecological study. Western Pac Surveill Response J. 2019;10(4):18–23. https://doi.org/10.5365/wpsar.2017.8.1.003.
Fletcher L, Noakes C, Beggs C, Sleigh P. The importance of bioaerosols in hospital infections and the potential for control using germicidal ultraviolet irradiation. Proceedings of the First Seminar on Applied Aerobiology. Murcia, Spain 2004, 1st seminar on Applied Aerobiology.
Yang H, Wu J, Cheng J, Wang X, Wen L, Li K, Su H. Is high relative humidity associated with childhood hand, foot, and mouth disease in rural and urban areas? Public Health. 2017;142:201–7. https://doi.org/10.1016/j.puhe.2015.03.018.
Li T, Yang Z, Di B, Wang M. Hand-foot-and-mouth disease and weather factors in Guangzhou, southern China. Epidemiol Infect. 2014;142(8):1741–50. https://doi.org/10.1017/S0950268813002938.
Dong W, Li X, Yang P, Liao H, Wang X, Wang Q. The Effects of Weather Factors on Hand, Foot and Mouth Disease in Beijing. Sci Rep. 2016;6:19247. https://doi.org/10.1038/srep19247.
Zhang Z, Xie X, Chen X, Li Y, Lu Y, Mei S, Liao Y, Lin H. Short-term effects of meteorological factors on hand, foot and mouth disease among children in Shenzhen, China: Non-linearity, threshold and interaction. Sci Total Environ. 2016;539:576–82. https://doi.org/10.1016/j.scitotenv.2015.09.027.
Wang H, Du Z, Wang X, Liu Y, Yuan Z, Xue F. Detecting the association between meteorological factors and hand, foot, and mouth disease using spatial panel data models. Int J Infect Dis. 2015;34:66–70. https://doi.org/10.1016/j.scitotenv.2015.09.027.
Yeager JG, O’Brien RT. Enterovirus inactivation in soil. Appl Environ Microbiol. 1979;38(4):694–701. https://doi.org/10.1128/aem.38.4.694-701.1979.
Rajtar B, Majek M, Polański Ł, Polz-Dacewicz M. Enteroviruses in water environment-a potential threat to public health. Ann Agric Environ Med. 2008;15(2):199–203 (PMID: 19061255).
Liu Y, Wang X, Pang C, Yuan Z, Li H, Xue F. Spatio-temporal analysis of the relationship between climate and hand, foot, and mouth disease in Shandong province, China, 2008–2012. BMC Infect Dis. 2015;15:146. https://doi.org/10.1186/s12879-015-0901-4.
Liu L, Zhao X, Yin F, Lv Q. Spatio-temporal clustering of hand, foot and mouth disease at the county level in Sichuan province, China, 2008–2013. Epidemiol Infect. 2015;143(4):831–8. https://doi.org/10.1017/S0950268814001587.
Barrett B, Charles JW, Temte JL. Climate change, human health, and epidemiological transition. Prev Med. 2015;70:69–75. https://doi.org/10.1016/j.ypmed.2014.11.013.
Khasnis AA, Nettleman MD. Global warming and infectious disease. Arch Med Res. 2005;36(6):689–96. https://doi.org/10.1016/j.arcmed.2005.03.041.
Wei SJ, Zhou YX. Human body fall detection model combining alpha pose and LSTM. Journal of Chinese Computer Systems. 2019;40(9):1886–90. CNKI:SUN:XXWX.0.2019-09-014.
Wang Y, Xu C, Zhang S, Yang L, Wang Z, Zhu Y, Yuan J. Development and evaluation of a deep learning approach for modeling seasonality and trends in hand-foot-mouth disease incidence in mainland China. Sci Rep. 2019;9(1):8046–60. https://doi.org/10.1038/s41598-019-44469-9.
Postalcioglu S. Performance Analysis of Different Optimizers for Deep Learning based Image Recognition. Intern J Pattern Recognit Artif Intell. 2019;34(2):2051003. https://doi.org/10.1142/s0218001420510039.
We thank Springer Nature Author Services (https://authorservices.springernature.com) for its linguistic assistance during the preparation of this manuscript.
This work was supported by National Natural Science Foundation of China (61972187), Natural Science Foundation of Fujian Province (2021J01350, 2020J01094), Young and Middle-aged Backbone Talents Training Project of Fujian Provincial Health Commission (2021GGA037), Construction of Fujian Provincial Scientific and Technological Innovation Platform (2019Y2001).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zhu, H., Chen, S., Liang, R. et al. Study of the influence of meteorological factors on HFMD and prediction based on the LSTM algorithm in Fuzhou, China. BMC Infect Dis 23, 299 (2023). https://doi.org/10.1186/s12879-023-08184-1
- Relative humidity
- Air temperature