The development of a combined mathematical model to forecast the incidence of hepatitis E in Shanghai, China
 Hong Ren^{1},
 Jian Li^{1},
 ZhengAn Yuan^{1},
 JiaYu Hu^{1},
 Yan Yu^{3} and
 YiHan Lu^{2}
DOI: 10.1186/1471-2334-13-421
© Ren et al.; licensee BioMed Central Ltd. 2013
Received: 28 January 2013
Accepted: 4 September 2013
Published: 8 September 2013
Abstract
Background
Sporadic hepatitis E has become an important public health concern in China. Accurate forecasting of the incidence of hepatitis E is needed to better plan future medical needs. Few mathematical models can be used because hepatitis E morbidity data has both linear and nonlinear patterns. We developed a combined mathematical model using an autoregressive integrated moving average model (ARIMA) and a back propagation neural network (BPNN) to forecast the incidence of hepatitis E.
Methods
The morbidity data of hepatitis E in Shanghai from 2000 to 2012 were retrieved from the China Information System for Disease Control and Prevention. The ARIMA-BPNN combined model was trained with 144 months of morbidity data from January 2000 to December 2011, validated with 12 months of data from January 2012 to December 2012, and then employed to forecast hepatitis E incidence from January 2013 to December 2013 in Shanghai. Residual analysis, Root Mean Square Error (RMSE), normalized Bayesian Information Criterion (BIC), and stationary R square methods were used to compare the goodness-of-fit among ARIMA models. The Bayesian regularization backpropagation algorithm was used to train the network. The mean error rate (MER) was used to assess the validity of the combined model.
Results
A total of 7,489 hepatitis E cases was reported in Shanghai from 2000 to 2012. Goodness-of-fit (stationary R^{2}=0.531, BIC= −4.768, Ljung-Box Q statistics=15.59, P=0.482) and parameter estimates were used to determine the best-fitting model as ARIMA (0,1,1)×(0,1,1)_{12}. Predicted morbidity values in 2012 from the best-fitting ARIMA model and actual morbidity data from 2000 to 2011 were used to further construct the combined model. The MER of the ARIMA model and the ARIMA-BPNN combined model were 0.250 and 0.176, respectively. The forecasted incidence of hepatitis E in 2013 was 0.095 to 0.372 per 100,000 population. There was a seasonal variation with a peak during January-March and a nadir during August-October.
Conclusions
Time series analysis suggested a seasonal pattern of hepatitis E morbidity in Shanghai, China. An ARIMA-BPNN combined model was used to fit the linear and nonlinear patterns of time series data, and accurately forecast hepatitis E infections.
Keywords
Hepatitis E; Combined mathematical model; Forecast
Background
Hepatitis E is a liver disease caused by hepatitis E virus (HEV), a non-enveloped, positive-sense, single-stranded RNA virus which is transmitted mainly through contaminated drinking water or uncooked/undercooked food [1]. Since the earliest report of this waterborne disease in New Delhi, India during 1955 to 1956, it has been epidemic in many developing countries [2]. Every year there are 20 million hepatitis E infections, over 3 million acute cases of hepatitis E, and 70,000 hepatitis E-related deaths in the world. The prevalence is highest in Eastern and Southern Asia [3]. Sporadic hepatitis E has also become an important public health concern in developed countries, causing over 50% of acute viral hepatitis cases in recent years [4–7].
Shanghai is the largest metropolis in China with a permanent population of over 23.8 million. About 14 million are officially registered residents and 9.7 million are migrants. To control the spread of HEV, a surveillance system was established and a series of studies of HEV genotype, transmission route, and risk factors for infection have been conducted in Shanghai since 1997 [8–10]. According to surveillance data from Shanghai Municipal Center for Disease Control and Prevention, hepatitis E has been far more common than hepatitis A since 2004. However, few mathematical models have been developed to forecast the incidence of hepatitis E.
Few mathematical models are applicable for modeling because time series data of hepatitis E infection have both linear and nonlinear characteristics. Autoregressive integrated moving average (ARIMA) has become one of the most popular and convenient linear models in time series forecasting [11–14]. It has advantages in both statistical properties and Box-Jenkins methodology in the model building process [15]. Although the ARIMA model could fit several different types of time series data, the major limitation is the pre-assumed linearity of the model [16]. In contrast, artificial neural networks (ANNs) have the ability to learn and describe highly nonlinear and strongly coupled relationships between multi-input and multi-output variables [17], and have no need to specify a detailed model. However, ANNs cannot handle both linear and nonlinear patterns equally well [18]. We designed a combined model using an ARIMA model and a neural network to forecast the incidence of hepatitis E in Shanghai.
Methods
Data source
Hepatitis E is one of the Nationally Notifiable Infectious Diseases in China. Upon laboratory confirmation, hospital physicians register each patient’s information in the China Information System for Disease Control and Prevention within 24 hours. Community physicians then conduct an epidemiological investigation, health education, and a three-month follow-up of each patient and their family members. The morbidity data of hepatitis E from 2000 to 2012 were released from the China Information System for Disease Control and Prevention by Shanghai Municipal Center for Disease Control and Prevention. The annual average population data from 2000 to 2012 were obtained from Shanghai Public Security Bureau.
The model
The ARIMA-BPNN combined model consisted of an ARIMA model and a back propagation artificial neural network (BPNN). The model was developed to forecast the incidence of hepatitis E in Shanghai. The model was trained using 144 months of morbidity data from January 2000 to December 2011, validated with 12 months of morbidity data from January 2012 to December 2012, and finally employed to forecast the incidence of hepatitis E from January 2013 to December 2013 in Shanghai. The whole process was divided into three steps:
The first step was to determine the best-fitting ARIMA model and to predict the values of each time point. The Box-Jenkins approach was applied to seasonal ARIMA (p,d,q)×(P,D,Q)_{n} modeling of time series data. The model was defined with an autoregressive part of order p, a moving average part of order q, a seasonal autoregressive part of order P, a seasonal moving average part of order Q, differencing and seasonal differencing orders d and D, and seasonal period n. This model building process was designed to take advantage of associations in the seasonally and sequentially lagged relationships that usually exist in periodically collected data. Model parameters were estimated using the conditional least squares method. Residual analysis, Root Mean Square Error (RMSE), normalized Bayesian Information Criterion (BIC), and stationary R square were used to compare the goodness-of-fit among ARIMA models.
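The second step was to construct the BPNN. The predicted morbidity values from the best-fitting ARIMA model and the corresponding time values were used as the network input, the actual morbidity values were used as the target, and the network was trained with the Bayesian regularization backpropagation algorithm.
The third step was to validate the combined model with 12 months of morbidity data from January 2012 to December 2012 and to further forecast the incidence of hepatitis E in 2013.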
The mean error rate (MER) was used to compare the predicted and actual values in 2012 between the single ARIMA model and the ARIMA-BPNN combined model.
Data processing and analysis
An augmented Dickey-Fuller test and the X-12-ARIMA seasonal adjustment program of EViews 5.0 (http://www.eviews.com) were employed to assess the stationarity of the time series data [21]. All analyses were performed using SPSS 17.0 (Chicago, IL, USA) and MATLAB 7.0 (Natick, USA).
Ethical review
The study protocol and utilization of hepatitis E morbidity data were reviewed by Shanghai Municipal Center for Disease Control and Prevention and no ethical issues were identified. Therefore, no ethics approval was required by our Investigation Review Board.
Results
General patterns of hepatitis E
The morbidity of hepatitis E in Shanghai from 2000 to 2012 (per 100,000 population)
Year  Male cases  Male morbidity* (per 100,000 pop.)  Female cases  Female morbidity* (per 100,000 pop.)  Total cases  Total morbidity (per 100,000 pop.)
2000  498  7.507  224  3.425  722  4.240 
2001  465  6.972  222  3.377  687  4.010 
2002  354  5.282  178  2.695  532  3.100 
2003  312  4.630  165  2.484  477  2.600 
2004  425  6.269  182  2.720  607  3.460 
2005  505  7.405  220  3.263  725  4.240 
2006  387  5.649  167  2.459  554  3.120 
2007  299  4.340  173  2.527  472  2.540 
2008  305  4.475  157  2.318  462  2.490 
2009  345  5.027  166  2.424  511  2.710 
2010  364  5.282  221  3.201  585  3.050 
2011  391  5.584  233  3.309  624  2.711 
2012  313  4.470  218  3.096  531  2.307 
The best-fitting ARIMA model
Parameters for the final seasonal ARIMA (0,1,1)×(0,1,1)_{12} model
Parameter  Estimate  Standard error  t-statistic  P-value

Constant  0.000  0.000  −0.113  0.911 
MA_{1}  0.678  0.067  10.12  0.000 
SMA_{1}  0.679  0.093  7.318  0.000 
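For readers who want the fitted model in equation form, the following is a sketch of the seasonal ARIMA notation and of the (0,1,1)×(0,1,1)_{12} case with the estimates from the table above. The backshift operator B, the series symbol y_t, and the error term \varepsilon_t are notation introduced here for illustration. The Box-Jenkins sign convention for the moving average terms is an assumption, and the text does not state whether y_t is the raw or a transformed morbidity series.

```latex
% General seasonal ARIMA (p,d,q) x (P,D,Q)_n model, with B y_t = y_{t-1}
\phi_p(B)\,\Phi_P(B^{n})\,(1-B)^{d}\,(1-B^{n})^{D}\, y_t
  = c + \theta_q(B)\,\Theta_Q(B^{n})\,\varepsilon_t

% Special case (0,1,1) x (0,1,1)_{12}; the estimated constant is approximately zero
(1-B)\,(1-B^{12})\, y_t = (1-\theta_1 B)\,(1-\Theta_1 B^{12})\,\varepsilon_t,
  \qquad \theta_1 \approx 0.678,\quad \Theta_1 \approx 0.679

% Expanded form, showing that each value depends on lags 1, 12 and 13
y_t = y_{t-1} + y_{t-12} - y_{t-13}
      + \varepsilon_t - \theta_1\,\varepsilon_{t-1}
      - \Theta_1\,\varepsilon_{t-12} + \theta_1\Theta_1\,\varepsilon_{t-13}
```

Under this reading, the first 13 observations have no fitted value, because the expanded form needs y_{t-13}.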
Predicted rates and error rates of the single ARIMA model and the ARIMA-BPNN combined model in 2012
Month  Actual morbidity (per 100,000 pop.)  ARIMA predicted rate  ARIMA error rate  ARIMA-BPNN predicted rate  ARIMA-BPNN error rate
Jan  0.226  0.362  0.602  0.345  0.527 
Feb  0.317  0.334  0.054  0.331  0.044 
Mar  0.282  0.347  0.230  0.306  0.085 
Apr  0.252  0.293  0.163  0.282  0.119 
May  0.265  0.215  0.189  0.254  0.042 
Jun  0.200  0.161  0.195  0.214  0.070 
Jul  0.139  0.152  0.094  0.165  0.187 
Aug  0.126  0.135  0.071  0.126  0.000 
Sep  0.096  0.138  0.438  0.117  0.219 
Oct  0.104  0.140  0.346  0.131  0.260 
Nov  0.113  0.162  0.434  0.174  0.540 
Dec  0.191  0.156  0.183  0.187  0.021 
MER  -  -  0.250  -  0.176
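The text does not give a formula for the MER, but the tabulated figures are consistent with taking each month's error rate as |predicted − actual| / actual and averaging over the 12 months. The sketch below recomputes both MER values from the rounded table entries under that assumption. The function names are illustrative and not taken from the authors' SPSS or MATLAB code.

```python
# 2012 values transcribed from the table above (per 100,000 population).
actual = [0.226, 0.317, 0.282, 0.252, 0.265, 0.200,
          0.139, 0.126, 0.096, 0.104, 0.113, 0.191]
arima = [0.362, 0.334, 0.347, 0.293, 0.215, 0.161,
         0.152, 0.135, 0.138, 0.140, 0.162, 0.156]
combined = [0.345, 0.331, 0.306, 0.282, 0.254, 0.214,
            0.165, 0.126, 0.117, 0.131, 0.174, 0.187]


def error_rates(predicted, observed):
    """Relative absolute error for each month (assumed definition)."""
    return [abs(p - o) / o for p, o in zip(predicted, observed)]


def mean_error_rate(predicted, observed):
    """Mean of the monthly error rates (assumed definition of MER)."""
    rates = error_rates(predicted, observed)
    return sum(rates) / len(rates)


mer_arima = mean_error_rate(arima, actual)
mer_combined = mean_error_rate(combined, actual)
print(f"ARIMA MER:      {mer_arima:.3f}")
print(f"ARIMA-BPNN MER: {mer_combined:.3f}")
```

Rounded to three decimals, the two results match the MER row of the table (0.250 and 0.176), which supports the assumed definition.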
ARIMA-BPNN combined model
To construct the ARIMA-BPNN combined model, the predicted morbidity values from the best-fitting ARIMA model and the corresponding time values were used as input (2×131 matrix), while the actual morbidity values were used as target data (1×131 matrix) (Figure 1). The fitted values of the model in 2012 ranged from 0.117 to 0.345 per 100,000 population. The MER of the ARIMA-BPNN combined model was 0.176, lower than the 0.250 MER of the single ARIMA model. This indicated that the combined model forecast the 2012 validation data more accurately than the single ARIMA model.
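The text does not explain why the training matrices have 131 columns rather than 144. One reading that is consistent with these dimensions is that one regular difference (d = 1) and one seasonal difference (D = 1, period 12) remove the first 13 months. The sketch below checks that arithmetic on placeholder numbers and lays out matrices of the stated shapes. The placeholder series, the variable names, and the choice of months 14 to 144 as the retained time values are assumptions for illustration only.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag]; the first `lag` points are lost."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


n_months = 144  # January 2000 to December 2011
placeholder = [float(t % 12) for t in range(n_months)]  # stand-in, NOT real data

regular = difference(placeholder, lag=1)   # d = 1 leaves 143 points
seasonal = difference(regular, lag=12)     # D = 1, period 12 leaves 131 points
n_usable = len(seasonal)

# Month indices that would still have an ARIMA fitted value (assumed: 14..144).
time_index = list(range(n_months - n_usable + 1, n_months + 1))

# Layout matching the stated shapes: 2 x 131 input and 1 x 131 target.
arima_fitted = [0.0] * n_usable      # placeholder for ARIMA predicted values
actual_values = [0.0] * n_usable     # placeholder for observed morbidity
inputs = [arima_fitted, time_index]  # row 0: predictions, row 1: time values
targets = [actual_values]

print(n_usable, len(inputs), len(inputs[0]), len(targets), len(targets[0]))
```

With real data, the two placeholder rows would be replaced by the ARIMA fitted values and the observed monthly morbidity for the same months.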
The combined model was then used to forecast the incidence of hepatitis E in 2013. The prediction was a continued fluctuation within a narrow range from 0.095 to 0.372 per 100,000 population, with a peak during winter (January-March) and a nadir during autumn (August-October) (Figure 2).
Discussion
Hepatitis E is generally regarded as a disease predominantly restricted to areas with poor sanitation and polluted drinking water supplies [22]. However, more cases due to zoonotic spread and unclear transmission methods are occurring in nonendemic areas including Shanghai, China [10, 23, 24]. A total of 7,489 hepatitis E cases was reported in Shanghai from 2000 to 2012. The incidence fluctuated between 2.307 and 4.240 per 100,000 population, with seasonal variations. This has led to a major shift in the understanding of the epidemiology of hepatitis E and warranted further study.
Compared to bloodborne infectious diseases (e.g. hepatitis B and C, AIDS), hepatitis E is more affected by environmental and natural factors. These factors lead to a seasonal variation in incidence. The multiple factors involved cause difficulties in modeling. Time series analysis has the advantage of forecasting the incidence without focusing on specific risk factors; however, it cannot describe a nonlinear trend in incidence data. ANNs have been widely accepted as a potentially useful means of modeling complex nonlinear and dynamic systems, which removes the need for model builders to specify the precise functional form of the relationship that the model seeks to represent. However, they still require knowledge of, and prior information about, the systems of interest [25–27]. It has been argued that combining multiple models for forecasting may provide better estimates than single time series models, by taking advantage of each model’s capabilities [18, 28]. Accordingly, we constructed a hybrid architecture which comprised an ARIMA model and a neural network for forecasting hepatitis E incidence and validated its efficacy. The MER of the single ARIMA model and the ARIMA-BPNN combined model were 0.250 and 0.176, respectively. The combined model forecasted that the incidence of hepatitis E in Shanghai in 2013 would be similar to that of previous years, and that there would be a seasonal variation with a peak during winter and a nadir during autumn.
We determined that an ARIMA-BPNN combined model fit the time series data of hepatitis E morbidity in Shanghai better than a single ARIMA model. However, this combined method may not suit all time series data, because it assumes that the relationship between the linear and nonlinear components is additive. If the relationship is different (e.g. multiplicative), the combined method may perform worse [29]. The morbidity of hepatitis E is influenced by many environmental and natural factors which are dynamic and possibly evolving over time. Thus, the parameters of an ARIMA-BPNN combined model should be periodically reassessed according to continuously updated data to maintain long-term sustainability and precision.
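The additive assumption mentioned above can be written out as follows. This is only an illustrative statement of the two cases, using symbols introduced here (L_t for the linear component, N_t for the nonlinear component). It is not a formula given in the text.

```latex
% Additive relationship assumed by the combined approach
y_t = L_t + N_t

% A multiplicative alternative, for which the additive assumption would not hold
y_t = L_t \times N_t
```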
Conclusions
Time series analysis demonstrated a seasonal pattern of hepatitis E infection in Shanghai, China. An ARIMA-BPNN combined model was used to describe the linear and nonlinear patterns of the time series data. This model effectively forecasts hepatitis E infection. We focused on the ARIMA-BPNN combined model because single ARIMA and BPNN models had been intensively studied. The construction and interpretation of other combined analyses should be explored.
Abbreviations
 ARIMA: Autoregressive integrated moving average model
 BPNN: Back propagation neural network
 BIC: Bayesian information criterion
 MER: Mean error rate
 HEV: Hepatitis E virus
 RMSE: Root mean square error
Declarations
Acknowledgements
This work was supported by grants from the China National Natural Science Funds (Grant #81001264) and Shanghai Municipal Health Bureau (Grant #2010177 & #20124380).
References
 Aggarwal R, Naik S: Epidemiology of hepatitis E: current status. J Gastroenterol Hepatol. 2009, 24 (9): 1484-1493. 10.1111/j.1440-1746.2009.05933.x.
 Vishwanathan R: Infectious hepatitis in Delhi (1955–1956): a critical study: epidemiology. Indian J Med Res. 1957, 45 (Suppl. 1): 1-29.
 WHO: Hepatitis E. http://www.who.int/mediacentre/factsheets/fs280/en
 Dalton HR, Bendall R, Ijaz S, Banks M: Hepatitis E: an emerging infection in developed countries. Lancet Infect Dis. 2008, 8 (11): 698-709. 10.1016/S1473-3099(08)70255-X.
 Colson P, Romanet P, Moal V, Borentain P, Purgus R, Benezech A, Motte A, Gérolami R: Autochthonous infections with hepatitis E virus genotype 4, France. Emerg Infect Dis. 2012, 18 (8): 1361-1364. 10.3201/eid1808.111827.
 Sainokami S, Abe K, Kumagai I, Miyasaka A, Endo R, Takikawa Y, Suzuki K, Mizuo H, Sugai Y, Akahane Y, Koizumi Y, Yajima Y, Okamoto H: Epidemiological and clinical study of sporadic acute hepatitis E caused by indigenous strains of hepatitis E virus in Japan compared with acute hepatitis A. J Gastroenterol. 2004, 39 (7): 640-648.
 Dalton HR, Fellows HJ, Gane EJ, Wong P, Gerred S, Schroeder B, Croxson MC, Garkavenko O: Hepatitis E in New Zealand. J Gastroenterol Hepatol. 2007, 22 (8): 1236-1240. 10.1111/j.1440-1746.2007.04894.x.
 Zheng Y, Ge S, Zhang J, Guo Q, Ng MH, Wang F, Xia N, Jiang Q: Swine as principal reservoir of hepatitis E virus that infects humans in eastern China. J Infect Dis. 2006, 193 (12): 1643-1649. 10.1086/504293.
 Zhang W, Yang S, Shen Q, Liu J, Shan T, Huang F, Ning H, Kang Y, Yang Z, Cui L, Zhu J, Hua X: Isolation and characterization of a genotype 4 Hepatitis E virus strain from an infant in China. Virol J. 2009, 16 (6): 24.
 Li YT, Zhu YY, Shen WG, Zhang AX, Zhang JM, Ren H, Yuan GP, Gu LJ: Study on risk factors of sporadic hepatitis E virus cases in some districts of Shanghai. Chin J Epidemiol. 2006, 27 (4): 298-301.
 Loha E, Lindtjørn B: Model variations in predicting incidence of Plasmodium falciparum malaria using 1998–2007 morbidity and meteorological data from south Ethiopia. Malar J. 2010, 16 (9): 166.
 Akinbobola A, Omotosho JB: Predicting Malaria occurrence in Southwest and North central Nigeria using Meteorological parameters. Int J Biometeorol. 2012, 27: in press.
 Qiyong L, Xiaodong L, Baofa J: Forecasting incidence of hemorrhagic fever with renal syndrome in China using ARIMA model. BMC Infect Dis. 2011, 15 (11): 218.
 Akhtar S, Rozi S: An autoregressive integrated moving average model for short-term prediction of hepatitis C virus seropositivity among male volunteer blood donors in Karachi, Pakistan. World J Gastroenterol. 2009, 15 (13): 1607-1612. 10.3748/wjg.15.1607.
 Box GEP, Jenkins GM: Time Series Analysis: Forecasting and Control. 1976, San Francisco: Holden Day, 181-218.
 Enders W: Applied Econometric Time Series. 2004, New York: John Wiley & Sons, 2
 Zhang GQ, Patuwo EB, Hu MY: Forecasting with artificial neural networks. Int J Forecasting. 1998, 14: 35-62. 10.1016/S0169-2070(97)00044-7.
 Zhang GP: Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing. 2003, 50: 159-175.
 Galushkin A: Qualitative characteristic of neural network architectures neural networks theory. 2007, Berlin: Springer, 43-52.
 Chua CG, Goh ATC: A hybrid Bayesian backpropagation neural network approach to multivariate modeling. Int J Numer Anal Methods Geomech. 2003, 27 (8): 651-667. 10.1002/nag.291.
 Findley DF, Monsell BC, Bell WR, Otto MC, Chen BC: New capabilities and methods of the X-12-ARIMA seasonal adjustment program. Bus Econ Stat. 1998, 16 (2): 127-177.
 Purcell RH, Emerson SU: Hepatitis E: an emerging awareness of an old disease. J Hepatol. 2008, 48 (3): 494-503. 10.1016/j.jhep.2007.12.008.
 Lu YH, Zheng YJ, Hu AQ, Zhu JF, Wang FD, Wang XC, Jiang QW: Seasonal pattern and phylogenetic analysis with human isolates of genotype-IV hepatitis E virus in swine herds, eastern China. Zhonghua Yu Fang Yi Xue Za Zhi. 2009, 43 (6): 504-508.
 Xia YG, Li YT, Lu YH, Ren H, Hu AQ, Zhu JF, Wang XC, Jing QW, Zheng YJ: Phylogenetic analysis of sporadic hepatitis E virus in Eastern China. Zhonghua Liu Xing Bing Xue Za Zhi. 2009, 30 (12): 1269-1272.
 Antanasijević DZ, Pocajt VV, Povrenović DS, Ristić MD, Perić-Grujić AA: PM(10) emission forecasting using artificial neural networks and genetic algorithm input variable optimization. Sci Total Environ. 2012, 443C: 511-519.
 Arhami M, Kamali N, Rajabi MM: Predicting hourly air pollutant levels using artificial neural networks coupled with uncertainty analysis by Monte Carlo simulations. Environ Sci Pollut Res Int. 2013, [Epub ahead of print]
 Ma L, Khorasani K: New training strategies for constructive neural networks with application to regression problems. Neural Netw. 2004, 17: 589-609. 10.1016/j.neunet.2004.02.002.
 Zhu Y, Xia JL, Wang J: Comparison of predictive effect between the single auto regressive integrated moving average (ARIMA) model and the ARIMA-generalized regression neural network (GRNN) combination model on the incidence of scarlet fever. Zhonghua Liu Xing Bing Xue Za Zhi. 2009, 30 (9): 964-968.
 Taskaya-Temizel T, Casey MC: A comparative study of autoregressive neural network hybrids. Neural Netw. 2005, 18 (5–6): 781-789.
Pre-publication history
 The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2334/13/421/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.