Developing a forecasting model for cholera incidence in Dhaka megacity through time series climate data

Cholera, an acute diarrheal disease spread by lack of hygiene and contaminated water, is a major public health risk in many countries. As cholera is triggered by environmental conditions influenced by climatic variables, establishing a correlation between cholera incidence and climatic variables would provide an opportunity to develop a cholera forecasting model. Considering the auto-regressive nature and the seasonal behavioral patterns of cholera, a seasonal-auto-regressive-integrated-moving-average (SARIMA) model was used for time-series analysis during 2000–2013. As both rainfall (r = 0.43) and maximum temperature (r = 0.56) have the strongest influence on the occurrence of cholera incidence, single-variable (SVMs) and multi-variable SARIMA models (MVMs) were developed, compared and tested for evaluating their relationship with cholera incidence. A low relationship was found with relative humidity (r = 0.28), ENSO (r = 0.21) and SOI (r = −0.23). Using SVM for a 1 °C increase in maximum temperature at one-month lead time showed a 7% increase of cholera incidence (p < 0.001). However, MVM (AIC = 15, BIC = 36) showed better performance than SVM (AIC = 21, BIC = 39). An MVM using rainfall and monthly mean daily maximum temperature with a one-month lead time showed a better fit (RMSE = 14.7, MAE = 11) than the MVM with no lead time (RMSE = 16.2, MAE = 13.2) in forecasting. This result will assist in predicting cholera risks and better preparedness for public health management in the future.

A thorough review of the relationship between cholera incidence and climatic variables, using different statistical methods in different locations of the world, is summarized in Table 1. In the review, the climatic variables rainfall, maximum temperature, minimum temperature, relative humidity, El Niño southern oscillation (ENSO) and southern oscillation index (SOI) were found to be statistically significant [...] (Paz ). In America (Haiti), rainfall plays a vital role (Eisenberg et al. ; Righetto et al. ). In South America, sea surface temperature and ambient temperature have an effect on cholera outbreaks, e.g. in Peru (Checkley et al. ; Colwell ). In Asia, rainfall, temperature (mean, minimum and maximum), relative humidity, SST, sea surface height and river discharge contribute to increased cholera outbreaks, e.g. in India (Rajendran et [...] table all over Bangladesh; while in Dhaka city, this water scarcity is due to over-exploitation of groundwater and low river water availability for surface water treatment plants. Laboratory tests have shown that salinity and temperature are important factors influencing the growth of V. cholerae (Batabyal et al. ), and that V. cholerae survives better when aided by copepods (Huq et al. ). On the other hand, monsoon floods that inundate large inland areas with stagnant water and rain-flushed nutrients provide a growth environment for pathogens (Islam et al. ). As the seasonal flood water recedes, water-borne pathogens including cholera, combined with a scarcity of safe drinking water when many drinking water sources are contaminated with flood water, cause a second 'post-monsoon' outbreak (Akanda et al. , ). Some environmental indicators, such as water temperature and water depth in some water bodies in Bangladesh, showed a significant lagged correlation with cholera outbreaks (Huq et al. ).
Moreover, climate variability caused by ENSO, for example extreme dry conditions and high temperatures leading to droughts, or heavy rainfall leading to floods, may enhance cholera outbreaks in the future (Field et al. ). [...]ber-November (post-monsoon). The pre-monsoon peak was higher than the post-monsoon peak during the studied period.

Evolution to SARIMA modelling
The overall flow chart of how the SARIMA models were developed is shown in Figure 3. First, to investigate delayed effects on cholera incidence, the climate variables were temporally lagged by 0, 1, 2 and 3 months and examined by cross-correlation analysis. When one or more lagged associations of a climatic variable with cholera incidence were found, that variable was considered useful for SARIMA modeling. [...] for white noise in residuals and a scatter plot of residuals versus predicted values: [...] where κ is the number of independently adjusted parameters within the model and n is the total number of observations.
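The lag screening and the model-selection criteria mentioned here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the toy monthly series are hypothetical, and the AIC and BIC forms are the standard ones (AIC = 2κ − 2 ln L and BIC = κ ln n − 2 ln L), assumed because they match the definitions of κ and n given in the text.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(climate, cases, max_lag=3):
    """Correlate cases in month t with the climate value in month t - lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = climate[: len(climate) - lag]  # climate series leads by `lag` months
        y = cases[lag:]                    # cases series trails by `lag` months
        result[lag] = pearson_r(x, y)
    return result

def aic(log_likelihood, k):
    """Standard AIC; k is the number of independently adjusted parameters."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Standard BIC; n is the total number of observations."""
    return k * math.log(n) - 2 * log_likelihood

# Toy data (hypothetical): cases follow temperature with a one-month delay.
temperature = [25, 27, 30, 33, 34, 32, 31, 31, 30, 29, 27, 25] * 2
cases = [40] + [2 * t for t in temperature[:-1]]

corr = lagged_correlations(temperature, cases)
best_lag = max(corr, key=lambda lag: abs(corr[lag]))
print(best_lag, round(corr[best_lag], 3))
```

In this toy series the one-month lag gives the strongest correlation by construction. For model selection, the candidate with the lower AIC or BIC is preferred; BIC penalizes extra parameters more heavily than AIC whenever ln n exceeds 2.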

Model forecasting
The cholera incidence data was divided into two: the data of [...] (5)) indicate a better fit of the data:

$$\mathrm{MAE} = \frac{\sum \lvert \text{observed} - \text{forecasted} \rvert}{\text{number of months}} \qquad (5)$$
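The forecast-error comparison can be illustrated with a short Python sketch. The RMSE form below is the standard root-mean-square error, assumed here because it is the measure reported alongside MAE in the abstract; the observed and forecasted values are hypothetical and only show how two candidate models would be ranked.

```python
import math

def rmse(observed, forecasted):
    """Root-mean-square error over the forecast months."""
    squared = [(o - f) ** 2 for o, f in zip(observed, forecasted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(observed, forecasted):
    """Mean absolute error: sum of |observed - forecasted| / number of months."""
    absolute = [abs(o - f) for o, f in zip(observed, forecasted)]
    return sum(absolute) / len(absolute)

# Hypothetical monthly case counts and two competing forecasts.
observed = [120, 95, 80, 110]
forecast_a = [110, 100, 90, 100]
forecast_b = [100, 80, 100, 130]

for name, forecast in (("A", forecast_a), ("B", forecast_b)):
    print(name, round(rmse(observed, forecast), 2), mae(observed, forecast))
```

Lower values of both measures indicate a better fit, so in this toy case forecast A would be preferred over forecast B.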

Limitations
This study has a few limitations. For example, the icddr,b cholera incidence data is assumed to be representative of the entire spatial extent of Dhaka megacity, because detailed lab-tested cholera incidence data is only available at icddr,b, where cholera cases are

Results of evolution of models
A mean-range plot for each seasonal period (12 months) showed that a logarithmic transformation was necessary to stabilize the variance of cholera incidence (Figure 4).
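The idea behind a mean-range check can be sketched as follows. This is a hedged illustration, not the authors' procedure: the helper name and the toy series are hypothetical, and the natural logarithm is assumed. If the range of each 12-month block grows with its mean, the variance is not stable, and a log transformation should flatten that relationship.

```python
import math

def mean_and_range_by_season(series, period=12):
    """Return (mean, range) for each consecutive block of `period` values."""
    pairs = []
    for start in range(0, len(series) - period + 1, period):
        block = series[start:start + period]
        pairs.append((sum(block) / period, max(block) - min(block)))
    return pairs

# Hypothetical three-year series whose spread grows with its level.
base = [1, 2, 4, 6, 4, 2, 1, 2, 3, 5, 3, 1]
series = [v * scale for scale in (10, 20, 40) for v in base]

raw = mean_and_range_by_season(series)
# Assumes strictly positive counts; zero counts would need an offset first.
logged = mean_and_range_by_season([math.log(v) for v in series])

for (m_raw, r_raw), (m_log, r_log) in zip(raw, logged):
    print(round(m_raw, 1), r_raw, round(m_log, 2), round(r_log, 2))
```

On the raw scale the yearly range doubles as the yearly mean doubles, whereas on the log scale the range is the same in every year, which is the pattern that motivates the transformation.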
All statistical analyses were performed on the logarithmically transformed [...] The single-variable (SVM) SARIMA models (Table 3) show that a 1 °C increase in the previous month's (lag 1) maximum temperature resulted in a 7% increase in cholera incidence (p < 0.001; AIC = 47, BIC = 66). At temporal lag 0, an increase of 100 mm in rainfall resulted in a 4% increase in cholera incidence (p = 0.04; AIC = 52, BIC = 71), and an increase of 1 °C in minimum monthly temperature at 1-month lag resulted in a 5% increase in cholera incidence (p < 0.001; AIC = 46, BIC = 65). However, the multi-variable [...] (Table 6). This means that the interaction of rainfall (p < 0.05) and maximum temperature (p < 0.001) at 1-month lag yielded a significant association with cholera.
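Percentage effects of this kind are what a regression coefficient implies when the response is modeled on a log scale. The sketch below assumes a natural-log transformation and a linear coefficient β per unit of the climate variable, so that a change of Δ units multiplies incidence by exp(βΔ). The coefficient value is hypothetical: it is back-calculated from the reported 7% and is not taken from the paper's tables.

```python
import math

def percent_change(beta, delta=1.0):
    """Percent change in incidence for a `delta`-unit rise in the covariate,
    assuming log(incidence) is linear in the covariate with slope `beta`."""
    return (math.exp(beta * delta) - 1.0) * 100.0

# Hypothetical slope: the value that would correspond to +7% per 1 degree C.
beta_temp = math.log(1.07)

print(round(percent_change(beta_temp), 2))        # 1 degree C increase
print(round(percent_change(beta_temp, 2.0), 2))   # 2 degree C increase
```

Under this assumption the effect compounds rather than adds: a 2 °C increase corresponds to about 14.5% rather than exactly 14%.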

Evaluation of model forecast
The performance of the models is shown in Figure 8, where the [...] improved fitting compared to other models (Table 7).

DISCUSSION
The results of this study illustrate that there is distinct seasonality (Figure 5) observed in V. cholerae signatures throughout the world. Among various climatic variables, there is a significant association of rainfall and temperature ( [...] Although a very low positive effect of relative humidity (r = 0.28) was found at the current month (lag 0) and negative values at lags of 1, 2 and 3 months (Table 2), no significant effect of humidity could be found by time series analysis with the SARIMA model (Table 3) [...]

A: Cholera incidence with 0-month lagged rainfall and maximum temperature
B: Cholera incidence with 1-month lagged rainfall and maximum temperature
C: Cholera incidence with 2-month lagged rainfall and maximum temperature
D: Cholera incidence with 3-month lagged rainfall and maximum temperature

[...] there may be water scarcity for a certain proportion of the people of Dhaka city who rely on surface water for washing and bathing; therefore, the likelihood of multiple uses of water bodies may increase.
Temperature has a robust relationship with increased cholera incidence, which is well documented in many studies ( [...] Table 3) than other climatic variables, i.e. relative humidity, ENSO and SOI. The model run with the combined effect of rainfall and maximum temperature (Table 4) showed better results (lower AIC and BIC) than the models with individual climatic variables (Table 3); that is, the multi-variable model (MVM) performed better than the single-variable model (SVM), which answers research question 1. This study also illustrates that the previous month's rainfall and maximum temperature gave a better fit in forecasting (Table 7); that is, cholera incidence can be forecasted one month in advance, which answers research question 2. However, the rainfall and maximum

CONCLUSIONS
This study aimed to develop and test a cholera forecasting model by establishing a relation between cholera incidence and climatic variables for Dhaka megacity in Bangladesh. The seasonal-auto-regressive-integrated-moving-average (SARIMA) model was found suitable as a cholera forecasting model because of its auto-regressive nature and seasonal behavior pattern. The SARIMA models showed a strong relation between cholera incidence and climatic variables in Dhaka, Bangladesh, both individually (rainfall, maximum temperature and minimum temperature) and combined (rainfall and maximum temperature). For example, the single-variable model showed that for a 1 °C increase in monthly maximum temperature, cholera incidence increases by 7% (p < 0.001) at 1-month lag. That means cholera incidence can be forecasted one month earlier with temperature data, which is very promising for preparedness. However, among all combinations of climatic variables and lags, the multi-variable model with 1-month lag (Model B) showed the best result, with the lowest AIC and BIC values. This study also revealed that the relationship between cholera incidence and climatic variables varies with location and with the climatic variable considered. Therefore, a cholera forecasting model is location-specific, and one should analyze location-specific climatic variables when forecasting cholera incidence.
The results of this study would be very important for a climatologist, an epidemiologist or a public health professional who works with cholera incidence to develop preparedness and response plans. For a climatologist, they are important because the impact of climate change on cholera incidence may be predicted from this study, as climatologists predict an increase of 1.4–5.8 °C in mean temperature over the next 100 years (Houghton et al. ). An epidemiologist would be helped by the new insights on environmental and climatic linkages of cholera outbreaks. A health professional may prepare coping and adaptation strategies for potential climate change-related health risks in Bangladesh. This study also contributes towards the development of a climate-based early warning system for cholera (Akanda et al. ).