The adaptive neuro fuzzy inference system (ANFIS) has been proposed to model the time series of water quality data in this study. The biochemical oxygen demand data collected at the upstream catchment of Feitsui Reservoir in Taiwan for more than 20 years are selected as the target water quality variable. The classical statistical technique of the Box-Jenkins method is applied for the selection of appropriate input variables and data pre-processing of using differencing is implemented during the model development. The time series data obtained by ANFIS models are compared to those obtained by autoregressive integrated moving average (ARIMA) and artificial neural networks (ANNs). The results show that the ANFIS model identified at each sampling station is superior to the respective ARIMA and ANN models. The R values at all sampling stations of the training and testing datasets are 0.83–0.98 and 0.81–0.89, respectively, except at Huang-ju-pi-liao station. ANFIS models can provide accurate predictions for complex hydrological processes, and can be extended to other areas to improve the understanding of river pollution trends. The procedure of input selection and the pre-processing of input data proposed in this study can stimulate the usage of ANFIS in other related studies.
INTRODUCTION
It is important to understand surface water quality due to its effect on aquatic ecosystems and public health. The common metric for assessing an aquatic ecosystem's organic pollution level is the biochemical oxygen demand (BOD). Measuring the BOD serves as an indicator of the health of the aquatic system. The difficulty with measuring BOD is that after the initial sample is obtained, it requires 5 days to complete. The final analysis of the results is done on the fifth day and the final result is referred to as BOD5 (Delzer & McKenzie 1999). Due to this time lag and high financial expense of generating field BOD5 measurements, the estimation of BOD5 has led to active research in mathematical models (Roider & Adrian 2007). Mathematical models are based on the underlying physics of the environmental system or approximations of it. One major limitation is that the models can be sensitive to parameter changes that lead to different predictions (Chau 2006).
An alternative to mathematical modeling is time series modeling. This approach has become an area of active research due to its applicability in a variety of fields such as dynamic systems, nonlinear signal processing, pattern recognition, and hydrology. To construct a time series model, a set of historical observations are combined to extrapolate a forecast of future measurements. The advantage of this modeling approach is that it does not rely on the underlying physics of an environmental system, but solely uses the historical data. This approach is also useful when there is a difficulty in collecting all explanatory variables continuously. For modeling water quality, the time series methods of autoregression (AR) and autoregressive moving average (ARMA) have been successfully applied (Tiwari et al. 1996; Cai & Tiwari 2000; Schoch et al. 2009; Abaurrea et al. 2011). More recently a more generalized version of ARMA called autoregressive integrated moving average (ARIMA) has been applied to water quality models (Ahmad et al. 2001; Kurunc et al. 2005; Zou et al. 2009; Abudu et al. 2012). An ARIMA model is particularly applied in some cases where the time series data show evidence of non-stationarity features. Although ARIMA models are quite flexible in that they can represent several different types of time series, their major limitation is the pre-assumed linear form of the model and may not be adequate for complex nonlinear real world problems (Oliveira-Esquerre et al. 2004).
In order to overcome the assumption of linearity, time series have been incorporating various artificial intelligence (AI) techniques (Zhang et al. 1998). Among various AIs, artificial neural networks (ANNs) have been found to be well suited to modeling a system for which an explicit understanding of the complex input and output relationship is unknown. ANNs have been widely applied in engineering fields, and successfully used in hydrological processes, water resources, evaporation, flood forecasting, reservoir operations, and recorded weather data (Hsu et al. 1995; Wen & Lee 1998; Aguilera et al. 2001; Oliveira-Esquerre et al. 2004; Kisi 2008; Gao et al. 2010; Chen et al. 2014; Shiri et al. 2014). ANNs have been also extended to forecast water quality parameters and estimate nutrient concentration from pollution sources of watersheds (Maier & Dandy 1996; Suen & Eheart 2003; Sengorur et al. 2006; Singh et al. 2009; Basant et al. 2010; Ay & Kisi 2012; Kisi et al. 2013; Marti et al. 2013; Verma & Singh 2013; Wen et al. 2013). Maier et al. (2010) reviewed 210 papers on the prediction of water quantity and quality variables in rivers and provided a taxonomy of methods used for the development of ANN models for river systems.
Fuzzy logic theory based on the linguistic uncertainty expression rather than numerical uncertainty was first developed by Zadeh (1965) and has become popular to various engineering problems and hydrological modeling since the late 1990s (Jang et al. 1997). Recently, adaptive neuro fuzzy inference system (ANFIS), which consists of the ANN and fuzzy logic methods, has received attention in database management, system design, and water resources planning and forecasting. Nayak et al. (2005) stated that while ANFIS models do not provide information on the hydrologic process physics, they are still useful for time series forecasting where the main concern is accurate predictions of a variable at specific watershed locations. ANFIS has been successfully used for prediction stream flow (Chang & Chen 2001; Nayak et al. 2004; Firat & Gungor 2007; Firat & Gungor 2008), reservoir inflow or water level (Chang & Chang 2006; Lohani et al. 2012), flood forecast (Nayak et al. 2005; Chen et al. 2006), sediment yield (Firat & Gungor 2010), and evaporation/transpiration (Shiri et al. 2011, 2013a, 2013b, 2015; Kisi et al. 2012; Pour Ali Baba et al. 2013) for many years.
Although the ANFIS method has been certified as a powerful tool for prediction and plenty of applications have been published (primarily in water quantity), there is little discussion in the literature of using this hybrid computing system to model the time series of water quality, especially BOD5 (Maier et al. 2010). Besides, there is no systematic procedure for the development of ANFIS (or ANN) models, nor a comprehensive study to compare different modeling techniques (Wu et al. 2014). The main purpose of this study is to present ANFIS modeling of BOD5 as time series data. This study further provides a systematic procedure of input selection and the pre-processing of the input data, both of which are embedded in the model development. To verify the application of this approach, the upstream catchment of Feitsui Reservoir located at the northeast part of Taiwan is chosen as the case study area. A large amount of BOD5 data in terms of space and time is collected in this area and used to develop the models. The capabilities of proposed ANFIS models are compared to ARIMA and ANN models.
METHODOLOGY
In this section, the basic concepts and modeling process of proposed ARIMA, ANN, and ANFIS models are briefly reviewed.
Box-Jenkins autoregressive integrated moving average (ARIMA)
ARIMA models form an important part of the Box-Jenkins method (Box & Jenkins 1976) to time series modeling. By matching the empirical autocorrelation patterns with the theoretical ones, Box & Jenkins (1976) proposed to compute and examine the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the time series data to identify the orders of p and q in ARIMA. The detailed mathematical background and the steps of model development can be found in Box & Jenkins (1976). In this study, the ARIMA model is created for each sampling station by the Box-Jenkins method and its basic structure and parameters are identified.
Artificial neural networks (ANNs)





The training of the neural network embraces the choices of architecture, activation functions, training algorithms, and network parameters, the testing of the trained network, and the usage of trained neural network for simulation and prediction. All these steps are adopted to develop the ANN models in this study. The three-layer feed-forward ANN model trained by the back-propagation algorithm is developed for each station and the determination of input variables is based on the analysis obtained from Box-Jenkins ARIMA.
Adaptive neuro fuzzy inference system (ANFIS)
Rule 1: If
is
and
is
, then
Rule 2: If
is
and
is
, then








ANFIS architecture for a two-input first order Sugeno fuzzy model with two rules.
ANFIS architecture for a two-input first order Sugeno fuzzy model with two rules.
Performance criteria
STUDY AREA AND BIOCHEMICAL OXYGEN DEMAND DATA
The time series data of BOD5 used in this study were generated through continuous monitoring of water quality at the upstream catchment of Feitsui Reservoir from 1986 to 2012. The data were monitored monthly at eight different sampling stations that cover most of the upstream catchment. Three stations, Ping-lin, Dai-yu-ku Creek, and Jin-gua-liao Creek (Figure 2), are managed by Taipei Feitusi Reservoir Administration, and the total number of measurements is 972 from 1/1986 to 12/2012. Five stations, Kuo-lai, Bi-hu, Shui-yuan Bridge, Da-lin Bridge, and Huang-ju-pi-liao (Figure 2), are managed by Taipei Water Management Committee and the total number of measurements is 1345 from 1/1987 to 5/2009.
Table 1 presents the basic statistics of the measured BOD5 concentrations for each of the sampling stations. The variation in BOD5 concentration corresponds to the nature and types (point and non-point) of sources distributed in the large geographical area of watershed. The larger standard deviation (SD) or coefficient of variation (CV) of BOD5 may be directly related to more industrial, agricultural, and anthropogenic activities at the downstream catchment. In the upper-course (Kuo-lai and Bi-hu stations), the river water quality is mainly dominated by the variables of geogentic origin, and there are no major pollution sources in the region. In the mid- or lower-course (Jin-gua-liau Creek, Shui-yuan Bridge, and Huang-ju-pi-liao stations), the river receives a heavy load of untreated wastewater from nearby townships and is dominated by the variables of anthropogenic origin.
Basic statistics of the measured BOD5 at the upstream catchment of Feitsui Reservoir, Taiwan
Station . | Observation period . | Max (mg/L) . | Min (mg/L) . | Mean (mg/L) . | Median (mg/L) . | SD (mg/L) . | CV (%) . |
---|---|---|---|---|---|---|---|
Ping-lin | 1987–2012 | 4.30 | 0.00 | 0.69 | 0.60 | 0.51 | 75 |
Dai-yu-ku Creek | 1987–2012 | 3.70 | 0.00 | 0.63 | 0.50 | 0.49 | 77 |
Jin-gua-liau Creek | 1987–2012 | 5.90 | 0.10 | 0.72 | 0.50 | 0.63 | 88 |
Kuo-lai | 1987–2009 | 2.70 | 0.10 | 0.55 | 0.50 | 0.35 | 65 |
Bi-hu | 1987–2009 | 2.80 | 0.10 | 0.59 | 0.50 | 0.37 | 63 |
Da-lin Bridge | 1987–2009 | 2.20 | 0.10 | 0.64 | 0.50 | 0.34 | 53 |
Shui-yuan Bridge | 1987–2009 | 3.40 | 0.10 | 0.86 | 0.90 | 0.55 | 63 |
Huang-ju-pi-liao | 1987–2009 | 4.30 | 0.10 | 0.92 | 1.10 | 0.64 | 70 |
Station . | Observation period . | Max (mg/L) . | Min (mg/L) . | Mean (mg/L) . | Median (mg/L) . | SD (mg/L) . | CV (%) . |
---|---|---|---|---|---|---|---|
Ping-lin | 1987–2012 | 4.30 | 0.00 | 0.69 | 0.60 | 0.51 | 75 |
Dai-yu-ku Creek | 1987–2012 | 3.70 | 0.00 | 0.63 | 0.50 | 0.49 | 77 |
Jin-gua-liau Creek | 1987–2012 | 5.90 | 0.10 | 0.72 | 0.50 | 0.63 | 88 |
Kuo-lai | 1987–2009 | 2.70 | 0.10 | 0.55 | 0.50 | 0.35 | 65 |
Bi-hu | 1987–2009 | 2.80 | 0.10 | 0.59 | 0.50 | 0.37 | 63 |
Da-lin Bridge | 1987–2009 | 2.20 | 0.10 | 0.64 | 0.50 | 0.34 | 53 |
Shui-yuan Bridge | 1987–2009 | 3.40 | 0.10 | 0.86 | 0.90 | 0.55 | 63 |
Huang-ju-pi-liao | 1987–2009 | 4.30 | 0.10 | 0.92 | 1.10 | 0.64 | 70 |
Max, maximum; Min, minimum; SD, standard deviation; CV, coefficient of variation.
When developing ARIMA, ANN, and ANFIS models, only the time series data of BOD5 were used to develop the models and no other water quality data were considered, such as pH, temperature, turbidity, electrical conductivity, or suspended solids. The entire dataset for each station is divided into two sets, the training and testing sets. Seventy percent of the data is used for training, and 30% is used for testing. For the Ping-lin, Dai-yu-ku Creek, and Jin-gua-liao Creek stations, the data from 1/1986 to 11/2004 (227 measurements for each station) are used for training, and the data from 12/2004 to 12/2012 (97 measurements for each station) are used for testing. For the Kuo-lai, Bi-hu, Shui-yuan Bridge, Da-lin Bridge, and Huang-ju-pi-liao stations, the data from 1/1987 to 8/2002 (188 measurements for each station) are used for training, and the data from 9/2002 to 5/2009 (81 measurements for each station) are used for testing. The performances of all identified models are evaluated by R, RMSE, MAPE, and MAE.
RESULTS AND DISCUSSION
In this section, the time series data of BOD5 at the upstream catchment of reservoir are used to demonstrate the developments of ARIMA, ANN, and ANFIS models.
Box-Jenkins autoregressive integrated moving average
Before creating the ARIMA model for each sampling station, the seasonality and stationarity of the time series data at each sampling station are examined. The temporal variations of measured BOD5 are plotted (Figure 4) and no seasonality is observed in the measurements for all the stations. The ACF chart at each station is also examined and the correlations show that no seasonality is detected. To test if the data are stationary, the unit root test of augmented Dickey-Fuller test (ADF) (Dickey & Fuller 1979) is employed. The results of this test show that the time series data at all sampling stations are non-stationary; therefore, the first-order difference, i.e., d = 1, is applied to each dataset to achieve stationarity.
(a) ACF and PACF charts, (b) residual ACF and residual PACF charts of BOD5 at Ping-lin station after fitting the ARIMA (7,1,1) model.
(a) ACF and PACF charts, (b) residual ACF and residual PACF charts of BOD5 at Ping-lin station after fitting the ARIMA (7,1,1) model.










After identifying the optimal ARIMA model, it is validated by examining the autocorrelation of residuals. The residual ACF and PACF charts of BOD5 at Ping-lin station are presented in Figure 3(b), and the series are all within the 95% confidence intervals. This then indicates that the residuals are an approximate white noise time series, not significant statistically, and the model is properly identified.
Simulated and measured BOD5 of ARIMA models at: (a) Ping-lin; (b) Dai-yu-ku Creek; (c) Jin-gua-liau Creek; (d) Kuo-lai; (e) Bi-hu; (f) Da-lin Bridge; (g) Shui-yuan Bridge; and (h) Huang-ju-pi-laio.
Simulated and measured BOD5 of ARIMA models at: (a) Ping-lin; (b) Dai-yu-ku Creek; (c) Jin-gua-liau Creek; (d) Kuo-lai; (e) Bi-hu; (f) Da-lin Bridge; (g) Shui-yuan Bridge; and (h) Huang-ju-pi-laio.
The basic structure and performance of optimized ARIMA model for each sampling station
Station . | Structure . | Dataset . | R . | RMSE . | MAPE . | MAE . |
---|---|---|---|---|---|---|
Ping-lin | ARIMA (7,1,1) | Training | 0.3803 | 0.2720 | 84.6107 | 0.2220 |
Testing | 0.2125 | 0.5186 | 95.8707 | 0.3778 | ||
Dai-yu-ku Creek | ARIMA (5,1,1) | Training | 0.3110 | 0.3278 | 90.2363 | 0.2592 |
Testing | 0.1336 | 0.4923 | 92.1265 | 0.3511 | ||
Jin-gua-liau Creek | ARIMA (5,1,1) | Training | 0.4594 | 0.5069 | 77.3537 | 0.3543 |
Testing | 0.1418 | 0.5547 | 113.6521 | 0.4144 | ||
Kuo-lai | ARIMA (6,1,1) | Training | 0.3842 | 0.3193 | 42.1655 | 0.2456 |
Testing | 0.1746 | 0.3770 | 83.1859 | 0.2589 | ||
Bi-hu | ARIMA (6,1,1) | Training | 0.3746 | 0.2441 | 44.2237 | 0.1954 |
Testing | 0.2804 | 0.3850 | 75.6060 | 0.2675 | ||
Da-lin Bridge | ARIMA (8,1,1) | Training | 0.3817 | 0.2280 | 30.8787 | 0.1779 |
Testing | 0.1460 | 0.3501 | 74.4639 | 0.2798 | ||
Shui-yuan Bridge | ARIMA (7,1,1) | Training | 0.4097 | 0.2348 | 33.3921 | 0.1865 |
Testing | 0.1410 | 0.5629 | 80.5037 | 0.4312 | ||
Huang-ju-pi-liao | ARIMA (5,1,1) | Training | 0.4049 | 0.2707 | 35.3087 | 0.1898 |
Testing | 0.1506 | 0.6653 | 78.4656 | 0.4730 |
Station . | Structure . | Dataset . | R . | RMSE . | MAPE . | MAE . |
---|---|---|---|---|---|---|
Ping-lin | ARIMA (7,1,1) | Training | 0.3803 | 0.2720 | 84.6107 | 0.2220 |
Testing | 0.2125 | 0.5186 | 95.8707 | 0.3778 | ||
Dai-yu-ku Creek | ARIMA (5,1,1) | Training | 0.3110 | 0.3278 | 90.2363 | 0.2592 |
Testing | 0.1336 | 0.4923 | 92.1265 | 0.3511 | ||
Jin-gua-liau Creek | ARIMA (5,1,1) | Training | 0.4594 | 0.5069 | 77.3537 | 0.3543 |
Testing | 0.1418 | 0.5547 | 113.6521 | 0.4144 | ||
Kuo-lai | ARIMA (6,1,1) | Training | 0.3842 | 0.3193 | 42.1655 | 0.2456 |
Testing | 0.1746 | 0.3770 | 83.1859 | 0.2589 | ||
Bi-hu | ARIMA (6,1,1) | Training | 0.3746 | 0.2441 | 44.2237 | 0.1954 |
Testing | 0.2804 | 0.3850 | 75.6060 | 0.2675 | ||
Da-lin Bridge | ARIMA (8,1,1) | Training | 0.3817 | 0.2280 | 30.8787 | 0.1779 |
Testing | 0.1460 | 0.3501 | 74.4639 | 0.2798 | ||
Shui-yuan Bridge | ARIMA (7,1,1) | Training | 0.4097 | 0.2348 | 33.3921 | 0.1865 |
Testing | 0.1410 | 0.5629 | 80.5037 | 0.4312 | ||
Huang-ju-pi-liao | ARIMA (5,1,1) | Training | 0.4049 | 0.2707 | 35.3087 | 0.1898 |
Testing | 0.1506 | 0.6653 | 78.4656 | 0.4730 |
Artificial neural networks
The basic structure and performance of optimized ANN model for each sampling station
Station . | Structure (input-node-output) . | Input variables (Δyt−p) . | Dataset . | R . | RMSE . | MAPE . | MAE . |
---|---|---|---|---|---|---|---|
Ping-lin | 7-9-1 | p = 1–7 | Training | 0.6740 | 0.2473 | 69.2710 | 0.1802 |
Testing | 0.5608 | 0.4015 | 76.5663 | 0.2599 | |||
Dai-yu-ku Creek | 5-12-1 | p = 1–5 | Training | 0.6697 | 0.2962 | 64.0387 | 0.2002 |
Testing | 0.5620 | 0.3590 | 81.5944 | 0.2163 | |||
Jin-gua-liau Creek | 5-15-1 | p = 1–5 | Training | 0.6347 | 0.4064 | 69.1898 | 0.3145 |
Testing | 0.4677 | 0.4379 | 105.1526 | 0.3756 | |||
Kuo-lai | 6-18-1 | p = 1–6 | Training | 0.6947 | 0.2480 | 40.7337 | 0.2038 |
Testing | 0.4878 | 0.2605 | 55.1503 | 0.2071 | |||
Bi-hu | 6-15-1 | p = 1–6 | Training | 0.6579 | 0.2194 | 40.7533 | 0.1706 |
Testing | 0.5551 | 0.3035 | 56.6606 | 0.2187 | |||
Da-lin Bridge | 8-9-1 | p = 1–8 | Training | 0.6158 | 0.2037 | 25.5899 | 0.1516 |
Testing | 0.4968 | 0.3102 | 53.9803 | 0.2446 | |||
Shui-yuan Bridge | 7-30-1 | p = 1–7 | Training | 0.7833 | 0.2198 | 29.8330 | 0.1574 |
Testing | 0.4549 | 0.3829 | 46.0420 | 0.2874 | |||
Huang-ju-pi-liao | 5-21-1 | p = 1–5 | Training | 0.6670 | 0.2610 | 30.7647 | 0.1700 |
Testing | 0.3749 | 0.5458 | 56.6385 | 0.3990 |
Station . | Structure (input-node-output) . | Input variables (Δyt−p) . | Dataset . | R . | RMSE . | MAPE . | MAE . |
---|---|---|---|---|---|---|---|
Ping-lin | 7-9-1 | p = 1–7 | Training | 0.6740 | 0.2473 | 69.2710 | 0.1802 |
Testing | 0.5608 | 0.4015 | 76.5663 | 0.2599 | |||
Dai-yu-ku Creek | 5-12-1 | p = 1–5 | Training | 0.6697 | 0.2962 | 64.0387 | 0.2002 |
Testing | 0.5620 | 0.3590 | 81.5944 | 0.2163 | |||
Jin-gua-liau Creek | 5-15-1 | p = 1–5 | Training | 0.6347 | 0.4064 | 69.1898 | 0.3145 |
Testing | 0.4677 | 0.4379 | 105.1526 | 0.3756 | |||
Kuo-lai | 6-18-1 | p = 1–6 | Training | 0.6947 | 0.2480 | 40.7337 | 0.2038 |
Testing | 0.4878 | 0.2605 | 55.1503 | 0.2071 | |||
Bi-hu | 6-15-1 | p = 1–6 | Training | 0.6579 | 0.2194 | 40.7533 | 0.1706 |
Testing | 0.5551 | 0.3035 | 56.6606 | 0.2187 | |||
Da-lin Bridge | 8-9-1 | p = 1–8 | Training | 0.6158 | 0.2037 | 25.5899 | 0.1516 |
Testing | 0.4968 | 0.3102 | 53.9803 | 0.2446 | |||
Shui-yuan Bridge | 7-30-1 | p = 1–7 | Training | 0.7833 | 0.2198 | 29.8330 | 0.1574 |
Testing | 0.4549 | 0.3829 | 46.0420 | 0.2874 | |||
Huang-ju-pi-liao | 5-21-1 | p = 1–5 | Training | 0.6670 | 0.2610 | 30.7647 | 0.1700 |
Testing | 0.3749 | 0.5458 | 56.6385 | 0.3990 |
The performance of: (a) different number of nodes in the hidden layer for the training and testing datasets at Ping-lin station; and (b) different combinations of activation functions used in the ANN model at Ping-lin station.
The performance of: (a) different number of nodes in the hidden layer for the training and testing datasets at Ping-lin station; and (b) different combinations of activation functions used in the ANN model at Ping-lin station.
The identifications of ANN models at other sampling stations followed the same procedures described previously for Ping-lin station. The optimized architecture of the ANN model at each sampling station and its associated performance are shown in Table 3. Regarding the testing datasets, the ANN models at Kuo-lai and Bi-hu stations performed the best and the model at Huang-ju-pi-liao station performed the worst. It is clear from the table that the R values for both training and testing datasets at all stations obtained from identified ANN models are higher than those obtained from identified ARIMA models. The RMSE, MAPE, and MAE values for both training and testing datasets at all stations obtained from identified ANN models are also smaller than those obtained from identified ARIMA models.
Simulated and measured BOD5 of ANN models at: (a) Ping-lin; (b) Dai-yu-ku Creek; (c) Jin-gua-liau Creek; (d) Kuo-lai; (e) Bi-hu; (f) Da-lin Bridge; (g) Shui-yuan Bridge; and (h) Huang-ju-pi-laio.
Simulated and measured BOD5 of ANN models at: (a) Ping-lin; (b) Dai-yu-ku Creek; (c) Jin-gua-liau Creek; (d) Kuo-lai; (e) Bi-hu; (f) Da-lin Bridge; (g) Shui-yuan Bridge; and (h) Huang-ju-pi-laio.
Adaptive neuro fuzzy inference system
The basic structure and performance of optimized ANFIS model for each sampling station
Station . | Structure (input-rule-output) . | Input variables (Δyt−p) . | Dataset . | R . | RMSE . | MAPE . | MAE . |
---|---|---|---|---|---|---|---|
Ping-lin | 7-3-1 | p = 1–7 | Training | 0.9345 | 0.2004 | 15.6456 | 0.0870 |
Testing | 0.8091 | 0.2247 | 32.5112 | 0.1165 | |||
Dai-yu-ku Creek | 5-2-1 | p = 1–5 | Training | 0.8829 | 0.2244 | 38.2991 | 0.1468 |
Testing | 0.8389 | 0.2427 | 41.1580 | 0.1564 | |||
Jin-gua-liau Creek | 5-3-1 | p = 1–5 | Training | 0.8255 | 0.3193 | 26.8040 | 0.1281 |
Testing | 0.8100 | 0.3397 | 42.8392 | 0.2096 | |||
Kuo-lai | 6-2-1 | p = 1–6 | Training | 0.9453 | 0.1124 | 20.5781 | 0.0689 |
Testing | 0.8935 | 0.1548 | 20.7423 | 0.0988 | |||
Bi-hu | 6-3-1 | p = 1–6 | Training | 0.9703 | 0.0971 | 14.3909 | 0.0496 |
Testing | 0.8381 | 0.1391 | 23.8338 | 0.1058 | |||
Da-lin Bridge | 8-4-1 | p = 1–8 | Training | 0.9712 | 0.0518 | 4.7980 | 0.0232 |
Testing | 0.8430 | 0.1215 | 10.2186 | 0.0627 | |||
Shui-yuan Bridge | 7-2-1 | p = 1–7 | Training | 0.9765 | 0.0628 | 3.1509 | 0.0189 |
Testing | 0.8487 | 0.1122 | 13.9100 | 0.0819 | |||
Huang-ju-pi-liao | 5-3n-1 | p = 1–5 | Training | 0.9662 | 0.1869 | 18.8125 | 0.0993 |
Testing | 0.5232 | 0.4083 | 47.0013 | 0.2441 |
Station . | Structure (input-rule-output) . | Input variables (Δyt−p) . | Dataset . | R . | RMSE . | MAPE . | MAE . |
---|---|---|---|---|---|---|---|
Ping-lin | 7-3-1 | p = 1–7 | Training | 0.9345 | 0.2004 | 15.6456 | 0.0870 |
Testing | 0.8091 | 0.2247 | 32.5112 | 0.1165 | |||
Dai-yu-ku Creek | 5-2-1 | p = 1–5 | Training | 0.8829 | 0.2244 | 38.2991 | 0.1468 |
Testing | 0.8389 | 0.2427 | 41.1580 | 0.1564 | |||
Jin-gua-liau Creek | 5-3-1 | p = 1–5 | Training | 0.8255 | 0.3193 | 26.8040 | 0.1281 |
Testing | 0.8100 | 0.3397 | 42.8392 | 0.2096 | |||
Kuo-lai | 6-2-1 | p = 1–6 | Training | 0.9453 | 0.1124 | 20.5781 | 0.0689 |
Testing | 0.8935 | 0.1548 | 20.7423 | 0.0988 | |||
Bi-hu | 6-3-1 | p = 1–6 | Training | 0.9703 | 0.0971 | 14.3909 | 0.0496 |
Testing | 0.8381 | 0.1391 | 23.8338 | 0.1058 | |||
Da-lin Bridge | 8-4-1 | p = 1–8 | Training | 0.9712 | 0.0518 | 4.7980 | 0.0232 |
Testing | 0.8430 | 0.1215 | 10.2186 | 0.0627 | |||
Shui-yuan Bridge | 7-2-1 | p = 1–7 | Training | 0.9765 | 0.0628 | 3.1509 | 0.0189 |
Testing | 0.8487 | 0.1122 | 13.9100 | 0.0819 | |||
Huang-ju-pi-liao | 5-3n-1 | p = 1–5 | Training | 0.9662 | 0.1869 | 18.8125 | 0.0993 |
Testing | 0.5232 | 0.4083 | 47.0013 | 0.2441 |
The performance of: (a) different number of rules in the hidden layer for the training and testing datasets at Ping-lin station; and (b) different membership functions used in the ANFIS model at Ping-lin station.
The performance of: (a) different number of rules in the hidden layer for the training and testing datasets at Ping-lin station; and (b) different membership functions used in the ANFIS model at Ping-lin station.
Simulated and measured BOD5 of ANFIS models at: (a) Ping-lin; (b) Dai-yu-ku Creek; (c) Jin-gua-liau Creek; (d) Kuo-lai; (e) Bi-hu; (f) Da-lin Bridge; (g) Shui-yuan Bridge; and (h) Huang-ju-pi-laio.
Simulated and measured BOD5 of ANFIS models at: (a) Ping-lin; (b) Dai-yu-ku Creek; (c) Jin-gua-liau Creek; (d) Kuo-lai; (e) Bi-hu; (f) Da-lin Bridge; (g) Shui-yuan Bridge; and (h) Huang-ju-pi-laio.
Comparisons of ARIMA/ANN/ANFIS models and other studies



Distribution of the ARIMA, ANN, and ANFIS models discrepancy ratios. The data number 1 represents the first datum (January 1986) at Ping-lin station and the data number 2317 represents the last datum (May 2009) at Huang-ju-pi-liao station.
Distribution of the ARIMA, ANN, and ANFIS models discrepancy ratios. The data number 1 represents the first datum (January 1986) at Ping-lin station and the data number 2317 represents the last datum (May 2009) at Huang-ju-pi-liao station.
In the last 10 years, several studies have demonstrated a successful implementation of ANN modeling in the prediction of BOD concentrations, but very few cases employed ANFIS. The results obtained in this study are compared with those results reported in the ANN related literature. Oliveira-Esquerre et al. (2004) developed a multilayer perceptron neural network (MLP) combined with partial least square (PLS) method, called PLS-MLP, to predict inlet and outlet BOD of an aerated lagoon in Brazil. In their study, the PLS-MLP approach was the best of any method evaluated, and the results of BOD prediction in terms of coefficient of determination (R2) were 0.63–0.81. Singh et al. (2009) computed DO and BOD levels in the Gomti River, India using a three-layer feed-forward ANN model with the back propagation algorithm. The R2 of simulated and measured BOD values for the training, validation, and testing sets were 0.85, 0.85, and 0.77, respectively. The respective values of RMSE for the three datasets were 2.25 for training, 1.84 for validation, and 1.38 for testing. Basant et al. (2010) described the ANN modeling for simultaneous prediction of the DO and BOD levels in the river water and the R2 of simulated and measured BOD concentrations for the training, validation, and testing sets were 0.90, 0.76, and 0.74, respectively. Verma & Singh (2013) studied the ANN models to predict the BOD and COD concentrations and the corresponding R values were 0.976 and 0.981, respectively. In this study, the R values for the training and testing sets at all sampling stations were in the ranges 0.83–0.98 and 0.52–0.89, respectively, and the results are comparative to those reported in the previous literature. Compared to other countries worldwide, the variation of BOD5 in Taiwan is relatively high and the excellent fitting of simulated results implies the powerful capability of the ANFIS method to the water quality prediction.
CONCLUSIONS
Water quality modeling of time series forecasting is an important, yet difficult, task that is required to make appropriate decisions about water systems. Despite the numerous time series models currently available, research for improvement on the effectiveness and accuracy of forecasting models has never ceased. This study demonstrates the potential of applying ANFIS in time series modeling of BOD5 at the upstream catchment of Feitsui Reservoir, Taiwan for the first time. The systematic procedure of input selection and the pre-processing of input data during the model development are also presented.
In this paper, a large amount of BOD5 samples was collected at eight sampling stations from 1986 to 2012 and developed into ANFIS models. The simulated results of ANFIS models were compared to those of both linear ARIMA and nonlinear ANN models. The data pre-processing of using differencing was implemented to the original time series data during the model development. The Box-Jenkins method of using a PACF chart to discover the appropriate lagged times in the ARIMA models was also applied and used to determine appropriate input variables in both ANN and ANFIS models. The performance of testing dataset was carefully examined during the training process of ANNs and ANFIS to avoid overfitting. Simulated results showed that the ANFIS model developed at each sampling station was superior to the ARIMA and ANN models in terms of the performances evaluated by R, RMSE, MAPE, and MAE. The excellent performance indicates the capability of the ANFIS method, particularly valuable for the relatively high variation of BOD5 in Taiwan, and further stimulates the applications of the ANFIS method in other study areas.
ANFIS models can provide effective and accurate predictions for complex hydrological processes of BOD5 because of its nonlinearity in nature and linguistic uncertainty. ANFIS can be seen as a predictive alternative to traditional modeling techniques and extended to other parameters of water quality or other areas to improve the understanding of river pollution trends. The time series modeling of using only time lagged variables can avoid the tedious work of collecting different types of water quality data, and the procedure of input selection and the pre-processing of input data presented in this study can intimate to other related studies. Further studies using more data from more watersheds may be required to strengthen these conclusions. Besides, the issue of climate change also has implications for the variations of BOD5, and is a good candidate for future research.
ACKNOWLEDGEMENTS
This material is based on the work supported by National Science Council (NSC) under award NSC 102-2116-M-019-001. The data supported by Taipei Feitsui Reservoir Administration and Taipei Water Management Committee are gratefully acknowledged.