Kelantan river (Sungai Kelantan in Malaysia) basin is one of the essential catchments as it has a history of flood events. Numerous studies have been conducted in river basin modelling for the prediction of flow and mitigation of flooding events as well as water resource management. Therefore, having multi-step-ahead forecasting for river flow (RF) is of important research interest in this regard. This study presents four different approaches for multi-step-ahead forecasting for the Kelantan RF, using NARX (nonlinear autoregressive with exogenous inputs) neural networks and deep learning recurrent neural networks called LSTM (long short-term memory). The dataset used was obtained in monthly record for 29 years between January 1988 and December 2016. The results show that two recursive methods using NARX and LSTM are able to do multi-step-ahead forecasting on 52 series of test datasets with NSE (Nash–Sutcliffe efficiency coefficient) values of 0.44 and 0.59 for NARX and LSTM, respectively. For few-step-ahead forecasting, LSTM with direct sequence-to-sequence produces promising results with a good NSE value of 0.75 (in case of two-step-ahead forecasting). However, it needs a larger data size to have better performance in longer-step-ahead forecasting. Compared with other studies, the data used in this study is much smaller.

  • Four different approaches for multi-step-ahead forecasting for river flow.

  • Two forecasting approaches are performed in multivariate approach using NARXNN (nonlinear autoregressive with exogenous inputs - neural networks) with respectively recursive and direct auto-regressive mode.

  • Two forecasting approaches are performed using LSTM (long short-term memory) with respectively recursive univariate and sequence-to-sequence multivariate approach.

  • LSTM with a direct sequence-to-sequence multivariate approach performs the best for few-step-ahead forecasting in the limited data size. It has promising application for longer-step-ahead forecasting provided that the data size is sufficiently large.

Graphical Abstract

Graphical Abstract
Graphical Abstract

In a time-series sequence, prediction can be classified into two categories, known as single-step ahead and multi-step ahead (Saroha & Aggarwal 2014). Predicting multiple time steps into the future is called multi-step time-series forecasting. It also includes the forecasting of some variables for some future time steps, given over a significant time span of data (Dabrowski et al. 2020). Moreover, the extent of future forecasting is known as the forecasting horizon. Highly chaotic time series and those with missing data pose important issues in multi-step-ahead prediction, which have been addressed using non-linear filters and neural networks (Chandra et al. 2021). According to Guo et al. (2020), the recursive strategy has been normally applied in several fields among five strategies to address a multi-step-ahead prediction task that has been suggested in the literature. Besides the recursive strategy, a sequence-to-value and sequence-to-sequence forecasting approach has also been used in multi-step-ahead forecasting (Wunsch et al. 2021).

Numerous studies were conducted with regard to this multi-step-ahead predictive modelling. Hernandez-Ambato et al. (2017) conducted research to predict the future reservoir level of a hydroelectric dam in a hydroelectric power station located in Ecuador. Open-Loop Prediction (OLP) and Closed-Loop Prediction (CLP) techniques were used in the study. The results showed that CLP models were better than OLP models due to the results obtained from 22 days of testing in a real environment since the authors compared both the models using the same horizon and season. Yu et al. (2011) applied eight different training algorithms to forecast the water level for 1- to 5 days ahead in the Heshui catchment in China. It was found that all of the models give satisfactory results in 1-day-ahead forecasting where R2 values ranged from 0.85 to 0.932 and RMSE values ranged from 0.094 to 0.115. However, the performances of all models dipped with an increase of the time steps (from 1 to 5 days ahead). Kisi et al. (2015) applied multi-step-ahead modelling to forecast daily lake water levels for three different horizons, which was 1–7 days ahead in Urmia Lake in Iran. The models were established using support vector machine (SVM), together with a firefly algorithm (FA). The authors observed that 1-day-ahead forecasting model was much better as they achieved a value of 0.9999 for R2 and was known to be more accurate than the 7-days-ahead prediction. Other previous studies conducted using the same approach were Chang et al. (2014), who established a real-time multi-step-ahead water level prediction using RNN for urban flood control in Yu-Cheng Pumping Station, Taiwan, as well as Saroha & Aggarwal (2014), who applied multi-step-ahead prediction of wind power using genetic algorithm (GA)-based neural networks. RNN was also used in the study by (Granata & Di Nunno 2021) where RNN-based models using LSTM were built for the prediction of short-term-ahead actual evapotranspiration. With reference to the subtropical climatic conditions of South Florida, LSTM models proved to be more accurate than NARX models, while some exogenous variables such as sensible heat flux and relative humidity did not affect the results significantly.

A parameter that is significant in the hydrological cycle and has the easiest access to the local area is known as river flow (RF). By setting up a portion of the vital relationships that happen between social, physical, environmental, and financial processes, this water parameter takes an essential position in hydrology (Bhagwat & Maity 2012). For over a half century, RF prediction has been attracting the interest of researchers. Nowadays, there are two techniques for water flow prediction. The first is a physically based model that contains mathematical models that trigger the hydrodynamic system of the flow of water. The second comprises data-driven models that are constructed on a statistical relationship between the input and the output variable (Le et al. 2019). This water parameter prediction can prevent conceivable flood harm and water deficiencies as well as assist water management in the agricultural field, thus providing huge financial benefits (Atiya et al. 1999). Furthermore, forecasting river flow has been found to be difficult to measure due to the nonlinear, time varying, and indeterminate nature of river flow data. (Zaini et al. 2018).

Zaini et al. (2018) used the SVM-based model, together with particle swarm optimization (PSO), to predict short-term daily RF at the Upper Bertam Catchment located in Cameron Highland, Malaysia. The authors made four SVM models where SVM1 and SVM-PSO1 contained only historical data, while SVM2 and SVM-PSO2 contained historical data and meteorological variables as input. The outcomes showed that SVM2 and SVM-PSO2 outperformed SVM1 and SVM-PSO1. Moreover, the performance of the hybrid models was much better than that of the basic SVM models. Sahoo et al. (2019) applied LSTM, together with RNN (LSTM-RNN) and the AI method, to predict low-flow series using daily discharge data that were obtained from the Basantapur gauging station, India. The results showed that the LSTM-RNN model outperformed the RNN model and other naïve methods that the authors used for comparison. The value of R (correlation coefficient), Nash–Sutcliffe efficiency () and RMSE for LSTM-RNN obtained were 0.943, 0.878 and 0.487, respectively. Moreover, Noor et al. (2017) applied the NARX model to predict rainfall-based RF in Pelarit and River Jarum, Perlis, Malaysia, and successfully established a model flow of the river 1 day (24 h) in advance according to the current rainfall rates. The authors achieved 0.045, 0.013 and 0.9985 values for RMSE, MAPE and R2, respectively, in Jarum River, while in Pelarit River, the values of RMSE, MAPE and R2 were 0.0113, 0.0038 and 0.999, respectively. Thus, it showed that research was successfully conducted. Furthermore, Zhang et al. (2019) applied a time-series analysis model using autoregressive integrated moving average (ARIMA) as well as a multilayer perceptron neural network (MLPNN) to forecast wastewater inflow in Barrie Wastewater Treatment Facility in Canada.

With regard to recurrent neural network (RNN) and its proven history of performances particularly for an intricate nonlinear system, a dynamic neural network that was established on nonlinear auto-regressive models with exogenous input (NARX) models was extensively chosen. The NARX model, which has a limited number of parameters, is known to be as strong as a connected RNN and is computationally efficient (Tijani et al. 2014). Furthermore, it is considered as a part of RNN, having feedback connections surrounding the hidden layers of the network (Marcjasz et al. 2019). Furthermore, it can be utilized in multi-time-series input and output application since it is a different category of artificial neural network (ANN) that is fit to model time-series and nonlinear systems (Ardalani-Farsa & Zolfaghari 2010).

Di Piazza et al. (2016) established the NARX model to execute hourly speed and solar irradiation prediction based on a multi-step-ahead technique. Temperature is the chosen exogenous variable. The NARX model has been optimized by a GA along with an optimal brain surgeon's (OBS) strategy. The time horizons used in the research ranged from 8 to 24 h ahead. As a result, the greatest outcomes are 8 and 24 h for wind speed, while they are 8 and 10 h for solar irradiation. Di Nunno & Granata (2020), on the other hand, established the NARX model to forecast daily groundwater level fluctuations for 76 monitored springs located in Apulian territory, Italy. Daily groundwater level data and training algorithms such as LM, Bayesian Regularization (RB) and Scaled Conjugate Gradient (SGC) were used in the research. The outcomes showed that the BR training algorithm produced greater accuracy in groundwater level prediction. In another study (Di Nunno et al. 2021), an unprecedented application of nonlinear AutoRegressive with eXogenous inputs (NARX) neural networks to the prediction of spring flows was shown. Discharge prediction models were developed for nine monitored springs located in the Umbria region, along the carbonate ridge of the Umbria-Marche Apennines. In the modelling, precipitation was also considered as an exogenous input parameter. Good performances were achieved for all springs and for both short-term and long-term predictions, passing from a lag time equal to 1–12 months. Furthermore, Wunsch et al. (2018) also applied NARX to forecast short- and mid-term groundwater levels for several groups of aquifers in the states of Baden-Wuttemberg, Bavaria and Hesse, Germany. Precipitation and temperature were chosen as the input of the model. As a result, the NARX model was successfully used to conduct groundwater predictions for uninfluenced observation wells in all aquifer types, but it gave contrasting results for influenced observation wells. In addition, RNN was applied to simulate the operation of three multipurpose reservoirs located in the upper Chao Phraya River basin (Yang et al. 2019). Three RNNs, namely nonlinear autoregressive models with exogenous input (NARX), LSTM and genetic algorithm–based NAXR (GA-NAXR) for reservoir operation, were based on historical data. The results showed that GA-NARX had the highest accuracy among three RNNs and was more stable than the original NARX by optimizing the initial conditions, although it took a longer training time than NARX and LSTM.

Apart from NARX, an LSTM network deep learning as an expansion of RNN also has greater capacity in time-series data prediction. The major contrast between RNN and LSTM is that it can suitably plot between input and output data as well as keep long-range time reliance data (Abbasimehr et al. 2020). According to Zhou et al. (2018), the LSTM network system is highly acceptable to deal with the continuation of data like water quality data since LSTM has a particular memory function (Zhou et al. 2018). There are several gates in the LSTM network, namely, forget gates, output gates and input gates as well as a memory cell in every neuron (Vu et al. 2020; Jia & Zhou 2020).

A number of research studies conducted using the LSTM model, such as the one by Zhang & Jin (2020), have used automatic encoder, together with LSTM, to forecast the concentration of total phosphorous (TP) and total nitrogen (TN) using 13 different variables as inputs. The authors found that the TP model outperformed the TN model by having R2 and RMSE values of 0.924 and 0.0002, respectively, while the TN model achieved R2 and RMSE values of 0.909 and 0.024, respectively. Liu et al. (2019) also applied LSTM to forecast dissolved oxygen (DO) in the next 10 days (m = 10) and the next 6 months (m = 181). The authors achieved MSE values of 0.0020 and 0.0017 for m = 181 and m = 10, respectively. Furthermore, water quality forecasting based on the LSTM learning network was used by Hu et al. (2019) to forecast water temperatures and pH values. The outcomes showed that the accuracy levels from the forecasting were 98.97% and 98.56% for water temperature and pH values, respectively. As for long-term forecasting, accuracy levels of up to 96.88% and 95.76% for water temperature and pH values, respectively, were achieved. Le et al. (2019) also established some LSTM models to predict floods at the Hoa Binh Station on Da River, Vietnam.

Based on the literature review presented, it can be observed that LSTM and NARX forecasting models are commonly used in multi-step-ahead prediction involving nonlinear multivariate data. The challenge in multi-step-ahead modelling is that normally the model achieves high accuracy for one or a few steps ahead. In other words, most of the studies with high accuracy conducted using NARX and LSTM are applied for one or a few steps ahead of forecasting. This study aims to build a model for multi-step-ahead forecasting of RF based on NARX Neural Networks and LSTM in univariate and multivariate approaches. In NARX, we will use direct autoregression and a recursive model. Unlike the recursive NARX approach, the direct autoregression NARX approach tries to train a multi-step-ahead prediction model directly without requiring future exogenous inputs to make multi-step prediction. In LSTM, we will use a direct sequence-to-sequence multivariate model and also a recursive univariate model, both of which will be discussed in Section 2.

Study area

The Kelantan river basin is one of the important catchments as it has a history of flood events (Pradhan & Youssef 2011; Nashwan et al. 2018). The catchment represents most of the land area of Kelantan State, as shown in Figure 1. Several stations such as rainfall, water level, evaporation, water quality and meteorological stations operate in the area.

Figure 1

The study area (using QGIS©).

Figure 1

The study area (using QGIS©).

Close modal

RF data

Machine learning and deep learning algorithms, such as ANN and LSTM, have been successfully proven for data-driven predictive modelling applications. Therefore, the main ingredient for the success of predictive modelling is the data itself in addition to the training algorithm developed. In this study, the RF data were collected with some meteorological parameters. The time-series RF data were collected from the north of Kuala Krai city downstream (merger of two main tributaries before discharging into the sea). The original data were recorded and compiled, which consisted of 348 monthly samples of the Sungai Kelantan RF () spanning from January 1988 to December 2016 (29 years). Rainfall and evaporation are usually measured in a determined station, and only the computed area weighted rainfall (WR) is used to evaluate the whole area rainfall quantity (Faisal & Gaffar 2012). The RF was mainly from one main station (Guillemard), while the WR and evaporation (secondary data) were over the whole river catchment. Table 1 shows the attributes and its basic statistical properties of the data collected and used in this study. The time-series graph is shown in Figure 2. As we can see, the RF pattern is close to the time-series pattern of the WR. Meanwhile, the average evaporation (AE) pattern cannot be visibly observed in relation to RF as the amplitude is relatively small.

Table 1

Variables and their attributes

Name of variableRiver flowWeighted rainfallAveraged evaporation
Notation (unit)   (mm)  (mm) 
Max value 2,853.2 725.8 173.8 
Min value 70.5 3.0 57.6 
Mean value 471.8 207.7 114.4 
Name of variableRiver flowWeighted rainfallAveraged evaporation
Notation (unit)   (mm)  (mm) 
Max value 2,853.2 725.8 173.8 
Min value 70.5 3.0 57.6 
Mean value 471.8 207.7 114.4 
Figure 2

The compiled monthly time-series data (January 1988–December 2016).

Figure 2

The compiled monthly time-series data (January 1988–December 2016).

Close modal

Preliminary data analysis

The stage of preliminary data analysis and pre-processing is crucial in the initial stage of the machine learning model building. This process can significantly affect the prediction accuracy in any type of data (Hayder et al. 2020). The purpose of this study is to build a multi-step-ahead predictive model for RF using a machine learning/deep learning-based approach.

There are basically two ways of performing multi-step-ahead forecasts. It can be produced recursively by iterating a one-step-ahead model, or directly using a specific model for each span (Ben Taieb et al. 2012). In addition, the recorded RF data, together with other climate variables, are basically time-series data. Thus, the multi-step-ahead time-series prediction model can be developed based on two strategies, namely, univariate and multivariate time-series prediction. In the univariate time-series forecasting, it is assumed that RF has its seasonality and it depends on its past values only. On the other hand, in multivariate time-series forecasting, the RF will also depend on the other variables and their past values.

In this study, the multi-step-ahead prediction is performed using a nonlinear machine learning-based model with multivariate and univariate approaches. The two algorithms used are NARX (nonlinear autoregressive with exogenous inputs) Neural Networks and recurrent Neural Networks called LSTM, which fall under the category of deep learning.

Some inherent preliminary data analyses in time-series forecasting are autocorrelation and cross-correlation analysis. Autocorrelation is the correlation of a variable with itself at different time lags. It is used mainly in univariate forecasting mode. For example, represents correlation between RF at the present time () and its lag-1 value () and so on. The autocorrelation coefficients of time-series variable y can be expressed as
formula
(1)

This coefficient is plotted against the lag value (resulting autocorrelation function plot called correlogram.

On the other hand, the cross-correlation function (CCF) seeks a relationship between two time series. For example, it measures the correlation between the series of WR and shifted (lagged) series of RF as a function of the lag. As another instance, the time-series RF may depend on the past lags of WR and AE. The sample CCF will be helpful for identifying lags of the input variable that might be useful predictors of the target variable, i.e. RF. The cross-correlation between and is called and is expressed as
formula
(2)

Data preparation

Prior to feeding the data into the machine learning/deep learning, some data preparation needs to be carried out. First, the data is partitioned into train and test split. Train dataset is used to build the predictive model, which is otherwise called model training. Test dataset is used to evaluate the performance of the model after the model training has been completed. The first 280 samples, i.e. ≈80% (sequence from Jan 1988 to April 2011), are used as train dataset, while the test set is the sequence from April 2011 to December 2016 (81 samples). Notice that there is some sequence overlap in the data partition and we do not have large samples to split. In the overlapped sequence, only the target data is used during the training, which is predicted using feature/input variables in the previous sequence. Thus, the input/feature variables in the overlapped sequence still can be used as test dataset. The illustration of train-test data partition is shown in Figure 3. Furthermore, as the data amplitudes are in wide range, scaling is also performed to normalize the amplitudes of the feature inputs at a range of 0 to 1.

Figure 3

Train and test data partition.

Figure 3

Train and test data partition.

Close modal
Figure 4

The architecture of NARXNN: (a) parallel and (b) series parallel.

Figure 4

The architecture of NARXNN: (a) parallel and (b) series parallel.

Close modal

NARX- and LSTM-based approaches for multi-step-ahead prediction

There are two machine learning models used in this study as mentioned, namely NARX and LSTM. The architecture of NARX can be seen in Figure 4, where two types of connection are common (Thapa et al. 2020). NARX is used in multivariate mode that is applied to predict some step-ahead values of RF based on current and past values (lags) of WR and AE.

Figure 5

Illustration of sequence-to-sequence prediction model used in LSTM1.

Figure 5

Illustration of sequence-to-sequence prediction model used in LSTM1.

Close modal
The NARX model is basically a modified NAR (nonlinear autoregressive) model by including another relevant time-series variable as extra input to the forecasting model, which can be expressed as the following two equations for parallel (a.k.a closed loop) and series parallel (a.k.a open loop), respectively (Pena et al. 2020):
formula
(3)
formula
(4)
where is the mapping function performed by neural networks, is the external input variable (s), is the output variable, p is auto regression order, q is the exogenous input order and d is the exogenous delay number.
Here, we use two different approaches for NARX, namely the recursive (or parallel as in Equation (3)) approach and the direct autoregression approach, where the model is attempting to train a multi-step-head-prediction model directly. We refer to this model as NARX1 (following Equation (3)) and NARX2 (following Equation (5)) throughout the manuscript. Given the output time series to predict and exogenous inputs , the model will generate output/target as
formula
(5)
with k being the prediction step.

Furthermore, LSTM is a type of RNN that was proposed to solve this issue by explicitly introducing a memory unit called the cell into the network. A single memory unit makes a decision by considering the current input, previous output and previous memory and generates a new output and alters its memory. In addition to NARX, LSTM is also used here for multi-step-head-prediction with two different approaches, namely, sequence-to-sequence multivariate approach (we call LSTM2) and univariate approach (we call LSTM1). The illustration of the sequence-to-sequence multivariate approach (LSTM2) is shown in Figure 5 to indicate how many sequences of past values of exogenous inputs () will be used and how many step-ahead predictions will be performed (). The training data will be arranged as per Figure 5, by arranging q sequence of input variables with the next k target variable (including the current step).

For LSTM1, univariate approach is implemented to predict some step-ahead values of RF based on its past values. This univariate model is basically an NAR (nonlinear autoregressive) model, which can be expressed as (Sarkar et al. 2019):
formula
(6)

Metric of model evaluation

Once the model training or learning process is done, i.e. when the model fits the training data, the predictive model developed needs to be evaluated using both training dataset and test dataset. The common metric for regression model accuracy can be found such as in (Pena et al. 2020). In this study, the obtained model accuracy is evaluated by calculating RMSE (root mean squared error) value and Nash–Sutcliffe efficiency coefficient (NSE), which are defined, respectively, as follows:
formula
(7)
formula
(8)

NSE is widely used to evaluate the performance of hydrological models. NSE is even better than other metrics, such as the coefficient of determination, a.k.a regression coefficient. An NSE value of above 0.75 is considerably a good fit model, while less than 0.5 indicates unsatisfactory model performance (Pena et al. 2020).

The experimentation is performed for the predictive model with four different approaches using NARX neural networks and LSTM, namely NARX1, NARX2, LSTM1 and LSTM2. These model approaches are listed in Table 2, together with their information. Despite the fact that experimentation in this study is performed for a local study, the modelling approach presented in this paper can be extended and applied for other international studies. Generally, the predictive modelling approach can also be used not only for RF but also for other hydrology-related variables such as sediment load, water quality, ground water level, etc. This generic data-driven approach can be summarized as a flowchart in Figure 6 that begins with data collection and ends with a decision on possible model deployment upon satisfactory model evaluation.

Table 2

Summary of the four predictive model for RF

Multi-step-ahead predictive modelApproach
NARX1 Recursive (closed loop) approach; refer to Equation (3) 
NARX2 Direct auto-regressive approach; refer to Equation (5) 
LSTM1 Univariate autoregressive approach; refer to Equation (6) 
LSTM2 Sequence-to-sequence multivariate approach; refer to Figure 5  
Multi-step-ahead predictive modelApproach
NARX1 Recursive (closed loop) approach; refer to Equation (3) 
NARX2 Direct auto-regressive approach; refer to Equation (5) 
LSTM1 Univariate autoregressive approach; refer to Equation (6) 
LSTM2 Sequence-to-sequence multivariate approach; refer to Figure 5  
Figure 6

The generic predictive model development approach.

Figure 6

The generic predictive model development approach.

Close modal

Multi-step-ahead prediction using NARX neural networks

The first experimentation is to obtain an optimum NARX model involving lag/delay number () and order for the input and the auto regressive order . Here, autocorrelation and cross-correlation analysis are observed. Figure 7 shows the sample autocorrrelation function for the RF data, which indicates strong correlation of the present RF to its past value at a lag of 1 and around a lag of 12, i.e. for the past 1 month and for the past 12 months.

Figure 7

Sample auto correlation function for the time-series RF.

Figure 7

Sample auto correlation function for the time-series RF.

Close modal
Figure 8

Sample cross-correlation function for RF-WR and RF-AE.

Figure 8

Sample cross-correlation function for RF-WR and RF-AE.

Close modal

Furthermore, cross-correlation between RF and WR as well as between RF and AE are also observed. The result is shown in Figure 8, which indicates the following situation. RF has a positive and negative strong correlation to the current value (lag 0) of WR and AE, respectively. It also reveals a pretty similar pattern at lag 11 and at lag 12 particularly for AE. With this observation, some experimention to obtain the optimum NARX model is performed. The NARX model notation follows the values of set for the model building; for example, NARX_2-12-0 means the model is set with , and .

Table 3 shows the ANN setup used in the NARX model. Table 4 shows the experimental results to obtain the optimum model for the order and lags involved. We start the experiment with a two-step-ahead prediction model and the performance metrics are shown. As we can see, NARX_13-13-0 gives the optimum model and, therefore, this model is used for the rest of the experimentations in different step ahead. The models are evaluated using the test data as shown in Table 4, where NARX_13-13-0 produces and . Throughout the rest of the discussion, this model is called NARX2, which is the direct autoregression approach. We perform the evaluation up to . Above this step, the model performs poorly and is, therefore, not shown. Other poor performances with different delays and orders are also not shown. Later, we will compare the result with NARX1, which is a recursive approach as well as LSTM1 and LSTM2.

Table 3

ANN hyperparameter setup used in NARX

ItemsValueRemarks
Number of neurons [5, 5] Using two hidden layers 
Learning rate 0.01 Using a constant learning rate 
Max epoch 1,000 Interaction using early stopping 
Solver ‘Adam’ Reference (Kingma & Ba 2014
Activation function Relu Rectified linear unit 
ItemsValueRemarks
Number of neurons [5, 5] Using two hidden layers 
Learning rate 0.01 Using a constant learning rate 
Max epoch 1,000 Interaction using early stopping 
Solver ‘Adam’ Reference (Kingma & Ba 2014
Activation function Relu Rectified linear unit 
Table 4

Results for the NARX model evaluated on test data

Step-ahead predictionNARX modelNSERMSE
 NARX_2-12-0 0.36 224.19 
NARX_2-13-0 0.40 217.49 
NARX_13-12-0 0.35 226.87 
NARX_13-13-0 0.41 215.37 
NARX_13-3-10 0.34 228.99 
NARX_13-4-10 0.37 224.89 
 NARX_13-13-0 0.31 235.46 
 NARX_13-13-0 0.41 218.23 
 NARX_13-13-0 0.32 233.98 
 NARX_13-13-0 0.35 223.38 
Step-ahead predictionNARX modelNSERMSE
 NARX_2-12-0 0.36 224.19 
NARX_2-13-0 0.40 217.49 
NARX_13-12-0 0.35 226.87 
NARX_13-13-0 0.41 215.37 
NARX_13-3-10 0.34 228.99 
NARX_13-4-10 0.37 224.89 
 NARX_13-13-0 0.31 235.46 
 NARX_13-13-0 0.41 218.23 
 NARX_13-13-0 0.32 233.98 
 NARX_13-13-0 0.35 223.38 

The next result shows the experimentation on NARX1. With the same setup of delay and order as in NARX2, Figures 9 and 10 show the forecasting result on training data. The result indicates the good fit of the model with NSE = 0.85 and RMSE = 152.29. The higher errors are shown particularly for some extreme values for certain months. In Figure 10, the blue diagonal line represents the regression line for the observed vs forcasted values.

Figure 9

Forecasting result and error graph of NARX1 on training data set.

Figure 9

Forecasting result and error graph of NARX1 on training data set.

Close modal
Figure 10

Observed vs forecasted value of NARX1 on training data set.

Figure 10

Observed vs forecasted value of NARX1 on training data set.

Close modal

Furthermore, Figures 11 and 12 show the sample of graph of the forecasting result on test data where the model is used to perform 52 step-ahead prediction recursively in closed-loop mode. The result indicates the moderate capability of the model with NSE = 0.44 and RMSE = 208.85. The result of NARX1 on test data is slighly better than that of NARX2. However, NARX1 has a disadvantage. It needs the information of immediate past exogenous inputs to make prediction for . This means this approach is not really multi-step-ahead as compared with the direct approach (NARX2).

Figure 11

Forecasting result and error graph of NARX1 on test data set.

Figure 11

Forecasting result and error graph of NARX1 on test data set.

Close modal
Figure 12

Observed vs forecasted value of NARX1 on test data set.

Figure 12

Observed vs forecasted value of NARX1 on test data set.

Close modal

Multi-step-ahead prediction using univariate and multivariate LSTM

In this section, the result using LSTM is presented. LSTM is a specialized deep learning recurrrent neural network able to learn and preserve long-term dependencies that were proposed in the late 90's (Hochreiter & Schmidhuber 1997). As mentioned earlier, the LSTM model used in this experiment is basically a univariate NAR model referred to in Equation (5). Table 5 shows the LSTM setup used during the training. This setup was chosen after a series of experimentations. There is a better way to choose the best hyperparameter setup such as using a grid search. However, this is beyond the scope of our research at the moment.

Table 5

LSTM hyperparameter setup during training

ItemsValueRemarks
Number of neurons 20 Using 1 LSTM layer 
Learning rate 0.01 Using constant learning rate 
Max epoch 100 Interaction using early stopping 
Batch size 32 – 
Solver ‘Adam’ Reference (Kingma & Ba 2014
Activation functions Relu and linear Rectified linear unit (Relu) at LSTM and linear at output (dense) layer 
Dropout 0.2 Applying LSTM and dense layers 
ItemsValueRemarks
Number of neurons 20 Using 1 LSTM layer 
Learning rate 0.01 Using constant learning rate 
Max epoch 100 Interaction using early stopping 
Batch size 32 – 
Solver ‘Adam’ Reference (Kingma & Ba 2014
Activation functions Relu and linear Rectified linear unit (Relu) at LSTM and linear at output (dense) layer 
Dropout 0.2 Applying LSTM and dense layers 

As mentioned earlier, we use two LSTM approaches, namely, univariate recursive (LSTM1) and direct sequence-to-sequence multivariate (LSTM2). While the recursive approach is able to make longer-term prediction, the direct sequence-to-sequence approach may not be able to make long-term prediction as it also needs a long sequence to train the model. This may not be practical when the data is not sufficiently large like in our case. LSTM2 will suffer from a lack of data. We will see the results in the following discussion.

Figure 13 shows the recursive forecasting result of LSTM1 on the test data where RMSE = 191.42 is achieved, which is slightly lower than that achieved with NARX1. Figure 14 shows the observed vs predicted values, and NSE = 0.59 is achieved. This result indicates that LSTM capability for multi-step-ahead prediction in the univariate (NAR) approach is applied in this case.

Figure 13

Forecasting result and error graph on test data set (LSTM1).

Figure 13

Forecasting result and error graph on test data set (LSTM1).

Close modal
Figure 14

Observed vs forecasted value on test data set (LSTM1).

Figure 14

Observed vs forecasted value on test data set (LSTM1).

Close modal

Next, we will show the experimentation result with LSTM2. This approach uses the multivariate direct sequence-to-sequence approach so that we need to specify the step size of the input () and the step size of the output (for k step-ahead prediction). Here, we set as to make the same model order with that of NARX1 and NARX2. The experimentation result is shown in Table 6 for a different number of k. The result is evaluated by rolling on -step-ahead prediction using some portions of test data. We also give the results of NARX2 in the last two columns for comparison. It can be seen that LSTM2 performs better up to a seven-step-ahead prediction. The model performance degrades as increases above 7. This is due to the fact that the LSTM2 model is trained only with = 13. It is most likely that LSTM needs a longer input sequence to make a longer step-ahead prediction. However, this will need more data to be used. The nature of deep learning is that it needs large data to learn from more complex processes. How much of training data still remains is an open question in research (Wunsch et al. 2021). In addition, the model may also need different number of neurons and batch size. This should be an optimization work such as using the grid search method, which is beyond our scope at the moment. Figure 15 shows a sample of time-series graph for two-step-ahead forecasting using LSTM2 rolling on some portions of test data.

Table 6

Experimentation results with different -step ahead in LSTM2

Step-ahead predictionLSTM2
NARX2
NSERMSENSERMSE
 0.78 119.44 0.35 226.87 
 0.54 168.45 0.31 235.46 
 0.42 196.49 0.41 218.23 
 0.39 201.36 0.32 233.98 
 0.42 188.86 0.33 231.97 
 0.23 220.52 0.28 243.54 
 0.13 239.36 0.31 230.07 
 0.13 231.20 0.41 216.68 
 ≍0 259.75 0.35 223.38 
Step-ahead predictionLSTM2
NARX2
NSERMSENSERMSE
 0.78 119.44 0.35 226.87 
 0.54 168.45 0.31 235.46 
 0.42 196.49 0.41 218.23 
 0.39 201.36 0.32 233.98 
 0.42 188.86 0.33 231.97 
 0.23 220.52 0.28 243.54 
 0.13 239.36 0.31 230.07 
 0.13 231.20 0.41 216.68 
 ≍0 259.75 0.35 223.38 
Figure 15

Sample of two-step-ahead forecasting using LSTM2 rolling on some portions of test data.

Figure 15

Sample of two-step-ahead forecasting using LSTM2 rolling on some portions of test data.

Close modal

On the other hand, NARX2 is seen to have more stable performance, at least up to , despite having inferior performance for a shorter step ahead. This suggests that for shorter-step-ahead forecasting, a multivariate LSTM approach is preferable to NARX. For longer terms of step-ahead forecasting, the NARX approach is preferable, especially under conditions of limited size of data like in our case. However, this suggestion is not final as the LSTM model can be further optimized as the predictors' sequence can be increased whenever larger data is available. With the larger number of data and further fine-tuning of the hyperparameters in LSTM, different suggestion may be proposed, i.e LSTM may outperform NARX for longer terms of step-ahead forecasting.

The last part of this section is to highlight the comparison between this paper and other cited studies for multi-step-ahead prediction in hydrological systems. This comparison is viewed from some aspects of model algorithms, data size and model performance. This comparative summary is given in Table 7. We can see that in terms of data size, our study has a much lower number of samples as compared with other studies. This is certainly our major concern for improvement in the future. In terms of algorithms, NARX Neural Networks are commonly used, while LSTM is still rarely explored especially for multi-step-ahead forcasting, i.e. not considering studies in single-step-ahead forecasting. There is one interesting recent study by Guo et al. (2021) pointing out that gradient boosting machine, a new type of machine learning algorithm, performs favourably for some step-ahead prediction. This application could be explored in the future. In terms of accuracy, our proposed method in few-step-ahead forecasting using LSTM particularly is generally on a par with or slightly better than some results from other studies.

Table 7

Comparative analysis between this study and other studies in multi-step-ahead prediction

Research studyOutput variableMethodsData sizeHighlights of results
Bhagwat & Maity (2012)  River flow; Narmada river (India) LS-SVR and ANN 2,556 samples for training The best NSE = 0.49 for two-step-ahead prediction using LS-SVR; Reasonably good up to 5-day-ahead predictions (NSE = 0.3). 
Chang et al. (2014)  Inundation level of flood; Yu–Cheng Pumping Station (Taipei City) ANN, Elman Networks and NARX Neural networks 1,985 samples for training and testing NARX Networks perform the best, producing coefficients of efficiency within 0.9–0.7 (scenario I) and 0.7–0.5 (scenario II) in the testing stages for 10–60-min-ahead forecasts. 
Guo et al. (2021)  River stage; Lan-Yang river basin (Lanyan, Simon and Kavalan stations at Taiwan) Optimized four ML techniques, namely, SVR, RFR, ANN and LBGM ≈7,500 samples for Simon and Lanyan; ≈4,000 samples for Kavalan All models demonstrate favourable performance in terms of of about 0.72 for 1–6 step-ahead forecasting at all stations. The LGBM model achieves more favourable prediction than SVR, RFR and ANN. 
Yu et al. (2011)  Water level; Heshui catchment, China Using eight different types of ANN training algorithms 4,749 samples for training and testing BFGS- and LM-trained ANN models gave the best performance among all of the prediction scenarios. Obtained a coefficient of determination of around for two-step-ahead forecasting and for five-step-ahead forecasting. 
This paper River flow; Kelantan river (Malaysia) NARX Neural networks and LSTM 348 samples for training and testing LSTM with a direct sequence-to-sequence produces NSE = 0.75 and NSE = 0.39 for two-step-ahead and five-step-ahead forecasting, respectively. 
Research studyOutput variableMethodsData sizeHighlights of results
Bhagwat & Maity (2012)  River flow; Narmada river (India) LS-SVR and ANN 2,556 samples for training The best NSE = 0.49 for two-step-ahead prediction using LS-SVR; Reasonably good up to 5-day-ahead predictions (NSE = 0.3). 
Chang et al. (2014)  Inundation level of flood; Yu–Cheng Pumping Station (Taipei City) ANN, Elman Networks and NARX Neural networks 1,985 samples for training and testing NARX Networks perform the best, producing coefficients of efficiency within 0.9–0.7 (scenario I) and 0.7–0.5 (scenario II) in the testing stages for 10–60-min-ahead forecasts. 
Guo et al. (2021)  River stage; Lan-Yang river basin (Lanyan, Simon and Kavalan stations at Taiwan) Optimized four ML techniques, namely, SVR, RFR, ANN and LBGM ≈7,500 samples for Simon and Lanyan; ≈4,000 samples for Kavalan All models demonstrate favourable performance in terms of of about 0.72 for 1–6 step-ahead forecasting at all stations. The LGBM model achieves more favourable prediction than SVR, RFR and ANN. 
Yu et al. (2011)  Water level; Heshui catchment, China Using eight different types of ANN training algorithms 4,749 samples for training and testing BFGS- and LM-trained ANN models gave the best performance among all of the prediction scenarios. Obtained a coefficient of determination of around for two-step-ahead forecasting and for five-step-ahead forecasting. 
This paper River flow; Kelantan river (Malaysia) NARX Neural networks and LSTM 348 samples for training and testing LSTM with a direct sequence-to-sequence produces NSE = 0.75 and NSE = 0.39 for two-step-ahead and five-step-ahead forecasting, respectively. 

LS-SVR, least-square-support vector regression; BFGS, Broyden–Fletcher–Goldfarb–Shanno; LM, Levenberg–Marquardt; LGBM, light gradient boosting machine; RFR, random forest regressor.

Four approaches for multi-step-ahead forecasting for the Kelantan RF in Malaysia using NARX neural networks and deep learning LSTM have been discussed. These approaches use different strategies to perform multi-step-ahead forecasting involving univariate and multivariate methods. The first approach is NARX neural networks using the recursive approach that gives acceptable performance with RMSE = 208.85 and NSE = 0.44 on test dataset. The second approach is NARX neural networks with direct multi-step-ahead prediction, which produces the highest NSE = 0.41 in four-step-ahead and nine-step-ahead forecasting in our experiment. The third approach is LSTM with a direct sequence-to-sequence prediction, which performs better for few-step-ahead forecasting, i.e. NSE = 0.78 for two-step-ahead forecasting. The performance, however, degrades quite significantly when longer-step-ahead is performed. The fourth approach is the LSTM univariate recursive method, which performs slightly better than the first approach. The third approach is promising for few-step-ahead prediction, but it needs larger data to build the model that is able to perform longer-step-ahead forecasting. Future work will involve collecting more data and also investigating optimization of hyperparameters of the machine learning/deep learning-based model such as using grid search or meta-heuristic optimization, which has been done earlier in another study (Hayder et al. 2020). Applications of ensemble and boosting machine learning such as random forest and gradient boosting algorithms can be explored as well. The application of this kind of study is important for water resource management and flood mitigation planning as Kelantan River has recorded some flooding events in the recent past.

This work is supported by the Ministry of Higher Education of Malaysia under Fundamental Research Grant Scheme (FRGS) with reference number FRGS/1/2019/TK01/UNITEN/02/5 (20190105FRGS).

All relevant data are included in the paper or its Supplementary Information.

Abbasimehr
H.
,
Shabani
M.
&
Yousefi
M.
2020
An optimized model using LSTM network for demand forecasting
.
Computers & Industrial Engineering
143
,
106435
.
Ardalani-Farsa
M.
&
Zolfaghari
S.
2010
Chaotic time series prediction with residual analysis method using hybrid Elman–NARX neural networks
.
Neurocomputing
73
(
13–15
),
2540
2553
.
Atiya
A. F.
,
El-Shoura
S. M.
,
Shaheen
S. I.
&
El-Sherif
M. S.
1999
A comparison between neural-network forecasting techniques-case study: river flow forecasting
.
IEEE Transactions on Neural Networks
10
(
2
),
402
409
.
Ben Taieb
S.
,
Bontempi
G.
,
Atiya
A. F.
&
Sorjamaa
A.
2012
A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition
.
Expert Systems with Applications
39
(
8
),
7067
7083
.
https://doi.org/10.1016/j.eswa.2012.01.039
.
Bhagwat
P. P.
&
Maity
R.
2012
Multistep-ahead river flow prediction using LS-SVR at daily scale
.
Journal of Water Resource and Protection
4
(
07
),
528
.
Chandra
R.
,
Goyal
S.
&
Gupta
R.
2021
Evaluation of deep learning models for multi-step ahead time series prediction. arXiv preprint arXiv:2103.14250
.
Chang
F. J.
,
Chen
P. A.
,
Lu
Y. R.
,
Huang
E.
&
Chang
K. Y.
2014
Real-time multi-step-ahead water level forecasting by recurrent neural networks for urban flood control
.
Journal of Hydrology
517
,
836
846
.
Dabrowski
J. J.
,
Zhang
Y.
&
Rahman
A.
2020
ForecastNet: a time-variant deep feed-forward neural network architecture for multi-step-ahead time-series forecasting
. In:
International Conference on Neural Information Processing
.
Springer
,
Cham
, pp.
579
591
.
Di Nunno
F.
,
Granata
F.
,
Gargano
R.
&
de Marinis
G.
2021
Prediction of spring flows using nonlinear autoregressive exogenous (NARX) neural network models
.
Environmental Monitoring and Assessment
193
(
6
),
1
17
.
https://doi.org/10.1007/s10661-021-09135-6
.
Di Piazza
A.
,
Di Piazza
M. C.
&
Vitale
G.
2016
Solar and wind forecasting by NARX neural networks
.
Renewable Energy and Environmental Sustainability
1
,
39
.
Faisal
N.
&
Gaffar
A.
2012
Development of Pakistan's new area weighted rainfall using Thiessen polygon method
.
Pakistan Journal of Meteorology
9
(
17
),
107
116
.
Granata
F.
&
Di Nunno
F.
2021
Forecasting evapotranspiration in different climates using ensembles of recurrent neural networks
.
Agricultural Water Management
255
,
107040
.
https://doi.org/10.1016/J.AGWAT.2021.107040
.
Guo
Y.
,
Xu
Y. P.
,
Sun
M.
&
Xie
J.
2021
Multi-step-ahead forecast of reservoir water availability with improved quantum-based GWO coupled with the AI-based LSSVM model
.
Journal of Hydrology
597
,
125769
.
https://doi.org/10.1016/j.jhydrol.2020.125769
Guo
W. D.
,
Chen
W. B.
,
Yeh
S. H.
,
Chang
C. H.
&
Chen
H.
2021
Prediction of river stage using multistep-ahead machine learning techniques for a tidal river of Taiwan
.
Water (Switzerland)
13
(
7
).
https://doi.org/10.3390/w13070920.
Hayder
G.
,
Solihin
M. I.
&
Mustafa
H. M.
2020
Modelling of river flow using particle swarm optimized cascade-forward neural networks: a case study of Kelantan River in Malaysia
.
Applied Sciences
10
(
23
),
8670
.
https://doi.org/10.3390/app10238670
.
Hernandez-Ambato
J.
,
Asqui-Santillan
G.
,
Arellano
A.
&
Cunalata
C.
2017
Multistep-ahead streamflow and reservoir level prediction using ANNs for production planning in hydroelectric stations
. In:
2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA)
.
IEEE
, pp.
479
484
.
Hochreiter
S.
&
Schmidhuber
J.
1997
Long short-term memory
.
Neural Computation
9
(
8
),
1735
1780
.
https://doi.org/10.1162/neco.1997.9.8.1735
.
Jia
H.
&
Zhou
X.
2020
Water quality prediction method based on LSTM-BP
. In:
2020 12th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC)
.
IEEE
, Vol.
1
, pp.
27
30
.
Kingma
D. P.
&
Ba
J.
2014
Adam: A method for stochastic optimization
.
arXiv preprint arXiv:1412.6980.
Kisi
O.
,
Shiri
J.
,
Karimi
S.
,
Shamshirband
S.
,
Motamedi
S.
,
Petković
D.
&
Hashim
R.
2015
A survey of water level fluctuation predicting in Urmia Lake using support vector machine with firefly algorithm
.
Applied Mathematics and Computation
270
,
731
743
.
Liu
P.
,
Wang
J.
,
Sangaiah
A. K.
,
Xie
Y.
&
Yin
X.
2019
Analysis and prediction of water quality using LSTM deep neural networks in IoT environment
.
Sustainability
11
(
7
),
2058
.
Marcjasz
G.
,
Uniejewski
B.
&
Weron
R.
2019
On the importance of the long-term seasonal component in day-ahead electricity price forecasting with NARX neural networks
.
International Journal of Forecasting
35
(
4
),
1520
1532
.
Nashwan
M. S.
,
Ismail
T.
&
Ahmed
K.
2018
Flood susceptibility assessment in Kelantan river basin using copula
.
International Journal of Engineering and Technology (UAE)
7
(
2
),
584
590
.
https://doi.org/10.14419/ijet.v7i2.8876
.
Noor
H. M.
,
Ndzi
D.
,
Yang
G.
&
Safar
N. Z. M.
2017
Rainfall-based river flow prediction using NARX in Malaysia
. In:
2017 IEEE 13th International Colloquium on Signal Processing & its Applications (CSPA)
.
IEEE
, pp.
67
72
.
Pena
M.
,
Vazquez-Patino
A.
,
Zhina
D.
,
Montenegro
M.
&
Aviles
A.
2020
Improved rainfall prediction through nonlinear autoregressive network with exogenous variables: a case study in Andes high mountain region
.
Advances in Meteorology
2020
.
https://doi.org/10.1155/2020/1828319.
Pradhan
B.
&
Youssef
A. M.
2011
A 100-year maximum flood susceptibility mapping using integrated hydrological and hydrodynamic models: Kelantan River Corridor, Malaysia
.
Journal of Flood Risk Management
4
(
3
),
189
202
.
https://doi.org/10.1111/j.1753-318X.2011.01103.x
.
Sarkar
R.
,
Julai
S.
,
Hossain
S.
,
Chong
W. T.
&
Rahman
M.
2019
A comparative study of activation functions of NAR and NARX neural network for long-term wind speed forecasting in Malaysia
.
Mathematical Problems in Engineering
2019
.
https://doi.org/10.1155/2019/6403081.
Saroha
S.
&
Aggarwal
S. K.
2014
Multi step ahead forecasting of wind power by genetic algorithm based neural networks
. In:
2014 6th IEEE Power India International Conference (PIICON)
.
IEEE
, pp.
1
6
.
Thapa
S.
,
Zhao
Z.
,
Li
B.
,
Lu
L.
,
Fu
D.
,
Shi
X.
,
Tang
B.
&
Qi
H.
2020
Snowmelt-driven streamflow prediction using machine learning techniques (LSTM, NARX, GPR, and SVR)
.
Water (Switzerland)
12
(
6
).
https://doi.org/10.3390/w12061734.
Tijani
I. B.
,
Akmeliawati
R.
,
Legowo
A.
&
Budiyono
A.
2014
Nonlinear identification of a small scale unmanned helicopter using optimized NARX network with multiobjective differential evolution
.
Engineering Applications of Artificial Intelligence
33
,
99
115
.
Vu
M. T.
,
Jardani
A.
,
Massei
N.
&
Fournier
M.
2021
Reconstruction of missing groundwater level data by using Long Short-Term Memory (LSTM) deep neural network
.
Journal of Hydrology
597
,
125776
.
https://doi.org/10.1016/j.jhydrol.2020.125776.
Yang
S.
,
Yang
D.
,
Chen
J.
&
Zhao
B.
2019
Real-time reservoir operation using recurrent neural networks and inflow forecast from a distributed hydrological model
.
Journal of Hydrology
579
,
124229
.
https://doi.org/10.1016/J.JHYDROL.2019.124229
.
Yu
J.
,
Qin
X.
&
Larsen
O.
2011
Multistep ahead water level forecasting using different artificial neural network training algorithms
. In:
Proceedings of the IASTED International Conference Environmental Management and Engineering (EME 2011)
,
Calgary, AB, Canada
.
doi:10.2316/P.2011.736-016
.
Zaini
N.
,
Malek
M. A.
,
Yusoff
M.
,
Mardi
N. H.
&
Norhisham
S.
2018
Daily river flow forecasting with hybrid support vector machine–particle swarm optimization
. In:
IOP Conference Series: Earth and Environmental Science
.
IOP Publishing
, Vol.
140
, No.
1
, p.
012035
.
Zhang
H.
&
Jin
K.
2020
Research on water quality prediction method based on AE-LSTM
. In
2020 5th International Conference on Automation, Control and Robotics Engineering (CACRE)
.
IEEE
, pp.
602
606
.
Zhang
Q.
,
Li
Z.
,
Snowling
S.
,
Siam
A.
&
El-Dakhakhni
W.
2019
Predictive models for wastewater flow forecasting based on time series analysis and artificial neural network
.
Water Science and Technology
80
(
2
),
243
253
.
Zhou
J.
,
Wang
Y.
,
Xiao
F.
,
Wang
Y.
&
Sun
L.
2018
Water quality prediction method based on IGRA and LSTM
.
Water
10
(
9
),
1148
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).