Abstract
Streamflow forecasting is essential for planning, designing, and managing watershed systems. This study investigates the use of artificial neural networks (ANN), recurrent neural networks (RNN), and adaptive neuro-fuzzy inference systems (ANFIS) for monthly streamflow forecasting in the Hunza River Basin of Pakistan. Different models were developed using precipitation, temperature, and discharge data. Two statistical performance indicators, root mean square error (RMSE) and coefficient of determination (R2), were used to assess the performance of the machine learning techniques. Based on these indicators, the ANN model predicts monthly streamflow more accurately than the RNN and ANFIS models. Three ANN architectures were tested, namely 2-1-1, 2-2-1, and 2-3-1. The ANN with a 2-3-1 configuration had the highest R2 values, 0.9522 and 0.9699 for the training and testing phases, respectively. For each RNN architecture, three transfer functions were used, namely Tan-sig, Log-sig, and Purelin. The RNN with a 2-1-1 configuration and the Tan-sig transfer function performed best in terms of R2, with values of 0.7838 and 0.8439 for the training and testing phases, respectively. For the ANFIS model, the R2 values were 0.7023 and 0.7538 for the training and testing phases, respectively. Overall, the findings suggest that the ANN model with a 2-3-1 architecture is the most effective for predicting monthly streamflow in the Hunza River Basin. This research can be helpful for planning, designing, and managing watershed systems, particularly in regions where streamflow forecasting is crucial for effective water resource management.
HIGHLIGHTS
ANN, RNN, and ANFIS models were used to predict monthly streamflow in the Hunza River Basin, Pakistan.
Temperature and precipitation data were used as inputs for predicting streamflow.
Various transfer functions and architectures were used to evaluate the performance of the AI models.
The models' performance was assessed using RMSE and R2 values.
The ANN model with a 2-3-1 architecture outperformed RNN and ANFIS models in predicting monthly streamflow.
INTRODUCTION
Accurate streamflow forecasting is essential for effective hydrology and water resource management (Yaseen et al. 2019; Chang & Guo 2020). River flow forecasting plays a significant role in managing reservoir outflows during flood and drought conditions, making it a valuable tool for water resource management (Pallavi et al. 2022). Reliable streamflow forecasts are necessary for implementing proper management strategies, such as hydroelectric project design, real-time operation of water resource projects, efficient management tactics, and proactive mitigation efforts to reduce the environmental impact of climatic events (Meng et al. 2021; Roy & Roy 2022).
To generate accurate streamflow forecasts, various hydrological models have been developed and employed. These models fall under three categories: empirical, conceptual, or physically based (Abubaker 2016; Tegegne et al. 2020; Cho & Kim 2022). Physical models, also known as process-based or mechanistic models, are based on an understanding of the physics related to hydrological processes. They have a logical structure that closely mirrors real-world systems and require a large number of physical and process parameters for calibration. In contrast, lumped models use a limited number of input parameters (Zhang 2019; Jaiswal et al. 2020; Lees et al. 2021).
In the past, various statistical methods were utilized for time series-based hydrologic forecasting (Thiesen et al. 2019). Some statistical models, such as the simple regression model, multiple regression models, and autoregressive moving average (ARMA), have been employed for streamflow forecasting (Mahmoudi & Baroumand 2022). These models use classical statistics to analyze historical data and develop strategies for predicting streamflow (Ghimire et al. 2021). However, these models may not always be accurate since they cannot capture the nonlinear dynamics that occur during the transformation of rainfall to runoff.
The artificial neural network (ANN) is a semi-parametric regression estimator that is commonly used for streamflow predictions (McBride et al. 2020). The introduction of neural network technology has led to several promising outcomes in hydrology and water resource simulation. In the past decade, fuzzy logic has been employed in a few applications for water resource forecasting (Kambalimath & Deka 2020). Many studies have demonstrated that data-driven methodologies can be utilized to simulate hydrologic processes, including rainfall–runoff forecasting, flash flood forecasting, and surge water level prediction (Sanders et al. 2022).
The popularity of ANN models has increased due to their ability to describe both linear and nonlinear systems, without requiring the assumptions inherent in most traditional statistical techniques (Onyelowe et al. 2022). ANNs have been successfully used to estimate river flow in various hydrologic situations (Imrie et al. 2000). Lumped models, including ANNs, are highly advantageous for streamflow forecasting under extreme conditions, such as predicting the peaks of streamflow. Therefore, we used an ANN to forecast streamflow for the Hunza river basin.
Artificial intelligence (AI) modeling systems outperform traditional modeling in several ways, including the ability to handle large volumes of noisy data from nonlinear and dynamic systems, especially when the fundamental physical correlations are unknown (Anusree & Varghese 2016). These strategies have proven to be effective when used individually, and their respective capabilities can be integrated efficiently to build strong intelligent systems (Tascikaraoglu & Uzunoglu 2014).
Jing et al. (2019) studied the ability of AI to predict and simulate streamflow, while Drisya et al. (2021) used the feed-forward neural network (FFNN) and the recurrent neural network (RNN) to predict monthly streamflow. Partal (2009) investigated the potential of wavelet, Feed Forward Back Propagation (FFBP), radial basis function network (RBFN), and Generalized Regression Neural Network (GRNN) in simulating streamflow. Carcano et al. (2008) assessed the ability of the RNN to simulate the daily reconstruction of streamflow sequences using precipitation and temperature data. Additionally, Chang et al. (2004) utilized the RNN to predict streamflow in Taiwan's Da-Chia River.
Neuro-fuzzy systems, which combine neural networks and fuzzy logic, are a new research field that has emerged in recent years. These systems offer the benefits of both approaches in a single framework (Nagarajan & Thirunavukarasu 2022). They address the primary issue with fuzzy systems by effectively using the learning capability of ANNs (Mitra & Hayashi 2000). Neuro-fuzzy systems have numerous applications, such as signal processing, information retrieval, automated control, and database management (Subasi 2007).
Deep learning algorithms such as RNNs have been applied to streamflow prediction because of their strong ability to learn from time series data. RNNs retain information about previous inputs and make decisions based on both previous and current inputs. However, one limitation of a simple RNN is that it may have difficulty retaining information from distant earlier time steps. This limitation is commonly attributed to vanishing gradients during training, which weaken the influence of early inputs as the sequence grows longer. Several studies have investigated this issue and proposed solutions, including the use of more complex RNN architectures and alternative deep learning algorithms such as long short-term memory (LSTM) and gated recurrent unit (GRU), which have been shown to outperform the traditional RNN in some cases (Bodapati et al. 2021; Nguyen et al. 2021; Torres et al. 2021; Zeebaree et al. 2021; Kilinc 2022).
The main objective of this research work is to predict the streamflow of the Hunza river basin in Pakistan using machine learning techniques such as ANN, RNN, and adaptive neuro-fuzzy inference system (ANFIS) models and compare their results.
STUDY AREA DESCRIPTION, DATA COLLECTION, AND METHODS
Study area description
Data collection
This study is based on the datasets described below.
Digital elevation model
Digital elevation model (DEM) data for the study area were downloaded from the National Aeronautics and Space Administration (NASA) website (https://earthdata.nasa.gov/). The resolution of the Global Digital Elevation Model (GDEM) is 30 m × 30 m. The DEM data were used for watershed delineation, and the delineated watershed is shown in Figure 1.
Hydroclimatic data sets
Meteorological data, which include mean monthly precipitation and temperature, were collected from the Pakistan Meteorological Department (PMD) from 1985 to 2014. Similarly, hydrological data, which include monthly streamflow, were obtained from the Water and Power Development Authority (WAPDA) between 1985 and 2014. The mean monthly precipitation and temperature data were used as inputs, while the monthly discharge data were used as outputs for streamflow forecasting.
Methods
Artificial neural network
An ANN is a computational model consisting of interconnected nodes called neurons, inspired by the structure of the human brain (Figure 3). The neurons are connected through weighted links, which allows the network to represent complex, nonlinear relationships. The network learns through a training algorithm that adjusts the connection weights to minimize the global error, defined as the average over all training samples of the error between predicted and observed values. The ANN models in this study were evaluated using different architectures and two statistical performance indicators, RMSE and R2. The selected ANN architecture produced the most accurate predictions of monthly streamflow.
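The weight-adjustment idea described above can be illustrated with a minimal sketch of a 2-3-1 feed-forward network trained by batch gradient descent. This is not the authors' implementation: the NumPy code, the tanh hidden layer, the learning rate, the number of epochs, and the synthetic inputs standing in for scaled precipitation and temperature are all assumptions made for illustration.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, rng):
    """Small random weights and zero biases for an n_in-n_hidden-n_out network."""
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Forward pass: tanh hidden layer followed by a linear output neuron."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    return H, H @ params["W2"] + params["b2"]

def mse(params, X, y):
    """Global error: mean squared difference between predicted and observed."""
    return float(np.mean((forward(params, X)[1] - y) ** 2))

def train(params, X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on half the mean squared error (assumed settings)."""
    n = X.shape[0]
    for _ in range(epochs):
        H, y_hat = forward(params, X)
        err = y_hat - y                                # shape (n, 1)
        dW2 = H.T @ err / n                            # output-layer gradients
        db2 = err.mean(axis=0)
        dH = (err @ params["W2"].T) * (1.0 - H ** 2)   # back-propagate through tanh
        dW1 = X.T @ dH / n                             # hidden-layer gradients
        db1 = dH.mean(axis=0)
        params["W1"] -= lr * dW1
        params["b1"] -= lr * db1
        params["W2"] -= lr * dW2
        params["b2"] -= lr * db2
    return params

# Synthetic data only: two inputs in [-1, 1] and a made-up nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 0.6 * X[:, [0]] + np.sin(2.0 * X[:, [1]])

params = init_params(2, 3, 1, rng)   # the 2-3-1 architecture
mse_before = mse(params, X, y)
train(params, X, y)
mse_after = mse(params, X, y)
print(f"MSE before training: {mse_before:.4f}, after: {mse_after:.4f}")
```

Changing the second argument of `init_params` to 1 or 2 gives the 2-1-1 and 2-2-1 architectures under the same assumptions.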
Recurrent neural network
In this study, the RNN model was used to predict monthly streamflow in the Hunza river basin. RNNs are different from traditional neural networks as they allow for the previous time step output to become the input of the current time step (Figure 4). This feature enables RNNs to capture the temporal dependencies between data points, which is essential for time series forecasting. The RNN model was trained using historical temperature and precipitation data to predict monthly streamflow. The model was trained using a backpropagation algorithm to minimize the error between the predicted and actual streamflow values. The performance of the RNN model was evaluated using statistical metrics such as RMSE and R2.
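The recurrence described above can be sketched as a simple Elman-type forward pass, in which the hidden state from the previous time step is fed back alongside the current inputs. The function name `rnn_forward`, the tanh activation, and the hand-picked weights are hypothetical choices for illustration; the exact recurrent structure used in the study is not specified here.

```python
import numpy as np

def rnn_forward(X, W_in, W_rec, b_h, w_out, b_out):
    """Run a simple recurrent layer over a sequence.

    X has shape (T, n_inputs); one prediction is returned per time step.
    """
    h = np.zeros(W_rec.shape[0])          # hidden state starts at zero
    outputs = []
    for x_t in X:
        # The new state mixes the current input with the previous state.
        h = np.tanh(x_t @ W_in + h @ W_rec + b_h)
        outputs.append(h @ w_out + b_out)
    return np.array(outputs)

# Hand-picked (hypothetical) weights for a network with one hidden neuron.
W_in = np.array([[0.5], [-0.3]])   # 2 inputs -> 1 hidden neuron
W_rec = np.array([[0.8]])          # feedback from the previous hidden state
b_h = np.array([0.0])
w_out = np.array([1.0])
b_out = 0.0

# The same input presented at two consecutive time steps.
X_seq = np.array([[1.0, 0.5], [1.0, 0.5]])
outputs = rnn_forward(X_seq, W_in, W_rec, b_h, w_out, b_out)
print(outputs)
```

Because the second step also receives the state left by the first, the two outputs differ even though the inputs are identical, which is the temporal dependence a feed-forward network cannot represent.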
ANFIS
MODEL PERFORMANCE
Model performance was assessed using two statistical performance indicators, namely RMSE and R2. RMSE measures the average magnitude of the error between predicted and observed discharge, so lower values indicate a better fit. R2 measures the strength of the relationship between observed and predicted discharge, so values closer to 1 indicate better agreement.
- I.
- II.
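As a hedged illustration of the two indicators, the sketch below computes RMSE as the square root of the mean squared difference between observed and predicted values, and R2 as the squared Pearson correlation between them. The R2 definition is an assumption: some studies instead use one minus the ratio of residual to total sum of squares, and the form used in this paper is not recoverable from the text. The function names and sample values are hypothetical.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def r_squared(observed, predicted):
    """R2 taken as the squared Pearson correlation (assumed definition)."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)

# Made-up discharge values, for illustration only.
obs = [120.0, 340.0, 910.0, 650.0, 200.0]
pred = [150.0, 300.0, 870.0, 700.0, 180.0]
print(rmse(obs, pred), r_squared(obs, pred))
```

Note that a correlation-based R2 is insensitive to constant bias or scaling in the predictions, which is one reason it is usually reported together with RMSE.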
RESULTS AND DISCUSSION
This study uses two input parameters, mean monthly precipitation and temperature, and one output parameter, mean monthly discharge, with a 30-year historical record. The data were divided into two sets, with 70% used for training and 30% for testing. Each model was fitted in the training phase and then assessed in the testing phase, with RMSE and R2 computed for both phases to identify the best-performing configuration. Tables 1–3 list the results for all three models developed.
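A 30-year monthly record contains 360 samples, so a 70/30 split yields 252 training months and 108 testing months. The sketch below shows one way to make such a split; taking the first 70% of the record in time order is an assumption, since the text does not say whether the split was chronological or random, and the arrays hold random placeholder values rather than the study data.

```python
import numpy as np

def chronological_split(X, y, train_fraction=0.7):
    """Split time-ordered samples: earliest part for training, rest for testing."""
    # round() avoids floating-point truncation (0.7 * 360 is slightly below 252).
    n_train = round(len(X) * train_fraction)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

n_months = 30 * 12                   # 1985-2014, monthly
rng = np.random.default_rng(1)
X = rng.random((n_months, 2))        # placeholder precipitation and temperature
y = rng.random(n_months)             # placeholder discharge

X_train, y_train, X_test, y_test = chronological_split(X, y)
print(len(X_train), len(X_test))
```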
Table 1. Results of the ANFIS model

| Input | Output | RMSE (training) | RMSE (testing) | R2 (training) | R2 (testing) |
|---|---|---|---|---|---|
| Precipitation and temperature | Discharge | 238.0875 | 253.2569 | 0.7023 | 0.7538 |
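Table 2. Results of the RNN model

| Input | Output | Transfer function | Architecture | RMSE (training) | RMSE (testing) | R2 (training) | R2 (testing) |
|---|---|---|---|---|---|---|---|
| Precipitation and temperature | Discharge | Tan-sig | 2-1-1 | 244.231 | 239.431 | 0.7838 | 0.8439 |
| Precipitation and temperature | Discharge | Tan-sig | 2-2-1 | 243.2001 | 235.8019 | 0.6805 | 0.71 |
| Precipitation and temperature | Discharge | Tan-sig | 2-3-1 | 245.9342 | 238.5462 | 0.6649 | 0.7881 |
| Precipitation and temperature | Discharge | Log-sig | 2-1-1 | 246.2976 | 234.1649 | 0.7001 | 0.742 |
| Precipitation and temperature | Discharge | Log-sig | 2-2-1 | 242.193 | 234.696 | 0.6835 | 0.8153 |
| Precipitation and temperature | Discharge | Log-sig | 2-3-1 | 240.6467 | 233.7322 | 0.6899 | 0.7486 |
| Precipitation and temperature | Discharge | Purelin | 2-1-1 | 247.3211 | 238.7068 | 0.6529 | 0.7184 |
| Precipitation and temperature | Discharge | Purelin | 2-2-1 | 249.313 | 235.549 | 0.6376 | 0.7317 |
| Precipitation and temperature | Discharge | Purelin | 2-3-1 | 248.0145 | 238.4094 | 0.6524 | 0.7253 |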
Bolded results are better than others.
Table 3. Results of the ANN model

| Input | Output | Architecture | RMSE (training) | RMSE (testing) | R2 (training) | R2 (testing) |
|---|---|---|---|---|---|---|
| Precipitation and temperature | Discharge | 2-1-1 | 130.9195 | 131.7096 | 0.92219 | 0.93373 |
| Precipitation and temperature | Discharge | 2-2-1 | 154.8418 | 121.8403 | 0.90794 | 0.92339 |
| Precipitation and temperature | Discharge | 2-3-1 | 94.7059 | 99.3894 | 0.9522 | 0.9699 |
Results of the ANFIS model
An ANFIS model was developed using two input parameters (mean monthly precipitation and temperature) and one output parameter (mean monthly discharge). The performance of the model was evaluated using RMSE and R2 values for the training and testing phases. The RMSE for the ANFIS model was 238.0875 for the training phase and 253.2569 for the testing phase. Similarly, the R2 value was 0.7023 for the training phase and 0.7538 for the testing phase. The results of the ANFIS model are presented in Table 1.
Results of the RNN model
To assess the performance of the RNN model, three transfer functions were used: Tan-sig, Log-sig, and Purelin. As a neuron's net input goes from negative to positive infinity, Log-sig produces outputs between 0 and 1 and Tan-sig produces outputs between -1 and 1, whereas Purelin is a linear function whose output is unbounded. Among the three, the Tan-sig function gave the best results in terms of R2 values for both the training and testing phases. Specifically, the 2-1-1 architecture using the Tan-sig function produced the best R2 values of 0.7838 and 0.8439 for the training and testing phases, respectively. For Log-sig and Purelin, the best testing-phase R2 values were 0.8153 and 0.7317, respectively, both obtained with the 2-2-1 architecture.
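For reference, the three transfer functions can be written out as they are conventionally defined (the names follow common neural network toolbox usage). The NumPy versions below are an illustrative sketch rather than the code used in the study: Tan-sig is the hyperbolic tangent, Log-sig is the logistic sigmoid, and Purelin returns its input unchanged.

```python
import numpy as np

def tansig(n):
    """Hyperbolic tangent sigmoid: output lies strictly between -1 and 1."""
    return np.tanh(n)

def logsig(n):
    """Logistic sigmoid: output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-n))

def purelin(n):
    """Linear transfer function: output equals the net input (unbounded)."""
    return n

net_input = np.linspace(-10.0, 10.0, 201)
for name, f in [("Tan-sig", tansig), ("Log-sig", logsig), ("Purelin", purelin)]:
    out = f(net_input)
    print(f"{name}: min={out.min():.4f}, max={out.max():.4f}")
```

The bounded outputs of Tan-sig and Log-sig introduce nonlinearity in the hidden layer, while a network using only Purelin reduces to a linear model.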
Results of the ANN model
To evaluate the performance of the ANN model, three different architectures were tested: 2-1-1, 2-2-1, and 2-3-1. Among these architectures, the 2-3-1 architecture produced the best R2 values of 0.9522 and 0.9699 for the training and testing phases, respectively. The ANN model outperformed both the ANFIS and RNN models in terms of model efficiency for both the training and testing data (Dalkiliç & Hashimi 2020).
IMPLICATIONS FOR POLICY AND SUSTAINABLE DEVELOPMENT
Hydrological forecasting is essential for making informed decisions about managing and mitigating severe events, producing hydropower, running water systems, allocating water resources, and watering crops, among others. Hydrological forecasts translate meteorological variables into hydrological variables of interest, such as snowmelt, streamflow, and river stage. In this study, we utilized three models, namely ANN, RNN, and ANFIS techniques, for streamflow forecasting in the Hunza river basin. We evaluated these models and determined that the ANN is the most effective predictive model for this area. The use of this technique can provide potentially valuable information about future streamflow dynamics, which can be beneficial for managing and mitigating extreme events, generating hydropower, operating water systems, planning water resources, and watering crops. Watershed managers and policymakers can utilize this predicted flow information to develop water policies for the Hunza river basin that are based on accurate and reliable data.
CONCLUSION
The results of this study indicate that the ANN model with a 2-3-1 architecture performs the best in predicting streamflow in the Hunza river basin, outperforming both the RNN and ANFIS models. The high R2 values for both training and testing phases suggest that the ANN model is effective in capturing the complex relationships between hydro-meteorological variables and streamflow. The findings of this study highlight the potential of using ANN models in hydrological forecasting for effective water resource management and planning. The use of accurate streamflow predictions can assist in mitigating the impacts of extreme events such as floods and droughts, enhancing hydropower generation, optimizing water system operations, and supporting crop watering. Overall, the findings of this research can aid in bringing sustainability to the water environment under climate change and can be useful for water experts and policymakers in future water-related planning.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.