Abstract
This study investigates changes in river flow patterns in the Hunza Basin, Pakistan, attributed to climate change. Given the anticipated rise in extreme weather events, accurate streamflow predictions are increasingly vital. We assess three machine learning (ML) models – artificial neural network (ANN), recurrent neural network (RNN), and adaptive fuzzy neural inference system (ANFIS) – for streamflow prediction under the Coupled Model Intercomparison Project 6 (CMIP6) Shared Socioeconomic Pathways (SSPs), specifically SSP245 and SSP585. Four key performance indicators guide the evaluation: mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). The models use monthly precipitation and maximum and minimum temperatures as inputs and discharge as the output, spanning 1985–2014. The ANN model with a 3-10-1 architecture outperforms RNN and ANFIS, displaying lower MSE, RMSE, MAE, and higher R2 values for both training (MSE = 20417, RMSE = 142, MAE = 71, R2 = 0.94) and testing (MSE = 9348, RMSE = 96, MAE = 108, R2 = 0.92) datasets. Subsequently, the superior ANN model predicts streamflow up to 2100 using SSP245 and SSP585 scenarios. These results underscore the potential of ANN models for robust long-term streamflow estimation, offering valuable insights for water resource management and planning.
HIGHLIGHTS
The ANN, RNN, and ANFIS models were used to predict streamflow under the CMIP6 SSP245 and SSP585 scenarios.
The ANN model outperforms both RNN and ANFIS, with R2 values of 0.94 for training and 0.92 for testing.
INTRODUCTION
Anthropogenic activities leading to global warming have profound effects on both precipitation patterns and air temperatures, resulting in significant alterations in streamflow (Mahdian et al. 2023; Singh et al. 2023). Global warming has altered the hydrology of numerous rivers in Asia, including those in Pakistan (Kiran et al. 2023). Khan et al. (2022), using 86 discharge monitoring stations located on all major rivers of Pakistan, found that a 10% increase in precipitation and temperature can result in a 10–35% increase in river flow. Rizwan et al. (2023) found that under various climate change scenarios, namely RCP2.6, RCP4.5, and RCP8.5, future streamflow in the Kabul and Upper Indus basins of Pakistan is expected to increase. By the end of the 21st century, the annual mean temperature and precipitation in Pakistan under various Shared Socioeconomic Pathway (SSP) scenarios are projected to increase by 1.4–4.9 °C and 26.4–159.7%, respectively (Almazroui et al. 2020). Variations in precipitation and temperature can alter the streamflow and present challenges for water management. Therefore, it is important to investigate the hidden hydrological dynamics that occur within the basin under various SSP scenarios.
To predict streamflow accurately, researchers have developed and utilized various hydrological models, which can be classified into two categories: empirical or lumped models, and physical-based models (Cho & Kim 2022; Islam et al. 2023). Physical-based models, also known as process-based or mechanistic models, are built on a comprehensive understanding of the physics governing hydrological processes. These models produce statistically significant results but are data-intensive and have long computation times (Yang et al. 2019). On the other hand, lumped models, such as artificial neural network (ANN), recurrent neural network (RNN), and adaptive fuzzy neural inference system (ANFIS), are widely used for rainfall-runoff and streamflow forecasting owing to their low data requirements (Moretta et al. 2023). Lumped models have been criticized for their inability to capture the hidden nonlinearity of the streamflow. However, the recent evolution in deep learning and machine learning (ML) has significantly improved their capability to model the dynamic nature of rainfall and runoff relationships (Razavi 2021; Sobieraj et al. 2022).
The popularity of ANN models has grown significantly because of their capability to represent both linear and nonlinear systems without relying on the assumptions inherent in traditional statistical techniques (Onyelowe et al. 2023). ANNs have showcased successful applications in estimating river flow in various hydrologic scenarios. ANNs offer significant advantages for streamflow forecasting, particularly in extreme conditions, such as predicting peak streamflow. Hence, in this study, we employed the ANN model to forecast the streamflow in the Hunza River Basin. ANN serves as a semi-parametric regression estimator that is widely employed for streamflow predictions (Souaissi et al. 2023). The integration of neural network technology has yielded promising results for hydrological and water resource simulations. In recent years, fuzzy logic has also been applied to water resource forecasting (Gunal & Mehdi 2023). Several studies have demonstrated the effectiveness of data-driven methodologies in simulating various hydrological processes, including rainfall-runoff forecasting, flash flood forecasting, and surge water level prediction (Sanders et al. 2022).
In recent years, a novel research field known as neuro-fuzzy systems has emerged, which combines the strengths of neural networks and fuzzy logic (Nagarajan & Thirunavukarasu 2022). This framework offers the advantages of both approaches within a single system. Neuro-fuzzy systems effectively address the primary limitations of fuzzy systems by harnessing the learning capabilities of ANN and finding extensive applications across various domains such as signal processing, information retrieval, automated control, and database management. The integration of neural networks and fuzzy logic enables improved modeling and decision-making processes in diverse fields (Javaheri et al. 2023).
Deep learning algorithms, particularly RNNs, have gained significant attention for streamflow prediction because of their strong learning capabilities for handling time series data. RNNs can retain information from past inputs and make decisions based on both the current and previous inputs. However, a drawback of RNNs is their difficulty in effectively retrieving information from the previous long-term layers. This limitation stems from the absence of activation functions in the recurrent components of RNN architecture. To address this issue, researchers have explored various solutions, including the adoption of more complex RNN architectures and alternative deep learning algorithms such as long short-term memory and gated recurrent units. These alternative algorithms have demonstrated superior performance compared to traditional RNNs in certain scenarios (Bodapati et al. 2021; Nguyen et al. 2021; Torres et al. 2021; Zeebaree et al. 2021; Kilinc 2022).
The utilization of ML models for streamflow simulations is typically limited to observable time periods and subsequent forecasts (Singh et al. 2023). Very few studies worldwide have used these models to predict long-term futuristic streamflow using Coupled Model Intercomparison Project 6 (CMIP6) models (Ma et al. 2023). Das & Nanduri (2018) used ML models in combination with the CMIP5 to project monthly monsoon streamflow for the Wainganga Basin, India. The CMIP3 and CMIP5 exhibit limitations in accurately simulating extreme precipitation events, which play a major role in shaping the runoff generation within catchments (Singh et al. 2023). Inadequate simulation of extreme precipitation by CMIP3 and CMIP5 introduces significant uncertainties in streamflow predictions. This motivated the authors to integrate CMIP6 with ML models for long-term streamflow prediction. These models are expected to produce more realistic results than previous models because they have demonstrated improvements in accurately representing historical records of rainfall and temperature.
This study, for the first time, examined the potential of three ML models, namely ANN, RNN, and ANFIS, for long-term streamflow prediction over the Hunza River Basin, Pakistan, using different SSP CMIP6 scenarios. Initially, we trained and tested the ML models using observed data (1985–2014) and assessed their accuracy using various statistical indicators. Subsequently, we fed the downscaled, bias-corrected, and ensembled data to the best-performing model to project the future streamflow up to 2100 under the SSP245 and SSP585 scenarios. The findings of this study can be used for water resource planning and management in the region.
STUDY AREA DESCRIPTION, DATA COLLECTION, AND METHODS
Study area description
Data collection
This study is based on the following datasets, the details of which are provided below.
Digital elevation model
Digital elevation model (DEM) data for the study area were downloaded from the National Aeronautics and Space Administration (https://www.earthdata.nasa.gov/learn/find-data). The resolution of the DEM is 30 m × 30 m. The DEM data were used for watershed delineation. The delineated watershed is shown in Figure 1.
Hydroclimatic data sets
General circulation models (GCMs) data
This study utilized precipitation and temperature data, including minimum and maximum temperatures, obtained from 10 general circulation models (GCMs) sourced from the CMIP6 archive (https://esgf-node.llnl.gov/projects/cmip6). The details of these models are listed in Table 1.
Model name | Country | Latitude resolution (degree) | Longitude resolution (degree) | Description | Institution/Agency |
---|---|---|---|---|---|
CMCC-ESM2 | Italy | 2 | 2.8 | Italian research institution | Euro-Mediterranean Center on Climate Change |
MRI-ESM2-0 | Japan | 1.12 | 1.12 | Meteorological Research Institute Earth System Model Version 2.0 | Meteorological Research Institute |
CNRM-CM6-1 | France | 1.4 | 1.4 | Centre National de Recherches Météorologiques Coupled Global Climate Model, version 5 | Centre National de Recherches Meteorologiques/Centre Europeen de Recherche et Formation Avancees en Calcul Scientifique |
INM-CM5-0 | Russia | 1.5 | 2 | Institute of Numerical Mathematics Coupled Model, version 5 | Russian Institute of Numerical Mathematics, Russian Academy of Science |
CNRM-ESM2-1 | France | 1.4 | 1.4 | Centre National de Recherches Météorologiques Coupled Global Climate Model, version 5 | Centre National de Recherches Meteorologiques/Centre Europeen de Recherche et Formation Avancees en Calcul Scientifique |
EC-Earth3-Veg-LR | Europe | 0.70 | 0.70 | EC-Earth Earth System Model Version 3 with Dynamic Vegetation Component | EC-Earth Consortium |
INM-CM4-8 | Russia | 1.5 | 2.0 | Institute of Numerical Mathematics Coupled Model, version 4 | Russian Institute of Numerical Mathematics |
NESM3 | China | 2 | 2 | Nanjing University of Information Science and Technology | Nanjing University |
MPI-ESM1-2-LR | Germany | 1.87 | 1.87 | Max Planck Institute for Meteorology Earth System Model version 1.2 Low Resolution | Max Planck Institute for Meteorology |
MIROC6 | Japan | 1.4 | 1.87 | Model for Interdisciplinary Research on Climate, version 6 | Japan Agency for Marine Earth Science and Technology, Atmosphere and Ocean Research Institute (The University of Tokyo), and National Institute for Environmental Studies |
Methods
Artificial neural network (ANN)
ANN is a computational model composed of interconnected nodes or neurons, inspired by the way the human brain works. These nodes are organized into layers and are designed to process and transmit information, making the network capable of learning from the observed data by adjusting the connection weights to minimize the error between the predicted and observed values. In this study, various ANN architectures were tested, and their performance was assessed via statistical performance indicators, namely MSE, RMSE, MAE, and R². The model training phase minimizes the global error, calculated as the average error across all training samples, where the error represents the discrepancy between the predicted and observed values. The architecture with the best indicator values was selected to predict the future monthly streamflow. The ANN has advantages in uncertainty quantification (UQ) over RNNs and ANFIS. Methods such as Monte Carlo Dropout and Bayesian Neural Networks can be employed in ANNs to produce more reliable uncertainty estimates, which is a valuable feature in hydrological modeling (Sharma & Machiwal 2021; Ghiasi et al. 2022). In contrast, RNNs and ANFIS may have limitations in effectively quantifying uncertainty.
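To make the layer structure concrete, the following is a minimal sketch of a forward pass through a 3-10-1 network (three inputs, ten hidden neurons, one output), the architecture reported as best in this study. The tanh hidden activation, the linear output, the random initial weights, and the example input values are assumptions for illustration only; the source does not state the activation functions or the training settings.

```python
import math
import random


def init_mlp(n_in=3, n_hidden=10, n_out=1, seed=0):
    """Randomly initialise weights for a single-hidden-layer network (illustrative)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return {"w1": w1, "b1": b1, "w2": w2, "b2": b2}


def forward(params, x):
    """Forward pass: tanh hidden layer and linear output (assumed activations)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(params["w2"], params["b2"])
    ]


def count_parameters(params):
    """Total number of trainable weights and biases."""
    return (
        sum(len(row) for row in params["w1"]) + len(params["b1"])
        + sum(len(row) for row in params["w2"]) + len(params["b2"])
    )


params = init_mlp()
# Hypothetical scaled inputs for one month: precipitation, Tmax, Tmin.
prediction = forward(params, [0.2, 0.7, 0.4])
n_params = count_parameters(params)
```

A 3-10-1 network of this form has (3 × 10 + 10) + (10 × 1 + 1) = 51 trainable parameters, which is small relative to 30 years of monthly records.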
Recurrent neural network (RNN)
RNN, a subclass of ANN models, evolved from feed-forward networks (FFNs). In RNNs, the connections among nodes create a temporal sequence, making them robust for processing variable-length input sequences by utilizing their internal memory. RNNs excel in handling sequential or time series data and can effectively perform data classification tasks through contextual information extraction. The RNN structure comprises successive recurrent layers, distinguishing it from the traditional FFNs. Unlike FFNs, which assign weights solely to the current inputs, RNN algorithms leverage their internal memory to allocate weights to both current and preceding inputs, thereby enhancing their capacity for sequence-based tasks. The recurrent loops in the hidden layers enable the model to retain and utilize information from previous time steps, making it well-suited for time series forecasting tasks such as predicting monthly streamflow (Khosravi et al. 2023). In terms of UQ, techniques such as Bayesian RNNs or dropout-based uncertainty estimation can be applied to provide uncertainty estimates for RNN predictions, which aids in improving the reliability of streamflow forecasts (Zhang et al. 2022). In this study, the RNN model was trained using historical temperature (maximum and minimum) and precipitation data to predict monthly streamflow. The training process involved a backpropagation algorithm to minimize the error between the predicted and actual streamflow values.
Statistical metrics, such as MSE, RMSE, MAE, and R², were used to assess the model's performance in capturing streamflow patterns in the Hunza River Basin.
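The role of the internal memory can be illustrated with a minimal sketch of a simple (Elman-type) recurrent cell. This is not the configuration used in the study: the hidden size, the tanh activation, and all weight and input values below are hypothetical and chosen only to show that the prediction for the current month depends on earlier inputs through the hidden state.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One recurrent update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h_new = []
    for j in range(len(h_prev)):
        s = b_h[j]
        s += sum(w_xh[j][i] * x[i] for i in range(len(x)))
        s += sum(w_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(s))
    return h_new


def rnn_predict(sequence, w_xh, w_hh, b_h, w_hy, b_y):
    """Run the cell over a sequence and read one value out of the last hidden state."""
    h = [0.0] * len(b_h)
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
    return sum(w * hj for w, hj in zip(w_hy, h)) + b_y


# Hypothetical fixed weights: 3 inputs (precipitation, Tmax, Tmin), 2 hidden units.
W_XH = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
W_HH = [[0.1, 0.2], [-0.3, 0.5]]
B_H = [0.0, 0.0]
W_HY = [1.0, -1.0]
B_Y = 0.0

# Two made-up two-month sequences that differ only in the first month.
seq_a = [[0.1, 0.5, 0.2], [0.3, 0.6, 0.3]]
seq_b = [[0.9, 0.5, 0.2], [0.3, 0.6, 0.3]]
y_a = rnn_predict(seq_a, W_XH, W_HH, B_H, W_HY, B_Y)
y_b = rnn_predict(seq_b, W_XH, W_HH, B_H, W_HY, B_Y)
```

Although the second-month inputs are identical, `y_a` and `y_b` differ because the hidden state carries the first month forward. A feed-forward network fed only the second month would return the same value for both.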
Adaptive fuzzy neural inference system (ANFIS)
The ANFIS model is commonly used to establish the relationships between multiple variables. It follows a fuzzy Sugeno structure, with a forwarding network architecture consisting of five layers. Each layer serves a specific function, from adapting nodes based on input variables to computing the final output value. ANFIS is particularly suitable for problems with fuzzy input variables and changes in input data over time. Using ANFIS, a more accurate and comprehensive understanding of the relationship between the variables can be obtained, which is valuable for forecasting future discharge levels based on changes in precipitation and temperature. ANFIS is a suitable choice for this study because it can handle fuzzy input variables and adapt to changing data, offering a unique advantage over other ML models for managing uncertain and nonlinear relationships. In addition, ANFIS is a robust approach to UQ. Techniques such as Monte Carlo simulations using fuzzy rules or incorporating uncertainty information into membership functions provide an effective UQ for ANFIS-based predictions, enhancing their reliability in decision-making and risk assessment in comparison to other ML models (Khazaee Poul et al. 2019; Rahmati et al. 2020).
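The five layers of a Sugeno-type ANFIS can be sketched as follows. This is an illustrative forward pass only, assuming Gaussian membership functions, a product rule for the firing strengths, and first-order (linear) consequents; the membership parameters, consequent coefficients, and input values are hypothetical, and no training step is shown.

```python
import math
from itertools import product


def gaussian_mf(x, centre, sigma):
    """Gaussian membership degree of x (assumed membership function shape)."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def anfis_forward(x, mfs, consequents):
    """
    x           : list of input values
    mfs         : mfs[i] is a list of (centre, sigma) pairs for input i
    consequents : one coefficient list [p_1, ..., p_n, r] per rule, ordered
                  like itertools.product over the membership functions
    """
    # Layer 1: fuzzify each input with its membership functions.
    degrees = [[gaussian_mf(xi, c, s) for c, s in mfs_i] for xi, mfs_i in zip(x, mfs)]
    # Layer 2: firing strength of each rule (product of one degree per input).
    firing = [math.prod(combo) for combo in product(*degrees)]
    # Layer 3: normalise the firing strengths so they sum to one.
    total = sum(firing)
    normalised = [w / total for w in firing]
    # Layer 4: weight each rule's linear consequent by its normalised strength.
    rule_outputs = [
        wn * (sum(p * xi for p, xi in zip(coef[:-1], x)) + coef[-1])
        for wn, coef in zip(normalised, consequents)
    ]
    # Layer 5: sum the weighted rule outputs into a single crisp value.
    return sum(rule_outputs), normalised


# Hypothetical setup: 3 scaled inputs, 2 membership functions each, so 8 rules.
MFS = [[(0.0, 0.5), (1.0, 0.5)] for _ in range(3)]
# Constant consequents (zero slopes) keep the example easy to check by eye.
CONSEQUENTS = [[0.0, 0.0, 0.0, 100.0 * (k + 1)] for k in range(8)]

discharge, weights = anfis_forward([0.2, 0.7, 0.4], MFS, CONSEQUENTS)
```

With constant consequents the output is a weighted average of the rule constants, so it always falls between the smallest and largest of them.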
Model performance
The performances of the ML models were assessed using four statistical performance indicators: MSE, RMSE, MAE, and R2 (Moriasi et al. 2007; Adnan et al. 2020; Yeganeh-Bakhtiary et al. 2023).
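- I. Mean square error (MSE): the mean of the squared differences between the observed and predicted values. It is expressed as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2$$
- II. Root mean square error (RMSE): the square root of the MSE, expressed in the same units as the discharge. It is expressed as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2}$$
- III. Mean absolute error (MAE): the mean of the absolute differences between the observed and predicted values. It is expressed as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right|$$
- IV. Coefficient of determination (R2): The coefficient of determination or R-squared represents the proportion of variance in the dependent variable that is explained by the linear regression model. It is a scale-free score, that is, irrespective of whether the values are small or large, the value of R2 does not exceed one. It is expressed as follows:
$$R^2 = \frac{\left[\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)\right]^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2\,\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2}$$
where $O_i$ and $P_i$ are the observed and predicted values, $\bar{O}$ and $\bar{P}$ are their respective means, and $n$ is the number of observations.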
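The four indicators can be computed as in the sketch below. It assumes that R2 is taken as the squared Pearson correlation between observed and predicted values, which equals the variance explained by a linear regression of one on the other; the source does not show its exact expression, and the discharge values are made up for illustration.

```python
import math


def mse(obs, pred):
    """Mean square error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error, in the units of the data."""
    return math.sqrt(mse(obs, pred))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


# Made-up monthly discharge values, not data from the study.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]

scores = {
    "MSE": mse(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "R2": r2(observed, predicted),
}
```

For any set of errors, the MAE cannot exceed the RMSE, which gives a quick consistency check on a reported pair of values.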
Bias correction of GCM
Rating metric, Taylor skill-score, and Taylor diagram
In this study, three techniques were employed to assess and select the most suitable combination of GCMs for constructing multi-model ensembles (MMEs): the rating metric (RM), the Taylor skill score (TSS), and the Taylor diagram. Together, these techniques identify the GCMs that reproduce the observed data with the highest skill and accuracy.
The RM assesses and ranks the performance of the GCMs based on their similarity to the observed data. This metric considers a range of statistical measures and performance indicators, including correlation coefficients, bias, and RMSE, to evaluate the overall quality and reliability of GCMs. This aids in identifying the GCMs that exhibit the closest resemblance to the observed streamflow data, allowing for the selection of the most accurate models.
The TSS is a statistical measure used to quantify the similarity between model simulations and observed data. It considers various aspects, such as pattern, variability, and amplitude, to evaluate the ability of different GCMs to replicate the observed patterns. The TSS provides a robust basis for selecting the most reliable models for streamflow prediction in the study area.
The Taylor diagram is a graphical representation used to assess the performance of different GCMs based on their agreement with observed data. It provides a comprehensive visualization of multiple statistical measures simultaneously, including the correlation coefficient and the standard deviation ratio. This diagram allows for the comparison of GCMs in terms of their pattern, variability, and amplitude, thereby providing a holistic understanding of their performance.
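The statistics behind the Taylor diagram and the TSS can be sketched as follows. The source does not state which form of the skill score was used; this example assumes the form S = 4(1 + R) / [(σ + 1/σ)² (1 + R0)], where R is the correlation coefficient, σ is the ratio of the simulated to the observed standard deviation, and R0 is the maximum attainable correlation (set to 1 here). The series are made-up values.

```python
import math


def taylor_statistics(obs, sim):
    """Return the correlation coefficient and the standard deviation ratio (sim/obs)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    return cov / (std_o * std_s), std_s / std_o


def taylor_skill_score(r, std_ratio, r0=1.0):
    """Assumed skill score form: equals 1 for a perfect match, tends to 0 for a poor one."""
    return 4.0 * (1.0 + r) / ((std_ratio + 1.0 / std_ratio) ** 2 * (1.0 + r0))


# Made-up observed series and a hypothetical GCM with the right pattern
# but only half the observed variability.
observed = [2.0, 4.0, 6.0, 8.0, 10.0]
damped_gcm = [0.5 * v for v in observed]

r_perfect, ratio_perfect = taylor_statistics(observed, observed)
r_damped, ratio_damped = taylor_statistics(observed, damped_gcm)
score_perfect = taylor_skill_score(r_perfect, ratio_perfect)
score_damped = taylor_skill_score(r_damped, ratio_damped)
```

The damped series is perfectly correlated with the observations yet scores only 0.64, which shows how the TSS penalizes errors in amplitude as well as in pattern.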
By employing these techniques, this study aimed to identify the most suitable combination of GCMs for constructing reliable MMEs that can enhance the accuracy and robustness of streamflow predictions in the research area.
MMEs using RF
In this study, the MME approach integrates ML techniques, particularly the random forest (RF) algorithm, to improve the reliability of GCM predictions. The RF algorithm is a powerful ensemble-learning method that combines multiple decision trees to create robust predictive models.
For the GCMs, we used the RF technique to generate an ensemble prediction by training individual decision trees on subsets of the GCM outputs. Each decision tree is trained on a different subset, so that the ensemble draws on the differing strengths of the individual GCMs. By combining predictions from multiple decision trees, the RF algorithm produces more accurate and robust ensemble predictions. An advantage of using the RF approach in a multi-model ensemble is its ability to handle complex interactions and nonlinear relationships between different climate variables. The complex dynamics within the GCM output can be captured, resulting in better predictions and reduced uncertainty.
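The idea of averaging many trees, each trained on a different resample of the GCM outputs, can be illustrated with a deliberately reduced stand-in for a random forest: bootstrap-aggregated one-split regression trees, each using one randomly chosen GCM as its predictor. This is not the RF configuration used in the study. A full RF grows deeper trees and draws a random feature subset at every split. The function names, the number of trees, and the data values are hypothetical.

```python
import random


def fit_stump(rows, targets, feature):
    """Find the single split on one feature that minimises the squared error."""
    values = sorted(set(row[feature] for row in rows))
    overall_mean = sum(targets) / len(targets)
    best = (float("inf"), None, overall_mean, overall_mean)
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((t - mean_l) ** 2 for t in left) + sum((t - mean_r) ** 2 for t in right)
        if sse < best[0]:
            best = (sse, threshold, mean_l, mean_r)
    _, threshold, mean_l, mean_r = best
    return feature, threshold, mean_l, mean_r


def predict_stump(stump, row):
    feature, threshold, mean_l, mean_r = stump
    if threshold is None:  # the resample had a single distinct value: no split
        return mean_l
    return mean_l if row[feature] <= threshold else mean_r


def fit_ensemble(rows, targets, n_trees=50, seed=0):
    """Train each stump on a bootstrap resample and one randomly chosen GCM."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_targets = [targets[i] for i in idx]
        trees.append(fit_stump(sample_rows, sample_targets, rng.randrange(n_features)))
    return trees


def predict_ensemble(trees, row):
    """Ensemble prediction: the mean of the individual tree predictions."""
    return sum(predict_stump(t, row) for t in trees) / len(trees)


# Made-up data: each row holds one month's value from three hypothetical GCMs,
# and the target is the corresponding observed value.
gcm_rows = [
    [1.0, 1.2, 0.8], [2.0, 2.3, 1.9], [3.0, 2.7, 3.2],
    [4.0, 4.4, 3.9], [5.0, 5.1, 4.8], [6.0, 5.8, 6.3],
]
observed = [1.1, 2.0, 3.1, 3.9, 5.2, 6.0]

trees = fit_ensemble(gcm_rows, observed)
mme_low = predict_ensemble(trees, gcm_rows[0])
mme_high = predict_ensemble(trees, gcm_rows[-1])
```

Because every leaf value is an average of observed targets, the ensemble output stays within the observed range, which is one reason tree ensembles are often paired with a separate treatment of values beyond the historical record.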
RESULTS
The above results clearly show that the statistical indicators for precipitation are extremely low, rendering the data unsuitable for subsequent analysis. To solve this problem, the MME was computed via RF based on the bias-corrected GCMs, which enhances the similarity between the observed and GCM data. Figure 12(c) and 12(d) present the results of our MME analysis, demonstrating the effectiveness of the RF algorithm in bridging the gap between the observed and the GCM data. Several statistical indicators, namely the Nash–Sutcliffe efficiency (NSE), R2, and RMSE, were used to assess the accuracy of the model in replicating observed climate patterns. For precipitation under SSP245 and SSP585, the R2, RMSE, and NSE values were (0.48, 2.09, 0.47) and (0.54, 1.95, 0.52), respectively. For maximum temperature under SSP245 and SSP585, the R2, RMSE, and NSE values were (0.95, 1.32, 0.95) and (0.95, 1.40, 0.95), respectively. For minimum temperature under SSP245 and SSP585, the R2, RMSE, and NSE values were (0.92, 3.85, 0.92) and (0.91, 4.13, 0.91), respectively.
These results demonstrate the successful application of the MME with the RF algorithm as a powerful tool for enhancing the resemblance between the observed and GCM data, ultimately contributing to the advancement of our understanding of regional climate dynamics (Ahmed et al. 2019). In addition to the RF algorithm, various other ML techniques can be utilized for similar purposes. For a more in-depth exploration of these alternative methods, please refer to the study conducted by Ahmed et al. (2020). This source provides comprehensive insights into a range of ML approaches for addressing the same research objectives.
Moreover, there are anticipated increases in average precipitation in the future as well. The domain-averaged precipitation over the Hunza River Basin was projected to surge by 12.2 and 36.1% under the SSP245 and SSP585 scenarios, respectively. Previous studies (Almazroui et al. 2020; Abbas et al. 2023) utilizing CMIP5 and CMIP6 model datasets have consistently identified an anticipated rise in mean summer monsoon rainfall in Pakistan's future climate scenarios.
DISCUSSION
Accurate prediction of future streamflow, given the expected increase in extreme weather events under climate change, is very important for water resource planning and management. This study revealed that the ANN model outperformed the RNN and ANFIS models in streamflow forecasting for the Hunza River Basin, a result supported by the literature (Mohammadi et al. 2021; Vatanchi et al. 2023). ANN models are therefore well-suited for simulating streamflow within a given watershed. For research on extreme hydrological events such as floods, the ANN model is an efficient choice. This recommendation rests on the ability of ANNs to capture the complex patterns and linkages associated with high-flow occurrences, which makes them a useful tool for reproducing and predicting such extreme hydrological phenomena. This study further revealed a significant increase in streamflow in the Hunza River Basin, which is expected to continue until the year 2100. The projected streamflow patterns for the Hunza River Basin under the SSP245 and SSP585 scenarios show tendencies similar to those found by past researchers (Tahir et al. 2015, 2016). Wijngaard et al. (2017) carried out a study for the Upper Indus Basin in which they used a fully distributed cryospheric-hydrological model to simulate current and future hydrological fluxes and fed the model with an ensemble of eight downscaled GCMs chosen from the RCP4.5 and RCP8.5 scenarios. They found that the mean discharge and the number of high-flow events will almost certainly increase by the end of the 21st century. These increases could be attributed primarily to rising precipitation and temperature.
Ali et al. (2018a) used the Hydrologiska Byråns Vattenbalansavdelning (HBV) model to predict future streamflow in the Hunza River using projected data from three GCMs, i.e., BCC-CSM1.1, CanESM2, and MIROC-ESM, under RCP2.6, RCP4.5, and RCP8.5. Predictions were made over three time periods, 2010–2039, 2040–2069, and 2070–2099, using 1980–2010 as the base period. Overall, the projected climatic data show that temperature and precipitation are the most sensitive parameters affecting Hunza River streamflow. Hussain & Khan (2020) studied the Hunza River Basin using ML techniques and likewise concluded that ML models can forecast river flow with high accuracy, which will further improve water and hazard management. Haleem et al. (2022) studied the Upper Indus Basin in Pakistan using a semi-distributed model called the Soil and Water Assessment Tool (SWAT). They concluded that climate change will result in an increase in overall streamflow. In the present study, the increased streamflow is due to the combined effects of increasing precipitation and temperature, as predicted by the CMIP6 GCMs for future periods under both the SSP245 and SSP585 emission scenarios. These climatic changes, coupled with the physical processes occurring within the basin, collectively contribute to the projected rise in streamflow in the Hunza River. In summary, the increase in streamflow in the Hunza River Basin is a multifaceted phenomenon driven by both global climate change, as projected by CMIP6 GCMs, and local processes within the basin. This trend is consistent with similar studies conducted in Pakistan and underscores the importance of understanding and managing water resources in the face of changing climate conditions.
CONCLUSION
This study assessed the performance of three ML models (ANN, RNN, and ANFIS) for the Hunza River Basin. The objective was to determine the most suitable model for predicting streamflow responses to future climate change scenarios up to the year 2100 under SSP245 and SSP585, utilizing CMIP6 GCM data. The results demonstrated that the ANN model with the 3-10-1 architecture outperformed the RNN and ANFIS models with better accuracy, as indicated by the MSE, RMSE, MAE, and R2 values. Furthermore, significant variations in streamflow patterns were observed throughout the period up to 2100 for the CMIP6 GCMs under both SSP245 and SSP585. The increase in streamflow is due to an increase in precipitation and temperature patterns which were predicted using climate change signal analysis based on CMIP6 GCMs under the SSP245 and SSP585 emission scenarios. Thus, the outcomes of the overall study indicate that the ANN model is efficient in simulating streamflow in the Hunza River Basin.
The results of this study have significant implications for water resource management and hydrological research. The development of precise streamflow forecasting models utilizing advanced ML algorithms offers the potential to empower decision makers with enhanced strategies for water resource planning, flood mitigation, and drought management. The incorporation of precipitation and temperature datasets, along with bias-corrected CMIP6 data, provides a more comprehensive understanding of the impact of climate change on hydrological processes. Nevertheless, it is important to acknowledge certain limitations, such as data availability constraints, potential challenges related to model generalization, and inherent uncertainties within climate models.
Future research endeavors can explore various avenues, including the application of hybrid ML techniques, the development of real-time streamflow prediction models, and conducting risk assessment studies. By addressing these limitations and pursuing further research in these areas, streamflow forecasting can progress significantly, ultimately contributing to more sustainable water management practices and improved preparedness for water-related challenges on a global scale.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.