Abstract
Reliable and accurate modelling of streamflow remains a challenging task because of its complex behaviour, the extensive parameterization that model development requires, and the lack of complete or accurate data. In this study, the applicability of an emerging data-driven model, specifically a neural network autoregression (NNAR) model, was evaluated for the first time as a substitute for the physically based hydrological model Soil and Water Assessment Tool (SWAT) for predicting streamflow under data-scarce conditions and for immediate high-quality modelling results. The inputs to the NNAR model were the lagged values of the daily streamflow time series, and the output was the predicted value for the next day. The NNAR model produced the best prediction when a 20-day window of lagged streamflow values was used as input. The results of the statistical metrics used to evaluate the performance of the NNAR model were satisfactory (R = 0.90, RMSE = 28.27, MAE = 11.92, R2 = 0.83), indicating a high degree of agreement between the predicted and observed streamflow. The NNAR model outputs demonstrated its ability to accurately predict streamflow in the river basin, even without an explicit understanding of the physical processes that govern the system.
HIGHLIGHTS
Using the hydrological model SWAT and the machine learning model NNAR, the Meenachil River Basin's streamflow was predicted.
Projections of future streamflow for the period 2025–2086 under RCP 4.5 and RCP 8.5 were carried out.
Model performance was evaluated using R, RMSE, MAE, and R2.
A performance comparison between SWAT and NNAR was conducted.
INTRODUCTION
Water is a very important natural resource that is necessary for all living things to survive. Water resources must be managed effectively to meet current and future demands, ensure sustainability, and meet the needs of a growing population. With more warming in the 21st century, it is expected that a larger share of the world's population will be affected by water shortages and major river floods (Zang et al. 2007; Sood & Smakhtin 2015; Chen et al. 2016). The most crucial aspect of hydrology and water resource management is the accurate prediction of floods and river flow. This is a difficult modelling task because of the complex behaviour of the streamflow caused by its characteristics like non-linearity, non-stationarity, spatial and temporal variability of rainfall, characteristics of the catchment, etc. The management of water resources, water supply, and flood prevention can all benefit from accurate streamflow estimation.
Runoff, a significant part of the hydrologic cycle, is the flow of water over the land surface that occurs when the soil is saturated and can absorb no more. Runoff is extremely important to the hydrological cycle because it regulates streamflow by discharging excess precipitation into the oceans. Runoff estimates are crucial in understanding and solving a variety of water resource issues. Hydrological modelling, which represents the hydrological cycle of the physical world, is a powerful technique for predicting hydrological processes such as surface runoff. Typically, there are two main categories of hydrological models: stochastic and deterministic. In stochastic hydrological models, the input data are linked to the output through statistical or mathematical methods. Deterministic hydrological models, on the other hand, characterize the physical processes in the water resources system, and are thus more complex. In general, hydrological models need a wide variety of inputs, including climatic parameters, topographical details, and spatial distributions within the watershed. These models are increasingly seen as a crucial component of environmental and water management planning. Recent years have seen a rise in the use of the Soil and Water Assessment Tool (SWAT) model for assessing basin runoff, sediment transport, and nutrient levels. It is the most widely used physically based model at the river basin scale and is an effective tool for water resource management applications (Raihan et al. 2021; Saranya & Vinish 2021).
With technological advancements over the past few decades, researchers have become increasingly interested in the estimation of hydrological variables using machine learning (ML). In addition to physical and stochastic models, ML models have grown in popularity as powerful techniques for predicting streamflow. ML models are data-driven models that can recognize patterns in the relationships between input and output variables without being aware of the underlying physical processes. In recent years, data-driven models have become widespread in hydrological applications. In a variety of modelling tasks, including flood forecasting, runoff modelling, and predictions in data-scarce regions, ML techniques have demonstrated unprecedented precision and versatility (Höge et al. 2022). ML techniques have made significant contributions to the development of prediction systems, which have resulted in improved functionality and more cost-efficient solutions. These techniques aim to simulate the complex mathematical expressions of physical processes. The popularity of ML models among hydrologists has greatly increased as a result of their numerous advantages and capabilities. Researchers are currently working to find more precise and effective prediction models by introducing novel ML techniques.
Several recent studies have implemented ML applications in hydrologic engineering. Dalkilic & Hashimi (2020) used models like Artificial Neural Networks (ANNs), Wavelet Neural Networks (WNNs), and Adaptive Neuro-Fuzzy Inference System (ANFIS) in their study to compare how well they could forecast daily streamflow and thus identify the strengths and limitations of each model for a particular dataset. According to the findings of their research, WNN provided the most accurate result with the highest correlation to the observed data. Since it has been noted in some studies that choosing the best relevant input variables is a difficult task, Kisi et al. (2022) proposed a new modelling strategy for doing so. They also introduced a new model called the radial m5 model tree to predict daily streamflow. Additionally, the performance of this model was compared to that of four other data-driven models. For a stream in Konya, Turkey, Asaad et al. (2022) evaluated the long-term forecasting (12, 24, and 36 months) of average monthly streamflow using multilayer perceptron (MLP), long short-term memory (LSTM), and ANFIS. To examine the discrepancies between the actual data and model predicted data, they used the Mann–Whitney test in addition to statistical performance indicators.
Several data-driven models, such as ANN, Support Vector Machine (SVM), Extreme Learning Machine (ELM), LSTM, Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), Auto-Regressive Neural Network (ARNN), Genetic Programming (GP), and Random Forest (RF), have been used by researchers for hydrological prediction, and they have proved accurate in simulating runoff (Uysal & Şorman 2017; Jimeno-Sáez et al. 2018; Tyralis et al. 2019; Parisouj et al. 2020; Al-Saati et al. 2021; Mohammadi et al. 2021; Pham et al. 2021; Danandeh Mehr et al. 2022; Höge et al. 2022; Hunt et al. 2022; Jimeno-Sáez et al. 2022; Kisi et al. 2022). Unlike physical models, which take into account the logical and physical relationships between variables, ML models use mathematical functions to link inputs to outputs. No single model performs better than all others across the full range of hydrological conditions and catchment characteristics (Mohammadi et al. 2021). Studies on the application of ML to streamflow prediction have shown promising results, with some studies reporting significant improvements in prediction accuracy compared to traditional methods. However, the effectiveness of ML approaches can be influenced by factors such as the quality and availability of input data, the choice of algorithm, and the temporal and spatial scale of the analysis. Furthermore, there are still some challenges associated with the application of ML to streamflow prediction, such as the need for large amounts of high-quality data, the potential for overfitting, and the difficulty in interpreting the results and understanding the underlying relationships between input variables and streamflow.
Based on a review of existing research, we assessed the practicality of different models for predicting streamflow in our study region. These models included Artificial Neural Network with Backpropagation (ANN-BP), Support Vector Regression (SVR), RF, and ELM. We evaluated the effectiveness of these models using various evaluation criteria like R, RMSE, R², and MAE, which are important indicators for measuring a model's fit to the data. However, none of these models yielded an R value surpassing 0.73. Furthermore, when we compared the performance of the calibrated models during testing to that of the calibration period, we observed unfavourable statistical results, and the predictive reliability was notably weak. These challenges led us to opt for the NNAR model, which successfully addressed these limitations and provided improved outcomes.
Forecasting of streamflow is a crucial issue for hydrological research because it is necessary in many significant areas. The hydrological processes that take place in a basin are extremely complex to model, and no model is perfect in every basin. Decision-makers in basin management face a challenge in finding models that precisely simulate the complexity of basin processes using available data. In this study, an emerging data-driven model, specifically a neural network autoregression model (NNAR), was used as an alternative to a physically based SWAT model for predicting streamflow under data-scarce conditions and for immediate high-quality modelling results. Data-driven models of this kind typically do not incorporate physical laws and constraints, which can result in unrealistic predictions. For instance, they may fail to respect mass balance or conservation principles, leading to implausible estimates of hydrological variables. It is worth noting that recent advancements in the field have explored hybrid approaches that combine the strengths of physically based models and ML models, aiming to mitigate their respective shortcomings and improve hydrological predictions (Yang et al. 2020).
The main goal of this study is to compare the outcomes of the NNAR model with those obtained using the physically based SWAT model in order to assess the performance of the NNAR model for streamflow estimation. The forecasted streamflow data generated by the NNAR model was assessed against the results of the hydrological model SWAT for the historic period. This study is broken down into the five steps listed below: (i) hydrologic simulation of streamflow for the historic period 1980–2017 using SWAT; (ii) future prediction of streamflow in SWAT for the period 2025–2086 under RCP (representative concentration pathway) 4.5 and 8.5 emission scenarios; (iii) development of the optimal NNAR model; (iv) estimation of streamflow for the future period using the NNAR model; and (v) comparison of the results and analysis of model performance.
MATERIALS AND METHODS
Study area and data used
The India Meteorological Department (IMD) in Pune provided the meteorological data, which included precipitation, maximum temperature, and minimum temperature from climate stations within the study area. These data cover the period 1980–2017. Additionally, daily streamflow data for the Kidangoor station within the study area were collected from the Central Water Commission (CWC), India, for the period 1987–2017. The land use maps of the study area for the years 1992 and 2008 were generated from Landsat images downloaded from the USGS website, through supervised classification of the images in the ArcGIS software. The soil map of the study area was extracted from the Digital Soil Map of the World provided by the Food and Agriculture Organization (FAO) of the United Nations. The 30 × 30-m digital elevation model (DEM) of the study area was downloaded from NASA's Earth Science Data Portal. The hydrological simulation of future streamflow in SWAT requires both climate projections and land use information. The best-performing climate model ensemble for simulating precipitation and temperature over the study area, identified by Saranya & Vinish (2021), was used to project the climate variables. For the control period of 1980–2005, historical simulations of climate variables were taken from the climate models. Future projections of climate variables corresponding to the RCP 4.5 and RCP 8.5 emission scenarios, as outlined in the 5th IPCC assessment report, were chosen for this study. CNRM_CM5-RegCM4, CNRM_CM5-RCA4, GFDL_ESM2M-RegCM4, GFDL_ESM2M-RCA4, and NorESM1_M-RCA4 are the GCM-RCM (general circulation model-regional climate model) combinations chosen for precipitation simulation. Likewise, the selected GCM-RCM pairs for temperature (both maximum and minimum) are GFDL_ESM2M-RegCM4, GFDL_ESM2M-RCA4, NorESM1_M-RCA4, CanESM2-RegCM4, and MIROC5-RCA4.
These climate model projections for the period 2025–2086 were obtained from the climate data portal of IITM Pune (Indian Institute of Tropical Meteorology). The land use maps for 2030 and 2060 were projected using the Land Change Modeler module of Clark Labs' TerrSet software (Clark labs 2021).
METHODS
SWAT model
SWAT set-up, calibration, and validation
Threshold values were set to prevent the formation of an excessive number of hydrologic response units (HRUs) by removing insignificant land use, soil type, and slope classes in each sub-basin (Chen & Chang 2021). One hundred and twenty-three HRUs were created for this study area. The next step was to define the weather data for the simulation period. Once the HRUs and weather data were defined, the input tables for the SWAT model were written. The model uses the SCS-CN method (Neitsch et al. 2009) to calculate runoff, and the Penman–Monteith equation, considered the most accurate of the available options, was chosen to compute the potential evapotranspiration. The final step was to run the SWAT model to simulate the streamflow for the defined watershed. The runoff was estimated independently for each HRU and then routed to obtain the overall runoff for the watershed. The simulation ran from 1980 to 2017, including a 3-year warm-up period.
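To make the SCS-CN step concrete, the following is a minimal sketch of the standard curve number runoff equation in its metric form. It assumes the conventional initial abstraction of 0.2 times the retention parameter; the function name `scs_cn_runoff` and its arguments are illustrative and are not part of SWAT, which additionally adjusts the curve number day by day.

```python
def scs_cn_runoff(precip_mm, curve_number, ia_ratio=0.2):
    """Daily surface runoff depth (mm) from the SCS curve number equation."""
    if not 0 < curve_number <= 100:
        raise ValueError("curve number must be in the interval (0, 100]")
    # Retention parameter S in mm (metric form of the SCS relation).
    retention = 25400.0 / curve_number - 254.0
    # Initial abstraction Ia; the 0.2 ratio is the conventional assumption.
    initial_abstraction = ia_ratio * retention
    if precip_mm <= initial_abstraction:
        return 0.0  # all rainfall is abstracted, so no runoff is generated
    excess = precip_mm - initial_abstraction
    return excess ** 2 / (excess + retention)


# Example: 50 mm of rain on a surface with curve number 80.
runoff_mm = scs_cn_runoff(50.0, 80.0)
```

A higher curve number gives a smaller retention parameter and therefore more runoff for the same rainfall depth.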
SWAT Calibration Uncertainties Program (SWAT-CUP) is a program designed for analysing the prediction uncertainty of calibration and validation outcomes of the SWAT model. It includes various calibration and uncertainty analysis procedures, such as Sequential Uncertainty Fitting-2 (SUFI-2). Sensitivity analysis is an important part of hydrological modelling, as it helps to identify the most important parameters that influence model output. SWAT-CUP offers both global and local sensitivity analysis methods. Global sensitivity analysis explores the sensitivity of model output to changes in all input parameters, while local sensitivity analysis focuses on the sensitivity of model output to changes in a specific parameter. Calibration and validation both compare observed streamflow data with the streamflow simulated by the model. During calibration, the model parameters are adjusted until the simulated streamflow closely matches the observed data, within an acceptable margin of error. The adjusted values of the sensitive parameters were then input into the SWAT model, and runoff simulations for the validation period were executed. The performance of the SWAT model in simulating streamflow for the calibration and validation periods was evaluated using the Nash–Sutcliffe efficiency (NSE), the ratio of the root mean square error to the standard deviation of the observations (RSR), and percent bias (PBIAS) (Moriasi et al. 2007). Once the model is calibrated and validated, it can be used to forecast future streamflow and evaluate the impact of various land management scenarios on watershed processes and water resources. This model has been successfully applied over the study area in previous studies and has been shown to be effective in simulating the hydrological characteristics of the basin.
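The three indices named above can be sketched as follows, using their usual definitions from Moriasi et al. (2007). The function names and the sample values are illustrative assumptions, and the sign convention for PBIAS (positive values indicating underestimation by the model) is the one commonly used with these definitions.

```python
import math


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def rsr(observed, simulated):
    """RMSE divided by the standard deviation of the observations; 0 is ideal."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(residual) / math.sqrt(spread)


def pbias(observed, simulated):
    """Percent bias; positive values mean the simulation underestimates."""
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


# Illustrative values only, not data from the study.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [0.5, 1.0, 1.5, 2.0]
scores = (nse(obs, sim), rsr(obs, sim), pbias(obs, sim))
```

With these definitions, NSE equals one minus the square of RSR, so the two indices rank simulations identically; PBIAS adds information on the direction of the error.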
Bias correction of GCM-RCM pairs
Regional climate models are increasingly being used to provide high-resolution climate projections for specific regions. However, the output of these models may have biases due to the model's inherent limitations, such as the use of coarse input data, limited representation of physical processes, and simplification of complex feedback mechanisms. As a result, bias correction is often necessary to improve the accuracy of RCM output for precipitation and temperature variables (Fang et al. 2015; Mudbhatkal & Mahesha 2018). Bias correction of precipitation and temperature outputs from RCMs involves adjusting the model output to better match observations from weather stations or other datasets. Among the bias correction methods available for precipitation and temperature, the widely adopted distribution mapping (DM) method was chosen in this study. It adjusts the cumulative distribution function (CDF) of the modelled data by matching the observed and simulated quantiles. A transfer function is derived from the observed and simulated datasets and is then used to readjust the distributions so that the RCM simulations of rainfall and temperature agree with the observations. In this way, the method corrects the mean, standard deviation, and quantiles of the raw RCM data by matching their distribution function to that of the observed data.
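The quantile-matching idea behind distribution mapping can be sketched as below. This is an empirical-CDF version written for illustration only: distribution mapping is often applied with fitted parametric distributions, and the choice of empirical quantiles, the function names, and the toy numbers are all assumptions rather than the procedure used in the study.

```python
from bisect import bisect_right


def empirical_quantile(sorted_values, prob):
    """Quantile of a sorted sample at probability prob, with linear interpolation."""
    position = prob * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight


def distribution_mapping(observed, simulated_hist, simulated_target):
    """Map each target value through the simulated CDF onto the observed quantiles."""
    if len(observed) < 2 or len(simulated_hist) < 2:
        raise ValueError("at least two values are needed in each reference sample")
    obs_sorted = sorted(observed)
    sim_sorted = sorted(simulated_hist)
    n = len(sim_sorted)
    corrected = []
    for value in simulated_target:
        # Rank of the value in the historical simulation (step-wise empirical CDF).
        rank = bisect_right(sim_sorted, value)
        # Convert the rank to a probability and clamp values outside the sample range.
        prob = min(max((rank - 1) / (n - 1), 0.0), 1.0)
        corrected.append(empirical_quantile(obs_sorted, prob))
    return corrected


# Toy example: the simulation is systematically half of the observations.
observed = [2.0, 4.0, 6.0, 8.0, 10.0]
simulated_hist = [1.0, 2.0, 3.0, 4.0, 5.0]
corrected = distribution_mapping(observed, simulated_hist, [1.0, 3.0, 5.0])
```

In the toy example the corrected values are 2, 6, and 10, so the systematic underestimation is removed. Values outside the range of the historical simulation are clamped to the smallest or largest observed quantile, which is one of several possible ways to treat extremes.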
Neural network autoregression model (NNAR)
The ‘forecast’ package in R includes an implementation of the NNAR time series forecasting model through the ‘nnetar()’ function (Hyndman & Athanasopoulos 2018), and provides a user-friendly interface for building, training, and testing NNAR models. The nnetar() function takes a time series object as input and fits an NNAR model to the data. The function allows the user to specify several hyperparameters, including the number of lagged values to include as predictors and the number of neurons in the hidden layer. The default values for these hyperparameters are generally suitable for many time series datasets. Once the NNAR model is trained using the nnetar() function, the forecast() function can be used to generate forecasts of future values. The forecast() function takes the trained model and the number of time periods to forecast as inputs and produces a forecast object that can be further analysed and plotted.
The nnetar() function in R uses a feedforward neural network with a single hidden layer and a backpropagation algorithm to approximate the function f that maps the lagged values of the series to its next value. The number of neurons in the hidden layer is determined by the size parameter of the function. Future sample paths of this model can be generated iteratively by randomly producing a value of the error term εt, either by resampling from historical values or by drawing from a normal distribution. For prediction, the network is applied iteratively. To predict the next step, the available historical data are used. To forecast two steps ahead, the one-step prediction is used as an input together with the historical data. This procedure continues until all of the required predictions have been obtained. The weights and biases of the neural network are estimated using the training set, and the trained model is then used to make predictions on the test set. The observed streamflow data of Kidangoor station for the period 1987–2016 were used to train, validate, and test the model. The inputs were the lagged values of the target series for a specified lag length, and the data were randomly divided into training (70%), validation (15%), and test (15%) subsets. The results in the output layer were compared with the target. The calibrated NNAR model was validated for 2017 by comparing its predictions with the observed streamflow data from the stream gauge station in the study area. Once its performance was deemed satisfactory, the NNAR model was used to predict streamflow for the future period. The results were then compared with the future projections made by the SWAT model under RCP 4.5 and RCP 8.5 in a comparative analysis of the future predictions of both models.
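The lagged-input construction and the iterative forecasting procedure described above can be sketched as follows. The sketch is not the nnetar() implementation: the function names are illustrative, and `mean_of_lags` is a hypothetical stand-in for the trained network, used only so that the feedback loop can be run end to end.

```python
def make_lagged_pairs(series, n_lags):
    """Build (inputs, target) pairs in which the previous n_lags values predict the next one."""
    pairs = []
    for t in range(n_lags, len(series)):
        pairs.append((series[t - n_lags:t], series[t]))
    return pairs


def recursive_forecast(history, one_step_model, n_lags, horizon):
    """Iterate a one-step model, feeding each prediction back in as an input."""
    if len(history) < n_lags:
        raise ValueError("history must contain at least n_lags values")
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        forecasts.append(next_value)
        # Drop the oldest value and append the new prediction.
        window = window[1:] + [next_value]
    return forecasts


def mean_of_lags(window):
    """Hypothetical placeholder for a trained network: averages the lagged inputs."""
    return sum(window) / len(window)


# Illustrative series, not streamflow data from the study.
flow = [1.0, 2.0, 3.0, 4.0]
training_pairs = make_lagged_pairs(flow, n_lags=2)
three_day_forecast = recursive_forecast(flow, mean_of_lags, n_lags=2, horizon=3)
```

Because every forecast beyond the first step depends on earlier forecasts, errors can accumulate along the horizon, which is worth keeping in mind when a one-day-ahead model is iterated over long future periods.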
Model performance evaluation
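The four statistics used to evaluate the NNAR model (R, RMSE, MAE, and R2) can be sketched as follows. The function names and sample values are illustrative, and R2 is taken here as the square of the Pearson correlation coefficient, which is an assumption; a coefficient of determination computed from residuals would be an alternative definition.

```python
import math


def pearson_r(observed, simulated):
    """Pearson correlation coefficient between observed and simulated values."""
    mean_obs = sum(observed) / len(observed)
    mean_sim = sum(simulated) / len(simulated)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov / math.sqrt(var_obs * var_sim)


def rmse(observed, simulated):
    """Root mean square error, in the units of the data."""
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, simulated):
    """Mean absolute error, in the units of the data."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def r_squared(observed, simulated):
    """Squared Pearson correlation (assumed definition of R2 for this sketch)."""
    return pearson_r(observed, simulated) ** 2


# Illustrative values only, not data from the study.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 4.0, 6.0, 8.0]
summary = {"R": pearson_r(obs, sim), "RMSE": rmse(obs, sim), "MAE": mae(obs, sim)}
```

The example shows why correlation alone is not sufficient: the simulated values are perfectly correlated with the observations, yet RMSE and MAE reveal a large systematic error.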
RESULTS AND DISCUSSION
The study aimed to forecast the streamflow of the Meenachil River Basin by utilizing two models, a physically based model called SWAT and a data-driven model named NNAR. The performance of these models was assessed by comparing their results with observed streamflow data from 1987 to 2017. The models were then utilized to predict future streamflow in the study area for the years 2025–2086.
Streamflow simulation in the SWAT model
The SWAT model was calibrated in SWAT-CUP software using the SUFI-2 algorithm by adjusting the model parameters to improve its ability to replicate observed streamflow. The model was first run for the calibration period 1987–2004 using a set of initial parameter values. The model output was then compared to observed data, and the parameter values were adjusted to reduce the difference between the simulated and observed values. The Nash–Sutcliffe efficiency (NSE) was used as the objective function by the SUFI-2 algorithm for calibration. The global sensitivity analysis identified the 15 most sensitive parameters (CH_N2, CH_K2, SOL_BD, SOL_AWC, GW_DELAY, GWQMN, GW_REVAP, ALPHA_BF, CN2, REVAPMN, RECHARGE_DP, ESCO, EPCO, SOL_K, and ALPHA_BNK), which were used to calibrate the model. The description, value range, and fitted value of the sensitive parameters are given in Table 1.
Parameter | Description | Fitted value | Value range |
---|---|---|---|
ALPHA_BF | Baseflow alpha factor (days) | 0.875 | 0–1 |
CN2 | SCS runoff curve number | 96.42 | 35–100 |
GW_DELAY | Groundwater delay (days) | 212.50 | 0–500 |
GWQMN | Threshold depth of water in the shallow aquifer required for return flow (mm) | 2,875 | 0–5,000 |
GW_REVAP | Groundwater ‘revap’ coefficient | 0.168 | 0.02–0.2 |
REVAPMN | Threshold depth of water for revap or percolation to occur | 87.5 | 0–500 |
RECHARGE_DP | Deep aquifer percolation factor | 0.225 | 0–1 |
ESCO | Soil evaporation compensation factor | 0.475 | 0–1 |
EPCO | Plant uptake compensation factor | 0.325 | 0–1 |
CH_N2 | Manning's n value for the main channel | 0.0172 | 0–0.3 |
CH_K2 | Effective hydraulic conductivity in main channel alluvium | 487.5 | 0–250 |
SOL_BD | Moist bulk density (g/cm3) | 1.90 | 0.9–2.5 |
SOL_AWC | Available water capacity of the soil layer | 0.175 | 0–1 |
SOL_K | Saturated hydraulic conductivity | 1,450 | 0–2,000 |
ALPHA_BNK | Baseflow alpha factor for bank storage | 0.325 | 0–1 |
Source: Saranya & Vinish (2022).
Performance of the NNAR model
Before the nnetar() function was used for streamflow modelling, the stationarity of the time series was tested using the KPSS test, which indicated that the series is not trend stationary (p-value = 0.0231). The KPSS test is a statistical test used to determine whether a time series is stationary or non-stationary. Stationary time series are those whose statistical properties (such as the mean and variance) remain constant over time, while non-stationary time series are those whose properties change over time. The test evaluates the null hypothesis that the time series is stationary against the alternative hypothesis that it is non-stationary, so a p-value below 0.05 leads to rejection of stationarity. Further, the seasonality component was removed using the stl() function in the stats package. The preprocessed data were fed to the nnetar() function for forecasting. The forecasting of time series data using the NNAR model had two phases: the first determined the order of the autoregressive model, and the second trained the neural network on the training dataset using that order.
The nnetar() function from the forecast package combines the autoregressive structure of ARIMA-type models with the flexibility of an ANN. This function was used to train the model with various lags. The optimal model selected had a lag of 20 days. Models with higher lags were also tested and were found to be poorly trained compared with the optimal 20-lag model. Model training was monitored with a validation dataset alongside the training data, and the model was found to be efficiently trained. Finally, previously unseen data were fed to the trained model to confirm the consistency of its performance. The performance of the NNAR model during training, validation, and testing was evaluated using model evaluation measures widely used in hydrological studies. The performance of the model for lags 1–20 is shown in Table 2. Table 2 shows that the model became stable at a lag of 20, with the most consistently high correlation and R-square values across training, validation, and testing. The NNAR model thus produced the best prediction with a 20-day window of streamflow data. The streamflow predicted by the model was compared with the observed streamflow available from Kidangoor station.
Lag | Training R | Training RMSE | Training MAE | Training R2 | Validation R | Validation RMSE | Validation MAE | Validation R2 | Testing R | Testing RMSE | Testing MAE | Testing R2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.2187 | 13.32 | 0.225 | 0.031 | 0.44 | 13.65 | 0.281 | 0.002 | 0.84 | 21.99 | 0.2961 | 0.105 |
2 | 0.77 | 28.63 | 0.295 | 0.54 | 0.77 | 28.414 | 0.294 | 0.55 | 0.77 | 28.03 | 0.289 | 0.555 |
3 | 0.776 | 28.032 | 0.289 | 0.55 | 0.775 | 27.54 | 0.293 | 0.54 | 0.779 | 27.27 | 0.298 | 0.544 |
4 | 0.812 | 26.08 | 0.29 | 0.61 | 0.827 | 25.471 | 0.2961 | 0.65 | 0.74 | 33.49 | 0.3 | 0.466 |
5 | 0.773 | 27.87 | 0.291 | 0.54 | 0.77 | 27.548 | 0.293 | 0.542 | 0.789 | 26.88 | 0.299 | 0.56 |
6 | 0.82 | 25.47 | 0.29 | 0.65 | 0.758 | 33.06 | 0.29 | 0.48 | 0.753 | 33.07 | 0.297 | 0.478 |
7 | 0.746 | 32.33 | 0.289 | 0.466 | 0.74 | 31.72 | 0.283 | 0.467 | 0.753 | 31.094 | 0.277 | 0.4766 |
8 | 0.766 | 30.468 | 0.281 | 0.499 | 0.78 | 29.86 | 0.271 | 0.533 | 0.79 | 29.36 | 0.281 | 0.55 |
9 | 0.799 | 28.86 | 0.285 | 0.57 | 0.83 | 28.64 | 0.29 | 0.577 | 0.808 | 28.418 | 0.298 | 0.5856 |
10 | 0.8133 | 27.985 | 0.299 | 0.601 | 0.812 | 27.66 | 0.299 | 0.6 | 0.816 | 27.249 | 0.294 | 0.609 |
11 | 0.814 | 27.11 | 0.297 | 0.605 | 0.8134 | 27.015 | 0.364 | 0.6005 | 0.8186 | 26.67 | 0.3 | 0.6123 |
12 | 0.823 | 26.257 | 0.2969 | 0.623 | 0.828 | 25.92 | 0.301 | 0.63 | 0.834 | 25.57 | 0.294 | 0.649 |
13 | 0.845 | 24.93 | 0.3 | 0.67 | 0.85 | 24.47 | 0.299 | 0.687 | 0.85 | 24.04 | 0.299 | 0.701 |
14 | 0.874 | 22.75 | 0.33 | 0.73 | 0.877 | 22.51 | 0.348 | 0.744 | 0.879 | 22.28 | 0.361 | 0.745 |
15 | 0.8745 | 23.22 | 0.37 | 0.74 | 0.899 | 21.668 | 1.18 | 0.799 | 0.9375 | 19.59 | 0.55 | 0.866 |
16 | 0.935 | 20.48 | 0.49 | 0.85 | 0.891 | 28.8 | 0.46 | 0.71 | 0.868 | 27.74 | 0.52 | 0.642 |
17 | 0.9023 | 22.65 | 0.51 | 0.79 | 0.914 | 22.13 | 0.42 | 0.81 | 0.82 | 27.05 | 0.5 | 0.59 |
18 | 0.9099 | 23.457 | 0.517 | 0.798 | 0.895 | 26.95 | 0.454 | 0.761 | 0.9071 | 23.71 | 0.506 | 0.8 |
19 | 0.9006 | 19.84 | 0.487 | 0.79 | 0.9025 | 19.53 | 0.534 | 0.8 | 0.9035 | 19.38 | 0.5465 | 0.8024 |
20 | 0.907 | 18.23 | 0.563 | 0.81 | 0.906 | 18.53 | 0.565 | 0.9061 | 0.9063 | 18.76 | 0.559 | 0.808 |
When using 20-day windowed streamflow data, the NNAR model yielded the best prediction. For this reason, the statistical parameters that correspond to the 20-day lag are in bold.
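One way to make the notion of a consistently high correlation explicit is a max-min rule: choose the lag whose weakest R across training, validation, and testing is highest. The sketch below applies that rule to the R values of Table 2 for lags 15–20. The rule and the function name are assumptions introduced for illustration and are not necessarily the criterion applied in the study.

```python
# R values (training, validation, testing) copied from Table 2 for lags 15-20.
r_by_lag = {
    15: (0.8745, 0.899, 0.9375),
    16: (0.935, 0.891, 0.868),
    17: (0.9023, 0.914, 0.82),
    18: (0.9099, 0.895, 0.9071),
    19: (0.9006, 0.9025, 0.9035),
    20: (0.907, 0.906, 0.9063),
}


def most_consistent_lag(scores_by_lag):
    """Return the lag whose lowest score across the data splits is highest."""
    return max(scores_by_lag, key=lambda lag: min(scores_by_lag[lag]))


best_lag = most_consistent_lag(r_by_lag)
```

Under this rule the 20-day lag is selected, because its lowest R (0.906) exceeds the lowest R of every other lag listed, including lags 15 and 16, which reach higher values on a single split but are less consistent.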
Comparison of streamflow projected by the SWAT and NNAR models
The graph shows that the trend of the annual maximum streamflow of the NNAR model matches exactly with the SWAT simulated streamflow under RCP 8.5 scenario. However, the upper extreme of annual maximum streamflow simulated by SWAT under both scenarios was found to be higher than that of the NNAR model, and the lower extremes of annual maximum streamflow coincided exactly in the case of NNAR and SWAT under RCP 8.5.
The variation in annual average streamflow simulated by NNAR and SWAT under RCP 8.5 also follows the same trend. In contrast, a significant discrepancy was observed between the NNAR model simulation and the SWAT simulated streamflow under RCP 4.5. The findings indicate that future streamflow is more likely to follow the trend demonstrated by both the NNAR and SWAT models under the RCP 8.5 scenario. The similarity between the future streamflow trends predicted by the NNAR and SWAT models suggests that the SWAT model prediction under RCP 8.5 is reliable. The use of multiple models to simulate future streamflow can provide more robust results and a more comprehensive understanding of the expected changes in streamflow. The year-to-year fluctuations in annual streamflow provide useful information for decision-makers developing water management strategies to address upcoming drought and flood conditions.
When comparing the SWAT and NNAR models, the first aspect considered was the input data requirements. To achieve accurate and dependable streamflow prediction for a catchment, the SWAT model requires extensive input data, including climate data (precipitation, maximum and minimum temperature, relative humidity, wind speed, and solar radiation), topographic data such as the DEM of the study area, and soil and land use maps specific to the study area. Conversely, the NNAR model requires only previous streamflow records as its input. Preparing input data for the SWAT model is time-consuming, particularly when generating the land use map. In this study, the land use map was derived from Landsat images using ArcGIS software, and its projection for future periods was carried out using TerrSet software. In total, the SWAT workflow, from input data preparation and calibration through to simulation and future projection of streamflow, took approximately 2 months to complete. In contrast, the NNAR model, with its minimal data requirements, required only 2 days for prediction.
The choice between the SWAT model and the NNAR model for predicting streamflow depends on the specific requirements of the study, the availability of data, and the level of understanding and interpretability desired. SWAT is more appropriate when detailed physical processes need to be explicitly represented, while NNAR is valuable when accurate predictions are required without a detailed understanding of the underlying mechanisms.
CONCLUSIONS
Water resource management relies heavily on streamflow predictions. As water is a vital resource, decision-makers must be able to allocate water resources effectively, particularly during droughts, floods, or other extreme weather events. ML is especially useful for predicting streamflow in cases where traditional physical models face limitations due to a lack of data or computing resources. In this study, the effectiveness of the NNAR data-driven model for predicting streamflow was examined and compared to the SWAT hydrological model for the Meenachil River Basin in central Kerala, India. For the NNAR model, lagged values of the streamflow data were used as inputs, and the predicted value for the next time step was the output. The best model was obtained by using 20 lagged streamflow values as inputs. Both the SWAT and NNAR models demonstrated satisfactory performance when calibrated and validated using historical data. The NNAR model represented actual streamflow variations better than the SWAT model for the historical period. To enhance the reliability of future predictions, both climate and land use changes were taken into account in the SWAT model. When the future predictions of the two models were compared, a strong correlation (R = 0.86) was observed between the streamflow simulated by SWAT using climate data corresponding to RCP 8.5 and the results of the NNAR model. While it is not feasible to validate future predictions, the substantial correlation between the projections of two completely distinct models enhances the credibility of their forecasts and provides a sound basis for making informed decisions or conducting further analyses concerning streamflow.
The use of the NNAR model made streamflow prediction easier by reducing the required time and computational resources compared to the traditional method of using SWAT. Overall, the results provide evidence that ML models can be a useful tool in hydrological prediction, especially where process-based models are computationally expensive or data are limited. Further research is needed to refine and improve ML approaches for streamflow prediction, and to better understand the advantages and limitations of each modelling approach in different hydrological settings and under different climate scenarios.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.