Abstract
Hydrological simulations play a vital role in river discharge forecasting, which is essential in water resources engineering. The present study uses a semi-distributed model developed in HEC-HMS, an artificial neural network (ANN), and a hybrid model (HEC-HMS-ANN) to simulate daily discharge in the Kallada River basin, Kerala, India. The HEC-HMS model did not perform well with the available dataset, so a hybrid model was developed for simulating daily runoff by coupling the HEC-HMS output with the ANN. Model prediction accuracy is assessed using statistical metrics. Precipitation, lagged precipitation, and lagged discharge were used as input variables for the ANN model. The optimal number of lags was determined using partial autocorrelation. The hybrid model, which integrates the output from HEC-HMS into the ANN, performs better than the other models in simulating daily discharge and yearly peak discharge. The accuracy evaluation of yearly peak discharge values shows that simulation error is reduced by 66% and 26.5% in the hybrid model compared to the HEC-HMS and ANN models, respectively.
HIGHLIGHTS
Semi-distributed (HEC-HMS) and ANN models are developed for daily discharge simulation.
Developed a hybrid model (HEC-HMS-ANN) for accurate daily discharge simulation.
The accuracy of simulated peak discharge values of the models is further assessed to find the robustness of the developed models.
INTRODUCTION
The rainfall-runoff (RR) process is one of the most complex phenomena in hydrology due to the spatiotemporal variability of the variables affecting runoff and the non-linear behaviour of the process itself. Runoff prediction is essential for calculating the appropriate quantity of water storage in reservoirs, predicting the probability of floods, and designing hydraulic structures (Hamdan et al. 2021). Water resource modellers and hydrologists have developed many RR models to predict runoff, but many obstacles remain (He et al. 2011; Meresa 2019). The predictive capability of these models depends primarily on hydrologic parameters, historical data, and model structure (Meresa & Gatachew 2018). Among the different model types, physically based models are the most commonly used (Al-Juboori 2022). Physically based distributed hydrological modelling entails a basic overview of the underlying processes and physical systems that regulate the hydrological cycle, as well as water movement and parameters related to catchment characteristics (Hsu & Gupta 1995).
Based on the representation of catchment variability, rainfall-runoff models can be classified as lumped, semi-distributed and distributed (Gebre 2015; Sok & Oeurng 2016). Distributed models consider spatial variability at every grid cell, whereas lumped models do not consider spatial variability. In a semi-distributed model, the total catchment area is divided into sub-areas, which can be based on the slope of the catchment, soil, land use or a combination of these. Many studies have shown that the hydrological model HEC-HMS can be effectively implemented for event-based or continuous rainfall-runoff modelling (Chu & Steinman 2009; Majidi & Shahedi 2012; Halwatura & Najim 2013; Singh & Jain 2015; Hamdan et al. 2021; Shakarneh et al. 2022). Chu & Steinman (2009) used HEC-HMS to develop event-based and continuous rainfall-runoff models for the Mona Lake catchment in west Michigan, U.S. Majidi & Shahedi (2012) performed event-based rainfall-runoff modelling using the HEC-HMS hydrological model for the Abnama watershed in the south of Iran. HEC-GeoHMS is an ArcMap extension hydrological toolkit that enables the export of hydrological inputs into HEC-HMS directly (Fleming & Doan 2013). Physically based distributed and semi-distributed rainfall-runoff models are data intensive.
During the past decade, artificial intelligence (AI) techniques have been widely used to simulate rainfall-runoff processes because they can reproduce the non-linear nature of the hydrologic variables and are not data intensive. Pham et al. (2020) used a convolutional neural network (CNN) for rainfall-runoff modelling and compared the results with traditional models and a long short-term memory (LSTM) network. Reddy et al. (2021) applied the emotional artificial neural network (EANN) to monthly runoff prediction and compared it with the feed-forward neural network (FFNN) and multivariate adaptive regression spline (MARS). Mao et al. (2021) compared AI techniques (LSTM and ANN) for establishing the relation between rainfall and runoff and suggested that machine learning-based models showed improved performance over traditional models. However, noisy input data may significantly affect the performance of these models, and individual models may give different results for the same data sets. Recent studies have examined models based on a combination of regression models and AI techniques or on various combinations of AI techniques. Such hybrid models have been reported to give better results than individual models. Zhang et al. (2021a) applied a hybrid model (CEEMD-LSTM) to forecast precipitation data. Zhang et al. (2021b) predicted runoff using a nonlinear auto-regressive (NAR) model and improved its performance by coupling it with complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). All these AI models are data-driven, and one major drawback is that the physics of the problem is not represented. These models cannot reproduce the response of the catchment to changes in land use/land cover, soil type, etc.
Recent studies focus on developing hybrid models that combine AI techniques with physically based models. Gholami & Khaleghi (2021) compared the performance of ANN and HEC-HMS models in rainfall-runoff simulation. Young & Liu (2015) developed a hybrid model that uses ANN to improve the hourly runoff simulated by a physically based model (HEC-HMS). Zhihua et al. (2020) applied a coupled ANN (optimized SWAT_ANN) model for accurate runoff simulation. One disadvantage of physically based models and of hybrid models using AI techniques is that they fail to simulate the peak values in most cases. In this study, a physically based semi-distributed model developed using HEC-HMS is integrated with ANN to improve model performance in daily discharge simulation and yearly peak discharge simulation. This integration combines the properties of the traditional model (i.e., watershed characteristics) with the features of neural networks, which further enhances the performance of the hybrid model. The objectives are (1) to develop HEC-HMS and ANN rainfall-runoff models for the daily time step; (2) to develop a hybrid model (HEC-HMS-ANN); and (3) to assess the performance of the developed models in simulating the peak discharge values.
Study area
The Kallada river system is one of the major river systems in southern Kerala with respect to irrigation, power generation and fisheries. The river originates in the Kulathupuzha ranges of the Western Ghats at an elevation of 1,500 m between lat 8° 9″ and 9° 17″ N and long 77° 16″ and 76° 24″ E. The Kallada River is located in Kollam district, Kerala. After winding through the hills and plains in a westward direction, it drains into the Ashtamudi estuary at Uppudu, which is connected with the Arabian Sea at Neendakara. The total drainage area is calculated to be 1,699 km2 with a length of 121 km, as shown in Figure 1.
The river system is heavily exposed to anthropogenic disturbances such as excessive sand mining (Mohammed Irshad 2015) and exploitation of groundwater (Abhisheka & Binoj Kumar 2018). A stone masonry dam across the river, with a reservoir water-spread area of 23 km2, was commissioned in 1987. The dam was constructed at Parappar near Thenmala, very close to the confluence of the three tributaries Kulathuppuzha, Chendurni, and Kalthuruthy. A masonry pick-up weir with a length of 120.69 m was also constructed at Ottakkal, 5 km downstream. The weir diverts the impounded water into left and right bank canals, which are exclusively irrigation canals.
Data
The 0.25° × 0.25° gridded daily rainfall data for 1986–2010 were collected from the India Meteorological Department, Pune, and then processed and extracted. There are two grid points in the watershed, and the Thiessen method is used to calculate the average rainfall. The Thiessen weights assigned to the grid points G1 and G2 are 0.63 and 0.37, respectively. The two grid points are located at 9°N, 77°E and 9°N, 76°45′E, respectively, as shown in Figure 1. The daily discharge data from 1986 to 2010 were acquired from the Central Water Commission (CWC) for the Pattazhy gauging site. The descriptive statistics of rainfall and discharge are shown in Table 1. The Kallada River watershed is delineated using the shuttle radar topography mission-digital elevation model (SRTM-DEM) and is shown in Figure 1.
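The Thiessen averaging described above can be illustrated with a minimal sketch. The weights are the ones reported in the text; the function name `areal_rainfall` and the rainfall values in the example are hypothetical and serve only to show the arithmetic.

```python
# Thiessen weights reported in the text for grid points G1 and G2.
THIESSEN_WEIGHTS = {"G1": 0.63, "G2": 0.37}


def areal_rainfall(rain_mm, weights=THIESSEN_WEIGHTS):
    """Return the Thiessen-weighted basin-average rainfall (mm) for one day.

    rain_mm maps each grid point name to its daily rainfall in mm.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * rain_mm[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical daily rainfall values (mm), for illustration only.
example_day = {"G1": 10.0, "G2": 20.0}
basin_mean = areal_rainfall(example_day)
print(f"Basin-average rainfall: {basin_mean:.2f} mm")
```

Applying the same function to every day of the two gridded series would give the basin-average rainfall series used as model input.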
Daily rainfall and discharge dataset descriptive statistics for the period of 1986–2010
Dataset | Mean | Std. dev | Minimum | Maximum |
---|---|---|---|---|
Precipitation (G1) (mm) | 5.28 | 13.11 | 0.00 | 326.2 |
Precipitation (G2) (mm) | 6.50 | 14.66 | 0.00 | 268.60 |
Discharge (m3/sec) | 44.60 | 66.14 | 0.00 | 2,367.00 |
The Food and Agriculture Organization (FAO) soil map and the land use/land cover (LULC) map (1995) of ORNL DAAC (https://daac.ornl.gov/get_data) were used to develop the curve number (CN) for the basin. The soil map and LULC map are shown in Figures 2 and 3. The study area has four soil textures, i.e., clay, loam, gravelly clay and gravelly loam. There are nine land use classes, i.e., plantation, cropland, vegetation, grassland, shrubland, built-up land, water bodies, mixed forest and dense forest. The majority of the basin area is occupied by plantation.
METHODOLOGY
The semi-distributed (HEC-HMS) model and ANN were developed for daily discharge simulation. For the HEC-HMS model, the datasets used were gridded rainfall, observed discharge, FAO soil map and LULC map. A hybrid model is developed by taking the output from HEC-HMS as an additional input into ANN. The flow chart of the methodology is shown in Figure 4.
HEC-HMS
In 1998, the US Army Corps of Engineers (USACE) created a public-domain, user-friendly version of the HEC-1 system, referred to as HEC-HMS. HEC-HMS is commonly used for hydrological simulations such as flood frequency analysis, flood alarm system design, stream flow modelling and reservoir spillway capacity evaluation. HEC-HMS offers a wide range of methods and modules for estimating runoff from precipitation, for example, loss models, transform models, routing methods and base flow estimation methods.
The HEC-GeoHMS extension of ArcGIS is used as a pre-processing tool for HEC-HMS model input data preparation. Hydrologic inputs for HEC-HMS require many data sets, which can be imported from HEC-GeoHMS.
ANN
An ANN can link inputs to outputs without specifying the physical mechanism. ANNs are extensively used in hydrology, especially in rainfall-runoff modelling and water quality modelling (ASCE Committee 2000). An ANN is a network of nodes with connection weights, structured in multiple layers that are interconnected by neurons (nodes). The first layer is the input layer, the last is the output layer, and the layers in between are the hidden layers. A typical feed-forward scheme with three layers is shown in Figure 5. The connection weights change as the network is trained to learn a specific series.
Depending on the layer in which a node is positioned, the inputs to that node arrive either from system variables or from the outputs of other nodes. These inputs form the input vector I = (I1, I2, I3). The weights leading to the node are Si = (wi1, wi2, wi3), where Si represents the weights connecting the ith node of the hidden layer to the input variables.
The sigmoid function produces an S-shaped curve, which gives a non-linear response and allows the network to map non-linear processes. The connections between all these individual nodes form the artificial neural network. The Tansig and Purelin activation functions are adopted for the input and output layers, respectively.
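The node computation described above can be sketched as a single forward pass. This is an illustration under stated assumptions, not the trained network of the study: tansig is taken as the hyperbolic tangent sigmoid, which is applied here to the weighted inputs arriving at the hidden nodes, purelin is the identity applied at the output node, and the bias terms, the function name `forward` and all numerical weights are hypothetical.

```python
import math


def tansig(x):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2x)) - 1, equal to tanh(x)."""
    return math.tanh(x)


def purelin(x):
    """Linear (identity) transfer function."""
    return x


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a three-layer feed-forward network.

    hidden_weights[i] plays the role of S_i = (w_i1, w_i2, w_i3) in the text.
    The bias terms are an assumption; the text does not mention them.
    """
    hidden = [
        tansig(sum(w * value for w, value in zip(s_i, inputs)) + bias)
        for s_i, bias in zip(hidden_weights, hidden_biases)
    ]
    return purelin(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical inputs I = (I1, I2, I3) and weights, for illustration only.
I = [0.2, 0.1, 0.4]
S = [[0.5, -0.3, 0.8], [0.1, 0.4, -0.2]]
output = forward(I, S, [0.0, 0.0], [1.0, -1.0], 0.1)
print(output)
```

Training would adjust the entries of `S`, the output weights and the biases so that the network output approaches the observed discharge.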
HYBRID MODEL (HEC-HMS-ANN)
In the present study, the discharge simulated by the semi-distributed HEC-HMS model and its lags are added as extra inputs to the ANN model to improve accuracy. The required lag is determined using partial autocorrelation. Because the semi-distributed model is developed considering the variability in soil, land use, etc., the hybrid model can improve on the accuracy of the standalone models.
MODEL PERFORMANCE EVALUATION
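The metrics named in the results (NSE, RMSE and PBIAS) can be sketched as follows. The formulas are the commonly used definitions and are an assumption here, since the defining equations are not reproduced in this text; in particular, the sign convention for PBIAS (observed minus simulated) varies between authors. The function names and the example values are hypothetical.

```python
import math


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((O - S)^2) / sum((O - mean(O))^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def rmse(observed, simulated):
    """Root mean square error, in the units of the input series."""
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))


def pbias(observed, simulated):
    """Percent bias: 100 * sum(O - S) / sum(O) (assumed sign convention)."""
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


# Hypothetical discharge values (m3/sec), for illustration only.
obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 37.0]
print(nse(obs, sim), rmse(obs, sim), pbias(obs, sim))
```

With these definitions, a perfect simulation gives NSE = 1, RMSE = 0 and PBIAS = 0, and a simulation no better than the observed mean gives NSE = 0.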


RESULTS & DISCUSSION
HEC-HMS
For discharge simulation, a semi-distributed model is developed using HEC-HMS. The data are preprocessed using the HEC-GeoHMS extension in ArcMap and imported into HEC-HMS. The stream network and sub-basins are delineated in ArcGIS from the SRTM-DEM, and the physical characteristics of the streams and sub-basins are extracted. The physical characteristics of a stream include its length, upstream and downstream elevations, and slope; those of a sub-basin include the longest flow path, centroidal flow length and slope. All this information is extracted from the terrain data. The extraction yields the catchment area, slope and flow length of each sub-basin, which are used to calculate the time of concentration. The cross-section profiles of the reaches are extracted from the SRTM DEM using ArcMap. The details of sub-basins, reaches and junctions in the study area are shown in Figure 6.
In this study, the Soil Conservation Service (SCS) CN method is used to compute the infiltration losses. The CN is computed from the soil type and LULC class. The spatial map of the CN grid is shown in Figure 7; CN values in the basin range from 30 to 100. The SCS unit hydrograph (UH) method is used to transform excess precipitation into runoff. Of the 25 years of discharge data, 60% (1986–2000) is used for calibration and 40% (2001–2010) for validation (Wang et al. 2016). The model parameters are calibrated by trial and error. The simulated results for calibration and validation are shown in Figures 8 and 9, respectively. The efficiency of the model is assessed using NSE, RMSE and R2, which are 0.39, 0.8 m3/sec and 0.40 for calibration and 0.16, 0.9 m3/sec and 0.30 for validation, respectively. The performance of the semi-distributed model is not satisfactory.
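As an illustration of the SCS CN loss method mentioned above, the sketch below computes direct runoff depth from a rainfall depth and a curve number using the standard SCS relations in metric units. The initial abstraction ratio of 0.2 is the conventional default and is an assumption here, since the text does not state the value used; the function name and the example numbers are hypothetical.

```python
def scs_runoff_mm(rain_mm, curve_number, ia_ratio=0.2):
    """Direct runoff depth (mm) from the SCS curve number method.

    S  = 25400 / CN - 254        potential maximum retention (mm)
    Ia = ia_ratio * S            initial abstraction (0.2 is an assumed default)
    Q  = (P - Ia)^2 / (P - Ia + S) when P > Ia, otherwise 0
    """
    if not 0 < curve_number <= 100:
        raise ValueError("curve_number must be in the range (0, 100]")
    retention = 25400.0 / curve_number - 254.0
    initial_abstraction = ia_ratio * retention
    if rain_mm <= initial_abstraction:
        return 0.0
    excess = rain_mm - initial_abstraction
    return excess ** 2 / (excess + retention)


# Hypothetical example: 50 mm of rain on a cell with CN = 80.
runoff = scs_runoff_mm(50.0, 80.0)
print(f"Runoff depth: {runoff:.2f} mm")
```

A cell with CN = 100, the upper end of the range reported for the basin, has zero retention, so all rainfall becomes runoff; lower CN values produce less runoff for the same rainfall.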
Calibration of HEC-HMS model during the period 1986–2000 (a) time series plot and (b) scatter plot.
Validation of HEC-HMS model for the period 2001–2010 (a) time series plot and (b) scatter plot.
ANN
The partial autocorrelation function (PACF) is used to determine the required number of lags for the ANN. The PACFs of precipitation (P) and observed discharge (Q) are shown in Figure 10. From Figure 10, it is clear that the lag-1 series shows a better correlation with the current time step than the other lag series.
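The PACF used for lag selection can be sketched with the Durbin-Levinson recursion applied to sample autocorrelations. This is one common way to estimate the PACF and is an assumption here; the text does not say which estimator or software was used. The function names and the short example series are hypothetical.

```python
def autocorr(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 0 .. max_lag (Durbin-Levinson)."""
    r = autocorr(series, max_lag)
    partial = [1.0]
    phi_prev = []  # AR coefficients of order k - 1
    for k in range(1, max_lag + 1):
        numerator = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        denominator = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = numerator / denominator
        # Update the lower-order coefficients, then append the new one.
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        partial.append(phi_kk)
    return partial


# Hypothetical short series, for illustration only.
example = [2.0, 4.0, 3.0, 7.0, 5.0, 6.0, 9.0, 8.0]
print(pacf(example, 3))
```

Lags whose partial autocorrelation stands out from the rest are the candidates for lagged inputs; in the study this selection led to the lag-1 series.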
In the neural network modelling, out of 9,128 data points (1986–2010), 70% of the data (6,390) is used for training and 30% (2,738) is used for testing (Reddy et al. 2021). The ANN model uses precipitation (Pt), its lag-1 (Pt-1) and lag-1 of observed discharge (Qt-1) to simulate the discharge of the current time step (Qt). The correlation of the predictor variables with the target variable is tabulated in Table 2. The ANN model is trained and tested with the selected input variables. The model's efficiency is evaluated with the statistical parameters NSE, RMSE and PBIAS for the training and testing periods, and the results are tabulated in Table 3. The time series plot for the training and testing periods of the ANN model and the scatter plots are shown in Figure 11.
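The 70/30 partition reported above can be checked with a short sketch. Splitting the series in chronological order (first 70% for training) is an assumption, since the text does not state how the records were assigned; the function names are hypothetical.

```python
def split_counts(n_total, train_fraction=0.70):
    """Return (n_train, n_test) for a given training fraction."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train


def chronological_split(records, train_fraction=0.70):
    """Split a time-ordered list into training and testing parts (assumed order)."""
    n_train, _ = split_counts(len(records), train_fraction)
    return records[:n_train], records[n_train:]


n_train, n_test = split_counts(9128)
print(n_train, n_test)
```

For 9,128 records this reproduces the 6,390 training and 2,738 testing points given in the text.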
Correlation of predictor variables with the target variable
Phase | Target | Pt | Pt-1 | Qt-1 |
---|---|---|---|---|
Training | Qt | 0.44 | 0.49 | 0.88 |
Testing | Qt | 0.48 | 0.43 | 0.84 |
Training and testing results of ANN model for simulation of discharge (Qt)
Input variable combination | Type | Phase | Neurons | PBIAS (%) | NSE | RMSE (m3/sec) |
---|---|---|---|---|---|---|
Pt, Pt-1 and Qt-1 | ANN | Training | 5 | −4.65 | 0.90 | 0.0105 |
Pt, Pt-1 and Qt-1 | ANN | Testing | 5 | −9.99 | 0.75 | 0.0088 |
ANN model (a) time series plot training and testing period, (b) scatter plot for training and (c) scatter plot for testing.
HYBRID MODEL
The HEC-HMS model did not perform satisfactorily in simulating the discharge of the Kallada river basin. As a result, a model was developed by merging the output of the HEC-HMS model with the ANN model to increase the model's accuracy. For determining the required number of lags for the hybrid model, PACF is used. The PACF plot of HEC-HMS simulated discharge (QHEC) is shown in Figure 12.
In the hybrid model, the simulated discharge of HEC-HMS (QHEC(t)) and its lag-1 (QHEC(t-1)), along with the other inputs (Pt, Pt-1, Qt-1), are used to simulate Qt. The correlation of the predictor variables with the target variable is tabulated in Table 4. The model's efficiency is evaluated with the statistical parameters NSE, RMSE and PBIAS for the training and testing periods, and the results are tabulated in Table 5.
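The input arrangement of the hybrid model can be sketched as follows. The column order follows the inputs listed above; the function name `build_hybrid_inputs` and the example values are hypothetical, and any scaling of the inputs before training is not shown.

```python
def build_hybrid_inputs(p, q_obs, q_hec):
    """Assemble lag-1 predictors and targets for the hybrid model.

    Each feature row is [P(t), P(t-1), Q(t-1), QHEC(t), QHEC(t-1)] and the
    matching target is the observed discharge Q(t). The first time step is
    dropped because it has no lag-1 values.
    """
    if not (len(p) == len(q_obs) == len(q_hec)):
        raise ValueError("all input series must have the same length")
    features, targets = [], []
    for t in range(1, len(p)):
        features.append([p[t], p[t - 1], q_obs[t - 1], q_hec[t], q_hec[t - 1]])
        targets.append(q_obs[t])
    return features, targets


# Hypothetical three-day series, for illustration only.
rain = [1.0, 2.0, 3.0]
observed_q = [10.0, 20.0, 30.0]
hec_hms_q = [11.0, 19.0, 29.0]
X, y = build_hybrid_inputs(rain, observed_q, hec_hms_q)
print(X, y)
```

Dropping the last two columns of each row gives the input arrangement of the standalone ANN model (Pt, Pt-1, Qt-1).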
Correlation of predictor variables with the target variable
Phase | Target | Pt | Pt-1 | QHEC(t) | QHEC(t-1) | Qt-1 |
---|---|---|---|---|---|---|
Training | Qt | 0.44 | 0.49 | 0.64 | 0.64 | 0.88 |
Testing | Qt | 0.48 | 0.43 | 0.59 | 0.53 | 0.84 |
Training and testing results of hybrid model for simulation of discharge (Qt)
Input variable combination | Type | Phase | Neurons | PBIAS (%) | NSE | RMSE (m3/sec) |
---|---|---|---|---|---|---|
Pt, Pt-1, Qt-1, QHEC(t) and QHEC(t-1) | Hybrid | Training | 5 | 0.31 | 0.94 | 0.0084 |
Pt, Pt-1, Qt-1, QHEC(t) and QHEC(t-1) | Hybrid | Testing | 5 | −4.32 | 0.77 | 0.0082 |
The time series plot of simulated discharge for the training and testing periods and the scatter plots are shown in Figure 13. From Tables 3 and 5 and Figures 11 and 13, it is clear that the hybrid model performs better than the individual ANN and HEC-HMS models. The discharge simulated by the HEC-HMS model reflects the characteristics of the watershed, and feeding it into the ANN enhances the performance of the hybrid model.
Hybrid model (a) time series plot training and testing period, (b) scatter plot for training and (c) scatter plot for testing.
ACCURACY ASSESSMENT OF PEAK VALUES
The simulated yearly peak discharge values of all three models (i.e., HEC-HMS, ANN and hybrid) are compared with the observed peak discharge values, and the performance of the models is tabulated in Table 6. From Table 6, it is evident that integrating the traditional semi-distributed model with the ANN model has improved the accuracy of the simulated yearly peak daily discharge.
Yearly peak discharge values accuracy assessment
Model | PBIAS (%) | NSE | RMSE (m3/sec) |
---|---|---|---|
HEC-HMS | 47.79 | 0.26 | 373.96 |
ANN | 24.30 | 0.85 | 169.10 |
Hybrid | 16.73 | 0.92 | 124.37 |
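The error reductions quoted in the abstract (66% and 26.5%) appear to correspond to the relative reduction in the yearly peak RMSE values tabulated above. The sketch below reproduces that arithmetic; reading the quoted percentages as RMSE reductions is an interpretation, and the function name is hypothetical.

```python
# Yearly peak discharge RMSE values (m3/sec) from the table above.
PEAK_RMSE = {"HEC-HMS": 373.96, "ANN": 169.10, "Hybrid": 124.37}


def relative_reduction(reference, improved):
    """Percentage reduction of `improved` relative to `reference`."""
    return 100.0 * (reference - improved) / reference


vs_hec_hms = relative_reduction(PEAK_RMSE["HEC-HMS"], PEAK_RMSE["Hybrid"])
vs_ann = relative_reduction(PEAK_RMSE["ANN"], PEAK_RMSE["Hybrid"])
print(f"Hybrid vs HEC-HMS: {vs_hec_hms:.1f}%")
print(f"Hybrid vs ANN: {vs_ann:.1f}%")
```

Under this reading, the reductions come out at roughly 66.7% and 26.5%, in line with the figures given in the abstract.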
CONCLUSIONS
In the current study, discharge in the Kallada river basin is simulated using a semi-distributed model (HEC-HMS), an ANN, and a hybrid model. The HEC-HMS model is developed using soil, land use and climatic parameters. This model was calibrated for 1986–2000 and validated for 2001–2010. The model's efficiency is measured using statistical metrics such as NSE, RMSE and R2. For the ANN model, the chosen input variables for simulating the discharge of the current time step Qt are Pt, Pt-1 and Qt-1. For the hybrid model, the simulated discharge of HEC-HMS (QHEC) and its lag-1 series are fed as extra input variables into the ANN model.
The results of the study are summarised as follows:
1. The NSE, RMSE and R2 values for the HEC-HMS model are 0.39, 0.8 m3/sec and 0.40 for the calibration period, and 0.16, 0.9 m3/sec and 0.30 for the validation period, respectively. The results of the HEC-HMS model are not satisfactory.
2. The hybrid model (HEC-HMS-ANN) performs better than the ANN and semi-distributed models in simulating daily discharge and yearly peak discharge.
3. The developed hybrid model can be used to assess the impact of LULC changes on discharge in the basin.
ACKNOWLEDGEMENTS
The authors would like to acknowledge the anonymous reviewers, associate editor and editor for their insightful comments and suggestions.
AUTHOR CONTRIBUTIONS
Beeram Satya Narayana Reddy: Formal assessment, conceptualization, data collection, framework of methodology, model development, software, initial draft writing, review and editing.
S K Pramada: Formal assessment, conceptualization, methodology, resources, supervision and validation, review and editing
STATEMENTS AND DECLARATIONS
The authors have no competing interests to declare that are relevant to the content of this article.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories. Precipitation data (https://www.imdpune.gov.in/Clim_Pred_LRF_New/Grided_Data_Download.html); Streamflow data (https://indiawris.gov.in/wris/#/RiverMonitoring); FAO soil map (https://data.review.fao.org/map/catalog/srv/api/records/446ed430-8383-11db-b9b2-000d939bc5d8); LULC map (https://daac.ornl.gov/get_data).
CONFLICT OF INTEREST
The authors declare there is no conflict.