ABSTRACT
Developing accurate flood forecasting models is necessary for flood control and water resources management in the Mahanadi River Basin. In this study, a convolutional neural network (CNN) is integrated with random forest (RF) and support vector regression (SVR) to build hybrid models (CNN–RF and CNN–SVR), where the CNN is used as a feature extraction technique while RF and SVR are used as forecasting models. These hybrid models are compared with RF, SVR, and an artificial neural network (ANN). The influence of the training–testing data division on the performance of the hybrid models is tested. Hyperparameter sensitivity analyses are performed on the forecasting models to select the best hyperparameter values and to exclude the insensitive hyperparameters. Two hydrological stations (Kantamal and Kesinga) are selected as case studies. Results indicate that the CNN–RF model performs better than the other models for both stations. In addition, the CNN is found to improve the accuracy of the RF and SVR models for flood forecasting. The results of the training–testing division show that both hybrid models perform best with a 50–50% data division. Validation results show that neither model is overfitting or underfitting. The results demonstrate that the CNN–RF model can be used as a potential model for flood forecasting in river basins.
HIGHLIGHTS
CNN-based hybrid machine learning models are developed for flood forecasting and are compared with other ML models.
The impact of the training–testing data division is analyzed and the division giving the best model performance is selected.
Four input combinations are tested to identify the best inputs for flood forecasting.
Sensitivity analysis of the developed models is performed to identify the sensitive, insensitive and most sensitive model parameters.
ABBREVIATIONS
- AI
artificial intelligence
- ALLSSA
AntiLeakage Least-Squares Spectral Analysis
- ANFIS
adaptive neuro-fuzzy inference system
- ANN
artificial neural network
- ARIMA
Auto-Regressive Integrated Moving Average
- ARIMAX
Auto-Regressive Integrated Moving Average with eXplanatory variable
- C
regularization parameter
- CNN
convolutional neural network
- DEM
digital elevation map
- DL
deep learning
- L
water level
- LSTM
long-short-term memory
- MAE
mean absolute error
- MCM
million cubic meters
- ML
machine learning
- NASA
National Aeronautics and Space Administration
- NSE
Nash–Sutcliffe Efficiency
- PCS
projected coordinate system
- Q
discharge
- R
rainfall
- RBF
radial basis function
- R2
coefficient of determination
- RF
Random Forest
- RMSE
root mean square error
- SARIMAX
Seasonal Auto-Regressive Integrated Moving Average with eXogenous factor
- SRTM
Shuttle Radar Topography Mission
- SVM
support vector machine
- SVR
support vector regression
- WRIS
Water Resources Information System
INTRODUCTION
Floods are among the most frequent natural hazards and cause serious damage to the environment, human life, and livelihoods (Khosravi et al. 2018). The damage caused by disasters like flooding is incalculable. Because of this tremendous and irreversible damage, it is extremely important to forecast floods in order to reduce flood risk and properly plan and manage water resources systems (Kant et al. 2013; Sarker 2023; Sarker et al. 2023). Flow in rivers is nonlinear and is affected by catchment characteristics, rainfall and climate conditions (Le et al. 2019). Therefore, more accurate and reliable models are required for flood forecasting. There are two types of forecasting models: physically based hydrological models and data-driven models. Artificial intelligence (AI)-based machine learning models are data-driven models that create statistical relationships between input and output. These machine learning models can capture the nonlinearity of time series and have driven considerable research progress, especially in hydrology (Mosavi et al. 2018). AI-based machine learning models have become popular in hydrology and water resources in recent years as they can analyse large-scale data and long time series (Wang et al. 2009). Several machine learning models such as artificial neural networks (ANNs), support vector machines (SVMs), adaptive neuro-fuzzy inference systems, decision trees and other regression-based models have been used for time series modelling in hydrology and water resources systems. The ANN is inspired by the brain and nervous system and has been used in hydrology since the 1990s. The ANN model has been used in conjunction with a flash flood routing model for forecasting longitudinal stage profiles in rivers during flash floods (Hsu et al. 2010). Rezaeian-Zadeh et al. (2013) used ANN for monthly flood flow forecasting in arid and semi-arid regions.
Neural networks based on bootstrap techniques have been used to quantify the parametric uncertainty involved in forecasting (Tiwari & Chatterjee 2010). The ANN model has worked better than the ARMA model for short-term rainfall prediction for real-time flood forecasting (Toth et al. 2000). A coupled wavelet-transformed neural network is developed for flood forecasting in non-perennial rivers in semi-arid watersheds (Adamowski & Sun 2010). The ANN has been useful for forecasting and analysis in hydrology and water resources (Campolo et al. 1999; Chao et al. 2008).
Because of SVM's capability for nonlinear regression and time series forecasting, it has been widely used in hydrology and water resources engineering. Lin et al. (2006) used SVM for long-term discharge forecasting and found that it had greater potential for predicting long-term discharge than ANN and auto-regressive moving average models. SVM and random forest (RF) have also been integrated with Google Earth Engine and utilized for lake and river monitoring and rainfall forecasting (Yu et al. 2017; Dehkordi et al. 2022). SVM and ANN have been compared for flood forecasting with an evolutionary strategy for parameter optimization, and it was concluded that SVM performs better than ANN (Bafitlhile & Li 2019). Nayak & Ghosh (2013) predicted rainfall events using SVM classifiers and concluded that the model predicted all the events in advance and was better in terms of false alarms and prediction. The performances of the Auto-Regressive Integrated Moving Average (ARIMA), Auto-Regressive Integrated Moving Average with eXplanatory variable (ARIMAX), and Seasonal Auto-Regressive Integrated Moving Average with eXogenous factor (SARIMAX) models have been compared with the AntiLeakage Least-Squares Spectral Analysis (ALLSSA) for forecasting in Italian regions, and it was shown that ALLSSA has great potential for forecasting as it considers the seasonal and trend components (Ghaderpour et al. 2023). Gizaw & Gan (2016) analysed the performance of SVM for flood frequency analysis under historical and future climates and stated that SVM performed well based on goodness-of-fit. Support vector regression (SVR) has been applied for regional flood frequency analysis in arid and semi-arid regions and compared with ANN, the Adaptive Neuro-Fuzzy Inference System (ANFIS), and NLR (Sharifi Garmdareh et al. 2018); the results indicated that SVR and ANFIS give better results than ANN and NLR for predicting peak flood discharge.
A genetic algorithm-based SVM model has been developed for predicting monthly reservoir storage (Su et al. 2014). Wang et al. (2013) employed SVM in conjunction with particle swarm optimization to improve rainfall-runoff modelling. RF is an ensemble of classification and regression trees that overcomes the overfitting issues of a single decision tree (Breiman et al. 1984). RF is a popular model due to its prediction capacity and processing speed (Mosavi et al. 2018) and has been used for the simulation of large-scale discharge (Schoppa et al. 2020). Tang et al. (2020) used a hybrid RF based on flood hydrograph generalization for flood forecasting. Wang et al. (2015) developed a flood hazard risk assessment model based on RF. Muñoz et al. (2018) developed a stepwise methodology for flash flood forecasting based on RF. Ali et al. (2020) used a hybridized RF for monthly rainfall forecasting. The RF model has shown the highest accuracy among machine learning models for the classification of snow cover area variation (Gogineni & Chintalacheruvu 2023). RF has also been compared with SVM for real-time radar-derived rainfall forecasting, and it was concluded that RF outperforms SVM (Yu et al. 2017).
Ding et al. (2019, 2020), Le et al. (2019), Roy et al. (2022) and Yan et al. (2021) used long short-term memory (LSTM) as a prediction model for flood forecasting. Cai & Yu (2022) used a hybrid recurrent neural network (RNN) model for flood forecasting in urban reservoirs. Several other studies state that deep learning (DL) techniques have great potential as prediction models for flood forecasting. Fu et al. (2019) used a hybrid model based on a convolutional neural network (CNN) and LSTM for weather prediction. Shakir et al. (2022) applied a CNN-based LSTM model for simulating groundwater levels. These studies show that hybrid DL models can improve results. Tao et al. (2020) highlighted the need to extract highly correlated features when developing hybrid ML models, and DL has the advantage of utilizing hidden layers for feature extraction. DL models have great feature extraction capability, and the CNN has been used as a feature extraction technique in various studies, mostly in image classification and segmentation. These techniques improve the performance of models, especially when multiple input variables are used. However, little work has combined DL models as feature extraction techniques with ML models as prediction models for flood forecasting.
The purpose of this study is to develop CNN-based hybrid ML models (CNN–RF and CNN–SVR) for flood forecasting. The study examines the impact of the training–testing data division on these hybrid models' performance. Multiple input variables and different combinations of these variables are tested to identify the best input combinations for the hybrid models. Sensitivity analysis of the hybrid models is also included. The performances of the models are evaluated using the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE) and Nash–Sutcliffe efficiency (NSE).
STUDY AREA
Figure 1 illustrates the cartographic representation of the study area, derived from digital elevation model (DEM) data. The DEM is a geospatial dataset describing the elevation characteristics of the investigated region (Gao et al. 2022). The NASA SRTM Plus DEM is imported into ArcMap 10.3, and the river's sub-watershed is extracted automatically using the hydrology tool from the Spatial Analyst Toolbox in ArcMap 10.3. The downloaded DEM is then rectified and corrected: sinks in the DEM are filled using the 'fill' option of the hydrology tool, flow direction and flow accumulation are derived, and the stream network is extracted (Sarker 2021). Subsequently, sub-watersheds are delineated and stream order is calculated. The extracted stream network and delineated sub-watersheds are then reprojected to the projected coordinate system (PCS) WGS 1984, UTM Zone 44N.
Dataset
Daily variation in discharge for (a) Kantamal station and (b) Kesinga station.
Statistical description of discharge for Kantamal and Kesinga stations
Statistic | Kantamal (m3/s) | Kesinga (m3/s)
---|---|---
Mean | 382.75 | 261.44
Std | 1,007.94 | 692.02
Min | 0.25 | 0.24
Max | 20,000 | 21,192.80
Q50 | 114.96 | 94.15
Q75 | 283.00 | 216.88
Q50 and Q75 are the 50th and 75th percentile, respectively.
Decomposed graph of discharge data for (a) Kantamal station and (b) Kesinga station.
METHODOLOGY
In this study, two CNN-based hybrid models (CNN–RF and CNN–SVR) are applied for flood forecasting and compared with standalone RF, SVR and ANN models. In these hybrid models, the CNN is used as a feature extraction technique while RF and SVR are used for the prediction of discharge. Brief descriptions of these models follow:
Support vector machine
Random Forest
Artificial neural network
Convolutional neural network
Model evaluation methods
Model performance is evaluated by comparing the model output with observed data using evaluation metrics that determine the efficiency of the model. In this study, the root mean square error (RMSE), coefficient of determination (R2), mean absolute error (MAE) and Nash–Sutcliffe efficiency (NSE) are used for model evaluation. The coefficient of determination (R2) measures the correlation between observed and predicted values (Adamowski & Sun 2010). The NSE measures how well predictions match observations relative to the observed mean. The MAE measures goodness-of-fit for moderate flows, while the RMSE is more sensitive to high flow values (Kisi 2010; Ghaderpour et al. 2023).
Root mean square error
Coefficient of determination (R2)
Mean absolute error
Nash–Sutcliffe Efficiency
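The four metrics can be computed directly from the observed and forecasted series; a minimal NumPy sketch (the function names are ours, not from the study):

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error: penalizes large errors, so it emphasizes high flows."""
    return np.sqrt(np.mean((obs - pred) ** 2))

def mae(obs, pred):
    """Mean absolute error: goodness-of-fit for moderate flows."""
    return np.mean(np.abs(obs - pred))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect; 0 means no better than the observed mean."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

def r2(obs, pred):
    """Coefficient of determination as the squared Pearson correlation."""
    return np.corrcoef(obs, pred)[0, 1] ** 2
```

Unlike R2, which only measures correlation, NSE also penalizes bias, which is why both are reported throughout the tables below.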
Model development
The scaled data are processed by the CNN model for feature extraction, and the output of the CNN is taken as input to the machine learning models for forecasting. The models are tested with various training–testing data divisions, and then with four input combinations to find the combination giving the best performance. Sensitivity analysis is performed on the CNN-based models to identify the effective and non-effective hyperparameters and to obtain the optimum values of the effective ones. Non-effective hyperparameters are eliminated, and the effective hyperparameters are set to their optimum values for the best performance of the models. After the model parameters are decided, the models are trained on the training data and their performance is tested on the test dataset. R2, RMSE, MAE and NSE are used as evaluation parameters.
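The feature extraction plus forecasting pipeline can be sketched as follows. Everything here is illustrative: the synthetic series, the window length, and the fixed random filter bank that stands in for the study's trained CNN are our assumptions, not the actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic daily series standing in for (rainfall, water level, discharge);
# purely illustrative, not the Mahanadi data.
n, lag = 500, 7
series = np.cumsum(rng.normal(size=(n, 3)), axis=0)

# Lagged windows: each sample is the previous `lag` days of all three variables.
X = np.stack([series[i:i + lag].ravel() for i in range(n - lag)])
y = series[lag:, 2]  # next-day discharge as the target

# Stand-in for the trained CNN: a fixed random filter bank mapping each window
# to an 8-dimensional feature vector (the study uses learned CNN filters instead).
filters = rng.normal(size=(8, X.shape[1]))
features = np.tanh(X @ filters.T)

# 50-50 training-testing division, as selected in the study, then RF forecasting.
split = len(features) // 2
rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=0)
rf.fit(features[:split], y[:split])
pred = rf.predict(features[split:])
```

Swapping `RandomForestRegressor` for `sklearn.svm.SVR` at the last step gives the CNN–SVR variant of the same pipeline.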
RESULTS AND DISCUSSION
Input combination results for the Kantamal station
Model / input | Training R2 | Training RMSE | Training MAE | Training NSE | Testing R2 | Testing RMSE | Testing MAE | Testing NSE
---|---|---|---|---|---|---|---|---
CNN–RF | ||||||||
Input 1 | 0.92 | 337.00 | 96.51 | 0.90 | 0.37 | 629.36 | 152.40 | 0.37 |
Input 2 | 0.95 | 265.55 | 79.95 | 0.94 | 0.57 | 517.49 | 109.22 | 0.62 |
Input 3 | 0.90 | 370.86 | 170.41 | 0.85 | 0.87 | 286.47 | 166.27 | 0.79 |
Input 4 | 0.93 | 312.83 | 133.85 | 0.90 | 0.90 | 248.00 | 129.62 | 0.86 |
CNN–SVR | ||||||||
Input 1 | 0.59 | 758.06 | 143.92 | −0.17 | 0.60 | 499.57 | 108.58 | 0.07 |
Input 2 | 0.75 | 597.33 | 116.21 | 0.64 | 0.61 | 494.75 | 108.23 | 0.58 |
Input 3 | 0.98 | 166.63 | 53.08 | 0.98 | 0.88 | 273.71 | 55.66 | 0.88 |
Input 4 | 0.99 | 130.67 | 42.11 | 0.99 | 0.91 | 234.34 | 57.23 | 0.91 |
RF | ||||||||
Input 1 | 0.71 | 637.98 | 173.89 | 0.54 | 0.56 | 508.20 | 158.73 | 0.59 |
Input 2 | 0.76 | 580.65 | 195.85 | 0.53 | 0.64 | 474.61 | 183.33 | 0.46 |
Input 3 | 0.93 | 309.38 | 184.61 | 0.92 | 0.85 | 310.19 | 193.39 | 0.79 |
Input 4 | 0.92 | 336.12 | 141.92 | 0.88 | 0.89 | 265.62 | 134.64 | 0.83 |
SVR | ||||||||
Input 1 | 0.64 | 711.22 | 138.17 | 0.32 | 0.59 | 504.79 | 109.72 | 0.16 |
Input 2 | 0.78 | 552.19 | 105.85 | 0.70 | 0.53 | 540.95 | 111.36 | 0.60 |
Input 3 | 0.98 | 180.78 | 64.59 | 0.98 | 0.83 | 326.14 | 67.42 | 0.83 |
Input 4 | 0.99 | 102.67 | 34.31 | 0.99 | 0.89 | 262.62 | 63.93 | 0.89 |
ANN | ||||||||
Input 1 | 0.63 | 718.78 | 167.50 | 0.42 | 0.63 | 484.02 | 128.97 | 0.50 |
Input 2 | 0.70 | 652.45 | 169.41 | 0.53 | 0.65 | 471.35 | 138.91 | 0.55 |
Input 3 | 0.80 | 524.71 | 161.98 | 0.74 | 0.82 | 340.29 | 123.96 | 0.80 |
Input 4 | 0.87 | 432.55 | 150.21 | 0.84 | 0.85 | 311.68 | 131.36 | 0.85 |
The bold values are the best input combinations to the models.
Input combination results for the Kesinga station
Model / input | Training R2 | Training RMSE | Training MAE | Training NSE | Testing R2 | Testing RMSE | Testing MAE | Testing NSE
---|---|---|---|---|---|---|---|---
CNN–RF | ||||||||
Input 1 | 0.87 | 321.19 | 78.38 | 0.80 | −0.04 | 474.53 | 114.44 | 0.24 |
Input 2 | 0.92 | 249.56 | 0.67 | 0.88 | 0.44 | 348.96 | 88.10 | 0.53 |
Input 3 | 0.94 | 206.94 | 80.03 | 0.92 | 0.88 | 161.56 | 78.74 | 0.86 |
Input 4 | 0.98 | 113.58 | 30.54 | 0.98 | 0.90 | 146.65 | 41.12 | 0.92 |
CNN–SVR | ||||||||
Input 1 | 0.39 | 672.47 | 112.71 | −1.74 | 0.53 | 319.50 | 77.13 | −0.30 |
Input 2 | 0.52 | 594.50 | 105.10 | −0.80 | 0.61 | 290.62 | 73.21 | 0.17 |
Input 3 | 0.67 | 495.20 | 84.04 | 0.01 | 0.88 | 162.58 | 50.29 | 0.82 |
Input 4 | 0.71 | 465.26 | 79.67 | 0.19 | 0.89 | 153.53 | 46.03 | 0.84 |
RF | ||||||||
Input 1 | 0.59 | 549.28 | 130.22 | 0.10 | 0.40 | 361.50 | 110.34 | 0.34 |
Input 2 | 0.77 | 411.19 | 128.87 | 0.60 | 0.41 | 357.78 | 118.99 | 0.40 |
Input 3 | 0.93 | 221.43 | 118.84 | 0.91 | 0.81 | 203.62 | 126.01 | 0.78 |
Input 4 | 0.96 | 165.59 | 75.83 | 0.95 | 0.86 | 175.37 | 78.83 | 0.86 |
SVR | ||||||||
Input 1 | 0.42 | 653.59 | 112.88 | −0.88 | 0.56 | 310.22 | 78.29 | 0.21 |
Input 2 | 0.51 | 601.26 | 109.66 | −0.51 | 0.62 | 287.58 | 77.04 | 0.35 |
Input 3 | 0.73 | 445.72 | 81.60 | 0.35 | 0.86 | 175.15 | 56.62 | 0.80 |
Input 4 | 0.76 | 425.93 | 0.82 | 0.43 | 0.87 | 166.59 | 57.77 | 0.82 |
ANN | ||||||||
Input 1 | 0.42 | 655.84 | 128.38 | −0.42 | 0.58 | 303.25 | 92.85 | 0.40 |
Input 2 | 0.62 | 536.48 | 118.48 | 0.31 | 0.57 | 305.42 | 91.90 | 0.56 |
Input 3 | 0.77 | 415.11 | 112.58 | 0.67 | 0.80 | 206.33 | 69.64 | 0.87 |
Input 4 | 0.89 | 286.46 | 97.26 | 0.87 | 0.82 | 199.48 | 95.22 | 0.88 |
The bold values are the best input combinations to the models.
Final results of all models for Kantamal and Kesinga stations
Kantamal station

Model | Training R2 | Training RMSE | Training MAE | Training NSE | Testing R2 | Testing RMSE | Testing MAE | Testing NSE
---|---|---|---|---|---|---|---|---
CNN–RF | 0.99 | 96.36 | 34.63 | 0.99 | 0.95 | 172.83 | 54.19 | 0.95 |
CNN–SVR | 0.98 | 150.83 | 48.75 | 0.98 | 0.92 | 226.26 | 56.31 | 0.92 |
RF | 0.97 | 179.13 | 99.30 | 0.97 | 0.93 | 215.09 | 109.60 | 0.91 |
SVR | 0.99 | 112.08 | 37.98 | 0.99 | 0.91 | 239.31 | 58.69 | 0.91 |
ANN | 0.87 | 422.00 | 133.61 | 0.85 | 0.86 | 298.48 | 108.58 | 0.86 |
Kesinga station

Model | Training R2 | Training RMSE | Training MAE | Training NSE | Testing R2 | Testing RMSE | Testing MAE | Testing NSE
---|---|---|---|---|---|---|---|---
CNN–RF | 0.97 | 157.29 | 33.13 | 0.96 | 0.91 | 137.25 | 41.06 | 0.92 |
CNN–SVR | 0.95 | 195.71 | 33.13 | 0.93 | 0.91 | 143.49 | 44.64 | 0.87 |
RF | 0.96 | 173.92 | 76.95 | 0.95 | 0.88 | 160.50 | 77.56 | 0.88 |
SVR | 0.97 | 158.23 | 31.07 | 0.96 | 0.88 | 158.47 | 43.68 | 0.90 |
ANN | 0.90 | 265.71 | 81.19 | 0.89 | 0.84 | 186.35 | 72.99 | 0.89 |
Bold values are the values of the best model's results.
Performance evaluation graphs for train test split for R2, NSE, RMSE and MAE at (a) Kantamal station and (b) Kesinga station.
Performance evaluation graphs for RF parameters in CNN–RF at the Kantamal station and the Kesinga station.
Performance evaluation graphs for SVR parameters in CNN–SVR at the Kantamal station and the Kesinga station.
Training and testing
No study establishes a universally preferred division of data into training and testing sets; the division varies between studies. In this study, different training–testing division percentages are therefore tested for the hybrid models. From Figure 8, it can be seen that the results of CNN–RF with a 50–50% division are better than or equal to those with a 60–40% division for both stations, while the results of CNN–SVR are slightly better with a 60–40% division at the Kantamal station in terms of RMSE and MAE only. However, the results of CNN–SVR are better with a 50–50% division at the Kesinga station. As most of the results are better with a 50–50% division, it is chosen as the training–testing data division in this study.
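The division experiment above can be sketched as a sweep over chronological splits (shuffling is avoided so that future flows never leak into training). The data, model settings and split fractions below are illustrative assumptions, not the study's configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

def evaluate_split(X, y, train_frac, model=None):
    """Train on the first `train_frac` of the series and score R2 on the remainder."""
    model = model or RandomForestRegressor(n_estimators=100, random_state=0)
    cut = int(len(X) * train_frac)
    model.fit(X[:cut], y[:cut])
    return r2_score(y[cut:], model.predict(X[cut:]))

# Illustrative data: a noisy linear target over three input variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = X @ np.array([0.5, 1.0, -0.3]) + 0.1 * rng.normal(size=400)

# Compare candidate training-testing divisions (e.g. 50-50, 60-40, 70-30).
scores = {frac: evaluate_split(X, y, frac) for frac in (0.5, 0.6, 0.7)}
```

The division with the best test-side score would then be fixed for all subsequent experiments, mirroring the 50–50% choice made here.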
Input selection
This study is based on a multivariate modelling procedure. Two more variables, rainfall (R) and water level (L), are added to discharge (Q) for model development. Different input combinations, input 1 (Q), input 2 (R, Q), input 3 (L, Q) and input 4 (R, L, Q), are tested with all models to analyse the relationship of these variables as inputs for discharge forecasting. The accuracy of all models is low for input 1, as shown in Tables 2 and 3. Adding rainfall as input in input 2 improves the accuracy of some models while leaving others unaffected. Adding water level to discharge in input 3 improves the accuracy of all models to a high extent. Results are best when all three variables are used as input. Adding rainfall and water level separately to discharge improves model accuracy (Tables 2 and 3); however, water level has more impact than rainfall on the models' accuracy. Hence, all three variables together are most effective for model performance and are chosen for this study.
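The four tested combinations can be encoded as column selections over the feature matrix; the column ordering in `COLS` is our assumption for illustration, not the study's data layout:

```python
# Q: discharge, R: rainfall, L: water level. Column indices are illustrative.
COLS = {"Q": 0, "R": 1, "L": 2}
INPUT_SETS = {
    "input 1": ["Q"],
    "input 2": ["R", "Q"],
    "input 3": ["L", "Q"],
    "input 4": ["R", "L", "Q"],
}

def select_inputs(data, name):
    """Slice a (samples, variables) array down to one tested input combination."""
    idx = [COLS[v] for v in INPUT_SETS[name]]
    return data[:, idx]
```

Each model is then trained and scored once per combination, producing the rows of Tables 2 and 3.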
Model parameters sensitivity analysis
All ML models contain many parameters that drive the model's performance and accuracy. Some parameters may be more sensitive than others, and some may be insensitive with respect to a model's performance. To analyse all these types of parameters, sensitivity analyses are performed on CNN–RF and CNN–SVR.
The RF is mainly constructed with the n-estimator, depth of tree, minimum sample split, maximum features, minimum sample leaf and criterion parameters. The depth of the tree is the longest path between the root node and a leaf node. The n-estimator is the number of trees in the model. The minimum sample split (min_sample_split) is the minimum number of observations required in a node in order to split it. Maximum features (max_features) is the maximum number of input features considered for each tree in the model. The minimum sample leaf (min_sample_leaf) specifies the number of samples that must be present in a leaf node after splitting.
The SVR model's accuracy depends mainly on the values of C, gamma and epsilon. The values of these parameters depend on the data and change with different datasets and their lengths. C is a regularization parameter representing the trade-off between the expected errors and the model's complexity: a higher value of C means a smaller margin of the decision function, and a lower value a larger margin. Gamma defines the distance of influence of a single training point: a low gamma means a far-reaching influence, and a high gamma a close one. Epsilon defines the range of predictions around the support vectors within which no penalty is applied in the training loss.
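These three parameters appear directly in scikit-learn's `SVR`; the values below match those the sensitivity analysis arrives at for the Kantamal station (C = 10,000, gamma = 0.1, epsilon neglected), with scikit-learn again an assumed implementation:

```python
from sklearn.svm import SVR

# RBF-kernel SVR: large C (weak regularization, small margin), gamma controlling
# each training point's radius of influence, epsilon the penalty-free tube width.
svr = SVR(kernel="rbf", C=10_000, gamma=0.1, epsilon=0.0)
```

Setting `epsilon=0.0` mirrors the finding below that epsilon plays no role in the model's performance here.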
The main parameters in the CNN are the number of layers, the number of nodes in each layer, the batch size and the number of epochs. The number of hidden layers and the number of nodes per hidden layer are set experimentally. The batch size is the number of samples in each group (batch) of the dataset; the model is trained on one batch of this size per update. An epoch is one complete pass of training over the whole dataset.
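The batch and epoch semantics can be made concrete with a small index generator (illustrative only, not part of the study's code):

```python
def training_schedule(n_samples, batch_size, epochs):
    """Yield (epoch, start, end) index ranges for every batch the model trains on.

    One epoch covers the full dataset; each yielded range is one batch, with the
    final batch of an epoch possibly smaller than batch_size.
    """
    for epoch in range(epochs):
        for start in range(0, n_samples, batch_size):
            yield epoch, start, min(start + batch_size, n_samples)
```

For example, 10 samples with a batch size of 4 give three batches per epoch (sizes 4, 4 and 2).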
To find the sensitive parameters, the accuracy of the models is tested by changing one parameter value while keeping the other parameter values the same. The CNN in both CNN–RF and CNN–SVR is built with one input layer, two hidden layers, one flattening layer and one output layer. The model is tested with different numbers of hidden layers, and it is found that more hidden layers do not improve the model's accuracy; changing the number of nodes in these layers does not improve the model either. So, two hidden layers with 128 nodes each are used for both models at both stations.
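This one-at-a-time procedure can be sketched generically; `score_fn`, the baseline dictionary and the grid are placeholders standing in for a model-training-and-evaluation routine:

```python
def oat_sweep(baseline, grid, score_fn):
    """One-at-a-time sensitivity sweep.

    For each hyperparameter in `grid`, vary it over its candidate values while
    holding every other hyperparameter at its baseline value, and record the
    score returned by `score_fn` for that configuration.
    """
    results = {}
    for name, values in grid.items():
        for value in values:
            params = dict(baseline, **{name: value})
            results[(name, value)] = score_fn(params)
    return results
```

A parameter whose scores barely change across its sweep is deemed insensitive and can be fixed at its default, as done for min_sample_split, min_sample_leaf and epsilon below.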
RF has many parameters for model fitting, such as the n-estimator, minimum sample leaf, minimum sample split, maximum depth and maximum features. Of these, the minimum sample split and minimum sample leaf have no role in the model's performance for either station. The minimum value of the sample split is two, which is the default value for the model, and the minimum value for the sample leaf is two. The accuracy of the model increases with the value of the n-estimator up to a certain point, after which it becomes constant. As RF uses bootstrapping, the accuracy can vary between runs for the same n-estimator value, but this variation is very small. The performance of CNN–RF with an n-estimator above 100 is unchanged for the Kantamal station but reduced for the Kesinga station, so the n-estimator is set to 100 in model development. As the model has three input variables, the maximum features value goes up to 3. The maximum depth is varied from 1 to 10, and the results are unchanged beyond a value of 5. From Figure 9, it can be seen that maximum depth is the most sensitive parameter for CNN–RF, followed by the n-estimator and maximum features, for both stations. The chosen values of the n-estimator, maximum features and maximum depth are 100, 3 and 10, respectively, for both models (Figure 9). The results are shown in terms of R2 and are found to be the same using the other evaluation metrics (RMSE, MAE and NSE).
The SVR has three important parameters: C, gamma and epsilon. Epsilon has no role in the model's performance, so it can be set to 0 or neglected. C has a large effect on model performance. Gamma is fixed at 0.001 when testing C, and epsilon is 0 when testing both C and gamma. It can be seen from Figure 10 that accuracy is very low at low values of C and increases with increasing C. The accuracy of the model depends more on C than on gamma, but gamma is also important for increasing accuracy, as shown in Figure 10; both parameters are therefore sensitive to the model's performance. A value of C above 10,000 is suitable for both the Kantamal and Kesinga stations, so C is taken as 10,000 for both; increasing C further does not increase the accuracy of the model, as shown in Figure 10. C is set to 25,000 when testing gamma. The model's accuracy increases with gamma up to a certain value and then starts decreasing; accuracy peaks at gamma equal to 0.1 for the Kantamal station and 0.05 for the Kesinga station. Thus only two parameters are sensitive and need to be considered in this study. The results are shown in terms of R2, and the same trend holds for RMSE, MAE and NSE.
Forecasting results of developed models
The evaluation metrics quantify the models' accuracy. The training and test results are good for all models and are acceptable for both stations (Table 4). However, CNN–RF outperforms the other models in training and testing for both stations in terms of all evaluation metrics. The R2 values are 0.95, 0.92, 0.93, 0.91, and 0.86 for CNN–RF, CNN–SVR, RF, SVR and ANN, respectively, for the Kantamal station. Similarly, the R2 values are 0.91, 0.91, 0.88, 0.88, and 0.84, respectively, for the Kesinga station. A similar trend is observed in the RMSE values for both stations. However, the MAE results of the CNN–SVR and SVR models are better than those of the RF model for both stations, suggesting that the SVR model is more accurate for moderate flows while the RF model performs better on high flows. It is also clear that CNN improves the accuracy of the models, as shown in Table 4, although its impact as a feature extraction technique is greater for the RF model than for the SVR model.
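The four indicators used throughout can be computed directly from observed and forecasted series; a minimal sketch (with synthetic numbers, not the study's data):

```python
import numpy as np

def evaluate(obs, sim):
    """Return the four evaluation metrics used in the study.

    Note that R2 (squared correlation) and NSE coincide only when the
    forecast is unbiased with unit slope; both are reported separately.
    """
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    err = obs - sim
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    nse = float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2))
    r = np.corrcoef(obs, sim)[0, 1]
    return {"R2": float(r ** 2), "RMSE": rmse, "MAE": mae, "NSE": nse}

perfect = evaluate([1, 2, 3, 4], [1, 2, 3, 4])   # R2 = NSE = 1, errors 0
biased = evaluate([1, 2, 3, 4], [2, 3, 4, 5])    # R2 = 1 but NSE penalised
```

The biased example shows why reporting R2 alone can be misleading for flood forecasts: a systematically offset forecast keeps perfect correlation while NSE, RMSE and MAE all degrade.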
Scatter plot of observed and forecasted data at the Kantamal station.
Scatter plot of observed and forecasted data at the Kesinga station.
Plot of observed and forecasted data at the Kantamal station in the left panel and observed minus forecasted data in the right panel.
Plot of observed and forecasted data at the Kesinga station in the left panel and observed minus forecasted data in the right panel.
Significance, limitations and future scope
Accurate flood forecasting is crucial for meeting downstream demands, including agricultural, industrial, and drinking water needs. The present study highlights the effectiveness of hybrid machine learning models over simpler models in providing enhanced streamflow forecasts. The implications of this research extend to water resource planning and management, which are crucial for sustainable environmental protection and for addressing the challenges posed by climate change. The developed models show great potential for forecasting, and the CNN performs well as a feature extraction technique for improving the performance of the RF and SVR models. Three input features (rainfall, discharge, and water level) are used in this study; further input features, such as temperature and evapotranspiration, could be examined for flood forecasting. The influence of the training–testing data division on the models is analysed to fix the data division for this study; this finding is limited to the present study, because the division may differ for different input features and study areas. Here, hyperparameter values are selected experimentally by varying one hyperparameter while keeping the others constant. This process does not guarantee the ideal combination of hyperparameter values; optimization techniques can address this limitation and can also save computational time. In future research, we plan to expand the present study by analysing historical land cover changes in the basin over the past three decades and investigating their relationship with streamflow patterns and water resource management. Additionally, we plan to explore the reciprocal impact between land use alterations and streamflow dynamics, emphasizing their role in environmental protection and climate change challenges.
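The feature-extraction-then-regression structure of the hybrid models can be illustrated schematically. To keep the sketch dependency-light, the trained CNN is replaced here by a fixed 1-D convolution plus ReLU over a lagged window; in the actual study the convolutional filters are learned, and all data, filter values and window lengths below are purely illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
series = np.cumsum(rng.standard_normal(600))  # synthetic daily "discharge"

window = 8
# Each sample is a window of the previous 8 values; the target is the next value.
X_raw = np.lib.stride_tricks.sliding_window_view(series[:-1], window)
y = series[window:]

# Stand-in "CNN layer": convolve each window with two fixed filters and
# apply a ReLU, yielding a small feature vector per sample. A real CNN
# would learn these filters from the data.
filters = np.array([[1.0, 1.0, 1.0, -1.0], [0.25, 0.25, 0.25, 0.25]])

def extract(windows):
    feats = [np.maximum(np.convolve(w, f, mode="valid"), 0).mean()
             for w in windows for f in filters]
    return np.array(feats).reshape(len(windows), len(filters))

X_feat = extract(X_raw)

# The extracted features feed the RF regressor, as in the CNN-RF hybrid.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_feat[:400], y[:400])
pred = rf.predict(X_feat[400:])
```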
Integrating these aspects will enhance our understanding of streamflow forecasting factors and aid in proactive water resource planning for sustainable environmental management amidst evolving climate conditions.
In summary, this study investigates flood forecasting using hybrid models in the Mahanadi River Basin. Its significance lies in accurate streamflow forecasting, which is crucial for meeting downstream agricultural, industrial, and drinking water demands, and it highlights the effectiveness of hybrid machine learning models over simpler models in offering improved streamflow forecasts.
CONCLUSION
The present study has been conducted on flood forecasting using two CNN-based hybrid machine learning models, namely CNN–RF (combining CNN with RF) and CNN–SVR (combining CNN with SVR). The performance of the models has been assessed using statistical indicators including R2, RMSE, MAE, and NSE. The two hybrid models have been evaluated using several distinct training–testing data splits. A discernible trend is observed: the CNN–SVR model shows higher performance with a 60:40 split, in terms of RMSE and MAE only, at the Kantamal station. Conversely, the 50:50 split gives better performance for both models at both stations. Furthermore, the results of input data selection for the Mahanadi River Basin indicate that, among the four input models, the input 4 model (all three variables together) has the greatest influence on the model outputs. Sensitivity analyses of the model parameters help in identifying the effective parameters and the values giving the models' best performance. Of the two hybrid models, CNN–RF shows higher performance at both stations compared with CNN–SVR and the other models. Therefore, CNN can be considered a valuable feature extraction technique for flood forecasting, with the potential to enhance overall predictive results.
While this study encourages the use of the developed models in other work, some aspects can be improved further. The 20-year dataset is sufficient for forecasting, but a longer data record may help improve the hybrid models' results, especially for peak flows. Optimization techniques can be applied to these hybrid models for better performance and to reduce model-development time. Additional variables such as temperature and evaporation can also be explored for model development. Future work will include the use of optimization techniques, more weather data, and the exploration of further ML models for hybridization in flood forecasting.
ACKNOWLEDGMENT
We gratefully acknowledge the Central Water Commission of India and the Indian Meteorological Department for providing the data used in this study.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.