ABSTRACT
Peak discharge is an essential element of hydrological forecasting. Due to rapid outbreaks of flash floods in hilly areas and the lack of measured data, the fast and accurate estimation of peak discharge is crucial for flash flood hazard management. Three machine learning algorithms were applied to estimate peak discharge; this estimation was compared with the results of hydrological–hydraulic models, and the results were verified with measured watershed data. In this paper, 10 hydrological and geomorphological parameters were selected to predict the flood peak discharge in 103 watersheds in Taiyi Mountain North District. The results show that the particle swarm optimization backpropagation (PSO-BP) neural network model outperforms the BP neural network and random forest regression in prediction performance. PSO-BP has a lower mean absolute error (2.51%), root mean square error (3.74%), and mean absolute percentage error (2.74%) than the other models, which indicates that PSO-BP has high prediction accuracy. Importance analysis revealed that rainfall, early impact rainfall, catchment area, and rain intensity are the key input parameters of PSO-BP. The proposed method was confirmed to be a fast and relatively accurate algorithm for estimating the peak discharge of flash floods in ungauged basins.
HIGHLIGHT
This paper applied three machine learning algorithms to estimate the peak discharge, compared the estimates with those of hydrological–hydraulic models, and verified the results in a watershed with measured data.
INTRODUCTION
Floods can cause serious harm to the environment, economy, infrastructure, humans, and animals (Karami et al. 2024). The frequent occurrence of extreme rainfall due to climate change in recent years has placed greater demands on the accuracy and timeliness of hydrological forecasting and warning (Zsoter et al. 2020). However, floods in hilly areas are characterized by short durations and heavy rainfall, and many factors affect flooding through complex processes. Thus, the quick and accurate prediction of the characteristic parameters of floods remains an urgent scientific problem.
Existing flood peak discharge prediction methods mainly include mechanism-based hydrological–hydraulic methods and data-driven machine learning methods. Hydrological–hydraulic methods include hydrological modelling, the inference formula method, the antecedent precipitation index (API) method (Yao et al. 2019), and flood peak modulus calculations. Among the hydrological models, the Soil and Water Assessment Tool (SWAT) model (Karami et al. 2024; Yalcin 2024), the Nedbør-Afstrømnings-Model (NAM) model (Sun et al. 2020; Parvaze et al. 2022), the Variable Infiltration Capacity model (Meresa et al. 2022), and the Xinanjiang model (Gui et al. 2024; Liu et al. 2024) have been applied most often in hydrological forecasting and have achieved good results in practice. However, building a hydrological model requires a long series of measured flow and water level data for calibration and validation, so these models are better suited to large watersheds with measured data. In addition, methods such as the inference formula (Li et al. 2016), API (Ye et al. 2013), and flood peak modulus (Liu et al. 2022) are more widely used to calculate flood peaks in ungauged basins. These methods are mainly based on the probabilistic distribution modelling method (Meresa et al. 2022) and empirical formulas for flood calculation, which are characterized by low computational efficiency, rely mostly on manual experience, and neglect the mechanism of hydrological processes. In contrast, the machine learning method, as a data-driven model, overcomes the conditional limitations of the modelling process and can directly explore the intrinsic relationships in the data to achieve efficient prediction. Backpropagation (BP) (Han et al. 2022), PSO-BP (Zhang et al. 2017), random forest regression (RFR) (Zhao et al. 2022), support vector machine (SVM) (Rahimzad et al. 2021), long short-term memory (LSTM) (Rasheed et al. 2022), and other machine learning methods have been shown by many scholars to perform well in hydrological forecasting and warning, but they have mostly been applied to watersheds with large catchment areas, and LSTM has mostly been applied to time-series flood prediction (Zhang et al. 2022), which makes it difficult to apply to ungauged basins. Thus, this paper selects the BP, PSO-BP, and RFR models as machine learning models.
Most of the current literature on flood flow prediction has focused on large watersheds where measured data are available (Tsakiri et al. 2018; Bernard & Gregoretti 2021), and less research has been carried out on small natural watersheds in ungauged basins. Hydrological–hydraulic modelling, a mechanism-based computational approach, suffers from computational inefficiency, while machine learning methods, as a data-driven approach, require a large number of samples so that the model covers as wide a distribution of the variables as possible. Coupling hydrological–hydraulic methods with machine learning methods therefore combines model mechanisms with intelligent algorithmic tools, which not only exploits the learning ability of machine learning to achieve fast and accurate predictions but also improves the physical interpretability and applicability of the model. In ungauged basins, combinations of hydrological–hydraulic methods and machine learning methods have achieved good results in, for example, the calculation of flood-forming flow (Han et al. 2022), the calculation of critical rainfall (Zhao et al. 2022), and the prediction of watershed runoff (Hughes et al. 2023). The combination of hydrological–hydraulic methods and machine learning methods has become a new research hotspot in flood prediction and early warning, but research on predicting the peak discharge of natural floods in ungauged basins is still relatively weak.
This paper uses a hydrological–hydraulic method combined with machine learning methods to predict the natural flood peak discharge in mountain flood watersheds. The method was applied to 103 watersheds in the Taiyi Mountain North District. These watersheds are one of the best choices for natural flood parameterization in a hilly area because they are located in the highest terrain of Shandong Province, have small watershed areas, lack reservoirs, and are basically unaffected by human activities. First, the hydrological–hydraulic method was used to calculate the flood peak discharge of the 103 small watersheds, and the results served as the input data for the BP neural network, PSO-BP neural network, and RFR models for the prediction of flood peak discharge in ungauged basins. Second, the hydrological–hydraulic method and the machine learning method were validated in a selected watershed where measured data were available. Last, the importance of the input variables was calculated to discuss in depth the contributions of the parameters affecting the flood peak discharge in ungauged basins and to provide a new tool for exploring the natural flooding process in ungauged basins.
The manuscript is organized as follows: in Section 2, the study area and data source are described; in Section 3, the methodologies, BP neural network, PSO-BP neural network, RFR prediction model, and evaluation criteria of the prediction model are introduced; in Section 4 and Section 5, the results and discussion are reported; and in Section 6, the conclusions are drawn.
STUDY AREA AND DATA SOURCE
Study area
A total of 103 small watersheds in the Taiyi Mountain North District of Shandong Province were chosen as the study area for the following reasons: (1) there are no small reservoirs in the watershed, which avoids the influence of small reservoir storage on the flooding process and (2) no rainfall or hydrological stations belong to the watershed, and no measurements are available.
Dataset
Watershed parameter extraction
The ArcSWAT tool was used to extract information about the 103 watersheds from 12.5 m digital elevation model (DEM) data, including variables such as watershed area, watershed slope, longest river confluence path, and river gradient. The DEM data were obtained from the ALOS (Advanced Land Observing Satellite, launched in 2006), which carries a phased array L-band synthetic aperture radar (PALSAR) (https://search.asf.alaska.edu/#/). The land use data and subsurface soil data were obtained from the Resources and Environmental Science Data Center (http://www.resdc.cn/).
Rainfall data extraction
The 103 watersheds had no measured rainfall data. The rainfall in each watershed was calculated by spatially interpolating the rainfall measured at stations near the watersheds. The interpolated value at the centroid of each watershed was adopted as the characteristic rainfall value of that watershed. The rainfall data were obtained from the Shandong Hydrology Bureau.
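The interpolation step can be illustrated with a minimal sketch. The source does not state which spatial interpolation scheme was used, so inverse distance weighting (IDW) is assumed here purely for illustration; the function name `idw_rainfall`, the station coordinates, and the rainfall values are all hypothetical.

```python
import math

def idw_rainfall(stations, centroid, power=2.0):
    """Estimate rainfall at a watershed centroid from nearby stations.

    stations: list of (x, y, rainfall_mm) tuples (hypothetical data).
    centroid: (x, y) of the watershed centroid, same coordinate units.
    power:    IDW exponent; 2 is a common default, not a value from the paper.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, rainfall in stations:
        distance = math.hypot(x - centroid[0], y - centroid[1])
        if distance == 0.0:
            # The centroid coincides with a station: use its value directly.
            return rainfall
        weight = 1.0 / distance ** power
        weighted_sum += weight * rainfall
        weight_total += weight
    return weighted_sum / weight_total

# Made-up example: three stations around one centroid (coordinates in km).
example_stations = [(0.0, 0.0, 80.0), (10.0, 0.0, 120.0), (0.0, 10.0, 100.0)]
example_value = idw_rainfall(example_stations, centroid=(3.0, 3.0))
print(f"Interpolated centroid rainfall: {example_value:.1f} mm")
```

Any other interpolator (for example kriging or Thiessen polygons) could replace the weighting loop without changing the surrounding workflow.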
The range of the 10 meteorological and geomorphological parameters for the 103 watersheds is shown in Table S1 and Figure S3.
METHODOLOGIES
Machine learning methods
BP neural network
The BP neural network, which was proposed by Rumelhart and McClelland, is a multilayer feedforward neural network trained with the error backpropagation algorithm (Ervine et al. 2000). The BP neural network has become one of the most popular neural network models because of its strong nonlinear mapping ability and its capacity to approximate complex functions. The network learns from the training data by gradient descent: the output error is propagated backwards through the network, and the node weights and biases are adjusted to reduce it (Figure S1(a)).
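The training loop described above can be sketched in a few lines of NumPy. This is an illustrative toy, not the network used in the paper: the single hidden layer, the tanh activation, the layer size, the learning rate, and the synthetic data are all assumptions, since the source does not report its architecture or hyperparameters.

```python
import numpy as np

def forward(params, X):
    """Forward pass: one tanh hidden layer followed by a linear output."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    return hidden, hidden @ params["W2"] + params["b2"]

def train_bp(X, y, n_hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train the network by full-batch gradient descent with backpropagation."""
    rng = np.random.default_rng(seed)
    params = {
        "W1": rng.normal(0.0, 0.5, (X.shape[1], n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, 1)),
        "b2": np.zeros(1),
    }
    n_samples = X.shape[0]
    losses = []
    for _ in range(epochs):
        hidden, output = forward(params, X)
        error = output - y
        losses.append(float(np.mean(error ** 2)))
        # Backward pass: propagate the output error to each layer.
        grad_output = 2.0 * error / n_samples
        grad_hidden = (grad_output @ params["W2"].T) * (1.0 - hidden ** 2)
        grads = {
            "W2": hidden.T @ grad_output,
            "b2": grad_output.sum(axis=0),
            "W1": X.T @ grad_hidden,
            "b1": grad_hidden.sum(axis=0),
        }
        # Gradient descent update of weights and biases.
        for name in params:
            params[name] -= lr * grads[name]
    return params, losses

# Synthetic stand-in for (parameters -> peak discharge); not the paper's data.
toy_rng = np.random.default_rng(1)
X_toy = toy_rng.uniform(0.0, 1.0, (200, 3))
y_toy = (X_toy[:, 0] * X_toy[:, 1] + np.sin(X_toy[:, 2])).reshape(-1, 1)
bp_params, bp_losses = train_bp(X_toy, y_toy)
print(f"MSE before training: {bp_losses[0]:.4f}, after: {bp_losses[-1]:.4f}")
```

In practice the 10 input parameters would be normalized before training; that step is omitted here to keep the sketch short.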
PSO-BP neural network
In the modified BP neural network model, particle swarm optimization (PSO) is used to optimize the initial weights and thresholds, which are then substituted into the BP neural network for training and prediction (Hosseini et al. 2016). The BP neural network with the optimized parameters is used to estimate the peak discharge of small watersheds in hilly areas with ungauged basins from the input parameters (Figure S1(b)).
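A minimal sketch of the PSO stage is given below: each particle is a flattened vector of network weights and thresholds, and its fitness is the mean squared error of the resulting network. The swarm size, inertia weight `w`, acceleration coefficients `c1` and `c2`, network size, and toy data are illustrative assumptions, not values reported in the source.

```python
import numpy as np

N_IN, N_HIDDEN = 3, 6  # hypothetical layer sizes
N_PARAMS = N_IN * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1

def unpack(theta):
    """Split a flat parameter vector into weights and thresholds (biases)."""
    i = N_IN * N_HIDDEN
    W1 = theta[:i].reshape(N_IN, N_HIDDEN)
    b1 = theta[i:i + N_HIDDEN]
    W2 = theta[i + N_HIDDEN:i + 2 * N_HIDDEN].reshape(N_HIDDEN, 1)
    b2 = theta[i + 2 * N_HIDDEN:]
    return W1, b1, W2, b2

def mse(theta, X, y):
    """Fitness of one particle: MSE of the network it encodes."""
    W1, b1, W2, b2 = unpack(theta)
    prediction = np.tanh(X @ W1 + b1) @ W2 + b2
    return float(np.mean((prediction - y) ** 2))

def pso_initial_weights(X, y, n_particles=30, n_iter=100,
                        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for good initial network parameters with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, (n_particles, N_PARAMS))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([mse(p, X, y) for p in pos])
    best = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[best].copy(), float(pbest_cost[best])
    start_cost = gbest_cost
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity update: inertia + pull to personal best + pull to global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        cost = np.array([mse(p, X, y) for p in pos])
        improved = cost < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = cost[improved]
        best = int(np.argmin(pbest_cost))
        if pbest_cost[best] < gbest_cost:
            gbest, gbest_cost = pbest[best].copy(), float(pbest_cost[best])
    return gbest, start_cost, gbest_cost

# Synthetic stand-in data; not the paper's dataset.
pso_rng = np.random.default_rng(1)
X_pso = pso_rng.uniform(0.0, 1.0, (200, N_IN))
y_pso = (X_pso[:, 0] * X_pso[:, 1] + np.sin(X_pso[:, 2])).reshape(-1, 1)
theta0, cost_start, cost_end = pso_initial_weights(X_pso, y_pso)
print(f"Best swarm MSE: {cost_start:.4f} -> {cost_end:.4f}")
```

The vector `theta0` would then be unpacked with `unpack` and used as the starting point for ordinary BP training, in place of a random initialization.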
RFR model
Hydrological–hydraulic modelling to calculate flood peak discharge
The areal rainfall of the watershed was calculated using the point-to-area rainfall conversion coefficient based on the spatially interpolated rainfall data, together with the temporal distribution of the rainfall. The effective precipitation was then calculated using the rainfall-runoff relationship curve, and the flood discharge within the confluence time was calculated using the instantaneous unit hydrograph. In this study, the flood peak discharges of 15 different early impact rainfall events at five return periods (5, 10, 20, 50 and 100 years) were calculated for 103 small watersheds. The calculated 7,210 sets of data served as the actual values for the machine learning prediction model; 7,010 sets were used as the training set, and 200 sets were used as the test set.
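The routing step, in which effective precipitation is converted to discharge with a unit hydrograph (the "instantaneous unit line" step), amounts to a discrete convolution. The sketch below shows that idea only; the unit hydrograph ordinates, the rainfall series, the catchment area, and the time step are made-up values, and the source does not specify the shape or parameters of the unit hydrograph it used.

```python
def route_flood(effective_rain_mm, unit_hydrograph, area_km2, dt_hours):
    """Convolve effective rainfall with a unit hydrograph to get discharge.

    effective_rain_mm: effective rainfall depth per time step (mm).
    unit_hydrograph:   relative ordinates per time step; normalized to sum to 1.
    Returns discharge in m3/s for each time step.
    """
    total = sum(unit_hydrograph)
    weights = [ordinate / total for ordinate in unit_hydrograph]
    # 1 mm over 1 km2 in dt hours equals 1000 m3 / (3600 * dt) s.
    to_m3s = area_km2 / (3.6 * dt_hours)
    discharge = [0.0] * (len(effective_rain_mm) + len(weights) - 1)
    for i, rain in enumerate(effective_rain_mm):
        for j, weight in enumerate(weights):
            discharge[i + j] += rain * weight * to_m3s
    return discharge

# Hypothetical inputs for one small watershed.
rain_series = [5.0, 20.0, 10.0]   # mm per hour
uh_ordinates = [1.0, 3.0, 2.0, 1.0]  # relative shape only
hydrograph = route_flood(rain_series, uh_ordinates, area_km2=10.0, dt_hours=1.0)
peak_discharge = max(hydrograph)
print(f"Peak discharge: {peak_discharge:.2f} m3/s")
```

Because the ordinates are normalized, the routed volume equals the effective rainfall volume, which is a convenient check on any implementation of this step.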
Evaluation criteria
The mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE) (Yan et al. 2023), and coefficient of determination (R²) (Ayus et al. 2023) were applied to evaluate the performance of the model.
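For reference, the four criteria can be computed as sketched below using their standard textbook definitions. The source does not print its formulas, and it reports MAE and RMSE as percentages, so the paper may apply a normalization that is not reproduced here; the sample values are invented.

```python
import math

def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error (%); actual values must be non-zero."""
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Invented peak discharges (m3/s) for illustration only.
observed = [100.0, 200.0, 300.0, 400.0]
simulated = [110.0, 190.0, 310.0, 390.0]
print(mae(observed, simulated), rmse(observed, simulated))
print(mape(observed, simulated), r_squared(observed, simulated))
```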
RESULTS
Results of machine learning models
To verify the consistency and robustness of the BP neural network, the PSO-BP neural network, and the RFR model, Figure 2(b), 2(d), and 2(f) show scatter plots of the measured and simulated values at the 95% confidence level. The high overlap between the predicted trend line and the diagonal line (Figure 2(b)) indicates that the predicted and actual values agree and that the model predicts well. Figure 2(d) shows that the actual values of the test set and the values predicted by PSO-BP are distributed closely around the diagonal line, indicating that the prediction of the model is more accurate. The actual and predicted RFR values of the test set are distributed along the diagonal line, as shown in Figure 2(f), suggesting the broad applicability of the model. The correlation coefficients (r) of the BP neural network, the PSO-BP neural network, and RFR with the measured values are 0.99891, 0.99922, and 0.99114, respectively, which again indicates the high prediction accuracy of the three machine learning models; the PSO-BP neural network has the best robustness and consistency.
Comparative analysis of the results of different prediction models
Model verification
A watershed of a hilly area with measured peak discharge data was selected, and BP neural network, RFR, and PSO-BP neural network machine learning models and a hydrological–hydraulic model were used to calculate the peak discharge and compare it with the measured data to validate the model accuracy.
Hydrological stations (Wohushan, Beifeng, Huangtaiqiao, and Gushan) lie within the selected watershed (Figure 1(b)). To guarantee the correctness of the data verification, the station should ideally have no upstream reservoir, but this criterion is challenging to meet. In the selected basin, a reservoir lies in the upper reaches, above the Beifeng Hydrological Station. We believe that, for the 100-year return period, the regulating and storage effects of the reservoir (peak shaving and flood regulation) have little effect on the peak discharge at the station site. Therefore, the 100-year flood peak discharge at the Beifeng Hydrological Station was selected to verify the accuracy of the data.
Table 1 compares the measured value with the values calculated by the hydrological–hydraulic, BP neural network, RFR, and PSO-BP neural network models. The values calculated by the PSO-BP neural network model and the hydrological–hydraulic model are very similar, with a difference of 0.09%. Compared with those of the other two machine learning models, the predicted values of the PSO-BP neural network model are closer to the measured values, which once again confirms the accuracy of the PSO-BP neural network model. The PSO-BP neural network model can be used for flood peak discharge prediction in hilly areas.
Model | Value (m³/s) | Relative error (%) |
---|---|---|
Measured | 1,723.00 | – |
Hydrological–hydraulic model | 1,766.36 | 2.52 |
BP neural network model | 1,798.81 | 4.40 |
RFR model | 1,788.99 | 3.83 |
PSO-BP neural network model | 1,767.97 | 2.61 |
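The relative errors in Table 1 follow directly from the tabulated values, assuming the usual definition of relative error as the absolute deviation from the measured value divided by the measured value. The short check below uses only the numbers in the table; the dictionary keys are shorthand labels.

```python
MEASURED = 1723.00  # measured 100-year peak discharge at Beifeng (m3/s)

# Model values copied from Table 1 (m3/s).
model_values = {
    "Hydrological-hydraulic": 1766.36,
    "BP": 1798.81,
    "RFR": 1788.99,
    "PSO-BP": 1767.97,
}

def relative_error_percent(value, reference):
    """Absolute deviation from the reference, as a percentage of the reference."""
    return abs(value - reference) / reference * 100.0

relative_errors = {
    name: round(relative_error_percent(value, MEASURED), 2)
    for name, value in model_values.items()
}

for name, error in relative_errors.items():
    print(f"{name}: {error:.2f}%")

# Difference between PSO-BP and the hydrological-hydraulic model,
# expressed relative to the hydrological-hydraulic value.
pso_vs_hh = relative_error_percent(
    model_values["PSO-BP"], model_values["Hydrological-hydraulic"]
)
print(f"PSO-BP vs hydrological-hydraulic: {pso_vs_hh:.2f}%")
```

The same 0.09 figure is obtained either as the relative difference between the two model values or as the gap between their relative errors (2.61 and 2.52).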
DISCUSSION
The performances of the BP, PSO-BP, and RFR models are further discussed. The BP neural network is adaptable to a large number of training samples, whereas the RFR model is better suited to small sample sets. The particle swarm algorithm has a significant advantage in parameter optimization, and based on the findings displayed in Figure 4, the PSO-BP has a prediction accuracy that is superior to that of the BP neural network model (Zhang et al. 2017, 2023). The PSO-BP neural network algorithm addresses the tendency of the gradient descent approach to become trapped in local minima and significantly enhances the prediction performance of the entire model (Zhang et al. 2019). The advantage of the PSO-BP neural network algorithm over the RFR and BP neural network models is its higher prediction accuracy. The experiments show that the PSO-BP neural network model is more suitable for predicting flood flow in hilly areas.
CONCLUSIONS
Predictions in ungauged basins are considered another milestone that has significant implications for the advancement of hydrology. An accurate forecast of flood peak discharge is crucial in the flood forecasting process. Currently, the primary indicator for forecasting flash floods is a rainfall warning. Although they are also significant indicators, discharge and water level warnings are rarely used. Discharge forecasting plays an important role as an intermediary between critical rainfall and hazardous water levels. However, it is challenging to calibrate and test hydrological models in hilly watersheds because of a lack of rainfall and hydrological data.
This paper compares the results of hydrological–hydraulic methods and three machine learning models for predicting flood peak discharges and uses a watershed with measured data for validation. The relative errors ranged from 2.52 to 4.40%, confirming the accuracy and suitability of the hydrological–hydraulic and machine learning models. Compared with hydrological–hydraulic models, machine learning models have the advantage of high modelling efficiency. They can deeply mine the relationships among the data, and the results are reliable. This offers a way to solve the problem of predicting flood peak discharge in hilly areas with ungauged basins. The values of 0.99922 for r and 0.99844 for R² confirm the superiority of the PSO-BP neural network over the BP neural network and RFR models. The importance analysis revealed that rainfall, early impact rainfall, watershed area, and rainfall intensity are the most important parameters in calculating flood peak discharge. The innovation of these results lies in the parallel application of hydrological–hydraulic models and machine learning models, which combines traditional computation with new tools. These models can be applied to future flood forecasting in hilly areas with ungauged basins. This outcome can help hydrologists and local administrators by providing guidance for flash flood warning and scheduling policies and can aid in smart water network construction.
ACKNOWLEDGEMENTS
The authors wish to gratefully acknowledge the financial assistance from the Natural Science Foundation of Shandong Province (ZR2020ME249), the Natural Science Foundation of Shandong Province (NSFS) (No. ZR2020QE282), the National Natural Science Foundation of China (42301046), and the other anonymous reviewer whose comments significantly improved the quality of this paper.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.