ABSTRACT
Floods are becoming increasingly frequent and severe due to climate change and urbanization, increasing risks to lives, property, and the environment and necessitating precise flood forecasting systems. This study addresses the critical task of predicting flood peak arrival times, which is essential for timely warnings and preparations, by introducing a comprehensive machine-learning framework. Our approach integrates interpretable feature engineering, individual model design, and a novel model ensemble to enhance prediction accuracy. We extract informative features from historical flood flow and rainfall data, design a suite of machine-learning models, and develop a novel ensemble technique to combine model predictions. We conducted case studies on the Tunxi and Changhua basins in China. Numerical experiments reveal that our method benefits significantly from feature engineering and model ensembles, achieving mean absolute errors (MAEs) of 1.524 h for Tunxi and 2.192 h for Changhua. These results notably outperform the best baseline method, which achieves MAEs of 1.727 h for Tunxi and 2.737 h for Changhua.
HIGHLIGHTS
This paper identifies and formalizes an important flood forecasting task – predicting the arrival time of flood peak.
This paper develops a novel machine-learning prediction framework, built on interpretable feature engineering and a model ensemble method.
This paper demonstrates the importance and effectiveness of feature engineering and ensemble methods in improving prediction accuracy.
INTRODUCTION
Floods are among the most frequent and impactful natural disasters. With their increasing frequency and intensity, driven by climate change and urbanization, they present significant risks to human life, property, and environmental health. Accurate flood monitoring and forecasting are imperative for fast reaction and early intervention, enabling timely evacuation plans and resource allocation to mitigate potential damages. Additionally, precise flood predictions empower decision-makers to implement proactive measures, such as deploying temporary flood barriers or issuing warnings to vulnerable communities, ultimately reducing the socioeconomic impacts of flooding events (Jia et al. 2018; Lee & Kim 2021; Li et al. 2021).
In flood prevention and mitigation, flood forecasting plays a pivotal role in shaping effective strategies (Badrzadeh et al. 2015; Yenming et al. 2016; Zhihui et al. 2023). Along this line of research, a category of works relies on conventional hydrological models (Deng & Li 2013; Han & Coulibaly 2017; Le et al. 2019; Xu et al. 2019; Zhao et al. 2019), such as the Xin'an Jiang model, which simulates standard flood processes using established physical principles. More recently, as the model-based forecasting methods heavily rely on human expertise and may be challenging to adapt to more general settings, a more popular and practical option is to develop data-driven models, i.e., building models upon extensive observations and parameterized frameworks (Garg et al. 2023; Rajab et al. 2023; Sharma & Kumari 2024).
In particular, the data-driven model typically leverages historical data, especially rainfall and flood flow data, to make predictions for future times. The critical part is the forecasting model. A number of recent studies attempted to leverage machine-learning algorithms to perform the forecasting (Toth et al. 2000; Hapuarachchi et al. 2011; Palash et al. 2018; Ming et al. 2020; Nguyen & Chen 2020; Ashok & Pekkat 2022; Kabbilawsh et al. 2022; Rahman et al. 2023). These methods encompass conventional machine-learning models such as linear models (Palash et al. 2018) and multi-feature linear models (Rahman et al. 2023), as well as deep neural networks such as long short-term memory (LSTM) (Le et al. 2019; Cho et al. 2022; Atashi et al. 2022; Rajab et al. 2023; Hayder et al. 2023), wavelet-based neural network models (Agarwal et al. 2022; Santos et al. 2023; Chakraborty & Biswas 2023), and spatial–temporal models (Ding et al. 2020; Noor et al. 2022). Moreover, some researchers have further adopted these models to perform more complicated tasks with improved modifications. To name a few, Agarwal et al. (2023) have integrated a class of machine-learning models to study the extreme annual rainfall in North-Eastern India. The same group has also extended the utilization of neural network models to the multiple input multiple output (MIMO) setting to perform the flood modeling and learn storage rate changes in the river system (Agarwal et al. 2021a, 2021b). These models have been demonstrated to be effective in performing short-term flood prediction.
However, rather than performing short-term flood flow prediction, predicting the arrival time of the flood peak – or, more precisely, the time when the flood flow reaches a dangerous level – could be even more crucial. This is because it enables the government to carry out early warnings and make necessary intervention efforts and preparations. Significantly, flood peak arrival time prediction is generally more challenging than short-term flood flow prediction as it typically requires the model to anticipate flood behavior over a longer term. That is, the flood peak could occur much later than the prediction time. Therefore, the machine-learning methods developed in the previous literature may not be directly adapted to perform accurate flood peak arrival time prediction. To this end, the objectives of this paper are summarized as follows:
We will develop new machine-learning frameworks for predicting the flood peak arrival times based on the historical data.
We will explore a set of modules in the developed machine-learning framework that can enable more accurate, robust, and interpretable predictions.
We will conduct case studies in the Tunxi and Changhua basins to validate our framework's design and compare its prediction accuracy with baseline methods.
The remaining part of this paper is organized as follows: we first introduce the primary method in Section 2, covering the problem overview and the description of the developed machine-learning framework, which encompasses feature engineering, machine-learning model design, and model ensemble.
In Section 3, we will introduce the case studies in Tunxi and Changhua, China, including the description of the dataset and some basic data organization and preprocessing procedures. In Section 4, we will provide a detailed illustration of the performance of the proposed models on several real-world flood forecasting datasets. This section includes a comparison between the proposed forecasting method and baselines, as well as ablation studies to interpret the role of different features and modules in our method. In Section 5, we discuss the key findings and the limitations of the developed approaches. Then, we summarize the methodology and results of this paper, as well as highlight the future directions in Section 6.
METHODS
Overview
Since the time of making a prediction is known, the prediction target can be transformed into the time difference between the current time and the flood peak arrival time, i.e., the time to the flood peak. This serves as the prediction target in the model design.
To effectively predict the timing of flood peaks using historical flow and rainfall data, several challenges must be addressed. First, there is no direct physical model that links the timing of a flood peak to historical patterns. Our understanding relies primarily on general observations, such as heavier rainfall typically shortening the lead time to a flood peak, higher flow indicating more intense flood peaks, and prolonged heavy rain leading to quicker rises in flood levels. Second, identifying the most appropriate machine-learning model remains uncertain, as it is unclear which model will provide the best prediction performance. Consequently, we are motivated to investigate the design of both feature extraction methods and predictive models, which are detailed in the subsequent subsections.
Feature engineering
In this part, we aim to develop a simple yet effective method for extracting informative features from the raw data. Such a feature engineering framework has been widely demonstrated to be effective in improving the performance of the model in many time series-based machine-learning tasks (Wang et al. 2022; Pablo & Dormido-Canto 2023).
In particular, we will use the historical rainfall and flood flow as the raw features and then design a novel approach to extract the features accordingly. We use feature engineering to account for the duration, amount, and frequency of events. For example, average rainfall and river flow measure the strength of these factors in the area. Their variability indicates the potential for sudden, intense rain, and how often it rains consecutively suggests how frequently the area experiences rainfall leading up to peak floods. We detail these features, including their mathematical formula and descriptions, as follows.




Visualization of the feature engineering module, where we extract informative features from the raw historical flood and rainfall data through the perspective of frequency, count of heavy rain/flood, statistical quantities, and increases/changes.
Features of historical rainfall
We first consider the features generated by leveraging the historical rainfall. In particular, the corresponding candidate features and their calculation formula are described as follows:
Cumulative rainfall, formulated as $R_{\mathrm{cum}} = \sum_{i=t-L+1}^{t} r_i$, where $r_i$ denotes the rainfall at timestamp $i$ and $L$ is the length of the lookback window. The cumulative rainfall calculates the total rainfall over the historical period. In general, higher cumulative rainfall implies heavier floods and thus plays an important role in predicting the peak arrival time.
Average rainfall, formulated as $R_{\mathrm{avg}} = \frac{1}{|\{i : r_i > 0\}|} \sum_{i : r_i > 0} r_i$. One may think the average rainfall is similar to the cumulative rainfall, since only a simple linear transformation separates them. However, in our calculation, we only average over the timestamps with nonzero rainfall (i.e., $r_i > 0$). Notably, the number of rainy timestamps within the $L$ most recent timestamps varies over time. This implies that the average rainfall considered in our work is, in fact, a nonlinear function of the cumulative rainfall and thus provides richer information for our prediction task.
Exponential moving average of rainfall, formulated as $R_{\mathrm{ema}} = \sum_{k=0}^{L-1} \gamma^{k} r_{t-k} \big/ \sum_{k=0}^{L-1} \gamma^{k}$, where $\gamma \in (0, 1)$ denotes the decaying factor, chosen to be 2/3 in our paper. We consider the exponential moving average because rainfall at more recent times naturally has a stronger effect on the prediction, so we increase the weights of the more recent rainfall data.
Standard deviation of daily rainfall, formulated as $R_{\mathrm{std}} = \sqrt{\frac{1}{L} \sum_{i=t-L+1}^{t} (r_i - \bar{r})^2}$, where $\bar{r}$ is the mean rainfall over the window. The idea of considering the standard deviation of the rainfall is to investigate whether the stability of the rainfall affects the flood prediction. For instance, given the same total rainfall, which of the following two cases is more likely to reach the flood peak earlier: (1) a very heavy rainfall at one single timestamp, or (2) rainfall uniformly distributed across all timestamps?
Rainfall increase, formulated as $R_{\mathrm{inc}} = \frac{1}{5} \sum_{i=t-4}^{t} r_i - \frac{1}{5} \sum_{i=t-9}^{t-5} r_i$, i.e., the difference between the most recent 5-day average and the preceding 5-day average, where the 5-day averaging smooths the calculation. The rainfall increase indicates whether there has been heavier rainfall in recent days, which makes heavier floods more likely in the future.
Count of heavy rainfall, formulated as $N_{\mathrm{heavy}} = \sum_{i=t-L+1}^{t} \mathbb{1}[r_i > \tau]$, where $\tau$ is a threshold that can be set according to the definition in (MANOBS 2013), and rainfall above $\tau$ is considered heavy rainfall. This feature complements the total/average rainfall and can be understood as the frequency of heavy rainfall rather than just its magnitude.
Count of consecutive heavy rainfall, formulated as $N_{\mathrm{consec}} = \sum_{i=t-L+2}^{t} \mathbb{1}[r_i > \tau \text{ and } r_{i-1} > \tau]$. This count is comparable to the overall count of heavy rainfall; nevertheless, there are cases where it is more informative to determine whether heavy rainfall occurs on successive days, which intuitively provides a better understanding of both the frequency and the intensity of the rainfall.
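To make these rainfall features concrete, the following sketch computes them with NumPy over a lookback window; the function name, the default heavy-rain threshold `tau`, and the window handling are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rainfall_features(r, gamma=2/3, tau=7.5):
    """Sketch of the rainfall feature extraction over a lookback window.

    r     : 1-D array of rainfall over the L most recent timestamps (chronological)
    gamma : decaying factor of the exponential moving average (2/3 in the paper)
    tau   : heavy-rain threshold (illustrative value; the paper follows MANOBS 2013)
    """
    r = np.asarray(r, dtype=float)
    wet = r[r > 0]                             # timestamps with nonzero rainfall
    weights = gamma ** np.arange(len(r))       # weight 1 for the most recent timestamp
    heavy = r > tau
    return {
        "cumulative": r.sum(),
        "average_wet": wet.mean() if len(wet) else 0.0,
        "ema": np.dot(weights, r[::-1]) / weights.sum(),
        "std": r.std(),
        "count_heavy": int(heavy.sum()),
        "count_consecutive_heavy": int(np.sum(heavy[1:] & heavy[:-1])),
    }
```

Note that `average_wet` divides by the number of wet timestamps only, which is what makes it a nonlinear function of the cumulative rainfall.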
Features of historical flood flow
Then, we consider the features generated by leveraging the historical flood flow, denoted by $f_i$ at timestamp $i$. The feature extraction method is similar to that for the historical rainfall: we consider the strength, frequency, and standard deviation of the changes in flood flow. In particular, the features and the corresponding formulas are provided as follows:
Previous flood flow $f_{t-1}$, i.e., the flood flow at the previous timestamp.
Average flood flow change, formulated as $\Delta f_{\mathrm{avg}} = \frac{1}{L-1} \sum_{i=t-L+2}^{t} (f_i - f_{i-1})$.
Exponential moving average of flood flow, formulated as $F_{\mathrm{ema}} = \sum_{k=0}^{L-1} \gamma^{k} f_{t-k} \big/ \sum_{k=0}^{L-1} \gamma^{k}$.
Standard deviation of flood flow, formulated as $F_{\mathrm{std}} = \sqrt{\frac{1}{L} \sum_{i=t-L+1}^{t} (f_i - \bar{f})^2}$, where $\bar{f}$ is the mean flow over the window.
Flow increase, formulated as $F_{\mathrm{inc}} = \frac{1}{5} \sum_{i=t-4}^{t} f_i - \frac{1}{5} \sum_{i=t-9}^{t-5} f_i$, analogous to the rainfall increase.
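Analogously, a minimal sketch of the flow features; the dictionary keys and the half-window smoothing used for `increase` are our assumptions:

```python
import numpy as np

def flow_features(f, gamma=2/3):
    """Sketch of the flood-flow feature extraction (names are illustrative)."""
    f = np.asarray(f, dtype=float)
    diffs = np.diff(f)                          # timestamp-to-timestamp flow changes
    weights = gamma ** np.arange(len(f))        # weight 1 for the most recent timestamp
    half = len(f) // 2
    return {
        "previous_flow": f[-1],
        "avg_change": diffs.mean(),
        "ema": np.dot(weights, f[::-1]) / weights.sum(),
        "std": f.std(),
        "increase": f[-half:].mean() - f[:half].mean(),  # recent vs earlier average
    }
```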
Now we have detailed a set of analytic and interpretable feature extraction methods. In the experiments (Section 4), we will first apply a standard statistical method to evaluate the importance of these features and show that using such interpretable feature extraction methods can lead to substantially better predictive performance compared to raw features.
Predictive model designs
In this section, we will introduce the predictive models used for forecasting the arrival time of flood peaks. Given the uncertainty regarding which machine-learning model is optimal for this task, we will consider a set of standard machine-learning models as well as several task-specific models. Instead of selecting a single best model based on the smallest validation error, we will construct an ensemble model. This ensemble approach utilizes all candidate models by assigning them different importance weights, as described in the next subsection.







Moreover, we consider the following machine-learning models, which serve as the candidates for the model ensemble.




Then, after obtaining the parameters $\mathbf{w}$ and $b$, we can use Equation (3) to output the prediction, i.e., the time to the flood peak.
Indeed, the coefficients of multi-factor linear regression offer higher interpretability than black-box models such as neural networks: each coefficient represents the average impact of an independent variable on the dependent variable, holding the other variables constant. This relationship is easy to understand and explain, so multi-factor linear regression is preferred when interpretability is critical. However, one critical issue in linear regression is multicollinearity, i.e., some features are correlated with each other, leading to model redundancy and unstable coefficient estimates. To address this potential issue, we further consider two regularized linear regression methods: ridge regression (Gary 2009; Carlos & Upmanu 2010; Mumtaz et al. 2020) and Lasso regression (Ranstam & Cook 2018; Rastogi et al. 2021; Rofah et al. 2021).
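As a concrete illustration of the regularized variant, ridge regression has the closed-form solution $\mathbf{w} = (X^\top X + \lambda I)^{-1} X^\top y$; a minimal NumPy sketch (function name is ours):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y.
    A minimal sketch; in the paper, lam is tuned by grid search."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

With `lam = 0` this reduces to ordinary least squares; increasing `lam` shrinks the coefficients, which stabilizes the solution when features are correlated.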









Visualization of the feature-aware MLP model: MLP1 and MLP2 operate on the flood and rainfall features separately, and their outputs are fed into MLP3 to obtain the prediction.
Model ensemble
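The linear ensemble described in Step 3 of the workflow below can be sketched as a least-squares fit of the candidate predictions on the validation set; the function names are ours:

```python
import numpy as np

def fit_ensemble(preds_val, y_val):
    """Least-squares linear ensemble over candidate-model predictions.

    preds_val : (n_val, m) matrix whose columns are the m candidate models'
                predictions on the validation set
    Returns weights w and intercept b minimizing the validation MSE.
    """
    A = np.column_stack([preds_val, np.ones(len(preds_val))])
    sol, *_ = np.linalg.lstsq(A, y_val, rcond=None)
    return sol[:-1], sol[-1]

def ensemble_predict(preds, w, b):
    """Combine candidate predictions with the learned weights."""
    return preds @ w + b
```

Models with larger weights contribute more to the final prediction, which gives the ensemble a simple interpretation as an importance weighting over candidates.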





Model pipeline and workflow
Step 1: Perform feature engineering on the raw historical data. Then, a single-factor linear significance test is applied to the engineered features to perform feature cleaning.
Step 2: Based on the extracted features, we train multiple models to generate the candidate predictions. These models are trained based on the MSE loss with possible regularization terms. The hyperparameters are tuned through a grid search method.
Step 3: Given the five trained candidate models, we further implement a linear ensemble for their predictions. The coefficients are trained and optimized based on the MSE loss on the validation dataset.
Step 4: Once all model parameters are well trained, given a new set of raw features, we will then do feature extraction, candidate model inferences, and results ensemble sequentially to obtain the final prediction.
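The four steps can be sketched end to end on synthetic data; all features, models, and names below are illustrative stand-ins, not the paper's actual models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: engineered features (synthetic stand-ins for the real features).
X = rng.normal(size=(160, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=160)
X_tr, X_val, y_tr, y_val = X[:120], X[120:], y[:120], y[120:]

# Step 2: candidate models trained with the MSE loss
# (here, ordinary and ridge-regularized least squares as stand-ins).
def fit_linear(X, y, lam=0.0):
    A = np.column_stack([X, np.ones(len(X))])
    coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ coef

candidates = [fit_linear(X_tr, y_tr), fit_linear(X_tr, y_tr, lam=5.0)]

# Step 3: linear ensemble of the candidate predictions, fitted on validation data.
P = np.column_stack([m(X_val) for m in candidates])
A = np.column_stack([P, np.ones(len(P))])
ens_coef, *_ = np.linalg.lstsq(A, y_val, rcond=None)

# Step 4: inference chains feature extraction, candidate models, and the ensemble.
def forecast(X_new):
    P_new = np.column_stack([m(X_new) for m in candidates])
    return np.column_stack([P_new, np.ones(len(P_new))]) @ ens_coef
```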
Visualization of the machine-learning forecasting framework, including feature extraction, model design, and model ensemble.
Visualization of the machine-learning forecasting framework, including feature extraction, model design, and model ensemble.
Model evaluation
Using multiple evaluation metrics is crucial for assessing model performance from different viewpoints. This comprehensive approach guarantees that we capture various aspects of the model's accuracy and robustness, providing a more complete picture of its effectiveness. A method can be considered strictly superior if it surpasses other methods across all these metrics. In particular, we adopt three standard metrics: the root mean square error, $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$, which penalizes large errors more heavily; the mean absolute error, $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$, which measures the average magnitude of the errors; and the coefficient of determination, $R^2 = 1 - \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \big/ \sum_{i=1}^{n} (y_i - \bar{y})^2$, which measures the proportion of the target variance explained by the model. Here, $y_i$ and $\hat{y}_i$ denote the ground truth and the prediction, respectively, and $\bar{y}$ is the mean of the ground truth.
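A minimal implementation of the three metrics used throughout the paper (RMSE, MAE, and R-square):

```python
import numpy as np

def rmse(y, yhat):
    """Root mean square error; penalizes large errors more heavily."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    """Mean absolute error; average magnitude of the errors."""
    return float(np.mean(np.abs(y - yhat)))

def r_square(y, yhat):
    """Coefficient of determination: fraction of target variance explained."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```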



CASE STUDIES IN TUNXI AND CHANGHUA, CHINA
(a) Illustration of the Tunxi Basin, China, (b) Illustration of the Changhua Basin, China.
This section is organized as follows: first, we introduce the key properties of the dataset in these two basins; then, we describe how the raw data are organized into prediction samples. The evaluation of the feature engineering module and the comparison between different methods are deferred to Section 4.
Description of the dataset
In total, the Tunxi Basin dataset contains 134 floods, including 23,132 historical monitoring records (rainfall and flow), spanning from 17 June 1981 to 18 March 2007. The Changhua Basin dataset contains 44 floods, including 9,354 historical monitoring records (rainfall and flow), spanning from 7 April 1998 to 20 July 2010. The detailed description of the dataset is shown in Table 1. The dataset forms a time series whose features contain the rainfall and cumulative flood flow at each timestamp; the granularity of the time series is 1 h.
Description of the Tunxi and Changhua dataset
Dataset | Tunxi Basin dataset | Changhua Basin dataset
---|---|---
Type of data | Rainfall and flow | Rainfall and flow
Sampling frequency | 1 record/h | 1 record/h
Number of floods | 134 | 44
Number of records | 23,132 | 9,354
Visualization of all floods in the Tunxi and Changhua basins, China. (a) Tunxi Basin, (b) Changhua Basin.
Data organization
Let $t_{\mathrm{peak}}$ denote the time of the closest flood peak. The prediction target, i.e., the time to the flood peak, is then defined as $y = t_{\mathrm{peak}} - t$, where $t$ is the current time; both $t$ and $t_{\mathrm{peak}}$ are measured in hours. This leads to a feature–target pair $(\mathbf{x}, y)$. Moreover, since it is difficult to predict the flood peak at a very early stage (i.e., many hours before the peak), we only select the data for which the ground-truth time to the flood peak is less than 20 h. This gives us the dataset with raw features.
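A sketch of this target construction for a single flood event; taking the peak as the argmax of the flow series is our simplifying assumption:

```python
import numpy as np

def build_targets(flow, times, horizon=20):
    """For each timestamp, compute the time (in hours) until the flood peak
    and keep only samples whose target is positive and below `horizon` hours.

    flow, times : per-flood hourly series; the peak is taken as the argmax
                  of the flow series (our assumption).
    """
    t_peak = times[int(np.argmax(flow))]
    y = t_peak - times                     # time to the flood peak
    keep = (y > 0) & (y < horizon)
    return times[keep], y[keep]
```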
NUMERICAL RESULTS
In this section, we will present the numerical results for the case studies in the Tunxi and Changhua basins. This section is organized as follows: first, we will introduce and evaluate the feature engineering module in our framework. Then, we discuss the importance of different features. Finally, we will make comparisons between the performances achieved by different methods to demonstrate the benefit of the proposed framework.
Feature engineering
Preprocessing
Based on the data described in Section 3.2 and the methods described in Section 2.2.2, we then perform feature engineering. Multiple informative features are extracted from the raw data and added as inputs to the model. These features fall into two main categories, rainfall and flow, totaling 13 features. Because hydrological variables differ greatly in numerical scale, it is necessary to preprocess the features before model training.
Perform min–max normalization on the input variables to scale the data to the range 0–1, reducing the impact of feature scale on the model results.
Shuffle the order of the dataset to break temporal ordering patterns, which could otherwise lead to overfitting or non-convergence.
Perform train-validation-test splitting using a 6:2:2 ratio.
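The three preprocessing steps can be sketched as follows; the function name and the small constant guarding against constant columns are ours:

```python
import numpy as np

def preprocess(X, y, seed=0):
    """Min-max scale features to [0, 1], shuffle, and split 6:2:2 (sketch)."""
    # Min-max normalization per feature column.
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    # Shuffle features and targets with the same permutation.
    idx = np.random.default_rng(seed).permutation(len(X))
    X, y = X[idx], y[idx]
    # Train/validation/test split with a 6:2:2 ratio.
    n1, n2 = int(0.6 * len(X)), int(0.8 * len(X))
    return (X[:n1], y[:n1]), (X[n1:n2], y[n1:n2]), (X[n2:], y[n2:])
```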
Analysis of the features
For the designed features, we first employ single-factor regression to evaluate their predictive performance and thereby determine which of them are effective. We provide the results for the Tunxi and Changhua datasets in Table 2, where the p-value, RMSE, MAE, and $R^2$ on the test dataset are presented.
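A sketch of the single-factor diagnostics; we report the t-statistic of the slope instead of the exact p-value, using the large-sample rule of thumb that |t| > 1.96 corresponds to p < 0.05 (a normal approximation, our assumption):

```python
import numpy as np

def single_factor_test(x, y):
    """Single-factor regression diagnostics (sketch): slope, R^2, and the
    t-statistic of the slope. For large samples, |t| > ~1.96 roughly
    corresponds to p < 0.05."""
    n = len(x)
    b, a = np.polyfit(x, y, 1)                 # slope, intercept
    resid = y - (a + b * x)
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    se_b = np.sqrt(ss_res / (n - 2) / np.sum((x - x.mean()) ** 2))
    return {"slope": b, "r2": 1 - ss_res / ss_tot, "t": b / se_b}
```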
Analysis of the features for Tunxi Basin, where the important features with p-value ≤ 0.05 are marked in green, and the irrelevant features with p-value > 0.05 are marked in magenta.
RMSE and MAE are measured in hours.
From the results, it can be seen that the p-values for the daily increase are consistently insignificant in both Changhua and Tunxi, implying that this feature should be excluded in the feature integration below. Additionally, the accumulated rainfall, the days of rainfall, and the number of days of consecutive heavy rain all achieved an $R^2$ of around 0.5 in out-of-sample tests, showing that these features have a tangible effect on flood peak prediction. The exponential weighted moving average (EWMA) of rainfall performs slightly better than the average rainfall, indicating that recent rainfall should indeed have a more significant impact on the flood peak; this aligns with intuition, as closer rainfall events are more influential. However, when examining the flow features, although the p-values of each feature are significant, their performance is substantially worse. This suggests that while we are predicting a quantity related to flood flow, relying solely on flow features does not yield good performance; features derived from rainfall data appear more crucial for our task.
Feature redundancy analysis

The percentage of the variance explained vs the number of principal components: (a) Tunxi dataset, China; (b) Changhua dataset, China.
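The explained-variance curve can be reproduced with a small PCA sketch based on the eigendecomposition of the feature covariance matrix (function name is ours):

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of variance explained by each principal component (sketch)."""
    Xc = X - X.mean(axis=0)                    # center the features
    cov = Xc.T @ Xc / (len(Xc) - 1)            # sample covariance matrix
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return eig / eig.sum()
```

If a few leading components already explain most of the variance, the engineered features carry substantial redundancy.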
Results of different machine-learning models
In this part, we examine the performance of the different machine-learning models: the individual models (linear regression, ridge regression, Lasso regression, MLP, and feature-aware MLP) and the ensemble model.
Experiment details: Here, we present the implementation details, including hyperparameters, for all individual models. For the linear regression model, since there is no hyperparameter, we directly use the ordinary least-squares solution. For ridge regression and Lasso regression, we use grid search to tune the coefficients of the $\ell_2$ and $\ell_1$ regularizations, respectively, selecting the values that minimize the validation error. For MLP, we use four layers, including two hidden layers; the first hidden layer has 128 neurons and the second has 64 neurons. For feature-aware MLP, we use three MLPs: one for processing rainfall features, one for processing flood flow features, and a final MLP for prediction. All of these MLPs have three layers, including one hidden layer with 64 neurons. These models are trained using the Adam optimizer with a learning rate of 0.001 for 150 iterations. The ensemble model is a linear regression over the candidate predictions; it has no hyperparameters and is fitted on the validation dataset.
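The grid search over a regularization coefficient can be sketched as follows; the grid values and function names are illustrative, and ridge regression stands in for either regularized model:

```python
import numpy as np

def grid_search_lam(X_tr, y_tr, X_val, y_val, grid=(0.0, 0.01, 0.1, 1.0, 10.0)):
    """Grid search over the regularization coefficient, selecting the value
    with the lowest validation MSE (sketch of the tuning procedure)."""
    def fit(lam):
        # Ridge fit with an intercept column appended to the design matrix.
        A = np.column_stack([X_tr, np.ones(len(X_tr))])
        return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y_tr)

    def val_mse(coef):
        A = np.column_stack([X_val, np.ones(len(X_val))])
        return np.mean((A @ coef - y_val) ** 2)

    return min(grid, key=lambda lam: val_mse(fit(lam)))
```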
The benefit of feature engineering
First, we will investigate the benefits of the developed feature engineering method. We will evaluate the prediction performance of the MLP and linear regression models using both raw and extracted features. The results, summarized in Table 3, clearly demonstrate that using the extracted features significantly enhances prediction performance across all evaluation metrics. For instance, in the Tunxi Basin, applying feature engineering improves performance on the raw data by over 100% when using either the linear regression model or the MLP model. Similarly, significant improvements are observed in the Changhua Basin.
Model performance when using the raw features and extracted features
Basin | Model | Feature | RMSE | MAE | R-square
---|---|---|---|---|---
Tunxi Basin | Linear regression | Raw features | 6.017 | 5.413 | −0.275
Tunxi Basin | Linear regression | Extracted features | 3.205 | 2.522 | 0.691
Tunxi Basin | MLP | Raw features | 4.601 | 4.171 | 0.361
Tunxi Basin | MLP | Extracted features | 2.213 | 1.727 | 0.852
Changhua Basin | Linear regression | Raw features | 3.782 | 2.964 | 0.561
Changhua Basin | Linear regression | Extracted features | 3.025 | 2.851 | 0.613
Changhua Basin | MLP | Raw features | 3.513 | 2.902 | 0.573
Changhua Basin | MLP | Extracted features | 2.997 | 2.737 | 0.661
RMSE and MAE are measured in hours. Bold font indicates the best values compared with the others.
Performance comparison between different models
Under these three metrics, it is evident that performing the model ensemble can significantly enhance the performance of individual models. For instance, in Changhua, the Ensemble Model outperforms others with the lowest RMSE (2.564) and MAE (2.192), and the highest R-square (0.735); in Tunxi, the Ensemble Model also shows the best performance with the lowest RMSE (1.966) and MAE (1.524) and the highest R-square (0.883).
MLP models generally perform better than linear models, although they lack interpretability to some extent. This implies that, for the arrival time prediction task, using the black-box MLP may typically lead to better performance, when provided with the extracted features.
Using ridge regression and Lasso regression performs similarly to the standard linear regression, implying that overfitting or feature redundancy (after the feature cleaning) may not be significant.
Predictions obtained using different machine-learning models for the (a) Tunxi and (b) Changhua datasets. The x-axis represents the real arrival time in hours and the y-axis the predicted arrival time in hours; the closer the points are to the line y = x, the smaller the prediction error.
Performance comparison between different models
Basin | Model | RMSE | MAE | R-square
---|---|---|---|---
Tunxi Basin | Linear regression | 3.205 | 2.522 | 0.691
Tunxi Basin | Ridge regression | 3.348 | 3.044 | 0.662
Tunxi Basin | Lasso regression | 3.343 | 3.038 | 0.663
Tunxi Basin | MLP model | 2.213 | 1.727 | 0.852
Tunxi Basin | Feature-aware MLP | 2.864 | 2.290 | 0.752
Tunxi Basin | Ensemble model | 1.966 | 1.524 | 0.883
Changhua Basin | Linear regression | 3.025 | 2.851 | 0.613
Changhua Basin | Ridge regression | 3.013 | 2.808 | 0.622
Changhua Basin | Lasso regression | 3.005 | 2.787 | 0.629
Changhua Basin | MLP model | 2.997 | 2.737 | 0.661
Changhua Basin | Feature-aware MLP | 3.040 | 2.891 | 0.609
Changhua Basin | Ensemble model | 2.564 | 2.192 | 0.735
RMSE and MAE are measured in hours.
These findings demonstrate the effectiveness of our proposed machine-learning framework and justify the necessity of each module within it.
DISCUSSION
In this section, we would like to discuss the key findings and a number of limitations of our system. In particular, one key finding is that, according to the analysis of the different features in Table 2, the historical rainfall data are more crucial for predicting the flood peak arrival time compared to the historical flood flow. This could be useful for other flood-based prediction tasks. The other key finding is that model ensemble is crucial in real-world forecasting tasks. If there is a better individual prediction model, we are able to incorporate it into our ensemble framework to further improve the performance.
Regarding the limitations, one of them is that our feature engineering method only considers the historical rainfall and flood flow, which is restricted by what the Tunxi and Changhua datasets provide. Another aspect of our framework that can be improved is the design of the machine-learning models. In many real-world scenarios, data can be collected from multiple sources/stations at different locations in the basin, which naturally forms a spatial–temporal data structure (Shao et al. 2023). Our current model does not take the spatial correlation of the data into consideration, so there may still be room for further improvement. Last but not least, our method is data driven; the lack of physical understanding and constraints may lead to unexpected and unreasonable predictions in some extreme cases.
CONCLUSIONS
In this paper, we studied an important flood forecasting problem: predicting the arrival time of flood peaks, which is essential for timely warnings and preparations but has received limited attention. We further developed a comprehensive machine-learning framework with novel designs for accomplishing the forecasting task. Specifically, the framework is built on the combination of an interpretable feature engineering method, individual machine-learning models, and a novel model ensemble technique. This paper further applies the proposed machine-learning framework to predict the flood peak arrival time in the Tunxi and Changhua basins, China. Numerical results show the effectiveness of our forecasting system. Feature engineering improves the MAE in the Tunxi Basin from 5.413 to 2.522 h for the linear model and from 4.171 to 1.727 h for the MLP model. Our ensemble methods outperform baselines, achieving MAEs of 1.524 h in Tunxi and 2.192 h in Changhua, compared to 1.727 and 2.737 h from the best baselines. These advancements enable more effective flood management through proactive measures and timely responses.
We also identify several future directions that can further improve the prediction accuracy. First, it is interesting to discover more informative features if other observations can be obtained, such as temperature, evaporation, and soil moisture content. Then, the machine-learning model can be improved to involve a spatial component, such as graph neural network-based models, to make better use of the spatial–temporal data. Moreover, in order to achieve more reliable and controllable prediction, a future direction is to incorporate physical understanding into the model design, i.e., integrating conventional hydrological models of flood flow into the data-driven machine-learning models.
FUNDING
This research was funded by the Research Projects of the Department of Education of Guangdong Province, China grant number 2023ZDZX1083, and the Research Foundation of Shenzhen Polytechnic University, China under grant number 6023312050K.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.