ABSTRACT
This study proposes a multi-task deep learning model for simultaneous prediction of time-series water levels and flood risk thresholds, aiming to enhance flood forecasting precision. Using AutoKeras, single-task and multi-task models were optimised to predict water levels 10–360 min ahead based on 720 min of prior data. The multi-task model consistently outperformed the single-task model across multiple evaluation metrics, including correlation coefficients, root mean squared error, Nash–Sutcliffe efficiency, and Kling–Gupta efficiency scores. Real-time prediction tests on actual rainfall events further validated the multi-task model's improved accuracy and applicability in operational flood forecasting. The study demonstrates significant progress in flood prediction methodologies, offering a more comprehensive approach to forecasting and categorising flood incidents.
HIGHLIGHTS
A multi-task deep learning model enhances flood forecasting precision.
AutoKeras optimises single-task and multi-task models for water level prediction.
The multi-task approach outperformed the single-task approach in terms of correlation, RMSE, Nash–Sutcliffe efficiency, and Kling–Gupta efficiency scores.
Real-time testing validates improved accuracy for operational flood forecasting.
Comprehensive methodology advances flood prediction and categorisation.
INTRODUCTION
Accurate and prompt prediction of river floods is crucial for efficient flood risk management and disaster mitigation strategies. The capacity to anticipate flood occurrences with a notable level of accuracy is essential to diminish the severe socioeconomic consequences. Given the escalating frequency and severity of extreme weather incidents due to climate change, the demand for reliable flood prediction systems has become increasingly urgent.
Recent technological progress, exemplified by the fusion of deep learning algorithms with conventional hydrological approaches, has notably enhanced the efficacy of flood prediction systems. Studies in hydrological forecasting have debated the respective roles of modelling and forecasting approaches in predicting water levels and managing flood risks. As Herath et al. (2021) highlighted, the distinction between modelling and forecasting is critical in hydrological applications, where data-driven methods, such as genetic programming or artificial neural networks, can be effective for short-term predictions but may benefit from the inclusion of physical models to enhance accuracy. For example, Chadalawada et al. (2020) demonstrated the effectiveness of Gaussian process-based machine learning in hydrological applications, specifically for rainfall-runoff modelling. This approach highlights the potential of data-driven techniques to improve prediction accuracy by automatically generating models based on observed data without the need for a detailed physical representation.
This study adopts a data-driven approach focused on forecasting water levels and issuing flood warnings using a multitasked deep learning model optimised for short-term predictions. By leveraging historical water level and rainfall data, the proposed method aims to improve the operational efficiency of flood-forecasting systems without the need for detailed physical modelling. Conventional hydrological techniques employed in flood forecasting frequently encounter challenges stemming from their dependence on physical models, which may be constrained by the intricate nature and fluctuations of meteorological and hydrological phenomena (Amarilla et al. 2023). These models require a significant amount of data, which is a resource that may not always be readily available, particularly in areas with limited monitoring systems. This lack of data can potentially compromise the precision of predictions when these models are used under unfamiliar or fluctuating circumstances, ultimately resulting in suboptimal flood risk management strategies (Zhao et al. 2023).
Furthermore, in environments that are constantly changing owing to rapid hydrological variations, traditional models may prove inadequate for adapting instantaneously. Swift changes in flow rates and land-use patterns can lead to decreased effectiveness of current models, thereby causing delays or inaccuracies in flood forecasts (Trošelj et al. 2023). Hence, the amalgamation of real-time data and machine learning techniques with traditional forecasting methods can significantly enhance their predictive capabilities.
Forecasting the water level of rivers using deep learning techniques has experienced noteworthy progress and a wide range of practical uses, thereby improving flood management. Most deep learning models found in hydrology literature have primarily focused on single-task training, aiming to predict a single variable, such as streamflow or water level (Hu et al. 2018; Bowes et al. 2019; Kratzert et al. 2019; Ng et al. 2023). The utilisation of multi-task models has demonstrated enhanced generalisation and reduced overfitting, as evidenced in various fields such as natural language processing and computer vision (Seltzer & Droppo 2013; Chen et al. 2014; Girshick 2015; Ruder 2017).
Within hydrological studies, numerous variables exhibit interrelations, highlighting the potential benefits of a multi-task model in exploiting information from correlated variables for improved predictions. Specifically, by training multiple variables governed by common physical processes, a multi-task deep learning model can effectively capture shared hydrologic mechanisms, resulting in more accurate predictions of target hydrologic variables (Sadler et al. 2022).
Multi-task modelling presents itself as a potential strategic approach aimed at alleviating a particular obstacle encountered by deep learning models, which pertains to the necessity of possessing extensive observational datasets for model training. This challenge is especially pronounced in the realm of Earth sciences, where the scarcity of data is a prevalent issue owing to the elevated expenses associated with the collection of such data. By engaging in multi-task learning, it is possible to effectively harness the information originating from one specific variable to bolster the predictive precision pertaining to another interconnected variable, particularly in cases where the latter variable is inadequately represented within the existing dataset. Multi-task deep learning (MTDL) has shown significant promise in hydrological applications by leveraging the interdependencies between various hydrological variables to improve prediction accuracy and model robustness. For instance, integrating spatial information and multi-task learning into long short-term memory networks has enhanced the performance of hydrological models in simulating runoff and actual evaporation, with Nash–Sutcliffe efficiency coefficients (NSEs) ≥ 0.82 and 0.95, respectively (Li et al. 2023). In another study, modelling the daily average streamflow and stream water temperature together using MTDL improved the prediction accuracy for 56/101 sites across the United States, although the benefits varied depending on the site and model configuration (Sadler et al. 2022). In satellite-based precipitation estimation, a novel MTDL framework that simultaneously trains rain/no-rain classification and rain-rate regression tasks outperformed conventional single-task models, highlighting the efficiency of knowledge transfer between tasks (Bannai et al. 2023). These studies collectively underscore the potential of multitasked deep learning in enhancing the accuracy, robustness, and interpretability of hydrological models by effectively leveraging interdependencies.
Flood warnings and alerts are issued to facilitate citizen evacuation and to restrict river access before a flood occurs. Although predicting the time series of water levels is crucial for determining when to issue warnings, it is equally important to accurately predict whether water levels will exceed the established warning and alert criteria. Therefore, this study proposes a flood prediction methodology that can be effectively utilised in operational flood forecasting by developing a multi-task deep learning model. This model aims to simultaneously predict time-series water levels and determine whether flood warning and alert thresholds will be exceeded, in contrast to traditional single-task deep learning models that focus solely on time-series water level prediction.
In this study, the term ‘forecasting’ was used to describe the short-term prediction of water levels for operational flood warning purposes. The developed model is not intended to represent physical hydrological processes but rather to predict future water levels and flood risks based on observed data.
To achieve this objective, we employed AutoKeras, an open-source library that supports the automatic design and optimisation of deep learning models. This approach was chosen to minimise the influence of trial-and-error methods and human experience, which significantly affect the model structure composition and hyperparameter optimisation. This research focused on comparing the differences between single- and multi-task applications rather than optimising the deep learning model for prediction.
Furthermore, we evaluated the prediction accuracy of each model by performing real-time predictions for actual rainfall events and examined the applicability of the multi-task model in flood forecasting and warning operations.
MATERIALS AND METHODS
Study area
Data
Data collection
As shown in Figures 2 and 3, the rainfall and water level data collected from 1 July 2019 to 31 October 2021 were used for training the deep learning model, while the data from 1 April to 31 October 2022 were used for testing. In South Korea, the monsoon season (May–October) is the primary period when significant flood events occur, and flood forecasting and warning operations are primarily concentrated during this period. Therefore, only data from these months, where meaningful hydrological activity is present, were selected for training and testing, while data from dry or low water level periods were excluded as they contribute little to the model's ability to forecast floods. Additionally, the training data begin on 1 July 2019, as this is the earliest point at which all three water level observation stations used as model inputs became operational, ensuring consistent and comprehensive input data for model development.
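As a minimal sketch of this split, assuming the observations are held in a pandas DataFrame `df` with a 10-min DatetimeIndex (the variable names are illustrative, not taken from the study's code):

```python
import pandas as pd

# Assumed: `df` holds the rainfall and three water level series on a 10-min DatetimeIndex.
train = df.loc["2019-07-01":"2021-10-31"]
test = df.loc["2022-04-01":"2022-10-31"]

# Keep only the monsoon-season months (May-October) that carry meaningful flood activity.
train = train[train.index.month.isin(range(5, 11))]
test = test[test.index.month.isin(range(5, 11))]
```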
Data processing for training and validation
The model was designed to use only past water level observations from the target station and nearby stations, as well as rainfall data. Specifically, to predict water levels up to 6 h in the future, the input features consisted solely of historical observations from the three stations and rainfall up to the time of prediction. The model does not use any future data from the target station as input. This approach ensures that the model does not access future information and adheres to strict predictive modelling standards, thereby avoiding any form of data leakage.
The data processing procedure can be described in detail as follows. This process begins with data cleaning and normalisation. We employed a data imputation technique to handle missing values in the dataset. Specifically, we used a median imputation method for the numeric columns. This approach replaces missing values with the median of each column to ensure data continuity while minimising the impact of potential outliers. The data were standardised using Scikit-learn's StandardScaler to ensure that the features had a mean of 0 and unit variance, which is a common preprocessing step in machine learning.
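A minimal sketch of this cleaning step, assuming the training observations sit in a pandas DataFrame as above (column selection and variable names are illustrative):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

numeric_cols = train.select_dtypes(include="number").columns

# Median imputation: fill gaps with each column's median to limit the influence of outliers.
train[numeric_cols] = train[numeric_cols].fillna(train[numeric_cols].median())

# Standardise to zero mean and unit variance; the scaler is fitted on the training data only.
scaler = StandardScaler()
train_scaled = pd.DataFrame(
    scaler.fit_transform(train[numeric_cols]),
    index=train.index,
    columns=numeric_cols,
)
```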
Time-series data for sequence-to-sequence prediction were prepared using the sliding-window approach. We created input–output pairs using this method, where each input sequence consisted of 72 time steps (input_steps) encompassing all features (rainfall and water level data for the three stations). Correspondingly, each output sequence contained 36 time steps (output_steps) for the target variable (water level at Station 3). The sliding-window technique was applied across the entire dataset by incrementally shifting the window to create numerous overlapping sequences.
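The windowing step can be sketched as follows, with the 72-step input and 36-step output lengths taken from the text and the array layout assumed:

```python
import numpy as np

INPUT_STEPS, OUTPUT_STEPS = 72, 36   # 720 min of inputs, 360 min of outputs at 10-min resolution

def make_windows(features: np.ndarray, target: np.ndarray):
    """Slide a window over the series to build sequence-to-sequence pairs.

    features: shape (T, 4) -- rainfall plus the three water level series.
    target:   shape (T,)   -- water level at Station 3.
    """
    X, Y = [], []
    for start in range(len(features) - INPUT_STEPS - OUTPUT_STEPS + 1):
        X.append(features[start:start + INPUT_STEPS])
        Y.append(target[start + INPUT_STEPS:start + INPUT_STEPS + OUTPUT_STEPS])
    return np.asarray(X), np.asarray(Y)
```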
In the constructed dataset, periods without rainfall and consequent water level changes were more prevalent than those with rainfall-induced water level rise. If all the data were used for training, the model would be optimised for predicting low water level increases, potentially compromising its performance during significant flood events. To address this imbalance and enhance the model performance, particularly in predicting peak flows and rising water levels, a filtering mechanism was implemented. We applied two key criteria: the maximum rainfall (RAIN) within the input window should be ≥ 0.01, and the maximum water level (Station 3) within the output window should be ≥ Elevation Level (E.L.) 2. This filtering ensured that the model was trained on data periods that included rainfall and consequential water level changes, thereby potentially improving its predictive capability for flood events.
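Continuing the sketch above, the filtering step might look like the following; the position of the RAIN column and the use of unscaled values for the threshold checks are our assumptions:

```python
# Windows are built from the unscaled series so that the physical thresholds stay meaningful.
X_raw, Y_raw = make_windows(raw_features, raw_target)

RAIN_COL = 0                                   # assumed index of the RAIN feature
keep = (X_raw[:, :, RAIN_COL].max(axis=1) >= 0.01) & (Y_raw.max(axis=1) >= 2.0)  # E.L. 2
X_windows, Y_windows = X_raw[keep], Y_raw[keep]
```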
For the classification of the multi-task model, a sophisticated labelling scheme was devised based on predefined water level thresholds. We categorised the maximum predicted water level into five classes (0–4), corresponding to different severity levels of flooding.
(0) Normal: No flood risk (< E.L. 5.0 m)
(1) Concern: Elevated water levels, requiring increased monitoring (≥ E.L. 5.0 m)
(2) Caution: Potential for minor flooding, increased vigilance necessary (≥ E.L. 8.0 m)
(3) Alert: Significant flood risk, preparation for flood mitigation measures required (≥ E.L. 9.5 m)
(4) Severe: Severe flooding expected, immediate action necessary (≥ E.L. 11.8 m)
The criteria for categorising the maximum predicted water levels were based on the thresholds defined by the Korea Flood Control Office. These thresholds are designed to operate within a four-stage flood warning system – Concern, Caution, Alert, and Severe – based on meteorological and hydrological conditions. Each stage corresponds to a predefined water level threshold at each observation station.
These integer labels are then converted into a one-hot encoded format, preparing them for use in multiclass classification models. The resulting preprocessed dataset exhibited specific characteristics tailored to both regression and classification tasks for water level prediction. By integrating both continuous (regression) and categorical (classification) outputs, the model was designed to provide precise water level forecasts and broader flood risk assessments, enhancing its utility for flood management and early warning systems.
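A minimal sketch of this labelling step, using the thresholds listed above (the helper name and the variable `Y_windows` from the earlier sketch are illustrative):

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

# Warning-stage thresholds (E.L., m): Concern, Caution, Alert, Severe.
THRESHOLDS = np.array([5.0, 8.0, 9.5, 11.8])

def flood_class(output_window: np.ndarray) -> int:
    """Map the maximum water level of an output window to a class in 0-4."""
    return int(np.searchsorted(THRESHOLDS, output_window.max(), side="right"))

labels = np.array([flood_class(y) for y in Y_windows])
onehot_labels = to_categorical(labels, num_classes=5)
```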
Multi-task model
Time-series water level predictions, which are the output of flood forecasting models, are used for issuing flood warnings. Flood warnings are issued when the maximum predicted water level exceeds the predefined forecast or warning thresholds at specific flood forecast points. However, single-task time-series prediction models have limitations in accurately predicting peak water levels because they are trained to optimise the average error over the entire prediction period.
To enhance the performance of flood warning issuance, a multi-task model is necessary. This model should not only predict time-series water levels but also determine whether warning thresholds will be exceeded based on observational data collected up to the present. Multi-task learning has been used to process various tasks with a single model. Rather than maintaining separate models for tasks such as water level time-series prediction and flood warning threshold exceedance prediction, using a single model for multiple tasks offers several advantages: one forward propagation, one backpropagation, and a lower parameter count.
These benefits ensure real-time operational efficiency. Furthermore, when multiple tasks are interrelated, learning them together can improve overall performance (Vandenhende et al. 2021).
A multi-task deep learning model was designed to perform multiple tasks simultaneously, leveraging shared information across tasks to improve overall performance. This approach is in contrast to single-task models, which focus on predicting only one variable at a time. Multi-task models can generalise better and reduce overfitting by learning from multiple related tasks.
Multi-task models can be particularly useful in scenarios where data are sparse or expensive to collect, as they can leverage information from one variable to improve the prediction of another related variable. When trained on variables driven by the same underlying physical processes, these models can better represent the shared processes, leading to more accurate predictions.
Model configuration, training and validation
Model configuration using AutoKeras
In this study, we utilised AutoKeras to automatically determine the optimal neural network architecture and optimise both single- and multi-task models. The primary objective was to apply a multitasked deep learning approach to improve the accuracy of river water level time series predictions and flood occurrence forecasting. Using AutoKeras, we aimed to automate the architecture search and optimisation process, thus minimising potential human biases and manual intervention. This allowed for a more objective comparison of the performances of the single-task and multi-task models.
AutoKeras, developed by the DATA Lab at Texas A&M University (Jin et al. 2019), is implemented in Python 3 and built on the Keras library. The tool automates tasks such as hyperparameter tuning and model optimisation, allowing users to quickly adapt machine learning models to a variety of tasks. Its architecture search process utilises neural architecture search and Bayesian optimisation techniques: the process starts by defining an initial architecture and then generates and modifies candidate architectures to find the optimal model for the given dataset and task.
The strength of AutoKeras lies in its ability to deliver high-performance deep learning networks with minimal user intervention. It automates crucial processes, such as feature engineering (including mining, selection, and construction) and network configuration (encompassing hyperparameter selection and fine-tuning). This automation streamlines the complex and time-consuming process required to develop optimal deep learning models (Perez 2019; Alaiad et al. 2023).
Model training and validation for the single-task model
The model architecture search was guided by the mean absolute error (MAE) as the loss function, which was used to minimise absolute errors during both training and validation. Furthermore, the NSE was incorporated as a custom metric to evaluate the model performance during both training and validation, specifically in the context of hydrological forecasting. This dual approach ensured that the model was optimised using the MAE for general error minimisation, whereas the NSE provided a domain-specific assessment of hydrological prediction accuracy. To balance the thorough exploration of the model architectures and computational efficiency, we constrained the search space to a maximum of 50 trials. The optimal single-task model, as determined by AutoKeras, implemented a regression model with the following hyperparameters: a dropout rate of 0.5 for the regression head, stochastic gradient descent (SGD) optimiser, and a learning rate of 0.01 (as shown in Table 1).
| Category | Parameter/metric | Value |
|---|---|---|
| Hyperparameters | regression_head_1/dropout | 0.5 |
| | optimizer | SGD |
| | learning rate | 0.01 |
| Training metrics | loss | 0.2300 |
| | nse | 0.7315 |
| Validation metrics | val_loss | 0.1773 |
| | val_nse | 0.9646 |
| Trial info | Trial ID | 3 |
| | Best step | 98 |
| | Score | 0.1773 |
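For illustration, the single-task search summarised in Table 1 could be configured along the following lines; the custom NSE metric, the DenseBlock placeholder, and the pre-flattened inputs are our assumptions rather than the authors' released code:

```python
import autokeras as ak
import tensorflow as tf

def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency as a Keras-compatible metric (higher is better)."""
    residual = tf.reduce_sum(tf.square(y_true - y_pred))
    variance = tf.reduce_sum(tf.square(y_true - tf.reduce_mean(y_true)))
    return 1.0 - residual / (variance + tf.keras.backend.epsilon())

input_node = ak.Input()
hidden = ak.DenseBlock()(input_node)               # architecture details left to the search
output_node = ak.RegressionHead(
    output_dim=36,                                 # 36 future water level values
    loss="mean_absolute_error",
    metrics=[nse],
)(hidden)

single_task = ak.AutoModel(
    inputs=input_node,
    outputs=output_node,
    objective="val_loss",
    max_trials=50,                                 # search budget used in this study
    overwrite=True,
)
# X_flat: filtered input windows reshaped to (n_samples, 288); Y_windows: (n_samples, 36)
# single_task.fit(X_flat, Y_windows, validation_split=0.2, epochs=100)
```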
The model was trained for 98 epochs, and the best performance was achieved in the final epoch. The NSE was employed as the primary performance metric, given its widespread use in hydrological modelling. The model demonstrated a substantial improvement from the training to the validation phase, with the NSE increasing from 0.7315 to 0.9646.
Model training and validation for the multi-task model
We implemented a multi-task learning model that simultaneously addressed regression and classification tasks. The model architecture incorporated the following hyperparameters: a dropout rate of 0.25 applied to the classification head to enhance regularisation, a dropout rate of 0.0 applied to the regression head (effectively disabling dropout for that task), and a 'flatten' spatial reduction type for the classification head. The Adam optimiser was employed with a learning rate of 0.001. The model achieved optimal performance at epoch 34, as shown in Table 2. The NSE was utilised as the primary metric for the regression task, whereas accuracy was employed for the classification task.
| Category | Parameter/metric | Value |
|---|---|---|
| Hyperparameters | regression_head_1/dropout | 0.0 |
| | classification_head_1/spatial_reduction_1/reduction_type | Flatten |
| | classification_head_1/dropout | 0.25 |
| | optimizer | Adam |
| | learning_rate | 0.001 |
| Training metrics | loss | 0.2355 |
| | regression_head_1_loss | 0.1359 |
| | classification_head_1_loss | 0.0996 |
| | regression_head_1_nse | 0.9330 |
| | classification_head_1_accuracy | 0.9744 |
| Validation metrics | val_loss | 0.1780 |
| | val_regression_head_1_loss | 0.1255 |
| | val_classification_head_1_loss | 0.0525 |
| | val_regression_head_1_nse | 0.9787 |
| | val_classification_head_1_accuracy | 0.9841 |
| Trial info | Trial ID | 19 |
| | Best step | 34 |
| | Score | 0.1780 |
The regression task exhibited high NSE values that increased from 0.9330 during training to 0.9787 during validation. Concurrently, the classification task achieved remarkable accuracy, which increased from 97.44% during training to 98.41% during validation. This high accuracy coupled with the improvement in the validation set suggests that the model effectively learned to categorise the data without overfitting.
The total loss decreased from 0.2355 during training to 0.1780 during validation, with both the regression and classification components of the loss function showing an improvement.
The preprocessed dataset, consisting of input features (X), regression targets (Y), and classification labels (categorical labels), was partitioned into training and validation sets in an 80:20 ratio. The multi-task deep learning model was configured using AutoKeras to simultaneously address the regression and classification tasks for water level prediction. The model architecture was designed with a shared input node for both tasks to facilitate the learning of common features. Two distinct output heads were implemented: a regression head using MAE as the loss function and incorporating an NSE metric, and a classification head employing categorical cross-entropy loss and an accuracy metric. To balance the exploration of model architectures with computational efficiency, the AutoKeras search space was constrained to a maximum of 50 trials. The training process was structured with a maximum of 100 epochs and incorporated task-specific early stopping to prevent overfitting and optimise resource utilisation: early stopping for the regression task monitored the validation regression loss, whereas early stopping for the classification task monitored validation accuracy.
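A minimal sketch of this two-head configuration, reusing the `nse` metric from the earlier sketch; the monitored quantities follow the metric names reported in Table 2, while the DenseBlock placeholder and patience values are illustrative assumptions:

```python
import autokeras as ak
import tensorflow as tf

input_node = ak.Input()
shared = ak.DenseBlock()(input_node)               # shared representation for both tasks

regression_out = ak.RegressionHead(
    output_dim=36, loss="mean_absolute_error", metrics=[nse]
)(shared)
classification_out = ak.ClassificationHead(
    num_classes=5, loss="categorical_crossentropy", metrics=["accuracy"]
)(shared)

multi_task = ak.AutoModel(
    inputs=input_node,
    outputs=[regression_out, classification_out],
    max_trials=50,
    overwrite=True,
)

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_regression_head_1_loss", patience=10),
    tf.keras.callbacks.EarlyStopping(
        monitor="val_classification_head_1_accuracy", mode="max", patience=10
    ),
]
# multi_task.fit(X_flat, [Y_windows, onehot_labels], validation_split=0.2,
#                epochs=100, callbacks=callbacks)
```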
For feature extraction, two parallel Flatten layers transform the 3D input into a 1D vector of 288 features (72 × 4). This approach allows the model to capture temporal dependencies while keeping the number of parameters relatively low. The model then branches into task-specific paths. The classification branch applies a dropout layer to the flattened features, which helps prevent overfitting, followed by a dense layer with five units corresponding to the number of flood severity classes; a final softmax activation produces class probabilities. The regression branch consists of a single dense layer with 36 units connected directly to the flattened input, producing water level predictions for 36 time steps ahead.
The model has two output heads: a softmax layer for classification, outputting probabilities for the five flood severity classes, and a linear output layer for regression, predicting 36 future water level values. In terms of complexity, the model contains 11,849 trainable parameters.
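The final architecture can be reconstructed in plain Keras as sketched below; the text describes two parallel Flatten layers, whereas this sketch shares a single Flatten, which yields the same 11,849 trainable parameters (288 × 36 + 36 for regression, 288 × 5 + 5 for classification):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(72, 4))             # 72 time steps x 4 features
flat = layers.Flatten()(inputs)                    # 288-dimensional shared representation

# Regression branch: direct linear mapping to the 36 future water levels.
regression = layers.Dense(36, name="regression_head")(flat)

# Classification branch: dropout for regularisation, then 5 flood severity classes.
dropped = layers.Dropout(0.25)(flat)
classification = layers.Dense(5, activation="softmax", name="classification_head")(dropped)

model = tf.keras.Model(inputs, [regression, classification])
model.summary()                                    # 11,849 trainable parameters in total
```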
The optimised model structure exhibited several characteristics. The use of parallel flattened layers suggests that the model leverages a shared representation of the input data for both tasks, potentially allowing beneficial information transfer between the regression and classification objectives. The inclusion of a dropout layer only in the classification branch indicated that the optimisation process identified different regularisation requirements for each task. Architectural simplicity, evident in the absence of complex convolutional or recurrent layers, suggests that, for this specific problem, a simpler architecture is sufficient, potentially leading to faster inference times and easier interpretability. The direct connection of the regression branch from the flattened input to the output suggests that the model learned the direct mapping from the input time series to the predicted water levels.
The learning curves for both the single-task and multi-task models displayed inconsistencies, especially during early epochs, which constitutes a typical occurrence in deep learning frameworks subjected to highly variable hydrological time-series datasets. This variability, particularly in rainfall and water levels, can cause fluctuations in the learning processes. Additionally, the automated architecture search process in AutoKeras introduced variations in the training behaviour as it explored different model configurations. Despite these initial fluctuations, both models demonstrated stable convergence in the later stages of training, indicating successful optimisation.
One constraint inherent to this study was the limited dataset used for training and validation. The data comprised approximately 3 years of rainfall and water level observations, focusing mainly on the rainy seasons. This limited temporal scope may have led to reduced variability in the validation set, making the prediction task less challenging during validation than during training. Consequently, the validation loss was consistently lower than the training loss, which was particularly evident in the single-task model. Future research should aim to expand the dataset to encompass a wider range of hydrological conditions, including more diverse seasonal data and extended timeframes to improve the representativeness of the training and validation datasets.
RESULTS
In this study, we evaluated the accuracy of the water level time series and flood risk predictions using pre-trained single- and multi-task models on the test dataset, as shown in Figure 3. The predicted results were compared with actual observed water levels to assess the performance of the models. To execute this evaluation, we utilised 72 time steps of rainfall and water level data from three stations, spanning the 720 min prior to the current time. Using this input, we predicted the water levels and flood risk levels for the target water level station (Hangtangdaegyo) 10–360 min ahead.
In this study, we analysed the water level prediction results for four representative rainfall events during the test period: Event 1 (31 July 2022, 22:30–23:30), Event 2 (8 August 2022, 10:20–11:30), Event 3 (9 August 2022, 17:10–19:30), and Event 4 (5 September 2022, 21:30–23:50). The detailed results are presented in Figures S1–S4.
This study compared the performance of single- and multi-task models for predicting river water levels at 10-min intervals with a lead time of up to 360 min. The models were evaluated using four metrics (correlation coefficient, RMSE, NSE, and Kling–Gupta efficiency (KGE)) for the four flood events, as shown in Table 3 (Chadalawada & Babovic 2019). The multi-task model consistently achieved higher correlation coefficients, particularly for the 31 July and 8 August events. This indicates a stronger linear relationship between the predicted and observed water levels, suggesting improved prediction accuracy. In most cases, the multi-task model exhibited lower RMSE values than the single-task model. This trend was particularly pronounced for the 8 August and 9 August events, indicating that the multi-task model's predictions deviated less from the observed values. Although both models exhibited negative NSE values in some instances, the multi-task model generally achieved higher (more positive) NSE scores. This was particularly noticeable for the 8 August event, where the multi-task model consistently outperformed the single-task model in terms of NSE.
| Event | Prediction time | Correlation coefficient (single) | RMSE (single) | NSE (single) | KGE (single) | Correlation coefficient (multi) | RMSE (multi) | NSE (multi) | KGE (multi) |
|---|---|---|---|---|---|---|---|---|---|
| 31 July 2022 | 22:20 | 0.912 | 1.123 | −1.762 | 0.097 | 0.958 | 1.166 | −1.977 | 0.216 |
| | 22:30 | 0.962 | 1.081 | −1.679 | 0.134 | 0.916 | 1.123 | −1.891 | 0.256 |
| | 22:40 | 0.952 | 1.058 | −1.729 | 0.165 | 0.940 | 1.025 | −1.560 | 0.317 |
| | 22:50 | 0.967 | 1.021 | −1.720 | 0.218 | 0.924 | 0.970 | −1.455 | 0.325 |
| | 23:00 | 0.964 | 0.980 | −1.695 | 0.249 | 0.912 | 0.893 | −1.237 | 0.332 |
| | 23:10 | 0.956 | 0.962 | −1.813 | 0.291 | 0.907 | 0.884 | −1.376 | 0.361 |
| | 23:20 | 0.961 | 0.935 | −1.979 | 0.357 | 0.942 | 0.793 | −1.141 | 0.429 |
| | 23:30 | 0.965 | 0.925 | −2.095 | 0.374 | 0.894 | 0.836 | −1.529 | 0.418 |
| | 23:40 | 0.975 | 0.918 | −2.247 | 0.438 | 0.897 | 0.817 | −1.572 | 0.468 |
| | 23:50 | 0.964 | 0.916 | −2.415 | 0.436 | 0.871 | 0.822 | −1.751 | 0.357 |
| 8 August 2022 | 10:20 | 0.959 | 0.587 | 0.108 | 0.371 | 0.952 | 0.465 | 0.441 | 0.396 |
| | 10:30 | 0.957 | 0.571 | 0.015 | 0.404 | 0.928 | 0.475 | 0.319 | 0.386 |
| | 10:40 | 0.968 | 0.552 | −0.075 | 0.437 | 0.934 | 0.419 | 0.378 | 0.439 |
| | 10:50 | 0.961 | 0.445 | 0.181 | 0.600 | 0.951 | 0.238 | 0.766 | 0.630 |
| | 11:00 | 0.961 | 0.363 | 0.362 | 0.689 | 0.949 | 0.175 | 0.851 | 0.724 |
| | 11:10 | 0.977 | 0.272 | 0.590 | 0.809 | 0.956 | 0.229 | 0.710 | 0.825 |
| | 11:20 | 0.965 | 0.265 | 0.571 | 0.837 | 0.931 | 0.218 | 0.710 | 0.721 |
| | 11:30 | 0.964 | 0.215 | 0.689 | 0.869 | 0.910 | 0.285 | 0.456 | 0.753 |
| | 11:40 | 0.948 | 0.246 | 0.554 | 0.861 | 0.873 | 0.257 | 0.512 | 0.681 |
| | 11:50 | 0.945 | 0.270 | 0.381 | 0.833 | 0.837 | 0.232 | 0.543 | 0.695 |
| 9 August 2022 | 17:10 | 0.901 | 0.433 | 0.090 | 0.405 | 0.806 | 0.483 | −0.130 | 0.438 |
| | 17:20 | 0.935 | 0.403 | 0.213 | 0.387 | 0.781 | 0.450 | 0.017 | 0.398 |
| | 17:30 | 0.908 | 0.402 | 0.215 | 0.376 | 0.802 | 0.440 | 0.057 | 0.406 |
| | 17:40 | 0.930 | 0.375 | 0.301 | 0.357 | 0.785 | 0.403 | 0.192 | 0.388 |
| | 17:50 | 0.904 | 0.369 | 0.304 | 0.343 | 0.813 | 0.389 | 0.228 | 0.428 |
| | 18:00 | 0.920 | 0.352 | 0.349 | 0.358 | 0.842 | 0.370 | 0.281 | 0.471 |
| | 18:10 | 0.893 | 0.337 | 0.378 | 0.333 | 0.870 | 0.358 | 0.300 | 0.541 |
| | 18:20 | 0.900 | 0.329 | 0.380 | 0.336 | 0.897 | 0.355 | 0.279 | 0.599 |
| | 18:30 | 0.893 | 0.312 | 0.410 | 0.315 | 0.918 | 0.348 | 0.267 | 0.660 |
| | 18:40 | 0.903 | 0.296 | 0.439 | 0.331 | 0.915 | 0.351 | 0.207 | 0.681 |
| | 18:50 | 0.913 | 0.282 | 0.447 | 0.318 | 0.918 | 0.296 | 0.393 | 0.675 |
| | 19:00 | 0.894 | 0.260 | 0.488 | 0.308 | 0.915 | 0.297 | 0.334 | 0.701 |
| | 19:10 | 0.871 | 0.254 | 0.463 | 0.297 | 0.917 | 0.247 | 0.489 | 0.717 |
| | 19:20 | 0.825 | 0.243 | 0.445 | 0.301 | 0.885 | 0.236 | 0.476 | 0.706 |
| | 19:30 | 0.843 | 0.216 | 0.503 | 0.297 | 0.899 | 0.216 | 0.504 | 0.767 |
| 5 September 2022 | 21:30 | 0.754 | 0.342 | 0.328 | 0.097 | 0.844 | 0.377 | 0.184 | 0.432 |
| | 21:40 | 0.713 | 0.342 | 0.305 | 0.113 | 0.794 | 0.368 | 0.196 | 0.403 |
| | 21:50 | 0.783 | 0.320 | 0.364 | 0.141 | 0.840 | 0.353 | 0.226 | 0.470 |
| | 22:00 | 0.796 | 0.304 | 0.396 | 0.131 | 0.867 | 0.352 | 0.192 | 0.514 |
| | 22:10 | 0.754 | 0.304 | 0.369 | 0.152 | 0.893 | 0.335 | 0.234 | 0.544 |
| | 22:20 | 0.762 | 0.294 | 0.383 | 0.137 | 0.859 | 0.337 | 0.191 | 0.524 |
| | 22:30 | 0.757 | 0.286 | 0.382 | 0.153 | 0.822 | 0.308 | 0.282 | 0.460 |
| | 22:40 | 0.846 | 0.257 | 0.464 | 0.191 | 0.843 | 0.298 | 0.277 | 0.511 |
| | 22:50 | 0.766 | 0.253 | 0.423 | 0.172 | 0.808 | 0.274 | 0.321 | 0.523 |
| | 23:00 | 0.811 | 0.228 | 0.483 | 0.206 | 0.850 | 0.251 | 0.377 | 0.591 |
| | 23:10 | 0.778 | 0.222 | 0.453 | 0.202 | 0.775 | 0.228 | 0.422 | 0.534 |
| | 23:20 | 0.753 | 0.216 | 0.427 | 0.247 | 0.791 | 0.205 | 0.482 | 0.555 |
| | 23:30 | 0.797 | 0.195 | 0.462 | 0.223 | 0.819 | 0.186 | 0.507 | 0.621 |
| | 23:40 | 0.767 | 0.178 | 0.474 | 0.248 | 0.801 | 0.188 | 0.417 | 0.655 |
| | 23:50 | 0.797 | 0.159 | 0.533 | 0.257 | 0.777 | 0.186 | 0.362 | 0.633 |
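For reference, the four scores reported in Table 3 can be computed for a single issued forecast as sketched below; the original Kling–Gupta formulation is assumed:

```python
import numpy as np

def forecast_scores(obs, sim):
    """Correlation coefficient, RMSE, NSE and KGE for one issued forecast."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    nse = 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
    alpha = sim.std() / obs.std()                  # variability ratio
    beta = sim.mean() / obs.mean()                 # bias ratio
    kge = 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
    return {"r": r, "RMSE": rmse, "NSE": nse, "KGE": kge}
```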
The comparative analysis of the single-task and multi-task models evaluated using the KGE revealed clear differences in their respective performances across the flood events. The multi-task model demonstrated superior consistency compared with the single-task model, generally achieving higher KGE values across all observed time periods. On 8 August 2022, although both models reached similar peak KGE values (0.837 for the single-task and 0.825 for the multi-task model), the multi-task model exhibited greater stability in its predictions over time, with fewer fluctuations than the single-task model. The multi-task model also showed marked improvement in handling complex and rapidly changing flood scenarios. On 9 August 2022 at 19:30, the multi-task model achieved a KGE value of 0.767, compared with 0.297 for the single-task model. Similarly, on 5 September 2022, the multi-task model demonstrated superior performance, with KGE values ranging from 0.403 to 0.655, compared with the single-task model's range of 0.097–0.257.
The multi-task model consistently exhibited superior performance across different time points and flood events.
In this study, the flood prediction performances of the single- and multi-task models were evaluated based on their ability to predict the occurrence of a flood within the next 6 h. Table 4 presents a contingency table comparing the performance of each model in predicting flood occurrences across the 4,428 cases of the test period from 1 April to 31 October 2022.
| Model | Predicted | Observed: Yes (flood) | Observed: No (no flood) |
|---|---|---|---|
| Single-task model | Yes (flood) | 192 (hits) | 90 (false alarms) |
| | No (no flood) | 49 (misses) | 4,097 (correct negatives) |
| Multi-task model | Yes (flood) | 211 (hits) | 107 (false alarms) |
| | No (no flood) | 30 (misses) | 4,080 (correct negatives) |
The analysis in Table 4 provides crucial insights into the capabilities of the models in terms of correctly identifying flood occurrences, false alarms, and missed events. Compared with the single-task model, the multi-task model correctly identified 19 additional flood events, a 9.9% improvement in flood detection, and reduced the number of missed flood occurrences by 38.8% (19 fewer events). However, it increased the number of false alarms by 18.9% (17 more) and slightly decreased the number of correct negatives by 0.4% (17 fewer).
Table 5 presents additional performance metrics computed on the test dataset – precision, recall, F1-score, and the critical success index (CSI) – for both models (Hicks et al. 2022). The multi-task model showed a significant increase in recall (87.56 vs. 79.67%), indicating better flood event detection, and achieved a higher F1-score (0.7553 vs. 0.7345). In terms of precision, the single-task model had a slightly higher value (68.1%) than the multi-task model (66.4%); however, the multi-task model's superior recall led to an overall higher F1-score. Additionally, the CSI was higher for the multi-task model (0.607) than for the single-task model (0.580), reflecting its better overall prediction accuracy despite a higher rate of false alarms. These results suggest that the multi-task model offers improved flood prediction capabilities, particularly in terms of event detection, while balancing a slight increase in false alarms.
| Metric | Single-task model | Multi-task model |
|---|---|---|
| Precision | 0.681 | 0.664 |
| Recall | 0.797 | 0.875 |
| F1-score | 0.734 | 0.756 |
| CSI | 0.580 | 0.607 |
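The scores in Table 5 follow directly from the contingency counts in Table 4, as the short sketch below reproduces (up to rounding):

```python
def contingency_scores(hits, false_alarms, misses):
    """Precision, recall, F1-score and CSI from a 2x2 flood contingency table."""
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    f1 = 2 * precision * recall / (precision + recall)
    csi = hits / (hits + misses + false_alarms)
    return precision, recall, f1, csi

print(contingency_scores(192, 90, 49))    # single-task counts -> ~(0.681, 0.797, 0.734, 0.580)
print(contingency_scores(211, 107, 30))   # multi-task counts  -> ~(0.664, 0.876, 0.755, 0.606)
```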
CONCLUSIONS
This study proposed a multi-task deep learning model to simultaneously predict time-series water levels and determine whether flood warning and alert thresholds will be exceeded, with the aim of enhancing the precision of flood forecasting and warning systems. AutoKeras, an open-source library, was employed to automatically design and optimise the single-task and multi-task deep learning models, minimising the influence of trial-and-error methods and human experience on the model structure and hyperparameter optimisation. The models used 72 time steps of rainfall and water level data from the three stations, covering the 720 min before the current time, to predict water levels at the target station (Hangtangdaegyo in the Hantan Basin) from 10 to 360 min ahead. While the model optimised in this study demonstrates effectiveness for flood and water level prediction, it may not represent the optimal structure for these specific forecasting tasks. Although AutoKeras was employed to efficiently determine the basic model structure and optimise parameters during initial development, applying the latest deep learning architectures to the same dataset could potentially yield better results. For future improvements, we plan to secure a larger and more diverse dataset and enhance the configuration of training and testing data to develop a more specialised model for flood prediction.
Despite these limitations, the primary contribution of this study lies in developing a practical time-series forecasting tool that enhances the precision of flood warnings and short-term water level predictions. Rather than focusing on the detailed representation of hydrological processes, this approach prioritises data-driven methods for operational decision-making, making it particularly valuable for real-world applications in flood forecasting and risk management.
This study evaluated the accuracy of water level time-series and flood risk predictions using pre-trained single- and multi-task models. In tests on four flood events, the multi-task model consistently outperformed its single-task counterpart, offering more accurate and reliable predictions of water levels and flood risks. The multi-task model exhibited superior performance across multiple evaluation metrics. It consistently achieved higher correlation coefficients, indicating a stronger linear relationship between the predicted and observed water levels and suggesting an improved ability to capture the underlying patterns in water level fluctuations. Furthermore, the multi-task model generally produced lower RMSE values, demonstrating its capacity to generate predictions that aligned more closely with the observed values. NSE scores, a crucial metric in hydrological modelling, also favoured the multi-task approach; although both models occasionally produced negative NSE values, the multi-task model generally achieved higher and more positive scores. The multi-task model also consistently achieved higher KGE values than the single-task model, demonstrating greater stability and accuracy, particularly in complex flood scenarios.
In addition, an analysis was conducted using precision, recall, F1-score, and the CSI to evaluate the model's ability to correctly predict flood events. The multi-task model demonstrated a substantial improvement in recall and F1-score, indicating better flood event detection capabilities, albeit with a slight increase in false alarms. The CSI was also higher for the multi-task model, reflecting its overall superior performance in predicting flood events.
Real-time prediction tests for actual rainfall events further validated the improved accuracy of the multi-task model and its potential applicability to operational flood forecasting and warning systems. This test was crucial for assessing the practical utility of the model in water resource management and flood risk mitigation, demonstrating its potential to enhance early warning systems and improve flood preparedness. In conclusion, this study marks a significant progress in flood prediction methodologies, providing a more comprehensive approach for forecasting and categorising flood incidents.
ACKNOWLEDGEMENTS
This work was supported by a Korea Environmental Industry & Technology Institute (KEITI) grant funded by the Ministry of Environment, South Korea (Grant # 2022003610003).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.