This study proposes a multi-task deep learning model for simultaneous prediction of time-series water levels and flood risk thresholds, aiming to enhance flood forecasting precision. Using AutoKeras, single-task and multi-task models were optimised to predict water levels 10–360 min ahead based on 720 min of prior data. The multi-task model consistently outperformed the single-task model across multiple evaluation metrics, including correlation coefficients, root mean squared error, Nash–Sutcliffe efficiency, and Kling–Gupta efficiency scores. Real-time prediction tests on actual rainfall events further validated the multi-task model's improved accuracy and applicability in operational flood forecasting. The study demonstrates significant progress in flood prediction methodologies, offering a more comprehensive approach to forecasting and categorising flood incidents.

  • A multi-task deep learning model enhances flood forecasting precision.

  • AutoKeras optimises single-task and multi-task models for water level prediction.

  • The multi-task approach outperformed the single-task approach in terms of correlation, RMSE, Nash–Sutcliffe efficiency, and Kling–Gupta efficiency scores.

  • Real-time testing validates improved accuracy for operational flood forecasting.

  • Comprehensive methodology advances flood prediction and categorisation.

Accurate and prompt prediction of river floods is crucial for efficient flood risk management and disaster mitigation strategies. The capacity to anticipate flood occurrences with a notable level of accuracy is essential to diminish the severe socioeconomic consequences. Given the escalating frequency and severity of extreme weather incidents due to climate change, the demand for reliable flood prediction systems has become increasingly urgent.

Recent technological progress, exemplified by the fusion of deep learning algorithms with conventional hydrological approaches, has notably enhanced the efficacy of flood prediction systems. Recent advances in hydrological forecasting have often debated the role of modelling and forecasting approaches in predicting water levels and managing flood risks. As Herath et al. (2021) highlighted, the distinction between modelling and forecasting is critical in hydrological applications, where data-driven methods, such as genetic programming or artificial neural networks, can be effective for short-term predictions but may benefit from the inclusion of physical models to enhance accuracy. For example, Chadalawada et al. (2020) demonstrated the effectiveness of Gaussian process-based machine learning in hydrological applications, specifically for rainfall-runoff modelling. This approach highlights the potential of data-driven techniques to improve prediction accuracy by automatically generating models based on observed data without the need for a detailed physical representation.

This study adopts a data-driven approach focused on forecasting water levels and issuing flood warnings using a multitasked deep learning model optimised for short-term predictions. By leveraging historical water level and rainfall data, the proposed method aims to improve the operational efficiency of flood-forecasting systems without the need for detailed physical modelling. Conventional hydrological techniques employed in flood forecasting frequently encounter challenges stemming from their dependence on physical models, which may be constrained by the intricate nature and fluctuations of meteorological and hydrological phenomena (Amarilla et al. 2023). These models require a significant amount of data, which is a resource that may not always be readily available, particularly in areas with limited monitoring systems. This lack of data can potentially compromise the precision of predictions when these models are used under unfamiliar or fluctuating circumstances, ultimately resulting in suboptimal flood risk management strategies (Zhao et al. 2023).

Furthermore, in environments that are constantly changing owing to rapid hydrological variations, traditional models may prove inadequate for adapting instantaneously. Swift changes in flow rates and land-use patterns can lead to decreased effectiveness of current models, thereby causing delays or inaccuracies in flood forecasts (Trošelj et al. 2023). Hence, the amalgamation of real-time data and machine learning techniques with traditional forecasting methods can significantly enhance their predictive capabilities.

Forecasting the water level of rivers using deep learning techniques has experienced noteworthy progress and a wide range of practical uses, thereby improving flood management. Most deep learning models found in hydrology literature have primarily focused on single-task training, aiming to predict a single variable, such as streamflow or water level (Hu et al. 2018; Bowes et al. 2019; Kratzert et al. 2019; Ng et al. 2023). The utilisation of multi-task models has demonstrated enhanced generalisation and reduced overfitting, as evidenced in various fields such as natural language processing and computer vision (Seltzer & Droppo 2013; Chen et al. 2014; Girshick 2015; Ruder 2017).

Within hydrological studies, numerous variables exhibit interrelations, highlighting the potential benefits of a multi-task model in exploiting information from correlated variables for improved predictions. Specifically, by training multiple variables governed by common physical processes, a multi-task deep learning model can effectively capture shared hydrologic mechanisms, resulting in more accurate predictions of target hydrologic variables (Sadler et al. 2022).

Multi-task modelling presents itself as a potential strategic approach aimed at alleviating a particular obstacle encountered by deep learning models, which pertains to the necessity of possessing extensive observational datasets for model training. This challenge is especially pronounced in the realm of Earth sciences, where the scarcity of data is a prevalent issue owing to the elevated expenses associated with the collection of such data. By engaging in multi-task learning, it is possible to effectively harness the information originating from one specific variable to bolster the predictive precision pertaining to another interconnected variable, particularly in cases where the latter variable is inadequately represented within the existing dataset. Multi-task deep learning (MTDL) has shown significant promise in hydrological applications by leveraging the interdependencies between various hydrological variables to improve prediction accuracy and model robustness. For instance, integrating spatial information and multi-task learning into long short-term memory networks has enhanced the performance of hydrological models in simulating runoff and actual evaporation, with Nash–Sutcliffe efficiency coefficients (NSEs) ≥ 0.82 and 0.95, respectively (Li et al. 2023). In another study, modelling the daily average streamflow and stream water temperature together using MTDL improved the prediction accuracy for 56/101 sites across the United States, although the benefits varied depending on the site and model configuration (Sadler et al. 2022). In satellite-based precipitation estimation, a novel MTDL framework that simultaneously trains rain/no-rain classification and rain-rate regression tasks outperformed conventional single-task models, highlighting the efficiency of knowledge transfer between tasks (Bannai et al. 2023). These studies collectively underscore the potential of multitasked deep learning in enhancing the accuracy, robustness, and interpretability of hydrological models by effectively leveraging interdependencies.

Flood warnings and alerts were issued to facilitate citizen evacuation and restrict river access before a flood occurred. Although predicting the time series of water levels is crucial for determining when to issue warnings, it is equally important to accurately predict whether water levels will exceed the established warning and alert criteria. Therefore, this study proposes a flood prediction methodology that can be effectively utilised in operational flood forecasting by developing a multi-task deep learning model. This model aims to simultaneously predict time-series water levels and determine whether flood warning and alert thresholds will be exceeded, in contrast to traditional single-task deep learning models that focus solely on time-series water level prediction.

In this study, the term ‘forecasting’ was used to describe the short-term prediction of water levels for operational flood warning purposes. The developed model is not intended to represent physical hydrological processes but rather to predict future water levels and flood risks based on observed data.

To achieve this objective, we employed AutoKeras, an open-source library that supports the automatic design and optimisation of deep learning models. This approach was chosen to minimise the influence of trial-and-error methods and human experience, which significantly affect the model structure composition and hyperparameter optimisation. This research focused on comparing the differences between single- and multi-task applications rather than optimising the deep learning model for prediction.

Furthermore, we evaluated the prediction accuracy of each model by performing real-time predictions for actual rainfall events and examined the applicability of the multi-task model in flood forecasting and warning operations.

Study area

The study area focuses on the upper Hantan Basin, which is located in the central region of the Korean Peninsula and is primarily situated within Gangwon Province. As shown in Figure 1, the basin extends upstream from the Hangtangdaegyo Bridge. The Hantan River, a principal tributary of the Imjin River, is the predominant waterway in this region. The mainstream length is 175.81 km and the total drainage area is 1,063.96 km². Three water level observation stations were established to monitor the hydrological conditions within the basin: Samhapgyo (Station 1), Jangsudaegyo (Station 2), and Hangtangdaegyo (Station 3). These stations play crucial roles in collecting data for water resource management and flood prediction. Rainfall data for the basin were gathered from six nearby rainfall observation stations: Cheongyangri, Odeokchogyo, Seomyeonchocho, Damokchogyo, Yongdamgyo, and Myeongwallri. This network of observation points provides comprehensive hydrological data that are essential for understanding the basin's water dynamics and supporting the objectives of this study.
Figure 1

Study area and location of water level and rain gauge stations.


Data

Data collection

This study gathered sequential water level data from Samhapgyo (Station 1), Jangsudaegyo (Station 2), and Hangtangdaegyo (Station 3) between 1 July 2019 and 31 October 2021, with measurements taken at 10-min intervals; these series correspond to Figures 2(b)–2(d) and 3(b)–3(d). Areal rainfall for the basin was calculated using the Thiessen weighted average method from observations at the six nearby rainfall stations; this series corresponds to Figures 2(a) and 3(a).
Figure 2

Observed rain and water level data for training. (a) Rain data, (b) water level data (Station 1), (c) water level data (Station 2), (d) water level data (Station 3).

Figure 3

Observed rain and water level data for the test. (a) Rain data, (b) water level data (Station 1), (c) water level data (Station 2), (d) water level data (Station 3).


As shown in Figures 2 and 3, the rainfall and water level data collected from 1 July 2019 to 31 October 2021 were used for training the deep learning model, while the data from 1 April to 31 October 2022 were used for testing. In South Korea, the monsoon season (May–October) is the primary period when significant flood events occur, and flood forecasting and warning operations are concentrated during this period. Therefore, only data from these months, when meaningful hydrological activity is present, were selected for training and testing, while data from dry or low water level periods were excluded as they contribute little to the model's ability to forecast floods. Additionally, the training data begin on 1 July 2019, as this is the earliest point at which all three water level observation stations used as model inputs became operational, ensuring consistent and comprehensive input data for model development.

Data processing for training and validation

The river water level prediction model developed in this study was designed for real-time applications. Consequently, the training and validation datasets were structured to emulate real-time input data formats. The model aims to predict 36 time points up to 360 min into the future using observational data from 72 time points spanning the past 720 min up to the present. Furthermore, because peak water level prediction is crucial for flood warning decisions, the prediction target values of the training dataset were selected from points where water levels began to rise after rainfall events. The dataset was extracted by sliding the time axis until peak water levels occurred. Figure 4 illustrates a segment of the dataset (Station 3) formulated in accordance with the specified data processing.
Figure 4

Example of the sliding-window method used to generate input–output pairs for sequence-to-sequence prediction. This figure shows the structuring of past rainfall and water level data as input and future water level predictions as the target.


The model was designed to use only past water level observations from the target station and nearby stations, as well as rainfall data. Specifically, to predict water levels up to 6 h in the future, the input features consisted solely of historical observations from the three stations and rainfall up to the time of prediction. The model does not use any future data from the target station as input. This approach ensures that the model does not access future information and adheres to strict predictive modelling standards, thereby avoiding any form of data leakage.

The data processing procedure can be described in detail as follows. This process begins with data cleaning and normalisation. We employed a data imputation technique to handle missing values in the dataset. Specifically, we used a median imputation method for the numeric columns. This approach replaces missing values with the median of each column to ensure data continuity while minimising the impact of potential outliers. The data were standardised using Scikit-learn's StandardScaler to ensure that the features had a mean of 0 and unit variance, which is a common preprocessing step in machine learning.
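The cleaning step can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical example of the median imputation and standardisation described above; the file name and column names are assumptions for illustration, not the authors' actual code.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical column names: Thiessen-averaged rainfall and the three
# water level stations (Stations 1-3), sampled at 10-min intervals.
cols = ["RAIN", "WL_ST1", "WL_ST2", "WL_ST3"]
df = pd.read_csv("hantan_10min.csv", parse_dates=["datetime"])  # assumed file layout

# Median imputation: fill gaps with each column's median, which is less
# sensitive to outliers than the mean.
df[cols] = df[cols].fillna(df[cols].median())

# Standardise to zero mean and unit variance with Scikit-learn's StandardScaler.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(df[cols]), columns=cols, index=df.index)
```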

Time-series data for sequence-to-sequence prediction were prepared using the sliding-window approach. We created input–output pairs using this method, where each input sequence consisted of 72 time steps (input_steps) encompassing all features (rainfall and water level data for the three stations). Correspondingly, each output sequence contained 36 time steps (output_steps) for the target variable (water level at Station 3). The sliding-window technique was applied across the entire dataset by incrementally shifting the window to create numerous overlapping sequences.
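A minimal sketch of this windowing step is shown below, assuming the series is held in a NumPy array with one column per feature; the function name and column order are illustrative.

```python
import numpy as np

def make_windows(series, target_col=3, input_steps=72, output_steps=36):
    """Slide a window over the 10-min series to build input-output pairs:
    72 past steps of all features -> 36 future steps of the target."""
    X, y = [], []
    for start in range(len(series) - input_steps - output_steps + 1):
        stop = start + input_steps
        X.append(series[start:stop, :])                          # (72, n_features)
        y.append(series[stop:stop + output_steps, target_col])   # (36,)
    return np.asarray(X), np.asarray(y)

# series: array of shape (time, 4) with columns [RAIN, WL_ST1, WL_ST2, WL_ST3];
# the target is the Station 3 water level (column index 3).
# X, y = make_windows(series)
```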

In the constructed dataset, periods without rainfall and consequent water level changes were more prevalent than those with rainfall-induced water level rise. If all the data were used for training, the model would be optimised for predicting low water level increases, potentially compromising its performance during significant flood events. To address this imbalance and enhance the model performance, particularly in predicting peak flows and rising water levels, a filtering mechanism was implemented. We applied two key criteria: the maximum rainfall (RAIN) within the input window should be ≥ 0.01, and the maximum water level (Station 3) within the output window should be ≥ Elevation Level (E.L.) 2. This filtering ensured that the model was trained on data periods that included rainfall and consequential water level changes, thereby potentially improving its predictive capability for flood events.
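Assuming the filtering is applied to the unscaled windows produced above, the two criteria can be expressed as a boolean mask; the column indices are the illustrative ones used earlier.

```python
import numpy as np

RAIN_COL = 0                     # rainfall column in the input windows
RAIN_MIN, LEVEL_MIN = 0.01, 2.0  # thresholds from the text (max rain >= 0.01, max level >= E.L. 2)

# X: (n_windows, 72, 4) input windows; y: (n_windows, 36) Station 3 output windows,
# both on the original (unscaled) measurement scale.
mask = (X[:, :, RAIN_COL].max(axis=1) >= RAIN_MIN) & (y.max(axis=1) >= LEVEL_MIN)
X_filtered, y_filtered = X[mask], y[mask]
```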

For the classification of the multi-task model, a sophisticated labelling scheme was devised based on predefined water level thresholds. We categorised the maximum predicted water level into five classes (0–4), corresponding to different severity levels of flooding.

  • (0) Normal: No flood risk (< E.L. 5.0 m)

  • (1) Concern: Elevated water levels, requiring increased monitoring (≥ E.L. 5.0 m)

  • (2) Caution: Potential for minor flooding, increased vigilance necessary (≥ E.L. 8.0 m)

  • (3) Alert: Significant flood risk, preparation for flood mitigation measures required (≥ E.L. 9.5 m)

  • (4) Severe: Severe flooding expected, immediate action necessary (≥ E.L. 11.8 m)

The criteria for categorising the maximum predicted water levels were based on the thresholds defined by the Korea Flood Control Office. These thresholds are designed to operate within a four-stage flood warning system – Concern, Caution, Alert, and Severe – based on meteorological and hydrological conditions. Each stage corresponds to a predefined water level threshold at each observation station.

These integer labels are then converted into a one-hot encoded format, preparing them for use in multiclass classification models. The resulting preprocessed dataset exhibited specific characteristics tailored to both regression and classification tasks for water level prediction. By integrating both continuous (regression) and categorical (classification) outputs, the model was designed to provide precise water level forecasts and broader flood risk assessments, enhancing its utility for flood management and early warning systems.
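A possible implementation of this labelling and one-hot encoding, assuming the unscaled 36-step output windows from the earlier sketch, is:

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

# Station 3 thresholds (E.L., m) for Concern, Caution, Alert and Severe.
THRESHOLDS = np.array([5.0, 8.0, 9.5, 11.8])

def flood_class(window):
    """Map the maximum water level of an output window to a class 0-4."""
    return int(np.searchsorted(THRESHOLDS, window.max(), side="right"))

labels = np.array([flood_class(w) for w in y_filtered])   # integer labels 0-4
labels_onehot = to_categorical(labels, num_classes=5)     # one-hot for the classifier
```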

Multi-task model

Time-series water level prediction data, which are the output of flood forecasting models, are used for issuing flood warnings. Flood warnings are issued when the maximum predicted water level exceeds the predefined forecasting or warning thresholds at specific flood forecast points. However, single-task time-series prediction models have limitations in accurately predicting peak water levels because they are trained to optimise the average error over the entire prediction period.

To enhance the performance of flood warning issuance, a multi-task model is necessary. This model should not only predict time-series water levels but also determine whether warning thresholds will be exceeded based on observational data collected up to the present. Multi-task learning processes several tasks with a single model. Rather than maintaining separate models for water level time-series prediction and flood warning threshold exceedance prediction, using a single model for both tasks offers several advantages: a single forward pass, a single backpropagation step, and a lower parameter count.

These benefits ensure real-time operational efficiency. Furthermore, when multiple tasks are interrelated, learning them together can improve overall performance (Vandenhende et al. 2021).

A multi-task deep learning model was designed to perform multiple tasks simultaneously, leveraging shared information across tasks to improve overall performance. This approach is in contrast to single-task models, which focus on predicting only one variable at a time. Multi-task models can generalise better and reduce overfitting by learning from multiple related tasks.

Multi-task models can be particularly useful in scenarios where data are sparse or expensive to collect, as they can leverage information from one variable to improve the prediction of another related variable. When trained on variables driven by the same underlying physical processes, these models can better represent the shared processes, leading to more accurate predictions.

Model configuration, training and validation

Model configuration using AutoKeras

In this study, we utilised AutoKeras to automatically determine the optimal neural network architecture and optimise both single- and multi-task models. The primary objective was to apply a multitasked deep learning approach to improve the accuracy of river water level time series predictions and flood occurrence forecasting. Using AutoKeras, we aimed to automate the architecture search and optimisation process, thus minimising potential human biases and manual intervention. This allowed for a more objective comparison of the performances of the single-task and multi-task models.

AutoKeras is an open-source library that supports the automatic design and optimisation of deep learning models. This approach was chosen to minimise the influence of trial-and-error methods and human experience, which significantly affect the model structure composition and hyperparameter optimisation. AutoKeras was developed by the DATA Lab at Texas A&M University (Jin et al. 2019). Implemented in Python 3 and built on the Keras library, this tool automates tasks such as hyperparameter tuning and model optimisation, allowing users to quickly adapt machine learning models to a variety of tasks. The AutoKeras model architecture search process utilises a Neural Architecture Search and Bayesian optimisation techniques. The process starts by defining an initial architecture and then generates and modifies various architectures to find the optimal model for the given dataset and task.

The strength of AutoKeras lies in its ability to deliver high-performance deep learning networks with minimal user intervention. It automates crucial processes, such as feature engineering (including mining, selection, and construction) and network configuration (encompassing hyperparameter selection and fine-tuning). This automation streamlines the complex and time-consuming process required to develop optimal deep learning models (Perez 2019; Alaiad et al. 2023).

Model training and validation for the single-task model

In this study, we employed AutoKeras to optimise a deep learning model for water level prediction. The preprocessed dataset was strategically partitioned into training and validation sets at an 80:20 ratio to ensure a representative distribution for model evaluation. To address the specific requirements of hydrological modelling, we implemented a custom metric called NSE. The NSE, which is crucial for assessing the hydrological model performance, is defined as

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} \left( y_{\mathrm{pred},i} - y_{\mathrm{true},i} \right)^{2}}{\sum_{i=1}^{n} \left( y_{\mathrm{true},i} - \bar{y}_{\mathrm{true}} \right)^{2}}$$

where $y_{\mathrm{true},i}$ represents the observed value and $y_{\mathrm{pred},i}$ the predicted value at time $i$, and $\bar{y}_{\mathrm{true}}$ is the mean of the observed values.

The model architecture search was guided by the mean absolute error (MAE) as the loss function, which was used to minimise absolute errors during both training and validation. Furthermore, the NSE was incorporated as a custom metric to evaluate the model performance during both training and validation, specifically in the context of hydrological forecasting. This dual approach ensured that the model was optimised using the MAE for general error minimisation, whereas the NSE provided a domain-specific assessment of hydrological prediction accuracy. To balance the thorough exploration of the model architectures and computational efficiency, we constrained the search space to a maximum of 50 trials. The optimal single-task model, as determined by AutoKeras, implemented a regression model with the following hyperparameters: a dropout rate of 0.5 for the regression head, stochastic gradient descent (SGD) optimiser, and a learning rate of 0.01 (as shown in Table 1).
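The custom NSE metric and the regression-only search can be sketched as follows. This is a plausible configuration consistent with the description above (MAE loss, NSE metric, 50 trials), not the authors' exact code; the precise AutoKeras graph wiring and support for the custom metric in the head are assumptions.

```python
import autokeras as ak
import tensorflow as tf

def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency implemented as a Keras metric."""
    y_true = tf.cast(y_true, y_pred.dtype)
    residual = tf.reduce_sum(tf.square(y_true - y_pred))
    variance = tf.reduce_sum(tf.square(y_true - tf.reduce_mean(y_true)))
    return 1.0 - residual / (variance + tf.keras.backend.epsilon())

# Single-task search: (72, 4) input windows -> 36-step water level regression.
input_node = ak.Input()
output_node = ak.RegressionHead(loss="mean_absolute_error", metrics=[nse])(input_node)

reg_search = ak.AutoModel(inputs=input_node, outputs=output_node,
                          objective="val_loss", max_trials=50, overwrite=True)
# reg_search.fit(X_filtered, y_filtered, validation_split=0.2, epochs=100)
```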

Table 1

Summarises the hyperparameters and key metrics for single-task model training and validation

Category              Parameter/metric             Value
Hyperparameters       regression_head_1/dropout    0.5
                      optimizer                    SGD
                      learning rate                0.01
Training metrics      loss                         0.2300
                      nse                          0.7315
Validation metrics    val_loss                     0.1773
                      val_nse                      0.9646
Trial info            Trial ID
                      Best step                    98
                      Score                        0.1773

The model was trained for 98 epochs, and the best performance was achieved in the final epoch. The NSE was employed as the primary performance metric, given its widespread use in hydrological modelling. The model demonstrated a substantial improvement from the training to the validation phase, with the NSE increasing from 0.7315 to 0.9646.

Interestingly, both the loss and NSE metrics showed superior results in the validation set compared with the training set. The validation loss (0.1773) was lower than the training loss (0.2300), whereas the validation NSE (0.9646) significantly outperformed the training NSE (0.7315). This pattern, illustrated in Figure 5, indicates that the model not only avoids overfitting but also exhibits enhanced predictive capabilities on the validation dataset.
Figure 5

Loss graph of training and validation for each epoch (single-task model). (a) Regression loss (MAE), (b) NSE.

The structure of the optimised single-task model comprises several key components, as shown in Figure 6. The model accepts a 3D input of shape (none, 72, 4) in the input layer, where 72 represents the time steps and 4 corresponds to the number of features. A CastToFloat32 layer is employed in the data preprocessing stage to ensure a consistent data type throughout the network by converting all inputs to the 32-bit floating-point (Float32) format. This standardisation helps maintain computational efficiency by enabling faster and more optimised processing within the deep learning framework. Additionally, using Float32 reduces memory usage compared with higher precision formats, improving the model's ability to handle large-scale data while ensuring numerical stability throughout training and inference.

For regularisation, a dropout layer is applied directly to the input, suggesting that the model benefits from input feature regularisation. Dropout prevents overfitting by randomly setting a fraction of input units to zero during training, forcing the model to learn more generalised features. In the single-task model, a dropout rate of 0.5 was applied to the regression head; this rate was selected by the AutoKeras optimisation process to ensure optimal model performance without overfitting.

In the feature extraction stage, a flatten layer transforms the 3D input into a 1D vector of 288 features (72 × 4). The output layer consists of a single dense layer with 36 units used as the regression head, which directly maps the flattened features to the 36 time steps of the water level predictions. The model contains 10,404 trainable parameters.

The optimised model structure exhibits notable simplicity and efficiency. The absence of complex layers suggests that a straightforward architecture is sufficient for this water level prediction task, potentially offering faster inference and better interpretability. The direct mapping from flattened inputs to predictions indicates the effective capture of temporal patterns, and placing a dropout layer immediately after the input suggests that input feature regularisation is crucial and could be beneficial for handling noisy hydrological data.
Figure 6

Structure of the optimised single-task model.

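For readers who prefer plain Keras, the reported structure can be re-created as in the sketch below; the layer ordering follows the description above and the parameter count (10,404) matches, but this is an illustrative reconstruction rather than the exact exported AutoKeras model.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(72, 4))      # 72 time steps x 4 features
x = layers.Dropout(0.5)(inputs)             # input-level regularisation
x = layers.Flatten()(x)                     # (72, 4) -> 288 features
outputs = layers.Dense(36)(x)               # 36 future 10-min water levels

single_task = tf.keras.Model(inputs, outputs)
single_task.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                    loss="mean_absolute_error")  # NSE was additionally monitored in the study
single_task.summary()                            # 10,404 trainable parameters
```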

Model training and validation for the multi-task model

We implemented a multi-task learning model that simultaneously addressed regression and classification tasks. The model architecture incorporated specific hyperparameters: a dropout rate of 0.25 was applied to the classification head to enhance regularisation, a dropout rate of 0.0 was applied to the regression head (effectively disabling dropout for that task), and a 'flatten' spatial reduction type was used for the classification head. The Adam optimiser was employed with a learning rate of 0.001. The model achieved optimal performance at epoch 34, as shown in Table 2. The NSE was utilised as the primary metric for the regression task, whereas accuracy was employed for the classification task.

Table 2

Summarises the hyperparameters and key metrics for multi-task model training and validation

Category              Parameter/metric                                            Value
Hyperparameters       regression_head_1/dropout                                   0.0
                      classification_head_1/spatial_reduction_1/reduction_type    Flatten
                      classification_head_1/dropout                               0.25
                      optimizer                                                   Adam
                      learning_rate                                               0.001
Training metrics      loss                                                        0.2355
                      regression_head_1_loss                                      0.1359
                      classification_head_1_loss                                  0.0996
                      regression_head_1_nse                                       0.9330
                      classification_head_1_accuracy                              0.9744
Validation metrics    val_loss                                                    0.1780
                      val_regression_head_1_loss                                  0.1255
                      val_classification_head_1_loss                              0.0525
                      val_regression_head_1_nse                                   0.9787
                      val_classification_head_1_accuracy                          0.9841
Trial info            Trial ID                                                    19
                      Best step                                                   34
                      Score                                                       0.1780

Both tasks demonstrated impressive performance with notable improvements from training to validation, as shown in Figure 7.
Figure 7

Loss graph of training and validation for each epoch (multi-task model). (a) Regression loss (MAE), (b) classification loss, (c) NSE, (d) classification accuracy.


The regression task exhibited high NSE values that increased from 0.9330 during training to 0.9787 during validation. Concurrently, the classification task achieved remarkable accuracy, which increased from 97.44% during training to 98.41% during validation. This high accuracy coupled with the improvement in the validation set suggests that the model effectively learned to categorise the data without overfitting.

The total loss decreased from 0.2355 during training to 0.1780 during validation, with both the regression and classification components of the loss function showing an improvement.

The preprocessed dataset consisting of input features (X), regression targets (Y), and classification labels (categorical labels) was partitioned into training and validation sets in an 80:20 ratio. The multi-task deep learning model was configured using AutoKeras to simultaneously address the regression and classification tasks for water level prediction. The model architecture was designed with a shared input node for both tasks to facilitate the learning of common features. Two distinct output heads were implemented: a regression head utilising MAE as the loss function and incorporating an NSE metric and a classification head employing categorical cross-entropy loss and an accuracy metric. To balance the exploration of the model architectures with computational efficiency, the AutoKeras search space was constrained to a maximum of 50 trials. The training process was structured with a maximum of 100 epochs and incorporated task-specific early stopping mechanisms to prevent overfitting and optimise resource utilisation. For the regression task, the validation regression loss was monitored, whereas the classification task was overseen using accuracy.
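One plausible way to express this multi-task search in AutoKeras is sketched below. The head names, monitored quantities, and custom-metric handling are assumptions based on the description and Table 2, not the authors' published code.

```python
import autokeras as ak
import tensorflow as tf

def nse(y_true, y_pred):  # Nash-Sutcliffe efficiency as a Keras metric
    y_true = tf.cast(y_true, y_pred.dtype)
    return 1.0 - tf.reduce_sum(tf.square(y_true - y_pred)) / (
        tf.reduce_sum(tf.square(y_true - tf.reduce_mean(y_true))) + 1e-7)

# Shared input feeding a regression head (36-step water levels) and a
# classification head (5 flood risk classes).
input_node = ak.Input()
reg_out = ak.RegressionHead(loss="mean_absolute_error", metrics=[nse])(input_node)
clf_out = ak.ClassificationHead(loss="categorical_crossentropy",
                                metrics=["accuracy"])(input_node)

mt_search = ak.AutoModel(inputs=input_node, outputs=[reg_out, clf_out],
                         objective="val_loss", max_trials=50, overwrite=True)

# Task-specific early stopping; the monitored names depend on how AutoKeras
# names the heads (e.g. 'regression_head_1', 'classification_head_1').
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_regression_head_1_loss", patience=10),
    tf.keras.callbacks.EarlyStopping(monitor="val_classification_head_1_accuracy",
                                     mode="max", patience=10),
]
# mt_search.fit(X_filtered, [y_filtered, labels_onehot],
#               validation_split=0.2, epochs=100, callbacks=callbacks)
```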

The AutoKeras optimisation process yielded a multitasked deep learning model for simultaneous water level prediction and flood severity classification, as shown in Figure 8. The structure of the optimised model begins with an input layer that accepts a 3D input of shape (none, 72, 4), where 72 represents the time steps, and four corresponds to the number of features. A CastToFloat32 layer was employed in the data preprocessing stage to ensure a consistent data type throughout the network, potentially improving computational efficiency.
Figure 8

Structure of the optimised multi-task model.


For feature extraction, two parallel flattened layers were used to transform the 3D input into a 1D vector of 288 features (72 × 4). This approach allows the model to capture temporal dependencies while maintaining a relatively low number of parameters. The model branches into task-specific paths. The classification branch applies a dropout layer to the flattened features, which is likely to prevent overfitting. This is followed by a dense layer with five units corresponding to the number of flood severity classes. The final softmax activation function produces class probabilities. The regression branch consisted of a single dense layer with 36 units directly connected to the flattened input, producing water level predictions for 36 time steps ahead.

The model has two output heads: a softmax layer for classification, output probabilities for five flood severity classes, and a linear output layer for regression, predicting 36 future water level values. In terms of complexity, the model contained 11,849 trainable parameters.
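As with the single-task case, the reported multi-task structure can be re-created in plain Keras as a rough sketch; the branch layout follows the description above and the parameter count (11,849) matches, but it should be read as an illustration, not the exported AutoKeras model.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(72, 4))

# Regression branch: flatten -> 36-unit linear head (288*36 + 36 = 10,404 parameters).
water_level = layers.Dense(36, name="regression_head")(layers.Flatten()(inputs))

# Classification branch: flatten -> dropout(0.25) -> 5-unit softmax head
# (288*5 + 5 = 1,445 parameters).
x = layers.Dropout(0.25)(layers.Flatten()(inputs))
flood_class = layers.Dense(5, activation="softmax", name="classification_head")(x)

multi_task = tf.keras.Model(inputs, [water_level, flood_class])
multi_task.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                   loss={"regression_head": "mean_absolute_error",
                         "classification_head": "categorical_crossentropy"},
                   metrics={"classification_head": ["accuracy"]})
multi_task.summary()   # 11,849 trainable parameters in total
```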

The optimised model structure exhibited several characteristics. The use of parallel flattened layers suggests that the model leverages a shared representation of the input data for both tasks, potentially allowing beneficial information transfer between the regression and classification objectives. The inclusion of a dropout layer only in the classification branch indicated that the optimisation process identified different regularisation requirements for each task. Architectural simplicity, evident in the absence of complex convolutional or recurrent layers, suggests that, for this specific problem, a simpler architecture is sufficient, potentially leading to faster inference times and easier interpretability. The direct connection of the regression branch from the flattened input to the output suggests that the model learned the direct mapping from the input time series to the predicted water levels.

The learning curves for both the single-task and multi-task models displayed inconsistencies, especially during early epochs, which constitutes a typical occurrence in deep learning frameworks subjected to highly variable hydrological time-series datasets. This variability, particularly in rainfall and water levels, can cause fluctuations in the learning processes. Additionally, the automated architecture search process in AutoKeras introduced variations in the training behaviour as it explored different model configurations. Despite these initial fluctuations, both models demonstrated stable convergence in the later stages of training, indicating successful optimisation.

One constraint inherent to this study was the limited dataset used for training and validation. The data comprised approximately 3 years of rainfall and water level observations, focusing mainly on the rainy seasons. This limited temporal scope may have led to reduced variability in the validation set, making the prediction task less challenging during validation than during training. Consequently, the validation loss was consistently lower than the training loss, which was particularly evident in the single-task model. Future research should aim to expand the dataset to encompass a wider range of hydrological conditions, including more diverse seasonal data and extended timeframes to improve the representativeness of the training and validation datasets.

In this study, we evaluated the accuracy of the water level time series and flood risk predictions using the pre-trained single- and multi-task models on the test dataset shown in Figure 3. The predicted results were compared with the actual observed water levels to assess the performance of the models. To execute this evaluation, we utilised 72 time steps of rainfall and water level data from the three stations, spanning the 720 min prior to the current time. Using this input, we predicted the water levels and flood risk levels for the target water level station (Hangtangdaegyo) 10–360 min ahead.

In this study, we analysed the water level prediction results for four representative rainfall events during the test period: Event 1 (31 July 2022, 22:30–23:30), Event 2 (8 August 2022, 10:20–11:30), Event 3 (9 August 2022, 17:10–19:30), and Event 4 (5 September 2022, 21:30–23:50). The detailed results are presented in Figures S1–S4.

Figure 9 illustrates the time-series water level predictions and flood risk levels for Event 1 on 31 July 2022 at 22:30, 23:00, and 23:30, comparing the single-task model (left) and the multi-task model (right). Because the single-task model does not directly predict flood risk levels, these were inferred from its predicted water levels. During Event 1, rainfall caused a water level rise that remained below the 'Concern' flood risk level (actual flood level: 0). Both the single-task and multi-task models correctly predicted that flooding would remain below the 'Concern' level. In the water level time-series prediction issued at 22:30, the single-task model deviated significantly from the observed water levels (black dotted points), although these discrepancies decreased in subsequent predictions up to 23:30. In contrast, the multi-task model accurately predicted the rising trend of the water levels from the 22:30 forecast onwards, closely matching the observed data, although from 22:30 to 23:30 it slightly underestimated the water levels.
Figure 9

Predicted water level using single-task (left panel) and multi-task (right panel) deep learning models for 31 July 2022, between 22:30 and 23:30 p.m. (a) 31 July 2022, 22:30. (b) 31 July 2022, 22:30. (c) 31 July 2022, 23:00. (d) 31 July 2022, 23:00. (e) 31 July 2022, 23:30. (f) 31 July 2022, 23:30.

Figure 10 shows the time-series water level predictions and flood risk levels from the single-task model (left) and the multi-task model (right) during Event 2 on 8 August 2022 at 10:30, 11:00, 11:10, and 11:30. At 14:30 on 8 August, the actual flood risk level exceeded the 'Concern' level owing to rainfall. At the 10:30 forecast time, both the single-task and multi-task models predicted that the peak river water level would not exceed the 'Concern' level of E.L. 5.0 m within the following 360 min. The single-task model first predicted that the peak water level would exceed E.L. 5.0 m at the 11:10 forecast, whereas the multi-task model first predicted that the flood risk level would exceed the 'Concern' level at the 11:00 forecast.
Figure 10

Predicted water level using single-task (left panel) and multi-task (right panel) deep learning models for 8 August 2022, between 10:30 and 11:30 a.m. (a) 8 August 2022, 10:30. (b) 8 August 2022, 10:30. (c) 8 August 2022, 11:00. (d) 8 August 2022, 11:00. (e) 8 August 2022, 11:10. (f) 8 August 2022, 11:10. (g) 8 August 2022, 11:30. (h) 8 August 2022, 11:30.

Figure 11 shows the time-series water level predictions and flood risk levels during Event 3 on 9 August 2022 at 18:00, 18:10, 19:10, and 19:30. At 00:00 on 10 August, the actual flood risk level exceeded the 'Concern' level owing to rainfall. At the 18:00 forecast time, the single-task model predicted that the peak river water level would not exceed the 'Concern' level of E.L. 5.0 m within the next 360 min; only from the 19:30 forecast did it predict that the peak water level would exceed E.L. 5.0 m. The multi-task model predicted from the 18:00 forecast that a flood at the 'Concern' level would occur within the next 360 min, although no flooding actually occurred within 360 min of that forecast. From the 18:10 forecast onwards, the multi-task model consistently and accurately predicted that an actual flood would occur within the next 360 min.
Figure 11

Predicted water level using single-task (left panel) and multi-task (right panel) deep learning models for 9 August 2022, between 18:00 and 19:30 p.m. (a) 9 August 2022, 18:00. (b) 9 August 2022, 18:00. (c) 9 August 2022, 18:10. (d) 9 August 2022, 18:10. (e) 9 August 2022, 19:10. (f) 9 August 2022, 19:10. (g) 9 August 2022, 19:30. (h) 9 August 2022, 19:30.

Figure 12 shows the time-series water level predictions and flood risk levels during Event 4 on 5 September 2022 at 21:30, 22:00, 22:10, and 23:50. Owing to rainfall, the actual flood risk exceeded the 'Concern' level at 04:10 on 6 September. The single-task model first predicted that the peak river level would exceed the 'Concern' level of E.L. 5.0 m within 360 min at the 23:50 forecast. The multi-task model already predicted at 21:30 that a flood exceeding the 'Concern' level would occur, and from the 22:10 forecast onwards it consistently predicted that a flood would occur within the next 360 min.
Figure 12

Predicted water level using single-task (left panel) and multi-task (right panel) deep learning models for 5 September 2022, between 21:30 and 23:50 p.m. (a) 5 September 2022, 21:30. (b) 5 September 2022, 21:30. (c) 5 September 2022, 22:00. (d) 5 September 2022, 22:00. (e) 5 September 2022, 22:10. (f) 5 September 2022, 22:10. (g) 5 September 2022, 23:50. (h) 5 September 2022, 23:50.


This study compared the performance of the single- and multi-task models for predicting river water levels at 10-min intervals with lead times of up to 360 min. The models were evaluated using four metrics (correlation coefficient, RMSE, NSE, and Kling–Gupta efficiency (KGE)) for the four flood events, as shown in Table 3 (Chadalawada & Babovic 2019). The multi-task model consistently achieved higher correlation coefficients, particularly for the 31 July and 8 August events, indicating a stronger linear relationship between the predicted and observed water levels and suggesting improved prediction accuracy. In most cases, the multi-task model exhibited lower RMSE values than the single-task model; this trend was particularly pronounced for the 8 August and 9 August events, indicating that the multi-task model's predictions deviated less from the observed values. Although both models exhibited negative NSE values in some instances, the multi-task model generally achieved higher (more positive) NSE scores. This was particularly noticeable for the 8 August event, where the multi-task model consistently outperformed the single-task model in terms of NSE.

Table 3

Evaluation results for predicted water level of single-task and multi-task models

Date / Prediction time | Single-task model: correlation coefficient, RMSE, NSE, KGE | Multi-task model: correlation coefficient, RMSE, NSE, KGE
31 July 2022 22:20 0.912 1.123 −1.762 0.097 0.958 1.166 −1.977 0.216 
22:30 0.962 1.081 −1.679 0.134 0.916 1.123 −1.891 0.256 
22:40 0.952 1.058 −1.729 0.165 0.940 1.025 −1.560 0.317 
22:50 0.967 1.021 −1.720 0.218 0.924 0.970 −1.455 0.325 
23:00 0.964 0.980 −1.695 0.249 0.912 0.893 −1.237 0.332 
23:10 0.956 0.962 −1.813 0.291 0.907 0.884 −1.376 0.361 
23:20 0.961 0.935 −1.979 0.357 0.942 0.793 −1.141 0.429 
23:30 0.965 0.925 −2.095 0.374 0.894 0.836 −1.529 0.418 
23:40 0.975 0.918 −2.247 0.438 0.897 0.817 −1.572 0.468 
23:50 0.964 0.916 −2.415 0.436 0.871 0.822 −1.751 0.357 
8 August 2022 10:20 0.959 0.587 0.108 0.371 0.952 0.465 0.441 0.396 
10:30 0.957 0.571 0.015 0.404 0.928 0.475 0.319 0.386 
10:40 0.968 0.552 −0.075 0.437 0.934 0.419 0.378 0.439 
10:50 0.961 0.445 0.181 0.600 0.951 0.238 0.766 0.630 
11:00 0.961 0.363 0.362 0.689 0.949 0.175 0.851 0.724 
11:10 0.977 0.272 0.590 0.809 0.956 0.229 0.710 0.825 
11:20 0.965 0.265 0.571 0.837 0.931 0.218 0.710 0.721 
11:30 0.964 0.215 0.689 0.869 0.910 0.285 0.456 0.753 
11:40 0.948 0.246 0.554 0.861 0.873 0.257 0.512 0.681 
11:50 0.945 0.270 0.381 0.833 0.837 0.232 0.543 0.695 
9 August 2022 17:10 0.901 0.433 0.090 0.405 0.806 0.483 −0.130 0.438 
17:20 0.935 0.403 0.213 0.387 0.781 0.450 0.017 0.398 
17:30 0.908 0.402 0.215 0.376 0.802 0.440 0.057 0.406 
17:40 0.930 0.375 0.301 0.357 0.785 0.403 0.192 0.388 
17:50 0.904 0.369 0.304 0.343 0.813 0.389 0.228 0.428 
18:00 0.920 0.352 0.349 0.358 0.842 0.370 0.281 0.471 
18:10 0.893 0.337 0.378 0.333 0.870 0.358 0.300 0.541 
18:20 0.900 0.329 0.380 0.336 0.897 0.355 0.279 0.599 
18:30 0.893 0.312 0.410 0.315 0.918 0.348 0.267 0.660 
18:40 0.903 0.296 0.439 0.331 0.915 0.351 0.207 0.681 
18:50 0.913 0.282 0.447 0.318 0.918 0.296 0.393 0.675 
19:00 0.894 0.260 0.488 0.308 0.915 0.297 0.334 0.701 
19:10 0.871 0.254 0.463 0.297 0.917 0.247 0.489 0.717 
19:20 0.825 0.243 0.445 0.301 0.885 0.236 0.476 0.706 
19:30 0.843 0.216 0.503 0.297 0.899 0.216 0.504 0.767 
5 September 2022 21:30 0.754 0.342 0.328 0.097 0.844 0.377 0.184 0.432 
21:40 0.713 0.342 0.305 0.113 0.794 0.368 0.196 0.403 
21:50 0.783 0.320 0.364 0.141 0.840 0.353 0.226 0.470 
22:00 0.796 0.304 0.396 0.131 0.867 0.352 0.192 0.514 
22:10 0.754 0.304 0.369 0.152 0.893 0.335 0.234 0.544 
22:20 0.762 0.294 0.383 0.137 0.859 0.337 0.191 0.524 
22:30 0.757 0.286 0.382 0.153 0.822 0.308 0.282 0.460 
22:40 0.846 0.257 0.464 0.191 0.843 0.298 0.277 0.511 
22:50 0.766 0.253 0.423 0.172 0.808 0.274 0.321 0.523 
23:00 0.811 0.228 0.483 0.206 0.850 0.251 0.377 0.591 
23:10 0.778 0.222 0.453 0.202 0.775 0.228 0.422 0.534 
23:20 0.753 0.216 0.427 0.247 0.791 0.205 0.482 0.555 
23:30 0.797 0.195 0.462 0.223 0.819 0.186 0.507 0.621 
23:40 0.767 0.178 0.474 0.248 0.801 0.188 0.417 0.655 
23:50 0.797 0.159 0.533 0.257 0.777 0.186 0.362 0.633 
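For reference, the four scores reported in Table 3 can be computed from an observed and a predicted 36-step trace as in the sketch below; the KGE is shown in its standard form, and whether the authors used this or a modified variant is not stated.

```python
import numpy as np

def evaluate(obs, sim):
    """Correlation coefficient, RMSE, NSE and KGE for one forecast trace."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    rmse = float(np.sqrt(np.mean((sim - obs) ** 2)))
    nse = 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
    alpha = sim.std() / obs.std()      # variability ratio
    beta = sim.mean() / obs.mean()     # bias ratio
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return {"r": r, "RMSE": rmse, "NSE": nse, "KGE": kge}
```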

The comparative analysis of the single-task and multi-task models evaluated using KGE revealed significant insights into their respective performances across the flood events. The multi-task model demonstrated superior consistency in performance compared with the single-task model and, across all observed time periods, generally achieved higher KGE values. On 8 August 2022, although both models reached similar peak KGE values (0.837 for the single-task and 0.825 for the multi-task), the multi-task model exhibited greater stability in its predictions over time, with fewer fluctuations than the single-task model. The multi-task model showed marked improvement in handling complex and rapidly changing flood scenarios. On 9 August 2022, at 19:30, the multi-task model achieved a KGE value of 0.767, substantially outperforming the single-task model's value of 0.297. Similarly, on 5 September 2022, the multi-task model demonstrated superior performance, with KGE values ranging from 0.403 to 0.591, compared with the single-task model's range of 0.097–0.257.

The multi-task model consistently exhibited superior performance across different time points and flood events.

In this study, the flood prediction performances of single- and multi-task models were evaluated based on their ability to predict the occurrence of floods within the next 6 h. Table 4 presents a contingency table comparing the performance of each model in predicting flood occurrences in 4,420 cases during the test from 1 April to 31 October 2022.

Table 4

Contingency table for test dataset

                                      Observed: Yes (flood)    Observed: No (no flood)
Single-task model   Yes (flood)       192 (hits)                  90 (false alarms)
                    No (no flood)      49 (misses)             4,097 (correct negatives)
Multi-task model    Yes (flood)       211 (hits)                 107 (false alarms)
                    No (no flood)      30 (misses)             4,080 (correct negatives)

The analysis in Table 4 provides crucial insights into the capabilities of the models in terms of correctly identifying flood occurrences, false alarms, and missed events. Compared with the single-task model, the multi-task model showed a 9.9% improvement in flood detection, correctly identifying 19 additional flood events, and reduced the number of missed flood occurrences by 38.8% (19 fewer events). However, it increased the number of false alarms by 18.9% (17 more) and slightly decreased the number of correct negatives by 0.4% (17 fewer).

Table 5 presents additional performance metrics for flood forecasting on the test dataset – precision, recall, F1-score, and critical success index (CSI) – for both models (Hicks et al. 2022). The multi-task model showed a significant increase in recall (87.56 vs. 79.67%), indicating better flood event detection, and achieved a higher F1-score (0.7553 vs. 0.7345). When comparing precision, the single-task model had a slightly higher value (68.1%) than the multi-task model (66.4%); however, the multi-task model's superior recall led to an overall higher F1-score. Additionally, the CSI was higher for the multi-task model (0.607) than for the single-task model (0.580), reflecting its better overall prediction accuracy despite a higher rate of false alarms. These results suggest that the multi-task model offers improved flood prediction capabilities, particularly in terms of event detection, while balancing a slight increase in false alarms.

Table 5

Performance metrics for flood forecasting using the test dataset

              Single-task model    Multi-task model
Precision     0.681                0.664
Recall        0.797                0.875
F1-score      0.734                0.756
CSI           0.580                0.607
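These scores follow directly from the Table 4 counts; the short check below reproduces the multi-task values in Table 5.

```python
hits, false_alarms, misses = 211, 107, 30   # multi-task model, Table 4

precision = hits / (hits + false_alarms)               # 0.664
recall = hits / (hits + misses)                        # 0.875
f1 = 2 * precision * recall / (precision + recall)     # 0.756
csi = hits / (hits + false_alarms + misses)            # 0.607
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} csi={csi:.3f}")
```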

This study proposes a multi-task deep learning model that simultaneously predicts time-series water levels and determines whether flood warning and alert thresholds will be exceeded, with the aim of enhancing the precision of flood forecasting and warning systems. AutoKeras, an open-source library, was employed to automatically design and optimise the single-task and multi-task deep learning models, minimising the influence of trial-and-error methods and human experience on the model structure and hyperparameter optimisation. The models used 72 time steps of rainfall and water level data from the three stations, covering the 720 min before the current time, and the pre-trained models produced predictions for the target water level station (Hangtangdaegyo in the Hantan Basin) from 10 to 360 min ahead. While the model optimised in this study demonstrates effectiveness for flood and water level prediction, it may not represent the optimal structure for these specific forecasting tasks. Although AutoKeras was employed to efficiently determine the basic model structure and optimise parameters during initial development, applying the latest deep learning architectures to the same dataset could potentially yield better results. For future improvements, we plan to secure a larger and more diverse dataset and enhance the configuration of training and testing data to develop a more specialised model for flood prediction.

Despite these limitations, the primary contribution of this study lies in developing a practical time-series forecasting tool that enhances the precision of flood warnings and short-term water level predictions. Rather than focusing on the detailed representation of hydrological processes, this approach prioritises data-driven methods for operational decision-making, making it particularly valuable for real-world applications in flood forecasting and risk management.

This study evaluated the accuracy of water level time-series and flood risk predictions using the pre-trained single-task and multi-task models. In tests on four flood events, the multi-task model consistently outperformed its single-task counterpart, offering more accurate and reliable predictions of water levels and flood risks. The multi-task model exhibited superior performance across multiple evaluation metrics. It consistently achieved higher correlation coefficients, indicating a stronger linear relationship between the predicted and observed water levels and an improved ability to capture the underlying patterns in water level fluctuations. It also generally produced lower RMSE values, demonstrating its capacity to generate predictions that aligned more closely with the observations. NSE scores, a crucial metric in hydrological modelling, likewise favoured the multi-task approach: although both models occasionally produced negative NSE values, the multi-task model generally achieved higher and more often positive scores. Finally, the multi-task model consistently achieved higher KGE values than the single-task model, demonstrating greater stability and accuracy, particularly in complex flood scenarios.
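For reference, the NSE and KGE scores cited above compare simulated and observed series using the standard definitions sketched below; this is a generic implementation, not code from this study.

```python
# Standard definitions of NSE and KGE for scoring simulated vs. observed
# water levels; a reference sketch only.
import numpy as np

def nse(obs: np.ndarray, sim: np.ndarray) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, <0 is worse than the mean of obs."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs: np.ndarray, sim: np.ndarray) -> float:
    """Kling-Gupta efficiency (original formulation)."""
    r = np.corrcoef(obs, sim)[0, 1]          # linear correlation
    alpha = sim.std() / obs.std()            # variability ratio
    beta = sim.mean() / obs.mean()           # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = np.array([1.2, 1.5, 2.1, 3.4, 2.8])   # example observed levels (m)
sim = np.array([1.1, 1.6, 2.0, 3.1, 2.9])   # example simulated levels (m)
print(f"NSE={nse(obs, sim):.3f}, KGE={kge(obs, sim):.3f}")
```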

In addition, precision, recall, F1-score, and the CSI were analysed to evaluate the models' ability to correctly predict flood events. The multi-task model demonstrated a substantial improvement in recall and F1-score, indicating better flood event detection, albeit with a slight increase in false alarms. The CSI was also higher for the multi-task model, reflecting its overall superior performance in predicting flood events.

Real-time prediction tests on actual rainfall events further validated the improved accuracy of the multi-task model and its potential applicability to operational flood forecasting and warning systems. These tests were crucial for assessing the practical utility of the model in water resource management and flood risk mitigation, demonstrating its potential to enhance early warning systems and improve flood preparedness. In conclusion, this study marks significant progress in flood prediction methodologies, providing a more comprehensive approach to forecasting and categorising flood incidents.

This work was supported by a Korea Environmental Industry & Technology Institute (KEITI) grant funded by the Ministry of Environment, South Korea (Grant # 2022003610003).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict of interest.

Alaiad A., Migdady A., Al-Khatib R. M., Alzoubi O., Zitar R. A. & Abualigah L. (2023) Autokeras approach: A robust automated deep learning network for diagnosis disease cases in medical images, Journal of Imaging, 9 (3), 64.

Amarilla G. A., Stalder D. H., Pasten M. & Pinto-Roa D. P. (2023). 'Comparative Analysis of Statistical and Recurrent Neural Network Models for Short-Term River Level Forecasting in the Paraguay River', 2023 IEEE Latin American Conference on Computational Intelligence (LA-CCI), pp. 1–6.

Bannai T., Xu H., Utsumi N., Koo E., Lu K. & Kim H. (2023) Multi-task learning for simultaneous retrievals of passive microwave precipitation estimates and rain/no-rain classification, Geophysical Research Letters, 50, e2022GL102283. https://doi.org/10.1029/2022GL102283.

Chadalawada J. & Babovic V. (2019) Review and comparison of performance indices for automatic model induction, Journal of Hydroinformatics, 21 (1), 13–31.

Chadalawada J., Herath H. M. V. V. & Babovic V. (2020) Hydrologically informed machine learning for rainfall-runoff modeling: A genetic programming-based toolkit for automatic model induction, Water Resources Research, 56, e2019WR026933. https://doi.org/10.1029/2019WR026933.

Chen D., Mak B., Leung C.-C. & Sivadas S. (2014). 'Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition', 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5592–5596.

Girshick R. (2015). 'Fast R-CNN', 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448.

Herath H. M. V. V., Chadalawada J. & Babovic V. (2021) Genetic programming for hydrological applications: To model or to forecast that is the question, Journal of Hydroinformatics, 23 (4), 740–763. https://doi.org/10.2166/Hydro.2021.179.

Hicks S. A., Strümke I., Thambawita V., Hammou M., Riegler M. A., Halvorsen P. & Parasa S. (2022) On evaluation metrics for medical applications of artificial intelligence, Scientific Reports, 12 (1), 5979. https://doi.org/10.1038/s41598-022-09954-8.

Hu C., Wu Q., Li H., Jian S., Li N. & Lou Z. (2018) Deep learning with a long short-term memory networks approach for rainfall-runoff simulation, Water, 10 (11), 1543. https://doi.org/10.3390/w10111543.

Jin H., Song Q. & Hu X. (2019). 'Auto-keras: An efficient neural architecture search system', Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019, pp. 1946–1956.

Kratzert F., Klotz D., Herrnegger M., Sampson A. K., Hochreiter S. & Nearing G. S. (2019) Toward improved predictions in ungauged basins: exploiting the power of machine learning, Water Resources Research, 55, 11344–11354. https://doi.org/10.1029/2019WR026065.

Li B., Li R., Sun T., Gong A., Tian F., Khan M. Y. A. & Ni G. (2023) Improving LSTM hydrological modeling with spatiotemporal deep learning and multi-task learning: A case study of three mountainous areas on the Tibetan Plateau, Journal of Hydrology, 620, 129401. https://doi.org/10.1016/j.jhydrol.2023.129401.

Ng K. W., Huang Y. F., Koo C. H., Chong K. L., El-Shafie A. & Ahmed A. N. (2023) A review of hybrid deep learning applications for streamflow forecasting, Journal of Hydrology, 625 (Part B), 130141.

Perez J. G. M. (2019) Autotext: AutoML for Text Classification. Master's thesis. Puebla, Mexico: National Institute of Astrophysics, Optics and Electronics.

Ruder S. (2017) An overview of multi-task learning in deep neural networks, ArXiv abs/1706.05098. Retrieved from https://api.semanticscholar.org/CorpusID:10175374.

Sadler J. M., Appling A. P., Read J. S., Oliver S. K., Jia X., Zwart J. A. & Kumar V. (2022) Multi-task deep learning of daily streamflow and water temperature, Water Resources Research, 58 (4), 1–18. https://doi.org/10.1029/2021WR030138.

Seltzer M. L. & Droppo J. (2013). 'Multi-task learning in deep neural networks for improved phoneme recognition', 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6965–6969.

Trošelj J., Nayak S., Hobohm L. & Takemi T. (2023) Real-time flash flood forecasting approach for development of early warning systems: integrated hydrological and meteorological application, Geomatics, Natural Hazards and Risk, 14 (1), 1–32. https://doi.org/10.1080/19475705.2023.2269295.

Vandenhende S., Georgoulis S., Van Gansbeke W., Proesmans M., Dai D. & Van Gool L. (2021) Multi-task learning for dense prediction tasks: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, 44 (7), 3614–3633.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).