ABSTRACT
Hydrological modelling tools have proved effective in predicting future floods from historical rainfall and inflow data, a task made more pressing by the association between climate change and flood frequency. This study used a historical dataset of monthly inflow and rainfall for the Terengganu River in Malaysia, a river known for highly unpredictable hydrological patterns. The predictive precision and effectiveness of the optimised decision tree (ODT) model, along with the random forest (RF) and gradient boosting tree (GBT) models, were evaluated using several indicators: the correlation coefficient, mean absolute error, percentage of relative error, root mean square error, Nash–Sutcliffe efficiency (NSE), and accuracy rate. The results indicated that the ODT and RF models performed better than the GBT model in predicting monthly inflows. In validation, the ODT, RF, and GBT models achieved average accuracies of 94%, 91%, and 92%, respectively. The R² values were 90.2%, 84.8%, and 96.0%, respectively, and the NSE values ranged from 0.92 to 0.94. The implications of this research extend beyond forecasting monthly inflow rates to other hydro-meteorological variables that depend exclusively on historical input data.
HIGHLIGHTS
Introducing advanced model – optimised decision tree (ODT) for precise monthly inflow prediction, leveraging 50 years of rainfall and inflow data.
ODT outperforms gradient boosting tree and random forest models.
ODT predicts inflow levels well based on historical rainfall, outperforming other advanced models.
NOMENCLATURE
- AB: AdaBoost
- ANFIS: adaptive neuro-fuzzy inference system
- ANN: artificial neural networks
- BDTR: boosted decision tree regression
- BLR: Bayesian linear regression
- CNN: convolutional neural network
- DFR: decision forest regression
- DT: decision tree
- GA: genetic algorithm
- GBT: gradient boosting tree
- GP: Gaussian process
- KNN: k-nearest neighbours
- LDA: linear discriminant analysis
- LR: logistic regression
- LSTM: long short-term memory
- MAE: mean absolute error
- MAPE: mean absolute percentage error
- MLR: multiple linear regression
- NB: naive Bayes
- NN: neural network
- NNR: neural network regression
- ODT: optimised decision tree
- PSO: particle swarm optimisation
- R2: correlation coefficient
- %RE: percentage of relative error
- RF: random forest
- RMSE: root mean square error
- SVM: support vector machines
- XGBoost: eXtreme gradient boosting
INTRODUCTION
Rivers are among the most crucial water resources and a key component of the global freshwater system, providing drinking water, supporting agriculture, and facilitating industrial activity. However, over the past century, floods have constituted around 40% of natural calamities, accounting for over 19% of overall casualties and more than 48% of the total population affected (Munawar et al. 2019; Zhang et al. 2021). Over the past few decades, the intensity and magnitude of flood threats have risen significantly, primarily because of climate change and various anthropogenic factors (Bubeck & Thieken 2018; Wang et al. 2022c). Large-scale floods threaten human life and cause tens of thousands of fatalities and substantial economic losses each year in flood-prone regions (Aerts et al. 2018; Ahmadalipour & Moradkhani 2019). River floods are often characterised by very high velocities, with pronounced effects on people's lives and economies (Ahmadalipour & Moradkhani 2019). Flash floods occur when a large volume of water is released over a brief period, typically during heavy rainfall lasting only a few minutes or hours; the sudden failure of structures is another cause (Wang et al. 2019). Consequently, developing a river inflow forecasting technique is essential for flood control, early warning, and reservoir operation (Cheng et al. 2020; Kilinc & Yurtsever 2022). It also plays a prominent role in mitigating the impacts of water deficits on water resource systems. Furthermore, accurate forecasting results in better control of water availability, protection of life, improved hydropower generation, and reduced economic losses through early warnings (Kilinc & Yurtsever 2022).
It is therefore pivotal to develop forecasting models for river inflow (Ibrahim et al. 2022). However, accurately predicting forthcoming floods and identifying susceptible regions are complex undertakings that require accurate geographical and temporal data together with dependable predictive models (Boucher et al. 2020; Chu et al. 2021; Elbeltagi et al. 2022; Zahura & Goodall 2022; Jahangir et al. 2023; Wang & Zai 2023). In the past few years, there has been a focused effort to develop accurate prediction models for monthly flood forecasting. Several modelling techniques have been used for this purpose, with particular attention given to artificial intelligence models. These models are adept at extracting valuable information from extensive datasets, presenting it in clearly understandable formats, and saving time and resources (Maier & Dandy 2000). Furthermore, machine learning (ML) techniques are frequently used in hydraulic and water structure planning because of their capacity to produce accurate forecasts for non-linear problems (Di Nunno et al. 2021; Elbeltagi et al. 2022; Granata & Di Nunno 2023; Ruma et al. 2023). Numerous studies across the globe have sought to understand flood susceptibility in severely affected areas, for instance in India (Chowdhuri et al. 2020; Ramesh & Iqbal 2022), the United States of America (Giovannettone et al. 2020), Japan (Fan & Huang 2020), Bangladesh (Alam et al. 2021), Vietnam (Dang et al. 2024), Iran (Goodarzi et al. 2024), Romania (Zhen & Bărbulescu 2024), China (Chen et al. 2023; Dai et al. 2023; Zhu et al. 2023), South Korea (Lee et al. 2023), Morocco (Nifa et al. 2023), Pakistan (Khan et al. 2023), and Australia (Ahmed et al. 2021).
ML approaches have typically been applied within two distinct modelling types: basic single modelling and hybrid combination modelling (Choubin et al. 2019). In recent years, there has been a shift from single-application modelling to hybrid ensemble modelling, which has been shown experimentally to yield stronger predictive models and more accurate forecasts of forthcoming occurrences (Tyralis et al. 2021; Granata et al. 2022; Ibrahim et al. 2022; Di Nunno et al. 2023; Ng et al. 2023). Numerous studies have forecast river flood rates from different time series using both hybrid and single models. For instance, a streamflow prediction study of the Euphrates River proposed a hybrid method that combined long short-term memory (LSTM) with a genetic algorithm (Kilinc & Haznedar 2022). In hydrology, a hybrid of an LSTM neural network (NN) and a lion optimiser has been used effectively for monthly runoff forecasting, with higher accuracy than existing models (Yuan et al. 2018), and an LSTM combined with the weighted mean of vectors optimizer (INFO) has been used for water temperature prediction (Ikram et al. 2023). The Relevance Vector Machine tuned with Improved Manta-Ray foraging optimization (RVM-IMRFO) model demonstrated substantial improvements in performance indicators such as root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (R2), and Nash–Sutcliffe efficiency (NSE) compared to alternative tuning algorithms (Adnan et al. 2023b). In another study, Mostafa et al. (2023) used a random vector functional link based on the quantum-based avian navigation optimizer algorithm (RVFL-QANA) to estimate potential evapotranspiration with limited climatic data.
In addition, Katipoğlu et al. (2024) predicted suspended sediment load in river systems using the support vector machine (SVM)-FFAPSO model. The hybrid adaptive neuro-fuzzy inference system (ANFIS-WCAMFO) model, using nine input combinations of meteorological datasets, is a promising technique owing to its high predictive accuracy and low error in predicting monthly evapotranspiration (Adnan et al. 2021). In a 2023 study, the Extreme Learning Machine-Jellyfish Search Optimizer (ELM-JSO) improved the RMSE of the standalone ELM model by 13% for the optimal inputs of temperature, precipitation, and groundwater level during the testing stage. ELM-JSO had the highest performance in estimating the daily groundwater level, followed by the ELM optimized using the whale optimization algorithm (WOA-ELM), ELM-particle swarm optimisation (PSO), and the ELM optimized using the Harris Hawks optimizer (ELM-HHO) (Adnan et al. 2023a). Other models applied to such problems include the artificial neural network (ANN) (Mei et al. 2023; Dang et al. 2024), the multi-layer perceptron (MLP) for daily streamflow prediction (Mohammadi et al. 2020), ANFIS (Goodarzi et al. 2024), ANFIS-ABC (Pham et al. 2024), random forest (RF) (Zahura & Goodall 2022; Naganna et al. 2023), gradient boosting tree (GBT) (Ni et al. 2020), SVM (Essam et al. 2022; Dang et al. 2024), and logistic regression (LR) and frequency ratio (Tehrany & Kumar 2018). In 2024, a study investigated multiple linear regression (MLR) and RF to predict monthly streamflow (Xu et al. 2024). eXtreme gradient boosting (XGBoost or XGB) has also proved effective for streamflow prediction (Goodarzi et al. 2024).
Furthermore, a 2024 study proposed hybrid methods such as convolutional neural network (CNN)-LSTM, Sparrow Search Algorithm with Backpropagation Neural Networks (SSA-BP), and ELM optimized using particle swarm optimization (PSO-ELM) for monthly water outflow prediction (Zhen & Bărbulescu 2024). Cai & Yu (2022) applied a hybrid recurrent NN to flood forecasting. Sahana et al. (2020) demonstrated that the SVM performed best for flood assessment in the Sundarban biosphere reserve in India. Pham et al. (2020) evaluated a flood-affected area in Vietnam and reported high accuracy for the LR model. Moreover, Shada et al. (2022) proposed hourly flood forecasting using a hybrid wavelet-SVM, and Xu & Peng (2015) and Schulte (2017) proposed novel frameworks for flood prediction. These frameworks employ many methodologies, including fuzzy clustering, K-means clustering, and NNs. ML techniques such as SVMs, LR, and ANNs have thus proven to be useful predictors. Yet, decision tree (DT)-based models remain popular, and numerous studies that used DT models have consistently found them to be superior predictors (Chen et al. 2020; Tang et al. 2020; Pham et al. 2020, 2021). The intrinsic flexibility of ML models allows the development of improved and more efficient models for the challenges of monitoring and controlling river floods. Predictive ability is further improved by optimisation and ensemble techniques, at minimal additional cost in time, memory, and compute. Consequently, additional types of DT-based models have emerged and are gradually gaining importance, one of which is the RF model (Chen et al. 2020), which uses an ensemble of DTs.
Another notable model is the gradient boosted tree (GBT) (Naganna et al. 2023), which combines multiple weak DTs to create a stronger predictive model. Alternating DTs have also emerged as a distinct type of DT-based model (Janizadeh et al. 2019; Chen et al. 2020). Logistic model trees (Khosravi et al. 2018), naïve Bayes (NB) trees (Chen et al. 2020; Tang et al. 2020), and reduced error pruning trees (Khosravi et al. 2018) are other noteworthy models in this category. Although these methods have been extensively validated and proven effective, there is another innovative approach in the field of DTs, the optimised decision tree (ODT). However, this method has not been widely used in flood prediction research, which makes it difficult to establish its efficacy as a reliable classifier with confidence. The current investigation therefore focuses on developing a model to forecast monthly flood rates from historical rainfall and inflow rates, upgrading the DT model to the ODT model to obtain more accurate estimates. The primary objectives of optimising the model parameters were to determine the most suitable parameters from an extensive collection of hydrological data, to achieve precise forecasts for a wide variety of flood-related occurrences, to predict flood risks more accurately than the single DT, and to compare the usefulness of three ML ensembles. The authors propose that this model can be generalised and implemented in various rivers across the globe. The ODT model and the comparison models implemented in this study are robust statistical methods that can be used for classification, prediction, interpretation, and data manipulation.
Additionally, the standalone DT model has shown encouraging results in forecasting river conditions from several types of covariates, including water quality measurements and weather patterns. The inherent adaptability of the DT model is an important asset for academics and policymakers seeking to understand and govern river ecosystems worldwide (Khosravi et al. 2018). The DT model has shown remarkable performance even when identifying pattern behaviour in a complex dataset (Everaert et al. 2016). The proposed ODT model has not previously been used for flood ensemble modelling; here, it is used together with GBT and RF to estimate the monthly inflow rate from rainfall and inflow databases for the Terengganu River in Malaysia.
RESEARCH AREA
DATA COLLECTION
Forecasting models
The forecasting model employs three distinct ML algorithms: the DT, RF algorithm, and GBT.
The DT algorithm handles both classification and regression tasks effectively, and its ability to represent complex relationships in data makes it a valuable statistical tool for predictive modelling (Everaert et al. 2016). The approach is commonly used to assess decision-making processes, event outcome probabilities, and investment risks, which shows its versatility (Ho et al. 2019). In decision analysis, the DT method uses a tree-like structure to represent decisions and their potential outcomes, as described by Hu et al. (2016). A DT resembles a flowchart and is used to build classification or regression models systematically by iteratively dividing the data into subsets according to defined criteria. A DT is composed of internal nodes, branches, and leaves. Each internal node tests the value of a specific attribute or feature. The branches (edges) represent the outcomes of those tests and connect each node to the next internal node or to an end node, also called a leaf node. Leaf nodes predict the final outcome of the target: a class label or class distribution in classification, or a numerical value in regression.
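The node-and-leaf structure described above can be illustrated with the smallest possible regression tree: one internal node that tests a single feature against a threshold, and two leaves that each predict a mean. The sketch below is only an illustration in plain Python, not the RapidMiner implementation used in this study; the function names and the rainfall and inflow values are hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors of the values around their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, left_mean, right_mean) minimising the total SSE."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def predict(split, value):
    """Internal node test: go left if value <= threshold, else right."""
    threshold, left_mean, right_mean = split
    return left_mean if value <= threshold else right_mean

# Hypothetical monthly rainfall (mm) and inflow (m3/s) pairs, for illustration only.
rainfall = [50, 60, 70, 200, 220, 240]
inflow = [10, 12, 11, 40, 42, 41]
split = best_split(rainfall, inflow)
print(split, predict(split, 100), predict(split, 210))
```

A full DT repeats this split recursively inside each subset until a stopping rule, such as a maximum depth, is reached.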
DTs are versatile: they handle both categorical and numerical data, which makes them suitable for a wide range of prediction tasks across different data formats. As a result, DTs have been used in numerous streamflow prediction studies, with promising results and good transferability to different cases. These applications include flood forecasting (Dang et al. 2024), monthly streamflow (Wang et al. 2022a), and weekly forecast precipitation (Khairudin et al. 2020). A conventional DT is built top-down: it starts at the root and greedily splits the data into smaller subsets, which may lead to suboptimal solutions. A study conducted in 2017 demonstrated that ODTs have become feasible: optimal DTs can provide out-of-sample accuracy 1–5% higher than that achieved by earlier heuristics such as the widely used Classification And Regression Trees (CART) algorithm (Bertsimas & Dunn 2017). This paper tunes the hyperparameters of a DT using optimise parameters (Grid), a tool that automatically adjusts the settings of an ML model by searching through a grid of values for the best performance. Hyperparameters are settings of a model that must be decided before training and cannot be learned from the data directly. An ML model often has several hyperparameters that control its behaviour and performance; for a DT, these include the maximum size of the tree, the minimum number of instances required in a node for inducing a split, the node splitting criterion, and the amount of pruning. The model's performance can change greatly depending on the chosen hyperparameter values. Finding the hyperparameter values that improve the model's performance is called hyperparameter tuning or optimisation.
The ‘optimise parameters (Grid)’ tool in RapidMiner lets the researchers set ranges for hyperparameter values and search through all possible combinations to find the best settings. Each combination is used to train the model with the training data and assess its performance using metrics such as accuracy, F1-score, R2, RMSE, MAE, percentage of relative error (%RE) or Receiver-Operating Characteristic Curve Area Under The Curve (ROC-AUC) on a separate validation dataset.
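The grid search described above can be sketched in a few lines of plain Python. This is a minimal illustration of the idea, not the RapidMiner operator itself: the small one-feature regression tree, the two hyperparameters (`max_depth`, `min_split`), the grid values, and the synthetic rainfall and inflow data are all assumptions made for the example.

```python
from itertools import product
from math import sqrt
from statistics import mean

def sum_sq(rows):
    """Squared error of the targets in rows around their mean."""
    targets = [t for _, t in rows]
    m = mean(targets)
    return sum((t - m) ** 2 for t in targets)

def build_tree(rows, max_depth, min_split, depth=0):
    """Grow a one-feature regression tree; rows are (feature, target) pairs."""
    if depth >= max_depth or len(rows) < min_split:
        return mean(t for _, t in rows)  # leaf: predict the mean target
    rows = sorted(rows)
    best = None
    for i in range(1, len(rows)):
        if rows[i - 1][0] == rows[i][0]:
            continue
        cost = sum_sq(rows[:i]) + sum_sq(rows[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return mean(t for _, t in rows)
    i = best[1]
    return {
        "threshold": (rows[i - 1][0] + rows[i][0]) / 2,
        "left": build_tree(rows[:i], max_depth, min_split, depth + 1),
        "right": build_tree(rows[i:], max_depth, min_split, depth + 1),
    }

def predict(node, x):
    """Follow the tests from the root down to a leaf."""
    while isinstance(node, dict):
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node

def rmse(tree, rows):
    return sqrt(mean((predict(tree, x) - t) ** 2 for x, t in rows))

def grid_search(train, valid, grid):
    """Train one tree per hyperparameter combination; score each on valid."""
    scores = {}
    for max_depth, min_split in product(grid["max_depth"], grid["min_split"]):
        tree = build_tree(train, max_depth, min_split)
        scores[(max_depth, min_split)] = rmse(tree, valid)
    return min(scores, key=scores.get), scores

def synthetic_inflow(rain):
    # Hypothetical rainfall-inflow relation, used only to generate toy data.
    return 0.2 * rain + (15 if rain > 150 else 0)

train = [(r, synthetic_inflow(r)) for r in range(10, 300, 20)]
valid = [(r, synthetic_inflow(r)) for r in range(20, 300, 40)]
grid = {"max_depth": [1, 2, 3, 4], "min_split": [2, 4, 8]}
best_params, scores = grid_search(train, valid, grid)
print(best_params, round(scores[best_params], 3))
```

The cost of an exhaustive grid grows with the product of the range sizes (here 4 × 3 = 12 trained trees), which is why the ranges are usually kept coarse.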
This study implemented the ODT method and compared its performance with that of the GBT, RF, and standalone DT algorithms. The results indicate that the ODT exhibits better accuracy and offers high confidence levels in solving prediction challenges. In recent times, significant efforts have been made to enhance the efficiency of the ODT method, resulting in its widespread implementation for predicting future river inflow. The RF is a supervised learning technique used for both regression and classification (Sahour et al. 2021). RF combines an ensemble of weak DT classifiers to form a more robust classifier (Cutler 2010; Goldstein et al. 2011).
The RF technique produces a collection of trees that together act as a single predictor, and the final classification is determined by letting these trees vote for the most popular class (Cutler 2010; Hou et al. 2021). At each node of each tree, the candidate features are selected randomly, whereas a conventional CART DT considers all available features. This ensures randomness in the features. In addition, the training data used to grow each tree are drawn from the complete training set by randomly selecting a specified number of training samples with replacement, which ensures randomness in the training samples. A tree is then grown on the new training set using random feature selection. Every tree is grown to its full extent without pruning. This dual randomness mitigates overfitting and thereby improves both the accuracy and the generalisation capacity of the model. Furthermore, the RF is widely used in various fields, including hydrology, because of its high accuracy, its capability with large datasets, and its robustness against noisy data (Schoppa et al. 2020). Researchers have applied RF in numerous streamflow prediction studies, with encouraging results and remarkable adaptability to diverse scenarios. These include flood prediction in Vietnam (Dang et al. 2024), streamflow forecasting at one- and three-day lead times in India (Naganna et al. 2023), forecasting streamflow one, two, and three days ahead in China (Wang et al. 2022b), streamflow forecasting up to 7 days ahead in the UK (Di Nunno et al. 2023), monthly streamflow prediction in China (Xu et al. 2024), daily streamflow forecasting in China (Shen et al. 2022), and monthly streamflow forecasting in the USA (Wang et al. 2022a). Hybrid RF models have been reported to outperform the standalone RF model in streamflow prediction. For example, in a 2021 study forecasting streamflow 2, 3, and 4 months ahead, a hybrid model (RF-MLR) delivered more precise predictions than the standalone model (Abbasi et al. 2021).
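The dual randomness of RF (bootstrap samples and random feature selection) can be sketched as follows. This is a simplified illustration under stated assumptions, not the configuration used in this study: each tree is reduced to a single split (a stump) rather than grown fully, the prediction is the average of the trees because the target is numerical, and the data, `n_trees`, `n_try`, and `seed` values are hypothetical.

```python
import random
from statistics import mean

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(X, y, features):
    """Best single split over the candidate features (squared-error criterion)."""
    best = None
    for f in features:
        order = sorted(range(len(y)), key=lambda i: X[i][f])
        for k in range(1, len(order)):
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue
            left = [y[i] for i in order[:k]]
            right = [y[i] for i in order[k:]]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, (lo + hi) / 2, mean(left), mean(right))
    if best is None:  # the bootstrap sample offered no usable split
        return (0, float("inf"), mean(y), mean(y))
    return best[1:]

def fit_forest(X, y, n_trees=25, n_try=2, seed=0):
    rng = random.Random(seed)
    n, n_features = len(y), len(X[0])
    forest = []
    for _ in range(n_trees):
        # Randomness 1: bootstrap sample, drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Randomness 2: random subset of features considered at the node.
        features = rng.sample(range(n_features), n_try)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict_forest(forest, x):
    """Average the predictions of all trees (regression analogue of voting)."""
    return mean(left if x[f] <= thr else right for f, thr, left, right in forest)

# Hypothetical rows: [rainfall (mm), temperature (degC), humidity (%)] and inflow.
X = [[50, 26, 80], [60, 27, 82], [70, 27, 85],
     [200, 25, 90], [220, 24, 92], [240, 25, 95]]
y = [10, 12, 11, 40, 42, 41]
forest = fit_forest(X, y)
print(round(predict_forest(forest, [230, 25, 93]), 2))
```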
The GBT model, introduced by Friedman (2001), combines regression and classification tree models and is built by generating a sequence of DTs. The gradient boosting algorithm uses boosting to add regression trees to the model one after another, without altering the parameters of the trees already in the model, in order to minimise the loss or error (Naganna et al. 2020; Hasan et al. 2023). Boosting trees has been found to improve accuracy but can also reduce speed and human interpretability. Initially, the algorithm is set up with a constant value. Subsequently, the pseudo-residuals are computed and, after scaling, a base learner, which is a regression tree, is fitted to them. The multiplier for the new tree is then determined by solving an optimisation problem that minimises the loss function, and the GBT forecasts the outcome with a model consisting of multiple trees. Moreover, the basic gradient boosting approach is adjusted through regularisation and constraints on the trees. To mitigate overfitting and improve how GBT handles uncertainty, it uses a technique known as shrinkage (Biau et al. 2019). Furthermore, GBT has been used in both long- and short-term streamflow forecasting and has shown promising outcomes, for example in daily streamflow forecasting (Naganna et al. 2023) and monthly streamflow forecasting in the USA (Wang et al. 2022a).
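The boosting steps above (constant initial value, pseudo-residuals, base learner, shrinkage) can be sketched for the squared-error loss as follows. This is a minimal illustration, not the GBT configuration used in this study: the base learner is a one-split regression tree, and the data, `n_trees`, and `learning_rate` values are hypothetical. For squared error with leaf means, the optimal multiplier of each new tree is 1, so the sketch applies only the shrinkage factor.

```python
from math import sqrt
from statistics import mean

def fit_stump(x, r):
    """Least-squares one-split regression tree fitted to the residuals r."""
    pairs = sorted(zip(x, r))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        ml, mr = mean(left), mean(right)
        cost = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def stump_predict(stump, v):
    threshold, left_mean, right_mean = stump
    return left_mean if v <= threshold else right_mean

def fit_gbt(x, y, n_trees=50, learning_rate=0.1):
    base = mean(y)                      # step 1: constant initial prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Step 2: pseudo-residuals (negative gradient of the squared loss).
        residuals = [t - p for t, p in zip(y, pred)]
        # Step 3: fit the base learner to the pseudo-residuals.
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Step 4: add the new tree, damped by the shrinkage factor.
        pred = [p + learning_rate * stump_predict(stump, v) for p, v in zip(pred, x)]
    return base, learning_rate, stumps

def predict_gbt(model, v):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, v) for s in stumps)

def rmse(y, preds):
    return sqrt(mean((t - p) ** 2 for t, p in zip(y, preds)))

# Hypothetical monthly rainfall (mm) and inflow (m3/s) values.
x = [50, 60, 70, 120, 150, 200, 220, 240]
y = [10, 12, 11, 20, 26, 40, 42, 41]
model = fit_gbt(x, y)
rmse_baseline = rmse(y, [model[0]] * len(y))
rmse_boosted = rmse(y, [predict_gbt(model, v) for v in x])
print(round(rmse_baseline, 3), round(rmse_boosted, 3))
```

A smaller `learning_rate` slows the fit to the training data and typically needs more trees, which is the trade-off behind shrinkage.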
Methodology
Phase I, data collection, began with collecting the historical dataset of monthly rainfall, temperature, humidity, and inflow rates (5,280 records) from observation and meteorological stations along the river basin of the Terengganu River from 1990 to 2020. Phase II, data pre-processing, involves preparing and classifying the dataset to improve the quality of the input data and make it suitable for the subsequent phases. During the second phase, the model is also trained using the initial dataset along with the test results that have been obtained. In Phase III, the prediction model is constructed by methodically structuring the entire dataset to make it usable for the model. RapidMiner software was used in this phase to construct the three models and to implement the optimise parameters (Grid) algorithm. Finally, performance is evaluated using statistical measures such as R2, RMSE, MAE, %RE, and NSE, along with accuracy derived from the validation process, and the results of the ODT model are compared with those of the GBT and RF models for validation purposes.
Model configuration
In this study, predictive modelling for estimating the monthly inflow rate was investigated based on historical monthly rainfall, temperature, humidity, and river inflow rates of the Terengganu River in Malaysia. The monthly rainfall, temperature, humidity, and inflow rates for all the selected stations were categorised before being fed into the model. The goal of categorising and preparing the data is to achieve results within the expected range while maintaining the original data distribution. It is crucial to ensure that the modelling process is not affected by overfitting. To mitigate overfitting, the dataset was partitioned into two distinct groups, an approach selected for its simplicity and resilience. The individual station data, along with the corresponding rainfall data, were then fed into the proposed ODT, RF, and GBT models to train the models and validate their performance. In addition to the approach used in the present study, many additional forms of cross-validation exist, such as the hold-out method and the leave-one-out method (Arlot & Celisse 2010). As with all supervised model development, data must be separated into two groups: ‘training’ and ‘validation.’ Usually, most of the original data, ranging from 70 to 90%, is used for training, and the rest is reserved for testing. In this study, datasets containing 5,280 records from 2000 to 2020 were used to train and validate the models. Figure 3 illustrates the stages involved in predictive data mining modelling.
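The two-group partition can be sketched as below. The text does not state the exact training fraction or whether the split was chronological or random, so the 80% fraction and the time-ordered cut are assumptions made for illustration; the function name is hypothetical.

```python
def chronological_split(records, train_fraction=0.8):
    """Split time-ordered records into (training, validation) without shuffling."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = round(len(records) * train_fraction)
    return records[:cut], records[cut:]

# 5,280 placeholder records standing in for the monthly observations.
records = list(range(5280))
train, valid = chronological_split(records)
print(len(train), len(valid))
```

Keeping the validation records later in time than the training records avoids evaluating the model on months that precede its training data.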
Model performance and evaluation
Numerous metrics are available to evaluate the performance and accuracy of ML algorithms. To quantify the predictive performance of ODT, GBT, and RF for inflow forecasting, several statistical measures were used: R2, RMSE, MAE, %RE, and NSE, together with accuracy, for both the developed model and the comparison models. Because accuracy alone does not provide adequate detail about the inflow rate, relying only on the accuracy rate may not be appropriate. Accordingly, several measures were used to evaluate performance, as given in Equations (1)–(6).
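For readers who want to reproduce such measures, the sketch below implements the commonly used textbook definitions of RMSE, MAE, NSE, R2 (taken here as the squared Pearson correlation), and MAPE. These are assumed standard forms and may differ in detail from the study's own equations; %RE and the accuracy rate are omitted because their exact definitions are not given in this section. The observed and simulated values are hypothetical.

```python
from math import sqrt

def rmse(obs, sim):
    """Root mean square error."""
    return sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - num / den

def r2(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)

def mape(obs, sim):
    """Mean absolute percentage error (observations must be non-zero)."""
    return 100 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)

# Hypothetical observed and simulated monthly inflows.
obs = [10, 20, 30, 40]
sim = [12, 18, 33, 39]
print(rmse(obs, sim), mae(obs, sim), nse(obs, sim), r2(obs, sim), mape(obs, sim))
```

Note that the squared correlation ignores systematic bias, whereas NSE penalises it, so the two can differ for the same forecasts.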
RMSE measures the average magnitude of the errors between predicted and actual values, providing a way to assess the model's accuracy. When RMSE values are lower, the model is more accurate in predicting outcomes. Both RMSE and MAE are employed to quantify the disparities between predicted and actual inflow values. Moreover, a high positive R2 value, known as the coefficient of determination, indicates strong model performance. In addition, the model is accurate when the NSE values are close to 1. Table 1 shows general performance ratings.
R2 | Performance Rating | NSE | Performance Rating | MAE | Performance Rating |
---|---|---|---|---|---|
0.75 < R2 ≤ 1 | Very good | 0.75 < NSE ≤ 1 | Very good | MER < 10 | Highly accurate |
0.65 < R2 ≤ 0.75 | Good | 0.65 < NSE ≤ 0.75 | Good | 11 < MER ≤ 20 | Good |
0.5 < R2 ≤ 0.65 | Satisfactory | 0.5 < NSE ≤ 0.65 | Satisfactory | 21 < MER ≤ 50 | Reasonable |
R2 ≤ 0.5 | Unsatisfactory | NSE ≤ 0.5 | Unsatisfactory | 51 + | Inaccurate |
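As a small worked example, the R2 and NSE bands of Table 1 can be encoded as a lookup. The function name is hypothetical, and the sample values are NSE figures taken from Table 2.

```python
def rate_efficiency(value):
    """Map an R2 or NSE value to the performance rating bands of Table 1."""
    if value > 0.75:
        return "Very good"
    if value > 0.65:
        return "Good"
    if value > 0.5:
        return "Satisfactory"
    return "Unsatisfactory"

# NSE values from Table 2: ODT station 8, GBT station 2, GBT station 6.
for nse_value in (0.755, 0.675, 0.566):
    print(nse_value, rate_efficiency(nse_value))
```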
RESULTS EXTRACTION AND PERFORMANCE EVALUATION
Three regression models were used to evaluate the river inflow for each station in the Terengganu River basin: ODT, which creates a tree-like model for decision-making; GBT, which builds trees sequentially so that each corrects the errors of the previous ones; and RF, which is known for its ensemble learning approach. Forecasting the inflow for each station helps prevent the model from memorising the data (overfitting) and enhances the model's performance by providing insights into future trends and patterns. Because there is no standard rule for dividing the data, the partitioning strategy can be adapted to the specific characteristics of the dataset and the modelling objectives. Therefore, to build an optimised data-driven prediction model, it is crucial to select the most appropriate data partitioning strategy during model development and evaluation to ensure the model's accuracy and generalisability. Several researchers have used varying data portions for testing and training sets to explore how different splits affect a model's performance and to assess its robustness across diverse datasets. Ridwan et al. (2021) used four regression models, namely Bayesian linear regression (BLR), decision forest regression (DFR), boosted decision tree regression (BDTR), and neural network regression (NNR), to predict the rainfall rate in Tasik Kenyir, Terengganu. The data were divided into 80–90% for training and 10–20% for testing, so that the model was trained on a substantial portion of the data while a meaningful portion was reserved for evaluating its generalisation ability.
The three river inflow forecasting models were developed so that they could be compared fairly. The proposed model was assessed on its ability to predict river inflow values accurately from historical data, and the assessment included a comparison of performance metrics such as accuracy, precision, and reliability. This section analyses in detail how accurate and reliable the model is and how the chosen training method affects the predicted results. The reliability of the methods was assessed with various statistical indices, including RMSE, MAE, and R2, during both the model training and testing phases, as listed in Table 2.
Model | Station | ACC (%) | RMSE | MAE | R² | RE (%) | NSE | MAPE |
---|---|---|---|---|---|---|---|---|
ODT | 1 | 98 | 5.358 | 3.099 | 0.960 | 0.1 | 0.961 | 8% |
ODT | 2 | 98 | 4.267 | 2.867 | 0.943 | 0.0 | 0.944 | 10.6% |
ODT | 3 | 96.50 | 1.0 | 0.436 | 0.960 | 0.0 | 0.960 | 13.8% |
ODT | 4 | 94 | 25.267 | 12.425 | 0.951 | 40.9 | 0.960 | 18% |
ODT | 5 | 94.12 | 4.949 | 2.171 | 0.910 | 0.1 | 0.911 | 10% |
ODT | 6 | 78 | 42.764 | 25.319 | 0.840 | 0.1 | 0.891 | 9.% |
ODT | 7 | 95.8 | 7.133 | 4.629 | 0.984 | 0.1 | 0.984 | 9.6% |
ODT | 8 | 92.85 | 0.651 | 0.411 | 0.759 | 0.3 | 0.755 | 11.6% |
ODT | 9 | 94.48 | 31.455 | 25.829 | 0.815 | 0.1 | 0.874 | 22% |
GBT | 1 | 92.55 | 12.0 | 7.987 | 0.869 | 45 | 0.803 | 34% |
GBT | 2 | 94.44 | 13.506 | 6.698 | 0.811 | 0.0 | 0.675 | 33.5% |
GBT | 3 | 91.61 | 2.962 | 1.645 | 0.868 | 0.6 | 0.71 | 35.6% |
GBT | 4 | 95.10 | 73.99 | 41.69 | 0.861 | 0.1 | 0.779 | 34% |
GBT | 5 | 94.44 | 8.478 | 5.179 | 0.877 | 0.0 | 0.738 | 43% |
GBT | 6 | 83.82 | 70.7 | 27.966 | 0.679 | 0.1 | 0.566 | 19.9% |
GBT | 7 | 93.06 | 27.5 | 6.116 | 0.904 | 0.1 | 0.764 | 34.5% |
GBT | 8 | 91.61 | 0.669 | 0.406 | 0.898 | 0.3 | 0.744 | 34% |
GBT | 9 | 93.10 | 0.915 | 13.821 | 0.873 | 6.8 | 0.68 | 45% |
RF | 1 | 92.02 | 5.833 | 3.788 | 0.969 | 24.6 | 0.954 | 14% |
RF | 2 | 95.83 | 5.245 | 3.271 | 0.964 | 11.7 | 0.951 | 17.8% |
RF | 3 | 96.50 | 1.100 | 0.634 | 0.976 | 1.9 | 0.96 | 16% |
RF | 4 | 93.01 | 30.9 | 18.303 | 0.960 | 165.8 | 0.947 | 15% |
RF | 5 | 93.06 | 3.386 | 2.039 | 0.969 | 14.9 | 0.958 | 16% |
RF | 6 | 70.43 | 30.9 | 18.3 | 0.923 | 114 | 0.841 | 9.9% |
RF | 7 | 95.83 | 10.256 | 6.116 | 0.974 | 22.7 | 0.967 | 17.8% |
RF | 8 | 90.85 | 0.298 | 0.162 | 0.959 | 1.26 | 0.949 | 20% |
RF | 9 | 94.48 | 21.339 | 13.28 | 0.951 | 22.3 | 0.942 | 22% |
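Per-model averages can be taken over the nine station rows of Table 2. The sketch below transcribes the R², NSE, and MAE columns from the table and averages them with an unweighted mean; the dictionary layout and the function name `model_averages` are illustrative assumptions, not part of the study's workflow.

```python
from statistics import fmean
from typing import Dict, List

# Station-wise values (stations 1 to 9) transcribed from Table 2.
TABLE_2: Dict[str, Dict[str, List[float]]] = {
    "ODT": {
        "R2": [0.960, 0.943, 0.960, 0.951, 0.910, 0.840, 0.984, 0.759, 0.815],
        "NSE": [0.961, 0.944, 0.960, 0.960, 0.911, 0.891, 0.984, 0.755, 0.874],
        "MAE": [3.099, 2.867, 0.436, 12.425, 2.171, 25.319, 4.629, 0.411, 25.829],
    },
    "GBT": {
        "R2": [0.869, 0.811, 0.868, 0.861, 0.877, 0.679, 0.904, 0.898, 0.873],
        "NSE": [0.803, 0.675, 0.71, 0.779, 0.738, 0.566, 0.764, 0.744, 0.68],
        "MAE": [7.987, 6.698, 1.645, 41.69, 5.179, 27.966, 6.116, 0.406, 13.821],
    },
    "RF": {
        "R2": [0.969, 0.964, 0.976, 0.960, 0.969, 0.923, 0.974, 0.959, 0.951],
        "NSE": [0.954, 0.951, 0.96, 0.947, 0.958, 0.841, 0.967, 0.949, 0.942],
        "MAE": [3.788, 3.271, 0.634, 18.303, 2.039, 18.3, 6.116, 0.162, 13.28],
    },
}


def model_averages(table: Dict[str, Dict[str, List[float]]]) -> Dict[str, Dict[str, float]]:
    """Return the unweighted mean of each metric over all stations, per model."""
    return {
        model: {metric: fmean(values) for metric, values in metrics.items()}
        for model, metrics in table.items()
    }


averages = model_averages(TABLE_2)
for model, metrics in averages.items():
    print(model, {name: round(value, 3) for name, value in metrics.items()})
```

An unweighted mean treats every station equally regardless of its flow magnitude, so stations with large inflows dominate the averaged MAE while contributing no more than any other station to the averaged R² and NSE.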
The RF model achieved the highest average R², 0.96, compared with 0.90 for the ODT model and 0.84 for the GBT model. The ODT model, however, had an average relative error (RE%) of 0.4%, indicating little difference between the actual and predicted values, whereas the RF model had an average RE% of 42.1%, which is considerable and potentially unacceptable in certain situations. The RE% of the GBT model was 5.48% greater than that of the ODT model. Overall, the GBT model shows slightly lower predictive performance than the RF and ODT models, although the disparity is minor and the approaches may produce almost indistinguishable outcomes. As described in the previous section, Equations (1)–(6) were applied to the ODT, GBT, and RF models for the nine streamflow stations on the Terengganu River. The models were built from the historical rainfall and streamflow dataset from 2000 to 2020 and trained with training approach #1, and the results are presented in Table 2. In terms of accuracy (ACC), the ODT model outperforms the GBT and RF models, with an average of 94% compared with 92% and 91%, respectively. In other words, 94% of the 2160 recorded river inflow values were accurately forecasted, leaving an error rate of only 6%.
The present approach also compares favourably with earlier studies with comparable aims. Munawar et al. (2021) employed a CNN to map floods and achieved an average accuracy of 84%. Lopez-Fuentes et al. (2017) used CNN models to predict floods and landslides and achieved an average accuracy of 83.96%. Flood mapping with texture features and RF on RGB images reached an accuracy of 87.5% (Feng et al. 2015). Elkhrachy (2015) employed the analytical hierarchy process to determine the relative influence weights of flood-causing elements, achieving an accuracy of 84.4%. Tehrany et al. (2015) created a flood susceptibility mapping system combined with GIS, using different kernel types and SVM classifiers, with an accuracy of 84.97%. Lee et al. (2017) reported that the RF model achieved an accuracy of 78.78% for regression and 79.18% for classification, while the boosted tree model achieved a validation accuracy of 77.55% for regression and 77.26% for classification. Ridwan et al. (2021) employed multiple models to forecast rainfall in Tasik Kenyir, Terengganu, revealing a range of rainfall prediction coefficients from 0.5 to 0.9. That study reported the highest values for daily (0.97), weekly (0.98), 10-day (0.98), and monthly (0.99) rainfall predictions. The DFR model achieved an MAE of 0.094 and an RMSE of 0.156, the BDTR model an MAE of 0.064 and an RMSE of 0.117, the NNR model an MAE of 0.389 and an RMSE of 0.672, and the BLR model an MAE of 0.417 and an RMSE of 0.674.
In addition, Xu et al. (2024) used RF and MLP models, with climatic data as input, to forecast monthly streamflow. The RF model captured the variability of low flow effectively, with R², NSE, and RMSE values of 0.90, 0.89, and 4.53, respectively. In contrast, the MLR model exhibited somewhat lower predictive accuracy than the RF model, with R², NSE, and RMSE values of 0.63, 0.60, and 8.02, respectively. Both models show a high level of efficacy in integrating all climate factors. Nevertheless, the unrefined models slightly outperformed the trained model, indicating overfitting and validating the reasonableness of the variables (Xu et al. 2024).
In this investigation, the ODT, GBT, and RF models achieved average NSE values of 0.92, 0.718, and 0.94, respectively. The ODT and RF models exhibited exceptional outcomes, with NSE values between 0.92 and 0.94, suggesting a high degree of performance. The GBT model had an NSE value of 0.718, which is within the acceptable performance range (0.65 < NSE ≤ 0.75), suggesting a satisfactory degree of performance. The average MAE values for the ODT, GBT, and RF models were 8.58, 12.4, and 7.3, respectively. The data listed in Table 2 show that both the ODT and RF models exhibit a significant level of accuracy, as evidenced by their MAE values, whereas the MAE of the GBT model falls between 11 and 20. According to the evaluation criteria in Table 1, all objective functions achieved a level of performance ranging from exceptional to good, indicating an overall satisfactory performance. Table 2 also presents the mean absolute percentage error (MAPE) for the ODT, RF, and GBT models. MAPE is the average percentage error of a model and so provides a measure of the average accuracy of its predictions. The maximum obtained MAPE value was 35.6%. MAPE values are classified into the following categories: values below 10% are considered great; values between 10 and 20% are considered good; values between 20 and 50% are considered acceptable; and values above 50% are considered inaccurate (Moreno et al. 2013; Shrestha et al. 2021). For the ODT model, three stations have MAPE values below 10%, indicating that they are excellent. The remaining stations have values below 20%, indicating that they are good. The RF model values are categorised between great and good, while the GBT model values fall in the acceptable range.
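The MAPE categories quoted above can be written as a small lookup. This is a sketch: the function name `classify_mape` is hypothetical, and the treatment of values lying exactly on a boundary (10, 20, or 50%) is an assumption, since the text does not say which category they belong to.

```python
def classify_mape(mape_percent: float) -> str:
    """Map a MAPE value (in percent) to the qualitative categories quoted in the text."""
    if mape_percent < 0:
        raise ValueError("MAPE cannot be negative")
    if mape_percent < 10:
        return "great"
    if mape_percent <= 20:  # assumption: exactly 10% and 20% count as "good"
        return "good"
    if mape_percent <= 50:  # assumption: exactly 50% counts as "acceptable"
        return "acceptable"
    return "inaccurate"


# Example values taken from Table 2: ODT station 1, RF station 2, GBT station 9.
for label, value in (("ODT station 1", 8.0), ("RF station 2", 17.8), ("GBT station 9", 45.0)):
    print(f"{label}: MAPE {value}% is {classify_mape(value)}")
```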
Recent studies have further corroborated the efficacy of ML models in streamflow forecasting. For instance, Dang et al. (2024) investigated nine models for predicting floods: AdaBoost (AB), DT, Gaussian process (GP), K-nearest neighbours, linear discriminant analysis, NB, NN, RF, and SVM. Three of these models, GP, RF, and NN, outperformed the rest, with R² values of 0.997, 0.996, and 0.995, respectively. Similarly, Xu et al. (2024) used RF and MLP models, with climatic data as input, to forecast monthly streamflow and obtained R² values of 0.90 and 0.63, respectively. Li et al. (2023) employed MLR and RF models to forecast the monthly water deficit index. Both models performed excellently at all 44 sites, but the RF model was generally superior to the MLR model, with a higher coefficient of determination (R² > 0.8) at 38 locations. These studies support the claim that the models used in this study exhibit superior performance in terms of R² values.
The implications of these findings are significant for hydrological modelling and water resource management. The high R² values suggest that the ODT, GBT, and RF models are highly effective tools for predicting streamflow, which is critical for planning and managing water resources in various hydrological contexts. These models' predictive capabilities can inform decision-making processes, enhance the precision of hydrological forecasts, and ultimately contribute to more effective and sustainable water resource management practices. Figure 6 shows the Taylor diagram results for the ODT model along with RF and GBT models for all the stations.
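A Taylor diagram such as the one referred to for Figure 6 summarises three statistics per model: the standard deviation of the predictions, their correlation with the observations, and the centred root mean square difference. The sketch below computes them under the standard definitions; the function name `taylor_statistics` and the sample values are hypothetical and are not taken from the study.

```python
import math
from statistics import fmean, pstdev
from typing import Dict, Sequence


def taylor_statistics(obs: Sequence[float], sim: Sequence[float]) -> Dict[str, float]:
    """Return the quantities plotted on a Taylor diagram for one model."""
    obs_mean, sim_mean = fmean(obs), fmean(sim)
    sigma_obs, sigma_sim = pstdev(obs), pstdev(sim)
    # Population covariance between observations and simulations.
    cov = fmean([(o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim)])
    correlation = cov / (sigma_obs * sigma_sim)
    # Centred RMS difference: RMSE after removing the mean bias of each series.
    crmsd = math.sqrt(
        fmean([((s - sim_mean) - (o - obs_mean)) ** 2 for o, s in zip(obs, sim)])
    )
    return {
        "sigma_obs": sigma_obs,
        "sigma_sim": sigma_sim,
        "correlation": correlation,
        "crmsd": crmsd,
    }


# Synthetic observed and simulated inflows, for illustration only.
stats = taylor_statistics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
print({name: round(value, 3) for name, value in stats.items()})
```

The three statistics are linked by the law of cosines, crmsd² = σ_obs² + σ_sim² − 2·σ_obs·σ_sim·R, which is what allows them to be drawn on a single two-dimensional diagram.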
Performance of the ODT, RF, and GBT models on streamflow prediction
In this study, two of the three models, ODT and RF, demonstrated excellent to good performance in monthly streamflow prediction, with RF outperforming ODT in certain cases, while GBT exhibited a satisfactory level of performance. These findings support the conclusion of Yan et al. (2022) that RF outperformed other models in streamflow prediction, and they highlight the importance of comparative analyses of model performance. As suggested by Zhang et al. (2018), the superior performance of RF can be attributed to its capability to model complex relationships between input variables and streamflow. GBT is also a non-linear tree ensemble rather than a linear model, but in this study it captured these relationships less well, which limited its accuracy for a complex system such as streamflow. ODT and RF, by contrast, were more effective at identifying and exploiting non-linear interactions between input variables, which enhanced the accuracy of their predictions (Yang et al. 2017). Tyralis et al. (2019) provided a concise analysis that underscored RF's strong predictive abilities in complex hydrological problems, including rainfall–runoff forecasting, streamflow prediction, and groundwater modelling. ODT and RF also predicted both low and high flows better than GBT. This advantage is attributed to their flexible non-linear fitting, which allows more accurate and adaptable predictions across different sections of the data.
Study limitations and uncertainty analysis
Although this study provides essential insights, several limitations should be recognised. First, a variety of meteorological factors that affect streamflow were initially taken into account, but some, such as temperature and humidity, were eliminated from the analysis because they were found to have a minimal impact on the results. In addition, the study did not analyse key climate change-related factors such as glaciers and permafrost owing to a lack of available data. For example, rising temperatures can cause the breakdown of permafrost, which can change the way groundwater and surface water interact and affect soil moisture and streamflow dynamics. Furthermore, the 50-year duration of meteorological data might not sufficiently capture the long-term patterns and cyclic properties of specific factors, perhaps leading to an underestimation of the relationships between rainfall rates and streamflow. More extensive datasets would improve the strength and reliability of the findings. Although historical rainfall and inflow rates alone were used successfully to forecast streamflow, these parameters may not completely explain the fluctuations in streamflow, which is likely to be influenced by additional elements and their intricate relationships. It is therefore recommended that future studies use hydrological models that integrate physical mechanisms to examine the impacts of climatic conditions on streamflow further. Moreover, although human-induced effects on the study area are assumed to be minimal, disregarding their interactions with climatic factors, such as changes in precipitation patterns caused by human-made aerosol emissions (Jiang et al. 2023), could introduce some bias or uncertainty into the evaluations.
CONCLUSIONS
This study assessed streamflow forecasting using an ODT alongside RF and GBT models based on a historical monthly rainfall and inflow database spanning from 1990 to 2020 for the Terengganu River basin. Rainfall is a well-established contributing factor to natural disasters, such as the significant flood in Terengganu in December 2014. The results indicated that the ODT and RF models provided more accurate predictions of inflow rates from historical rainfall and inflow data than the GBT model, with average R² values of 90.2% for ODT, 96.0% for RF, and 84.8% for GBT. The corresponding MAE values were 8.58, 7.3, and 12.4, and the NSE values of the ODT and RF models fell between 0.92 and 0.94.
The ODT and RF methods are recommended for inflow prediction and can serve as a reference for future flood-related work, not only because their predictions were more robust than those of GBT and of previous methods but also because of their transparent model structures. This transparency allows flood management authorities to monitor and customise inputs based on regional needs and to provide early warnings of potential floods, thereby preserving lives and property. Future research could focus on employing ML models to address the non-linearity in streamflow models, particularly CNNs for image processing tasks and other advanced hybrid ML models.
ACKNOWLEDGMENT
The authors would like to express their gratitude to the Higher Institution Centre of Excellence (HICoE), Ministry of Higher Education (MOHE), Malaysia under the project code 2024001HICOE as referenced in JPT(BPKI)1000/016/018/34(5).
AUTHORS’ CONTRIBUTIONS
Osama A. Abozweita: Investigation, Methodology, Software, Formal analysis, Visualization, Writing - Original draft preparation
Ali Najah Ahmed: Conceptualization, Methodology, Formal analysis, Supervision, Writing - Review & Editing
Lariyah Bte Mohd Sidek: Methodology, Supervision, Validation, Writing - Review & Editing, Resources
Hidayah Bte Basri: Supervision, Methodology, Validation, Writing - Review & Editing, Resources
Mohd Hafiz Bin Zawawi: Methodology, Validation, Writing - Review & Editing, Resources
Yuk Feng Huang: Data Curation, Methodology, Validation, Writing - Review & Editing
Ahmed El-Shafie: Conceptualization, Methodology, Validation, Writing - Review & Editing
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.