As the global climate changes, water-related disasters are becoming more frequent, causing significant economic losses and safety hazards. During flood events, river water levels fluctuate unpredictably, introducing considerable noise that makes accurate prediction difficult. Predicting water levels from existing water level records therefore contributes substantially to flood forecasting. In this study, the least-squares support vector machine (LSSVM) is enhanced by adding an extra bias error control term, giving the enhanced LSSVM (ELSSVM). LSSVM and ELSSVM, both optimized by the genetic algorithm (GA), are compared with and without data decomposition methods to improve daily water level prediction accuracy. Empirical mode decomposition (EMD) and double empirical mode decomposition (DEMD) are integrated with LSSVM and ELSSVM, giving six models: LSSVM-GA, ELSSVM-GA, EMD-LSSVM-GA, EMD-ELSSVM-GA, DEMD-LSSVM-GA, and DEMD-ELSSVM-GA. The proposed models are used to forecast the water level of the Klang River in Sri Muda, Malaysia, and their behavior is investigated and compared using several performance metrics. The results show that the DEMD-ELSSVM-GA model outperformed the other models in forecasting the water level, with RMSE = 0.2536 m and R² = 0.8596 on the testing data.

  • Klang River in Malaysia was chosen as the case study.

  • Data decomposition methods are integrated to mitigate high-frequency noise in the data.

  • The least-squares support vector machine (LSSVM) is enhanced by adding an extra bias error control term.

  • Hybrid double empirical mode decomposition is proposed to decompose water level data.

  • LSSVM and enhanced LSSVM models are optimized by a genetic algorithm.

In Malaysia, floods have emerged as a recurring and prominent natural disaster over recent decades, inflicting considerable economic devastation, especially in regions like Sri Muda, Sungai Gombak, and Mentakab. These floods result from intense rainfall overwhelming local catchment capacities. Establishing robust early warning systems is crucial to protect urban centers and lives (Sumi et al. 2012). Hydrologists analyze statistical properties in hydrological data, including rainfall–runoff patterns (He et al. 2022) and water level records, to enhance preparedness. Monitoring water level data is vital for anticipating floods, but improving future water level predictions is essential for effective flood warnings (Faruq et al. 2021). Moreover, climate change is causing more frequent and intense extreme weather events like hurricanes, storms, and heavy rainfall. These events lead to sudden and significant changes in water levels in rivers, lakes, and coastal areas. To effectively manage water resources and prepare for disasters, it is crucial to develop advanced forecasting models. Our research focuses on understanding and predicting how climate change impacts water levels. By exploring these connections, we aim to improve forecasting accuracy, support sustainable water management, and help communities cope with the increasing risks associated with changing water levels.

Recent times have seen a surge in the use of machine learning techniques, including support vector machines (SVMs) (Cortes & Vapnik 1995; Sukanya & Vijayakumar 2023) and its variant, the least-squares support vector machine (LSSVM), for effective processing of both linear and nonlinear datasets across diverse domains. In flood forecasting research, SVM has gained attention for robust predictive capabilities in hydrological data prediction. Despite its advantages, SVM has drawbacks like longer computation times. To address this, the improved LSSVM model has been introduced, known for its ability to solve linear matrix equations with fewer constraints (Wang & Hu 2005). The use of the radial basis function (RBF) kernel function with LSSVM has emerged as a favorable alternative, transforming inequality computations into equality computations. Integrated with decomposition methods, LSSVM has proven effective in predicting long-term runoff (Seo et al. 2016). Considering the small sample size challenge, especially pertinent in the case of one-dimensional and limited dataset, LSSVM is a preferred choice (Tang et al. 2018). Consequently, this study employs machine learning models, specifically LSSVM and enhanced LSSVM (ELSSVM), for a comparative analysis. These models are integrated with decomposition methods to forecast daily water levels, facilitating flood prediction within the study area.

In dealing with the inherent inconsistency in hydrological time series data, effective data preprocessing techniques are crucial for noise reduction. Decomposition methods, particularly empirical mode decomposition (EMD), have proven to be powerful tools for enhancing prediction accuracy by mitigating signal or time series noise (Huang et al. 1998). EMD decomposes water level data into components known as intrinsic mode functions (IMFs) and residue, offering a natural mode for each mono-component. The effectiveness of EMD has been acknowledged in various fields, including biomedical data analysis, fault detection, power signal analysis, and medicine. Recent studies highlight the impact of EMD on water level prediction (Loh et al. 2019), demonstrating improved performance when combined with forecasting models like EMD with artificial neural network and EMD with SVM. This article focuses on forecasting river water levels in the context of climate change, which holds significant utility for society. Accurate river water level forecasts are critical for effective water resource management, flood preparedness, and infrastructure planning, particularly in the face of changing climate patterns. The utility of the article lies in its potential to enhance early warning systems, allowing communities to prepare for and mitigate the impacts of floods resulting from climate-induced factors such as altered precipitation patterns and extreme weather events.

The main contribution of the proposed work is as follows:

  • i The enhancement of forecasting model accuracy through the incorporation of an additional control term for bias error into the objective function of the LSSVM, referred to as ELSSVM. This modification enables unbiased estimation within the forecasting framework, thereby addressing a crucial aspect of predictive modeling.

  • ii Exploring the potential of integrating these decomposition methods into existing forecasting models, such as ELSSVM, with the goal of improving predictive performance while minimizing overfitting issues. Through empirical analysis, this study seeks to evaluate the impact of double decomposition on forecasting river water levels.

  • iii Developing hybrid prediction methods, specifically the double empirical mode decomposition (DEMD)-LSSVM-GA and DEMD-ELSSVM-GA models, which combine DEMD as a decomposition technique with forecasting tools such as LSSVM and ELSSVM optimized by the genetic algorithm (GA). These models are designed to enhance the forecasting accuracy of future water levels, thus contributing to the evolution of early warning systems for floods.

Empirical mode decomposition

EMD is a powerful procedure that is usually used to analyze nonlinear data (Dehghan et al. 2022). EMD decomposes the original data into a set of IMFs whose frequency bands range from high to low. Let $x(t)$ be an original time series from the dataset. The EMD procedure is as follows (Chowdhury et al. 2008):

Step 1: Find all local extrema of $x(t)$, that is, its local maxima and minima.

Step 2: Connect all the local maxima with a cubic spline to obtain the upper envelope $u(t)$, and all the local minima to obtain the lower envelope $l(t)$.

Step 3: Determine the mean envelope $m(t) = [u(t) + l(t)]/2$ from the lower and upper envelopes.

Step 4: Determine the difference $h(t) = x(t) - m(t)$ between the original data and the mean envelope determined in Step 3.

Step 5: Verify whether $h(t)$ satisfies the characteristics of an IMF. If yes, $h(t)$ is the first IMF and $x(t)$ is replaced by the residual $r(t) = x(t) - h(t)$. If not, replace $x(t)$ with $h(t)$.

Step 6: Repeat Steps 1–5 until the termination condition is satisfied.

In addition, the sifting process of EMD ends as soon as the residual becomes monotonic, which makes it infeasible to extract further IMFs. The final product of decomposition by EMD is a set of IMFs and a residual from the original data (Colominas et al. 2015). For double EMD (DEMD), the first IMF, which has the highest frequency, undergoes EMD again and produces IMFs named SIMF1, SIMF2, etc. (Ahmed et al. 2022).
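The sifting steps above can be sketched in a few lines of Python. This is a minimal illustration under stated simplifications, not the implementation used in the study: the envelopes are piecewise linear rather than cubic splines, a fixed number of sifting passes replaces the formal IMF check of Step 5, and the function names (`local_extrema`, `envelope`, `emd`) are hypothetical.

```python
import math


def local_extrema(x):
    """Return the indices of interior local maxima and minima (Step 1)."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(knots, x):
    """Piecewise-linear envelope through x[knots], held flat outside the knots.

    Simplification: the article uses cubic splines (Step 2).
    """
    env, k = [], 0
    for t in range(len(x)):
        if t <= knots[0]:
            env.append(x[knots[0]])
        elif t >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            while knots[k + 1] < t:
                k += 1
            a, b = knots[k], knots[k + 1]
            w = (t - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    return env


def emd(x, max_imfs=10, sift_iters=10):
    """Return (imfs, residual) such that sum(imfs) + residual equals x."""
    residual, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left: treat the residual as the trend
        h = list(residual)
        for _ in range(sift_iters):  # assumption: fixed sifting count
            maxima, minima = local_extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            upper, lower = envelope(maxima, h), envelope(minima, h)
            # Steps 3 and 4: subtract the mean envelope
            h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
        imfs.append(h)
        residual = [r - c for r, c in zip(residual, h)]
    return imfs, residual
```

By construction, adding all extracted components back to the residual recovers the input series, which is the property the hybrid models rely on when the component forecasts are merged.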

Least-squares support vector machine

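SVM was first introduced by Cortes & Vapnik (1995) as a solution for both regression and classification problems. LSSVM is a method developed from SVM. In the LSSVM model for function estimation, the optimization problem is formulated in Equation (1):
$$\min_{w,\,b,\,e}\; Z(w, e) = \frac{1}{2} w^{T} w + \frac{C}{2} \sum_{i=1}^{n} e_i^{2} \quad \text{subject to} \quad y_i = w^{T}\varphi(x_i) + b + e_i,\quad i = 1, \ldots, n \tag{1}$$
where $Z$ is the loss function, $e_i$ is the error, and $C$ is the regularization constant; $w$, $b$, and $\varphi(\cdot)$ denote the weight vector, the bias, and the feature map, respectively.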

Enhanced least-squares support vector machine

To achieve unbiased estimation for the forecasting model, an additional term for controlling bias error is incorporated into the objective function of LSSVM. The reorganized objective function is given in Equation (2):
(2)
Equations (1) and (2) can be solved using the Lagrange function and the Karush–Kuhn–Tucker (KKT) conditions. Equation (3) shows the Lagrange function. Its partial derivatives are then set to zero according to the KKT conditions, as given in Equation (4):
(3)
(4)
By using Equation (4), the optimization problem can be transformed into the task of resolving linear equations, as outlined in Equation (5):
(5)
where is a dimensional column vector, signify the kernel function, which fulfills the Mercer condition and gives rise to the ultimate presentation of the LSSVM model, as illustrated in Equation (6). Equation (7) presents the kernel function chosen in this article for the LSSVM model, which is the RBF:
(6)
(7)
where $\sigma$ is the width of the kernel function.
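To make the reduction to linear equations concrete, the following is a minimal sketch of the standard LSSVM regression with an RBF kernel, written in plain Python. It is not the ELSSVM variant, whose bias error control term is not reproduced here. The bordered system solved below is the textbook LSSVM dual form, the $2\sigma^2$ scaling inside the kernel is one common convention and is an assumption, and the function names are hypothetical.

```python
import math


def rbf(a, b, sigma):
    """RBF kernel exp(-||a - b||^2 / (2 sigma^2)); the scaling is an assumed convention."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def lssvm_fit(X, y, C, sigma):
    """Solve [[0, 1^T], [1, K + I/C]] [b, alpha]^T = [0, y]^T for b and alpha."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / C  # regularization on the diagonal
        A.append(row)
    z = solve(A, [0.0] + list(y))
    return z[0], z[1:]


def lssvm_predict(x, X, b, alpha, sigma):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X))
```

In a forecasting setting, each row of `X` would typically hold lagged values of one decomposed component and `y` the next value; that input layout is an assumption, since the lag structure is not stated here.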

Genetic algorithm

The GA is a powerful and versatile class of optimization algorithms inspired by the principles of natural selection and evolution. Introduced in the field of artificial intelligence and computational science, GAs draw inspiration from the mechanisms that drive biological evolution, such as selection, crossover, and mutation. The primary objective of GAs is to explore solution spaces efficiently, searching for optimal or near-optimal solutions to complex problems. These algorithms are particularly well suited to tasks where traditional optimization methods may struggle because of the high-dimensional, nonlinear, or combinatorial nature of the problem. The GA can therefore often reach the global optimum, or a solution close to it, rapidly. Consequently, in this research, GA was employed to optimize the penalty factor and kernel parameter of LSSVM and ELSSVM. Figure 1 shows the GA process used to optimize the parameters of ELSSVM. Figure 2 shows the overall method of DEMD with LSSVM optimized by GA.
Figure 1

The simplified process of using the GA algorithm.

Figure 2

Flowchart of DEMD with LSSVM optimized by GA.

Root-mean-square error (RMSE) is commonly used to measure the distance between the value forecasted by the model and the observed water level. In other words, RMSE measures the goodness of the model performance (Faruq et al. 2021). Equation (8) defines RMSE. The squared correlation coefficient ($R^2$) is computed to evaluate the variance explained by the models, as presented in Equation (9). Recently, $R^2$ has been used effectively in analyzing wind power forecasting by Wang et al. (2022):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{8}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{9}$$
where $y_i$ is the observed value, $\hat{y}_i$ is the forecasted value, $n$ is the number of data, and $\bar{y}$ is the mean value of water level data.
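The two metrics can be computed as in the short sketch below. It assumes that $R^2$ takes the coefficient-of-determination form (one minus the ratio of residual to total sum of squares about the observed mean); the function names are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observed and forecasted values."""
    n = len(obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)


def r_squared(obs, pred):
    """R^2 as 1 - SS_res / SS_tot (assumed form), using the observed mean."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

A lower RMSE (in metres here) and an $R^2$ closer to 1 both indicate a better fit, which is how the models are ranked in the results.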

Study area and dataset

The study area chosen in this article is the Klang River, which flows through Selangor and Kuala Lumpur in Malaysia. The Department of Irrigation and Drainage Malaysia (DID) provided the hydrological data of the Klang River in Sri Muda used in this study. Taman Sri Muda is a township located in Shah Alam, Selangor, Malaysia. Sungai Klang, also known as the Klang River, is one of the major rivers in the region. The length of the river is approximately 25 km (Station 3015432). The daily water level data range from 2011 to 2022, giving a total of 4,382 daily observations. For model generation, 80% of the data, from 1 January 2011 to 1 August 2020, form the training set of LSSVM and ELSSVM, and the remaining 20%, from 2 August 2020 to 31 December 2022, form the validation set, also referred to as the testing set. Hence, the forecasting models are built on the training set of 3,505 days, and the water level is predicted for the testing set of 877 days. The daily series of water level data of the Klang River for training and testing is illustrated in Figure 3. For the chosen area, there are four levels, namely normal, alert, warning, and danger, with threshold water levels of 2.8, 4.4, 4.7, and 5.0 m, respectively. Flood happens when the water level exceeds the normal level. Thus, in a flood warning system, reliable water level prediction models are important for alerting the authorities and notifying the victims.
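The stated set sizes are consistent with a chronological 80/20 split by count, as the following minimal sketch shows. The helper name `chronological_split` is hypothetical, and rounding the training size down is an assumption that happens to reproduce the reported figures.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series into leading training and trailing testing parts."""
    n_train = int(len(series) * train_fraction)  # assumption: round down
    return series[:n_train], series[n_train:]


# 4,382 daily observations give 3,505 training days and 877 testing days.
train, test = chronological_split(list(range(4382)))
print(len(train), len(test))
```

Keeping the split chronological, rather than shuffling, avoids using future observations when fitting a model that is later evaluated on earlier ones.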
Figure 3

The daily water level time series at Klang River in Sri Muda.


Water level models are imperative for evaluating flood occurrences and for managing water resources effectively within the country. Elevated water levels are indicative of broader impacts stemming from climate change. Changes in precipitation patterns, temperature, and glacial melting contribute to fluctuations in water levels, influencing the flow of rivers, the levels of lakes, and coastal regions.

Data decomposition

In this study, the EMD and DEMD methods were employed to examine the time series of water level data. These techniques played a crucial role in preprocessing the daily water level data of the Klang River in Sri Muda, Malaysia, spanning from 2011 to 2022. Specifically, the EMD and DEMD techniques were employed to break down the original water level time series into multiple independent IMFs alongside a residual component. The decomposition results are depicted in the following figures. Figure 4 shows the first IMF obtained through EMD, which has the highest frequency and therefore goes through the second EMD. Figure 5(a) displays the second to sixth IMFs from EMD, and Figure 5(b) shows the seventh IMF to the residual component; these components become the input variables for forecasting water levels. Finally, Figure 6(a) and 6(b) illustrate the outcomes of DEMD (EMD of the first IMF), which produces the SIMFs. The DEMD method transforms IMF 1 into 10 SIMF components and 1 SResidual. Following the decomposition process, the complexity of the forecasting task is significantly reduced owing to the emergence of a more regular data pattern.
Figure 4

The first IMF derived from the EMD method for the decomposition of daily water level data with highest frequency.

Figure 5

(a) The second IMF to sixth IMF derived from the application of the EMD method and (b) the seventh IMF to residual derived from the application of the EMD method.

Figure 6

(a) The first SIMF to fifth SIMF from the application of the DEMD method (EMD of first IMF) and (b) the sixth SIMF to SRes from the application of the DEMD method.


According to Figures 4 and 5, the water level data have been transformed into 10 IMF components and one residual component through the EMD method. Each IMF corresponds to a specific frequency component or oscillatory mode present in the original signal. IMF 1 generally captures the highest frequency component, and the subsequent IMFs represent progressively lower frequency components. The residual component represents the remaining part of the signal that is not captured by the 10 IMFs, that is, the slowly varying trend left once no further IMFs can be extracted. Because the fine details and high-frequency noise are concentrated in IMF 1, it has been chosen to undergo EMD again (DEMD) to improve predictive performance while minimizing overfitting issues. Figure 6(a) and 6(b) depict the results of this second decomposition level. SIMF 1 represents the highest frequencies, while SIMF 10 encapsulates the lowest frequencies within the decomposed IMF 1.
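The double decomposition can be expressed as a thin wrapper around any EMD routine. The sketch below is illustrative only: `emd` is passed in as a callable that returns `(imfs, residual)`, and the function names and the ordering of the returned components are assumptions.

```python
def demd(signal, emd):
    """Double EMD: decompose the signal, then decompose its first IMF again.

    `emd` is any callable returning (list_of_imfs, residual).
    Returns SIMFs and SResidual (from IMF 1), then IMF 2 onward and the residual.
    """
    imfs, residual = emd(signal)
    simfs, sresidual = emd(imfs[0])  # second EMD applied to IMF 1 only
    return simfs + [sresidual] + imfs[1:] + [residual]


def reconstruct(component_forecasts):
    """Merge per-component forecasts into one series by summing them pointwise."""
    return [sum(values) for values in zip(*component_forecasts)]
```

Each returned component would be forecast by its own LSSVM or ELSSVM model, and `reconstruct` then adds the component forecasts to obtain the predicted water level series.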

Support vector machine and least-squares support vector machine

The accuracy of SVM and LSSVM models is significantly influenced by the careful selection of the kernel and its parameters. Previous studies on hydrologic issues have explored LSSVM-type models with different kernels, and the RBF has consistently emerged as the most accurate and efficient choice. In this study, we adopt the RBF kernel based on this established effectiveness. The kernel parameters of LSSVM and ELSSVM are searched in the range 0.001 to 10, and the penalty factors of LSSVM and ELSSVM are searched in the range 0.1 to 10. A GA approach is employed to optimize these parameters, and the GA settings are shown in Table 1. Table 2 shows the statistical results of the hybrid technique for the estimation of IMFs using the LSSVM and ELSSVM methods.

Table 1

The initial genetic algorithm parameters

Parameters                       Value
Population size                  20
Maximum number of generations    100
Selection rate                   0.9
Crossover rate                   0.7
Mutation rate                    0.2
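The GA settings in Table 1 and the search ranges given for the kernel parameter (0.001 to 10) and penalty factor (0.1 to 10) can be combined into a small real-coded GA, sketched below. The operators are assumptions, since they are not specified here: the selection rate is read as the fraction of the ranked population kept as the mating pool, crossover is an arithmetic blend, mutation resets a gene uniformly within its range, and the best individual is carried forward (elitism). The fitness function, which in practice would be a validation error of the trained model, is supplied by the caller.

```python
import random

# (kernel parameter range, penalty factor range) as stated in the text
BOUNDS = [(0.001, 10.0), (0.1, 10.0)]


def ga_minimize(fitness, bounds=BOUNDS, pop_size=20, generations=100,
                selection_rate=0.9, crossover_rate=0.7, mutation_rate=0.2,
                seed=0):
    """Minimize `fitness` over the box `bounds` with a simple real-coded GA."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        # assumption: keep the top fraction of individuals as the mating pool
        pool = ranked[:max(2, int(selection_rate * pop_size))]
        children = [ranked[0][:]]  # elitism (assumption)
        while len(children) < pop_size:
            p1, p2 = rng.sample(pool, 2)
            if rng.random() < crossover_rate:
                w = rng.random()
                child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            for g, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[g] = rng.uniform(lo, hi)
                child[g] = min(max(child[g], lo), hi)  # stay inside the range
            children.append(child)
        pop = children
        best = min(pop + [best], key=fitness)
    return best
```

For example, `ga_minimize(lambda p: validation_rmse(p[0], p[1]))` would search both parameters at once, where `validation_rmse` is a hypothetical function that trains a model with the given kernel parameter and penalty factor and returns its error.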
Table 2

Results of LSSVM and ELSSVM models in estimating IMFs and SIMFs obtained from EMD and DEMD, respectively, in the test phase

                      LSSVM-GA             ELSSVM-GA
Method   Component    RMSE      R²         RMSE      R²
EMD      IMF1         0.4589    0.2489     0.3598    0.2658
         IMF2         0.0569    0.9485     0.04698   0.9789
         IMF3         0.0258    0.9564     0.0245    0.9897
         IMF4         0.0197    0.9548     0.0154    0.9978
         IMF5         0.0199    0.9632     0.0198    0.9975
         IMF6         0.0117    0.9745     0.0104    0.9974
         IMF7         0.0071    0.9787     0.0052    0.9988
         IMF8         0.0067    0.9799     0.0254    0.9824
         IMF9         0.0041    0.9854     0.0035    0.9856
         IMF10        0.0021    0.9956     0.0014    0.9984
         Res          0.0009    0.9995     0.0009    0.9998
DEMD     SIMF1        0.5869    0.3598     0.4578    0.2569
         SIMF2        0.0569    0.9574     0.0485    0.9854
         SIMF3        0.0249    0.9645     0.0152    0.9942
         SIMF4        0.0248    0.9647     0.0121    0.9974
         SIMF5        0.0185    0.9674     0.0075    0.9984
         SIMF6        0.0098    0.9762     0.0045    0.9987
         SIMF7        0.0087    0.9781     0.0054    0.9992
         SIMF8        0.0074    0.9863     0.0005    0.9993
         SIMF9        0.0049    0.9871     0.0035    0.9997
         SIMF10       0.0030    0.9929     0.0015    0.9999
         SRes         0.0025    0.9985     0.0009    0.9999

Based on Table 2, the results underscore the superior predictive capabilities of the ELSSVM model compared with the LSSVM model for most of the IMFs generated through the EMD and DEMD techniques. For example, for IMF3 of EMD, the RMSE of the LSSVM model is 0.0258, while that of the ELSSVM model is 0.0245. Similarly, for SIMF3 of DEMD, the RMSE of the LSSVM model is 0.0249, while that of the ELSSVM model is 0.0152. Therefore, we can conclude that ELSSVM models have shown better results than LSSVM models with the help of data decomposition. IMF 1 and SIMF 1 often contain high-frequency noise or fast-varying components that might not follow a predictable pattern. Attempting to forecast such noise can result in poor performance, as the noise may not have a consistent structure over time. However, combining the forecasting results from multiple IMFs gives less weight to the last IMF and improves overall forecasting performance. On the other hand, the first IMF of EMD tends to capture high-frequency noise or rapid fluctuations in the signal, which results in poor forecasting with low values of R². Therefore, DEMD has been implemented only for IMF 1.

Discussion on forecasting

This section provides a comprehensive discussion of the experimental findings. To determine the optimal forecasting model, we compare six different models: LSSVM-GA, EMD-LSSVM-GA, DEMD-LSSVM-GA, ELSSVM-GA, EMD-ELSSVM-GA, and DEMD-ELSSVM-GA. After modeling the IMFs using the LSSVM and ELSSVM models, we merge the estimated series to reconstruct the predicted daily water level time series. Figure 7 presents a visual comparison of the water level predictions generated by the proposed models alongside the observed water levels. Given the primary objective of developing a model for water level prediction, the training data are used to construct the model, which is subsequently evaluated on the testing data. Consequently, the discussion revolves around the testing error analysis to identify the most effective model for predicting water levels. Figure 8 shows an enlarged view of the predicted and actual values between 15 December and 1 January for each consecutive year in the testing period. Figure 9 shows scatterplots for every model, a visual tool that illustrates the connection between two variables. When fitting a model, a scatterplot is frequently employed to compare the predicted values of the model (on the x-axis) against the actual observed values (on the y-axis). Table 3 summarizes the statistical results obtained from the hybrid models. Comparing the LSSVM and ELSSVM models with and without decomposition, it becomes evident that the decomposition-driven hybrid models exhibit higher accuracy and reduced error rates. These results underscore the advantages of integrating decomposition techniques into the modeling process.
Table 3

The testing errors generated by each of the models

Model             RMSE (m)    R²
LSSVM-GA          0.5186      0.5049
EMD-LSSVM-GA      0.4976      0.5246
DEMD-LSSVM-GA     0.2987      0.7178
ELSSVM-GA         0.4769      0.6648
EMD-ELSSVM-GA     0.3642      0.6895
DEMD-ELSSVM-GA    0.2536      0.8596
Figure 7

Comparison of observed and predicted water level of proposed models.

Figure 8

Observed and predicted values from 15 December to 1 January 2020, 2021, and 2022.

Figure 9

The scatterplots for every model.


In Figure 8, the decomposition-based ELSSVM, with its added bias error term, shows a bias toward certain predictions, especially on days with extreme measurements. While this term improves performance on average, it leads to suboptimal predictions on days with extreme values. In contrast, the decomposition-based LSSVM model adapts its regression function without introducing excessive bias, which results in more accurate predictions for the extreme scenarios during November and December 2021. However, although the existing model performs better during this period, it is essential to consider overall performance metrics such as RMSE and R². These metrics evaluate the accuracy and goodness of fit of the model across the entire testing dataset, providing a comprehensive assessment of its predictive capability. The proposed model's superior performance in terms of these metrics indicates its overall effectiveness.

Analyzing the data presented in Table 3 reveals a notable difference in the performance of the LSSVM and ELSSVM models when tested on the dataset. Specifically, ELSSVM demonstrates a higher level of effectiveness compared to LSSVM, indicating that ELSSVM models hold the potential for superior forecasting compared to LSSVM models. This can be attributed to ELSSVM's efficiency in handling large-scale problems, particularly with parameter optimization, and its compatibility with data decomposition methods. The findings indicate that the ELSSVM results are typically satisfactory. Despite a very small difference in performance metrics, ELSSVM generally enhances forecasting accuracy. This study underscores the effectiveness and adaptability of LSSVM-type models in addressing time series challenges. ELSSVM emerges as a promising option for water level forecasting in the Klang River (Malaysia).

In the context of LSSVM and ELSSVM models, the integration of decomposition methods (EMD and DEMD) with GA-optimized models yields a significant improvement in R² compared with the single model. The hybrid LSSVM models have outperformed the single LSSVM based on the errors presented in Table 3. The EMD-LSSVM-GA and DEMD-LSSVM-GA models leverage the complementary strengths of both approaches. While decomposition methods capture nuanced patterns inherent in the signal, LSSVM is effective at modeling these patterns. The integration of these methodologies creates a synergistic effect, where the hybrid model benefits from the detailed feature representation obtained through decomposition, leading to improved forecasting accuracy. As climate change intensifies, the versatility of hybrid models becomes essential for providing reliable forecasts that aid in proactive water resource management, flood preparedness, and sustainable river basin planning. In fact, DEMD-LSSVM-GA outperforms EMD-LSSVM-GA by 36.83% in terms of R² on the testing data. This substantial enhancement suggests that the proposed DEMD-LSSVM-GA model exhibits strong predictive capabilities. DEMD's strength lies in its ability to handle changing patterns in water data due to factors like climate change. Decomposition separates the different frequencies in the data more effectively, helping to capture important features that influence water levels. Because the first IMF has the highest frequency, only IMF 1 undergoes DEMD. DEMD's approach of making these features independent from each other is particularly helpful when multiple factors, like weather and human activities, affect water levels. This makes DEMD a powerful tool for accurate water level predictions, especially in the face of complex and changing environmental conditions.

On the other hand, the superior performance of the hybrid ELSSVM model, which integrates ELSSVM with a decomposition method, can be attributed to several key factors. One primary advantage lies in the enhanced feature representation achieved through the decomposition process. The decomposition method breaks down the input signal into IMFs, each representing specific frequency patterns or oscillations. This is particularly beneficial for ELSSVM, which handles complex, nonlinear relationships between inputs and targets well. The additional information obtained from the decomposition step enables the ELSSVM model to better discern intricate patterns in the data, leading to improved generalization and forecasting accuracy. Moreover, hybrid models excel in capturing the complex interactions influenced by climate change, where shifting precipitation patterns, extreme weather events, and sea-level rise contribute to dynamic water level fluctuations. The combinations of the decomposition methods (EMD and DEMD) with ELSSVM are known as EMD-ELSSVM-GA and DEMD-ELSSVM-GA, respectively. DEMD-ELSSVM-GA showed a remarkable 24.67% improvement in R² over EMD-ELSSVM-GA. This outcome underscores the substantial enhancement in daily water level estimation achieved through the hybrid technique involving DEMD data decomposition. In summary, the evaluation of the various models reveals that DEMD-ELSSVM-GA has the R² closest to 1 (0.8596), indicating a more effective prediction model than its counterparts. Therefore, it can be reasonably concluded that the DEMD-ELSSVM-GA model is the most suitable technique among those compared for forecasting daily water levels.

Table 3 allows direct comparisons of individual models and reveals notable improvements in the performance metrics. Specifically, the EMD-ELSSVM-GA model performs better than EMD-LSSVM-GA, reducing the RMSE for testing data by 26.81%. The fusion of the DEMD approach with the ELSSVM model results in higher prediction accuracy than DEMD-LSSVM-GA, reducing the RMSE for testing data by 15.1%. Flood dynamics are often influenced by complex, nonlinear interactions between various hydrological variables, and in this study ELSSVM captures these intricate relationships better than LSSVM. It is noteworthy that the adoption of the EMD and DEMD methodologies has improved the accuracy of both the LSSVM and ELSSVM models in the context of this study. When contrasted with the EMD method, DEMD excels in its ability to distinguish between the water level signal and noise. This effectiveness can be attributed to DEMD's adaptable decomposition traits, its strong theoretical foundation, and its capacity to mitigate high-frequency noise. When these qualities are coupled with the forecasting capability of LSSVM and ELSSVM, the result is markedly more accurate water level forecasts.
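The percentage figures quoted in this discussion can be reproduced directly from Table 3, as in the sketch below (the dictionary and function names are hypothetical). The arithmetic shows that the 36.83% and 24.67% figures are relative gains in R², whereas the 26.81% and 15.1% figures are relative reductions in RMSE.

```python
# Testing results from Table 3: model -> (RMSE in m, R²)
RESULTS = {
    "LSSVM-GA": (0.5186, 0.5049),
    "EMD-LSSVM-GA": (0.4976, 0.5246),
    "DEMD-LSSVM-GA": (0.2987, 0.7178),
    "ELSSVM-GA": (0.4769, 0.6648),
    "EMD-ELSSVM-GA": (0.3642, 0.6895),
    "DEMD-ELSSVM-GA": (0.2536, 0.8596),
}


def rmse_reduction(base, new):
    """Relative RMSE reduction of `new` over `base`, in percent."""
    return 100.0 * (RESULTS[base][0] - RESULTS[new][0]) / RESULTS[base][0]


def r2_gain(base, new):
    """Relative R² gain of `new` over `base`, in percent."""
    return 100.0 * (RESULTS[new][1] - RESULTS[base][1]) / RESULTS[base][1]


print(round(r2_gain("EMD-LSSVM-GA", "DEMD-LSSVM-GA"), 2))           # 36.83
print(round(r2_gain("EMD-ELSSVM-GA", "DEMD-ELSSVM-GA"), 2))         # 24.67
print(round(rmse_reduction("EMD-LSSVM-GA", "EMD-ELSSVM-GA"), 2))    # 26.81
print(round(rmse_reduction("DEMD-LSSVM-GA", "DEMD-ELSSVM-GA"), 2))  # 15.1
```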

In summary, using the EMD method alone for data analysis led to only a limited improvement in the information received by the LSSVM and ELSSVM models. This outcome can be attributed to the inadequate data decomposition performed by EMD. In this research, the DEMD method was integrated with LSSVM and ELSSVM to strengthen the data preprocessing, which in turn enhanced the performance of the LSSVM and ELSSVM models. Furthermore, the findings strongly suggest that the DEMD-LSSVM-GA and DEMD-ELSSVM-GA models perform better in predicting the daily water level.

LSSVM models have been widely applied in the domain of water level prediction. The current study introduces an ELSSVM model, which incorporates a bias error control mechanism. Prior studies often neglected the incorporation of data features when constructing these models, leaving room for potential improvements in prediction accuracy. In this research, we introduced an approach for daily water level prediction that employs data decomposition principles to improve predictive performance. Decomposition techniques, namely EMD and DEMD, were employed to break down the original daily water level dataset into individual IMF components characterized by reduced complexity and pronounced periodicity. The machine learning models used to forecast the water level in this study are LSSVM and ELSSVM. Some researchers have argued that a single LSSVM or ELSSVM model is not the best technique to forecast hydrological data. Therefore, this study has implemented the decomposition methods with LSSVM and ELSSVM. Comparing the machine learning methods, ELSSVM-GA, EMD-ELSSVM-GA, and DEMD-ELSSVM-GA have outperformed LSSVM-GA, EMD-LSSVM-GA, and DEMD-LSSVM-GA, respectively. According to the experimental results, DEMD-LSSVM-GA and DEMD-ELSSVM-GA were the best models in terms of multiple performance metrics when compared with the other decomposition models. Overall, comparing all models, the results demonstrated that the DEMD-ELSSVM-GA model outperformed the other models in forecasting the water level of the Klang River in Sri Muda, Malaysia, with RMSE = 0.2536 m and R² = 0.8596 on the testing data. On the other hand, hybrid models relying on EMD alone displayed weaker performance.

Future research can focus on three aspects. First, although the ELSSVM demonstrates significant potential for forecasting water levels in the Klang River, Malaysia, further work should optimize forecast extrapolation and refine the error control mechanism to enhance its effectiveness. Second, water level prediction using further decomposed hydrological data, for example with variational mode decomposition-based hybrid models, is recommended. Finally, comparisons with other machine learning models, such as radial basis neural networks and artificial neural networks, can be investigated.

The authors gratefully acknowledge the financial support from the Ministry of Higher Education Malaysia under the Fundamental Research Grant Scheme, Project Code: FRGS/1/2022/STG06/USM/03/1.

Vikneswari Someetheram: conceptualization, software, project administration, formal analysis, writing. Muhammad Fadhil Marsani: supervision. Mohd Shareduwan Mohd Kasihmuddin: supervision. Siti Zulaikha Mohd Jamaludin: validation. Mohd. Asyraf Mansor: validation, funding acquisition. All authors have read and agreed to the published version of the manuscript.

The research is fully funded and supported by the Ministry of Higher Education Malaysia under the Fundamental Research Grant Scheme, FRGS/1/2022/STG06/USM/03/1.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Ahmed A. M., Deo R. C., Ghahramani A., Feng Q., Raj N., Yin Z. & Yang L. 2022 New double decomposition deep learning methods for river water level forecasting. Science of the Total Environment 831, 154722.
Chowdhury A. R., Guhathakurta K. & Mukherjee I. 2008 Empirical mode decomposition analysis of two different financial time series and their comparison. Chaos, Solitons and Fractals 37, 1214–1227.
Colominas M. A., Schlotthauer G. & Torres M. E. 2015 An unconstrained optimization approach to empirical mode decomposition. Digital Signal Processing 40, 164–175.
Cortes C. & Vapnik V. 1995 Support-vector networks. Machine Learning 20, 273–297. doi:10.1007/BF00994018.
Dehghan Y., Amini Zenooz S. M. & Pour Z. F. 2022 Analysis of sea level fluctuations around the Australian coast with anomaly time series analysis approach. Marine Environmental Research 181, 105742.
Faruq A., Marto A. & Abdullah S. S. 2021 Flood forecasting of Malaysia Kelantan river using support vector regression technique. Computer Systems Science and Engineering 39, 297–306.
He S., Sang X., Yin J., Zheng Y. & Chen H. 2022 Short-term runoff prediction optimization method based on BGRU-BP and BLSTM-BP neural networks. Water Resources Management 37, 747–768.
Huang N. E., Shen Z., Long S. R., Wu M. C., Shih H. H., Zheng Q., Yen N., Tung C. & Liu H. H. 1998 The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London Series A 454, 903–995.
Loh E. C., Ismail S. B. & Khamis A. 2019 Empirical mode decomposition couple with artificial neural network for water level prediction. Civil Engineering and Architecture 7 (6A), 19–32.
Seo Y., Kim S. & Singh V. P. 2016 Physical interpretation of river stage forecasting using soft computing and optimization algorithms. In: Harmony Search Algorithm. Springer, Berlin, Heidelberg, Germany, pp. 259–266.
Sukanya K. & Vijayakumar P. 2023 Frequency control approach and load forecasting assessment for wind systems. Intelligent Automation and Soft Computing 35, 971–982.
Sumi S. M., Zaman M. F. & Hirose H. 2012 A rainfall forecasting method using machine learning models and its application to the Fukuoka city case. International Journal of Applied Mathematics and Computer Science 22, 841–854.
Tang G., Zhang Y. & Wang H. 2018 Multivariable LS-SVM with moving window over time slices for the prediction of bearing performance degradation. Journal of Intelligent & Fuzzy Systems 34 (6), 3747–3757.
Wang H. F. & Hu D. 2005 Comparison of SVM and LS-SVM for regression. In: Neural Networks and Brain. International Conference on IEEE. 2005 (1), 279–283.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).