ABSTRACT
Hydrological prediction is crucial for managing water resources, and innovations such as machine learning (ML) present an opportunity to enhance predictive modeling capabilities. The aim of this study is to compare an ML algorithm, CatBoost, with traditional techniques such as ridge regression, support vector machines (SVMs), and gene expression programming (GEP) for climate-driven streamflow projection. The investigation found that CatBoost was superior to the conventional models in the testing period, with RMSE of 3.78 m³/s, MAE of 2.613 m³/s, Kling–Gupta efficiency (KGE) of 0.650, root mean square error to standard deviation ratio (RSR) of 0.611, and NSE of 0.626. After CatBoost was identified as the best-performing model, future projections under the NorESM2-MM scenarios were calculated with it. The climate projections are based on simulations from a Coupled Model Intercomparison Project Phase 6 model, utilizing shared socioeconomic pathway (SSP) scenarios. The results show that the SSP3-7.0 and SSP5-8.5 scenarios indicate an increasing trend between 2015 and 2100, while SSP1-2.6 and SSP2-4.5 suggest a stabilizing tendency. This suggests that climate change has little adverse effect on the measuring station and its basin and that the flow shows a modest increase.
HIGHLIGHTS
CatBoost was compared with conventional models (ridge regression, SVM, and GEP).
Future streamflow projections were produced with CatBoost under different emission scenarios.
The results reveal that the CatBoost approach considerably improved prediction accuracy.
INTRODUCTION
Climate change has a profound impact on hydrology, affecting precipitation patterns, streamflow behavior, and water availability across many regions. As global temperatures rise, increasing evaporation rates can exacerbate droughts in certain regions while causing heavy rains and flooding in others (IPCC 2021). Effective water resource management and hydropower generation require accurate discharge forecasts. Future climate change will significantly affect precipitation, discharge, and hydrometeorology, all of which are key factors for hydroelectric power generation (Guven & Şebcioğlu 2019). Therefore, climate change projections are essential for identifying expected outcomes and developing effective plans and strategies.
Global climate models (GCMs) are major tools in climate science that simulate the Earth's climate system by projecting large-scale climate variables under various greenhouse gas emission scenarios. However, GCMs typically operate at coarse spatial resolutions, limiting their usefulness for local or regional climate evaluations. To address this limitation, GCM outputs are refined using statistical and dynamic downscaling approaches, resulting in finer-scale climate projections. Statistical downscaling applies statistical methods to establish relationships between large-scale climate variables predicted by GCMs and local climate measurements. Dynamic downscaling, on the other hand, uses high-resolution regional climate models (RCMs) to simulate the climate system at a finer scale, taking into account physical processes and local geography (Giorgi & Mearns 1999). While dynamic downscaling allows for a more comprehensive and physically consistent portrayal of local climate events, it also requires more computational resources. Both strategies are critical for converting GCM outputs into useful data for climate impact assessments and adaptation planning.
The Coupled Model Intercomparison Project Phase 6 (CMIP6) provides an important foundation for improving the understanding and reliability of GCMs. By offering standardized protocols for model intercomparison, CMIP6 promotes collaboration across disparate research groups and ensures consistency in the use of GCM results for climate projections (Eyring et al. 2016). Recent studies have demonstrated how CMIP data can be used to improve hydrological forecasts by strengthening the integration of climate models with hydrological forecasting systems, and researchers are working to increase the accuracy of regional hydrological models by integrating higher-resolution CMIP outputs. Alaminie et al. (2023) used high-resolution CMIP6 GCM datasets to simulate discharge and the maximum annual flood; coupling the hydrological model with CMIP6 climate scenarios produced forecasts of maximum annual discharge in the basin that help decision-makers plan for adaptation and mitigation. Kartal (2024) evaluated the comparative effectiveness of multiple machine learning (ML) techniques for bias correction in hydrological forecasting within the context of the CMIP6 scenarios, particularly focusing on an underexplored region. Rudraswamy et al. (2023) combined CMIP6 outputs with multiple hydrological models and demonstrated that improved downscaling methods can substantially reduce uncertainties in predicting future water availability under different climate scenarios.
NorESM2-MM (Norwegian Earth System Model version 2, medium resolution) is an advanced global climate model that participates in CMIP6. NorESM2-MM is designed to model climatic processes at several scales, incorporating both physical and biogeochemical components of the Earth system. Contributing to CMIP6 enables standardized comparisons with other models, resulting in a better understanding of climate dynamics and variability (Bentsen et al. 2020). This collaboration strengthens climate projections by utilizing a wide range of model outputs, informing research on climate change consequences and adaptation options (Eyring et al. 2016). The findings of NorESM2-MM within the CMIP6 framework are critical for policymakers and scientists addressing climate-related concerns.
Forecasting models and ML are gaining popularity for future projections under climate change scenarios. According to Yilmaz et al. (2024), the band similarity (BS) approach, which is relatively new in the literature, is also an effective way to improve monthly flow forecast performance.
While traditional forecasting techniques that describe physical processes have many benefits, scientists are starting to recognize the potential of data-driven models such as ML algorithms. While there are initiatives to employ data-driven models in place of classical models, there are also efforts to use them in combination with traditional models (Szczepanek 2022).
Conventional forecasting techniques depend on linear correlations and statistical presumptions in the context of historical time series data. These approaches’ efficacy may be constrained by their linear structure and assumptions about data stationarity, and they frequently call for the explicit modeling of trends and seasonality (Box et al. 2015).
Ridge regression is a sophisticated statistical approach used in climate modeling to solve multicollinearity between predictor variables. In climate studies, where datasets may contain strongly correlated elements such as temperature, humidity, and greenhouse gas concentrations, standard regression models may produce unstable results. Ridge regression addresses this by including a penalty term in the loss function, which effectively reduces the coefficients of correlated predictors, resulting in more robust and interpretable models (Hoerl & Kennard 1970).
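As a reference for the penalty term described above, the ridge objective can be written in the standard textbook form (not reproduced from the study itself):

```latex
\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta}\;
\sum_{i=1}^{n}\bigl(y_i - \mathbf{x}_i^{\top}\beta\bigr)^2
+ \lambda \sum_{j=1}^{p}\beta_j^2
```

Here λ ≥ 0 controls the strength of the shrinkage: λ = 0 recovers ordinary least squares, while larger values shrink the coefficients of correlated predictors toward zero.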
Support vector machines (SVMs) can capture more complicated patterns in climate data by utilizing kernel functions to map input features into higher-dimensional spaces, which allows them to handle non-linear relationships (Cortes & Vapnik 1995). By maximizing the margin between distinct classes in the dataset, SVMs provide reliable classification and regression capabilities, making them suited for complicated climate systems where typical linear models may fail. SVM has four main advantages. First, it includes a regularization parameter that lets the user lessen or eliminate difficulties caused by overfitting. Second, SVM is based on a convex optimization problem for which exact algorithms exist. Third, its generalization performance is bounded by theoretical limits on the test error rate, and a substantial body of theory supports it, indicating that it is a sound approach. The final and most significant advantage of SVM is that it uses the kernel trick (function) to incorporate expert knowledge about the studied phenomenon, hence minimizing both model complexity and estimation error (Guven & Pala 2022).
Gene expression programming (GEP) is an investigation method that develops algorithms and expressions to solve issues automatically (Traore & Guven 2012). GEP works by evolving computer programs represented as linear chromosomes to improve the prediction performance based on a specific dataset. GEP works with two main components: a chromosome structure and tree structures representing this structure. Chromosomes are formed by a combination of various genes, and these genes are used to create tree structures representing specific functions, coding individuals as fixed-length chromosome expression trees. Thus, during the search for a solution, genetic operators such as natural selection, mutation, and crossover come into play. The unique multi-gene structure of GEP allows for the evolution of complex programs that include several subprograms. This flexibility allows researchers to explore non-linear interactions that typical modeling tools might overlook (Ferreira 2001). This programming model aims to find a solution to a specific problem by using the genetic information of individuals in a population. In recent years, GEP has gained popularity in hydrologic engineering. This method has been used to forecast hydrometeorological variables (Guven & Aytek 2009; Guven & Talu 2010; Traore & Guven 2013; Guven et al. 2022, 2024).
Each method offers different capabilities: ridge regression provides simplicity, SVMs offer flexibility in handling non-linear data, and GEP draws on evolutionary optimization. In contrast to conventional models, ML algorithms, particularly deep learning and ensemble methods, provide promising advances by evaluating large datasets and discovering patterns that conventional models may miss (Reichstein et al. 2019).
CatBoost is an abbreviation of the term ‘Categorical Boosting’. The CatBoost method, developed by Yandex and gaining popularity, is one of the ML techniques. CatBoost, as a gradient boosting technique for categorical data, uses decision trees to represent complex, non-linear connections without requiring extensive preprocessing (Prokhorenkova et al. 2018). First, it randomly splits the data into subsets and builds a series of decision trees on each subset. Categorical variables are used directly without being converted into numerical values. CatBoost also applies an ‘ordered boosting’ strategy that improves the prediction of the target variable with ranking and other techniques. In conclusion, while reducing the risk of overfitting, the model provides better generalization. CatBoost incorporates categorical feature encoding into its architecture and enhances its ability to capture complex patterns in the data by using a permutation-based approach to avoid overfitting (Dorogush et al. 2018). Kumar et al.’s (2023a, b) study showed that CatBoost produces accurate and dependable rainfall forecasts with daily data. CatBoost displays a remarkable performance and achieves impressive R2 values on both the training and validation data, demonstrating a strong fit to the data and accurately capturing the variation in the target variable (Kumar et al. 2023a, b). According to Rathnayake et al. (2023), the CatBoost algorithm outperforms the general black-box algorithms in hydrological modeling prediction.
The Mann–Kendall test is a non-parametric statistical method frequently used in hydrology, environmental studies, and climate change analysis to identify patterns in time series data. Without presuming a particular distribution of the data, the test determines whether the data exhibit a statistically significant monotonic upward or downward trend over time. Each data point in the series is compared with every other data point, and the differences are ranked. A negative sign is assigned if the later value is less than the earlier one, and a positive sign if it is greater. The test statistic is then calculated to see if the trend is statistically significant after adding up these signs (Kendall 1948). This approach's simplicity and capacity to manage non-normally distributed data make it ideal for long-term environmental datasets that might be impacted by non-linear trends and outliers (Hamed & Rao 1998).
When applying the Mann–Kendall test, the test statistic, which is commonly represented by the letter S, is calculated. Its significance is then assessed using a permutation approach for smaller datasets or a normal approximation for larger sample sizes. The test is frequently applied in practice to examine long-term trends in hydrological data, including temperature records, rainfall patterns, and river discharge. For instance, it has been applied to hydrology to find patterns in streamflow data in order to evaluate the effects of land use or climate change (Burn & Elnur 2002). In many areas of environmental monitoring and research, the Mann–Kendall test is a useful tool because it offers a dependable and effective way to analyze trends without assuming anything about the data's underlying distribution.
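The procedure described above can be sketched in a few lines of Python; the implementation below is a minimal illustration that ignores tie corrections and is not the exact routine used in the study.

```python
import numpy as np
from scipy.stats import norm

def mann_kendall(x):
    """Mann-Kendall trend test (no tie correction) with Sen's slope."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # S: sum of signs of all pairwise differences (later value minus earlier value)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0        # variance of S for a series without ties
    if s > 0:
        z = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - norm.cdf(abs(z)))                  # two-sided p-value (normal approximation)
    tau = s / (0.5 * n * (n - 1))                   # Kendall's tau
    slope = np.median([(x[j] - x[i]) / (j - i)      # Sen's slope estimator
                       for i in range(n - 1) for j in range(i + 1, n)])
    return {"S": s, "var_S": var_s, "Z": z, "p": p, "tau": tau, "slope": slope}

# Example with a short hypothetical annual discharge series (m3/s)
print(mann_kendall([5.2, 5.5, 5.1, 5.9, 6.0, 5.8, 6.3, 6.1]))
```

A positive tau combined with a p-value below the chosen significance level (e.g., 0.05) would indicate a statistically significant upward trend.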
In this study, the conventional methods (ridge regression, SVM, and GEP) are compared with CatBoost, a newer ML technique, in terms of accuracy in predicting discharge at a gauge station in the Ceyhan Basin. Based on the results, the best model was identified and employed for future projection. For the future predictions, shared socioeconomic pathway (SSP) scenarios from the NorESM2-MM model, which is well suited to representing the complexities of climate–river interactions in a critical water resource region, were used. The following sections present an extensive overview of the study region and data and then describe the methodology. The results section summarizes the findings, including projected patterns of mean discharge under various models and scenarios.
STUDY AREA AND DATA USED
The investigation required two datasets. The first, used as the predictand, is the monthly discharge record collected from the local gauge station in the basin. The second consists of large-scale predictor variables from version 2 of the Norwegian Earth System Model (NorESM2).
Location of the gauge station and digital elevation model (DEM) contours of the watershed.
The second dataset comprises the large-scale predictor variables obtained from the Norwegian Earth System Model version 2 (NorESM2), prepared as part of the sixth phase of the Coupled Model Intercomparison Project (CMIP6). The NorESM2 model was preferred because it can simulate the complex dynamics of the climate system, providing a detailed interaction between the atmosphere, ocean, land, and ice components at different resolution levels and capturing local climate characteristics. NorESM2 is available in a low-resolution version (NorESM2-LM) and a medium-resolution version (NorESM2-MM). NorESM2-MM, with the finer resolution, was used to provide the predictor variables.
The predictor dataset is provided on a 64 × 128 latitude–longitude global Gaussian grid with a spectral truncation of T42. CanESM5's native grid has a uniform longitudinal resolution of 2.8125° and an approximately uniform latitudinal resolution of 2.8125°, and the NorESM2 data were interpolated onto this CanESM5 grid.
Predictions for each grid cell are stored in files named BOX_iiiX_jjY, where iii is the longitudinal index and jj is the latitudinal index. The dataset is downloaded as a zip file from the Canadian Climate Data and Scenarios (https://climate-scenarios.canada.ca/?page=pred-cmip6#cmip6-predictors) choosing the relevant grid cell. This can be completed by manually entering the central point of the stream or by selecting the cell from the rectangle map on the page. By pressing the retrieve data button, a zip file named ‘BOX_014X_46Y’ was obtained for the specified grid cell. The dataset includes 26 variables for four scenarios: SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5. The daily predictor dataset was converted to monthly average values. Table 1 describes the NorESM2 variables.
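A minimal sketch of the daily-to-monthly aggregation step is given below; the file name and column layout are assumptions for illustration rather than the exact format of the downloaded BOX files.

```python
import pandas as pd

# Assumed layout: a date column plus the 26 predictor columns listed in Table 1.
daily = pd.read_csv("BOX_014X_46Y_ssp245.csv", parse_dates=["date"], index_col="date")

# Convert the daily predictor values to monthly averages ("MS" = month start).
monthly = daily.resample("MS").mean()
monthly.to_csv("predictors_monthly_ssp245.csv")
```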
Table 1 | List of the NorESM2 variable (predictor) IDs and corresponding variable names

| No. | Variable ID | Predictor variable |
|---|---|---|
| 1 | mslp | Mean sea level pressure |
| 2 | p1_f | 1,000 hPa Wind speed |
| 3 | p1_u | 1,000 hPa Zonal wind component |
| 4 | p1_v | 1,000 hPa Meridional wind component |
| 5 | p1_z | 1,000 hPa Relative vorticity of true wind |
| 6 | p1th | 1,000 hPa Wind direction |
| 7 | p1zh | 1,000 hPa Divergence of true wind |
| 8 | p5_f | 500 hPa Wind speed |
| 9 | p5_u | 500 hPa Zonal wind component |
| 10 | p5_v | 500 hPa Meridional wind component |
| 11 | p5_z | 500 hPa Relative vorticity of true wind |
| 12 | p5th | 500 hPa Wind direction |
| 13 | p5zh | 500 hPa Divergence of true wind |
| 14 | p8_f | 850 hPa Wind speed |
| 15 | p8_u | 850 hPa Zonal wind component |
| 16 | p8_v | 850 hPa Meridional wind component |
| 17 | p8_z | 850 hPa Relative vorticity of true wind |
| 18 | p8th | 850 hPa Wind direction |
| 19 | p8zh | 850 hPa Divergence of true wind |
| 20 | p500 | 500 hPa Geopotential |
| 21 | p850 | 850 hPa Geopotential |
| 22 | prcp | Total precipitation |
| 23 | s500 | 500 hPa Specific humidity |
| 24 | s850 | 850 hPa Specific humidity |
| 25 | shum | 1,000 hPa Specific humidity |
| 26 | temp | Air temperature at 2 m |
METHODOLOGY
Data process
The dataset included monthly observed discharge for the years 1988–2014, expressed in cubic meters per second (m³/s). Standardizing a dataset is a common preprocessing step for many ML estimators, typically by removing the mean and scaling to unit variance. The robust scaler is an excellent tool for handling data that are not normally distributed or contain outliers: it scales features using robust statistics (the median and interquartile range) instead of the mean and standard deviation used by techniques such as the standard scaler. For algorithms that are sensitive to feature scaling, this guarantees that extreme values do not disproportionately affect the scaling and improves model performance. Normalization was performed using the robust scaler for GEP, SVM, and ridge regression, while no scaling was applied for CatBoost, as tree-based gradient boosting models do not require it. To determine the optimum model performance, the station data for 1988–2014 were divided into two parts, training and testing. The training phase is critical for verifying that the model accurately captures observed climate behavior, whereas the test phase applies the trained model to unseen data from a subsequent time frame to assess its predictive accuracy and generalizability. Scatter plots showing the correlation values for the test period obtained with each method are presented.
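A minimal sketch of this preprocessing step is given below, assuming the predictors and discharge are already aligned as a single monthly table; the file name and column names are illustrative, not taken from the study.

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

# X: 26 monthly NorESM2-MM predictors, y: observed discharge "Q" (m3/s), 1988-2014
data = pd.read_csv("station_monthly.csv", parse_dates=["date"], index_col="date")
X, y = data.drop(columns="Q"), data["Q"]

# Chronological split: earlier years for training, 2007-2014 held out for testing
X_train, X_test = X.loc[:"2006-12"], X.loc["2007-01":]
y_train, y_test = y.loc[:"2006-12"], y.loc["2007-01":]

# Robust scaling (median / interquartile range) for GEP, SVM, and ridge regression;
# the scaler is fitted on the training period only to avoid information leakage.
scaler = RobustScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
# CatBoost is trained on the unscaled X_train directly (tree-based model).
```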
Downscaling using GEP and SVM techniques
In hydrological prediction, GEP has proven to be effective, especially for tasks involving complex non-linear relationships, such as streamflow forecasting and rainfall-runoff modeling. GEP is a promising strategy for long-term water resource management and flood prediction since recent research has shown that it can increase the precision and generalizability of hydrological models. In contrast to more conventional techniques such as regression models and neural networks, GEP offers an efficient tool for capturing the dynamic behavior of environmental systems by developing mathematical expressions that represent hydrological processes (Ferreira 2001).
GEP consists of four primary steps for solving problems. The first step is choosing the set of functions to be used. The second step is determining the chromosome structure, which includes defining the number and size of genes. The third step is choosing the linking function. The last step is assessing fitness with a particular metric. These four fundamental steps form the foundation of GEP and must be followed carefully to produce useful results in problem-solving applications. The GEP model's main purpose is to generate a mathematical equation from the training data (Fuladipanah et al. 2023).
To find the best answers, this method iteratively improves the population, simulating natural evolution (Ferreira 2001).
The evolution of solutions in GEP is influenced by hyperparameters such as population size and mutation rate, which affect model accuracy and convergence. The GEP hyperparameters used in this study are a population size of 20,000, 40 generations, a subtree mutation rate of 0.1, and a parsimony coefficient of 0.01.
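The study does not state which GEP implementation was used; as a rough stand-in, the sketch below configures gplearn's SymbolicRegressor (a tree-based genetic programming library rather than GEP in the strict sense of Ferreira) with the hyperparameter values quoted above. The function set and the reduced crossover probability are assumptions, and the data variables come from the preprocessing sketch in the 'Data process' subsection.

```python
from gplearn.genetic import SymbolicRegressor

gep_like = SymbolicRegressor(
    population_size=20000,             # value reported in the study
    generations=40,                    # value reported in the study
    p_crossover=0.8,                   # lowered so operator probabilities sum to <= 1 (assumption)
    p_subtree_mutation=0.1,            # value reported in the study
    parsimony_coefficient=0.01,        # value reported in the study
    function_set=("add", "sub", "mul", "div"),  # assumed function set
    random_state=42,
)
gep_like.fit(X_train_s, y_train)         # robust-scaled predictors, training period
print(gep_like._program)                 # evolved symbolic expression
print(gep_like.score(X_test_s, y_test))  # R^2 on the 2007-2014 test period
```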
SVMs are a popular class of supervised learning algorithms for tasks involving regression and classification. Finding the best hyperplane to divide data points from various classes in a feature space is how they operate. The data points cannot always be successfully separated by a straightforward straight line or hyperplane, though, because the original space might not always permit linear separability. Under such circumstances, SVMs employ a method called the kernel trick, which involves mapping the data into a higher-dimensional space where linear separation is feasible (Mehdizadeh et al. 2017). The following crucial processes are involved in the implementation of an SVM: parameter tuning, model training, model evaluation, kernel selection, and data preparation.
The first step is to prepare the data by splitting it into training and test sets; handling missing values and normalization are also completed in this step. This is particularly important for SVM because the algorithm relies on calculating the distances between data points in high-dimensional space. Kernels are then used to transform the data into a higher-dimensional space. The most popular kernels are the linear, polynomial, and radial basis function (RBF) kernels; the choice of kernel depends on the problem and the type of data, with RBF working best for non-linear problems. The model is then trained by determining the hyperplane that maximizes the margin between the classes in the dataset. After the model is trained, its performance is evaluated using the test data. SVM's capacity to generalize and prevent overfitting is governed by parameters such as the kernel type, regularization, and margin. The SVM hyperparameters used in this study are the kernel type, gamma, epsilon = 1, and a regularization parameter C = 40.
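A minimal sketch of an RBF-kernel support vector regression with the hyperparameters quoted above is shown below; scikit-learn's SVR is assumed, gamma is left at its "scale" default since the tuned value is not reported, and the data variables come from the preprocessing sketch.

```python
from sklearn.svm import SVR

# RBF kernel, C = 40, epsilon = 1 as reported in the study; gamma="scale" is an assumption.
svm_model = SVR(kernel="rbf", C=40, epsilon=1, gamma="scale")
svm_model.fit(X_train_s, y_train)         # robust-scaled predictors, training period
q_pred_svm = svm_model.predict(X_test_s)  # predicted discharge for the 2007-2014 test period
```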
The efficacy of SVM is rooted in its capacity to identify a decision boundary that exhibits good generalization, even in intricate and high-dimensional spaces (Cortes & Vapnik 1995).
In conclusion, SVMs are effective supervised learning instruments that, by utilizing kernels such as the Gaussian (RBF) kernel, can manage non-linearly separable data. To prevent overfitting, they rely on structural risk minimization, and choosing the right hyperparameters – such as the kernel function and regularization coefficient – is essential for the best results.
New ML technique: CatBoost algorithm
(1) Using categorical features during training rather than preprocessing. CatBoost trains on the entire dataset. Prokhorenkova et al. (2018) found that target statistics (TS) is an effective strategy for handling categorical features while minimizing information loss. CatBoost randomly permutes the dataset and, for each example, calculates the average label value over the examples with the same category value placed before the given one.
(2) Features in combination. The category features could be combined into a new one. CatBoost employs a greedy approach to creating new splits for trees. For the first split in the tree, no combination is examined. However, for the second and subsequent divides, CatBoost uses all predefined combinations with all categorical variables in the dataset. Splits in the tree are treated as two-value categories and combined.
(3) Unbiased boosting with categorical features. Using the TS approach to convert categorical features to numerical values causes a departure from the original distribution. This is a common issue with older gradient-boosted decision tree (GBDT) methods.
(4) Rapid scorer. CatBoost employs oblivious trees as base predictors, with the identical splitting criterion applied across each level. These trees are balanced and resistant to overfitting. In oblivious trees, every leaf index is encoded as a binary vector with a length equal to the tree depth. This technique is commonly employed in CatBoost model evaluators to calculate model predictions, with the binarized representation used for float features, statistics, and one-hot encoded features (Huang et al. 2019).
In CatBoost, hyperparameters are essential to the model performance. How well CatBoost handles categorical features and prevents underfitting or overfitting depends on hyperparameters such as learning rate, tree depth, and iterations. The hyperparameters of the CatBoost models used in this study are given in Table 2. Also, the structure of CatBoost is shown in Figure 5.
Table 2 | The optimal hyperparameters for CatBoost models

| Method | Hyperparameter | Selected value |
|---|---|---|
| CatBoost | Colsample-bylevel | 0.3 |
| | Depth | 2 |
| | Iterations | 70 |
| | Bagging temperature | 100 |
| | Border count | 255 |
| | Grow policy | Symmetric tree |
| | Leaf reg | 4 |
| | Learning rate | 0.11 |
| | Loss function | Poisson |
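A hedged sketch of a CatBoostRegressor configured with the Table 2 values is given below; parameter names follow the CatBoost Python API, and the data variables come from the preprocessing sketch (unscaled, since CatBoost does not require feature scaling).

```python
from catboost import CatBoostRegressor

cat_model = CatBoostRegressor(
    colsample_bylevel=0.3,
    depth=2,
    iterations=70,
    bagging_temperature=100,
    border_count=255,
    grow_policy="SymmetricTree",
    l2_leaf_reg=4,
    learning_rate=0.11,
    loss_function="Poisson",
    verbose=False,
)
cat_model.fit(X_train, y_train)          # unscaled monthly predictors, training period
q_pred_cat = cat_model.predict(X_test)   # predicted discharge for the 2007-2014 test period
```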
Assessment metrics
Metrics such as the root mean squared error (RMSE), mean absolute error (MAE), Kling–Gupta efficiency (KGE), root mean square error to standard deviation ratio (RSR), and Nash–Sutcliffe efficiency (NSE) each offer unique insights into prediction model accuracy and performance.
RMSE estimates the average size of prediction errors by using the square root of the average squared differences between predicted and observed values, yielding an error measure in the same units as the data (Hyndman & Athanasopoulos 2018). The square of RMSE emphasizes the average of squared differences without unit normalization, providing greater weight to larger errors (Montgomery et al. 2012). MAE is a statistic that assesses the accuracy of a model's predictions by calculating the average absolute difference between projected and observed values. It provides a simple interpretation of a model's average error, making it a popular choice for regression analysis and forecasting.
To evaluate the model performance, the RMSE and MAE are calculated. An ideal model would have an RMSE and MAE of 0. Both errors are specified in the same units as the predicted quantities.
The NSE, a commonly used statistic to assess the performance of hydrologic models, is calculated. Nash & Sutcliffe (1970) introduced NSE, which evaluates the model performance relative to a mean value model, with values ranging from −∞ to 1, where 1 indicates a perfect model fit and values below zero indicate that the model performs worse than using the mean of observed values as a predictor (Moriasi et al. 2007).
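For reference, the three metrics discussed so far take their standard forms, with O_i the observed discharge, P_i the predicted discharge, and Ō the mean of the observations:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2}, \qquad
\mathrm{MAE}  = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right|, \qquad
\mathrm{NSE}  = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}
```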
RMSE and MAE quantify error size, while NSE compares model efficiency, which is significant in domains like hydrology where performance versus a simple mean is critical (Krause et al. 2005). In this study, RMSE, MAE, and NSE will be calculated and taken into account to measure the performance of the methods used in calculating the estimated mean discharge value produced for the future using historical data under climate change scenarios.
The KGE metric is used for assessing the performance of hydrological models, especially when comparing simulated streamflow to observed data. By taking into account the relative bias and variability as well as the correlation between the simulated and observed time series, KGE aggregates several aspects of model performance into a single value. Three elements make up the KGE formula: the variability ratio (γ), the bias ratio (β), and the correlation coefficient (r). In the KGE metric, which has a range of −∞ to 1, a value of 1 denotes the perfect agreement between the observed and simulated data, whereas values nearer 0 or negative denote the subpar model performance. KGE evaluates the performance of ML algorithms (Mishra et al. 2024).
In hydrological prediction, the RSR metric is frequently used to evaluate the model performance, especially when forecasting rainfall, river discharge, and other hydrological variables. It is a dimensionless ratio that contrasts the observed data's inherent variability with the model's error magnitude. It is calculated mathematically by dividing the standard deviation of the observed data by the RMSE of the model predictions. Better model performance is indicated by a lower RSR value; in general, a good model fit is indicated by an RSR of less than 0.5. Because it normalizes the error in relation to the variability in the observed data, the RSR metric is useful for comparing various datasets with varying scales and magnitudes (Moriasi et al. 2007).
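The KGE and RSR described above can be computed as in the short sketch below; the variability ratio is taken as the ratio of coefficients of variation (the modified KGE formulation), which is an assumption since the study does not state which KGE variant was used.

```python
import numpy as np

def kge_rsr(obs, sim):
    """Kling-Gupta efficiency (modified form) and RMSE/standard deviation ratio."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]                              # correlation coefficient
    beta = sim.mean() / obs.mean()                               # bias ratio
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())  # variability ratio (CV ratio)
    kge = 1 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    rsr = rmse / obs.std()                                       # RMSE normalized by observed std
    return kge, rsr
```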
In this study, historical data obtained from the NorESM2-MM model, which is part of CMIP6, were analyzed using traditional methods such as ridge regression, GEP, and SVM, as well as an innovative method called CatBoost, and the accuracy of each model was tested. When determining accuracy, metrics such as NSE, RMSE, MAE, KGE, and RSR have been taken into account. Later, the method that yielded the best results according to the test period was identified, and future projections were developed based on this method with 26 inputs.
RESULTS
Flow diagram of the decision process for model performance and future projections.
In this research, the large-scale climate variables (26 inputs) obtained from NorESM2-MM are used as the predictors, and the local discharge data from the relevant station are used as the predictand. Finally, a comparison between the conventional models and CatBoost is performed. CatBoost was the best-performing model for the discharge data during the testing period. Tables 3 and 4 compare the statistical results for the testing period with the observed values from 2007 to 2014. The comparison shows that the GEP model gives the mean closest to the observed value, while the SVM model gives the minimum closest to the observed minimum. The CatBoost model provides the best results with NSE = 0.626, RMSE = 3.78 m³/s, and MAE = 2.613 m³/s, while the ridge regression model performs the poorest overall. Considering that the KGE value indicates good performance as it approaches 1 and the RSR value as it approaches 0, CatBoost again provides the best result with KGE = 0.650 and RSR = 0.611. Furthermore, all models overestimate the observed mean, and CatBoost comes closest to the observed standard deviation. The model performance results by month are shown in Figure 11.
Table 3 | Comparison of the NSE, KGE, and RSR performances of the observed and predicted Q (m³/s) for the testing period between the years 2007 and 2014

| Model | NSE | KGE | RSR |
|---|---|---|---|
| Ridge regression | 0.399 | 0.453 | 0.775 |
| GEP | 0.44 | 0.517 | 0.748 |
| SVM | 0.461 | 0.545 | 0.734 |
| CatBoost | 0.626 | 0.650 | 0.611 |
Table 4 | Comparison of the RMSE and MAE performances of the observed and predicted Q (m³/s) for the testing period between the years 2007 and 2014

| Model | RMSE | MAE | Mean | Min | Max | Std. deviation |
|---|---|---|---|---|---|---|
| Ridge regression | 4.788 | 3.117 | 5.90 | 1.524 | 12.479 | 3.740 |
| GEP | 4.621 | 2.779 | 5.504 | 1.765 | 13.698 | 4.050 |
| SVM | 4.534 | 2.957 | 5.815 | 1.575 | 13.944 | 4.260 |
| CatBoost | 3.780 | 2.613 | 5.953 | 2.380 | 15.731 | 4.600 |
| Observed | – | – | 5.262 | 1.641 | 16.361 | 6.210 |
Scatterplot of the observed and predicted Q for the testing period with GEP.
Scatterplot of the observed and predicted Q for the testing period with ridge regression.
Scatterplot of the observed and predicted Q for the testing period with SVM.
Scatterplot of the observed and predicted Q for the testing period with CatBoost.
As illustrated in Figure 11, the models generally overestimated the discharge compared with the observed data. All models overestimated discharge in February, April, August, September, and October, while in June and July all models underestimated it. In the remaining months (January, March, May, November, and December), the models differed: in January the SVM model underestimated the discharge; in March ridge regression overestimated it; in May CatBoost underestimated it; in November the SVM underestimated it; and in December ridge regression underestimated it. The GEP model showed its best performance in February, with an overestimation of 4.52%, and its worst in September, with an overestimation of 85.43%. The SVM model provided very close estimates just below the observed values in March and November, by 0.35 and 1.98%, respectively, with its worst result in September, 92.3% above the observed value. The ridge regression model showed the weakest performance in September, exceeding the observed value by 109.44%. The CatBoost model provided very close estimates just below the observed values in May and July, by 3.70 and 3.85%, respectively. Overall, the CatBoost model showed the best performance in predicting the observed values, while the ridge regression model performed relatively poorly.
The historical data for the selected station were divided into training and test sets, and calculations were made using the traditional models (GEP, SVM, and ridge regression) as well as the newer CatBoost method. The analyses demonstrated that CatBoost provided more accurate results than the other methods. For this reason, the main aim of the study, projecting the future under climate change, was carried out using CatBoost.
Future projection of discharge with the CatBoost model under SSP scenarios
The SSP scenarios depict various socioeconomic growth and greenhouse gas emission pathways that influence future climate conditions (O'Neill et al. 2016). Each SSP is typically linked to assumptions about greenhouse gas emissions, socioeconomic trends, and climatic impacts. The scenario types and their content are summarized below.
SSP1-2.6 describes a scenario with ambitious climate goals and low emissions, with a target temperature increase of about 2 °C above pre-industrial levels.
SSP2-4.5 describes a scenario with moderate climate policy and medium emissions, resulting in a 2.5–3 °C temperature rise.
SSP3-7.0 denotes a high-emission scenario with limited climate policy and massive greenhouse gas emissions, potentially resulting in severe warming of 3.5–4 °C.
SSP5-8.5 represents an extremely high-emission scenario with a focus on fossil fuel consumption and few climate regulations, resulting in increased warming, potentially exceeding 4 °C.
These scenarios are used in comprehensive assessment frameworks to predict potential climatic consequences based on various assumptions about future socioeconomic changes and policy decisions.
The future projections were produced with NorESM2-MM within the CMIP6 framework, ensuring that they rest on robust statistical foundations. Properly delineating the projection periods enhances the credibility of the analysis and informs decision-making in climate policy and adaptation strategies. For the projection, the estimated data between 2015 and 2100 were divided into three periods: 2015–2039, 2040–2069, and 2070–2100.
Table 5 | Comparison of the future projections under different SSPs with the observed Q (m³/s) between the years 2015 and 2100

| Years | Observed mean | Observed max | SSP1-2.6 mean | SSP1-2.6 max | SSP2-4.5 mean | SSP2-4.5 max | SSP3-7.0 mean | SSP3-7.0 max | SSP5-8.5 mean | SSP5-8.5 max |
|---|---|---|---|---|---|---|---|---|---|---|
| 2015–2039 | – | – | 5.936 | 14.348 | 5.916 | 14.252 | 5.945 | 13.846 | 5.864 | 13.291 |
| 2040–2069 | – | – | 6.059 | 15.322 | 6.088 | 15.260 | 6.221 | 15.261 | 6.097 | 14.470 |
| 2070–2100 | – | – | 5.968 | 14.843 | 6.025 | 13.016 | 6.627 | 17.004 | 6.668 | 16.757 |
| 1988–2014 | 5.681 | 32.8 | – | – | – | – | – | – | – | – |
| Overall | 5.681 | 32.8 | 5.988 | 14.838 | 6.010 | 14.176 | 6.264 | 15.370 | 6.210 | 14.839 |
Observed and predicted monthly mean discharge performance for the testing period by all models.
Line graph of the observed and predicted mean discharge with default model parameters under the SSP1-2.6 scenario.
Line graph of the observed and predicted mean discharge with default model parameters under the SSP2-4.5 scenario.
Line graph of the observed and predicted mean discharge with default model parameters under the SSP3-7.0 scenario.
Line graph of the observed and predicted mean discharge with default model parameters under the SSP5-8.5 scenario.
Figure 12 shows the projection of monthly average flows under the SSP1-2.6 scenario for three separate periods covering the years 2015–2100. There is a milder decline from July to August compared with the observed data. The forecast for the 2040–2069 period diverges slightly from the others and shows a higher mean discharge of 6.059 m³/s, compared with 5.968 m³/s for the 2070–2100 period and 5.936 m³/s for the 2015–2039 period. The maximum monthly mean discharge, 15.322 m³/s, occurs in July of the 2040–2069 period.
Figure 13 shows similarities with Figure 12: the estimates for the SSP2-4.5 scenario over the same periods, excluding March, parallel the observed data, and there is again a milder decline from July to August compared with the observed data. Similarly, the 2040–2069 period diverges slightly from the others and predicts a higher mean discharge of 6.088 m³/s, compared with 6.025 m³/s for the 2070–2100 period and 5.916 m³/s for the 2015–2039 period. The maximum monthly mean discharge, 15.260 m³/s, occurs in July of the 2040–2069 period.
Figure 14 shows the mean discharge predicted under SSP3-7.0 between 2015 and 2100. In the 2070–2100 period, the predicted mean discharge of 6.627 m³/s is higher than in the other periods, whereas the 2015–2039 period shows a lower value of 5.945 m³/s.
Figure 15, which belongs to SSP5-8.5, illustrates the mean discharge between 2015 and 2100. The graph shows that the 2070–2100 period has the highest mean discharge, 6.668 m³/s, while the 2015–2039 period predicts the lowest, 5.864 m³/s.
Comparison of the mean discharge between 2015 and 2039 years projections in the CatBoost model under different SSP scenarios.
Comparison of the mean discharge between 2040 and 2069 years projections in the CatBoost model under different SSP scenarios.
Comparison of the mean discharge between 2070 and 2100 years projections in the CatBoost model under different SSP scenarios.
DISCUSSION
This study shows that the CatBoost model performs significantly better than the GEP, SVM, and ridge regression models for discharge during the testing period (2007–2014). The future discharge projections were evaluated under four distinct scenarios: SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5. The projections cover 2015–2100, and the model's performance between 2007 and 2014 was evaluated against observed values to measure its efficacy. Under all scenarios, the CatBoost model forecasts a notable increase in discharge: a 5.404% increase under SSP1-2.6, 5.791% under SSP2-4.5, 10.262% under SSP3-7.0, and 9.312% under SSP5-8.5. The predicted mean discharge results under all scenarios are given in Table 6. These findings are specific to the research area.
Table 6 | Comparison of the mean monthly discharge (m³/s) in the future projections with the observed mean and the percentage change (%)

| Years | Observed mean | SSP1-2.6 mean | SSP2-4.5 mean | SSP3-7.0 mean | SSP5-8.5 mean |
|---|---|---|---|---|---|
| 2015–2039 | – | 5.936 | 5.916 | 5.945 | 5.864 |
| 2040–2069 | – | 6.059 | 6.088 | 6.221 | 6.097 |
| 2070–2100 | – | 5.968 | 6.025 | 6.627 | 6.668 |
| 1988–2014 | 5.681 | – | – | – | – |
| Average | 5.681 | 5.988 | 6.010 | 6.264 | 6.210 |
| Difference (%) | – | 5.404 | 5.791 | 10.262 | 9.312 |
The projection of annual discharge (Q) values under the NorESM2-MM SSP1-2.6 scenario by the CatBoost model and the observed data between 1988 and 2014 on a normal scale with a trendline.
The projection of annual discharge (Q) values under the NorESM2-MM SSP2-4.5 scenario by the CatBoost model and the observed data between 1988 and 2014 on a normal scale with a trendline.
The projection of annual discharge (Q) values under the NorESM2-MM SSP3-7.0 scenario by the CatBoost model and the observed data between 1988 and 2014 on a normal scale with a trendline.
The projection of annual discharge (Q) values under the NorESM2-MM SSP5-8.5 scenario by the CatBoost model and the observed data between 1988 and 2014 on a normal scale with a trendline.
Additionally, for the 2015–2039 and 2040–2069 periods, SSP3-7.0 predicts the highest monthly mean discharge, 5.945 and 6.221 m³/s, respectively, while for the 2070–2100 period, SSP5-8.5 predicts the highest discharge, 6.668 m³/s. Conversely, SSP1-2.6 predicts the lowest discharge, 6.059 and 5.968 m³/s, for the 2040–2069 and 2070–2100 periods, respectively, while SSP5-8.5 predicts the lowest, 5.864 m³/s, for the 2015–2039 period.
For each scenario, the predicted mean discharge is greater than the observed flow. The mean discharge increases by 5.404% for SSP1-2.6, 5.791% for SSP2-4.5, 10.262% for SSP3-7.0, and 9.312% for SSP5-8.5. The largest increase in mean flow is predicted under the SSP3-7.0 scenario, with a temperature rise of 3.5–4 °C, while the smallest increase is expected under the SSP1-2.6 scenario, with a temperature rise of about 2 °C. Snowmelt resulting from a temperature increase of 3.5–4 °C should also be considered one of the reasons for this increase.
The Mann–Kendall test was applied for the trend analysis of the NorESM2-MM projections under the SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5 scenarios over the 2015–2100 timeframe. The results of the Mann–Kendall trend test showed that SSP3-7.0 and SSP5-8.5 trend slightly upward, whereas SSP1-2.6 and SSP2-4.5 show no trend.
Table 7 displays the results of the non-parametric analysis, including Kendall's tau, Var(S), the p-value, and the slope. As given in Table 7, the calculated p-values are 7.097 and 8.467 for the SSP3-7.0 and SSP5-8.5 scenarios, respectively, and for all scenarios the p-value is higher than the significance level of alpha = 0.05.
Table 7 | Descriptive statistics of the SSP scenarios between the years 2015 and 2100

| Future scenarios | Kendall's tau | Var(S) | p-value | Slope |
|---|---|---|---|---|
| SSP1-2.6 | 0.047 | 71,881.666 | 0.299 | 0.0011 |
| SSP2-4.5 | 0.076 | 71,881.666 | 0.522 | 0.0019 |
| SSP3-7.0 | 0.329 | 71,881.666 | 7.097 | 0.009 |
| SSP5-8.5 | 0.422 | 71,881.666 | 8.467 | 0.0138 |
CONCLUSION
The goal of this study was to compare the performance of conventional models and the CatBoost algorithm, and to analyze future discharge projections under climate change in the basin using the best model. The basin's discharge data were modeled using a statistical downscaling approach with GEP, SVM, ridge regression, and CatBoost, and these methods were compared. The investigation demonstrates that CatBoost outperforms the other models with NSE 0.626, KGE 0.650, and RSR 0.611.
CatBoost outperforms traditional hydrological forecasting models such as GEP, SVM, and ridge regression because it can effectively handle large datasets with high dimensionality and complex non-linear relationships. By handling categorical variables automatically and optimizing performance through strong feature interactions, its gradient boosting framework produces predictions that are more accurate. In addition, CatBoost produces better generalization and more accurate forecasting results than traditional models, because it is less likely to overfit and requires less data preprocessing.
The CatBoost model's projection results under the SSP scenarios were divided into three periods, 2015–2039, 2040–2069, and 2070–2100, and the monthly average discharge was compared graphically. The results demonstrate that SSP1-2.6 and SSP2-4.5 forecast a stabilizing trend, while the SSP3-7.0 and SSP5-8.5 scenarios predict an increasing trend between 2015 and 2100.
Considering that the monthly mean discharge of the observed data is 5.681 m³/s, the predicted mean discharge for all three periods under every scenario exceeds this value, suggesting that the measuring station and its basin are not adversely affected by climate change and that the flow shows a modest increase.
By combining climate projections with hydrological models, it is possible to forecast the magnitude of the effects of climate change on water resources and its long-term consequences (Iltas et al. 2024). The evaluation of hydrological potential is directly affected by the output of hydrological forecasting models, which produce long-term projections of streamflow, water availability, and other important hydrological variables. These forecasts are helpful for assessing the potential for managing water resources, such as flood control, irrigation planning, and reservoir operation. Accurate model outputs enable sound decision-making, minimize risks such as droughts and floods, and optimize hydrological potential for sustainable water use. These findings have important implications for the basin's long-term future.
Climate change has a major effect on the flow duration curve (FDC), which shows how streamflow varies over a given time period. The flow regime can be altered by rising temperatures and shifting precipitation patterns, which can result in longer and more frequent dry spells. Because of these modifications, streamflow predictions may become uncertain and historical data less trustworthy, making it more difficult to estimate at which point the river will stop flowing (Alakbar & Burgan 2024). These changes will affect a range of river-dependent activities, including hydropower generation.
In general, increasing a river's or water source's discharge increases the potential for hydropower generation by increasing the water flow that can power turbines. Because there is more kinetic energy available to be converted into electrical energy when more water flows through the system, this increased flow can result in higher energy production. But the overall effect on hydropower potential also depends on environmental factors like ecosystem health and sediment transport, as well as reservoir capacity and turbine efficiency.
The proposed models' usefulness is limited by the data available for the study area. This study contributes to the field by analyzing several models under various climate change scenarios for predicting climatic datasets. It is expected that the study's findings will serve as a reference for hydrologists who estimate the amount of water in river basins, as well as guidance for decision-makers in government organizations. In addition, other boosting models applied in hydrology, such as LightGBM and XGBoost, can be considered in future studies.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.