ABSTRACT
Recognizing the differential impacts of climate change across geographical scales, this study emphasizes the importance of statistical downscaling. Using Gene Expression Programming (GEP) and Linear Genetic Programming (LGP), statistical downscaling transforms broad climate trends into region-specific insights. This allowed for detailed analyses of anticipated changes in sediment yield and discharge within a Euphrates River sub-basin in Türkiye using large-scale variables from the CanESM2 model. The dataset was divided into calibration (1970–1995) and validation (1996–2005) periods. To assess the models' accuracy, statistical measures such as RMSE, MAE, NSE, and R were used. The analysis revealed that LGP outperformed GEP for both discharge and sediment yield during validation, with RMSE = 51.79 m3/s and 4,325.66 tons/day, MAE = 27.14 m3/s and 1,593.34 tons/day, NSE = 0.684 and 0.627, and R = 0.841 and 0.788, respectively. However, when the projections were compared with the observed period (2006–2020), the GEP model was superior to LGP under the RCP2.6, RCP4.5, and RCP8.5 scenarios from CanESM2. For 2021–2100, the models suggest a moderate decrease in discharge and sediment yield, indicating potential shifts in the basin's hydrodynamics. These changes could disrupt hydropower generation, challenge water management practices, and alter riverine ecosystems. The results call for a thorough assessment of potential ecological consequences.
HIGHLIGHTS
Climate change effects on streamflow and sediment yield have been explored.
Genetic programming methods, namely GEP and LGP, have been employed for statistical downscaling.
Future projections of streamflow and sediment yield have been made under different emission scenarios.
INTRODUCTION
Climate change, an emergent issue of the 21st century, has profound implications for the health and functioning of the biosphere. It is predominantly caused by human activities, specifically the burning of fossil fuels and the manufacturing of chemicals (Mampitiya et al. 2023). The hydrological cycle, a key component of the biosphere, is particularly sensitive to climate change. Changes in precipitation patterns, evaporation rates, and river flows affect both the quantity and quality of water resources (Verma et al. 2023). Water resources, in particular, are significantly affected by climate change, and hydrologic models serve as tools to study these effects. These models, often integrated with General Circulation Models (GCMs), provide insights into future hydrological scenarios under different climate change projections. Such studies are crucial for developing effective mitigation and adaptation strategies.
The projections of climate scenarios derived from GCMs for the 21st century are fundamentally important in determining potential adjustments in ecological, physical, and societal systems in response to climate change (Tabor & Williams 2010). However, GCM grid cells cannot be made arbitrarily small; their dimensions are constrained to a range of 100–300 km (Ramírez Villegas & Jarvis 2010). Downscaling methodologies enable researchers to derive predictions of regional climate variations, and they range from the refinement and interpolation of GCM discrepancies to the application of neural networks and regional climate simulations (Giorgi 1990). Broadly, downscaling can be split into two primary categories: statistical and dynamical. Statistical downscaling employs statistical relationships between large-scale and local climate variables to predict future local conditions, and it offers bias correction and computational efficiency. Dynamical downscaling, on the other hand, uses high-resolution regional simulations based on physical laws to anticipate local conditions, delivering comprehensive and physically consistent projections, though it is sensitive to large-scale biases and requires extensive computational power.
Gene Expression Programming (GEP) (Ferreira 2001) is an adaptive algorithm that generates flexible programs. GEP can be viewed as a learning algorithm that seeks to uncover the relationships among variables within data sets. Distinct from its predecessors, Genetic Algorithm (GA) and Genetic Programming (GP), GEP encodes individuals as linear strings of a fixed length, known as chromosomes, which are subsequently expressed as expression trees, a simplified diagrammatic representation. GEP's strength lies in its distinct, multigenic character, which facilitates the evolution of intricate programs composed of multiple sub-programs. It merges the advantages of both GA and GP while mitigating some of the restrictions inherent to each of them individually (Shoaib et al. 2015).
GEP has the benefit of producing explicit functional relationships. It enables the exploration of intricate non-linear associations among the input parameters by delivering fast, reasonably precise, and computationally inexpensive approximations of the underlying physical or functional processes, which can be used beyond merely predictive applications. Another significant benefit of GEP, in contrast to numerous other data-driven models, is its ability to offer an analytical representation of the relationship between input and output variables. This feature enables modelers to understand this relationship better and make adjustments as needed, demonstrating the flexibility of GEP (Billah et al. 2021). GEP has been successfully applied to the forecasting of climatic data in several studies (Rahmani-Rezaeieh et al. 2020; Billah et al. 2021; Birbal et al. 2021; Esmaeili-Gisavandani et al. 2021; Muhammad et al. 2021; Guven & Pala 2022; Guven et al. 2022; Chansawang et al. 2023; Pouyanfar et al. 2023; Raheel et al. 2023; Song et al. 2023).
Linear Genetic Programming (LGP) is a variation of the GP method that evolves sequences of instructions from an imperative programming language (such as C or C++) or from a machine language, as opposed to the expressions from a functional programming language that are typical of conventional tree-based GP. It is noteworthy that the term 'linear GP' pertains to the linear structure of genomes, and as such, the LGP method does not necessarily generate linear models for non-linear systems. In fact, LGP is frequently utilized for evolving highly non-linear models (Brameier 2004). Two types of LGP systems exist, depending on how instructions are executed: (i) machine-coded LGP, in which the computer's CPU executes instructions directly, reducing runtime, and (ii) interpreted LGP, in which instructions are executed via a higher-level virtual machine (Mehr et al. 2018).
The principal differences from conventional tree-based GP are the graph-based data flow that emerges from the repeated use of indexed variable (register) contents and the presence of structurally redundant code. Machine-coded LGP offers several advantages as a modeling tool, including its operational speed, its design against overfitting, and its capacity to generate robust solutions that execute swiftly when called by integrated software. Nevertheless, the number of registers employed in an LGP model is a critical factor, as an unsuitable choice could cause substantial complications in the program under evolution (Guven & Kişi 2010). LGP has been applied in hydrological engineering (Guven et al. 2009; Azamathulla et al. 2011) and in predicting hydro-meteorological variables (Azamathulla & Zahiri 2012; Guven & Kisi 2013; Mehr et al. 2013; Danandeh Mehr et al. 2014; Ravansalar et al. 2017). In this study, due to the aforementioned advantages, statistical downscaling is performed using GEP and machine-coded LGP models, and a linear regression model is applied for comparison. The hydrological effects of climate change at the local scale were examined, and future predictions of sediment yield and discharge were made.
This study introduces the application of GEP and LGP in a climatic analysis of the Euphrates River sub-basin, presenting an innovative approach to understanding the dynamics between climate change, sediment yield, and streamflow. By leveraging these advanced statistical downscaling and GP methods, not only is the precision of projections enhanced, but the intricacies of climate-river interactions in a region crucial for water resources are also captured. The predictive insights derived are set to offer a valuable framework for both policymakers and hydrologists, enabling informed decision-making and strategic planning in water resource management amidst the exigencies of climate change. Our methodological advancement thereby stands as a significant stride toward more resilient water infrastructures in response to evolving climatic conditions.
Following this section, a detailed description of the study area and data is provided, followed by the study's methodology. The results section presents the findings, including future trends of Q and Qs under various models and scenarios. A discussion of the models' results is provided in the following section, before concluding with a summary of our key insights.
STUDY AREA AND DATA USED
The selection of this particular sub-basin of the Euphrates River for our study was driven by a confluence of crucial factors. Foremost, the availability of long-term observed data within this basin presented a solid foundation for applying statistical downscaling techniques, thereby enhancing the reliability and precision of our analysis. Additionally, the Euphrates River is one of the most important fluvial systems in the region, with a plethora of water infrastructures arrayed along its course. This not only underscores the basin's significance but also amplifies the relevance of understanding the impacts of climate change on its sediment yield and streamflow dynamics. Through this strategic selection, our study aims to provide actionable insights that could be vital in the sustainable management and planning of water resources within this vital river system.
Two data sets are required for the statistical downscaling method. The first data set consists of monthly sediment yield and discharge data obtained from the local station located in the basin and has been used as the predictand. The second data set comprises the large-scale predictor variables obtained from the Second Generation Canadian Earth System Model (CanESM2). The Canadian Centre for Climate Modelling and Analysis (CCCma) of Environment and Climate Change Canada created the fourth generation coupled global climate model known as CanESM2. It is the Canadian modeling community's contribution to the Intergovernmental Panel on Climate Change Fifth Assessment Report (IPCC AR5). Standardized daily data are extracted for each cell of a 128 × 64 grid based on the T42 Gaussian grid and stored in single-column text files covering the entire globe. This grid has a nearly uniform horizontal resolution of approximately 2.8125° in both latitude and longitude. The predictors for each grid cell are kept in files named BOX_iiiX_jjY, where iii stands for the longitudinal index and jj for the latitudinal index. This organization of the predictors makes it simple to use them as input for statistical downscaling models. The data set is downloaded as a zip file from Canadian Climate Data and Scenarios (https://climate-scenarios.canada.ca/) by choosing the relevant grid cell. This can be done by manually entering the centroid of the watershed or simply by selecting the cell in the rectangular map on the website. By clicking the retrieve data button, a zip file named 'BOX_017X_46Y' was downloaded for the specific grid cell chosen. The data set has 26 variables for three scenarios: RCP2.6, RCP4.5, and RCP8.5. The daily predictor data set has been converted into monthly average values. Table 1 shows the description of the CanESM2 variables.
Table 1. Description of CanESM2 variables (predictors)
No. | CanESM2 predictors | Description | Abbreviation |
---|---|---|---|
1 | ceshmslpgl | Mean sea level pressure | v1 |
2 | ceshp1_fgl | 1,000 hPa Wind speed | v2 |
3 | ceshp1_ugl | 1,000 hPa Zonal wind component | v3 |
4 | ceshp1_vgl | 1,000 hPa Meridional wind component | v4 |
5 | ceshp1_zgl | 1,000 hPa Relative vorticity of true wind | v5 |
6 | ceshp1thgl | 1,000 hPa Wind direction | v6 |
7 | ceshp1zhgl | 1,000 hPa Divergence of true wind | v7 |
8 | ceshp5_fgl | 500 hPa Geopotential | v8 |
9 | ceshp5_ugl | 500 hPa Wind speed | v9 |
10 | ceshp5_vgl | 500 hPa Zonal wind component | v10 |
11 | ceshp5_zgl | 500 hPa Meridional wind component | v11 |
12 | ceshp5thgl | 500 hPa Relative vorticity of true wind | v12 |
13 | ceshp5zhgl | 500 hPa Wind direction | v13 |
14 | ceshp8_fgl | 500 hPa Divergence of true wind | v14 |
15 | ceshp8_ugl | 850 hPa Geopotential | v15 |
16 | ceshp8_vgl | 850 hPa Wind speed | v16 |
17 | ceshp8_zgl | 850 hPa Zonal wind component | v17 |
18 | ceshp8thgl | 850 hPa Meridional wind component | v18 |
19 | ceshp8zhgl | 850 hPa Relative vorticity of true wind | v19 |
20 | ceshp500gl | 850 hPa Wind direction | v20 |
21 | ceshp850gl | 850 hPa Divergence of true wind | v21 |
22 | ceshprcpgl | Total precipitation | v22 |
23 | ceshs500gl | 500 hPa Specific humidity | v23 |
24 | ceshs850gl | 850 hPa Specific humidity | v24 |
25 | ceshshumgl | 1,000 hPa Specific humidity | v25 |
26 | ceshtempgl | Air temperature at 2 m | v26 |
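The conversion of daily predictor values into monthly averages mentioned in the data description can be sketched as follows. This is a minimal illustration rather than the authors' processing script: the function name `monthly_means`, the start date, and the toy series are assumptions, and the sketch uses the standard Gregorian calendar, so the calendar of the actual CanESM2 files should be checked before applying it.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_means(start, daily_values):
    """Average a daily series into (year, month) means.

    `start` is the date of the first value. Consecutive values are assumed
    to be exactly one Gregorian calendar day apart (assumption of this sketch).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for offset, value in enumerate(daily_values):
        day = start + timedelta(days=offset)
        key = (day.year, day.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Toy series (not real data): 31 January days of 1.0, then 28 February days of 3.0.
toy_series = [1.0] * 31 + [3.0] * 28
means = monthly_means(date(1970, 1, 1), toy_series)
print(means)
```

The same grouping would be applied to each of the 26 predictor columns before they are paired with the monthly predictand series.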
METHODOLOGY
The selection of the most effective predictors
To enhance the association between the predictor variables and predictands, normalization techniques were applied to the data sets. Specifically, each data set for predictors and predictands was transformed by applying the natural logarithm function (Ln) and standardized using the z-value. These normalization steps were intended to facilitate more accurate and reliable predictions. To identify the most impactful predictors and assess the linear relationship between input and output variables, a Pearson correlation coefficient analysis was utilized. The correlation analysis revealed that the large-scale weather factors were correlated with local data at a confidence level of 99%. The factors with the highest correlation coefficients were selected as the most effective predictors for the downscaling models. The highest correlation was obtained when the natural logarithms of both data sets were taken. Finally, 10 of the 26 variables were identified as the most effective predictors. These were: mean sea level pressure (V1), 1,000 hPa wind speed (V2), 1,000 hPa relative vorticity of true wind (V5), 500 hPa geopotential (V8), 500 hPa wind speed (V9), 500 hPa zonal wind component (V10), 500 hPa relative vorticity of true wind (V12), 500 hPa divergence of true wind (V14), 850 hPa divergence of true wind (V21), and total precipitation (V22). Although total precipitation (V22) and 1,000 hPa relative vorticity of true wind (V5) have a direct effect on discharge and sediment yield, the other eight variables have an indirect effect. The total precipitation predictor (V22) has the highest relative importance among the predictors.
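The screening procedure described above (log transform, z-score standardization, then ranking by Pearson correlation) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' workflow: the function names, the toy series, and the restriction to strictly positive values (so that the logarithm is defined) are all assumptions. Note that standardization does not change the Pearson coefficient; it is included only to mirror the described preprocessing.

```python
import math


def zscores(values):
    """Standardize a series to zero mean and unit (sample) standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_predictors(predictors, predictand, top_k):
    """Return the top_k (name, r) pairs ranked by |r| after Ln and z-score transforms.

    All values are assumed to be strictly positive (assumption of this sketch).
    """
    target = zscores([math.log(v) for v in predictand])
    scores = {}
    for name, series in predictors.items():
        transformed = zscores([math.log(v) for v in series])
        scores[name] = pearson_r(transformed, target)
    ranked = sorted(scores, key=lambda n: abs(scores[n]), reverse=True)
    return [(name, scores[name]) for name in ranked[:top_k]]


# Toy data (not from the study): v22 is log-linear with q, v1 is not.
q = [10.0, 20.0, 40.0, 80.0, 160.0]
predictors = {
    "v1": [5.0, 3.0, 6.0, 2.0, 4.0],
    "v22": [1.0, 2.0, 4.0, 8.0, 16.0],
}
ranked = rank_predictors(predictors, q, top_k=1)
print(ranked)
```

In the study, a ranking of this kind over all 26 predictors would be followed by the significance check at the 99% confidence level before the final set is fixed.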
Statistical downscaling using GP
GEP involves four main steps for obtaining solutions to problems. The first step is the identification of the set of functions to be used. The second step is determining the chromosome structure, which involves specifying the number of genes and their size. The third step involves selecting the linking function. The final step is to evaluate fitness using a specific measure. These four essential steps constitute the foundation of GEP, and their careful implementation is vital for obtaining effective solutions in problem-solving applications. The main setup parameters of GEP are given in Table 2.
Table 2. Setup parameters of the GEP model for both predictands
Predictand | Functions | Number of genes per chromosome | Linking function | Fitness function |
---|---|---|---|---|
Q | (+), (−), (*), (/), (−a), (1/a) | 16 | + | MSE |
Qs | (+), (−), (*), (/), (√) | 9 | + | R2 |
As observed, the model utilized the V2, V5, V12, V14, and V22 predictors to generate Equation (1). These predictors were selected automatically by GEP from among the 10 previously selected most effective predictors.
With sediment yield (Qs) as the predictand, the GEP model used only four of the 10 selected predictors in Equation (2). These are total precipitation (V22), 500 hPa wind speed (V9), 500 hPa zonal wind component (V10), and 1,000 hPa relative vorticity of true wind (V5).
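To make the roles of the function set, the genes, and the linking function more concrete, the sketch below decodes a small multigenic chromosome and links its genes by addition, as in the "+" linking function of Table 2. It is only an illustration: the two genes, the predictor values, and the function names are hypothetical and are not the evolved models of Equations (1) and (2). Each gene is read in breadth-first (Karva) order, so trailing symbols that are never attached to a parent are simply ignored.

```python
import operator

# Function set with arities; any other symbol is treated as a terminal (arity 0).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def evaluate_gene(gene, terminals):
    """Evaluate one gene given as a list of symbols in breadth-first order."""
    children = {}
    next_free = 1  # index of the next symbol not yet attached to a parent
    index = 0
    while index < next_free:
        symbol = gene[index]
        arity = FUNCTIONS[symbol][0] if symbol in FUNCTIONS else 0
        children[index] = list(range(next_free, next_free + arity))
        next_free += arity
        index += 1

    def value(i):
        symbol = gene[i]
        if symbol in FUNCTIONS:
            func = FUNCTIONS[symbol][1]
            return func(*(value(c) for c in children[i]))
        return terminals[symbol]

    return value(0)


def evaluate_chromosome(genes, terminals):
    """Link the sub-expressions of all genes with addition."""
    return sum(evaluate_gene(gene, terminals) for gene in genes)


# Hypothetical genes and predictor values, for illustration only.
gene_a = ["*", "v22", "+", "v5", "v9", "v22", "v5"]  # decodes to v22 * (v5 + v9)
gene_b = ["-", "v9", "v5", "v5", "v5"]               # decodes to v9 - v5
values = {"v22": 2.0, "v5": 3.0, "v9": 4.0}
result = evaluate_chromosome([gene_a, gene_b], values)
print(result)  # 2*(3+4) + (4-3) = 15.0
```

Because the decoded expression is an explicit formula, it can be inspected directly, which is the analytical transparency attributed to GEP earlier in the paper.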
LGP is referred to as machine-coded GP because it utilizes C or C++ directly as programming languages, unlike tree-based GEP (Guven & Kişi 2010). The main setup parameters for LGP are given in Table 3.
Table 3. Setup parameters of LGP for both predictands
Predictand . | Instruction set . | Program size . | Population size . | Fitness calculation . | |
---|---|---|---|---|---|
Initial . | Max . | error measurement . | |||
Q | (+),(−),(*),(/), (abs), (√), (sin), (cos), Arithmetic, condition, data transfer, comparison | 500 | 80 | 512 | Squared |
Qs | (+),(−),(*),(/), (abs), (√) | 750 | 80 | 512 | Squared |
The LGP model generated C code from the calibration data to predict the validation and future periods. The LGP model used all 10 of the selected most effective predictors in the analyses. Interested readers are encouraged to contact the authors directly to obtain the detailed code output.
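Since the evolved code itself is not reproduced here, the following sketch only illustrates the general shape of an LGP solution: a linear sequence of instructions that read inputs and update registers, with the result left in the first register. The instruction encoding, the register count, the protected operators, and the example program are all hypothetical and are written in Python for readability; they are not the authors' generated C code.

```python
import math


def run_program(program, v, n_registers=2):
    """Execute a linear list of (op, destination register, operand) instructions.

    An operand is ("v", i) for input i, ("f", i) for register i,
    ("c", x) for a constant x, or None for unary operations.
    """
    f = [0.0] * n_registers
    for op, dst, operand in program:
        x = None
        if operand is not None:
            kind, val = operand
            x = v[val] if kind == "v" else f[val] if kind == "f" else val
        if op == "+":
            f[dst] += x
        elif op == "-":
            f[dst] -= x
        elif op == "*":
            f[dst] *= x
        elif op == "/":
            # Protected division (assumption): leave the register unchanged on zero.
            f[dst] = f[dst] / x if x != 0 else f[dst]
        elif op == "abs":
            f[dst] = abs(f[dst])
        elif op == "sqrt":
            # Protected square root (assumption): take the root of the magnitude.
            f[dst] = math.sqrt(abs(f[dst]))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return f[0]


# Hypothetical program: f0 = sqrt(|2 * v0 - v1|)
program = [
    ("+", 0, ("v", 0)),
    ("*", 0, ("c", 2.0)),
    ("+", 1, ("v", 1)),
    ("-", 0, ("f", 1)),
    ("abs", 0, None),
    ("sqrt", 0, None),
]
output = run_program(program, [5.0, 1.0])
print(output)  # sqrt(|2*5 - 1|) = 3.0
```

The reuse of register `f[1]` by a later instruction is an example of the graph-based data flow that distinguishes linear GP from tree-based GP.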
The data sets are based on a monthly time scale, and the data were partitioned into two distinct components, namely calibration and validation. The calibration period, spanning the years between 1970 and 1995, is used to train the model. The validation period covers the years between 1996 and 2005 and is essential for evaluating the model's performance. After the most effective predictors were selected, the downscaling techniques GEP and LGP were used to predict monthly discharge and sediment yield in the basin. In the next stage, the prediction results were obtained and compared with each other to identify the best-performing model. The performance of the proposed models is evaluated using metrics such as root mean squared error (RMSE), mean absolute error (MAE), Nash–Sutcliffe efficiency (NSE), and correlation coefficient (R). These metrics have been applied in several similar studies (Yang et al. 2017; Perera & Rathnayake 2019; Gupta et al. 2020; Karunanayake et al. 2020; Kidanemariam et al. 2020; Sireesha Naidu et al. 2020; Haleem et al. 2021; Xu et al. 2022; Fuladipanah et al. 2023; Mampitiya et al. 2023; Mollel et al. 2023; Tilahun et al. 2023; Verma et al. 2023).
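For reference, the four evaluation metrics named above can be computed as in the following sketch, which uses their standard textbook definitions. The function names and the toy observed and simulated series are assumptions made for illustration; they are not values from this study.

```python
import math


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - s) ** 2 for o, s in zip(obs, sim))
    denominator = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - numerator / denominator


def pearson_r(obs, sim):
    """Pearson correlation coefficient."""
    mo = sum(obs) / len(obs)
    ms = sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


# Toy series: the simulation is the observation shifted by a constant 0.5.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [1.5, 2.5, 3.5, 4.5, 5.5]
print(rmse(observed, simulated), mae(observed, simulated))
print(nse(observed, simulated), pearson_r(observed, simulated))
```

The toy case shows why R alone can be misleading: a constant bias leaves R at 1 while NSE drops below 1, which is one reason to report the metrics together.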
RESULTS
In this study, the statistical downscaling method was used. The predictors required for this method are the large-scale weather factors (26 input sets) obtained from CanESM2, and the local sediment yield and discharge data of the relevant station are used as the predictands. Finally, for comparison with the non-linear models, a linear regression (LR) analysis was conducted.
The best-performing model for sediment yield and discharge in both the calibration and validation periods was LGP. The statistical outcomes for the validation period, derived from the formulas that the artificial intelligence models generated using the data set provided for training and testing, are presented in Tables 4 and 5.
Table 4. Comparison of statistical results of observed and predicted Q (m3/s) for the validation period (1996–2005)
Data | Mean | Min | Max | Std. deviation | MAE | RMSE | NSE | R |
---|---|---|---|---|---|---|---|---|
Observed | 84.92 | 18.1 | 442.89 | 82.9 | – | – | – | – |
GEP | 66.32 | 24.01 | 585.88 | 67.76 | 36.69 | 74.21 | 0.427 | 0.684 |
LGP | 69.75 | 30.14 | 237.34 | 56.72 | 27.14 | 51.79 | 0.684 | 0.841 |
Linear regression | 73.08 | 26.16 | 279.15 | 47.38 | 32.48 | 58.14 | 0.555 | 0.745 |
Table 5. Comparison of statistical results of observed and predicted Qs (tons/day) for the validation period (1996–2005)
Data | Mean | Min | Max | Std. deviation | MAE | RMSE | NSE | R |
---|---|---|---|---|---|---|---|---|
Observed | 2,890.33 | 35.4 | 102,010 | 10,348.66 | – | – | – | – |
GEP | 2,014 | 52.97 | 80,179.9 | 7,760.17 | 2,880.9 | 11,871.7 | 0.567 | 0.673 |
LGP | 2,084.14 | 115.28 | 87,534.2 | 8,693.8 | 1,593.34 | 4,325.66 | 0.627 | 0.788 |
Linear regression | 1,023.18 | 68.07 | 9,724.34 | 1,689.62 | 2,282.65 | 10,028.6 | 0.509 | 0.705 |
Table 4 compares the statistical results of observed and predicted discharge values during the validation period (1996–2005). This comparison shows that the LR model provides the best result in predicting the mean value. Although the GEP model performs poorly in predicting the mean value, it performs best in predicting the minimum value. Furthermore, all models underestimated the mean values and overestimated the minimum values. The GEP model exhibits the best performance for predicting the maximum value and standard deviation.
Table 4 also demonstrates that the LGP model performs best, with the highest correlation coefficient (R = 0.841) and NSE (0.684) and the lowest MAE (27.14 m3/s) and RMSE (51.79 m3/s). The LR and GEP models follow LGP in that order, in line with their correlation coefficient (R) values.
Table 5 compares the statistical results of observed and predicted sediment yield values during the validation period (1996–2005). As with discharge, all models underestimated the mean value. While the minimum observed value was 35.40 tons/day, the LGP model predicted 115.28 tons/day, the GEP model predicted 52.97 tons/day, and the LR model predicted 68.07 tons/day, all above the observed minimum. For the maximum and standard deviation, all models also predicted values below the observed ones. The LGP model provided the closest predictions except for the minimum value. The GEP model was superior to the LR model in estimating all of these descriptive statistics.
In terms of the error measures, however, the GEP model exhibited the weakest performance, with MAE = 2,880.9 tons/day and RMSE = 11,871.73 tons/day. The LGP model showed the best performance, with MAE = 1,593.34 tons/day, RMSE = 4,325.66 tons/day, and NSE = 0.627. The LR model performed between these two models, which is consistent with the correlation coefficient (R) values.
Figure 5. Scatter plot of the observed and predicted monthly Ln Q (a) and Ln Qs (b) values for the validation period by all models.
Figure 5(b) represents the scatter plots of observed and predicted Qs values for the models in the validation period. The LGP model exhibited the highest performance with the highest correlation of R = 0.788, followed by the LR with a value of R = 0.705 and GEP with R = 0.673.
Figure 6. Observed and predicted monthly Q for the validation period by all models in logarithmic scale.
Figure 7. Observed and predicted monthly Qs for the validation period by all models in logarithmic scale.
In January, the month in which the models' predictions were closest to the observations, LGP overestimated by 28.29%, while the GEP and LR models underestimated by 11.47 and 12.08%, respectively. In September, the month in which the predictions were farthest from the observations, the LGP model overestimated by 135.38%, the LR model by 155.10%, and the GEP model by 158.64%.
Future projection of discharge under different downscaling models and emission scenarios
Different output results were obtained from GEP and LGP models for three different scenarios, RCP2.6, RCP4.5, and RCP8.5. The outputs of the models for all three scenarios (horizontal axis) were compared to observed values (vertical axis) for the period 2006–2020, and scatter plots were created. It was determined that the R values were 0.408 for RCP2.6, 0.398 for RCP4.5, and 0.4585 for RCP8.5 in the GEP model, and 0.278 for RCP2.6, 0.347 for RCP4.5, and 0.351 for RCP8.5 in the LGP model. Based on these values, it is seen that the GEP model provides the best correlation for discharge (Q) in the RCP8.5 scenario. The GEP model has been superior in comparison to the LGP model by providing higher correlations in all scenarios. It is also observed that RCP8.5 performs better than other scenarios in both models.
The z-score method was applied to the data generated by the model to identify outliers, using the formula $z = (x - \mu)/\sigma$, where x is a data point, μ is the mean, and σ is the standard deviation of the dataset. Values with a z-score above 3 or below −3 were identified as outliers, as such scores reflect significant deviation from the mean. Because they distort comparisons, they were removed from the dataset. These values constitute 1.4% of the total data.
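The outlier screening described above can be sketched in a few lines. This is a minimal illustration: the function name, the use of the population standard deviation, and the toy series are assumptions, since the paper does not state these details.

```python
import math


def remove_outliers(values, threshold=3.0):
    """Drop values whose z-score lies above +threshold or below -threshold."""
    mean = sum(values) / len(values)
    # Population standard deviation (assumption; the sample form is also common).
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [v for v in values if abs((v - mean) / sd) <= threshold]


# Toy series (not real data): nineteen ordinary values and one extreme value.
series = [10.0] * 19 + [1000.0]
cleaned = remove_outliers(series)
print(len(series), len(cleaned))
```

In this toy case the extreme value has a z-score of about 4.36 and is removed, while the remaining nineteen values are kept.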
Figure 8. Scatter plot of the observed and projected monthly discharge under the CanESM2 RCP8.5 scenario by the GEP (a) and LGP (b) models for the 2006–2020 period.
For all scenarios, downscaled results were divided into four time periods, the 2020s (2021–2040), 2040s (2041–2060), 2060s (2061–2080), and 2080s (2081–2100), and were compared with two observed data periods: the combined calibration and validation period (1970–2005) and the subsequent observed period (2006–2020).
Figure 9. The projection of average monthly discharge (Q) values under the CanESM2 RCP8.5 scenario for different periods by the GEP model in logarithmic scale.
Figure 10. The projection of annual average discharge (Q) values under the CanESM2 RCP8.5 scenario by the GEP model and the observed period of 1970–2020 in logarithmic scale.
Figure 11. The projection of annual average discharge (Q) values under the CanESM2 RCP8.5 scenario by the GEP model in normal scale with a trendline.
Figure 12. The projection of annual average discharge (Q) values under the CanESM2 RCP8.5 scenario by the GEP model and the observed period of 1970–2020 for May in logarithmic scale.
Future projection of sediment yield under different downscaling models and emission scenarios
Figure 13. Scatter plot of the observed and projected monthly sediment yield (Qs) under the CanESM2 RCP8.5 scenario by the GEP (a) and LGP (b) models for the 2006–2020 period.
Figure 14. The projection of average monthly sediment yield (Qs) values under the CanESM2 RCP8.5 scenario for different periods by the GEP model in logarithmic scale.
It is observed that the months with the highest data values for all periods are March, April, May, and June. Sediment yield values, which increase with a mild slope from January to March, exhibit a significant rise during March–April and April–May. For the observed periods of 1970–2005 and 2006–2020, and the projected periods of 2006–2020, the 2020s, 2040s, 2060s, and 2080s, the increase percentages are as follows: for March–April, sequentially, 33.48, 65.9, 43.43, 89.59, 189.73, 51.13, and 1,658.16%; for April–May, sequentially, 208.93, 176.72, 114.99, 57.99, 126.83, 34.77% increase, and 41.9% decrease. A significant decline is also observed from May to June (83.94, 61.24, 70, 40.73, 70.09, 81.36, 80.11%). Values that decrease from June to July experience no major fluctuations in the subsequent months. The results of the sediment yield predictions, divided into 20-year periods and compared with the observed values between 2006 and 2020, are clearly below the observed values for each month, as shown in Figure 14. The difference is particularly dramatic in January and March, with the percentage difference between observed (2006–2020) and predicted values being 163.35, 85.97, 171.66, 164.73, and 79.46% for January and 130.62, 72.58, 141.24, 151.7, 82.64% for March for the projected periods, 2006–2020, 2020s, 2040s, 2060s, 2080s, respectively. When comparing the observed period of 1970–2005 with the simulated values, the differences between the average monthly values of observed values and the average monthly values of simulated data are observed to be very high in April and May. These values have been calculated as percentage differences for April, sequentially, at 180.52, 91.12, 182.73, 181.83, 90.42%, and for May at 114.43, 61.95, 142.76, 154.18, and 67.17%.
The projection of annual average sediment yield (Qs) values under the CanESM2 RCP8.5 scenario by GEP model and observed period of 1970–2020 in logarithmic scale.
The projection of annual average sediment yield (Qs) values under the CanESM2 RCP8.5 scenario by GEP model in normal scale with a trendline.
Figure 16 shows the annual average sediment yield projected by the GEP model under the CanESM2 RCP8.5 scenario in normal scale, together with a trendline that reveals the trend of the projected data. The projected sediment yield shows a marked decreasing trend of 5.625 tons/day per year. Although the trendline is broadly consistent with the minimum and maximum projected data, the sediment yield values for 2059–2060, 2072, and 2085–2090 depart from it. These departures can be regarded as extreme sediment yield values, especially in 2089.
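As an illustration of how such a trendline slope can be obtained, the sketch below fits an ordinary least-squares line to an annual series. It assumes the trendline is a simple linear fit, which the text does not state explicitly. The function name, the starting value of 1,500 tons/day, and the synthetic series are hypothetical; the series is constructed to reproduce the reported slope and is not the projected data.

```python
def linear_trend(years, values):
    """Ordinary least-squares fit of values = slope * year + intercept."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic annual series (tons/day), NOT study data: a straight line
# built to fall by 5.625 tons/day each year from a hypothetical start.
years = list(range(2021, 2101))
values = [1500.0 - 5.625 * (year - 2021) for year in years]

slope, intercept = linear_trend(years, values)
print(f"trend: {slope:.3f} tons/day per year")

# Residuals from the fitted line; on real data, large residuals would
# flag the years that depart from the trendline.
residuals = [v - (slope * y + intercept) for y, v in zip(years, values)]
```

On a real projected series, the years with the largest absolute residuals would correspond to the extreme values discussed above.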
The projection of annual average sediment yield (Qs) values under the CanESM2 RCP8.5 scenario by GEP model and observed period of 1970–2020 for May, in logarithmic scale.
DISCUSSION
The divergent performances of GEP and LGP across different periods in our study could be attributed to a multitude of factors rooted in their inherent model structures and evolutionary algorithms. Initially, LGP's superior performance in the validation period (1996–2005) might be attributed to its effective learning and generalization from the historical data available until 1995, possibly offering a more precise representation of the observed relationships within the data during this period. On the other hand, GEP's hybrid structure, which combines linear chromosomes with tree-like expressions, might provide a broader exploration of the solution space, thereby potentially capturing evolving patterns in the data more comprehensively. This attribute could have contributed to GEP's superior performance when projecting into the future (2006–2020), a period that might have witnessed new or evolving patterns not apparent in the earlier data. Furthermore, GEP's adaptability, stemming from its evolutionary algorithm nature, might have enabled a more robust adjustment to changing data trends over time, which could be particularly beneficial in the scenario analysis encompassing various RCP scenarios. This adaptability might explain the higher correlation values exhibited by GEP across all RCP scenarios in the future projections. Additionally, the presence of outliers in the future projection data might have posed challenges that GEP was potentially better equipped to handle, thereby contributing to its better performance in this period. The scenario analysis revealed GEP's potentially superior capability in handling uncertainties associated with future climate projections, which is pivotal in the context of climate change impact analysis. Collectively, these factors underscore the nuanced interplay between model structure, evolutionary algorithms, and data characteristics in determining the relative performances of GEP and LGP across different periods of the study.
During the validation period, the LGP model yields correlation (R) values of 0.841 for Q and 0.788 for Qs, whereas the GEP model yields 0.684 for Q and 0.673 for Qs. The LGP model therefore performs considerably better than the GEP model for both discharge and sediment yield during the validation period (1996–2005). The future predictions of sediment yield and discharge were also evaluated under three scenarios: RCP2.6, RCP4.5, and RCP8.5. The models made projections for 2006–2100, and their output for 2006–2020 was compared with observed values to measure their efficacy. For this period, the GEP model correlated better with the observations (R = 0.4585) than the LGP model (R = 0.351). Moreover, both the GEP and LGP models performed best for sediment yield and discharge under the RCP8.5 scenario. Apart from the extreme values in the aforementioned periods, the GEP model under the RCP8.5 scenario forecasts a slight decrease in discharge and a noticeable decrease in sediment yield from 2021 to 2100. These findings, while specific to the research area, align with broader research suggesting that climate change will decrease discharge and sediment yield in river basins. For instance, Bozkurt et al. (2015) predict substantial decreases in mean annual discharge for the Euphrates and Tigris Rivers by the end of the century, ranging from 19 to 58%. Similarly, Adamo et al. (2018) project a decrease in discharge within the Euphrates-Tigris basin. Although focused on a different region, Hirschberg et al. (2021) illustrate how climate change-induced shifts in precipitation and temperature can reduce sediment yield (−48%) and debris-flow occurrence (−23%). Finally, Tian et al. (2020) demonstrate that climate change results in decreased streamflow and sediment yield under the RCP4.5 and RCP8.5 emission scenarios.
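For readers who wish to reproduce this kind of comparison, the following minimal sketch computes the four measures used in this study (RMSE, MAE, NSE, and R) from paired observed and simulated series, using their standard textbook definitions. The function names and the toy series are hypothetical and are not data from the study.

```python
import math


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def pearson_r(obs, sim):
    """Pearson correlation coefficient between obs and sim."""
    mean_o = sum(obs) / len(obs)
    mean_s = sum(sim) / len(sim)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


# Toy series, NOT study data: the simulation is the observation plus a
# constant bias of 1, so it is perfectly correlated but systematically off.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]

print("RMSE:", rmse(observed, simulated))
print("MAE: ", mae(observed, simulated))
print("NSE: ", round(nse(observed, simulated), 3))
print("R:   ", round(pearson_r(observed, simulated), 3))
```

In this toy case R equals 1 while NSE is only about 0.2, because R is insensitive to a constant bias whereas NSE penalizes it. This is one reason why correlation is best read alongside the error-based measures when comparing models.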
CONCLUSIONS
The aim of this research was to assess the performance of two artificial intelligence modeling techniques, GEP and machine-coded LGP, in statistical downscaling and to analyze projections of discharge and sediment yield under climate change in the basin in order to identify the trend of the projected data. The sediment yield and discharge of the basin were estimated by statistical downscaling with the GEP and LGP models, and these models were compared with each other as well as with a devised Linear Regression method.
The projection results of the GEP model under RCP8.5 were divided into 20-year periods, namely the 2020s (2021–2040), 2040s (2041–2060), 2060s (2061–2080), and 2080s (2081–2100), and the monthly average discharge and sediment yield values were compared graphically with each other and with the observed monthly averages from 2006 to 2020.
To view all observed and predicted years together, the values observed between 1970 and 2020 were combined with the values predicted by the GEP model under the RCP8.5 scenario between 2021 and 2100, giving annual averages that cover 1970 to 2100. In the annual averages projected by the GEP model under the RCP8.5 scenario, the general trend of discharge and sediment yield decreases slightly between 2021 and 2100.
These results suggest significant implications for the future of the basin. Climate change is expected to not only decrease water supply but also fundamentally alter the river's flow patterns. These changes will impact a range of activities dependent on the river, including hydropower generation and agricultural practices. The hydropower potential of the basin is likely to diminish due to reduced discharge, potentially affecting energy production. Additionally, regional water management practices may need revision to accommodate a decreasing water supply and the challenges it poses for agriculture. Furthermore, the projected decrease in sediment yield carries its own implications. Reduced sediment could lead to channel erosion, impacting riverine ecosystems and potentially infrastructure. On the other hand, lower sediment loads might decrease reservoir siltation, extending the lifespan of existing dams. These findings highlight the complex challenges posed by climate change. The development of effective adaptation and mitigation strategies for the basin necessitates careful consideration of these interrelated factors – decreased discharge, reduced sediment yield, and their cascading effects on both human activities and the natural environment.
The applicability of the proposed models is limited to the range of data used in the study area. The contribution of this study to the literature is the examination of different GP models under three distinct climate change scenarios (RCP2.6, RCP4.5, RCP8.5) for predictions of two separate data sets. The findings of this research are anticipated to serve as an informative resource for policymakers within governmental bodies. Additionally, they offer a significant reference point for hydrologists engaged in the quantification of water resources within river basins. This enhanced understanding can facilitate more effective management and strategic planning around water resource allocation.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.