This study investigates the performance of the band similarity (BS) method, which is quite new in the literature, in predicting flow and in determining the memory properties of the flow phenomenon. For this purpose, flow prediction models for the monthly flow data of the Sarız station, located in the Seyhan Basin in Türkiye, were first produced with the particle swarm optimization (PSO) algorithm. Second, these models were used in the BS method to create the BSPSO approach. Then, flow prediction was made for the same data set with support vector regression (SVR). In the test period, the best standalone PSO, BSPSO, and SVR models achieved Nash–Sutcliffe efficiency (NSE) values of 0.516, 0.691, and 0.659, respectively. As a result, BS increased the success of PSO by approximately 35%.

  • The band similarity (BS) method was applied for the first time in flow forecasting.

  • Among the machine learning models, the BS-particle swarm optimization (PSO) hybrid model was found to be the most successful.

  • In the three input combinations where it was applied, BS significantly increased the success of PSO.

  • In flow memory, which is analyzed by BS, low flow is of great importance.

The increasing world population, industrialization, agricultural needs, environmental pollution, and global climate change increase water stress and place increasing pressure on water resources (Hssaisoune et al. 2020; Kılıç 2020; Luo et al. 2020). Therefore, it is necessary to protect, develop, and control these resources. However, changes in the climatic and environmental factors and the rainfall–runoff relationship make it difficult to control and use renewable but limited water resources (Kilinc 2022). For this reason, prospective estimation of river flows is of great importance to realize the most rational and sustainable use of water resources (Kilinc & Haznedar 2022). Streamflow is a phenomenon dependent on many geographical (vegetation, topography) and meteorological (temperature, precipitation, evaporation) variables. Therefore, the estimation of streamflow is quite complex due to its interactions with stochastic and nonlinear processes (Yaseen et al. 2019; Akçakoca & Apaydın 2020).

Artificial intelligence (AI) models and physically based models (PBMs) are used by hydrologists to estimate streamflow (Koycegiz & Buyukyildiz 2019; Yang et al. 2020). However, the complex structure of PBMs, the large amount of basin data they require, and the need for expert knowledge to use them make PBMs less feasible (Peng et al. 2017; Ritto & Rochinha 2021). Therefore, as with other phenomena in hydrology, researchers have focused more on AI methods in streamflow prediction. The most well-known and widely used AI models are the support vector machine (SVM), artificial neural networks (ANN), and the adaptive neuro-fuzzy inference system (ANFIS) (Mohammadi et al. 2021; Mohapatra et al. 2021; Vadiati et al. 2022).

Hadi & Tombul (2018) employed ANN, SVM, ANFIS, and linear autoregressive (AR) models to simulate the daily flow data of three stations in the Seyhan Basin of Türkiye. In all three stations, AR models showed the lowest performance, while ANN models showed the best performance. Parisouj et al. (2020) used support vector regression (SVR), ANN-back propagation (ANN-BP), and extreme learning machine methods to predict streamflow at monthly and daily scales in four river basins in the United States. SVR performed the best and ANN-BP the worst on both monthly and daily scales. In the modeling of hydrological processes, nature-inspired models have been widely used by researchers either alone or hybridized with AI models (Riahi-Madvar et al. 2021; Samanataray & Sahoo 2021; Adnan et al. 2022). Samanataray & Sahoo (2021) evaluated the capability of an ANFIS-particle swarm optimization (PSO) hybrid, in which PSO optimizes the ANFIS parameters, for monthly streamflow forecasting of the Barak River in India. The findings showed that the ANFIS-PSO model was more successful than the classical ANFIS models.

Recently, deep learning (DL) (e.g., long short-term memory (LSTM)) and boosting algorithms (e.g., XGBoost, LightGBM, and CatBoost) have gained popularity in modeling hydrological processes (Ni et al. 2020; Ghimire et al. 2021; Szczepanek 2022; Hao & Bai 2023; Vogeti et al. 2024). DL algorithms are neural networks with additional hidden layers, and as the data size increases, the model performance increases (Sarker 2021). Hao & Bai (2023) found that the LSTM technique performs better than SVR and XGBoost in daily flow simulation in the Ao River basin in China.

The band similarity (BS) method was developed by Yilmaz (2022) to improve prediction success using model information generated by optimization algorithms. The BS method produces two main outputs. The first is to increase the modeling success of optimization algorithms. The second is to examine the memory structure of the phenomenon under study. Of course, it would not be correct to state that the BS method produces a new contribution to the literature in the first of these two scopes. However, the BS method produces quite original results in the second case. It is known that modeling, forecasting, and trend studies are carried out in the literature, especially within the flow phenomenon. The BS method provides a different perspective and reveals the memory characteristics of the relevant phenomenon. It is known that many natural phenomena exhibit a cyclical structure. The BS method produces more specific results about this cycle.

In this study, the usability of the BS approach, which is relatively new in the literature, is investigated to improve the monthly flow forecast performance. For this purpose, the performance of the BSPSO model hybridized with the PSO method is evaluated by comparing the results of the single PSO and SVR models. In addition, the memory mechanism in the stream was examined using the BS method. The novelty of this study is that it is the first in which the BS approach was used to estimate the monthly average flow and the memory mechanism in the flow was investigated. In this respect, it is thought that the present study fills a gap in the literature.

Study area and data

The Seyhan Basin is located in the Mediterranean basin, which has been identified by the IPCC as one of the basins that will be primarily affected by climate change. In the south, the tourism and industry sectors expose water resources to significant anthropogenic impacts. There are also important agricultural activities in the north of the basin. The river section selected as the study area is located on the border of these two driving factors. Moreover, the intense continental climate and strong rainfall variability are important natural factors that increase the uncertainty of the river regime. Therefore, this study aims to test the forecasting performance of the BS method in a study area subjected to considerable challenges. The Seyhan Basin, which contains the Sarız district selected as the study area, has a continental climate in the north and a Mediterranean climate in the south. The elevation, which is high in the northern parts of the basin, descends to sea level toward the south. Agricultural activities are carried out in the northern parts of the Seyhan Basin, while industry and urbanization are concentrated in the southern parts (Koycegiz & Buyukyildiz 2022a, 2022b, 2023). The Seyhan Basin lies between 36°30′ and 39°15′ North latitudes and 34°45′ and 37°00′ East longitudes (Figure 1).
Figure 1

The digital elevation model (DEM) of the Seyhan Basin and the flow observation station (D18A032) location.

The average annual total precipitation of the Seyhan Basin varies between 400 and 1,000 mm (Gumus 2019). Sarız, located in the north of the Seyhan Basin, has an intense continental climate with rainy winters. Sarız Stream, one of the branches of the Seyhan River, is a headwater. Because the Seyhan River makes an important contribution to the development of the region, Sarız Stream is important as well. Sarız district is at an altitude of 1,560 m above sea level, and its surface area is approximately 1,410 km2. Agriculture and animal husbandry are both carried out intensively in the district.

Within the scope of the study, monthly average flow data of the D18A032 station (Sarız River-Şarköy) operated by the General Directorate of State Hydraulic Works were used. D18A032 has coordinates 38.32°N–36.31°E. The study period was determined to be between 1990 and 2017. Long-term monthly averages of flow data are given in Figure 2. Accordingly, the highest flow average is observed in April (10.34 m3/s), and the lowest flow average is observed in August (1.77 m3/s). The DEM information was obtained from the Shuttle Radar Topography Mission (SRTM) to be used in drawing the location map in Figure 1 and associating the obtained findings with the altitude. The spatial resolution of this DEM is approximately 30 m (1 arcsec).
Figure 2

Long-term monthly average streamflow bar graph of D18A032.

Particle swarm optimization

PSO, a swarm-based algorithm, was proposed by Kennedy & Eberhart (1995). In the PSO method, all particles are scattered randomly across the search space, as a swarm would be in nature, and mark the locations of possible food sources within the range. The quality of each candidate solution is determined by evaluating the objective function Z (Equations (7)–(11)) at the point where each particle is located. At this stage, the best food location found by each particle individually is recorded as pbest, and the best food location reached by the colony as a whole is recorded as gbest. Afterward, as shown in Equation (1), velocity values are calculated for each particle to steer it toward the best results obtained both locally and globally. Position updates are then performed for each particle as shown in Equation (2).
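$$v_{ij}^{t+1} = w\,v_{ij}^{t} + c_1 r_1 \left(\mathrm{pbest}_{ij} - x_{ij}^{t}\right) + c_2 r_2 \left(\mathrm{gbest}_{j} - x_{ij}^{t}\right) \tag{1}$$
$$x_{ij}^{t+1} = x_{ij}^{t} + v_{ij}^{t+1} \tag{2}$$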

In Equation (1), the velocity value obtained for the solution point xij is shown. The w value here is the inertia weight. The c1 and c2 values in Equation (1) are acceleration coefficients. The r1 and r2 values are the parameters that take random values in the (0–1) range and provide the randomness of the PSO. After the velocity values of all particles are calculated, position updates are performed as shown in Equation (2). Subsequently, the Z objective function (Equations (7)–(11)) is run again according to the up-to-date positions of the particles. The quality of the new solution points is determined. If the pbest or gbest values in the previous iteration are improved, the new values are recorded instead of the old ones.
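The update loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it assumes the standard PSO velocity and position updates, minimization of the objective, zero initial velocities, and clamping of positions to the search bounds. The function name `pso_minimize`, the iteration count, and the seed are hypothetical; the default particle count, acceleration coefficients, inertia range, and bounds follow the values reported later in the text.

```python
import random

def pso_minimize(objective, dim, bounds=(-5.0, 5.0), n_particles=8,
                 n_iter=200, c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, seed=0):
    """Minimal PSO sketch (assumed standard form) that minimizes `objective`."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions; zero initial velocities (an assumption).
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [objective(p) for p in x]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for it in range(n_iter):
        # Inertia weight decreases linearly from w_max to w_min.
        w = w_max - (w_max - w_min) * it / max(n_iter - 1, 1)
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward pbest + pull toward gbest.
                v[i][j] = (w * v[i][j]
                           + c1 * r1 * (pbest[i][j] - x[i][j])
                           + c2 * r2 * (gbest[j] - x[i][j]))
                # Position update, clamped to the search range (an assumption).
                x[i][j] = min(max(x[i][j] + v[i][j], lo), hi)
            val = objective(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val
```

For a maximization criterion such as NSE, the same routine can be reused by passing the negated criterion as `objective`.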

Model structure

In this study, linear models were created for four different models, as shown in Equations (3)–(6), to obtain the Qt values to be produced by the PSO algorithm. Here, it is aimed to find the most successful solution for each model by searching the range for β values.
(3)
(4)
(5)
(6)
If the Qt values obtained as a result of the produced models are expressed as yM and the observed Qt value as yO, the objective functions to be used by the PSO algorithm can be formed over different performance metrics as seen in Equations (7)–(11). Equations (7)–(10) represent the mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2), and Nash–Sutcliffe efficiency (NSE) coefficient, respectively. Any of these criteria can be used as an objective function alone or in proportion to each other as illustrated in Equation (11). Here, if Z1, Z2, or Z5 is preferred as the objective function, the optimization problem turns into minimization, and if Z3 or Z4 is preferred, it turns into maximization.
(7)
(8)
(9)
(10)
(11)

As a result, with the help of the PSO algorithm, values in the range expressing the search space for the β coefficients are produced and the yM models that will produce the most successful result according to the selected objective function are searched.
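The four named criteria can be written compactly as below. This is a sketch using the usual textbook definitions, with `observed` standing for yO and `modeled` for yM. It assumes that R2 is the squared Pearson correlation, which is suggested by the fact that R2 and NSE take different values in the results; the function names are illustrative only.

```python
import math

def mae(observed, modeled):
    """Mean absolute error."""
    return sum(abs(o - m) for o, m in zip(observed, modeled)) / len(observed)

def rmse(observed, modeled):
    """Root mean square error."""
    return math.sqrt(sum((o - m) ** 2 for o, m in zip(observed, modeled)) / len(observed))

def r2(observed, modeled):
    """Coefficient of determination, taken here as the squared Pearson correlation."""
    mean_o = sum(observed) / len(observed)
    mean_m = sum(modeled) / len(modeled)
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modeled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modeled)
    return cov * cov / (var_o * var_m)

def nse(observed, modeled):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance of observations."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - sse / sst
```

Under these definitions a model with a constant bias keeps R2 at 1 while NSE drops, which is one reason the two metrics can rank models differently.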

BS method

The BS method developed by Yilmaz (2022) uses the model information produced by optimization algorithms to increase modeling success and helps reveal the memory characteristics of the phenomenon studied.

Suppose a model is created on any phenomenon. If the temporal change (increase, decrease, acceleration, deceleration, etc.) of the input parameters of the model repeats in the future and if the output parameter of the model can respond to this change similarly, it is accepted that the related phenomenon remembers the changes that occurred in its past. This mechanism is defined as temporal interaction in the BS method. Many phenomena occurring in nature are expected to be cyclical. In the BS method, when all the input parameters of the model exhibit the same change, if the output parameter of the model can also respond to this, it is accepted that the temporal interaction is established.

In the BS method, Ki,j values, which enable the determination of temporal similarity between the individuals forming the data set, are calculated as shown in Equation (12). Here, T indicates the total number of individuals in the data set and D indicates the dimension of the problem (the number of input parameters of the model).
(12)
The xi,j values in Equation (12) show the variation of the independent variables in the model over time in different ways. Here, changes such as increase, decrease, acceleration, and deceleration of the data over time are modeled. Equations (13)–(17) determine the change according to the value r time ago, the percentage of change according to the value r time ago, the change velocity according to the value r time ago, the change of normalized data, and the change velocity of normalized data, respectively.
(13)
(14)
(15)
(16)
(17)
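To make the five kinds of change more concrete, the sketch below gives one plausible reading of the verbal descriptions above. It is not a reproduction of Equations (13)–(17): the exact formulas, the use of min-max normalization, and the definition of velocity as change divided by r are all assumptions made for illustration, and every function name is hypothetical.

```python
def difference(x, r):
    """Change relative to the value r steps earlier (assumed form)."""
    return [x[t] - x[t - r] for t in range(r, len(x))]

def percent_change(x, r):
    """Percentage change relative to the value r steps earlier (assumed form)."""
    return [100.0 * (x[t] - x[t - r]) / x[t - r] for t in range(r, len(x))]

def velocity(x, r):
    """Change per unit time over a lag of r steps (assumed form)."""
    return [(x[t] - x[t - r]) / r for t in range(r, len(x))]

def min_max(x):
    """Min-max normalization to the 0-1 range (an assumed normalization)."""
    lo, hi = min(x), max(x)
    return [(value - lo) / (hi - lo) for value in x]

def normalized_difference(x, r):
    """Change of the normalized data."""
    return difference(min_max(x), r)

def normalized_velocity(x, r):
    """Change velocity of the normalized data."""
    return velocity(min_max(x), r)
```

With monthly data, a lag of r = 5 in `difference` compares each month with the month five months earlier.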

Since Ki,j values of all xi,j values are calculated, whether the changes made by testing and training individuals in the data set are similar to each other over time will be determined by Ki,j values. For this purpose, a band structure is created beginning from the first test individual. Among the training individuals, individuals who can place all Ki,j values in this band structure are sought. If this situation has occurred, it is understood that the training individuals related to this test individual behave similarly to each other in a temporal sense.

V1,j and V2,j values that make up the band structure are formed as in Equation (18). Here, the number of training individuals is expressed as Tn. The μk value used in the band structure is a value related to the bandwidth. The μk value represents the different iteratively varying μ values obtained by dividing the predetermined upper (μu) and lower (μl) limit for the bandwidth into h pieces (Equation (19)).
(18)
(19)
A detailed flowchart of the BSPSO method is shown in Figure 3.
Figure 3

Flowchart of the BSPSO method.

Support vector regression

SVM developed by Vapnik (1995) is an algorithm with high generalization ability, strong theoretical structure, and high performance in applications. For this reason, it is frequently used in classification (SVM) and regression (SVR) applications in many areas. The SVR function is given in Equation (20).
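$$f(x) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{*}\right) K(x_i, x) + b \tag{20}$$

Here, $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers, $K(x_i, x)$ is the kernel function evaluated between training sample $x_i$ and the input $x$, and b is the bias term. The error term (ε), the regularization factor (C), and the kernel function are three factors that affect the performance of SVR models. In this study, the radial basis kernel function is used in SVR applications.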
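How a trained SVR produces a prediction with a radial basis kernel can be sketched as follows. The code assumes the standard dual form of SVR, in which the prediction is a kernel-weighted sum over support vectors plus a bias. The function names and the argument `dual_coefs` (standing for the differences of the Lagrange multipliers) are hypothetical, and training itself is not shown.

```python
import math

def rbf_kernel(u, v, gamma):
    """Radial basis kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Kernel expansion over the support vectors plus the bias term b."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + b
```

The kernel width γ controls how quickly the influence of each support vector decays with distance, which is why it is tuned together with C and ε.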

Evaluation of prediction performance of models

In the models created for monthly average flow estimation, Qt, the model output, represents the monthly average flow at time t. Four input combinations (M1, M2, M3, M4) consisting of different lags of Qt were used in the study. Qt estimation was performed with Qt−1 as the model input for M1; Qt−1 and Qt−2 for M2; Qt−1, Qt−2, and Qt−3 for M3; and Qt−1, Qt−2, Qt−3, and Qt−4 for M4. About 70% of the monthly average flow data for the period 1990–2017 (28 years) were used for training purposes and the rest for testing purposes. In this study, linear models were produced for the four input structures with the PSO method, as shown in Equations (3)–(6), for the estimation of Qt values. In the PSO method applications, the limit values expressing the search space are determined as (−5;5). The number of particles is taken as 8 and c1 = c2 = 2. The inertia weight (w) was set to decrease linearly over the iterations from 0.9 to 0.4. With the PSO method, 10 linear models were produced for each input structure shown in Equations (3)–(6). The performance criteria of the produced models are shown in Appendix A in detail. The PSO performance criteria of the most successful model produced for each input structure are shown in Table 1 for the training and testing periods.
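The construction of the lagged inputs and the training/testing split can be sketched as below. This is an illustrative sketch only: it assumes a chronological split with no shuffling, and the function names `make_lagged` and `chronological_split` are hypothetical.

```python
def make_lagged(q, n_lags):
    """Build inputs [Q(t-1), ..., Q(t-n_lags)] and targets Q(t) from a flow series."""
    inputs = [[q[t - k] for k in range(1, n_lags + 1)] for t in range(n_lags, len(q))]
    targets = [q[t] for t in range(n_lags, len(q))]
    return inputs, targets

def chronological_split(inputs, targets, train_fraction=0.7):
    """Keep the earliest samples for training and the rest for testing."""
    n_train = int(len(targets) * train_fraction)
    train = (inputs[:n_train], targets[:n_train])
    test = (inputs[n_train:], targets[n_train:])
    return train, test
```

Calling `make_lagged(q, 1)` through `make_lagged(q, 4)` yields the input structures M1 to M4; note that each additional lag removes one sample from the start of the series.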

Table 1

PSO, BSPSO, and SVR model results

| Model | Period | Performance metric | M1 | M2 | M3 | M4 |
|---|---|---|---|---|---|---|
| PSO | Training | MAE (m3/s) | 1.688 | 1.736 | 1.654 | 1.599 |
| PSO | Training | RMSE (m3/s) | 2.886 | 2.870 | 2.822 | 2.811 |
| PSO | Training | R2 | 0.387 | 0.466 | 0.459 | 0.460 |
| PSO | Training | NSE | 0.304 | 0.312 | 0.335 | 0.340 |
| PSO | Testing | MAE (m3/s) | 1.429 | 1.359 | 1.316 | 1.280 |
| PSO | Testing | RMSE (m3/s) | 2.526 | 2.407 | 2.383 | 2.372 |
| PSO | Testing | R2 | 0.498 | 0.589 | 0.580 | 0.580 |
| PSO | Testing | NSE | 0.452 | 0.502 | 0.512 | 0.516 |
| BSPSO | Training | MAE (m3/s) | – | – | – | – |
| BSPSO | Training | RMSE (m3/s) | – | – | – | – |
| BSPSO | Training | R2 | – | – | – | – |
| BSPSO | Training | NSE | – | – | – | – |
| BSPSO | Testing | MAE (m3/s) | – | 1.149 | 1.205 | 1.130 |
| BSPSO | Testing | RMSE (m3/s) | – | 1.949 | 1.895 | 2.029 |
| BSPSO | Testing | R2 | – | 0.718 | 0.734 | 0.675 |
| BSPSO | Testing | NSE | – | 0.673 | 0.691 | 0.646 |
| SVR | Training | MAE (m3/s) | 1.608 | 1.429 | 1.494 | 1.357 |
| SVR | Training | RMSE (m3/s) | 2.745 | 2.275 | 2.292 | 2.290 |
| SVR | Training | R2 | 0.383 | 0.569 | 0.565 | 0.570 |
| SVR | Training | NSE | 0.371 | 0.568 | 0.561 | 0.562 |
| SVR | Testing | MAE (m3/s) | 1.510 | 1.268 | 1.348 | 1.268 |
| SVR | Testing | RMSE (m3/s) | 2.481 | 1.991 | 2.050 | 2.130 |
| SVR | Testing | R2 | 0.493 | 0.673 | 0.656 | 0.654 |
| SVR | Testing | NSE | 0.471 | 0.659 | 0.639 | 0.610 |

Note: Bold values give the highest NSE values for PSO, BSPSO, and SVR.

In the literature, R2 and NSE (alone or both) are widely used in evaluating the prediction performance of models. There are also studies where the advantages and disadvantages of both metrics are discussed (Moriasi et al. 2007; Gupta et al. 2009; Ritter & Muñoz-Carpena 2013; Onyutha 2020, 2022). In the methods used in our study, the NSE value was taken into account as the objective function in optimizing the parameters of each method. When the R2 and NSE values given in Table 1 are compared, it is seen that the R2 values are higher than the NSE values. According to Table 1, in all three methods, it is seen that the best model scenarios determined by the NSE are generally also achieved with the R2 metric. In this study, the most successful model in the monthly average flow estimation was decided according to the model input with the highest NSE value in the test period.

In Table 1, although the highest R2 value in the PSO models is in M2 with 0.589, the M4 input structure with the highest NSE (0.516) and the lowest MAE (1.280 m3/s) and RMSE (2.372 m3/s) values has been the most successful PSO model in predicting monthly average flow. The M1 scenario with the highest MAE, RMSE, and the lowest R2 and NSE values was the PSO model with the lowest success in flow estimation. The PSO-M2 and PSO-M3 models also performed closely to the PSO-M4 model, with NSE values of 0.502 and 0.512, respectively.

The BS method was applied using the model information produced by the PSO algorithm, and the results were recorded as BSPSO. To investigate the temporal interaction between training and testing individuals in BS applications, the five simulation models shown in Equations (13)–(17) were used. These simulation models are referred to as BSM1, BSM2, BSM3, BSM4, and BSM5, respectively. In all simulation models, the r value was selected in the range of 1–12. The test NSE and μbest values of the best results obtained with the BS method for all models are shown in Appendix B.

As a result of the BS method, the most successful result in the M2 model was obtained for the 4-month difference values of the normalized data in BSM4 (μbest = 0.024, test NSE = 0.673). The most successful results in the M3 and M4 models were obtained for the 5-month and 9-month difference values of the data in BSM1, respectively. The performance criteria of the most successful results produced for all models with the BS method are also shown in Table 1. Because the BS method operates on the model results produced by the PSO method, it yields no training results of its own; therefore, Table 1 contains no BSPSO performance metrics for the training period. Moreover, in the M1 model, where only the Qt−1 input parameter was used, all yiM1 models (Appendix A) produced the same result, so there was no difference between models for BS to work with. The BS method was therefore not applied to M1, and no test-period metrics are given for it in Table 1.

According to Table 1, the most successful BSPSO model was achieved in the M3 scenario. In the BSPSO-M3 model, the performance metrics for the testing period were R2 = 0.734, NSE = 0.691, MAE = 1.205 m3/s, and RMSE = 1.895 m3/s. The least successful BSPSO model was the M4 scenario, with NSE = 0.646.

In the SVR method, the C, ε, and γ parameters were optimized in the ranges of 1–50, 0.01–0.5, and 0.1–5, respectively. According to the results of the most successful SVR models shown in Table 1, the SVR model with the lowest prediction success in flow prediction was the M1 scenario with NSE = 0.471. In the other three combinations, NSE values are higher than 0.6. As a result, the SVR model with the highest prediction success is the M2 input structure with the highest NSE and R2 and lowest MAE and RMSE values (0.659, 0.673, 1.268 m3/s, and 1.991 m3/s, respectively).

When the results of the SVR, PSO, and BSPSO models are compared, it is seen that the BSPSO method is the most successful model in each input structure to which it was applied (M2–M4) according to the metrics used. The highest R2 obtained in BSPSO is 0.734, whereas the highest R2 values obtained in PSO and SVR are 0.589 and 0.673, respectively. These values indicate that the BSPSO method is more successful than the other two methods. According to the criteria stated in Moriasi et al. (2007), if 0.65 < R2 ≤ 0.75, the model success is considered ‘good’ and if 0.75 < R2 ≤ 1, it is considered ‘very good’. In our study, the value of 0.734 obtained in BSPSO is in the ‘good’ category and very close to the ‘very good’ category.

Table 2 shows the improvement rates of the BSPSO method on PSO results. According to Table 2, it can be said that the BSPSO method improves the success achieved in PSO by increasing the R2 and NSE values and decreasing the values of error metrics. Among the metrics, the highest contribution of BS generally occurred in NSE values. According to NSE values, the performance of the BS method in improving success was 34.10, 34.99, and 25.16% for M2, M3, and M4, respectively.

Table 2

Rates of BS improvement of PSO test results

| Performance metric | M1 (%) | M2 (%) | M3 (%) | M4 (%) | Variation |
|---|---|---|---|---|---|
| MAE (m3/s) | – | 15.43 | 8.42 | 11.69 | ↓ |
| RMSE (m3/s) | – | 19.00 | 20.46 | 14.49 | ↓ |
| R2 | – | 21.88 | 26.51 | 16.38 | ↑ |
| NSE | – | 34.10 | 34.99 | 25.16 | ↑ |
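The improvement rates can be reproduced approximately from the rounded test values in Table 1. The sketch below assumes that the rate is the relative change with respect to the PSO value, counted as positive when an error metric falls or a goodness-of-fit metric rises; the function name is hypothetical.

```python
def improvement_rate(pso_value, bspso_value, higher_is_better):
    """Relative improvement of BSPSO over PSO, in percent (assumed definition)."""
    if higher_is_better:
        change = bspso_value - pso_value
    else:
        change = pso_value - bspso_value
    return 100.0 * change / pso_value

# M3 test-period values taken from Table 1.
nse_gain = improvement_rate(0.512, 0.691, higher_is_better=True)
rmse_gain = improvement_rate(2.383, 1.895, higher_is_better=False)
```

With these rounded inputs the NSE gain for M3 comes out near 34.96% and the RMSE gain near 20.48%, close to the tabulated 34.99% and 20.46%; the small gaps are presumably due to the table being computed from unrounded metric values.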

The time series of the most successful input combinations of the PSO, BSPSO, and SVR methods are given in Figure 4. In Figure 4, it is seen that the success of the SVR at low flows and high flows is low compared to other models. Conversely, the PSO model showed lower success, especially in predicting high flows. It can be seen in Figure 4 that the BSPSO method produces satisfactory results, especially in peak flows. Also, all models produced quite satisfactory results in terms of the timing of peak flows.
Figure 4

Time series of the most successful input combination of SVR, PSO, and BSPSO models.

In Figure 5, the Taylor graph is given for the best input combinations of the models. Accordingly, it is seen that BSPSO achieves higher success compared to SVR and PSO models.
Figure 5

Taylor graph of the most successful models.

Investigation of memory property of streamflow

It has been seen that the BSPSO method is the most successful among all the methods used, and then the memory properties of the flow phenomenon studied are examined. The reason for the success of the BS method is the model updates based on the temporal similarity behavior between the training and test individuals. Especially when the improvement rates of the BS method in PSO results are examined in Table 2, it is seen that this mechanism works clearly.

As seen in Table 1, the BSPSO method produced the most successful result in the M3 input model structure. This result was obtained according to the 5-month changes of the data with BSM1, and it was achieved in the bandwidth obtained for μbest = 1.175. To examine the memory properties of the flow data in detail, the training months that can place all Ki,j elements in the band structure for μbest = 1.175 were identified for each testing month, and the findings were visualized in 3D. Visual results for the periods 2009–2011, 2012–2014, and 2015–2017 are presented in Figures 6–8, respectively. In summary, these figures make it possible to examine which test month behaves similarly to which training months, that is, which test month remembers which training months.
Figure 6

The variation of the temporal interaction in the test and training months for μbest = 1.175 in BSM1 for the period 2009–2011.

Figure 7

The variation of the temporal interaction in the test and training months for μbest = 1.175 in BSM1 for the period 2012–2014.

Figure 8

The variation of the temporal interaction in the test and training months for μbest = 1.175 in BSM1 for the period 2015–2017.

In the grid system shown in Figures 6–8, the x-axis represents the year information in the training period and the y-axis represents the month information. In this way, it can be observed which test month is temporally similar to which training months. For example, the months 1992/05 and 1998/05 within the training period were able to enter the band structure of 2009/04 in the test period. These months are shown as red-colored cells in Figure 6. Based on this, it is understood that 2009/04 exhibits a temporally similar behavior to 1992/05 and 1998/05. The images in Figures 6–8 were obtained by marking, with red-colored cells, the training months remembered by every test month.

When Figures 6–8 are examined, it is seen that the number of red cells is higher in some test months and lower in others. It is understood that there is a stronger and more distinct memory mechanism in the months when the number of red cells is high. For this reason, the number of training months remembered by each testing month was determined and shown in Figure 9 together with the flow data. The number of training months remembered by a test month was interpreted as the memory intensity of that test month, and the variation of the memory intensity over time was examined. The findings are interpreted within the water year framework, which has an important place in the interpretation of flow data. The water year associated with a Gregorian year runs from the beginning of October of the previous year to the end of September of that year, and hydrological studies are mainly evaluated within this framework. For example, the 2013 water year is the period between 1 October 2012 and 30 September 2013.
Figure 9

Change of memory intensities of test months over time.
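The water year convention used in this interpretation can be expressed as a small helper. The sketch follows the definition given in the text (October through the following September, labeled by the year in which it ends); the function name is hypothetical.

```python
from datetime import date

def water_year(day):
    """Return the water year of a date: October-December belong to the next year's water year."""
    return day.year + 1 if day.month >= 10 else day.year
```

Under this convention, `water_year(date(2012, 10, 1))` and `water_year(date(2013, 9, 30))` both return 2013.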

In line with the findings obtained by the BS method, two main outputs were obtained about the memory properties of the flow data studied. The first of these can be expressed as the water year effect. As stated before, the most successful result with the BSPSO method was obtained with BSM1 for the 5-month difference values of the data. Accordingly, when the periods 2009/01–2009/03 and 2011/01–2011/03 in Figure 6, 2012/02–2012/04 in Figure 7, and 2015/01–2015/02 and 2016/01–2016/03 in Figure 8 are examined, it is seen that the memory intensifies and the test months remember more training months. Likewise, when Figure 9 is examined, it is seen that the most intense memory structure occurs in January, February, and March. Because the most successful result is obtained for the 5-month difference values and the most intense memory occurs approximately in the January–March period, the reference period lies 5 months earlier, in August–October, which includes the beginning of the water year. It was therefore concluded that the January–March flows strongly remember the changes they have undergone relative to the August–October period, when the flow data fall to their lowest values in the water year. When the findings are examined in more detail in this context, the months of October and March stand out. The fact that the difference between the theoretically lowest and highest periods in a water year is remembered shows that this change has a significant impact on the flow dynamics.

The second important output on memory properties obtained with the BS method is that remembrance is more intense in water years when low flow values are observed. In particular, when the period 2014/01–2014/08 in Figure 7 and the period 2017/01–2017/08 in Figure 8 are examined, a long-lasting and intense memory mechanism is seen. The number of remembered months shown in Figure 9 likewise indicates an intense memory structure in the 2014 and 2017 water years that continues almost throughout the water year. When the flow data in Figure 9 and the memory properties are examined together, it is seen that the flow took quite low values in the 2014 and 2017 water years. It was therefore concluded that memory is stronger in periods when flow values are low. It is known that in the majority of studies on streamflow, researchers are interested in high flow values. However, this study suggests that low flow values may also contain important information.

In this study, the performance of the BSPSO model, obtained by integrating the BS method (applied here for the first time in river flow forecasting) with the PSO method, is compared with the results of the standalone PSO and SVR models. BS also has the potential to estimate the memory properties of the time series. The findings show that the BS approach improves the performance of models built on significant historical lags of the streamflow data.

The first advantage of the BS method is that it is not a preprocessing tool; it performs dynamic analysis together with the model. In addition, because it is integrated into the model regardless of the characteristics of the data set, it is not affected by outliers and jumps in the data. Thus, it can work with different data sets. The BS method, which is easily hybridized with evolutionary optimization algorithms, takes advantage of the memory feature of the data set. This opens up the opportunity to include physical features in black box models. Based on the different flow forecasting studies in the literature and the findings of this study, the BS method appears to be an effective tool.

In this study, the BSPSO method achieved more successful results than SVR, a widely used and proven method in the literature, which shows that the BS method is powerful in modeling and estimation. In the first study on the BS method, Yılmaz (2022) used Artificial Bee Colony (ABC) with three different input model structures to estimate urban water consumption. Applying the BS method then increased the R² values by approximately 30% relative to the success of the ABC models alone. The BS method was also used by Yilmaz & Alpars (2023) to determine the memory properties of the water consumption phenomenon and by Yilmaz (2023) to determine those of the pan evaporation phenomenon. In both studies, significantly successful results were obtained.

For the same data set used in the current study, the model successes obtained by Koycegiz & Buyukyildiz (2022b) with four different ANN methods can be compared with the results of this study. BS, an innovative method, comes close to the success of the ANN models and even exceeds that of some ANN algorithms. Apart from the studies mentioned above, there is no other study in the literature in which BS was applied. The results of these studies, which are quite limited in number, show that the BS method can be used as an alternative method in estimation studies.

In this study, which estimates monthly average flow, the models are established linearly, but different nonlinear models can also be produced with the help of optimization algorithms. In this respect, the BS method can be applied to nonlinear models of different character. In short, the strongest aspects of the BS method are that it is easy to use, applies to different mathematical models, and raises model success in parallel with the success of the underlying optimization algorithm. In addition, the BS method, which can yield important findings on the memory properties of the studied phenomenon, is thought to be capable of producing successful results in different fields.

In this study, the memory properties obtained with the BS method on the flow data were evaluated under two outputs. The first is the water year effect. The results showed that the flow data remembered the change between the periods when the theoretical values were lowest and the periods when they started to increase rapidly. The fact that the change relative to the period corresponding to the beginning of the water year is remembered at a high rate shows that low flow values in particular have a significant effect on the memory structure. The second output, reached when the memory mechanism is analyzed holistically within the framework of the water year, is that a continuous and intense memory mechanism exists especially in water years when low flow values are observed. These two outputs can therefore be combined under a single heading: the effect of low flow data on memory. A more intense remembrance mechanism is observed in water years with low flow values, both locally and globally with respect to the water year effect. It was therefore concluded that low flow values in particular have a significant effect on memory properties. This result was accepted as the main output of this study.

It is known that peak flows attract more attention in the analysis of flow data and that researchers concentrate on this area. In this study, however, remembrance is more severe in periods when low flow values are observed, which shows that the dynamics in these periods are learned better. It is therefore thought that hydrologically more meaningful and original findings can be obtained by examining low-flow periods in more detail when analyzing flow data.

The present study explores the usability of the BS approach, which is relatively new in the literature, in estimating flow data and in determining the memory properties of the flow phenomenon. For this purpose, the models generated by the PSO method are used together with the BS method to perform the flow estimation. The performance of the hybrid BSPSO in improving the success of the standalone PSO is evaluated, and the results are also compared with those of the SVR method.

The findings showed that, by updating the models produced by the PSO method according to the similarity of the temporal changes between the data, the BS method both increased the success of PSO and achieved more successful results than an established method such as SVR. The highest test-period NSE values were obtained in different input structures for the standalone PSO and SVR models and the hybrid BSPSO model: 0.516 for PSO, 0.659 for SVR, and 0.691 for BSPSO. SVR was more successful than PSO in all input structures, while BSPSO outperformed both SVR and PSO in the input structures used. Depending on the input structure, the improvement of BSPSO over PSO in NSE varies between 25 and 35%.
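The NSE metric and the relative improvement quoted above can be reproduced with a few lines of arithmetic. The sketch below uses the standard Nash–Sutcliffe definition and the best test-period NSE values reported in this study. The function names are assumptions introduced for illustration, and the comparison is best-versus-best across different input structures, so it only approximates the per-structure improvement rates.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


def relative_improvement(base, improved):
    """Percentage gain of `improved` over `base`."""
    return 100.0 * (improved - base) / base


# Best test-period NSE values reported in the text.
best_nse = {"PSO": 0.516, "SVR": 0.659, "BSPSO": 0.691}

gain_over_pso = relative_improvement(best_nse["PSO"], best_nse["BSPSO"])
gain_over_svr = relative_improvement(best_nse["SVR"], best_nse["BSPSO"])
print(f"BSPSO vs PSO: {gain_over_pso:.1f}%")  # roughly 34%
print(f"BSPSO vs SVR: {gain_over_svr:.1f}%")  # roughly 5%
```

An NSE of 1 indicates a perfect fit, while an NSE of 0 means the model does no better than predicting the observed mean, which is why a gain from 0.516 to 0.691 is a substantial one.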

It can be said that the temporal interaction mechanism used by the BS method worked and produced successful results. Most studies on flow analysis focus on peak flow values. In line with this study, however, it is thought that remarkable outputs can be obtained by examining low-flow periods in more detail, and this is recommended for future studies.

The BS method can use model information produced by different optimization algorithms, can be easily adapted to different mathematical models, and has a simple operating mechanism. In future studies, it is recommended that its performance be compared with the prediction performance of newly developed DL models and meta-heuristic algorithms.

CK contributed to material preparation and data collection. VY and MB contributed to methodology and analysis. VY and CK contributed to conceptualization and visualization. All authors contributed to the study conception, design, writing, and editing. All authors read and approved the manuscript.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Adnan R. M., Mostafa R. R., Elbeltagi A., Yaseen Z. M., Shahid S. & Kisi O. (2022) Development of new machine learning model for streamflow prediction: Case studies in Pakistan, Stochastic Environmental Research and Risk Assessment, 36, 999–1035. https://doi.org/10.1007/s00477-021-02111-z
Akçakoca F. & Apaydın H. (2020) Modelling of Bektas Creek daily streamflow with generalized regression neural network method, International Journal of Advances in Scientific Research and Engineering, 6 (2), 97–103.
Ghimire S., Yaseen Z. M., Farooque A. A., Deo R. C., Zhang J. & Tao X. (2021) Streamflow prediction using an integrated methodology based on convolutional neural network and long short-term memory networks, Scientific Reports, 11 (1), 17497. https://doi.org/10.1038/s41598-021-96751-4
Gumus V. (2019) Spatio-temporal precipitation and temperature trend analysis of the Seyhan–Ceyhan River Basins, Turkey, Meteorological Applications, 26 (3), 369–384. https://doi.org/10.1002/MET1768
Gupta H. V., Kling H., Yilmaz K. K. & Martinez G. F. (2009) Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling, Journal of Hydrology, 377 (1–2), 80–91.
Hadi S. J. & Tombul M. (2018) Forecasting daily streamflow for basins with different physical characteristics through data-driven methods, Water Resources Management, 32, 3405–3422. https://doi.org/10.1007/s11269-018-1998-1
Hao R. & Bai Z. (2023) Comparative study for daily streamflow simulation with different machine learning methods, Water, 15 (6), 1179. https://doi.org/10.3390/w15061179
Hssaisoune M., Bouchaou L., Sifeddine A., Bouimetarhan I. & Chehbouni A. (2020) Moroccan groundwater resources and evolution with global climate changes, Geosciences, 10 (2), 81. https://doi.org/10.3390/geosciences10020081
Kennedy J. & Eberhart R. (1995) Particle swarm optimization. In: Proceedings of ICNN'95 – International Conference on Neural Networks, 4, pp. 1942–1948. https://doi.org/10.1109/ICNN.1995.488968
Kılıç Z. (2020) The importance of water and conscious use of water, International Journal of Hydrology, 4 (5), 239–241.
Kilinc H. C. & Haznedar B. (2022) A hybrid model for streamflow forecasting in the Basin of Euphrates, Water, 14 (1), 80. https://doi.org/10.3390/w14010080
Koycegiz C. & Buyukyildiz M. (2022a) Investigation of precipitation and extreme indices spatiotemporal variability in Seyhan Basin, Turkey, Water Supply, 22 (12). https://doi.org/10.2166/ws.2022.391
Koycegiz C. & Buyukyildiz M. (2022b) Estimation of streamflow using different artificial neural network models, Osmaniye Korkut Ata University Journal of the Institute of Science and Technology, 5 (3), 1141–1154. https://doi.org/10.47495/okufbed.1037242
Koycegiz C. & Buyukyildiz M. (2023) Investigation of spatiotemporal variability of some precipitation indices in Seyhan Basin, Turkey: Monotonic and sub-trend analysis, Natural Hazards, 116, 2211–2244. https://doi.org/10.1007/s11069-022-05761-6
Luo P., Sun Y., Wang S., Wang S., Lyu J., Zhou M., Nakagami K., Takara K. & Nover D. (2020) Historical assessment and future sustainability challenges of Egyptian water resources management, Journal of Cleaner Production, 263, 121154. https://doi.org/10.1016/j.jclepro.2020.121154
Mohammadi B., Moazenzadeh R., Christian K. & Duan Z. (2021) Improving streamflow simulation by combining hydrological process-driven and artificial intelligence-based models, Environmental Science and Pollution Research, 28, 65752–65768. https://doi.org/10.1007/s11356-021-15563-1
Mohapatra J. B., Jha P., Jha M. K. & Biswal S. (2021) Efficacy of machine learning techniques in predicting groundwater fluctuations in agro-ecological zones of India, Science of the Total Environment, 785, 147319. https://doi.org/10.1016/j.scitotenv.2021.147319
Moriasi D. N., Arnold J. G., Van Liew M. W., Bingner R. L., Harmel R. D. & Veith T. L. (2007) Model evaluation guidelines for systematic quantification of accuracy in watershed simulations, Transactions of the ASABE, 50 (3), 885–900.
Ni L., Wang D., Wu J., Wang Y., Tao Y., Zhang J. & Liu J. (2020) Streamflow forecasting using extreme gradient boosting model coupled with Gaussian mixture model, Journal of Hydrology, 586, 124901. https://doi.org/10.1016/j.jhydrol.2020.124901
Onyutha C. (2020) From R-squared to coefficient of model accuracy for assessing ‘goodness-of-fits’, Geoscientific Model Development Discussions, 2020, 1–25.
Onyutha C. (2022) A hydrological model skill score and revised R-squared, Hydrology Research, 53 (1), 51–64.
Parisouj P., Mohebzadeh H. & Lee T. (2020) Employing machine learning algorithms for streamflow prediction: A case study of four river basins with different climatic zones in the United States, Water Resources Management, 34, 4113–4131. https://doi.org/10.1007/s11269-020-02659-5
Peng T., Zhou J., Zhang C. & Fu W. (2017) Streamflow forecasting using empirical wavelet transform and artificial neural networks, Water, 9 (6), 406. https://doi.org/10.3390/W9060406
Riahi-Madvar H., Dehghani M., Memarzadeh R. & Gharabaghi B. (2021) Short to long-term forecasting of river flows by heuristic optimization algorithms hybridized with ANFIS, Water Resources Management, 35 (4), 1149–1166. https://doi.org/10.1007/s11269-020-02756-5
Ritto T. G. & Rochinha F. A. (2021) Digital twin, physics-based model, and machine learning applied to damage detection in structures, Mechanical Systems and Signal Processing, 155, 107614. https://doi.org/10.1016/j.ymssp.2021.107614
Samanataray S. & Sahoo A. (2021) A comparative study on prediction of monthly streamflow using hybrid ANFIS-PSO approaches, KSCE Journal of Civil Engineering, 25, 4032–4043. https://doi.org/10.1007/s12205-021-2223-y
Sarker I. H. (2021) Machine learning: Algorithms, real-world applications and research directions, SN Computer Science, 2 (3), 160.
Szczepanek R. (2022) Daily streamflow forecasting in mountainous catchment using XGBoost, LightGBM and CatBoost, Hydrology, 9 (12), 226. https://doi.org/10.3390/hydrology9120226
Vadiati M., Rajabi Yami Z., Eskandari E., Nakhaei M. & Kisi O. (2022) Application of artificial intelligence models for prediction of groundwater level fluctuations: Case study (Tehran-Karaj alluvial aquifer), Environmental Monitoring and Assessment, 194 (9), 619. https://doi.org/10.1007/s10661-022-10277-4
Vapnik V. (1995) The Nature of Statistical Learning Theory. New York: Springer.
Vogeti R. K., Jauhari R., Mishra B. R., Raju K. S. & Nagesh Kumar D. (2024) Deep learning algorithms and their fuzzy extensions for streamflow prediction in climate change framework, Journal of Water and Climate Change, 15 (2), 832–848.
Yang S., Yang D., Chen J., Santisirisomboon J., Lu W. & Zhao B. (2020) A physical process and machine learning combined hydrological model for daily streamflow simulations of large watersheds with limited observation data, Journal of Hydrology, 590, 125206. https://doi.org/10.1016/j.jhydrol.2020.125206
Yaseen Z. M., Sulaiman S. O., Deo R. C. & Chau K. W. (2019) An enhanced extreme learning machine model for river flow forecasting: State-of-the-art, practical applications in water resource engineering area and future research direction, Journal of Hydrology, 569, 387–408. https://doi.org/10.1016/j.jhydrol.2018.11.069
Yılmaz V. (2022) The use of band similarity in urban water demand forecasting as a new method, Water Supply, 22 (1), 1004–1019. https://doi.org/10.2166/ws.2021.221
Yilmaz V. (2023) Analysis of the memory mechanism in the pan evaporation phenomenon by the band similarity method, Theoretical and Applied Climatology, 153, 635–648. https://doi.org/10.1007/s00704-023-04502-4
Yilmaz V. & Alpars M. (2023) An investigation of the temporal interaction of urban water consumption in the framework of settlement characteristics, Water Resources Management, 37 (4), 1619–1639. https://doi.org/10.1007/s11269-023-03447-7
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
