Abstract
Rainfall–runoff modelling is crucial for enhancing the effectiveness and sustainability of water resources. Conceptual models can have difficulties, such as coping with nonlinearity and needing more data, whereas data-driven models can be deprived of reflecting the physical process of the basin. In this regard, two hybrid model approaches, namely Génie Rural à 4 paramètres Journalier (GR4 J)–wavelet-based data-driven models (i.e., wavelet-based genetic algorithm–artificial neural network (WGANN); GR4 J–WGANN1 and GR4 J–WGANN2), were implemented to improve daily rainfall–runoff modelling. The novel GR4 J–WGANN1 hybrid model includes the outflow (QR) and direct flow (QD) obtained from the GR4 J model, and the GR4 J–WGANN2 hybrid model includes the soil moisture index (SMI) obtained from the GR4 J model as input data. In hybrid models, wavelet analysis and the Boruta algorithm were implemented to decompose input data and select wavelet components. Four gauging stations in the Eastern Black Sea and Kızılırmak basins in Turkey were used to observe modelling performance. The GR4 J model exhibited poor performance for extreme flow forecasting. The novel GR4 J–WGANN1 approach performed better than the GR4 J–WGANN2 model, and the hybrid models improved modelling performance up to 40% compared to the GR4 J model. In this regard, integrated conceptual–wavelet-based data-driven models can be useful for improving the conceptual model performance, especially regarding extreme flow forecasting.
HIGHLIGHTS
Two hybrid models (i.e., GR4 J–WGANN1 (Génie Rural à 4 paramètres Journalier-wavelet–genetic algorithm–artificial neural network) and GR4 J–WGANN2) and the GR4 J conceptual model were implemented for rainfall–runoff modelling.
The Boruta algorithm indicated a higher importance level of outputs obtained from the GR4 J model as input data than other variables in hybrid models.
The hybrid models improved the performance by up to 40% compared to the GR4 J model.
The proposed GR4 J–WGANN1 model yielded better results than the GR4 J–WGANN2 model.
Graphical Abstract
INTRODUCTION
Rainfall–runoff modelling has been one of the significant issues in hydrology concerning water resources management and planning. Due to climate change, extreme events, such as floods and droughts, have been observed frequently worldwide in recent years. Hydrological modelling has become crucial for efficient water management and precautions against extreme events. Different hydrological modelling methods, such as conceptual, data-driven, and physically based models, have been put forward. Beven (1989) discussed the applicability of the physically based models in reality and stated the limitations derived from the incompatibility of the equations in the models for the heterogeneous conditions, the lack of theory related to scaling integration, and practical restrictions on solution methodologies and dimensionality problems in parameter calibration. Beven (2001) argued the problems related to implementing distributed hydrological models, such as nonlinearity, scale, equifinality, uniqueness, and uncertainty problems, and focused on the alternative approach based on the prior evaluation of models regarding physical realism on the value of data in model rejection. Frame et al. (2021) investigated the performance of three different long short-term memory (LSTM)-based models depending on the usage of only United States National Water Model (NWM) outputs (LSTM_PP), LSTM post-processor trained on the NWM outputs and atmospheric forcings (LSTM_PPA), and an LSTM model trained only on atmospheric forcing (LSTM_A) for daily streamflow prediction. It was stated by Frame et al. (2021) that all three LSTM deep learning model structures performed better than the NWM, and LSTM significantly contributed to the performance improvement of the NWM. Rozos et al. (2022) used the machine learning approximators (i.e., LSTM and recurrent neural network (RNN)) to evaluate the HYMOD2 and linear regression hydrological modelling (LRHM) models (Rozos et al. 2020), and they found significant performance improvements for runoff forecasting. They pointed out that this approach can be useful for dealing with the poor performance of the hydrological models due to anomalies in the catchment dataset without physical processes being taken into account. Koutsoyiannis & Montanari (2022) introduced a method called ‘Bluecat’ for the simulation and prediction of hydrological processes, which depends on using a generic deterministic model that can be converted to a stochastic formulation. Thus, it can connect the deterministic and stochastic model approaches and simulate and predict hydrologic variables with uncertainty evaluation.
Conceptual models are based on the relationship between hydrometeorological variables, which are usually shown via mathematical equations. Conceptual models have many advantages, such as observing the physical relationship between the parameters and having fewer required parameters than the distributed models, which can also be used for rainfall–runoff modelling as an another approach (Zhou et al. 2019). However, conceptual models can require a larger dataset and can have challenges for the calibration process (Bai et al. 2021; Jahandideh-Tehrani et al. 2021). Several conceptual models, such as Génie Rural à 4 paramètres Journalier (GR4 J), have been extensively used for rainfall–runoff modelling in the last two decades. The GR4 J model is a lumped daily conceptual model with four free parameters introduced by Perrin et al. (2003). Perrin et al. (2003) showed the usefulness and effectiveness of the GR4 J model by comparing it with other conceptual models, which have more parameters, such as Xinanjiang (XAJ) (Zhao & Liu 1995), Hydrologiska Byråns Vattenbalansavdelning (HBV) (Bergström 1995), and Identification of unit Hydrographs And Component flows from Rainfall Evapotranspiration and Streamflow (IHACRES) (Jakeman et al. 1990). In this regard, the GR4 J model has been implemented in many basins across the world (Massmann 2015; Dakhlaoui et al. 2017; Brulebois et al. 2018; Sezen et al. 2019). Anshuman et al. (2018) implemented GR4 J and Australian Water Balance (AWBM) models for rainfall–runoff modelling at two catchments in the upper Godavari basin in India. They revealed that the GR4 J model yielded better than the AWBM model for streamflow forecasting, and the GR4 J model can be useful for water balance modelling despite its simple structure. Ghimire et al. (2020) used GR4 J, IHACRES catchment wetness index (IHACRES CWI), Hydrological Engineering Center-Hydrological Modeling System (HEC-HMS), and Soil & Water Assessment Tool (SWAT) models in Ayeyarwady, in Myanmar. They stated that process-based and relatively complex models could be useful in data-rich catchments, whereas conceptual models with simple model structures, such as the GR4 J model, can be applicable in data-limited basins for rainfall–runoff modelling. Wijayarathne & Coulibaly (2020) investigated the performance of GR4 J, Sacramento Soil Moisture Accounting (SAC-SMA), McMaster University HBV (MAC-HBV), HEC-HMS, and University of Waterloo Flood Forecasting System (WATFLOOD) models in the Waterford River watershed, Canada. They found out that SAC-SMA and GR4 J models exhibited better performance than other models with regard to the estimation of low, medium, and high flows. All these studies showed the usefulness and applicability of the GR4 J model in different basins.
The data-driven models, which are based on the relationship between input and output data, are less dependent on the physical process of the basin. Artificial intelligence-based data-driven models can have advantages such as coping with the nonlinear hydrological process and a large amount of dynamicity in datasets (Nourani et al. 2014). Papacharalampous et al. (2019) implemented several stochastic (such as autoregressive integrated moving average (ARIMA) and autoregressive fractionally integrated moving average (ARFIMA)) and machine learning methods (such as artificial neural networks (ANN), support vector machines (SVM), and random forest (RF)) for the hydrological processes forecasting. They found out that the forecasting performance of the stochastic and machine learning methods did not vary significantly. Tyralis et al. (2019) reviewed the different implementations of RF in water resources. They indicated the applicability and usefulness of RF in many aspects such as regression, classification, and variable selection. Although the applicability of data-driven models has been indicated in many studies, they can also have some shortcomings such as forecasting extreme values and inadequacy for handling non-stationary data (Cannas et al. 2006; Kurtulus & Razack 2007; Shoaib et al. 2018). In this regard, data pre-processing techniques, such as wavelet transformation with the data-driven models, have been widespread (Kalteh 2013; Ramana et al. 2013; Sharghi et al. 2019). Wavelet transformation can improve the performance of data-driven models by isolating the noise and providing spatial and temporal information for time series (Nourani et al. 2013; Shoaib et al. 2018). In recent years, the performance of conceptual and data-driven models has been compared, and they have been combined for rainfall–runoff modelling. Humphrey et al. (2016) integrated the GR4 J model into the ANN model and used the hybrid model for monthly streamflow forecasting. They compared the performance of the hybrid model with GR4 J and ANN models. In this regard, Humphrey et al. (2016) pointed out that the hybrid model yielded more accurate forecasting results than the GR4 J and ANN models, especially for high flows. Kumanlioglu & Fistikoglu (2019) integrated the GR4 J and ANN models by using net rainfall (Pn), net evapotranspiration (En), percolation (Perc), and the part of Pn filling the production store (Ps) obtained from the GR4 J model as input data and genetic algorithm (GA) for the calibration of the conceptual model. They stated that the integrated model approach could be useful with regard to more robust and better modelling performance, especially in highly nonlinear conditions. Ghaith et al. (2020) implemented a hybrid model, which is based on the integration of the HYdrological MODel (HYMOD) and ANN models, a hybrid hydrological data-driven (HHDD) model, for daily streamflow forecasting. They stated that HHDD yielded better and more robust prediction results than the HYMOD. Wang et al. (2021) integrated the XAJ model with the wavelet-based random forest (WRF) model for daily rainfall–runoff modelling. They revealed that the XAJ–WRF model yielded better simulation results than the XAJ and RF models, and wavelet analysis contributed to the performance improvement. Okkan et al. (2021) implemented two different hybrid model approaches, namely nested and parallel coupling methods, to integrate the dynamic water balance model (dynwbm) and ANN, support vector regression (SVR) models for monthly rainfall–runoff modelling in the Gediz River basin, Turkey. They indicated the outperformance of the hybrid model approaches against the dynwbm model and stated that the nested hybrid model approach could be useful for streamflow simulation. As it can be seen, with the integration of the benefits of both the conceptual and data-driven models, hybrid models can improve the rainfall–runoff modelling performance.
In this study, two different hybrid GR4 J and wavelet-based genetic algorithm–ANN (WGANN) (i.e., GR4 J–WGANN1 and GR4 J–WGANN2) model approaches were implemented to improve the modelling performance of the GR4 J model. In the first hybrid model (GR4 J–WGANN1), routing store outflow (QR) and direct flow (QD) acquired from the GR4 J model, which represent the novelty of this study, were used as input variables. In the second model (GR4 J–WGANN2), the soil moisture index (SMI) obtained from the GR4 J model was used as an input variable in the optimized ANN model by the GA. Implementing the coupled GR4 J conceptual and wavelet-based data-driven models will be another novel aspect of this study. Furthermore, the Boruta algorithm was also implemented to select wavelet components for each input data in hybrid models as a new approach. Thus, the performance of the GR4 J, GR4 J–WGANN1, and GR4 J–WGANN2 models will be compared for daily rainfall–runoff modelling. The models will be implemented in three sub-basins of the Eastern Black Sea, and one sub-basin of the Kızılırmak basin, Turkey. In this regard, it will be observed whether both hybrid models can develop the performance of the GR4 J model.
DATA AND METHODS
Basin characteristics and data
In this study, discharge data of four gauging stations, namely Cevizdere Düzköy, Akçay Duraklı, and Cura Çayı Curi, in the western part of the Eastern Black Sea basin and Çağşur Çayı Esenler in the Kızılırmak basin in Turkey were used for daily rainfall–runoff modelling. The Turkish General Directorate of State Hydraulic Works provided the runoff data, and the Turkish State Meteorological Service provided temperature and precipitation data. The Eastern Black Sea basin has an irregular flood regime, and flood cases can be observed. In this regard, floods have led to economic and human losses in recent years (Republic of Turkey Ministry of Agriculture & Forestry General Directorate of Water Management 2020). In addition, precipitation also has a changing pattern due to the geographical formation of the basin. Precipitation occurs regularly in coastal parts of the basin during all seasons, whereas the summer season is usually arid in the terrestrial part of the basin. In general, coastal parts have moderate climatic conditions, while continental climate conditions dominate the interior parts. In the Kızılırmak basin, moderate climate conditions also dominate the coastal parts compared to the interior parts. However, the precipitation is relatively less in the coastal part of the Kızılırmak basin than in the Eastern Black Sea basin. The meteorological and discharge gauging stations used in this study are shown in Figure 1. Areal precipitation was calculated by using a Thiessen polygon for Ceviz, Akçay, Cura, and Çağşur streams via ArcGIS software (Figure 1). The temperature data of Ünye (17624) for Ceviz and Cura streams, Terme/Kozluk Beldesi (18544) for the Akçay stream (18544) in the Eastern Black Sea basin, and Bafra (17622) for the Çağşur stream in the Kızılırmak basin were used. The runoff and hydrometeorological data cover the period between 01.01.2012 and 30.09.2020 in Ceviz, Akçay, Cura, and Çağşur streams. Accordingly, the total data of meteorological and discharge stations consists of 3,196 days. The data ranges for meteorological and discharge gauging stations can be seen in Table 1. The first year of the data period was used as a warm-up period in the GR4 J model. Then, the period between 01.01.2013 and 04.06.2018 was used for the calibration and 05.06.2018 and 30.09.2020 for the validation in conceptual and hybrid models for Ceviz, Akçay, Cura, and Çağşur streams. The suggestion of Klemeš (1986) was taken into account to determine the calibration and test periods. Klemeš (1986) referred that the data can be split as 70% for the calibration and 30% for the validation if the available data are not long enough. Missing precipitation data were imputed by using RF via the ‘missForest’ package, part of R software (Stekhoven & Bühlmann 2012; Stekhoven 2013), whereas missing temperature data of the Terme/Kozluk Beldesi station were imputed by regression analysis using the temperature data of the Ünye station. The applicability and useful implementation of missForest, which is one of the RF algorithms, were shown for imputing missing data (Tang & Ishwaran 2017; Aguilera et al. 2020; Addi et al. 2022). Addi et al. (2022) stated that missForest imputation yielded good performance in predicting the dry and wet periods and extreme rainfall values. As for imputing temperature data, regression analysis was preferred due to the high correlation (r=0.993) between the temperature data of Terme/Kozluk Beldesi and Ünye stations. The potential evapotranspiration data can be calculated using several methods such as Penman––Monteith, Hargreaves, Oudin, and Thornthwaite. The performance of different evapotranspiration calculation methods has been investigated in many studies (Tegos et al. 2015; Kodja et al. 2020). Tegos et al. (2015) compared the performance of an evapotranspiration model depending on a simplification of the Penman–Monteith formula with radiation and temperature-based methods. They found that the simplified version of the Penman–Monteith model approach can be useful and efficient in evaluating potential evapotranspiration fields. Kodja et al. (2020) implemented the Penman–Monteith and Oudin methods for the evapotranspiration calculation in the GR4 J model in tropical basins of West Africa. They found that implementing different evapotranspiration formulations had few impacts on the GR4 J model. Flores et al. (2021) also indicated the effectiveness and outperformance of the Oudin formula for rainfall–runoff modelling. Accordingly, evapotranspiration was calculated using the Oudin formula (Oudin et al. 2005), which is based on the daily temperature data and the basin location, in this study. Oudin et al. (2005) showed that the Oudin formula could be practical and efficient for rainfall–runoff modelling. It requires less data to calculate evapotranspiration than other methods like the Penman formula. One can refer to the study of Oudin et al. (2005) for further details regarding the calculation of the Oudin formula. Statistical data regarding the runoff and hydrometeorological data can be seen in Table 2. In this respect, the calculated minimum, maximum, mean, standard deviation, skewness, and excess kurtosis (i.e., kurtosis coefficient minus 3) values are given in Table 2. The statistical data, such as mean, standard deviation, skewness, and excess kurtosis belonging to the time series used in this study are generally in agreement with the rest of the hydrometeorological data of many stations around the world, which were statistically analysed by Dimitriadis et al. (2021a, 2021b). This agreement can be evaluated as a significant indicator with regard to the compatibility of the time series for the analysis. As for the investigation of the long-term persistence of time series, the Hurst phenomenon has been a widely used method in many studies (Koutsoyiannis 2003, 2020; Dimitriadis et al. 2021a, 2021b). In this regard, the Hurst exponent, which was introduced by Hurst (1951), was calculated to identify the long-term persistence in precipitation, evapotranspiration, and streamflow time series in this study and is presented in Table 2. The Hurst parameter (H) for 0.5<H<1 refers to long-term persistence behaviour in the time series, for 0<H<0.5 the anti-persistent behaviour, and for H=0.5 a white-noise behaviour (Dimitriadis et al. 2021a). According to Table 2, H values show the long-term persistence behaviour in precipitation, evapotranspiration, and streamflow time series used in this study. Furthermore, the precipitation has remarkably less long-term persistence behaviour than the evapotranspiration and streamflow data and is even very close to white-noise behaviour, especially in Ceviz, Cura, and Akçay streams. The cross-correlations for the precipitation–runoff and evapotranspiration–runoff are presented in Table 3 to observe the relationship between variables. Accordingly, the correlations for the lag0 and lag1 are high compared to the subsequent lags for precipitation–runoff. On the other hand, the correlations for each lag are close to each other for evapotranspiration–runoff.
Station types . | Station no. . | Station name . | Location . | Data range . | Data used in the warm-up period for the conceptual model . | Modelling period . | Missing data ratio in the modelling period (%) . | |
---|---|---|---|---|---|---|---|---|
Calibration . | Validation . | |||||||
Meteoorological stations | 17622 | Bafra | 35.92 E, 41.55 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – |
18134 | Vezirköprü | 35.45 E, 41.14 N | 01.06.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – | |
18539 | Havza | 35.71 E, 40.99 N | 01.01.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 12.9 | |
18540 | Kavak | 36.06 E, 41.1 N | 01.01.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 12.9 | |
18544 | Terme/Kozluk Beldesi | 37.16 E, 41.14 N | 01.02.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 14 | |
18531 | Kumru | 37.25 E, 40.86 N | 01.02.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 14 | |
18526 | Çaybaşı | 37.1 E, 41.02 N | 01.02.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 14 | |
18131 | Akkuş | 37.02 E, 40.79 N | 01.11.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – | |
17624 | Ünye | 37.29 E, 41.14 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – | |
18529 | İkizce/Şenbolluk | 37.00 E, 41.05 N | 01.03.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 15 | |
Discharge gauging stations | E22A066 | Cevizdere Düzköy | 37.31 E, 41.06 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – |
E22A054 | Akçay Duraklı | 37.15 E, 41.11 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – | |
E22A071 | Cura Çayı Curi | 37.20 E, 41.12 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – | |
E15A049 | Çağşur Çayı Esençay | 35.84 E, 41.33 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – |
Station types . | Station no. . | Station name . | Location . | Data range . | Data used in the warm-up period for the conceptual model . | Modelling period . | Missing data ratio in the modelling period (%) . | |
---|---|---|---|---|---|---|---|---|
Calibration . | Validation . | |||||||
Meteoorological stations | 17622 | Bafra | 35.92 E, 41.55 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – |
18134 | Vezirköprü | 35.45 E, 41.14 N | 01.06.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – | |
18539 | Havza | 35.71 E, 40.99 N | 01.01.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 12.9 | |
18540 | Kavak | 36.06 E, 41.1 N | 01.01.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 12.9 | |
18544 | Terme/Kozluk Beldesi | 37.16 E, 41.14 N | 01.02.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 14 | |
18531 | Kumru | 37.25 E, 40.86 N | 01.02.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 14 | |
18526 | Çaybaşı | 37.1 E, 41.02 N | 01.02.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 14 | |
18131 | Akkuş | 37.02 E, 40.79 N | 01.11.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – | |
17624 | Ünye | 37.29 E, 41.14 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – | |
18529 | İkizce/Şenbolluk | 37.00 E, 41.05 N | 01.03.2014–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | 15 | |
Discharge gauging stations | E22A066 | Cevizdere Düzköy | 37.31 E, 41.06 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – |
E22A054 | Akçay Duraklı | 37.15 E, 41.11 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – | |
E22A071 | Cura Çayı Curi | 37.20 E, 41.12 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – | |
E15A049 | Çağşur Çayı Esençay | 35.84 E, 41.33 N | 01.01.2012–30.09.2020 | 01.01.2012–31.12.2012 | 01.01.2013–04.06.2018 | 05.06.2018–30.09.2020 | – |
Rainfall stations for Thiessen polygon . | Discharge gauging stations . | Streams . | P (mm) . | E (mm) . | Q (mm/d) . | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Min. . | Max. . | Mean . | Std. . | Skew. . | Kurt. . | Hurst exp. . | Min. . | Max. . | Mean . | Std. . | Skew. . | Kurt. . | Hurst exp. . | Min. . | Max. . | Mean . | Std. . | Skew. . | Kurt. . | Hurst exp. . | |||
Çaybaşı (18526), Ünye (17624), Akkuş (18131), Kumru (18531), Terme/Kozluk Beldesi (18544) | Cevizdere Düzköy (E22A066) | Ceviz | 0 | 57.8 | 2.9 | 5.7 | 3.8 | 20.1 | 0.51 | 0.2 | 5.5 | 2.6 | 1.5 | 0.27 | −.1.4 | 0.66 | 0.05 | 35.4 | 1.5 | 1.96 | 4.6 | 41.0 | 0.68 |
Çaybaşı (18526), Akkuş (18131), İkizce/Şenbolluk (18529), Terme/Kozluk Beldesi (18544) | Akçay Duraklı (E22A054) | Akçay | 0 | 133.3 | 4.0 | 9.1 | 5.0 | 41.0 | 0.53 | 0.13 | 5.4 | 2.5 | 1.5 | 0.26 | −1.4 | 0.66 | 0.07 | 50.9 | 2.2 | 3.8 | 4.7 | 35.3 | 0.67 |
Çaybaşı (18526), Akkuş (18131), İkizce/Şenbolluk (18529), Terme/Kozluk Beldesi (18544) | Cura Çayı Curi (E22A071) | Cura | 0 | 100.1 | 3.4 | 7.4 | 4.7 | 34.8 | 0.53 | 0.2 | 5.5 | 2.6 | 1.5 | 0.27 | −.1.4 | 0.66 | 0 | 30.6 | 1.9 | 2.36 | 3.6 | 22.9 | 0.70 |
Bafra (17622), Vezirköprü (18134), Havza (18539), Kavak (18540) | Çağşur Çayı Esençay (E15A049) | Çağşur | 0 | 43.7 | 1.7 | 4.0 | 3.9 | 20.3 | 0.58 | 0.007 | 5.5 | 2.5 | 1.5 | 0.25 | −1.4 | 0.67 | 0.007 | 15.1 | 0.55 | 0.87 | 5.2 | 51.3 | 0.70 |
Rainfall stations for Thiessen polygon . | Discharge gauging stations . | Streams . | P (mm) . | E (mm) . | Q (mm/d) . | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Min. . | Max. . | Mean . | Std. . | Skew. . | Kurt. . | Hurst exp. . | Min. . | Max. . | Mean . | Std. . | Skew. . | Kurt. . | Hurst exp. . | Min. . | Max. . | Mean . | Std. . | Skew. . | Kurt. . | Hurst exp. . | |||
Çaybaşı (18526), Ünye (17624), Akkuş (18131), Kumru (18531), Terme/Kozluk Beldesi (18544) | Cevizdere Düzköy (E22A066) | Ceviz | 0 | 57.8 | 2.9 | 5.7 | 3.8 | 20.1 | 0.51 | 0.2 | 5.5 | 2.6 | 1.5 | 0.27 | −.1.4 | 0.66 | 0.05 | 35.4 | 1.5 | 1.96 | 4.6 | 41.0 | 0.68 |
Çaybaşı (18526), Akkuş (18131), İkizce/Şenbolluk (18529), Terme/Kozluk Beldesi (18544) | Akçay Duraklı (E22A054) | Akçay | 0 | 133.3 | 4.0 | 9.1 | 5.0 | 41.0 | 0.53 | 0.13 | 5.4 | 2.5 | 1.5 | 0.26 | −1.4 | 0.66 | 0.07 | 50.9 | 2.2 | 3.8 | 4.7 | 35.3 | 0.67 |
Çaybaşı (18526), Akkuş (18131), İkizce/Şenbolluk (18529), Terme/Kozluk Beldesi (18544) | Cura Çayı Curi (E22A071) | Cura | 0 | 100.1 | 3.4 | 7.4 | 4.7 | 34.8 | 0.53 | 0.2 | 5.5 | 2.6 | 1.5 | 0.27 | −.1.4 | 0.66 | 0 | 30.6 | 1.9 | 2.36 | 3.6 | 22.9 | 0.70 |
Bafra (17622), Vezirköprü (18134), Havza (18539), Kavak (18540) | Çağşur Çayı Esençay (E15A049) | Çağşur | 0 | 43.7 | 1.7 | 4.0 | 3.9 | 20.3 | 0.58 | 0.007 | 5.5 | 2.5 | 1.5 | 0.25 | −1.4 | 0.67 | 0.007 | 15.1 | 0.55 | 0.87 | 5.2 | 51.3 | 0.70 |
In parenthesis, ‘Station no.’ was written for both rainfall and discharge gauging stations. The calculated kurtosis values in the table are the excess kurtosis values (i.e., kurtosis coefficient minus 3).
Streams . | Correlated conjugates . | Lags . | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
lag0 . | lag1 . | lag2 . | lag3 . | lag4 . | lag5 . | lag6 . | lag7 . | lag8 . | lag9 . | lag10 . | ||
Ceviz | P–Q | 0.51 | 0.22 | 0.04 | 0.004 | −0.004 | 0.005 | −0.02 | −0.02 | −0.01 | −0.004 | −0.008 |
E–Q | −0.34 | −0.33 | −0.33 | −0.32 | −0.31 | −0.31 | −0.30 | −0.29 | −0.29 | −0.28 | −0.28 | |
Akçay | P–Q | 0.65 | 0.24 | 0.03 | 0.004 | −0.003 | 0.02 | −0.008 | −0.03 | −0.02 | −0.008 | −0.01 |
E–Q | −0.29 | −0.28 | −0.27 | −0.27 | −0.26 | −0.26 | −0.25 | −0.25 | −0.24 | −0.24 | −0.24 | |
Cura | P–Q | 0.58 | 0.26 | 0.06 | 0.02 | 0.004 | 0.017 | 0.009 | 0.004 | −0.004 | −0.014 | −0.008 |
E–Q | −0.34 | −0.34 | −0.33 | −0.32 | −0.31 | −0.31 | −0.30 | −0.29 | −0.29 | −0.28 | −0.28 | |
Çağşur | P–Q | 0.27 | 0.11 | 0.06 | 0.06 | 0.04 | 0.06 | 0.04 | 0.02 | 0.03 | 0.05 | 0.03 |
E–Q | −0.13 | −0.12 | −0.11 | −0.10 | −0.09 | −0.08 | −0.07 | −0.06 | −0.05 | −0.04 | −0.04 |
Streams . | Correlated conjugates . | Lags . | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
lag0 . | lag1 . | lag2 . | lag3 . | lag4 . | lag5 . | lag6 . | lag7 . | lag8 . | lag9 . | lag10 . | ||
Ceviz | P–Q | 0.51 | 0.22 | 0.04 | 0.004 | −0.004 | 0.005 | −0.02 | −0.02 | −0.01 | −0.004 | −0.008 |
E–Q | −0.34 | −0.33 | −0.33 | −0.32 | −0.31 | −0.31 | −0.30 | −0.29 | −0.29 | −0.28 | −0.28 | |
Akçay | P–Q | 0.65 | 0.24 | 0.03 | 0.004 | −0.003 | 0.02 | −0.008 | −0.03 | −0.02 | −0.008 | −0.01 |
E–Q | −0.29 | −0.28 | −0.27 | −0.27 | −0.26 | −0.26 | −0.25 | −0.25 | −0.24 | −0.24 | −0.24 | |
Cura | P–Q | 0.58 | 0.26 | 0.06 | 0.02 | 0.004 | 0.017 | 0.009 | 0.004 | −0.004 | −0.014 | −0.008 |
E–Q | −0.34 | −0.34 | −0.33 | −0.32 | −0.31 | −0.31 | −0.30 | −0.29 | −0.29 | −0.28 | −0.28 | |
Çağşur | P–Q | 0.27 | 0.11 | 0.06 | 0.06 | 0.04 | 0.06 | 0.04 | 0.02 | 0.03 | 0.05 | 0.03 |
E–Q | −0.13 | −0.12 | −0.11 | −0.10 | −0.09 | −0.08 | −0.07 | −0.06 | −0.05 | −0.04 | −0.04 |
Models
GR4 J conceptual model
The GR4 J model is a lumped conceptual model for daily rainfall-runoff modelling, which has four parameters, introduced by Perrin et al. (2003). In the GR4 J model, precipitation (P) and evapotranspiration (E) are used as input data. The GR4 J model has x1, x2, x3, and x4 parameters, where x1 (mm) stands for the maximum capacity of the production store, x2 (mm) for the groundwater exchange coefficient, x3 (mm) for 1 day ahead maximum capacity of the routing store, and x4 (days) for the time base of the unit hydrograph of UH1. In the GR4 J model, first, net precipitation and evapotranspiration are obtained, and in the final step, the simulated discharge was calculated via the summation of the QD and routing store QR. The SMI, routing store QR, and QD were obtained from the GR4 J model and used in hybrid models. The SMI values are sequences of the computed water level in production store (S) values in the GR4 J model (Figure 2) (Anctil et al. 2004). One can refer to the study of Perrin et al. (2003) for further details regarding the GR4 J model structure and Anctil et al. (2004) for the computation of the SMI. The method described by Michel (1991) was used, and root mean square error (RMSE) was utilized as the objective function to calibrate the GR4 J model. Firstly, in the calibration algorithm, a screening process is performed using either a predefined grid or a list of initial parameter sets, and then the steepest descent local search algorithm is carried out (Michel 1991; Coron et al. 2017, 2021). In this paper, the calibration and simulation of runoff data for the GR4 J model were fulfilled via the AirGR package (Coron et al. 2017, 2021) that is part of the R software (R Development Core Team 2015).
Wavelet transformation
ANN optimized by GA
Boruta algorithm
Deciding the input variables for predicting the output variable is one of the critical points in data-driven models. In this study, the Boruta algorithm was used to determine wavelet components and input variables for rainfall–runoff modelling. The Boruta algorithm is a feature selection algorithm, a wrapper around an RF algorithm (Kursa et al. 2010). The implementation of the Boruta algorithm can be sorted as follows for a set of P samples of predictor variables (xpɛRn), n stands for the number of inputs and p=1, 2, 3,…, P, and yp stands for the target variable (Kursa et al. 2010; Kursa & Rudnicki 2010; Prasad et al. 2019):
- (1)
A randomly ordered shadow input vector, , is generated for the respective input vector, xp, to isolate the correlations between the input variable and provide the randomness.
- (2)
The RF model is performed to predict the yp by using the xp and input vectors.
- (3)For each run, variable importance is computed. In this regard, permutation importance or Mean Decrease Accuracy (MDA) for every input variable (xp) and shadow input (x′p) is computed over all trees. According to the previous studies, the number of trees, mtree, was decided as 500 (Strobl et al. 2008; Hur et al. 2017; Prasad et al. 2019). The computation of the MDA is as follows:where I(.) stands for the indicator function, OOB (out-of-bag) for the prediction error of each of the training samples utilizing bootstrap aggregation, () for the predicted values before permuting, and () for the predicted values after permuting.
- (4)
- (5)
Accordingly, Z-scores of input variables are compared with the corresponding shadow variables. The input variables for Z<MZSA are assessed as ‘Unimportant’, whereas input variables for Z>MZSA are evaluated as ‘Important’.
- (6)
For each iteration, new shadows are generated, and the Boruta algorithm halts if all variables are confirmed as ‘Important’ or ‘Unimportant’ or the maximum iteration number is reached.
- (7)
The unconfirmed input variables are labelled as ‘Tentative’ if the maximum iteration number is reached. They can be either confirmed or rejected by comparing the respective median Z-scores with the median Z-scores of the best shadow input variable.
Conceptual–data-driven hybrid model structure
In this study, two different outputs of the GR4 J model were used in the hybrid models. In the first hybrid model (GR4 J–WGANN1), QR and QD obtained from GR4 J, P, E, and API were used as input variables. In this respect, preceding 2 days (i.e., Pt−2, Et−2, APIt−2, QRt−2, and QDt−2), preceding 1 day (i.e., Pt−1, Et−1, APIt−1, QRt−1, and QDt−1), and data for the day t (i.e., P, E, API, QR, and QD) were used for runoff prediction in the GR4 J–WGANN1 model. Thus, the effect of flow outputs of the GR4 J model (i.e., routing store QR and QD from the unit hydrograph 2) was investigated for runoff forecasting, representing the novelty of this study. In the second hybrid model (GR4 J–WGANN2), the SMI obtained from GR4 J, P, E, and API was implemented as input data. In the GR4 J–WGANN2 model, preceding 2 days (i.e., Pt−2, Et−2, APIt−2, and SMIt−2), preceding 1 day (i.e., Pt−1, Et−1, APIt−1, and SMIt−1), and data for the day t (i.e., P, E, API, and SMI) were used for runoff estimation. Thus, initial basin conditions were provided by using the API and the SMI as input variables, as implemented in previous studies (Anctil et al. 2004; Brocca et al. 2008; Humphrey et al. 2016). In both hybrid models, wavelet transform was used to acquire the wavelet components of each input variable in the hybrid model. Then, the Boruta algorithm was implemented to select the wavelet components for rainfall–runoff modelling. Accordingly, the rejected components were discarded from the input variables. Then, runoff forecasting was fulfilled by using optimized ANN by the GA. The flow chart for the GR4 J–WGANN1 and GR4 J–WGANN2 models can be seen in Figures 2 and 3, respectively. With the utilization of the hybrid models, it will be observed whether using different outputs of the conceptual model in hybrid GR4 J–WGANN models will improve the rainfall–runoff modelling performance compared to the GR4 J model.
Assessment of the model performance
RESULTS AND DISCUSSION
The performance of the GR4 J model
First, the calibration of the GR4 J model was performed and the calibration parameters were obtained. Then, the runoff simulation was carried out. The simulation results of the GR4 J model can be seen in Table 4 and Figures 4(a)–7(a). According to the NSE, the GR4 J model yielded a good performance, as seen in Table 4. However, the GR4 J model overestimated low flows, whereas it underestimated high flows in Akçay, Ceviz, Cura, and Çağşur streams, as seen in Figures 4(a)–7(a). Furthermore, it was observed that the GR4 J model yielded the best simulation results in the Akçay stream, while it yielded the worst results in the Ceviz stream, according to Table 4. In recent years, the flashiness of precipitation and flash floods, which could be related to climate change, has tended to increase in the area especially for the Ceviz stream (Beden & Ulke Keskin 2021), and this could affect the rainfall–runoff modelling performance of the GR4 J model. Poncelet et al. (2017) also stated that the flashiness of precipitation and streamflow could negatively affect the model performance. In this regard, models can underestimate the flow variability and not yield better results for predicting peak flows (Gupta et al. 2009; van Esse et al. 2013; Poncelet et al. 2017). It can negatively affect the runoff estimation performance of the model. According to Figures 8(a)–11(a), the GR4 J model underestimated high flows and overestimated low flows in all streams. In Akçay, Ceviz, and Cura streams, the GR4 J model underestimated flows in the winter and summer seasons, whereas it overestimated flows during the spring season. The Çağşur stream has a lower flow period than other streams, and the GR4 J model overestimated low flows in the summer and autumn seasons while it underestimated flows in winter and spring, as seen in Figure 11(a). Scatter diagrams also indicated the overestimation of low flows and the underestimation of high flows in all streams according to Figures 8(a)–11(a).
Streams . | GR4 J . | GR4 J–WGANN1 . | GR4 J–WGANN2 . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
NSE . | KGE . | MAE (mm/d) . | RMSE (mm/d) . | NSE . | KGE . | MAE (mm/d) . | RMSE (mm/d) . | NSE . | KGE . | MAE (mm/d) . | RMSE (mm/d) . | |
Akçay | 0.61 | 0.58 | 1.35 | 2.79 | 0.71 | 0.67 | 1.1 | 2.39 | 0.70 | 0.68 | 1.12 | 2.45 |
Ceviz | 0.50 | 0.54 | 0.68 | 1.19 | 0.70 | 0.77 | 0.5 | 0.93 | 0.67 | 0.83 | 0.55 | 0.98 |
Cura | 0.55 | 0.60 | 0.93 | 1.67 | 0.69 | 0.73 | 0.75 | 1.38 | 0.68 | 0.72 | 0.71 | 1.4 |
Çağşur | 0.54 | 0.56 | 0.22 | 0.45 | 0.64 | 0.62 | 0.19 | 0.40 | 0.60 | 0.58 | 0.20 | 0.43 |
0.55 | 0.57 | 0.795 | 1.525 | 0.685 | 0.698 | 0.635 | 1.275 | 0.663 | 0.703 | 0.645 | 1.315 |
Streams . | GR4 J . | GR4 J–WGANN1 . | GR4 J–WGANN2 . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
NSE . | KGE . | MAE (mm/d) . | RMSE (mm/d) . | NSE . | KGE . | MAE (mm/d) . | RMSE (mm/d) . | NSE . | KGE . | MAE (mm/d) . | RMSE (mm/d) . | |
Akçay | 0.61 | 0.58 | 1.35 | 2.79 | 0.71 | 0.67 | 1.1 | 2.39 | 0.70 | 0.68 | 1.12 | 2.45 |
Ceviz | 0.50 | 0.54 | 0.68 | 1.19 | 0.70 | 0.77 | 0.5 | 0.93 | 0.67 | 0.83 | 0.55 | 0.98 |
Cura | 0.55 | 0.60 | 0.93 | 1.67 | 0.69 | 0.73 | 0.75 | 1.38 | 0.68 | 0.72 | 0.71 | 1.4 |
Çağşur | 0.54 | 0.56 | 0.22 | 0.45 | 0.64 | 0.62 | 0.19 | 0.40 | 0.60 | 0.58 | 0.20 | 0.43 |
0.55 | 0.57 | 0.795 | 1.525 | 0.685 | 0.698 | 0.635 | 1.275 | 0.663 | 0.703 | 0.645 | 1.315 |
Bold numbers demonstrate average values of the NSE, KGE, MAE, and RMSE.
The performance of the hybrid GR4 J–WGANN1 model
In GR4 J–WGANN1, first QR and QD were obtained from the GR4 J model, and the API was calculated. Then, P, E, API, QR, and QD input data were decomposed into 10 detailed components and one approximation component by using the DWT. To determine the wavelet decomposition level, the study by Sang (2012) takes times-series length into consideration. Accordingly, Sang (2012) stated that one minus value of the maximum wavelet decomposition level according to the data length (i.e., the integer part of log2(2,830)−1) could be appropriate for the wavelet decomposition level selection for de-noising series. Then, the wavelet components of each variable used in runoff forecasting were determined using the Boruta algorithm. According to Figures 12(a) and 13(a), QR and QD have more important wavelet components than other input variables for the Akçay and Çağşur streams. The unimportant components (generally D1 and D2 components) were extracted from the P, E, and API data. Similar findings from the Boruta analysis were also obtained for Ceviz and Cura streams. The extraction of high-frequency components, which generally represent the noisy part of data (especially D1), can improve the model performance, as emphasized in previous studies (Santos et al. 2014; Tayyab et al. 2019). The GR4 J–WGANN1 model yielded better performance than the GR4 J model with regard to runoff forecasting, as seen in Figures 4(b)–7(b). According to Table 4, the NSE values change between 0.64 and 0.71, and the increase in runoff forecasting performance is 15, 40, 25, and 14% for the GR4 J–WGANN1 model compared to the GR4 J model in Akçay, Ceviz, Cura, and Mert streams, respectively. Other evaluation criteria showed more superior performance of the GR4 J–WGANN1 model than the GR4 J model for each stream. The GR4 J–WGANN1 model outperformed the GR4 J–WGANN2 model, as seen in Table 4. Using more input data obtained from the GR4 J model, which has a high importance level (i.e., QR and QD) compared to just single-input data from the GR4 J model (i.e., SMI), could lead to better runoff prediction performance. The GR4 J–WGANN1 model also improved low and high runoff forecasting performance, as seen in Figures 4(b)–7(b). In addition, the simulated monthly runoff by the GR4 J–WGANN1 model is more compatible with the observed monthly runoff than the GR4 J model, especially in the Ceviz stream (Figures 8(b)–11(b)). The runoff simulation for April–December was successfully predicted, particularly in Akçay and Ceviz streams. Furthermore, the GR4 J–WGANN1 model yielded better low flow simulation in the Çağşur stream in the summer season. The scattering diagrams also showed that the simulated runoff was more consistent with the observed runoff than the GR4 J model, especially regarding low- and high-flow simulation. Accordingly, using the QR and QD in the GANN model as input data and wavelet transform enhanced the rainfall–runoff modelling performance, particularly regarding low and high runoff simulation.
The performance of the hybrid GR4 J–WGANN2 model
In GR4 J–WGANN2, SMI, API, P, and E were decomposed by using the wavelet analysis initially. Accordingly, 10 detailed components and 1 approximation component for each component were obtained. Then, the Boruta algorithm was implemented to determine the wavelet components used for rainfall–runoff modelling. The Boruta analysis indicated that the importance level of the SMI is high compared to wavelet components of other input variables in the Akçay and Çağşur streams, as seen in Figures 12(b) and 13(b). Similarly, the importance level of the SMI components was also higher than the wavelet components of other input variables in the Ceviz and Cura streams. The rejected components (mostly D1 and D2) were extracted from the input variables and important components were used for runoff forecasting. With regard to the importance of soil moisture, Tramblay et al. (2010) pointed out that soil moisture can be useful for providing the initial conditions for rainfall–runoff modelling, particularly in small basins. Tayfur & Brocca (2015) stated that the utilization of soil moisture and the rainfall data in a fuzzy model enhanced the rainfall–runoff modelling performance, particularly in capturing the peak flow values. The runoff forecasting performance of the GR4 J–WGANN2 model can be observed in Table 4. According to Table 4, NSE values change between 0.60 and 0.70, and the performance improvement is 15, 34, 24, and 7% for the GR4 J–WGANN2 model compared to the GR4 J model in Akçay, Ceviz, Cura, and Çağşur streams, respectively. Other evaluation criteria also showed that the GR4 J–WGANN2 model outperformed the GR4 J model. The GR4 J–WGANN2 model underperformed compared to the GR4 J–WGANN1 model. As seen in Figures 4(c)–7(c), the GR4 J–WGANN2 model yielded better results with regard to capturing low and peak flows than the GR4 J model. Humphrey et al. (2016) also found that the hybrid GR4 J–ANN model produced more precise forecasting results than the individual GR4 J and ANN models, particularly for forecasts based on climatology. They also stated that the hybrid model yielded more reliable forecasting results than the ANN and GR4 J models regarding the high-flow simulation. Furthermore, the inclusion of the SMI and the API could lead to better performance of the hybrid model than the GR4 J model since the SMI and the API can give more information regarding the initial catchment conditions (Humphrey et al. 2016) and for the GANN model. Anctil et al. (2004) also used the API and the SMI in the ANN model as input data. They found out that slow-response data could be helpful to develop the streamflow prediction efficiency of the ANN model in addition to the fast-response time series such as recently observed streamflow and rainfall sequences. Furthermore, Sharghi et al. (2019) put forward that wavelet transform could be useful to handle the non-stationary behaviour of data and found out that wavelet analysis developed the estimation of peak flows. Badrzadeh et al. (2018) stated that the decomposition of historical time series at different frequency levels via wavelet transform could predict extreme values more accurately. According to Figures 8(c)–11(c), the GR4 J–WGANN2 model yielded more accurate monthly runoff predictions than the GR4 J model. As seen in scattering diagrams, the simulation of low and high flows is also more reliable than the simulated low and high runoff by the GR4 J model.
The hybrid models performed better than the GR4 J model; however, they underestimated high flows. In recent years, flash floods and extreme precipitation events have increased in the study area (especially in coastal parts). The excessiveness of high-flow events and irregular hydrologic regime in the study area (i.e., coastal parts of the Black Sea in Turkey, in particular) can prevent better high-flow forecasting performance of the models. Pulido-Calvo & Portela (2007) implemented the computational neural networks (CNNs) for daily streamflow forecasting in Portuguese watersheds and stated that a highly irregular hydrologic regime could prevent better high-flow prediction performance. In addition, they also found that a comparatively small number of high flows in the training set could make the learning process of CNN models difficult with regard to high-flow prediction. In this regard, using longer data with a comparatively larger number of extreme events in the calibration and validation period can improve the hybrid model performance. Collier (2007) found that the hydrological simulation of peak flows can be very uncertain, and it is important to understand spreading uncertainty through the flood forecast chain. In this regard, this uncertainty could lead to limitations on predictability under a changing climate. Beven & Binley (2014) discussed the implementation of the generalized likelihood uncertainty estimation (GLUE) methodology for uncertainty estimation and stated that the uncertainties related to the input and evaluation data as well as model parameters and structures could have significant effects on modelling performance. Tian et al. (2014) investigated the model parameter and structure uncertainty for the GR4 J, HBV, and XAJ models using the GLUE methodology. They revealed that parameter uncertainty became larger when discharge values increased, and this could be related to the rarity of extreme values in the observed data compared to normal discharge values. In addition, Koutsoyiannis (2019) investigated the irreversibility of stochastic characterization and the simulation of the atmospheric and hydrological processes. In this study, it was revealed that atmospheric processes, such as temperature and rainfall, did not show explicit irreversibility, while the irreversibility was apparent for scales of several days for streamflow. In this regard, Koutsoyiannis (2019) stated that this situation created the need for reproducing discharge time series in flood simulations. Vavoulogiannis et al. (2021) also emphasized that temporal irreversibility can be an explicit feature of the streamflow process that evinces itself over several days. Concerning the long-range dependence, Serinaldi & Kilsby (2016) stated that complexity streamflow dynamics need modelling approaches that take temporal asymmetry and nonlinearity into consideration. They also found that nonlinearity could conceal linear short- and long-range dependence. Szolgayova et al. (2014b) analysed the long-term variability of the flow regime in the Danube River and the relationship between flow and precipitation and air temperature using wavelet analysis approaches. They revealed that long-range dependence in precipitation spreads into discharge and precipitation data could be used with wavelet decomposition to attain a multivariate stochastic discharge model, especially at a monthly scale. Similarly, Szolgayova et al. (2014a) indicated the usefulness of wavelet transform for modelling and forecasting daily discharge time series exhibiting long-range dependence. Accordingly, wavelet transformation can be evaluated as a useful method for the performance improvement in hybrid models when using precipitation, evapotranspiration, and streamflow data, which exhibited almost the white-noise and long-term persistent behaviours, respectively, in this study. As can be seen from previous studies, many factors can affect the rainfall–runoff modelling performance and these factors will be investigated in future studies.
CONCLUSIONS
In this study, the performance of the GR4 J conceptual model and GR4 J–WGANN1 and GR4 J–WGANN2 hybrid models were investigated for daily rainfall–runoff modelling. The study mainly aimed to implement a new hybrid wavelet-based GR4 J–GANN1 (i.e., GR4 J–WGANN1) model that uses the QR and QD, outputs of the GR4 J model as input data in the WGANN model. Another aim of this study is to observe the improvement in the rainfall–runoff modelling performance using the hybrid GR4 J–WGANN1 and GR4 J–WGANN2 hybrid models compared to the GR4 J model. The western part of the Eastern Black Sea basin and one sub-basin in the Kızılırmak basin were selected as study areas. The deductions of this study can be summarized as follows:
The GR4 J model yielded good performance in all streams (NSE≥0.50). However, the GR4 J model overestimated low flows and underestimated high flows in Ceviz, Cura, Akçay, and Çağşur streams. Flash precipitation, flood events, and unstable flow regime could prevent the better modelling performance of the GR4 J model in the Eastern Black Sea basin regarding the estimation of extreme flows. In the Çağşur stream, the drier period is relatively long compared to other streams, which could negatively affect the GR4 J performance.
In the GR4 J–WGANN1 model, QR and QD obtained from the GR4 J model, the API, precipitation, and evapotranspiration; in the GR4 J–WGANN2 model, the SMI obtained from the GR4 J model and the API, precipitation, and evapotranspiration were used as input data. After the input data were decomposed using wavelet transformation, the Boruta algorithm was implemented to determine the wavelet components used in rainfall–runoff modelling. The Boruta algorithm indicated that the importance level of QR and QD's wavelet components in the GR4 J–WGANN1 model and SMI's wavelet components in the GR4 J–WGANN2 model were remarkably high compared to wavelet components of other input variables. In this regard, using wavelet transform and different outputs of the GR4 J model as input data in data-driven models improved the rainfall–runoff modelling performance compared to the GR4 J model.
The GR4 J–WGANN1 and GR4 J–WGANN2 hybrid models developed low- and high-flow simulation performance compared to the GR4 J model. It shows that wavelet-based conceptual–data-driven hybrid models can remarkably improve the rainfall–runoff modelling performance in basins with challenging hydrological conditions.
The proposed GR4 J–WGANN1 model yielded better results than the GR4 J–WGANN2 model. The Boruta algorithm showed that QR and QD generally have more important wavelet components than the SMI. In other words, using more significant input data obtained from the GR4 J model (i.e., QR and QD) compared to using only the SMI could improve the performance and the novel GR4 J–WGANN1 model approach can be an alternative to the GR4 J–WGANN2 model.
Using the Boruta algorithm in hybrid models for rainfall–runoff modelling, which is another novel aspect of this study, could also be advantageous and remarkable with regard to observing the significance of input data. Thus, it can be inferred that using different hybrid model approaches that combine the advantages of the conceptual and data-driven models can provide better results for rainfall–runoff modelling.
Although the hybrid models performed better than the GR4 J model, they could not perform very well, especially with regard to forecasting high flows. The irregular hydrologic regime, uncertainty, and smaller number of extreme events in the calibration and validation period could prevent better high-flow prediction performance of the hybrid models. In this regard, the aim is that the uncertainty of model parameters and the prediction performance will be investigated in future studies. In addition, the implementation of the larger datasets at different scales (such as hourly, daily, and monthly) in hybrid models for rainfall–runoff modelling is planned. In further studies, the performance of different hybrid models, which also distinguish the contribution of the wavelet transform and the hybrid model structure, will be investigated to enhance the rainfall–runoff modelling performance of both the conceptual and data-driven models.
ACKNOWLEDGEMENTS
The authors are grateful to the Turkish State Meteorological Service and the General Directorate of State Hydraulic Works for providing the data. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICTS OF INTEREST STATEMENT
The authors declare there is no conflict.