Improving the performance of rainfall-runoff models using the gene expression programming approach

In this study, ﬁ ve hydrological models, including the soil and water assessment tool (SWAT), identi ﬁ cation of unit hydrograph and component ﬂ ows from rainfall, evapotranspiration, and stream ﬂ ow (IHACRES), Hydrologiska Byråns Vattenbalansavdelning (HBV), Australian water balance model (AWBM), and Soil Moisture Accounting (SMA), were used to simulate the ﬂ ow of the Hablehroud River, north-central Iran. All the models were validated based on the root mean square error (RMSE), coef ﬁ cient of determination (R 2 ), Nash-Sutcliffe model ef ﬁ ciency coef ﬁ cient (NS), and Kling-Gupta ef ﬁ ciency (KGE). It was found that SWAT, IHACRES, and HBV had satisfactory results in the calibration phase. However, only the SWAT model had good performance in the validation phase and outperformed the other models. It was also observed that peak ﬂ ows were generally underestimated by the models. The sensitivity analysis results of the model parameters were also evaluated. A hybrid model was developed using gene expression programming (GEP). According to the error measures, the ensemble model had the best performance in both calibration (NS ¼ 0.79) and validation (NS ¼ 0.56). The GEP combination method can combine model outputs from less accurate individual models and produce a superior river ﬂ ow estimate. The GEP was used to construct a hybrid model by ensembling the ﬁ ve calibrated hydrological models to improve the results. The ensemble models and SWAT yielded high ﬂ ow and low ﬂ ow calculations close to the observed data in both the calibration and validation phases. shows satisfactory results for the SWAT, IHACRES, and HBV models in the calibration phase; the RMSE values are for moisture condition II (CN2), initial depth of water in the shallow aquifer (mm H2O) (SHALLST), base ﬂ ow factor (ALPHA_BF), soil evaporation compensation factor (ESCO), groundwater delay time (GW_DELAY), and snowfall temperature (SFTMP), were found to have the highest sensitivity to the monthly discharge of the basin as they had the largest absolute t-stat values and p , 0.1.

235 mm, and 7.5 m 3 /s, respectively, from 1985 to 2005. The regime of its principal tributaries, the Firuzkuh and the Namroud, is combined rain-fed/snow-fed. The highest flows occur in April and May on average. According to the Köppen-Geiger climate classification system (Kottek et al. 2006), the climate of the Hablehroud Basin is mid-latitude arid (BWk) in the south and mid-latitude semi-arid (BSk) in the north (B: arid; W: desert; S: steppe; k: cold). The daily-observed data of precipitation, minimum and maximum temperatures, sunshine hours, wind speed, and relative humidity recorded at eleven meteorological stations throughout the basin and the daily streamflow data recorded at the basin outlet were used. The FAO-56 Penman-Monteith (FAO-PM) equation (Allen et al. 1998) was used to estimate the daily reference evapotranspiration rates.

SWAT
The soil and water assessment tool (SWAT) is a conceptual semi-distributed time-continuous hydrological model developed by the USDA Agricultural Research Service (USDA-ARS) to predict the impacts of land management practices on water supplies, sediments, and agricultural chemical yields in large, complex river basins. Hydrology, climate, sedimentation, soil temperature, crop growth, nutrients, pesticides, and agricultural management are the major components of SWAT. These components are described in detail by previous studies (Srinivasan et al. 1998). The hydrology component is based on the water balance equation: where SW t is the final soil water content on day t (mm H 2 O), SW 0 is the initial soil water content (mm H 2 O), and R day , Q surf , ET a , P seep , and QR gw are precipitation, surface runoff, actual evapotranspiration, percolation, and return flow, respectively, on day i (all in mm H 2 O). To represent the spatial heterogeneity of a complex basin, SWAT divides the basin into subbasins. Each subbasin is further divided into homogenous areas known as hydrologic response units (HRUs) that have the same soil type, slope, and land use. HRUs serve as basic spatial modeling units in a subbasin. Evapotranspiration, soil water content, aquifer recharge, surface runoff, soil temperature, nutrient cycles, sediment yield, crop growth, and management practices are individually simulated for each HRU. Then, individual HRU responses are aggregated over the entire basin. SWAT needs a combination of spatial and non-spatial input data. The digital elevation model (DEM) of the Hablehroud Basin and subbasins delineated using SWAT and the Hablehroud river network are represented in Figure 1. The land-use map of the study area is shown in Figure 2. Land use is a critical factor affecting the runoff, evapotranspiration, infiltration, and surface erosion of a basin. Therefore, to prepare land use maps, Landsat 5 and the Enhanced Thematic Mapper (ETMþ) satellite images were used, as shown in Table 1. Also, ENVI 5.3, Google Earth, and ArcGIS 10.3 were employed to process the satellite images and obtain land use information and the number of changes made. The satellite images were classified into six groups, as shown in Table 1.
ArcSWAT 2012, an ArcView extension and graphical user input interface for SWAT, was used to set up the model. Model calibration and uncertainty analysis were carried out using the sequential uncertainty fitting program of SUFI-2 (Abbaspour et al. 2004(Abbaspour et al. , 2007. In SUFI-2, SWAT output uncertainty was quantified using the 95PPU band ranging from the 2.5th percentile to the 97.5th percentile of the SWAT output. The percentage of the observed data bracketed by 95PPU is defined as the p-factor (Abbaspour et al. 2004;Yang et al. 2008). The average width of the 95PPU band, divided by the standard deviation of all observed data, is known as the d-factor (Abbaspour et al. 2007;Adib et al. 2021b). While the ideal value of the first factor is 100%, the second factor has an optimal value of zero. There is a trade-off between these two factors. Abbaspour et al. (2015) recommended the p-factor to be greater than 0.7 and the d-factor to be smaller than 1.5.
The region's cultivation pattern, planting and harvesting date, amount of water required, and irrigation resource were introduced as information and agricultural management to the SWAT model. A variety of plants are cultivated in the Hablehroud basin. However, it is impossible to simulate all of the plants. Thus, this study considered the dominant cultivation plants of the region, including wheat, barley, alfalfa, and potatoes as crops and apples, walnuts and peanuts as orchard products.

IHACRES
The identification of unit hydrographs and component flows from rainfall, evapotranspiration, and streamflow (IHACRES) model was introduced by Jakeman et al. (1990). IHACRES was designed to avoid issues such as data acquisition and parameter estimation, which are associated with physically-based rainfall-runoff models (Dye & Croke 2003). It consists of two components: (1) a non-linear rainfall loss module, which converts observed rainfall into excess rainfall and (2) a linear module that converts the estimated excess rainfall into streamflow. While the former could be considered as a lumped conceptual model, the latter is basically a data-driven technique. Various versions of the non-linear rainfall loss module are available (Young & Beven 1991;Jakeman & Hornberger 1993;Post & Jakeman 1996; Evans & Jakeman      Croke & Jakeman 2004). In IHACRES, the excess rainfall u k could be estimated as (Lotfirad et al. 2018): where r k is the observed rainfall at time k, p is the exponential loss parameter, l is a threshold for the rain to give streamflow, and s k is a wetness index defined as: where c is a response parameter selected to conserve the mass-balance of the basin, τ w is a time parameter for the decline in s k , t k is the observed temperature at time k (°C), and f is a temperature modulation parameter. The linear module of IHACRES treats a basin as a combination of two parallel components (or stores): a quick flow component x k (q) and a slow flow component x k (s) (Kim 2015). Thus, the streamflow at time k can be written as: where Δ is the time step, τ q and τ s are the decay time constants for the quick and slow stores, respectively, and v q and v s are the relative volumetric throughputs for the quick and slow stores, respectively (vs ¼ 1 À vq). The linear module has only three parameters (i.e., τ q , τ s and v s ). Thus, there are a total of seven parameters in the model (the tree parameters of the nonlinear module are τ w , c, and f).

HBV light
The HBV hydrological model was developed by Bergstrom & Forsman (1973) at the Swedish Meteorological and Hydrological Institute. It is a widely used semi-distributed conceptual rainfall-runoff model. HBV has a simple structure and uses a few input variables to simulate the most important processes affecting the runoff produced in a basin. According to elevation and land cover, the model divides the basin into sub-basins. HBV uses precipitation, temperature, and potential evapotranspiration as input variables to simulate the streamflow and soil moisture content. HBV light is an updated variant of the original model with new capabilities such as simulating the groundwater contribution to the streamflow and considering a different basin reaction procedure through the delay parameter. Figure 4 represents a schematic of the HBV model. The snow routine, soil moisture routine, response routine, and routing routine are the four routines of the HBV model. The snow routine considers a threshold melting temperature (TT) and uses the simple method of degree-day to simulate the snowmelt process: here CFMAX is the degree-day coefficient (4-5.1 for Sweden). Rainfall or the melted snow remains in the snowpack until it exceeds the threshold or CWH, which is usually 0.1. The liquid water quantity within the snowpack that may refreeze is simulated by:

Uncorrected Proof
Changes in soil moisture content (SM) and groundwater recharge are controlled by the soil moisture routine based on the amount of water coming from the previous routine (P) and the maximum soil moisture content (Field Capacity, FC): If SM/FC is greater than LP, the actual soil evaporation is equal to potential evaporation. Otherwise, the actual evaporation is linearly reduced as: HBV considers a two-reservoir model for the basin under study. The surface runoff is calculated as the difference between the precipitations and losses (i.e., infiltration and evapotranspiration) and represents the inflow of the first reservoir. The first reservoir outflow is divided into fast flow (Q1) and intermediate flow (Q2). The outflow from the second reservoir is the groundwater flow (Q3), and the total flow is the sum of two or three outflows, depending on whether upper zone storing (SUZ) is above the threshold (UZL): Using a triangular weighting function and the parameter MAXBAS, the simulated runoff is calculated as: In the case of using different elevation zones, two parameters (i.e., PCALT and TCALT) are used to calculate the changes in precipitation and temperature with elevation: T To correct the long-term mean of the potential evaporation, the deviations of the temperature from its long-term mean and a correction factor (C ET ) are used as:

AWBM
The AWBM model was developed by the Australian CRCCH Institute to simulate the hydrological processes of catchments and has been one of the simplest and most practical continuous models in recent years in the world, especially in Australia (Boughton & Chiew 2007). Despite its simple structure, the AWBM hydrological model accurately calculates the excess rainfall. The data required in model calibration include precipitation, evapotranspiration, and observational discharge. Also, it is possible to obtain features such as automatic calibration through various optimization algorithms, e.g., the genetic algorithm, pattern search multi-start, uniform random sampling, Resenbrock multi-start optimizer, Resenbrock single start, SCE-UA, and pattern search. The AWBM model is based on partial saturation flow theory, similar to saturation surface flow theory. In general, the model uses three storage levels, including C1, C2, and C3, with areas A1, A2, and A3 to simulate the runoff coefficient. The water balance at each level of storage is calculated independently. Thus, in the AWBM model, the water balance of each partial area is calculated at each time step. At each stage of precipitation, based the moisture storage in at the water storage levels in the soil and evapotranspiration, the water balance is obtained as: If the storage moisture is negative, it is considered to be zero. If the storage moisture exceeds the reservoir capacity, the excess water is converted into runoff, and the storage moisture remains equal to the reservoir capacity. The model assumes that runoff results from two main resources: surface runoff and base water.
The model parameters are (1) the baseline flow index, (2) daily flow dryness constant, and (3) surface storage capacity (C1, C2, and C3) and the corresponding levels of the capacities (i.e., A1, A2, and A3). They are calculated by multi-regression method and automatic variables.

SMA
The SMA model is a lumped rainfall-runoff model based on soil moisture to simulate the runoff produced by a basin. SMA divides the basin into two zones: a lower zone and an upper zone. It seeks to define the distribution of moisture and free water components in each zone using a set of parameters. Precipitation and temperature are the input variables. Parameters defining the states of soil moisture and parameters describing the relative permeability of the basin are used to estimate the input water, stored water, and output water of the basin (Sorooshian et al. 1993). The SMA model can estimate hydrological processes, such as evapotranspiration, percolation, interflow, and different forms of runoff from a basin. It can be used to forecast river flow or water supply or to assess the effects of climate change on hydrological variables. The model works best for large drainage basins and uses multiple years of records for calibration. Figure 6 illustrates a schematic of the SMA model.

Gene Expression Programming
Gene expression programming (GEP) is a method for developing computer programs and mathematical modeling based on evolutionary calculations. It is inspired by natural evolution. GEP was invented by Ferreira in 1999 and was officially introduced in 2001 (Ferreira 2002). It selectively abandons an unsuitable population and produces modified offspring. No functional relationship is considered in the beginning of the process. This method can optimize the model structure and components.
GEP works on the tree structure of formulas. Tree structures are created from a set of functions and terminals. The GEP steps include: Figure 6 | Schematic of the SMA model (Bennett 1998).
Journal of Water and Climate Change Vol 00 No 0, 11

Uncorrected Proof
(1) A set of terminals (problem variables and random fixed numbers), (2) A set of mathematical operators used in formulas, (3) Selecting the fitting function, and (4) Determining program control parameters (e.g., the population size and operator weights).
This study used GeneXproTools V.5.0 to apply the GEP model. The main idea was to construct a decomposition tree and mathematical functions for model interpretation.

Performance Criteria
The error measures used to evaluate the accuracy of the models included the root mean square error (RMSE), Nash-Sutcliffe efficiency coefficient (NS), coefficient of determination (R 2 ), and Kling-Gupta efficiency (KGE) (McCuen et al. 2006;Gupta et al. 2009;Knoben et al. 2019;Adib et al. 2019). These measures are defined as: where cov(Q i Obs , Q i Cal ) is the covariance of the observed and calculated values, s Obs and s Cal are the corresponding standard deviations, Q Obs and Q Cal are the observed runoff and calculated runoff, respectively, Q i Obs is the mean observed runoff, cc is the linear correlation coefficient, a is the ratio of the standard deviation of the calculated runoff to standard deviation of observed runoff, and b is the average of the calculated runoff and observed runoff.
The Taylor diagram was employed to assess the accuracy of the models in the calibration and validation phases.

RESULTS AND DISCUSSION
The models were calibrated and validated based on monthly data (daily mean averaged over the month). The observed discharge data during 1985-2001 at the basin outlet were used in calibration, and the discharge data during 2002-2005 were employed to validate the calibrated models. The SWAT model inputs, such as temperature and precipitation, were daily data. The SWAT was calibrated and validated based on monthly data (daily mean averaged over the month). The fitted, minimum, and maximum values of the model are shown in Table 5. The genetic algorithm and Powell (GAP) method was used to calibrate the parameters of the HBV model (Pervin et al. 2021). The characteristics of the GAP method are listed in Table 6. The optimized values for the HBV parameters in the routines (i.e., the snow routine, soil moisture routine, response routine, and routing routine) are represented in Table 7.
The SMA model uses different methods, including the loss method (to model infiltration losses combined with the simple canopy and the simple surface method), SCS (to model the transformation of precipitation excess into direct surface runoff), Muskingum routing method, and linear reservoir method (to model base flow) in order to simulate the components of the rainfall-runoff process. The optimized values of the parameters in the aforementioned methods are presented in Table 8.
IHACRES was calibrated based on the daily mean temperature, rainfall, and discharge data. Finally, the model was calibrated and validated in certain periods. The fitted parameter values for the non-linear and linear modules of the model are shown in Table 9. The exponential loss parameter (p) and the threshold for the rain to give streamflow (l ), were considered, respectively, one and zero.
To calibrate the AWBM model, the multi-start pattern search method was used. The Nash-Sutcliffe efficiency coefficient and baseflow method were considered as the first and second objective functions, respectively. The fitted parameter values are listed in Table 10.
The fitness statistics (i.e., RMSE, NS, R 2 , and KGE) in the calibration and validation phases are given in Table 11. Table 11 shows satisfactory results for the SWAT, IHACRES, and HBV models in the calibration phase; the RMSE values are reasonably small, and the NS values are more significant than 0.50. In general, the river flow simulation can be said to be satisfactory if NS is more significant than 0.5 (Moriasi et al., 2007;Salehpoor et al., 2020). However, the negative values of NS in the validation phase indicate that HBV, IHARES, AWBM, and SMA failed to correctly simulate the process of rainfall-runoff in the basin under study. Also, only SWAT had good performance in the validation phase and outperformed the other models. Figure 7 compares the monthly observed and simulated discharges in calibration and validation. While the overall fitness is satisfactory for calibration, the peak flows that occur in response to high rainfalls were generally underestimated by the models. According to Figure 7, the SWAT model had the best performance in simulating peak flows in the basin under study. The models are also compared in Figure 7 in the calibration and validation phases using the Taylor diagram (Adib et al. 2021c;Taylor 2001) Figure 7 shows that the models developed for the basin under study had approximately the same standard deviation in the calibration phase. However, all the models underestimated the observed standard deviation in the calibration phase and had insignificant variability. For the validation phase, the HBV model and the observations have approximately the same standard deviation. It can be seen in Figure 7 that the SWAT model generally agreed best with the observations, with the highest calibration and validation correlation coefficients and lowest calibration and validation RMSEs.
The GEP was used to construct a hybrid model by combining the five calibrated hydrological models. Utilizing the GEP method, the following combined equation was achieved for estimating the outflow of the basin under study using the outputs generated by the SWAT, IHACRES, HBV, AWBM, and SMA models:

Uncorrected Proof
The fitness statistics (i.e., RMSE, NS, R 2 , and KGE) of the hybrid model in the calibration and validation phases are given in Table 11. According to the NS values, the hybrid model had satisfactory performance in both calibration and validation phases. It is seen from Table 11 and Figures 7 and 8 that the hybrid model outperformed all the hydrological models (the lowest RMES and the highest NS, R 2 , and KGE in both calibration and validation phases and the most realistic model in the Taylor diagram). The response parameter (c, 1/mm) 0.003 Temperature modulation parameter ( f, 1/°C) 4.00 As shown in Figure 8, the SWAT and hybrid models realistically calculated high flow and low flow close in both the calibration and validation phases. To more accurately evaluate the performance of the rainfall-runoff model and combined models, flows below the first quarter were selected as low flow, and flows above the third quarter were treated as high flow. According to Table 12, the performance of the models was much better in high flow simulation than in low flow  Uncorrected Proof simulation. In high flow simulation, the best performance was found for GEP, SWAT, and IHACRES models in calibration, respectively. For the validation phase, the best performance was shown by the SWAT, GEP, and HBV models, respectively. This is due to the higher variance of high flow than low flow; the simulation value is more likely to fall within this range at a higher variance. The sensitivity analysis of parameters is an essential step in SWAT calibration. SUFI-2 uses multiple regression analysis to identify the relative significance of the parameter via the t-stats of the parameters (i.e., the parameter coefficient divided by the standard error) and the p-value of the t-stat. The present study started the calibration process with 29 parameters in SUFI-2. As can be seen in Figure 9, seven parameters, including available water capacity (SOL_AWC), initial SCS runoff curve number Figure 9 | Sensitivity analysis of parameters using the SUFI-2 algorithm.
Journal of Water and Climate Change Vol 00 No 0, 18 Uncorrected Proof for moisture condition II (CN2), initial depth of water in the shallow aquifer (mm H2O) (SHALLST), base flow alpha factor (ALPHA_BF), soil evaporation compensation factor (ESCO), groundwater delay time (GW_DELAY), and snowfall temperature (SFTMP), were found to have the highest sensitivity to the monthly discharge of the basin as they had the largest absolute t-stat values and p , 0.1. The local sensitivity analyses of the lumped models HBV, SMA, IHACRES, and AWBM were carried out to identify the most sensitive parameters with the greatest impacts on the model output.
The IHACRES model had fewer parameters than the other models. The parameter Tw had the highest sensitivity to the objective function (NS). Among the HBV parameters, UZL, K1, FC, TT, and MAXBAS were very sensitive with higher sensitivity to reduction than increase, based on the NS criterion. These results are consistent with Yaghoubi & MASSAH (2014).
In the AWBM model, the parameters BFI, C3, Kbase, and Ksurf were the most sensitive ones; Ksurf was more sensitive to reduction, and C3 was more sensitive to increase. This is consistent with Boughton & Chiew (2007) and Chouhan et al. (2016).
In the SMA model, the parameters GW1 storage, GW1 coefficient, impervious, and tension storage were more sensitive to reduction than increase. This is consistent with Bashar & Zaki (2005) and Ouédraogo et al. (2018).
The main purpose of this study was to provide an accurate method for estimating runoff in the arid basin of Hablehroud as a significant representation of other basins in Iran as most Iranian basins have arid and semi-arid climates. The novelty of the present study includes the use of lumped and distributed rainfall-runoff models in order to simulate the flow of the arid basin of Hablehroud and combine the outputs of the models with GEP and provide a mathematical relationship by exploiting the advantages of all five models used in runoff simulation. The ensemble model outperformed the others models in both the calibration and validation phases. This was also reported by Fernando et al. (2012); although Fernando et al. (2012) used only lumped hydrological models, the present study employed both lumped and distributed models.

CONCLUSION
Various models are available to simulate the complex rainfall-runoff process in watersheds. To find the most appropriate model for simulating the monthly flow (daily mean averaged over month) for the Hablehroud River in Tehran Province,