## Abstract

One of the more perplexing challenges for the hydrologic research community is the need for development of coupled systems involving integration of hydrologic, atmospheric and socio-economic relationships. Given the demand for integrated modelling and availability of enormous data with varying degrees of (un)certainty, there exists growing popularity of data-driven, unified theory catchment scale hydrological modelling frameworks. Recent research focuses on representation of distinct hydrological processes using mathematical model components that vary in a controlled manner, thereby deriving relationships between alternative conceptual model constructs and catchments’ behaviour. With increasing computational power, an evolutionary approach to auto-configuration of conceptual hydrological models is gaining importance. Its successful implementation depends on the choice of evolutionary algorithm, inventory of model components, numerical implementation, rules of operation and fitness functions. In this study, genetic programming is used as an example of evolutionary algorithm that employs modelling decisions inspired by the Superflex framework to automatically induce optimal model configurations for the given catchment dataset. The main objective of this paper is to identify the effects of entropy, hydrological and statistical measures as optimization objectives on the performance of the proposed approach based on two synthetic case studies of varying complexity.

## INTRODUCTION

Increasing demand for integrated environmental modelling and diversity of hydrological modelling approaches motivates the development of unified modelling frameworks (Clark *et al.* 2015b) that integrate, compare and evaluate multiple model representations. Unified hydrological modelling frameworks can be implemented by defining a general set of conservation equations and their numerical implementation, model structure units/reservoirs, flux parameterizations, model complexity that governs granularity of process representations, spatial discretizations and hydrologic connectivities. In this context, flexible multi-model frameworks such as Rainfall-Runoff Modelling Toolbox (RRMT) (Wagener *et al.* 1999), Framework for Understanding Structural Errors (FUSE) (Clark *et al.* 2008) and multi-component frameworks such as FARM (Euser *et al.* 2013), Superflex (Fenicia *et al.* 2011, 2014; Kavetski & Fenicia 2011; van Esse *et al.* 2013) and SUMMA (Clark *et al.* 2015a) have been developed. In earlier applications, selection of the optimal model configuration was carried out based on hypothesizing model representations using prior knowledge of interactions among observed data and field observations, followed by evaluation of all possible combinations of the selected hypotheses using Bayesian framework, which is computationally intensive. On the other side, with advancements in computing power, use of machine learning tools is gaining popularity and data are becoming drivers for automatic optimal model selection, formulating modelling strategies. One recent study has presented the application of data mining algorithm Automatic Model Configuration Algorithm (AMCA) (Vitolo 2015) to identify the most suitable model configuration using minimum data and FUSE as sample model inventory. The success of an ideal data-driven multi-component modelling framework is not only to perform automatic selection from certain hypotheses, but also to search an entire model space extensively to evolve the most suitable combinations of conceptual model components and generate novel modelling philosophies to fill knowledge gaps. Evolutionary computation (EC) techniques (Goldberg 1994; Babovic *et al.* 1995, 2017; Babovic & Abbott 1997; Babovic & Keijzer 2000) mimic the mechanisms of natural evolution to provide global or near global solutions to real world optimization problems. Literature shows the successful application of EC techniques in modelling hydrological processes of complex and chaotic nature in application areas, namely, rainfall–runoff prediction (Khu *et al.* 2001; Babovic & Keijzer 2002), settling velocity estimation (Babovic *et al.* 2001), water quality (Jeong *et al.* 2003), sediment transport (Kizhisseri *et al.* 2006), model error correction (Zechman & Ranjithan 2007), reservoir operation (Rani & Moreira 2010), water distribution systems (Savic & Walters 1997; Bi *et al.* 2016), etc. Earlier studies involve the use of EC techniques for evolving empirical, grey box models for short-term prediction rather than harnessing the idea of automated knowledge evolution and improving our current understanding of the system. Recent studies have attempted to incorporate hydrological concepts into the framework of an evolutionary data-driven algorithm, say, genetic programming (GP) (Koza 1992) to automatically estimate the input (rainfall) history for the prediction of runoff (Havlíček *et al.* 2013) and to automatically evolve Sugawara tank (Sugawara 1979) model configurations (Chadalawada *et al.* 2017) for the catchments of interest.

In this study, GP is used in conjunction with Superflex titled Evolutionary Superflex framework (http://scholarbank.nus.edu.sg/handle/10635/138678) for automatic evolution of optimal model configuration for the given catchment. As mentioned earlier, Superflex (Fenicia *et al.* 2011; Kavetski & Fenicia 2011) is a multi-component hydrological modelling framework that consists of building blocks that are customized into spatially lumped flow network configurations. The building blocks are reservoirs, flux and lag functions, connection elements. Twelve submodels (M1 to M12) (Figure 1) from Superflex framework (Fenicia *et al.* 2014) of varying complexities along with basic algebraic functions form the function set of GP. The selected submodels consist of five different types of reservoir units (interception (IR), unsaturated (UR), riparian (RR), fast reservoir and slow reservoir (FR and SR)) with serial, linear and parallel connections as illustrated in Figure 1. The constitutive functions (e.g., Monod, Modified logistic, Linear, Power, Reflected Hyperbolic, Tessier) governing storage–discharge relationships and evaporation losses, shape of lag functions (e.g., Triangle lag scheme), connection elements and ranges of associated parameters, which include, correction factors to inputs, splitters, coefficients of storages and outflows, smoothing factors, lag time are defined based on literature (Fenicia *et al.* 2011). The selected submodels represent a subset of possible conceptual model constructs of Superflex framework. Submodels M1 and M2 contain a single reservoir unit varying in terms of functions governing storage–discharge relation. M2 to M6 contain reservoir units with serial connections differing in terms of number and type of reservoir units, lag components, flux functions and associated parameters. M8 has two reservoir units in parallel with a splitter (D) dividing the incoming flux precipitation (P) between them. M9 to M12 represent the complex structures with either one (UR) or two reservoirs (IR and UR) preceding the parallelly connected FR and SR. The unsaturated zone reservoir is initialized to a percentage of its maximum capacity. All other reservoirs are assumed to be empty initially. The inputs to GP framework include observed forcings Precipitation (P) and Evaporation (E), Discharge (Q) and random constants. Other GP settings are listed in the Appendix (Table A1, available with the online version of this paper). For more details on GP framework consult (http://scholarbank.nus.edu.sg/handle/10635/138678). In Figure 1, rectangular boxes represent storage units and variables within parentheses represent parameters of the constitutive functions.

## OBJECTIVE FUNCTIONS

The model identification and its performance evaluation should be based on ability to reproduce hydrological behaviour and capture properties of catchments rather than merely matching observed and simulated responses. There are a variety of performance indices that measure the ability of a hydrological model to reproduce observed streamflow. It is important for the simulated time series to preserve critical aspects of observed streamflow hydrograph (Vis *et al.* 2015) so that the governing model structure can also be used as a catchment classifier for predictions in ungauged basins. One of the recent studies (Shafii & Tolson 2015) highlights the point that the results are more sensitive to objective function formulation as opposed to the choice of optimization algorithm. Therefore, in this paper, a range of entropy, statistical and hydrological signatures listed in Table 1 are used as optimization objectives of Evolutionary Superflex approach.

Objective function . | Description . | Criteria . | Equations . |
---|---|---|---|

1. Madsen (Madsen 2000) | a. Water balance; b. Hydrograph shape, good agreement of high and low flows | a. F1: Mean absolute error (MAE); b. F2, F3, F4: Overall, high, low flows RMSE | |

Range: 0 (best) to | |||

2. NS0 (Nash & Sutcliffe 1970) | Sensitive to high flows | NA | |

Range: to 1 (best) | |||

Logarithmic NSE (Krause et al. 2005) | Sensitive to low flows | NA | |

3. Mai0 (Mai et al. 2016) | a. Sensitivity to very high flows; b. Weightage to low flows | a. NSE; b. logNSE | |

VE (Criss & Winston 2008) | Volumetric efficiency | NA | |

Range: 0 to 1 (best) | |||

4. Vis_C1 (Vis et al. 2015) | a. Sensitivity to very high flows; b, c. Weightage to low flows, runoff volume | a. NSE; b. logNSE; c. VE | |

MARE (Dawson et al. 2007) | Sensitivity to lower magnitudes | NA | |

Range: 0 (best) to | |||

5. Vis_C2 (Vis et al. 2015) | a. Sensitivity to very high flows; b. Sensitivity to both peaks and lows; c, d. Good agreement of timing, discharge magnitude and runoff volume | a. NSE; b. MARE; c. Pearson correlation coefficient (r); d. VE | |

6. Vis_C3 (Vis et al. 2015) | a, b. Good agreement of timing, discharge magnitude and runoff volume | a. r; b. VE | |

7. KG10; 8. KG20 (Gupta et al. 2009; Kling et al. 2012) | Model efficiency measure based on equal weighting of correlation, bias and flow variability measures | a. r; b. c. or ; ( ensures bias and variability ratios are not cross correlated) | KGE1 Range: 0 to 1 (best) Range: 0 to 1 (best) |

CED (Pechlivanidis et al. 2012) | Importance weighted informational entropy-based metric | NA | |

Range: 0 (best) to 1 | |||

9. CED_new (Pechlivanidis et al. 2014) | a. Static information of signal b. Timing information | a. CED; b. KGE1 | CED|KG10 |

10. Borsanyi (Borsányi et al. 2016) | Ideal point in 4D space in which correlation, mean squared error (MSE) are weighted more | a. r; b. NSE; c. VE; d. KGE1 | |

Modified Nash Sutcliffe efficiency (MNS) (Price et al. 2012) | Modified NSE | NA | |

Range: to 1 (best) | |||

11. Price (Price et al. 2012) | a. Emphasize on flood peaks and low flows; b. Prioritize flow variability | a. NSE, MNS; b. | |

PI (Dawson et al. 2007) | Best estimate for future is given by the latest value | NA | |

Range: to 1 (best) | |||

12. Dawson (Dawson et al. 2012) | a. Emphasize on high flows; b, c. Overall agreement; d. Timing | a. Overall RMSE; b. r^{2}; c. MAE; d. PI | |

13. Modified Index of Agreement (Legates & McCabe 1999) | Detects additive and proportional differences in observed and simulated means and variances | NA | |

j = 1 | |||

Range: 0 to 1 (best) | |||

14. Relative Index of Agreement (Krause et al. 2005) | Sensitive to over- or under-prediction of low flows. (Cannot be computed if any of observed flows are equal to zero) | NA | |

Range: 0 to 1 (best) | |||

Objective function . | Description . | Criteria . | Equations . |
---|---|---|---|

1. Madsen (Madsen 2000) | a. Water balance; b. Hydrograph shape, good agreement of high and low flows | a. F1: Mean absolute error (MAE); b. F2, F3, F4: Overall, high, low flows RMSE | |

Range: 0 (best) to | |||

2. NS0 (Nash & Sutcliffe 1970) | Sensitive to high flows | NA | |

Range: to 1 (best) | |||

Logarithmic NSE (Krause et al. 2005) | Sensitive to low flows | NA | |

3. Mai0 (Mai et al. 2016) | a. Sensitivity to very high flows; b. Weightage to low flows | a. NSE; b. logNSE | |

VE (Criss & Winston 2008) | Volumetric efficiency | NA | |

Range: 0 to 1 (best) | |||

4. Vis_C1 (Vis et al. 2015) | a. Sensitivity to very high flows; b, c. Weightage to low flows, runoff volume | a. NSE; b. logNSE; c. VE | |

MARE (Dawson et al. 2007) | Sensitivity to lower magnitudes | NA | |

Range: 0 (best) to | |||

5. Vis_C2 (Vis et al. 2015) | a. Sensitivity to very high flows; b. Sensitivity to both peaks and lows; c, d. Good agreement of timing, discharge magnitude and runoff volume | a. NSE; b. MARE; c. Pearson correlation coefficient (r); d. VE | |

6. Vis_C3 (Vis et al. 2015) | a, b. Good agreement of timing, discharge magnitude and runoff volume | a. r; b. VE | |

7. KG10; 8. KG20 (Gupta et al. 2009; Kling et al. 2012) | Model efficiency measure based on equal weighting of correlation, bias and flow variability measures | a. r; b. c. or ; ( ensures bias and variability ratios are not cross correlated) | KGE1 Range: 0 to 1 (best) Range: 0 to 1 (best) |

CED (Pechlivanidis et al. 2012) | Importance weighted informational entropy-based metric | NA | |

Range: 0 (best) to 1 | |||

9. CED_new (Pechlivanidis et al. 2014) | a. Static information of signal b. Timing information | a. CED; b. KGE1 | CED|KG10 |

10. Borsanyi (Borsányi et al. 2016) | Ideal point in 4D space in which correlation, mean squared error (MSE) are weighted more | a. r; b. NSE; c. VE; d. KGE1 | |

Modified Nash Sutcliffe efficiency (MNS) (Price et al. 2012) | Modified NSE | NA | |

Range: to 1 (best) | |||

11. Price (Price et al. 2012) | a. Emphasize on flood peaks and low flows; b. Prioritize flow variability | a. NSE, MNS; b. | |

PI (Dawson et al. 2007) | Best estimate for future is given by the latest value | NA | |

Range: to 1 (best) | |||

12. Dawson (Dawson et al. 2012) | a. Emphasize on high flows; b, c. Overall agreement; d. Timing | a. Overall RMSE; b. r^{2}; c. MAE; d. PI | |

13. Modified Index of Agreement (Legates & McCabe 1999) | Detects additive and proportional differences in observed and simulated means and variances | NA | |

j = 1 | |||

Range: 0 to 1 (best) | |||

14. Relative Index of Agreement (Krause et al. 2005) | Sensitive to over- or under-prediction of low flows. (Cannot be computed if any of observed flows are equal to zero) | NA | |

Range: 0 to 1 (best) | |||

*Notes:* and denote entire length of data, , are the number of time steps corresponding to high and low flow values, respectively, and denote simulated and observed streamflows, respectively. represents transformation constants such that have the same distance to the ideal point (origin), denotes minimum values of estimated from initial GP population. denotes probability of outcome such that the probabilities sum to 1, , denote unscaled and scaled Shannon entropies and SUSE represents maximum of unscaled and scaled Shannon entropy difference. m is the number of FDC segments and sg is the probability of exceedance for each segment of FDC. and represent standard deviation of observed and simulated flows, respectively, and represent mean of observed and simulated flows, respectively. denotes relative variability in simulated and observed flows, (bias) denotes ratio between mean observed and mean simulated flows and represents ratio of coefficient of variation of simulated and observed flows.

In this review, a single objective optimization scheme is adopted to clearly assess each of the 14 objective functions highlighted in Table 1. The objective functions defined consist of one criterion or combined criteria representing different aspects of streamflow. This includes three one criterion (NS0, MD0 and RD0) and the rest as combined criteria objective functions.

The main characteristics of the metrics used in this study can be summarized as follows:

Nash–Sutcliffe efficiency (NSE) and mean absolute relative error (MARE) are sensitive to peak and low flows, respectively.

Modified NSE and logarithmic NSE emphasize low flows.

Mean absolute error (MAE) measures the agreement between the average simulated and observed catchment runoff volume and root mean squared error (RMSE) measures the overall agreement of hydrograph shape.

Volumetric efficiency (VE) measures the accuracy in prediction of overall runoff volume over the entire simulation period.

Correlation coefficient (r) measures the success in replicating overall timing and magnitude of discharge but disregards differences in absolute values.

Kling–Gupta efficiency (KGE) combines correlation, bias and variability into one objective metric and is proven to have overcome the problems associated with NSE.

Persistence index (PI) aims at comparing the performance of a model against a simple model using the observed value of the previous day as the prediction for the current day as opposed to a constant mean. This is based on the assumption that the process is a Wiener process, i.e., variance increases linearly with time.

Index of agreement is based on a two-part squared distance measurement in which the absolute difference between model and simulated value and observed mean is added to the absolute difference between observed record and observed mean. Modified index agreement (MD) involves replacing the squared (power = 2) differences that are sensitive to extreme values by different values of power. On the other hand, relative index of agreement (RD) quantifies the differences between observed and predicted values as relative deviations which reduce the influence of absolute differences during high flows and enhance the influence of absolute low flow differences.

Signature measures of variability of runoff generation mechanisms (runoff coefficients and recession curves) are believed to capture significant catchment information that is not conveyed regularly using least squares or conventional variance-based measures of fit. Flow duration curve (FDC) can be considered a hydrologically informative signature of catchment behaviour. Entropy-based descriptors are more sensitive to subtle differences in streamflow (Pechlivanidis

*et al.*2014). Conditioned entropy difference (CED) is a diagnostic FDC signature representing static information of flow frequency distribution. CED is temporally insensitive and is said to result in better performance when combined with measures that characterize temporal dynamics such as KGE.

The main objective of this work is to evaluate the ability of different objective functions in bringing out the best performance of the Evolutionary Superflex approach thereby resulting in simulated response that adequately preserves observed flow characteristics. The outline of the remainder of this paper is as follows. The next section provides a description of the synthetic dataset and Evolutionary Superflex simulations. The results and analysis of the simulations follow and finally the key conclusions are summarized.

## DATA AND METHODOLOGY

Data used in this study are collected at headwater catchments Attert Basin, Luxembourg (Fenicia *et al.* 2014), over a period of five years from 1 September 2004 to 31 August 2009, covering a wide range of runoff events. Data consist of daily time steps of precipitation (*P*) and potential evaporation (*E*) in mm/h. Two synthetic streamflow time series are generated using the observed forcings (*P* and *E*) of two catchments, namely, Huewelerbach and Weierbach. Huewelerbach and Weierbach catchments have contrasting geology dominated by sandstone-marly and schists, respectively. Huewelerbach is a comparatively larger catchment (2.7 km^{2}) with landuse consisting of forests and agricultural land, while Weierbach (0.42 km^{2}) is a fully forested catchment. Runoff behaviour varies between stable, low reactivity of Huewelerbach catchment with runoff coefficient (R_{c}) of 0.3 and strong seasonality of Weierbach catchment with R_{c} of 0.98 in wet and 0.095 in dry periods, respectively.

*Q_MIV*) and Huewelerbach (

*Q_MXII*) catchments, respectively. Among the Superflex submodels considered in this study, M12 is the most complex while M4 has a comparatively simpler structure (see Figure 1). The idea behind using synthetic data is to assess the ability of the objective function to guide GP towards the optimal known solution. In the case of synthetic data, the existence of the optimal solution can be guaranteed. The entire length of data (five years) is used for every simulation. First, one month is reserved for model spin up. All the reservoirs of the 12 selected Superflex submodels are initially assumed to be empty except for the UR reservoir that is initialized to a percentage of its maximum capacity. Euler implicit stepping scheme with fixed daily time steps is used in the flux computations. In this setting, GP minimizes the objective functions, evolves optimal model configurations using components (Superflex submodels and basic algebraic functions) of the function set, terminal set (P, E, constants) and other settings listed in the Appendix (Table A1). For each of the 14 objective functions listed in Table 1, 10 Evolutionary Superflex trials are carried out resulting in 14 model configurations having the corresponding best training fitnesses. The resultant model configurations are returned as simplified, easily interpretable, and ready to use equations using Yacas (Pinkus

*et al.*2012). To evaluate the performance of the resultant model configurations in representing target flow characteristics, standardized signature index sum criterion (Ley

*et al.*2016) is calculated: where s indicates model configuration induced by Evolutionary Superflex approach, a indicates type of FDC signature index, x its value, indicates absolute values of every signature index, and represent mean and standard deviation of for all s. represents Z score or standardized values. represents the sum of standardized indices of each model configuration.

Sum of Standardized Indices (SIS) is a combined measure of signature indices derived from FDC (Yilmaz *et al.* 2008) that distinguishes the merits of the evolved model alternatives. FDC signature indices considered in the formulation of SIS include high flows corresponding to extreme rainfall events (FHV (FDC high-segment volume): probability of flow exceedance <2%), intermediate flows between high and medium runoff (FMV (FDC intermediate-segment volume): probability of flow exceedance between 2% and 20%), medium flows covering the mid slope of FDC corresponding to vertical moisture distribution (FMS (FDC mid-segment slope): flow exceedance probability between 20% and 70%) and low flows corresponding to base flows (FLV (FDC low-segment volume) flow exceedance probability >70%). These indices identify the portion of hydrograph where bias is the highest. SIS compares the deviation of simulated FDCs from observed FDC weighting all parts of FDC equally. Negative SIS values indicate above average performance and the model configuration with the lowest SIS value is the one that performs the best for the given catchment.

## RESULTS

The equations of resultant model configurations induced by Evolutionary Superflex simulations using two synthetic case studies are presented in Tables 2 and 3, respectively. It is evident that the respective underlying model structures are induced correctly for all objective functions except for CED|KG10 under *Q_MIV*.

Fitness metric . | Equation of the best model configuration (in terms of training fitness) . | Best fitness (out of 10 runs) . | True positives (out of 10 runs) . | False positives (out of 10 runs) . | Best models out of 10 runs of 50 generations (fitness range) . |
---|---|---|---|---|---|

Borsanyi | Q_MIV = M4(P, E, 2.10, 0.16, 1.08, 440.10, 8.62, 4.47, 0.86) | 0.05 | 5 | M3(1), M9(4) | M3(5), M4(5) (0.05–0.096) |

CED|KG10 | Q_MIV = M3(P, E, 1.75, 0.79, 1.37, 120.10, 1.13, 0.11, 0.44) | 0.05 | 4 | M3(5), M9(1) | M3(5), M4(5) (0.046–0.067) |

Dawson | Q_MIV = M4(P, E, 1.86, 0.33, 1.22, 330.10, 5.57, 8.12, 0.81) | 0.13 | 3 | M7(1), M9(5), M12(1) | M4(10) (0.132–0.164) |

KG10 | Q_MIV = M4(P, E, 1.08, 1.37, 1.17, 330.10, 6.67, 7.15, 0.81) | 7 × 10^{−3} | 3 | M2(3), M9(4) | M4(4), M9(6) (7.4 × 10^{−3}–0.012) |

KG20 | Q_MIV = M4(P, E, 1.94, 0.25, 0.98, 230.10, 3.27, 0.15, 0.43) | 5 × 10^{−3} | 3 | M3(4), M9(3) | M4(4), M9(6) (4.5 × 10^{−3}–0.016) |

Mai0 | Q_MIV = M4(P, E, 2.49, 0.09, 1.26, 340.10, 6.56, 7.98, 0.86) | 0.02 | 3 | M9(4), M10(2), M12(1) | M4(10) (0.019–0.038) |

MD0 | Q_MIV = M4(P, E, 1.51, 0.6, 1.14, 250.10, 4, 0.57, 0.73) | 0.03 | 2 | M3(1), M9(7) | M3(4), M4(6) (0.03–0.05) |

Madsen | Q_MIV = M4(P, E, 2.04, 0.17, 0.98, 250.10, 4.11, 0.19, 0.72) | 0.2 | 6 | M9(3), M12(1) | M4(10) (0.193–0.447) |

NS0 | Q_MIV = M4(P, E, 1.42, 0.83, 1.12, 430.10, 8.54, 10, 0.85) | 4 × 10^{−3} | 1 | M2(1), M3(1), M9(7) | M3(2), M4(2), M9(6) (4 × 10^{−3}–0.012) |

Price | Q_MIV = M4(P, E, 2.66, 0.04, 1.09, 230.10, 3.42, 0.35, 0.67) | 0.05 | 5 | M9(5) | M4(10) (0.05–0.08) |

RD0 | Q_MIV = M4(P, E, 2.18, 0.18, 1.07, 290.10, 7.24, 1.19, 0.85) | 2 × 10^{−6} | 6 | M6(1), M9(3) | M4(10) (2.4 × 10^{−6}–5 × 10^{−6}) |

Vis_C1 | Q_MIV = M4(P, E, 1.96, 0.21, 1, 230.10, 4.01, 0.2, 0.68) | 0.03 | 4 | M9(6) | M4(10) (0.026–0.1) |

Vis_C2 | Q_MIV = M4(P, E, 2.06, 0.18, 1.1, 250.10, 4.49, 0.43, 0.74) | 0.14 | 4 | M6(1), M9(1), M10(4) | M4(10) (0.137–0.317) |

Vis_C3 | Q_MIV = M4(P, E, 1.83, 0.29, 1.13, 370.10, 6.88, 3.57, 0.82) | 0.05 | 2 | M9(7), M10(1) | M4(6), M9(4) (0.05–0.133) |

Target | Q_MIV = M4(P, E, α_Qq_FR, K_Qq_FR, Ce, Smax_UR, β_Qq_UR, β_E_UR, SiniFr_UR)Q_MIV = M4(P, E, 2.00, 0.20, 1.00, 200.00, 3.00, 0.10, 0.50) |

Fitness metric . | Equation of the best model configuration (in terms of training fitness) . | Best fitness (out of 10 runs) . | True positives (out of 10 runs) . | False positives (out of 10 runs) . | Best models out of 10 runs of 50 generations (fitness range) . |
---|---|---|---|---|---|

Borsanyi | Q_MIV = M4(P, E, 2.10, 0.16, 1.08, 440.10, 8.62, 4.47, 0.86) | 0.05 | 5 | M3(1), M9(4) | M3(5), M4(5) (0.05–0.096) |

CED|KG10 | Q_MIV = M3(P, E, 1.75, 0.79, 1.37, 120.10, 1.13, 0.11, 0.44) | 0.05 | 4 | M3(5), M9(1) | M3(5), M4(5) (0.046–0.067) |

Dawson | Q_MIV = M4(P, E, 1.86, 0.33, 1.22, 330.10, 5.57, 8.12, 0.81) | 0.13 | 3 | M7(1), M9(5), M12(1) | M4(10) (0.132–0.164) |

KG10 | Q_MIV = M4(P, E, 1.08, 1.37, 1.17, 330.10, 6.67, 7.15, 0.81) | 7 × 10^{−3} | 3 | M2(3), M9(4) | M4(4), M9(6) (7.4 × 10^{−3}–0.012) |

KG20 | Q_MIV = M4(P, E, 1.94, 0.25, 0.98, 230.10, 3.27, 0.15, 0.43) | 5 × 10^{−3} | 3 | M3(4), M9(3) | M4(4), M9(6) (4.5 × 10^{−3}–0.016) |

Mai0 | Q_MIV = M4(P, E, 2.49, 0.09, 1.26, 340.10, 6.56, 7.98, 0.86) | 0.02 | 3 | M9(4), M10(2), M12(1) | M4(10) (0.019–0.038) |

MD0 | Q_MIV = M4(P, E, 1.51, 0.6, 1.14, 250.10, 4, 0.57, 0.73) | 0.03 | 2 | M3(1), M9(7) | M3(4), M4(6) (0.03–0.05) |

Madsen | Q_MIV = M4(P, E, 2.04, 0.17, 0.98, 250.10, 4.11, 0.19, 0.72) | 0.2 | 6 | M9(3), M12(1) | M4(10) (0.193–0.447) |

NS0 | Q_MIV = M4(P, E, 1.42, 0.83, 1.12, 430.10, 8.54, 10, 0.85) | 4 × 10^{−3} | 1 | M2(1), M3(1), M9(7) | M3(2), M4(2), M9(6) (4 × 10^{−3}–0.012) |

Price | Q_MIV = M4(P, E, 2.66, 0.04, 1.09, 230.10, 3.42, 0.35, 0.67) | 0.05 | 5 | M9(5) | M4(10) (0.05–0.08) |

RD0 | Q_MIV = M4(P, E, 2.18, 0.18, 1.07, 290.10, 7.24, 1.19, 0.85) | 2 × 10^{−6} | 6 | M6(1), M9(3) | M4(10) (2.4 × 10^{−6}–5 × 10^{−6}) |

Vis_C1 | Q_MIV = M4(P, E, 1.96, 0.21, 1, 230.10, 4.01, 0.2, 0.68) | 0.03 | 4 | M9(6) | M4(10) (0.026–0.1) |

Vis_C2 | Q_MIV = M4(P, E, 2.06, 0.18, 1.1, 250.10, 4.49, 0.43, 0.74) | 0.14 | 4 | M6(1), M9(1), M10(4) | M4(10) (0.137–0.317) |

Vis_C3 | Q_MIV = M4(P, E, 1.83, 0.29, 1.13, 370.10, 6.88, 3.57, 0.82) | 0.05 | 2 | M9(7), M10(1) | M4(6), M9(4) (0.05–0.133) |

Target | Q_MIV = M4(P, E, α_Qq_FR, K_Qq_FR, Ce, Smax_UR, β_Qq_UR, β_E_UR, SiniFr_UR)Q_MIV = M4(P, E, 2.00, 0.20, 1.00, 200.00, 3.00, 0.10, 0.50) |

Fitness metric . | Equation of the best model configuration (in terms of training fitness) . | Best fitness (out of 10 runs) . | True positives (out of 10 runs) . | False positives (out of 10 runs) . | Best models out of 10 runs of 50 generations (fitness range) . |
---|---|---|---|---|---|

Borsanyi | Q_MXII = M12(P, E, 3.24, 9.98, 1.05, 0.23, 0.95, 0.58, 220.10, 10.68, 0.16, 0.32, 1.16) | 0.06 | 5 | M9(2), M10(3) | M12(10) (0.056–0.094) |

CED|KG10 | Q_MXII = M12(P, E, 5.65, 0.58, 1.16, 0.07, 0.32, 0.92, 290.10, 7.67, 0.92, 2.23, 1.52) | 0.15 | 3 | M5(1), M9(5), M10(1) | M9(1), M10(3), M12(6) (0.146–0.183) |

Dawson | Q_MXII = M12(P, E, 6.10, 2.46, 0.99, 0.25, 0.92, 0.74, 370.10, 14.88, 0.2, 0.75, 2.06) | 0.09 | 5 | M9(1), M10(4) | M12(10) (0.085–0.1) |

KG10 | Q_MXII = M12(P, E, 7.03, 0.44, 1.05, 0.33, 0.99, 0.83, 440.10, 20, 0.5, 8.26, 2) | 7 × 10^{−3} | 5 | M9(1), M10(4) | M12(10) (7.3 × 10^{−4}–0.01) |

KG20 | Q_MXII = M12(P, E, 2.89, 0.14, 1.32, 0.43, 0.69, 0.7, 160.10, 17.10, 0.52, 4, 1.08) | 0.01 | 3 | M9(5), M10(2) | M12(5), M9(4), M10(1) (0.014–0.023) |

Mai0 | Q_MXII= M12(P, E, 7.8, 0.33, 0.98, 0.24, 0.98, 0.78, 330.10, 9.61, 0.1, 0.52, 3.2) | 0.03 | 5 | M9(4), M10(1) | M12(10) (0.026–0.039) |

MD0 | Q_MXII = M12(P, E, 7.23, 0.2, 1.04, 0.74, 0.3, 0.8, 390.10, 12.42, 0.26, 3.24, 1.44) | 0.03 | 8 | M9(2) | M12(10) (0.027–0.03) |

Madsen | Q_MXII = M12(P, E, 3.29, 0.17, 1.04, 0.48, 0.48, 0.54, 200.10, 6.49, 0.02, 0.21, 1.09) | 0.12 | 4 | M9(1), M10(4) | M12(10) (0.11–0.258) |

NS0 | Q_MXII = M12(P, E, 4.66, 0.26, 1.08, 0.27, 0.94, 0.68, 280.10, 9.3, 0.16, 1.63, 2.4) | 6 × 10^{−3} | 7 | M9(3) | M12(10) (6 × 10^{−3}–7 × 10^{−3}) |

Price | Q_MXII= M12(P, E, 5.3, 8.64, 1.13, 0.22, 0.9, 0.7, 280.10, 7.11, 0.23, 3.56, 1.13) | 0.06 | 4 | M9(2), M10(4) | M12(10) (0.057–0.069) |

RD0 | Q_MXII = M12(P, E, 4.28, 0.12, 0.92, 0.33, 0.9, 0.63, 160.10, 10.10, 0.06, 0.17, 1.14) | 10^{−5} | 4 | M4(1), M6(1), M9(3), M10(1) | M12(10) (10^{−5}–1.6 × 10^{−5}) |

Vis_C1 | Q_MXII = M12(P, E, 6.13, 0.28, 1.06, 0.33, 0.9, 0.72, 310.10, 11.48, 0.21, 2.26, 1.96) | 0.1 | 5 | M4(1), M9(2), M10(2) | M12(10) (0.096–0.106) |

Vis_C2 | Q_MXII = M12(P, E, 3.43, 0.16, 1.12, 0.46, 0.46, 0.62, 180.10, 7.05, 0.05, 0.34, 1.68) | 0.39 | 6 | M9(3), M10(1) | M12(10) (0.39–0.535) |

Vis_C3 | Q_MXII = M12(P, E, 3.83, 0.17, 1.03, 0.27, 0.94, 0.62, 240.10, 8.99, 0.1, 0.32, 1.94) | 0.06 | 7 | M9(2), M10(1) | M12(10) (0.06–0.085) |

Target | Q_MXII = M12(P, E, β_Qq_UR, K_Qq_FR, Ce, K_Qq_SR, D_S, SiniFr_UR, Smax_UR, Smax_IR, m_QE_IR, β_E_UR, Tlag)Q_MXII = M12(P, E, 3, 0.2, 1, 1, 0.3, 0.5, 200, 10, 0.1, 0.1, 2) |

Fitness metric . | Equation of the best model configuration (in terms of training fitness) . | Best fitness (out of 10 runs) . | True positives (out of 10 runs) . | False positives (out of 10 runs) . | Best models out of 10 runs of 50 generations (fitness range) . |
---|---|---|---|---|---|

Borsanyi | Q_MXII = M12(P, E, 3.24, 9.98, 1.05, 0.23, 0.95, 0.58, 220.10, 10.68, 0.16, 0.32, 1.16) | 0.06 | 5 | M9(2), M10(3) | M12(10) (0.056–0.094) |

CED|KG10 | Q_MXII = M12(P, E, 5.65, 0.58, 1.16, 0.07, 0.32, 0.92, 290.10, 7.67, 0.92, 2.23, 1.52) | 0.15 | 3 | M5(1), M9(5), M10(1) | M9(1), M10(3), M12(6) (0.146–0.183) |

Dawson | Q_MXII = M12(P, E, 6.10, 2.46, 0.99, 0.25, 0.92, 0.74, 370.10, 14.88, 0.2, 0.75, 2.06) | 0.09 | 5 | M9(1), M10(4) | M12(10) (0.085–0.1) |

KG10 | Q_MXII = M12(P, E, 7.03, 0.44, 1.05, 0.33, 0.99, 0.83, 440.10, 20, 0.5, 8.26, 2) | 7 × 10^{−3} | 5 | M9(1), M10(4) | M12(10) (7.3 × 10^{−4}–0.01) |

KG20 | Q_MXII = M12(P, E, 2.89, 0.14, 1.32, 0.43, 0.69, 0.7, 160.10, 17.10, 0.52, 4, 1.08) | 0.01 | 3 | M9(5), M10(2) | M12(5), M9(4), M10(1) (0.014–0.023) |

Mai0 | Q_MXII= M12(P, E, 7.8, 0.33, 0.98, 0.24, 0.98, 0.78, 330.10, 9.61, 0.1, 0.52, 3.2) | 0.03 | 5 | M9(4), M10(1) | M12(10) (0.026–0.039) |

MD0 | Q_MXII = M12(P, E, 7.23, 0.2, 1.04, 0.74, 0.3, 0.8, 390.10, 12.42, 0.26, 3.24, 1.44) | 0.03 | 8 | M9(2) | M12(10) (0.027–0.03) |

Madsen | Q_MXII = M12(P, E, 3.29, 0.17, 1.04, 0.48, 0.48, 0.54, 200.10, 6.49, 0.02, 0.21, 1.09) | 0.12 | 4 | M9(1), M10(4) | M12(10) (0.11–0.258) |

NS0 | Q_MXII = M12(P, E, 4.66, 0.26, 1.08, 0.27, 0.94, 0.68, 280.10, 9.3, 0.16, 1.63, 2.4) | 6 × 10^{−3} | 7 | M9(3) | M12(10) (6 × 10^{−3}–7 × 10^{−3}) |

Price | Q_MXII= M12(P, E, 5.3, 8.64, 1.13, 0.22, 0.9, 0.7, 280.10, 7.11, 0.23, 3.56, 1.13) | 0.06 | 4 | M9(2), M10(4) | M12(10) (0.057–0.069) |

RD0 | Q_MXII = M12(P, E, 4.28, 0.12, 0.92, 0.33, 0.9, 0.63, 160.10, 10.10, 0.06, 0.17, 1.14) | 10^{−5} | 4 | M4(1), M6(1), M9(3), M10(1) | M12(10) (10^{−5}–1.6 × 10^{−5}) |

Vis_C1 | Q_MXII = M12(P, E, 6.13, 0.28, 1.06, 0.33, 0.9, 0.72, 310.10, 11.48, 0.21, 2.26, 1.96) | 0.1 | 5 | M4(1), M9(2), M10(2) | M12(10) (0.096–0.106) |

Vis_C2 | Q_MXII = M12(P, E, 3.43, 0.16, 1.12, 0.46, 0.46, 0.62, 180.10, 7.05, 0.05, 0.34, 1.68) | 0.39 | 6 | M9(3), M10(1) | M12(10) (0.39–0.535) |

Vis_C3 | Q_MXII = M12(P, E, 3.83, 0.17, 1.03, 0.27, 0.94, 0.62, 240.10, 8.99, 0.1, 0.32, 1.94) | 0.06 | 7 | M9(2), M10(1) | M12(10) (0.06–0.085) |

Target | Q_MXII = M12(P, E, β_Qq_UR, K_Qq_FR, Ce, K_Qq_SR, D_S, SiniFr_UR, Smax_UR, Smax_IR, m_QE_IR, β_E_UR, Tlag)Q_MXII = M12(P, E, 3, 0.2, 1, 1, 0.3, 0.5, 200, 10, 0.1, 0.1, 2) |

M3 and M4 submodels consist of two serially connected reservoirs (UR and FR) in which precipitation enters UR, whereas its excess storage is routed through FR (Figure 1). The outflow from UR is a power function of storage in M4 rather than a threshold function in M3 and CED|KG10 fails to capture this difference.

Other resultant model configurations show differences in terms of parameter values. Further investigation is required to identify the objective functions and other settings that enable Evolutionary Superflex to efficiently search model parameter space and find the optimum.

In terms of identifying the underlying model structure, the objective functions that are able to find the right structures (termed true positives) in at least five out of ten independent runs are noted. Comparing Tables 2 and 3, the combined objective function Borsanyi has performed consistently well in evolving the correct model structure as the one with minimum fitness in 50% of the total runs considering both synthetic case studies of varying complexities.

The most commonly occurring false positives (wrong model structures established as the ones with minimum fitness in respective runs) in the case of simulations using *Q_MIV* and *Q_MXII* are M9, M3 and M9, M10, respectively. The objective functions resulting in less than five true positives and more than two different false positives in the case of *Q_MIV* and *Q_MXII* simulations are Dawson, Mai0, NS0, Vis_C2 and CED|KG10, RD0, respectively. The top ten model configurations (in terms of training fitness) resulting from 50 generations of ten independent runs (500 generations) using each of the 14 objectives are analysed. It is found that the objective functions which identified the right model structure all ten times are Dawson, Mai0, Madsen, Price, RD0, Vis_C1, Vis_C2 (7 out of 14) and Borsanyi, Dawson, KG10, Mai0, MD0, Madsen, NS0, Price, RD0, Vis_C1, Vis_C2, Vis_C3 (12 out of 14) in the case of *Q_MIV* and *Q_MXII* simulations, respectively. The objectives other than those mentioned above fail to adequately distinguish between hypotheses (submodels) of the model structure space in the respective case studies. It is evident that greater efficiency in searching model structure space is observed in the case of M12 as compared to M4. This might be due to the fact that M12 is the most unique submodel of GP function set consisting of four reservoirs with both serial and parallel connections, whereas M4 is reflected as not significantly different from M3 (differs from M4 only in terms of UR outflow function) and M9 (differs from M4 in terms of UR outflow function and excess UR storage being routed through two parallel reservoirs FR and SR).

The variability of parameter values for the best model configurations simulating *Q_MIV* and *Q_MXII* are analysed using boxplots shown in Figures 2 and 3. For ease of comparison, all associated parameter values are normalized between 0 and 1. Dots indicate the spread of parameter values and the lines connect the parameters of multiple feasible descriptions (model configurations with corresponding best training fitnesses and negative SIS values) of the target. From Figure 2, it can be observed that the least and the most uncertain parameters of evolved M4 model configurations are Smax_UR (maximum capacity of UR) and *β*_E_UR (smoothing factor for function governing evaporation loss from UR), respectively. In resultant M4 configurations whose structure consists of serially connected UR and FR, greater variation is observed in the parameters of flux functions of UR, namely, SiniFr_UR (Intial UR storage = SiniFr_UR*Smax_UR), *β*_E_UR and *β*_Qq_UR (outflow function parameter).

Figure 3 shows that the least and the most uncertain parameters of the resultant M12 configurations whose structure consists of serial network of IR and UR before parallelly connected FR and SR are K_Qq_FR and D_S, respectively. K_Qq_FR denotes hydraulic conductivity of outflow from FR and D_S is the splitter that diverts excess storage of UR to FR and SR, which implicitly governs the outflows of FR and SR. Greater variation is visually evident in parameters of IR, UR and SR of M12 configurations.

In GP runs simulating Q_MIV and Q_XII, the lowest variation is observed in the values of evaporation multiplication factor (Ce) and FR parameters (*α*_Qq_FR, K_Qq_FR).

Evolutionary Superflex approach shows greater uncertainty in filtering out combinations of parameters from model parameter space given that inherent model structure is accurately found at most instances. Beven (2006) suggests that this uncertainty should be viewed as a problem of decidability between different probable descriptions of the working of hydrological systems. Sources of uncertainty include model structure (process representations), numerical solution approximations and parameter values. Many different parameter sets within a chosen model structure may be acceptable in reproducing the observed behaviour of that system and this concept is called ‘Equifinality’.

Literature suggests that uncertainty estimation techniques allowing for equifinality can be broadly classified as formal (e.g., Bayesian) limited by requirements on prior knowledge and informal (e.g., generalized likelihood uncertainty estimation, GLUE) limited by subjectivity but simple (Vrugt *et al.* 2009). Here, GLUE, a useful paradigm that avoids overconditioning and estimates prediction quantiles is chosen for uncertainty analysis. All Evolutionary Superflex configurations generated using 14 objective functions that have rightly induced the inherent model structures M4 and M12 for *Q_MIV* and *Q_MXII*, respectively, are listed and the corresponding parameter sets are chosen for uncertainty analysis using GLUE. From the chosen parameter sets belonging to behavioural model space, elimination of those that do not perform adequately is carried out based on SIS criterion (likelihood measure) and the threshold is set as 0. More weight is assigned to parameter sets that result in negative SIS values and the prediction boundaries are constructed as shown in Figures 4 and 5. The percentage of observed values falling outside the prediction limits are observed as 1.49% and 3.58% for *Q_MIV* and *Q_MXII*, respectively. The width of the prediction boundary for *Q_MIV* (0.67) is smaller than for *Q_MXII* (0.99), which can be attributed to higher complexity of inherent model (M12) of *Q_MXII*. It can be seen in Figures 4 and 5 that the observed values closely follow the 50% quantile (median) of the simulated values. It has also been verified that uncertainty bands derived using parameter sets given by Evolutionary Superflex are narrower than using parameter sets randomly drawn from uniform distributions (with user specified parameter ranges) for the same likelihood measure (SIS).

In order to select the most appropriate, the best model configurations (Tables 2 and 3) of both the synthetic case studies are evaluated with respect to biases in FDC signatures. The FDCs (Figures 6 and 7) reflect the differences in the behaviour among the best configurations of *Q_MIV* and *Q_MXII* with negative SIS values.

Figure 6 illustrates that M4 model configuration evolved based on Vis_C1 with the least SIS value (−3.1) matches most of the *Q_MIV* characteristics showing bias in FHV and negligible bias (less than ) with respect to other FDC signatures (FMV, FMS and FLV) (see subplot of Figure 6 entitled ‘Vis_C1’). KG20 (SIS = −2.69) and Madsen (SIS = −2.16) also resulted in good overall approximation of the target with slightly greater deviations in FMS and FMS, FLV respectively (see subplots of Figure 6 entitled ‘KG20’ and ‘Madsen’), when compared to the best objective Vis_C1. Although the wrong model structure M3 (differs from M4 in terms of UR outflow function) identified as the best by CED|KG10 has a negative SIS value, it results in the poorest performance with respect to low flows (see subplot of Figure 6 entitled ‘CED_new’).

Figure 7 shows that M12 model configurations evolved based on Madsen (SIS = −3.05), Vis_C3 (SIS = −2.83) and Vis_C2 (−2.49) perform well in comparison to others. The FDC signatures of Madsen, Vis_C2 and Vis_C3 show that high and medium flows are better approximated by M12 model configuration evolved based on Vis_C3 in comparison to Madsen and Vis_C2. Intermediate flows are well captured by M12 model configuration evolved based on Madsen. Considering low flows, the lowest value of FLV bias is observed in the case of Vis_C2.

Comparing Figures 6 and 7, it is evident that the optimal model reported by Evolutionary Superflex approach in the case of simulations using *Q_MIV* is superior (close match to the target both in terms of structure and parameters) to those using *Q_MXII*. This can be attributed to higher complexity of M12 submodel both in terms of structure and number of parameters to be optimized. It can be concluded that combined objective functions, say, Vis_C1, KG20 and Madsen for *Q_MIV* simulations and Madsen, Vis_C3 and Vis_C2 for *Q_MXII* simulations resulted in Evolutionary Superflex performing better compared to others. Madsen fitness metric is observed as the common best objective irrespective of the complexity of model search space.

Tables 4 and 5 show the performance evaluation of the best model configurations of Evolutionary Superflex simulations using *Q_MIV* (Table 2) and *Q_MXII* (Table 3), respectively, based on hydrological efficiency criteria KGE1, KGE2, NSE and VE, correlation (r) and numerical accuracy metrics MAE, MD, RMSE and PI. No considerable variation with respect to SUSE, KGE1, KGE2, NSE, MD and r is observed (except in the case of RD0 for *Q_MXII* simulations) making them unfit to decide between different system representations. Very high values of Pearson correlation coefficient (r) (see Tables 4 and 5) associated with all the best configurations indicate the absence of phase errors and good agreement of observed and simulated means and variances.

Objective functions (optimum) . | SUSE (0) . | KGE1 (1) . | KGE2 (1) . | MAE (0) . | MD (1) . | NSE (1) . | PI (1) . | r (1) . | RMSE (0) . | VE (1) . |
---|---|---|---|---|---|---|---|---|---|---|

Borsanyi | 0.0028 | 0.996 | 0.994 | 0.0026 | 0.979 | 0.998 | 0.997 | 0.999 | 0.088 | 0.951 |

CED|KG10 | 0.0024 | 0.962 | 0.964 | 0.0025 | 0.891 | 0.94 | 0.914 | 0.971 | 0.442 | 0.741 |

Dawson | 0.005 | 0.983 | 0.989 | 0.0056 | 0.967 | 0.996 | 0.994 | 0.998 | 0.116 | 0.922 |

KG10 | 0.013 | 0.992 | 0.992 | 0.0015 | 0.939 | 0.985 | 0.978 | 0.992 | 0.222 | 0.852 |

KG20 | 9.7 × 10^{−4} | 0.995 | 0.995 | 0.0016 | 0.969 | 0.991 | 0.988 | 0.996 | 0.167 | 0.927 |

Mai0 | 0.025 | 0.922 | 0.909 | 0.064 | 0.958 | 0.991 | 0.987 | 0.996 | 0.17 | 0.903 |

MD0 | 0.0064 | 0.957 | 0.971 | 0.025 | 0.97 | 0.996 | 0.995 | 0.999 | 0.111 | 0.929 |

Madsen | 0.0064 | 0.983 | 0.982 | 0.013 | 0.984 | 0.999 | 0.998 | 0.999 | 0.066 | 0.963 |

NS0 | 0.011 | 0.973 | 0.961 | 0.023 | 0.962 | 0.996 | 0.994 | 0.998 | 0.114 | 0.910 |

Price | 0.0052 | 0.983 | 0.987 | 0.003 | 0.975 | 0.998 | 0.997 | 0.999 | 0.082 | 0.942 |

RD0 | 0.027 | 0.913 | 0.927 | 0.012 | 0.956 | 0.982 | 0.974 | 0.995 | 0.243 | 0.893 |

Vis_C1 | 0.0036 | 0.99 | 0.988 | 0.0017 | 0.989 | 0.999 | 0.999 | 0.999 | 0.055 | 0.974 |

Vis_C2 | 0.011 | 0.958 | 0.951 | 0.034 | 0.981 | 0.998 | 0.997 | 0.999 | 0.09 | 0.954 |

Vis_C3 | 0.0031 | 0.978 | 0.986 | 0.012 | 0.978 | 0.998 | 0.997 | 0.999 | 0.084 | 0.949 |

Objective functions (optimum) . | SUSE (0) . | KGE1 (1) . | KGE2 (1) . | MAE (0) . | MD (1) . | NSE (1) . | PI (1) . | r (1) . | RMSE (0) . | VE (1) . |
---|---|---|---|---|---|---|---|---|---|---|

Borsanyi | 0.0028 | 0.996 | 0.994 | 0.0026 | 0.979 | 0.998 | 0.997 | 0.999 | 0.088 | 0.951 |

CED|KG10 | 0.0024 | 0.962 | 0.964 | 0.0025 | 0.891 | 0.94 | 0.914 | 0.971 | 0.442 | 0.741 |

Dawson | 0.005 | 0.983 | 0.989 | 0.0056 | 0.967 | 0.996 | 0.994 | 0.998 | 0.116 | 0.922 |

KG10 | 0.013 | 0.992 | 0.992 | 0.0015 | 0.939 | 0.985 | 0.978 | 0.992 | 0.222 | 0.852 |

KG20 | 9.7 × 10^{−4} | 0.995 | 0.995 | 0.0016 | 0.969 | 0.991 | 0.988 | 0.996 | 0.167 | 0.927 |

Mai0 | 0.025 | 0.922 | 0.909 | 0.064 | 0.958 | 0.991 | 0.987 | 0.996 | 0.17 | 0.903 |

MD0 | 0.0064 | 0.957 | 0.971 | 0.025 | 0.97 | 0.996 | 0.995 | 0.999 | 0.111 | 0.929 |

Madsen | 0.0064 | 0.983 | 0.982 | 0.013 | 0.984 | 0.999 | 0.998 | 0.999 | 0.066 | 0.963 |

NS0 | 0.011 | 0.973 | 0.961 | 0.023 | 0.962 | 0.996 | 0.994 | 0.998 | 0.114 | 0.910 |

Price | 0.0052 | 0.983 | 0.987 | 0.003 | 0.975 | 0.998 | 0.997 | 0.999 | 0.082 | 0.942 |

RD0 | 0.027 | 0.913 | 0.927 | 0.012 | 0.956 | 0.982 | 0.974 | 0.995 | 0.243 | 0.893 |

Vis_C1 | 0.0036 | 0.99 | 0.988 | 0.0017 | 0.989 | 0.999 | 0.999 | 0.999 | 0.055 | 0.974 |

Vis_C2 | 0.011 | 0.958 | 0.951 | 0.034 | 0.981 | 0.998 | 0.997 | 0.999 | 0.09 | 0.954 |

Vis_C3 | 0.0031 | 0.978 | 0.986 | 0.012 | 0.978 | 0.998 | 0.997 | 0.999 | 0.084 | 0.949 |

Objective functions (optimum) . | SUSE (0) . | KGE1 (1) . | KGE2 (1) . | MAE (0) . | MD (1) . | NSE (1) . | PI (1) . | r (1) . | RMSE (0) . | VE (1) . |
---|---|---|---|---|---|---|---|---|---|---|

Borsanyi | 0.002 | 0.971 | 0.982 | 0.006 | 0.973 | 0.993 | 0.951 | 0.997 | 0.071 | 0.932 |

CED|KG10 | 0.03 | 0.953 | 0.936 | 0.015 | 0.905 | 0.951 | 0.627 | 0.975 | 0.196 | 0.759 |

Dawson | 0.015 | 0.985 | 0.988 | 0.004 | 0.971 | 0.997 | 0.974 | 0.998 | 0.052 | 0.926 |

KG10 | 0.013 | 0.992 | 0.992 | 1.3 × 10^{−4} | 0.94 | 0.985 | 0.887 | 0.993 | 0.108 | 0.848 |

KG20 | 0.022 | 0.975 | 0.978 | 0.0018 | 0.922 | 0.965 | 0.738 | 0.983 | 0.164 | 0.8 |

Mai0 | 0.022 | 0.904 | 0.941 | 0.018 | 0.954 | 0.982 | 0.867 | 0.995 | 0.117 | 0.878 |

MD0 | 0.023 | 0.981 | 0.974 | 0.0073 | 0.973 | 0.997 | 0.977 | 0.999 | 0.049 | 0.930 |

Madsen | 0.029 | 0.966 | 0.963 | 0.0012 | 0.966 | 0.990 | 0.921 | 0.995 | 0.09 | 0.911 |

NS0 | 0.0029 | 0.992 | 0.988 | 0.0014 | 0.966 | 0.994 | 0.956 | 0.997 | 0.067 | 0.912 |

Price | 0.016 | 0.991 | 0.989 | 0.0017 | 0.958 | 0.985 | 0.881 | 0.993 | 0.108 | 0.894 |

RD0 | 0.029 | 0.583 | 0.7 | 0.114 | 0.879 | 0.847 | −0.16 | 0.98 | 0.345 | 0.657 |

Vis_C1 | 0.039 | 0.917 | 0.916 | 1.4 × 10^{−4} | 0.952 | 0.982 | 0.863 | 0.994 | 0.119 | 0.875 |

Vis_C2 | 0.019 | 0.965 | 0.952 | 0.013 | 0.96 | 0.99 | 0.923 | 0.995 | 0.089 | 0.896 |

Vis_C3 | 0.0073 | 0.987 | 0.979 | 0.004 | 0.973 | 0.996 | 0.968 | 0.998 | 0.057 | 0.930 |

Objective functions (optimum) . | SUSE (0) . | KGE1 (1) . | KGE2 (1) . | MAE (0) . | MD (1) . | NSE (1) . | PI (1) . | r (1) . | RMSE (0) . | VE (1) . |
---|---|---|---|---|---|---|---|---|---|---|

Borsanyi | 0.002 | 0.971 | 0.982 | 0.006 | 0.973 | 0.993 | 0.951 | 0.997 | 0.071 | 0.932 |

CED|KG10 | 0.03 | 0.953 | 0.936 | 0.015 | 0.905 | 0.951 | 0.627 | 0.975 | 0.196 | 0.759 |

Dawson | 0.015 | 0.985 | 0.988 | 0.004 | 0.971 | 0.997 | 0.974 | 0.998 | 0.052 | 0.926 |

KG10 | 0.013 | 0.992 | 0.992 | 1.3 × 10^{−4} | 0.94 | 0.985 | 0.887 | 0.993 | 0.108 | 0.848 |

KG20 | 0.022 | 0.975 | 0.978 | 0.0018 | 0.922 | 0.965 | 0.738 | 0.983 | 0.164 | 0.8 |

Mai0 | 0.022 | 0.904 | 0.941 | 0.018 | 0.954 | 0.982 | 0.867 | 0.995 | 0.117 | 0.878 |

MD0 | 0.023 | 0.981 | 0.974 | 0.0073 | 0.973 | 0.997 | 0.977 | 0.999 | 0.049 | 0.930 |

Madsen | 0.029 | 0.966 | 0.963 | 0.0012 | 0.966 | 0.990 | 0.921 | 0.995 | 0.09 | 0.911 |

NS0 | 0.0029 | 0.992 | 0.988 | 0.0014 | 0.966 | 0.994 | 0.956 | 0.997 | 0.067 | 0.912 |

Price | 0.016 | 0.991 | 0.989 | 0.0017 | 0.958 | 0.985 | 0.881 | 0.993 | 0.108 | 0.894 |

RD0 | 0.029 | 0.583 | 0.7 | 0.114 | 0.879 | 0.847 | −0.16 | 0.98 | 0.345 | 0.657 |

Vis_C1 | 0.039 | 0.917 | 0.916 | 1.4 × 10^{−4} | 0.952 | 0.982 | 0.863 | 0.994 | 0.119 | 0.875 |

Vis_C2 | 0.019 | 0.965 | 0.952 | 0.013 | 0.96 | 0.99 | 0.923 | 0.995 | 0.089 | 0.896 |

Vis_C3 | 0.0073 | 0.987 | 0.979 | 0.004 | 0.973 | 0.996 | 0.968 | 0.998 | 0.057 | 0.930 |

In Table 4, the low RMSE values of 0.055 and 0.066 correspond to the best M4 model configurations induced based on Vis_C1 and Madsen, respectively, which is in agreement with the decision on the most suitable configurations made using SIS criterion. On the other hand, the highest RMSE and the lowest VE (see Table 4) observed in the case of model configuration induced based on CED|KG10 for *Q_MIV* is not to be considered, although it has a negative SIS value. KG20 is associated with better MAE (1.6 × 10^{−3}) than Madsen (0.013) while RMSE of Madsen (0.066) is superior to KG20 (0.167). Both SIS criterion and performance evaluation based on a wide range of metrics indicate that M4 configurations evolved based on KG20 (QGP_KG20), Madsen (QGP_Madsen) and Vis_C1 (QGP_Vis_C1) can be considered the most suitable for Weierbach catchment synthetic dataset used in this study (see Figure 8).

Table 5 shows that the best M12 model configurations evolved based on CED|KG10 (SIS = 1.25) and RD0 (SIS = 6.18) with low PI values of 0.63 and −0.16, high RMSE values of 0.2 and 0.34, low VE values of 0.76 and 0.66, respectively, can be ignored. The best configurations evolved using the objective functions Borsanyi, Dawson, MD0, Madsen, NS0, Vis_C2 and Vis_C3 have better fitness in comparison to the rest (Table 5). Among the above mentioned, all configurations have negative SIS values except for those evolved using Dawson (SIS = 1.03) and MD0 (SIS = 1.45). As the complexity of the system increases, the model selection decisions based on FDC signature indices and other performance evaluation metrics do not match each other completely. In such cases, the choice based on SIS is considered superior as it equally weights different flow aspects. Thus, M12 model configurations evolved using Madsen (QGP_Madsen), Vis_C2 (QGP_Vis_C2) and Vis_C3 (QGP_Vis_C3) are considered the most suitable for Huewelerbach catchment synthetic dataset used in this study (see Figure 9).

## SUMMARY AND FUTURE WORK

This review provides an assessment of abilities of different objective functions that enable Evolutionary Superflex modelling framework to identify the most suitable model configurations (structure + parameters set) that reproduce several target flow regime signatures simultaneously. As expected, model configurations evolved based on one criterion objective function do not ensure good agreement with respect to others. NS0 has performed better than the other one criterion objective functions used in this study. Combined objective functions have delivered good performance across a range of flow characteristics. The proposed approach performs well with respect to refinement of model structure space using objective functions defined in this study. Greater variation is observed in filtering out feasible parameter combinations from model parameter space. The uncertainty bands of simulated model parameters are quantified using GLUE approach and it is found that observed values closely follow the median of simulated values. The uncertainty in the prediction of parameter values increases as the complexity of the structure increases. Among the five types of reservoirs (IR, RR, UR, FR and SR) that constitute the structures of submodels employed in this study, the parameters of FR show the least variation. The best configurations of Evolutionary Superflex approach evolved using each of the 14 objective functions can be considered as multiple descriptions of the system under study. Deciding on the most suitable configurations among them is done using SIS criterion and other selected performance evaluation metrics. Each performance evaluation metric measures one aspect of the difference between observed and simulated responses. Some of the metrics, say, SUSE, KGE, NSE, MD and r assign similar values to different hydrographs and are not suitable for selection of the most suitable configurations. Difference is observed between selection of suitable configurations based on SIS criterion and other performance evaluation metrics, as complexity of the underlying system increases. The model configuration selected based on SIS criterion overrules in case of disparity as SIS equally weights (different portions of the hydrograph). The combined objective functions, say, Madsen, KG20, Vis_C1 and Madsen, Vis_C2, Vis_C3 are found to be the best for induction of inherent models of Weierbach and Huewelerbach catchments synthetic datasets, respectively. Madsen metric that equally weights agreement of overall volume, overall shape of system response, high and low flows is found as the common best objective for both case studies, irrespective of the complexity of search space. It remains a challenge to pinpoint an objective function that brings out the best performance of Evolutionary Superflex approach in effectively searching the model space to evolve optimal conceptual model constructs and parameters for given datasets. This study aims to run the evolutionary modelling framework using different, strategically aggregated objective functions and to select the most suitable model based on the collective analysis of results obtained.

This research will be extended to capture hydrological processes of larger and complex catchments in order to reinforce the current findings. An attempt can be made to constrain equifinal parameters to ranges associated with observed catchment behaviour using identifiability analysis which can result in narrow uncertainty bands. Also, the efficiency of the proposed modelling approach will be tested in a multiobjective optimization context.

## ACKNOWLEDGEMENTS

The authors thank Dr Laurent Pfister, Luxembourg Institute of Science and Technology (LIST) for granting the permission to use Luxembourg catchments dataset for this research study. Additionally, we are grateful to Dr Fabrizio Fenicia for his assistance with Superflex framework.

## REFERENCES

*NCAR Tech. Note NCAR/TN-5141STR*

*.*