Review and comparison of performance indices for automatic model induction

One of the more perplexing challenges for the hydrologic research community is the need for development of coupled systems involving integration of hydrologic, atmospheric and socioeconomic relationships. Given the demand for integrated modelling and availability of enormous data with varying degrees of (un)certainty, there exists growing popularity of data-driven, unified theory catchment scale hydrological modelling frameworks. Recent research focuses on representation of distinct hydrological processes using mathematical model components that vary in a controlled manner, thereby deriving relationships between alternative conceptual model constructs and catchments’ behaviour. With increasing computational power, an evolutionary approach to autoconfiguration of conceptual hydrological models is gaining importance. Its successful implementation depends on the choice of evolutionary algorithm, inventory of model components, numerical implementation, rules of operation and fitness functions. In this study, genetic programming is used as an example of evolutionary algorithm that employs modelling decisions inspired by the Superflex framework to automatically induce optimal model configurations for the given catchment dataset. The main objective of this paper is to identify the effects of entropy, hydrological and statistical measures as optimization objectives on the performance of the proposed approach based on two synthetic case studies of varying complexity. doi: 10.2166/hydro.2017.078 s://iwaponline.com/jh/article-pdf/21/1/13/517640/jh0210013.pdf Jayashree Chadalawada (corresponding author) Department of Civil and Environmental Engineering, National University of Singapore, Block E1-08-24, No. 1 Engineering Drive 2, Singapore 117578 E-mail: jayashree@u.nus.edu Vladan Babovic Department of Civil and Environmental Engineering, National University of Singapore, Block E1A05-09, No. 1 Engineering Drive 2, Singapore 117576


OBJECTIVE FUNCTIONS
The model identification and its performance evaluation  In this review, a single objective optimization scheme is adopted to clearly assess each of the 14 objective functions highlighted in Table 1.The objective functions defined consist of one criterion or combined criteria representing different aspects of streamflow.This includes three one criterion (NS0, MD0 and RD0) and the rest as combined criteria objective functions.
The main characteristics of the metrics used in this study can be summarized as follows: 1. Nash-Sutcliffe efficiency (NSE) and mean absolute relative error (MARE) are sensitive to peak and low flows, respectively.
2. Modified NSE and logarithmic NSE emphasize low flows.
3. Mean absolute error (MAE) measures the agreement between the average simulated and observed catchment runoff volume and root mean squared error (RMSE) measures the overall agreement of hydrograph shape.
Range: À∞ to 1 (best) Best estimate for future is given by the latest value Detects additive and proportional differences in observed and simulated means and variances 6. Kling-Gupta efficiency (KGE) combines correlation, bias and variability into one objective metric and is proven to have overcome the problems associated with NSE.Two Superflex submodels M4 and M12 are used in the generation of synthetic streamflow series for Weierbach (Q_MIV) and Huewelerbach (Q_MXII) catchments, respectively.Among the Superflex submodels considered in this study, M12 is the most complex while M4 has a comparatively simpler structure (see Figure 1).The idea behind using synthetic data is to assess the ability of the objective function to guide GP towards the optimal known solution.In the case of synthetic data, the existence of the optimal solution can be guaranteed.The entire length of data (five years) is used for every simulation.
First, one month is reserved for model spin up.All the reservoirs of the 12 selected Superflex submodels are initially assumed to be empty except for the UR reservoir that is initialized to a percentage of its maximum capacity.
Euler implicit stepping scheme with fixed daily time steps is used in the flux computations.In this setting, GP minimizes the objective functions, evolves optimal model configurations using components (Superflex submodels and basic algebraic functions) of the function set, terminal set (P, E, constants) and other settings listed in the Appendix (Table A1).For each of the 14 objective functions listed in

RESULTS
The equations of resultant model configurations induced by Evolutionary Superflex simulations using two synthetic case studies are presented in Tables 2 and 3, respectively.It is evident that the respective underlying model structures are induced correctly for all objective functions except for CED|KG10 under Q_MIV.
M3 and M4 submodels consist of two serially connected reservoirs (UR and FR) in which precipitation enters UR, whereas its excess storage is routed through FR (Figure 1).
The outflow from UR is a power function of storage in M4 rather than a threshold function in M3 and CED|KG10 fails to capture this difference.
Other resultant model configurations show differences in terms of parameter values.Further investigation is required to identify the objective functions and other settings that enable Evolutionary Superflex to efficiently search model parameter space and find the optimum.
In terms of identifying the underlying model structure, the objective functions that are able to find the right structures (termed true positives) in at least five out of ten independent runs are noted.Comparing Tables 2 and 3       that uncertainty bands derived using parameter sets given by Evolutionary Superflex are narrower than using parameter sets randomly drawn from uniform distributions (with user specified parameter ranges) for the same likelihood measure (SIS).
In order to select the most appropriate, the best model configurations (Tables 2 and 3) of both the synthetic case studies are evaluated with respect to biases in FDC signatures.The FDCs (Figures 6 and 7) reflect the differences in the behaviour among the best configurations of Q_MIV and Q_MXII with negative SIS values.Comparing Figures 6 and 7, it is evident that the optimal model reported by Evolutionary Superflex approach in the case of simulations using Q_MIV is superior (close match to the target both in terms of structure and parameters) to those using Q_MXII.This can be attributed to higher complexity of M12 submodel both in terms of structure and number of parameters to be optimized.It can be concluded Tables 4 and 5 show the performance evaluation of the best model configurations of Evolutionary Superflex simulations using Q_MIV (Table 2) and Q_MXII (  4 and 5) associated with all the best configurations indicate the absence of phase errors and good agreement of observed and simulated means and variances.
In        better fitness in comparison to the rest (Table 5).Among figuration using minimum data and FUSE as sample model inventory.The success of an ideal data-driven multi-component modelling framework is not only to perform automatic selection from certain hypotheses, but also to search an entire model space extensively to evolve the most suitable combinations of conceptual model components and generate novel modelling philosophies to fill knowledge gaps.Evolutionary computation (EC) techniques (Goldberg ; Babovic et al. , ; Babovic & Abbott ; Babovic & Keijzer ) mimic the mechanisms of natural evolution to provide global or near global solutions to real world optimization problems.Literature shows the successful application of EC techniques in modelling hydrological processes of complex and chaotic nature in application areas, namely, rainfallrunoff prediction (Khu et al. ; Babovic & Keijzer ), settling velocity estimation (Babovic et al. ), water quality (Jeong et al. ), sediment transport (Kizhisseri et al. ), model error correction (Zechman & Ranjithan ), reservoir operation (Rani & Moreira ), water distribution systems (Savic & Walters ; Bi et al. ), etc.Earlier studies involve the use of EC techniques for evolving empirical, grey box models for short-term prediction rather than harnessing the idea of automated knowledge evolution and improving our current understanding of the system.Recent studies have attempted to incorporate hydrological concepts into the framework of an evolutionary data-driven algorithm, say, genetic programming (GP) (Koza ) to automatically estimate the input (rainfall) history for the prediction of runoff (Havlícěk et al. ) and to automatically evolve Sugawara tank (Sugawara ) model configurations (Chadalawada et al. ) for the catchments of interest.In this study, GP is used in conjunction with Superflex titled Evolutionary Superflex framework (http://scholarbank.nus.edu.sg/handle/10635/138678) for automatic evolution of optimal model configuration for the given catchment.As mentioned earlier, Superflex (Fenicia et al. ; Kavetski & Fenicia ) is a multi-component hydrological modelling framework that consists of building blocks that are customized into spatially lumped flow network configurations.The building blocks are reservoirs, flux and lag functions, connection elements.Twelve submodels (M1 to should be based on ability to reproduce hydrological behaviour and capture properties of catchments rather than merely matching observed and simulated responses.There are a variety of performance indices that measure the ability of a hydrological model to reproduce observed streamflow.It is important for the simulated time series to preserve critical aspects of observed streamflow hydrograph (Vis et al. ) so that the governing model structure can also be used as a catchment classifier for predictions in ungauged basins.One of the recent studies (Shafii & Tolson ) highlights the point that the results are more sensitive to objective function formulation as opposed to the choice of optimization algorithm.Therefore, in this paper, a range of entropy, statistical and hydrological signatures listed in

4.
Volumetric efficiency (VE) measures the accuracy in prediction of overall runoff volume over the entire simulation period.5. Correlation coefficient (r) measures the success in replicating overall timing and magnitude of discharge but disregards differences in absolute values.

Figure 1 |
Figure1| Twelve Superflex submodels considered in this study(Fenicia et al. 2014).P, E and Q represent precipitation, potential evaporation and streamflow, respectively.S I , S U , S R , S F and S S represent storages of IR, UR zone, RR zone, fast reacting and slow reacting reservoirs, respectively.
7. Persistence index (PI) aims at comparing the performance of a model against a simple model using the observed value of the previous day as the prediction for the current day as opposed to a constant mean.This is based on the assumption that the process is a Wiener process, i.e., variance increases linearly with time.8. Index of agreement is based on a two-part squared distance measurement in which the absolute difference between model and simulated value and observed mean is added to the absolute difference between observed record and observed mean.Modified index agreement (MD) involves replacing the squared (power ¼ 2) differences that are sensitive to extreme values by different values of power.On the other hand, relative index of agreement (RD) quantifies the differences between observed and predicted values as relative deviations which reduce the influence of absolute differences during high flows and enhance the influence of absolute low flow differences.9. Signature measures of variability of runoff generation mechanisms (runoff coefficients and recession curves) are believed to capture significant catchment information that is not conveyed regularly using least squares or conventional variance-based measures of fit.Flow duration curve (FDC) can be considered a hydrologically informative signature of catchment behaviour.Entropy-based descriptors are more sensitive to subtle differences in streamflow (Pechlivanidis et al. ).Conditioned entropy difference (CED) is a diagnostic FDC signature representing static information of flow frequency distribution.CED is temporally insensitive and is said to result in better performance when combined with measures that characterize temporal dynamics such as KGE.The main objective of this work is to evaluate the ability of different objective functions in bringing out the best performance of the Evolutionary Superflex approach thereby resulting in simulated response that adequately preserves observed flow characteristics.The outline of the remainder of this paper is as follows.The next section provides a description of the synthetic dataset and Evolutionary Superflex simulations.The results and analysis of the simulations follow and finally the key conclusions are summarized.DATA AND METHODOLOGY Data used in this study are collected at headwater catchments Attert Basin, Luxembourg (Fenicia et al. ), over a period of five years from 1 September 2004 to 31 August 2009, covering a wide range of runoff events.Data consist of daily time steps of precipitation (P) and potential evaporation (E) in mm/h.Two synthetic streamflow time series are generated using the observed forcings (P and E) of two catchments, namely, Huewelerbach and Weierbach.
2) where s indicates model configuration induced by Evolutionary Superflex approach, a indicates type of FDC signature index, x its value, x sa j j indicates absolute values of every signature index, x a and σ a represent mean and standard deviation of x sa j j for all s.Z sa represents Z score or standardized values.SIS s represents the sum of standardized indices of each model configuration.Sum of Standardized Indices (SIS) is a combined measure of signature indices derived from FDC (Yilmaz et al. ) that distinguishes the merits of the evolved model alternatives.FDC signature indices considered in the formulation of SIS include high flows corresponding to extreme rainfall events (FHV (FDC high-segment volume): probability of flow exceedance <2%), intermediate flows between high and medium runoff (FMV (FDC intermediatesegment volume): probability of flow exceedance between 2% and 20%), medium flows covering the mid slope of FDC corresponding to vertical moisture distribution (FMS (FDC mid-segment slope): flow exceedance probability between 20% and 70%) and low flows corresponding to base flows (FLV (FDC low-segment volume) flow exceedance probability >70%).These indices identify the portion of hydrograph where bias is the highest.SIS compares the deviation of simulated FDCs from observed FDC weighting all parts of FDC equally.Negative SIS values indicate above average performance and the model configuration with the lowest SIS value is the one that performs the best for the given catchment.
, the combined objective function Borsanyi has performed consistently well in evolving the correct model structure as the one with minimum fitness in 50% of the total runs considering both synthetic case studies of varying complexities.The most commonly occurring false positives (wrong model structures established as the ones with minimum fitness in respective runs) in the case of simulations using Q_MIV and Q_MXII are M9, M3 and M9, M10, respectively.The objective functions resulting in less than five true positives and more than two different false positives in the case of Q_MIV and Q_MXII simulations are Dawson, Mai0, NS0, Vis_C2 and CED|KG10, RD0, respectively.The top of comparison, all associated parameter values are normalized between 0 and 1. Dots indicate the spread of parameter values and the lines connect the parameters of multiple feasible descriptions (model configurations with corresponding best training fitnesses and negative SIS values) of the target.From Figure 2, it can be observed that the least and the most uncertain parameters of evolved M4 model configurations are Smax_UR (maximum capacity of UR) and β_E_UR (smoothing factor for function governing evaporation loss from UR), respectively.In resultant M4 configurations whose structure consists of serially connected UR and FR, greater variation is observed in the parameters of flux functions of UR, namely, SiniFr_UR (Intial UR storage ¼ SiniFr_UR*S-max_UR), β_E_UR and β_Qq_UR (outflow function parameter).

Figure 3
Figure3shows that the least and the most uncertain parameters of the resultant M12 configurations whose structure consists of serial network of IR and UR before parallelly

Figure 2 |
Figure 2 | Parameter uncertainties in the best Q_MIV model configurations induced by Evolutionary Superflex simulations using 14 objective functions.

Figure 3 |
Figure 3 | Parameter uncertainties in the best Q_MXII model configurations induced by Evolutionary Superflex simulations using 14 objective functions.

Figure 6
Figure 6 illustrates that M4 model configuration evolved based on Vis_C1 with the least SIS value (À3.1) matches most of the Q_MIV characteristics showing 1% bias in FHV and negligible bias (less than ±0:2%) with respect to other FDC signatures (FMV, FMS and FLV) (see subplot of Figure 6 entitled 'Vis_C1').KG20 (SIS ¼ À2.69) and Madsen (SIS ¼ À2.16) also resulted in good overall approximation of the target with slightly greater deviations in FMS and FMS, FLV respectively (see subplots of Figure 6 entitled 'KG20' and 'Madsen'), when compared to the best objective

Figure 7
Figure 7 shows that M12 model configurations evolved based on Madsen (SIS ¼ À3.05), Vis_C3 (SIS ¼ À2.83) and Vis_C2 (À2.49) perform well in comparison to others.The FDC signatures of Madsen, Vis_C2 and Vis_C3 show that high and medium flows are better approximated by M12 model configuration evolved based on Vis_C3 in comparison to Madsen and Vis_C2.Intermediate flows are well captured by M12 model configuration evolved based on Madsen.Considering low flows, the lowest value of FLV bias is observed in the case of Vis_C2.
that combined objective functions, say, Vis_C1, KG20 and Madsen for Q_MIV simulations and Madsen, Vis_C3 and Vis_C2 for Q_MXII simulations resulted in Evolutionary Superflex performing better compared to others.Madsen fitness metric is observed as the common best objective irrespective of the complexity of model search space.

Figure 5 |
Figure 5 | Uncertainty ranges of predictions of Evolutionary Superflex derived using GLUE for a representative portion of Q_MXII.

Figure 6 |
Figure 6 | FDC and % bias of FDC signature indices of the best Q_MIV model configurations values evolved by Evolutionary Superflex simulations with negative SIS values.

Figure 7 |
Figure 7 | FDC and % bias of FDC signature indices of the best Q_MXII model configurations values evolved by Evolutionary Superflex simulations with negative SIS values.
the above mentioned, all configurations have negative SIS values except for those evolved using Dawson (SIS ¼ 1.03) and MD0 (SIS ¼ 1.45).As the complexity of the system increases, the model selection decisions based on FDC signature indices and other performance evaluation metrics do not match each other completely.In such cases, the choice based on SIS is considered superior as it equally weights different flow aspects.Thus, M12 model configurations evolved using Madsen (QGP_Madsen), Vis_C2 (QGP_Vis_C2) and Vis_C3 (QGP_Vis_C3) are considered the most suitable for Huewelerbach catchment synthetic dataset used in this study (see Figure 9).

Figure 8 |
Figure 8 | Observed and simulated hydrographs of the most suitable Evolutionary Superflex models for a portion of synthetic Weierbach catchment data.

Figure 9 |
Figure9| Observed and simulated hydrographs of the most suitable Evolutionary Superflex models for a portion of synthetic Huewelerbach catchment data.

Table 1 |
Objective functions of evolutionary superflex framework Mai et al.

Table 1
| continued : N and N all denote entire length of data, N h , N l are the number of time steps corresponding to high and low flow values, respectively, Qst and Qot denote simulated and observed streamflows, respectively.A i represents transformation constants such that F i þ A i have the same distance to the ideal point (origin), F i,min denotes minimum values of F i estimated from initial GP population.p(x i ) denotes probability of outcome x i such that the probabilities sum to 1, H unscaled , H scaled denote unscaled and scaled Shannon entropies and SUSE represents maximum of unscaled and scaled Shannon entropy difference.m is the number of FDC segments and sg is the probability of exceedance for each segment of FDC.σ0 and σs represent standard deviation of observed and simulated flows, respectively, μ 0 and μ s represent mean of observed and simulated flows, respectively.α denotes relative variability in simulated and observed flows, β (bias) denotes ratio between mean observed and mean simulated flows and γ represents ratio of coefficient of variation of simulated and observed flows. Notes

Table 1
istics, standardized signature index sum criterion (Ley et al.) is calculated:

Table 2 |
Results of evolutionary superflex simulations based on synthetic data generated using superflex submodel M4 Vis_C1, Vis_C2, Vis_C3 (12 out of 14) in the case of Q_MIV and Q_MXII simulations, respectively.The objectives other than those mentioned above fail to adequately distinguish between hypotheses (submodels) of the model structure space in the respective case studies.It is evident that greater The variability of parameter values for the best model configurations simulating Q_MIV and Q_MXII are analysed using boxplots shown in Figures 2 and 3.For ease

Table 3 |
Results of evolutionary superflex simulations based on synthetic data generated using superflex submodel M12

Table 4
using SIS criterion.On the other hand, the highest RMSE and the lowest VE (see Table4) observed in the case of model configuration induced based on CED|KG10 for Q_MIV is not to be considered, although it has a negative SIS value.KG20 is associated with better MAE (1.6 × 10 À3 )

Table 5
(SIS ¼ 6.18) with low PI values of 0.63 and À0.16, high RMSE values of 0.2 and 0.34, low VE values of 0.76 and 0.66, respectively, can be ignored.The best configurations evolved using the objective functions Borsanyi, Dawson, MD0, Madsen, NS0, Vis_C2 and Vis_C3 have

Table 4 |
Performance of the best configurations evolved by evolutionary superflex for Q_MIV using 14 objective functions

Table 5 |
Performance of the best configurations evolved by evolutionary superflex for Q_MXII using 14 objective functions