In the calibration of flood forecasting models, different objective functions and their combinations can lead to different simulation results and affect flood forecast accuracy. In this paper, the Xinanjiang model was chosen as the flood forecasting model and the shuffled complex evolution (SCE-UA) algorithm was used to calibrate it. The performance of different objective functions, and of their combinations formed with the aggregated distance measure, was assessed and compared. The impact of different peak flow thresholds in the objective functions was also assessed. Finally, a projection pursuit method was proposed to combine the four evaluation indexes into a composite measure of the performance of the flood forecasting model. The results showed that no single objective function could represent all the characteristics of the hydrograph shape simultaneously, and that significant trade-offs existed among the objective functions. Larger peak flow thresholds improved the simulation of peak flow at the expense of other aspects of the hydrograph. The evaluation results verified that the projection pursuit method is a potential choice for synthesizing multiple evaluation indexes.

## INTRODUCTION

Hydrological models have been widely used in flood forecasting, water resource management, impact studies of climate change and land-use change, and regional and global water balance calculations (Refsgaard *et al.* 1988; Xu *et al.* 1996; Kizza *et al.* 2011, 2013; Li *et al.* 2013; Gosling 2014; Li *et al.* 2014; McIntyre *et al.* 2014; Emam *et al.* 2015). Among these applications, flood forecasting is perhaps the earliest and most widely performed. Generally, the parameters of these models are not obtained directly from measurable catchment characteristics, but are calibrated using various algorithms and procedures so that the simulated hydrological process matches the hydrological behaviour of the catchment as closely as possible (Duan *et al.* 1992; Cheng *et al.* 2002; Goswami & O'Connor 2007; Jeon *et al.* 2014). The common goal of each optimization algorithm is to seek the optimal values of the model parameters according to a numerical measure of goodness-of-fit, by minimizing or maximizing an objective function. In general, a single overall objective function is used to measure the performance of the model calibration (Xu 2001; Vrugt *et al.* 2003; Jeon *et al.* 2014), such as the Nash-Sutcliffe efficiency coefficient (NSE), the coefficient of determination (R^{2}) or the relative volume error (RE) of the hydrograph. If RE is chosen as the objective function, the simulations may perform well on the total volume but poorly on the shape of the hydrograph and the peak flow (Moussa & Chahinian 2009). When NSE, R^{2} (Legates & McCabe 1999) or similar indexes are used as the objective function, the simulations give priority to matching the high flows, and these measures are oversensitive to extreme values. It is difficult to make the simulation match most of the characteristics of the observed hydrograph simultaneously when calibrating with a single objective function (Vrugt *et al.* 2003). 
Different objective functions are adopted for different purposes. However, when all the characteristics of a hydrograph demand to be reproduced in real applications, no single objective function is efficient enough to calibrate the selected model.

Many studies have applied multi-objective functions to calibrate the parameters of hydrological models, which have been verified to perform better than single ones (Madsen 2000, 2003; Moussa & Chahinian 2009; Prakash *et al.* 2015; Hailegeorgis & Alfredsen 2016). It would be advisable to take various objective functions into consideration by calibrating the model in a multi-objective framework (Yapo *et al.* 1998; Liu & Sun 2010; Li *et al.* 2010; Moussu *et al.* 2011). Dickinson (1973) uses a minimum variance criterion to yield a composite forecast by estimating the weights of different forecasts. Yapo *et al.* (1998) present an efficient and effective algorithm for solving the multi-objective global optimization problem. Dong *et al.* (2013) combine the results of diverse models by using the Bayesian Model Averaging method and an integrated result is obtained. The most widely used method is to transform a multi-objective problem into a single objective problem by using aggregated distance measure (Yapo *et al.* 1998; Madsen 2000; Moussa & Chahinian 2009). Madsen (2003) aggregates two objective functions in a distributed hydrological modelling system to propose a proper balance between groundwater flow and surface runoff. Moussa & Chahinian (2009) composite two or more objective functions into a balanced objective function.

In most continuous models, the objective functions are used to represent different parts of the hydrograph, which generally include the volume error, the root mean square error (RMSE) of the overall hydrograph, high flow events, middle flow events and low flow events. Peak flow, middle flow and low flow events are generally separated from the whole hydrograph by using a pre-defined threshold value (Madsen 2000; Van *et al.* 2009; Liu & Sun 2010). Madsen (2000) and Liu & Sun (2010) define the peak flow and low flow events as periods with flow above or below a threshold, and a compromise solution is obtained between peak flow and low flow. Van *et al.* (2009) combine a middle flow objective function, calculated using flows between the 30th and 70th percentiles of the flow duration curve, with other objective functions. In real-time flood forecasting research, by contrast, event-based flood models are widely adopted to investigate multi-objective calibration issues. Peak flow reproduction is essential for flood forecasting exercises (Chahinian *et al.* 2005; Moussa & Chahinian 2009). The selection of the threshold value of peak flow or low flow events in these cases is clearly subjective, and its reasonableness needs to be discussed alongside other issues in flood forecasting; however, the effect of this subjectivity is not well addressed in the literature. Thus, in this study, the reasonableness of the threshold of the peak flow events is analysed by using a series of equally spaced thresholds. It is important to note that trade-offs exist between different objective functions, so that a Pareto front or Pareto surface corresponding to a number of parameter sets can be obtained when solving multi-objective optimization problems (Gupta *et al.* 1998; Messac & Mattson 2002; Madsen 2003).

The performance of a simulated hydrograph is evaluated by a series of evaluation indexes such as RE, NSE, the ratio of annual simulated runoff volume to annual observed runoff volume (Vsim/Vobs) (Yu & Yang 2000), and RMSE, etc. Some of them may be highly correlated, mutually opposing or even contradictory to each other. Therefore, integrating multiple evaluation indexes into a comprehensive value is necessary to identify the simulation performance effectively. Many multivariate statistical techniques have been applied for the analysis of high dimensional data (Wang *et al.* 2006; Wang & Ni 2008). Some of them are intractable in data processing, while the projection pursuit algorithm has superior ability on highly dimensional data processing so that it has been widely used in many fields such as forecasting, water resources assessment, environmental protection and so on (Rajeevan *et al.* 2007; Zhang & Dong, 2009). In this study, the projection pursuit method is used to solve the multi-index evaluation problem.

The main goal of this paper is to assess the effects of different objective functions and their combinations in calibrating flood forecasting models, and a new evaluation method is proposed to assess the performance of a flood forecasting model when evaluation indexes are more than two. The main goal is achieved through the following steps: (1) the Xinanjiang (XAJ) model is chosen as the flood forecasting model and the shuffled complex evolution (SCE-UA) algorithm (Duan *et al.* 1993) is used to calibrate the model; (2) the performances of different single objective functions and multiple objective functions are assessed in model calibration; (3) the impact of different threshold values of the peak flow on flood forecasting is evaluated; and (4) the practicability of the projection pursuit method is verified through the evaluation of the performance of the flood forecasting model.

## STUDY AREA AND DATA

### Study area

[…] km^{2} and a river length of 126 km is selected to perform the study, which is a main upstream tributary of Minjiang River located in southeastern China. The physiography of the basin consists of highly dissected topography with steep slopes and high stream densities (Figure 1). The surface of this area is covered by well drained yellow-red soil derived from a variety of parent materials such as sandstone, shale and granite. The forest coverage of this region is above 80% of the total area, making it one of the regions with the highest percentage of forest cover in China (Hu *et al.* 2014). This also means that human activities have little impact on the runoff of the basin. Its climate is dominated by the southeast Pacific Ocean and the southwest Indian Ocean subtropical monsoons and the regional landforms (Tang *et al.* 2013), which bring abundant long-term and extensive rainfall. The mean annual precipitation is 1,700 mm, of which 70–80% occurs in the rainy season from April to September. The average runoff depth is 1,100 mm and the runoff coefficient is 0.64. The catchment can be classified as humid, with abundant soil moisture and high vegetation cover, and the flow generation mechanism is saturation excess runoff.

In this basin, floods are generally caused by two types of rain: typhoon rain and convective rain of short duration and high intensity. The former is caused by typhoon weather systems, which occur mostly from July to September. The latter results from the interaction between mid-latitude and low-latitude weather systems, which often brings frontal rainstorms and generally occurs from April to June. The geography and climate of this area make it a centre of heavy rain and high flood risk, which often greatly threatens the safety of life and property in the basin. In June 1998, a very serious flood event with a return period of up to 200 years occurred in the basin and caused huge financial losses to local residents (Zhang & Hall 2004). It is therefore important to establish a flood forecasting model for public safety and water management in the study area.

### Data and their characteristics

## MODEL AND OBJECTIVE FUNCTIONS

### Xinanjiang model

The Xinanjiang model is a conceptual rainfall-runoff model proposed by Zhao in the 1970s (Zhao *et al.* 1980). It has been extensively and successfully used for flood simulation and operational forecasting in the humid and semi-humid regions of China (Hu *et al.* 2005; Li *et al.* 2009; Yao *et al.* 2009; Yan *et al.* 2016). The main feature of the model is the concept of runoff formation on repletion of storage, together with the storage capacity curve. This concept implies that runoff is not produced until the soil water content of the aeration zone reaches field capacity (Zhao 1992). Zhao *et al.* (1995) used the storage capacity curve to handle the uneven spatial distribution of the soil moisture deficit.
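The idea of runoff formation on repletion of storage can be illustrated with a short sketch. The parabolic capacity curve and runoff expression below follow the form usually quoted for the Xinanjiang model; the function name `runoff_saturation_excess` is hypothetical, and the impervious fraction IMP is deliberately ignored, so this is a simplified illustration under those assumptions rather than the model code used in the study.

```python
def runoff_saturation_excess(pe, w, wm, b):
    """Runoff depth (mm) produced by net rainfall `pe` (mm).

    Assumes a parabolic tension-water capacity curve and no impervious area.
    w  : current areal mean tension water storage (mm), 0 <= w <= wm
    wm : areal mean tension water storage capacity (mm)
    b  : tension water distribution index
    """
    if pe <= 0.0:
        return 0.0
    wmm = wm * (1.0 + b)  # largest point storage capacity in the basin
    # Ordinate of the capacity curve that corresponds to the current storage w
    a = wmm * (1.0 - (1.0 - w / wm) ** (1.0 / (1.0 + b)))
    if pe + a < wmm:
        # Only part of the basin reaches capacity and produces runoff
        return pe - (wm - w) + wm * (1.0 - (pe + a) / wmm) ** (1.0 + b)
    # The whole basin reaches capacity: all rainfall beyond the deficit runs off
    return pe - (wm - w)
```

With `b = 0` (uniform capacity) the function returns no runoff until the deficit `wm - w` is filled, which is the behaviour the text describes; larger `b` lets part of the basin produce runoff earlier.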

The Xinanjiang model contains 15 parameters in total, each representing a property of the catchment. Although some insensitive parameters can be obtained from observed information, the sensitive parameters must be calibrated (Thiemann *et al.* 2001). The parameters and the ranges used for calibration are listed in Table 1.

Parameter | Description | Range |
---|---|---|
WM | Areal mean tension water storage capacity: WM = WUM + WLM + WDM | 100–150 |
X | WUM = X*WM, WUM is the upper layer tension water storage capacity | 0.1–0.6 |
Y | WLM = Y*(WM-WUM), WLM is the lower layer tension water storage capacity | 0.1–0.6 |
KE | Ratio of potential evapotranspiration to pan evaporation | 0.6–1.2 |
B | Tension water distribution index | 0.1–1.2 |
SM | Areal mean free water storage capacity | 20–70 |
EX | Free water distribution index | 0.1–1.2 |
KI | Outflow coefficient of free water storage to interflow | 0.8–0.99 |
KG | Outflow coefficient of free water storage to groundwater flow | 0.95–1 |
IMP | Impermeable coefficient | 0.01–0.1 |
C | Deep layer evapotranspiration coefficient | 0.15–0.2 |
CI | Interflow recession coefficient | 0.01–0.1 |
CG | Groundwater recession coefficient | 0.03–0.15 |
N | Parameter of Nash unit hydrograph | 1.0–10 |
NK | Parameter of Nash unit hydrograph | 5–15 |


### SCE-UA algorithm

The Shuffled Complex Evolution algorithm is a global optimization algorithm called SCE-UA for short (Duan *et al.* 1993). The SCE-UA method is considered as the most effective and efficient search algorithm for applying parameter calibration for various conceptual rainfall-runoff models (Gan & Biftu 1996; Madsen 2000; Wang *et al.* 2010). The SCE-UA method is based on the notion of sharing information and on the concepts extracted from principles of natural biological evolution (Duan *et al.* 1994). It takes advantage of the Controlled Random Search (CRS) algorithms (i.e., global sampling, complex evolution) and combines them with powerful concepts of competitive evolution and complex shuffling (Nelder & Mead 1965; Duan *et al.* 1992). Both concepts mentioned above help to ensure that the information of the sample is fully used and does not become degenerate, so that the SCE-UA method has a high probability of succeeding in finding the global optimum (Duan *et al.* 1993; Jeon *et al.* 2014).

The SCE-UA method contains various algorithmic parameters, of which the number of complexes p is the most important. Some studies show that the proper choice of p depends on the dimensionality of the calibration problem (Duan *et al.* 1994). The larger the value of p, the higher the probability of converging to the global optimum, but at the expense of a larger number of model simulations, and vice versa (Madsen 2000). Taking all these factors into consideration, p is set equal to the number of calibration parameters, and the default values are used for the remaining algorithmic parameters (Duan *et al.* 1994).
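As a rough guide to what this configuration implies, the sketch below derives the remaining SCE-UA settings from the number of calibrated parameters. It assumes the commonly quoted defaults attributed to Duan *et al.* (1994): 2n + 1 points per complex, n + 1 points per sub-complex, one offspring per sub-complex and 2n + 1 evolution steps per complex. The function and dictionary keys are hypothetical names, and the text does not list the defaults actually used.

```python
def sceua_default_settings(n_params, n_complexes=None):
    """Return SCE-UA settings for `n_params` calibrated parameters.

    The defaults are the commonly quoted recommendations (an assumption here);
    `n_complexes` falls back to the number of parameters, as in the text.
    """
    p = n_params if n_complexes is None else n_complexes
    m = 2 * n_params + 1  # points in each complex
    return {
        "n_complexes": p,
        "points_per_complex": m,
        "points_per_subcomplex": n_params + 1,
        "offspring_per_subcomplex": 1,
        "evolution_steps_per_complex": m,
        "population_size": p * m,  # points sampled before the first shuffle
    }


settings = sceua_default_settings(15)  # 15 Xinanjiang parameters (Table 1)
```

Under these assumptions the 15-parameter problem starts from a population of 465 points, which gives a feel for why a larger p increases the number of model simulations.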

### Objective functions

In general terms, the objective of model calibration is to select model parameter values such that the model simulates the hydrological behaviour of the catchment as close to the observations as possible. In the course of flood forecasting, the runoff volume, flood hydrograph and flood peak – called the three elements of flood – are considered to be the most important factors in evaluating the performance of flood forecasting. The objectives that measure the three elements of the hydrological response are listed as follows: (i) a good agreement between the simulated and observed water volume such as a good water balance; (ii) a good overall agreement of the flood hydrograph; and (iii) a good agreement of the flood peak (Moussa & Chahinian 2009).

In the procedure of calibration, the quality of the final model parameter values could be affected by many factors, such as the quality of the input data, the simplifications and errors inherent in the model structure, the power of the optimization algorithm, the estimation criteria, the objective functions and so on (Madsen 2003; Feyen *et al.* 2007). It is difficult to take all the factors into account. In this paper our aim is to analyse the influence of the objective functions on the model parameters and simulations.

The following numerical performance statistics measure the different calibration objectives stated above (Madsen 2000; Moussa & Chahinian 2009):

Here, *Q*_{obs,i} is the observed discharge at time *i* in each flood event, *Q*_{sim,i} the simulated discharge, *n*_{j} the number of time steps in the flood event *j*, *M*_{p} the total number of flood events, *θ* the set of model parameters to be calibrated, *Q*_{obsmax,j} the observed peak flow of discharge in the flood event *j*, and *Q*_{simmax,j} the simulated peak flow of discharge in the flood event *j*.
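The three statistics can be sketched in code from the symbol definitions. The expressions below are an assumption, since the equations are not reproduced in this section: the volume error is taken as the absolute mean residual per event, the second statistic as the per-event RMSE, and the peak error as the absolute difference of the peaks, each averaged over the events in the style of Madsen (2000). All function names are hypothetical.

```python
import math


def volume_error(obs, sim):
    """Absolute mean residual of one event (same units as discharge)."""
    return abs(sum(o - s for o, s in zip(obs, sim)) / len(obs))


def rmse(obs, sim):
    """Root mean square error of one event."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def peak_error(obs, sim):
    """Absolute difference between observed and simulated peak flow."""
    return abs(max(obs) - max(sim))


def event_average(metric, events):
    """Average a per-event metric over all events: [(obs, sim), ...]."""
    return sum(metric(obs, sim) for obs, sim in events) / len(events)


# Two made-up events, for illustration only
events = [([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]),
          ([10.0, 20.0, 30.0], [11.0, 20.0, 29.0])]
f1 = event_average(volume_error, events)
f2 = event_average(rmse, events)
fp = event_average(peak_error, events)
```

In the toy data the residuals cancel within each event, so the volume error is zero while the RMSE and peak error are not, which is the kind of disagreement between objectives that the calibration has to balance.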

Because F_{p}(θ) is based on a small sample of data compared with the other two objective functions, the reliability and stability of the calibration result may not be guaranteed. As a result, the average RMSE of peak flow events, defined as the parts of the hydrograph above a given threshold level and therefore containing more sample data, has been adopted as an objective function (Madsen 2000; Van *et al.* 2009; Liu & Sun 2010). However, there is no universal method to decide a reasonable threshold for selecting the peak flow events. In this study, a reasonable threshold is sought by rolling the threshold from 0 to 100% of the peak flow value in equal steps of 2%. A new objective function F_{f}(θ), named the root mean square error of peak flow events, is defined as follows: where *Q*_{obs,i,p} and *Q*_{sim,i,p} are the observed and simulated discharges at time *i* in each peak flow event, respectively; *f* is the threshold; *np*_{f,j} is the number of peak flow events (hydrograph in which discharge is greater than the threshold in flood event *j*); and other notations are as previously defined.
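A minimal sketch of the threshold-based statistic is given below. It assumes that F_{f}(θ) is the event-averaged RMSE computed only over time steps whose observed discharge reaches a fraction *f* of the observed peak of that event; the exact equation is not reproduced in this section, and the function names are hypothetical. The comparison uses `>=` so that the peak itself is still selected when *f* = 100%.

```python
import math


def peak_event_rmse(obs, sim, f):
    """RMSE over the time steps with observed flow >= f * observed peak."""
    level = f * max(obs)
    pairs = [(o, s) for o, s in zip(obs, sim) if o >= level]
    return math.sqrt(sum((o - s) ** 2 for o, s in pairs) / len(pairs))


def f_f(events, f):
    """Average the thresholded RMSE over all events: [(obs, sim), ...]."""
    return sum(peak_event_rmse(obs, sim, f) for obs, sim in events) / len(events)


# Thresholds from 0 to 100% of the peak flow in steps of 2%
thresholds = [k / 100 for k in range(0, 101, 2)]

# Two made-up events, for illustration only
events = [([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]),
          ([10.0, 20.0, 30.0], [11.0, 20.0, 29.0])]
curve = [f_f(events, f) for f in thresholds]
```

Under this reading, *f* = 0 reproduces the ordinary event RMSE and *f* = 100% reduces to the error at the time of the observed peak, so the threshold moves the objective between the two extremes.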

The objective functions listed above take non-negative values, and the parameter set that minimizes each function (i.e., brings it as close to 0 as possible) is regarded as the optimum. Each of the single-objective calibration procedures is undertaken separately and combines a manual and an automatic calibration procedure. The main goal of the manual calibration procedure is to obtain the range of each parameter. The automatic calibration procedure aims at finding the optimum within that range. It is worth noting that trade-offs and equilibrium constraints exist between the different objective functions, so it is necessary to consider the calibration in a multi-objective framework.

Here, *F*_{i}(*θ*) (*i* = 1, 2, …, *n*) are the different objective functions. The optimization problem is constrained because *θ* is restricted to the feasible parameter space Θ. The parameter space is usually defined as a hypercube by specifying lower and upper limits on each parameter, which are chosen according to physical and mathematical constraints in the model or from modelling experience (Kuczera 1997; Madsen 2000).

In general, the solution of Equation (5) will not be a single unique set of parameters but will consist of the so-called Pareto set of solutions, reflecting the various trade-offs between the different objectives (Gupta *et al.* 1998). Because of these trade-offs, a member of the Pareto set is better than any other member with respect to some of the objectives, but not with respect to all of them.
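The notion of a Pareto set can be made concrete with a small filter for minimization problems. The sketch is generic and not tied to the calibration code of the study; the function names and the sample points are made up for illustration.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_set(points):
    """Keep the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective 1, objective 2) values of five parameter sets
candidates = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
front = pareto_set(candidates)
```

Here `(3, 4)` and `(5, 5)` are dropped because `(2, 3)` is at least as good in both objectives, while the three remaining points each win on one objective and lose on the other.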

Here, *A*_{i} (*i* = 1, 2, …, *n*) are transformation constants corresponding to the different objectives, so that different relative priorities can be given to certain objectives. Using different values of *A*_{i} in the aggregated distance measure makes it possible to investigate the entire Pareto front. However, calculating the entire Pareto front is computationally too expensive, so only the Pareto optimal solutions of interest are calculated. In this case, an aggregated objective function was proposed that puts equal weights on the different objectives. The value of *A*_{i} can be calculated by Equation (7) (Madsen 2000). Equation (7) ensures that each of the objective functions is transformed to have the same distance from the origin, so that equal weights are put on the different objectives.

### Evaluation criteria

Four evaluation indexes are adopted for evaluating the goodness-of-fit of the 30 simulated flood hydrographs, which have been widely applied in flood model calibration and flood forecasting (Madsen 2000; Chahinian *et al.* 2005; Moussa & Chahinian 2009; Wang *et al.* 2015). In order to measure the overall performance of flood events, the absolute values of the evaluation indexes of the 30 flood events are averaged as follows:

Here, *T*_{sim,j} is the time of occurrence of the simulated peak flow in the flood event *j*, *T*_{obs,j} the time of occurrence of the observed peak flow in the flood event *j*, and other notations are as previously defined.
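The four indexes can be sketched as follows. The equations are not reproduced in this section, so the conventional definitions are assumed: RE and Q_{re} as signed percentage errors, NSE in its usual form, and ΔT as the difference between simulated and observed peak times. The time step `dt_hours` and all function names are hypothetical.

```python
def relative_volume_error(obs, sim):
    """RE (%): signed relative error of the event runoff volume."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of one event."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def relative_peak_error(obs, sim):
    """Q_re (%): signed relative error of the peak flow."""
    return 100.0 * (max(sim) - max(obs)) / max(obs)


def peak_time_lag(obs, sim, dt_hours=1.0):
    """Delta T (h): simulated minus observed time of peak."""
    return (sim.index(max(sim)) - obs.index(max(obs))) * dt_hours


def mean_abs(metric, events):
    """Average the absolute value of a per-event index over all events."""
    return sum(abs(metric(obs, sim)) for obs, sim in events) / len(events)


# Two made-up events, for illustration only
events = [([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]),
          ([10.0, 20.0, 30.0], [11.0, 20.0, 29.0])]
summary = {name: mean_abs(fn, events) for name, fn in [
    ("RE", relative_volume_error), ("NSE", nse),
    ("Q_re", relative_peak_error), ("dT", peak_time_lag)]}
```

Averaging absolute values, as the text describes, prevents over- and under-estimates in different events from cancelling each other in the summary.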

Owing to the contradictions among the evaluation results of the four indexes shown above, it is difficult to compare the performance of different objective-function combinations visually from the four indexes alone. In this study, a statistical technique called the projection pursuit method (Li & Chen 1985) is used to combine the four evaluation indexes into an aggregated indicator that assesses the performance of the simulations calibrated by each objective function. Projection pursuit seeks projections from a high dimensional space into a low dimensional space that reveal the major characteristics of the structure of the data set, and finally yields the optimal projection direction vector and the projection eigenvalues (Swinson 2005). The bigger the projection eigenvalue, the better the comprehensive performance, so the sample data can be classified according to comprehensive performance (Huang & Lu 2014). The projection pursuit method is also applied to combine the four evaluation indexes when assessing the impact of different peak flow thresholds on the performance of flood forecasting.
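The procedure can be sketched with the values of Table 3. The text does not state which projection index or optimizer was used, so the sketch assumes one common formulation: indexes are rescaled to [0, 1] with larger meaning better, the projection index is the product of the standard deviation and a local density of the projected values, and the direction is found by a crude random search. All names, the window factor 0.1 and the search settings are assumptions.

```python
import math
import random

# Rows: Schemes 1-7; columns: RE (%), NSE, Q_re (%), dT (h), from Table 3
TABLE3 = [
    [4.67, 0.8386, 19.32, 2.03],
    [6.98, 0.8892, 10.87, 1.63],
    [24.79, 0.4217, 6.54, 5.80],
    [5.80, 0.8793, 12.76, 1.40],
    [7.06, 0.4368, 7.11, 7.07],
    [7.93, 0.8894, 6.80, 1.40],
    [7.29, 0.8923, 6.75, 1.40],
]
LARGER_IS_BETTER = [False, True, False, False]


def normalise(rows, larger_is_better):
    """Rescale every column to [0, 1] so that larger is always better."""
    cols = list(zip(*rows))
    out = []
    for row in rows:
        scaled = []
        for value, col, up in zip(row, cols, larger_is_better):
            x = (value - min(col)) / (max(col) - min(col))
            scaled.append(x if up else 1.0 - x)
        out.append(scaled)
    return out


def project(x, a):
    """Projection value of every sample onto direction `a`."""
    return [sum(ai * xi for ai, xi in zip(a, row)) for row in x]


def projection_index(z, window=0.1):
    """Standard deviation times local density of the projected values."""
    mean = sum(z) / len(z)
    s_z = math.sqrt(sum((v - mean) ** 2 for v in z) / (len(z) - 1))
    radius = window * s_z
    d_z = sum(radius - abs(zi - zk)
              for zi in z for zk in z if abs(zi - zk) < radius)
    return s_z * d_z


def best_direction(x, trials=2000, seed=0):
    """Random search over non-negative unit vectors (a simple stand-in)."""
    rng = random.Random(seed)
    best_a, best_q = None, -1.0
    for _ in range(trials):
        raw = [abs(rng.gauss(0.0, 1.0)) for _ in x[0]]
        norm = math.sqrt(sum(v * v for v in raw))
        a = [v / norm for v in raw]
        q = projection_index(project(x, a))
        if q > best_q:
            best_a, best_q = a, q
    return best_a


x = normalise(TABLE3, LARGER_IS_BETTER)
direction = best_direction(x)
scores = project(x, direction)  # one composite value per scheme
```

A production implementation would normally replace the random search with a proper global optimizer; the sketch only shows how four conflicting indexes collapse into a single value per scheme.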

## RESULTS AND DISCUSSION

Scheme | Objective function | Description |
---|---|---|
1 | F_{1}(θ) | Volume error of the flood events |
2 | F_{2}(θ) | Root mean square error of the flood events |
3 | F_{p}(θ) | Peak flow error of the flood events |
4 | F_{1}(θ)F_{2}(θ) | Combination of volume error and root mean square error of the flood events |
5 | F_{1}(θ)F_{p}(θ) | Combination of volume error and peak flow error of the flood events |
6 | F_{2}(θ)F_{p}(θ) | Combination of root mean square error and peak flow error of the flood events |
7 | F_{1}(θ)F_{2}(θ)F_{p}(θ) | Combination of volume error, root mean square error and peak flow error of the flood events |


Scheme | RE (%) | NSE | Q_{re} (%) | ΔT (h) |
---|---|---|---|---|
1 | 4.67 | 0.8386 | 19.32 | 2.03 |
2 | 6.98 | 0.8892 | 10.87 | 1.63 |
3 | 24.79 | 0.4217 | 6.54 | 5.8 |
4 | 5.8 | 0.8793 | 12.76 | 1.4 |
5 | 7.06 | 0.4368 | 7.11 | 7.07 |
6 | 7.93 | 0.8894 | 6.8 | 1.4 |
7 | 7.29 | 0.8923 | 6.75 | 1.4 |


### Single objective calibration results

It can be seen from Tables 2 and 3 that there are great differences among the evaluation indexes obtained with the three objective functions F_{1}(θ), F_{2}(θ) and F_{p}(θ). The objective function F_{1}(θ) gives the best relative error of water volume (RE), 4.67%, but the worst relative error of peak flow (Q_{re}), 19.32%. The objective function F_{p}(θ) shows the opposite behaviour. The objective function F_{2}(θ) gives the best Nash-Sutcliffe coefficient (NSE) and time lag of peak flow (ΔT) among the three single objective functions.

The same phenomenon can be observed in the variation of the evaluation values of the 30 flood events presented in Figure 3. The objective function F_{1}(θ) has the best average value and the least variation of RE, and F_{p}(θ) simulates the water volume worst. This is to be expected: F_{1}(θ) measures the agreement between the simulated and observed water volume, while F_{p}(θ) puts weight on the simulation of the flood peak, so the overall water volume and flood hydrograph are naturally not simulated well. Accordingly, F_{2}(θ) gives the best NSE and F_{p}(θ) gives the best Q_{re}, although its values of ΔT are somewhat high.

[…] *et al.* (2005), Li *et al.* (2009) and Xu *et al.* (2013), who applied the XAJ model in the humid and semi-humid regions of China with good performance. Specifically, the global shape of the hydrograph is simulated better with the objective function F_{2}(θ) than with the other two single objective functions. The amplitude of the peak flow is simulated better with F_{p}(θ) than with the other two objective functions; however, the time of occurrence of the peak flow is not simulated well with F_{p}(θ).

It can be concluded from Table 3, Figures 3 and 4 that the characteristics of the observed hydrograph are difficult to be matched simultaneously when it is calibrated by single objective functions. Naturally, it is necessary to consider multi-objective calibration so as to have a better simulation on the hydrological behaviour of the catchment.

### Multi-objective calibration results

[…] F_{1}(θ)F_{2}(θ) is smaller than that by F_{2}(θ) and larger than that by F_{1}(θ), while the value of NSE calibrated by F_{1}(θ)F_{2}(θ) is better than that by F_{1}(θ) and worse than that by F_{2}(θ). The simulations by the multi-objective function F_{1}(θ)F_{2}(θ) incorporate the characteristics of the single functions F_{1}(θ) and F_{2}(θ) and give balanced results. Similar results are obtained for the other multi-objective functions, except the combination of F_{1}(θ) and F_{p}(θ) (Scheme 5). Analysing the performance of each multi-objective function, it can be found that, compared with Scheme 5 (F_{1}(θ)F_{p}(θ)) and Scheme 6 (F_{2}(θ)F_{p}(θ)), Scheme 4 (F_{1}(θ)F_{2}(θ)) performs better on RE and NSE but worse on *Q*_{re}. The overall best result is obtained by Scheme 7 (F_{1}(θ)F_{2}(θ)F_{p}(θ)).

The results shown above were calibrated by multi-objective functions using aggregated distance measure in which different objective functions were set equal weights. In order to estimate the Pareto front and analyse the trade-offs between different objective functions, a number of tests were carried out.
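The equal-weight aggregation can be sketched as follows, assuming the form given by Madsen (2000), which the text cites: the aggregated measure is the Euclidean norm of the shifted objectives F_{i}(θ) + A_{i}, and each A_{i} shifts the minimum of its objective to the largest of the single-objective minima. The equations are not reproduced in this section, so this form and the function names are assumptions. The minima used in the example (24.74, 114.27 and 97.83) are the single-objective optima quoted in the text.

```python
import math


def transformation_constants(f_min):
    """A_i = max_j(F_j,min) - F_i,min, so all shifted minima coincide."""
    top = max(f_min)
    return [top - value for value in f_min]


def aggregated_distance(f_values, a_consts):
    """Euclidean norm of the shifted objective values."""
    return math.sqrt(sum((f + a) ** 2 for f, a in zip(f_values, a_consts)))


# Single-objective minima of F1, F2 and Fp quoted in the text
f_min = [24.74, 114.27, 97.83]
a_consts = transformation_constants(f_min)
shifted_minima = [f + a for f, a in zip(f_min, a_consts)]
```

All three shifted minima equal 114.27, so no objective dominates the aggregated measure merely because of its scale. Other points on the Pareto front could then be reached by changing the constants, which is one way to read the tests mentioned above.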

With respect to the optimization calibrated by F_{1}(θ)F_{2}(θ) (Figure 6(a) and 6(b)), the trade-off between the two objectives (F_{1}(θ) and F_{2}(θ)) is less significant. This is due to the fact that the range of RMSE is larger than the corresponding relative range of volume error. By moving from C to A only a small reduction of the volume error is obtained at the expense of a large increase in the RMSE (Figure 6(b)). The maximum volume error of 39.21 (corresponding to RE = 6.98%) decreases to 24.74 (corresponding to RE = 4.67%), and at the same time the RMSE is increased from 114.27 (corresponding to NSE = 0.8892) to 162.54 (corresponding to NSE = 0.8386). This result is in line with the findings of Moussa & Chahinian (2009), who conducted a comparative study of different multi-objective calibration criteria using a conceptual rainfall-runoff model on flood events.

The estimated Pareto front for the calibration results of F_{1}(θ)F_{p}(θ) presents significant trade-offs (Figure 6(c) and 6(d)). A good calibration of F_{1}(θ) (corresponding to F_{1}(θ) = 24.74) provides a bad calibration of F_{p}(θ) (corresponding to F_{p}(θ) = 341.33), and vice versa (F_{1}(θ) = 139.96 for F_{p}(θ) = 97.83). The aggregated distance measure is seen to provide a better balance between the two objectives in the optimization (point B, corresponding to F_{1}(θ) = 44.96 and F_{p}(θ) = 110.77). A significant trade-off can also be observed for the calibration of F_{2}(θ)F_{p}(θ) in Figure 6(e) and 6(f): F_{2}(θ) = 114.27 when F_{p}(θ) = 184.20, and F_{2}(θ) = 289.30 when F_{p}(θ) = 97.83. It seems that the single objective optimizations provide the tails of the Pareto front and the compromise solution acts as a break point on the front. When moving along the front in either direction, one of the objective functions decreases slightly while the other increases dramatically.

[…] (F_{1}(θ)F_{2}(θ)F_{p}(θ)) during the calibration process are shown in Figure 7 in a three-dimensional space. Most of the values are concentrated close to the surface formed by smaller volume error and RMSE. The Pareto front shows that the trade-offs between the three objective functions are significant.

[…] *et al.* 2012). The parameter values were normalized with respect to the upper and lower limits given in Table 1 so that the feasible range of all parameters is between 0 and 1. The compromise solution using the aggregated distance measure is shown as a full bold line in Figure 8. The calibrated parameters of the compromise solution lie within the interval delimited by the calibrated parameters of the Pareto front. For the calibration of F_{1}(θ)F_{2}(θ) (Figure 8(a)), a narrow variability is observed in the parameter values when moving along the Pareto front, which is in accordance with the fact that the trade-off between the two functions is less significant than for the other function combinations (Figure 8(b)–8(d)).
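The normalization is a plain min-max rescaling, sketched below with two of the ranges from Table 1. The dictionary and function names are hypothetical.

```python
# Lower and upper limits taken from Table 1 (two parameters shown)
PARAM_RANGES = {"SM": (20.0, 70.0), "CG": (0.03, 0.15)}


def normalise(name, value):
    """Map a parameter value onto [0, 1] using its Table 1 range."""
    lo, hi = PARAM_RANGES[name]
    return (value - lo) / (hi - lo)


def denormalise(name, unit_value):
    """Map a normalized value in [0, 1] back to physical units."""
    lo, hi = PARAM_RANGES[name]
    return lo + unit_value * (hi - lo)


sm_physical = denormalise("SM", 0.48)
```

Under the Table 1 range of 20 to 70, a normalized SM of 0.48 corresponds to a physical value of 44, which helps when reading the normalized axes of Figure 8.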

With respect to the range of the other combinations of objective functions (Figure 8(b)–8(d)), the intervals are larger because significant trade-offs existed between each objective function. For instance, the normalized range of SM in Figure 8(a) is from 0.48 to 1, the low boundary values become smaller in the other cases (Figure 8(b)–8(d)). This result is not surprising as a smaller SM contributes to more surface runoff so as to have a good performance on the peak flow. That is to say, the parameter SM is sensitive to the objective function F_{P}(θ). With regard to the performance of groundwater recession coefficient CG, a narrow span is observed when moving along the Pareto front in the four multi-objective flood-forecasting schemes. Note that CG is closely related to the recession of groundwater and has less effect on high flow. When the trade-offs change between the three objective functions which pay attention to the water balance and high flow, the low flow exhibits good stability so that the variation of CG value is small.

### Comparison of the calibration performance on the flood events and on the continuous flow series

In order to analyse the differences between the simulation performance of flood events and continuous flow series, three sets of calibration results, i.e., the average values of performance measures of 30 individual flood events, performance measures calculated by putting 30 flood events together, and performance measures of continuous flow series are presented in Table 4.

Scheme | Average of 30 individual flood events: RE (%) | Average of 30 individual flood events: NSE | Total flood events: RE (%) | Total flood events: NSE | Continuous flow series: RE (%) | Continuous flow series: NSE |
---|---|---|---|---|---|---|
1 | 4.67 | 0.8386 | −1.05 | 0.9258 | 5.98 | 0.8676 |
2 | 6.98 | 0.8892 | −0.12 | 0.9675 | −18.73 | 0.8809 |
3 | 24.79 | 0.4217 | 16.55 | 0.8363 | 0.92 | 0.8134 |
4 | 5.8 | 0.8793 | −1.98 | 0.9635 | 0.49 | 0.8927 |
5 | 7.06 | 0.4368 | 0.98 | 0.7902 | −14.59 | 0.7746 |
6 | 7.93 | 0.8894 | 0.31 | 0.9590 | −12.66 | 0.8861 |
7 | 7.29 | 0.8923 | 0.59 | 0.9578 | −14.73 | 0.8803 |

Table 4 shows that the best results are obtained when the performance measures are calculated over the 30 flood events together, followed by the continuous flow series; the worst results are obtained by averaging the performance measures of the 30 individual flood events.
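One reason why pooled and event-averaged measures can differ is sketched below with the standard NSE formula and two hypothetical flood events. When events of very different magnitude are pooled, the variance of the observations about the pooled mean grows, which tends to raise the NSE. The relative-error sign convention (simulated minus observed) is an assumption, and the flows are invented for illustration.

```python
from typing import List, Sequence, Tuple


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def relative_error(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Relative volume error in percent (sign convention assumed: sim minus obs)."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


# Two hypothetical events: a small flood and a large flood (observed, simulated).
events: List[Tuple[List[float], List[float]]] = [
    ([10.0, 20.0, 30.0, 20.0, 10.0], [12.0, 18.0, 27.0, 22.0, 11.0]),
    ([100.0, 300.0, 500.0, 300.0, 100.0], [110.0, 280.0, 520.0, 310.0, 90.0]),
]

# Average of the measures computed event by event.
mean_event_nse = sum(nse(o, s) for o, s in events) / len(events)

# Measures computed after putting all events together.
pooled_obs = [v for o, _ in events for v in o]
pooled_sim = [v for _, s in events for v in s]
pooled_nse = nse(pooled_obs, pooled_sim)
pooled_re = relative_error(pooled_obs, pooled_sim)

print(mean_event_nse, pooled_nse, pooled_re)
```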

### The impact of different thresholds of the peak flow events on the calibration

F_{f}(θ), whose threshold changes from 0 to 100% of the peak flow in each flood event. It can be seen that *Q _{re}* (Figure 9(c)) decreases quickly with increasing threshold of peak flow, indicating that a high threshold gives a better performance on the magnitude of the peak flow than a small one. When the threshold is less than 90% of the peak flow, no significant change of ΔT is found; however, ΔT increases dramatically when the threshold is raised further. From Figure 9(a) and 9(b), it can be observed that RE and NSE deteriorate as the threshold value increases.

To summarise, a high threshold of the peak flow gives a good performance on peak flow but a worse performance on the volume and the global shape of the flood hydrograph.
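The effect of the threshold can be illustrated with a minimal sketch. It assumes that the peak-flow objective is an RMSE evaluated only at time steps where the observed flow is at least a given fraction of the event peak; this form, the function name `peak_rmse` and the flows are assumptions for illustration, and the definition used in this study is the one given in the methods section.

```python
import math
from typing import Sequence


def peak_rmse(obs: Sequence[float], sim: Sequence[float], fraction: float) -> float:
    """RMSE over time steps whose observed flow is >= fraction * observed peak (assumed form)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    cutoff = fraction * max(obs)
    pairs = [(o, s) for o, s in zip(obs, sim) if o >= cutoff]
    return math.sqrt(sum((o - s) ** 2 for o, s in pairs) / len(pairs))


# Hypothetical observed and simulated flows of one flood event.
observed = [10.0, 50.0, 100.0, 60.0, 20.0]
simulated = [15.0, 40.0, 98.0, 70.0, 20.0]

for fraction in (0.0, 0.5, 0.9):
    print(fraction, round(peak_rmse(observed, simulated, fraction), 3))
```

At a fraction of 0 every time step contributes, so the measure reduces to the ordinary RMSE of the event. As the fraction approaches 1, only the time steps around the peak are scored, so errors on the rising and falling limbs no longer influence the calibration.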

F_{1}(θ), F_{2}(θ) and F_{f}(θ) with different thresholds of the peak flow are combined in different ways to calibrate the model parameters. Figure 10 shows the change of the four evaluation indexes for the different objective functions under different thresholds of peak flow.

As for the index RE, it can be seen from Figure 10 that the performance of the calibration by the three multi-objective functions deteriorates steadily with increasing threshold. A potential reason for this may be that the objective functions combined with F_{f}(θ) of different thresholds did not take the volume error into consideration. In terms of NSE (Figure 10(b)), when the threshold is smaller than 60% of the peak flow, the NSE values of the multi-objective function F_{2}(θ)F_{f}(θ) remain stable and are larger than those of the other two multi-objective functions. When the threshold is greater than 60% of the peak flow, the NSE values of the multi-objective functions F_{2}(θ)F_{f}(θ) and F_{1}(θ)F_{2}(θ)F_{f}(θ) decrease slowly, while the value of the multi-objective function F_{1}(θ)F_{f}(θ) decreases dramatically. With respect to *Q _{re}* (Figure 10(c)), the performance of the calibration clearly improves for the three multi-objective functions as the threshold increases. Comparing the three multi-objective functions under different thresholds of peak flow, the value of the multi-objective function F_{2}(θ)F_{f}(θ) is smaller than that of the other two multi-objective functions when the threshold is less than 80% of the peak flow, while there is no significant difference between them when the threshold exceeds 80% of the peak flow. However, ΔT shows no obvious change when the multi-objective functions are used with varying threshold of peak flow.

From the results above, it can be inferred that too large a threshold worsens the simulation when the model is calibrated by a combination of two objective functions, whereas the combination of three objective functions is less influenced by the threshold. In general, threshold values between 40 and 70% of the peak flow are suitable for multi-objective calibration and give a better goodness-of-fit in the simulated hydrographs.

### The results of projection pursuit method

The projection value of the multi-objective function F_{1}(θ)F_{f}(θ) is high when the threshold approaches zero, in which region F_{1}(θ)F_{f}(θ) is equivalent to F_{1}(θ)F_{2}(θ), while no obvious differences are found between the projection values for the multi-objective functions F_{1}(θ)F_{2}(θ) and F_{1}(θ)F_{2}(θ)F_{f}(θ) when the threshold value is low. However, the eigenvalue for all three multi-objective functions increases when the threshold is between 40 and 70% of the peak flow. When the threshold exceeds 70% of the peak flow, the value for all objective functions declines to different degrees; in this range the eigenvalues of the multi-objective functions F_{1}(θ)F_{f}(θ) and F_{1}(θ)F_{2}(θ)F_{f}(θ) tend towards the minimum and maximum values, respectively.
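The way a projection value condenses several evaluation indexes into one number can be sketched as follows. The sketch makes several assumptions that are not taken from this study: the indexes are rescaled to [0, 1] so that larger is better, the projection index is the product of the standard deviation and the local density of the projected values with a window radius of 0.1 times the standard deviation, and the direction is found by a simple random search over non-negative unit vectors. The index values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def normalize_columns(rows: List[Sequence[float]], larger_is_better: Sequence[bool]) -> List[List[float]]:
    """Rescale each index to [0, 1] so that 1 is always the best value."""
    scaled_columns = []
    for column, bigger in zip(zip(*rows), larger_is_better):
        low, high = min(column), max(column)
        span = high - low
        if span == 0:
            scaled_columns.append([1.0] * len(column))
        elif bigger:
            scaled_columns.append([(v - low) / span for v in column])
        else:
            scaled_columns.append([(high - v) / span for v in column])
    return [list(row) for row in zip(*scaled_columns)]


def project(rows: List[Sequence[float]], direction: Sequence[float]) -> List[float]:
    """One-dimensional projection value of every alternative."""
    return [sum(a * x for a, x in zip(direction, row)) for row in rows]


def projection_index(z: Sequence[float]) -> float:
    """Spread of the projected values times their local density (assumed form)."""
    mean = sum(z) / len(z)
    spread = math.sqrt(sum((v - mean) ** 2 for v in z) / (len(z) - 1))
    radius = 0.1 * spread
    density = sum(radius - abs(a - b) for a in z for b in z if abs(a - b) < radius)
    return spread * density


def best_direction(rows: List[Sequence[float]], trials: int = 2000, seed: int = 0) -> List[float]:
    """Random search for the non-negative unit vector with the largest projection index."""
    rng = random.Random(seed)
    best, best_q = [], -1.0
    for _ in range(trials):
        raw = [abs(rng.gauss(0.0, 1.0)) for _ in rows[0]]
        norm = math.sqrt(sum(v * v for v in raw))
        candidate = [v / norm for v in raw]
        q = projection_index(project(rows, candidate))
        if q > best_q:
            best, best_q = candidate, q
    return best


# Hypothetical indexes per alternative: |RE| (%), NSE, peak-flow error (%), peak-time error (h).
indexes = [
    [5.0, 0.85, 12.0, 2.0],
    [7.0, 0.90, 9.0, 1.0],
    [25.0, 0.40, 3.0, 4.0],
    [6.0, 0.88, 8.0, 1.0],
]
scaled = normalize_columns(indexes, larger_is_better=[False, True, False, False])
direction = best_direction(scaled)
scores = project(scaled, direction)
print(direction, scores)
```

Under these assumptions, an alternative with a larger projection value performs better overall on the combined indexes, which is the reading applied to the projection eigenvalues in this section.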

*Q _{re}* and ΔT. That is, the simulation corresponding to a high projection eigenvalue fits the observed hydrograph reasonably well. The results are in accordance with the hypothesis that the larger the projection eigenvalue, the better the comprehensive performance, which to some degree demonstrates the reasonableness of the projection pursuit method. Therefore, the projection pursuit method is a reasonable choice for problems with multiple evaluation indexes and could be applied in the performance evaluation of flood forecasting models.

## CONCLUSIONS

The XAJ model and the SCE-UA algorithm were applied to the Chongyang River catchment in southeastern China to assess the effects of different objective functions and their combinations in calibrating flood forecasting models. The model was calibrated on 30 flood events with both single-objective and multi-objective schemes. Different objective functions and their combinations were chosen to calibrate the model, and their performances were assessed and compared. The following conclusions are drawn from this study.

The performance of the simulation depends on the objective function used in calibrating the flood forecasting model. No single objective function could reproduce all the characteristics of the shape of the hydrograph simultaneously. Significant trade-offs exist between different objective functions, so a set of Pareto optimal solutions is adopted to minimize calibration errors. A compromise solution is obtained by assigning equal weights to the different objective functions in the aggregated distance measure. The results illustrate that the trade-off between the flood events volume error and the flood events RMSE is less significant, whereas the other combinations of objective functions show significant trade-offs and a wider range of parameter values. Compared with the results for the total flood events, the performance on the continuous flow series and the average over the 30 individual flood events are worse.

A high threshold gives a better performance on peak flow but a worse performance on the volume and the global shape of the hydrograph. The impact of different thresholds of the peak flow in the multi-objective functions varies with the evaluation index. A larger threshold of peak flow improves the simulation of peak flow at the expense of the other aspects of the hydrograph. The extent of this effect varies with the objective function, and the multi-objective function consisting of three objective functions is the least affected. The results also indicate that threshold values between 40 and 70% of the peak flow perform better for all multi-objective functions.

This study compares the effect of different objective functions in calibrating rainfall-runoff models, which is an important early step in flood forecasting. Long-term flow simulation and flood forecasting are two important and different applications of hydrological models; they focus on different aspects of the hydrograph and therefore need different objective functions and evaluation criteria. This study contributes to improved knowledge and methods for the accurate forecasting of river floods. Although the study uses one model and one catchment, the choice of objective functions is governed by the nature of the problem (i.e., the specific aspect of a hydrograph) rather than by the model and the catchment, so the findings provide a useful reference for other studies. Nevertheless, further studies involving more models and study regions are needed to generalize the findings to other conditions.

## ACKNOWLEDGEMENTS

The study was supported by the National Natural Science Fund of China (51190094; 51339004; 51279138).