## Abstract

Approximate Bayesian computation (ABC) relaxes the need to derive explicit likelihood functions required by formal Bayesian analysis. However, the high computational cost of evaluating models limits the application of Bayesian inference in hydrological modeling. In this paper, multivariate adaptive regression splines (MARS) are used to expedite the ABC calibration process. The MARS model is trained using 6,561 runoff simulations generated by the soil and water assessment tool (SWAT) model and subsequently replaces the SWAT model to calculate the objective functions in ABC and multi-objective evolutionary algorithm (MOEA). In experiments, MARS can successfully reproduce the runoff time series simulations of the SWAT model at a low time cost, with a runoff variance determination coefficient of 0.90 as compared to the Monte Carlo method. MARS-assisted ABC can quickly and accurately estimate the parameter distributions of the SWAT model. The comparison of ABC with non-Bayesian MOEAs helps in the selection of an appropriate calibration approach.

## HIGHLIGHTS

Approximate Bayesian computation (ABC) was used to calibrate the complex hydrological model.

Multivariate adaptive regression splines (MARS) were used to expedite the ABC process.

The ability of MARS-assisted ABC for calibration of the SWAT model was demonstrated.

The calibration results of ABC and multi-objective evolutionary algorithms were compared.

MARS-assisted ABC has the potential to extend the use of ABC in hydrological modeling.

## INTRODUCTION

Hydrological models are frequently used to simulate the water cycle and quantify the effects of climate change and human activity on hydrological processes. Along with the use of large amounts of high-precision input data and the consideration of additional mechanistic processes, hydrological models have become increasingly complex. Overparameterization is a common issue in hydrological models, and the uncertainty in these parameters affects the reliability of model predictions. To address this issue, Bayesian inference is widely used to quantify the uncertainty in model parameters. Traditional Bayesian inference methods rely heavily on the likelihood function. However, due to the inherent complexity and non-linearity of complex hydrological models, obtaining their likelihood functions is difficult, thus limiting the application of Bayesian inference methods to parameter uncertainty quantification. To circumvent relying on the likelihood function, a new method for quantifying parameter uncertainty is needed. One such approach is approximate Bayesian computation (ABC).

ABC-related ideas were first proposed in the 1980s (Rubin 1984). Later in 1984, the first procedure for doing statistical inference using simulation in situations where the likelihood is intractable is introduced (Diggle & Gratton 1984). The earlier ABC was aimed to approximate likelihood before being used for a posteriori inference (Tavaré *et al.* 1997). The core of traditional ABC is rejection sampling, in which only parameter combinations that satisfy a predefined acceptance criterion are accepted. The posterior distribution of participation is made up of a large number of these accepted parameter combinations. This is in contrast to formal Bayesian analysis, which must decide whether to accept a set of parameters based on a likelihood function. Furthermore, ABC is useful for diagnostic model calibration because it defines multiple summary statistics that reflect different aspects of the modeled system. Based on these benefits, ABC is now widely used in a range of applications, from population genetics (Beaumont *et al.* 2002), groundwater modeling (Cui *et al.* 2018), energy modeling (Zhu *et al.* 2020) to hydro-ecological modeling (Piccioni *et al.* 2022). However, two critical factors, in particular, have hindered the application of ABC in calibration of the complex hydrological models.

The first issue is the significant computing burden imposed by the numerous model evaluations required by rejection sampling. Because the prior distribution of the model parameters is often unknown, Bayesian inference utilizing methods such as grid search or random search frequently demands a large number of model runs. Despite the rising efficiency of parametric sampling techniques and the increasing capacity of computer hardware, this computational burden is sometimes prohibitive, if not prohibited. To address this specific issue, model emulation offers a potential promise. The principle of model emulation is to replace computationally demanding models with computationally efficient emulation models, i.e., to increase the computational speed of the emulation model while maintaining and inheriting significant accuracy of the original model.

The emulator, also known as the surrogate model, is usually a black-box or statistical model that has been trained using a set of inputs and outputs from the original model. For new inputs, such as parameter values and driving variables of the inputs, which were not included in the initial training set, a well-trained simulator can make predictions that are reasonably accurate and precise (Cui *et al.* 2018). Numerous emulator techniques have been explored in different disciplines, such as polynomials (Hussain *et al.* 2002; Regis & Shoemaker 2004; Fen *et al.* 2009), Gaussian processes(Wang *et al.* 2014; Gong & Duan 2017; Yang *et al.* 2018), artificial neural networks (ANN) (Behzadian *et al.* 2009; Kourakos & Mantoglou 2009), polynomial chaos expansion (Fan *et al.* 2015; Wang *et al.* 2015; Ghaith & Li 2020), and multivariate adaptive regression splines (MARS) (Friedman 1991; Chen *et al.* 2018). Non-parametric emulators or models, in contrast to their parametric counterparts known for their interpretability, computational efficiency, and robust theoretical foundation (Tutsoy & Polat 2022; Tutsoy 2023), provide enhanced flexibility, eliminate the requirement for predefined data distribution assumptions, and exhibit improved predictive accuracy. As a result, they are gaining increasing popularity for various tasks, including model calibration, sensitivity analysis, and uncertainty analysis. Among these simulators, the MARS model excels over other black-box models like ANNs by providing clear insights into how input variables affect outputs. Unlike ANNs, MARS reveals the nature of variable relationships through piecewise linear functions, making it more interpretable. Additionally, MARS can incorporate lagged time series data as predictors, offering a powerful approach for modeling nonlinear time series relationships that traditional linear models cannot capture. This enhances the MARS model's predictive accuracy and interpretability for complex time series analysis (Lewis & Stevens 1991). Therefore, the MARS has been widely used in the prediction of time series, ranging from streamflow (Adamowski *et al.* 2012; Deo *et al.* 2017; Abdulelah Al-Sudani *et al.* 2019; Mirabbasi *et al.* 2019; Sahraei *et al.* 2021), drought (Deo *et al.* 2017), rainfall (Mirabbasi *et al.* 2019), water pollution (Kisi & Parmar 2016), to energy (Sahraei *et al.* 2021). Despite efforts to apply the MARS to hydrology, it has primarily been used in the simulation and prediction of raw time series data, and unlike traditional emulator techniques, few trials have been conducted to assess its performance as a surrogate model.

Another limiting factor is the lack of reliable comparison of parameter uncertainty quantification. Since the dimensionality of the model parameters is much greater than that of the observed metrics, the phenomenon of equifinality is common. Equifinality means various sets of parameters yield similar simulations in the calibration procedure, which is common in complex nonlinear models such as hydrological simulation program-fortran (HSPF) and SWAT. To control the equifinality to arrive at meaningful parameter sets, the multi-objective evolutionary algorithm (MOEA) calibration strategy is believed to improve parameter identifiability and reduce the equifinality through the inclusion of various metrics as part of the objective function. In this sense, MOEAs and ABC look similar, as the latter does not restrict the inclusion of different types of numerical observations to be considered in summary statistics, and even qualitative data can be treated as acceptance criteria. It is therefore natural and interesting to compare the ability of ABC to quantify parameter uncertainty against MOEAs. However, to the best of the authors' knowledge, no research has been conducted that compares the parameter distributions of ABC inference to those obtained by various MOEA techniques, which would help select an appropriate calibration approach.

This study has three research objectives: to determine whether MARS is capable of mimicking the forecast capacity of a complex hydrological model, and if so, to determine whether MARS can expedite ABC to calibrate the hydrological model and to compare ABC's capability to quantify parameter uncertainty against MOEAs. The SWAT, one of the most extensively used complex hydrological models, has been chosen as the target model. A variety of SWAT simulations are used to train MARS, and its approximation performances are compared against that of Monte Carlo (MC). To calibrate the SWAT and establish a parameter distribution, 10 MOEA approaches are employed. This parameter distribution is then compared to those derived by ABC to assess ABC's capability to quantify parameter uncertainty. The contribution of this study lies in the introduction of MARS as a surrogate model to simulate the predictive behavior of complex hydrological models, and with the computational efficiency of MARS, the speed of the ABC calibration process for complex hydrological models is significantly enhanced. The promising application of this methodology not only reveals the tremendous potential of MARS as an alternative to complex hydrological models but also provides an effective pathway to accelerate model calibration. The next section describes the methods used in the study, followed by a description of the study area, the surrogate MARS model, the MARS-assisted ABC, and the MOEAs. The results section summarizes the overall findings and implications of the study. Finally, we provide some conclusions in Section 6.

## METHODOLOGY

### Methodological framework

The first step begins with the construction of the original Soil and Water Assessment Tool (SWAT) model, including gathering hydrological, meteorological, Digital Elevation Model (DEM), river network land use, and soil type data to create a hydrological simulation of the region of interest. Then, samples of adjustable input parameters are created using a second-order Gaussian orthogonal sampling technique. Initially, the SWAT model employs these parameters to calculate runoff time series outputs, which are essential for constructing the training datasets for the MARS model. In parallel, it applies Latin hypercube sampling, a Monte Carlo stochastic method, to generate a diverse set of adjustable input parameters. These inputs are then processed to produce runoff time series outputs that serve to validate the MARS model's performance. Upon obtaining the training and validation datasets, the MARS model is meticulously trained to closely emulate the relationship between the input parameters and the SWAT-generated runoff time series. The validation dataset's inputs are then fed into the trained MARS to predict the runoff time series, allowing for a direct comparison of outcomes derived from both the SWAT and MARS models using the same parameters. This dual application facilitates the evaluation of each model's performance. Throughout the training and validation phases, the consistency of the runoff's mean and variance at each time step serves as the benchmark for the MARS model's acceptability. Meeting these benchmarks qualifies the MARS model as a viable surrogate for the SWAT model, enabling further research with confidence in its predictive capability.

The second step concerns the implementation of the ABC. For the parameter auto-calibration, the well-trained MARS instead of the SWAT model is combined with the ABC procedure to optimize the input parameters, with the Nash–Sutcliffe efficiency (NSE) coefficient of the runoff as the calibration objective. Then, the parameter posterior distribution derived from the ABC is measured in the original SWAT model, to assess the accuracy of the auto-calibration.

The last step concerns the implementation of the MOEAs. The discrepancies between the parameter distributions from ABC inference and those derived by MOEA techniques are compared to evaluate ABC's calibration capability. To reduce subjectivity in the evaluation and maintain uniformity in the comparison, the probability distributions of the parameters are obtained using numerous multi-objective optimization methods, with the same NSE of runoff as calibration objectives, Likewise, the MARS model is used in place of the original SWAT model to calculate the multi-objective functions, aiming to accelerate the computation of the objective function.

### Multivariate adaptive regression splines

First introduced by Friedman (1991), MARS is a non-parametric piecewise multivariate regression technique, which has been used in a wide range of research areas. Non-parametric strategies have gained popularity in recent years because they outperform parametric models in many ways. Non-parametric models are distinguished by their remarkable characteristics. They offer high flexibility, making them highly effective in adapting to various data structures and relationships, especially when dealing with nonlinear and complex data. Unlike parametric models, non-parametric models do not require stringent assumptions about the data distribution prior to modeling. This lack of presumption about data distribution contributes significantly to their ability to deliver highly accurate predictions in many scenarios. The key benefits of employing non-parametric approaches are the ability to deliver high predicted accuracy, the absence of predetermined assumptions about data distribution, and the flexibility to deal with several explanatory variables at the same time (Chen *et al.* 2018; Zheng *et al.* 2020; Naser *et al.* 2022). The MARS model's key advantage is that it can capture non-linearities in data, allowing it to be used to represent complicated interactions between variables (Lewis & Stevens 1991).

*et al.*(2020).where

*x*is the independent variable, is the predicted response variable, the coefficients are constants that are estimated from the least-squares approach, are the products of BFs or the spline functions previously included in the model, and

*M*is the total number of BFs.

Fitting the MARS model above consists of two major phases. The first stage is model creation, which is divided into two phases: forward-stepwise regression selection and backward-stepwise elimination. Model selection is the second stage. The starting model in the forward selection phase has just one constant . Equation (1) is used to generate various BFs, which are then added to the model in several regions of the independent variables. Following that, the model searches for and analyzes significant variable-node combinations. Interactions are also implemented at this step to improve model fit. The procedure of searching is repeated until the optimal variable-node combination is found (Pramanik *et al.* 2022).

*i*, and

*n*is the number of datasets.

*n*is the number of lane-changing events,

*y*is the lead/lag gaps for observation

_{i}*i*,

*C*(

*M*) is the complexity penalty function, and

*d*is the defined cost for each BF optimization.where MSE is.

*M*is the number of BFs, and

*d*is the penalty of each basis function. A small

*d*generates large models with more BFs, whereas a large

*d*generates small smooth models with fewer BFs. The best MARS model is one that is trained to produce the lowest GCV value.

### Approximate Bayesian computation

First introduced in the statistical literature about four decades ago (Diggle & Gratton 1984), ABC is especially useful for cases where the likelihood is intractable, too expensive to be evaluated, or impossible to formulate explicitly (Cui *et al.* 2018). The first application of ABC in hydrology can be found in uncertainty assessment, which establishes a theoretical connection between ABC and generalized likelihood uncertainty estimation (GLUE)-based approaches (Nott *et al.* 2012). Notably, the ABC approach has been implemented in the DREAM suite for Bayesian inference (Sadegh & Vrugt 2014).

*F*parameters can be determined using Bayes' theorem, where

*x*is the vector of variables, and

*θ*is the vector of parameters, and

*D*is a vector of observed values of

*x*. The posterior probability can be denoted as:where

*p*(

*θ*|

*D*), also known as the posterior probability, is the conditional probability of the model parameters given the observations,

*p*(

*D*|

*θ*), also known as the likelihood function, is the conditional probability of the observations given the parameter values,

*p*(

*θ*) is the prior distribution of

*θ*, and

*p*(

*D*) is the marginal probability of the data. Typically, one can think of the marginal probability

*p*(

*D*) as a normalizing constant. The desired posterior probability can thus be described by the prior distribution and the likelihood function as:where the likelihood function

*p*(

*D*|

*θ*) summarizes the overall distance between the model simulation and corresponding observations. Standard Bayesian inference relies on the evaluation of such likelihood. However, such a function is often not available for simulation-based models. The mathematical definition of this function has been subjected to considerable debate in the hydrologic and statistical literature (Smith

*et al.*2010; Evin

*et al.*2013). The likelihood function is mathematically difficult in most hydrological applications, and determining the posterior distribution via conventional computational techniques (such as Markov Chain Monte Carlo algorithms) is impracticable (Piccioni

*et al.*2022). The idea at the core of ABC is to bypass the explicit evaluation of the likelihood function, directly obtaining an approximation of the posterior probability distribution. To do so, prior probability distributions are first defined for the model parameters. The model

*F*is then used to generate a large set of simulations by randomly sampling the parameter values according to their priors (van der Vaart

*et al.*2015). A collection of relevant summary statistics that summarize the data from the model runs are computed from these simulations and saved with the corresponding parameter values in a dataset known as a ‘reference table’. The posterior distributions can eventually be determined using this reference table, a rejection method, or machine learning techniques.

The proposition underlying ABC is that a parameter set should lie in the posterior distribution when the distance between the observed and simulated data is less than a small positive value (Vrugt & Sadegh 2013). The distance is defined as , where is the observation vector and represents the simulated data corresponding to the observations. Unfortunately, complex models have very few opportunities to satisfy the requirements, which results in a very low acceptance rate during the model evaluation process. The inference process typically uses the distance between summary statistics to catch key data features like means and standard deviations to prevent this problem. If a summary statistic (*S*(*y*)) contains the same amount of information about the model parameters as the whole dataset, i.e., if *p*(*θ*|*D*) = *p*(*θ*|*S*(*y*)), it is referred to as being a sufficient statistic. If not it will lead to weaker posterior constraints.

### Multi-objective evolutionary algorithms

The model calibration is a typical optimization problem. Mathematically, the problem of optimization can be simply stated as finding the argument values that return the minimal (or maximal) value of a given function, which is referred to as an objective function or objective. It has been argued that model calibration based solely on one objective function does not guarantee the credibility of a hydrological model since the water balance components can be misrepresented despite the performance statistics being accurate (Fernandez-Palomino *et al.* 2021). Recognizing this deficiency, comprehensive MOEAs, which are a class of optimization algorithms inspired by the processes of natural evolution, have been introduced to improve the calibration to better represent hydrological processes and system dynamics.

### Open-source libraries

This study completely depends on open-source libraries written in the Python programming language. Numpy was mainly used for numerical calculations, especially the internal integration of a large number of matrix calculation modules. Pandas are mainly used for data processing and analysis, including data reading and writing, numerical calculations, and data visualization. The MARS surrogate modeling framework used in this study was py-earth. The ABC calibration framework used was ABCPMC (approximate Bayesian computation for population Monte Carlo code) (Akeret *et al.* 2015) The MOEA framework used was Platypus. The figures were depicted using Matplotlib (Hunter 2007).

## CASE STUDY

### Study area and SWAT model

To simplify the research process and to take into account the availability of data, this study relies on a previous application case, in which the SWAT model was employed to simulate runoff in the Meichuan Basin (Jiangxi Province, China). The most important eight parameters that were screened out by previous sensitivity analysis, namely CANMX, CN2, CH_N2, CH_K2, ALPHA_BNK, SOL_AWC, SOL_K, and SOL_BD, are specified as adjustable parameters for the original dynamic model (inputs), and the corresponding simulated runoff time series is used as the objective function (outputs). These adjustable parameter descriptions and their upper and lower bounds can be found in the previous research (Ma *et al.* 2022). It is worth noting that, despite MARS' capacity to identify sensitive parameters, sensitivity analysis is not conducted in this study for the sake of simplicity.

### Implementation of MARS-assisted ABC for calibration of SWAT model

#### Design of experiment

The MARS surrogate model is intended to replicate the functional behavior of a numerical model, i.e., how a model output varies as model parameters are varied. As a result, defining a finite set of model outputs, also called design of experiment (DOE), for which a surrogate will be constructed, is a requirement for model emulation. The DOE's major goal is to effectively sample the parameter space with as many distinct combinations of model parameters as possible with limited time and computing resources. In practice, a well-distributed DOE program to sample the parameters is vital. The uniform distribution was chosen since there was very little information available from the literature or prior studies. As a result, it is commonly believed that each of the eight parameters is uniformly distributed, and a joint distribution is derived by joining the eight uniform distributions. To ensure that sampling points are distributed uniformly and independently in a multidimensional space, a quasi-Monte Carlo method, namely a second-order Gaussian quadrature method tailed to the eight-vector joint distribution, is employed to produce a total of 6,561 sample points. Following that, the original dynamic model is run with these eight-vector parameters, and runoff time series are extracted from the raw model. Finally, an input–output pair is obtained.

#### Surrogate model construction

With the input–output pairs generated above, a MARS model is constructed to replace the original SWAT model. MARS is a flexible regression method that searches for interactions and nonlinear connections automatically. The MARS comprises linear models on a higher dimensional basis space (specifically, a multivariate truncated power spline basis). Each component in the MARS model is a product of so-called ‘hinge functions’. A hinge function equals its argument when it is greater than zero and zero otherwise. During fitting, the MARS automatically determines which variables and BFs to use. The final result is a set of nonlinear BFs in the original feature space, which may include interactions and is likely to generalize well. Notably, in a typical model performance evaluation approach, the entire dataset is usually split into two parts: the training set and the validation set. This study instead used all of the 6,561 second-order Gaussian quadrature scheme-guided input–output pairs to train the MARS model, to fully reflect the interactions and nonlinear relationships between inputs and outputs. An additional 10,000 sets of parameter values, randomly drawn using the Latin Hypercube Sampling method, were employed to conduct a robust evaluation of the trained MARS model's performance. (iii) ABC inference

The resulting MARS will replace the SWAT model in the ABC analysis to measure the summary statistic. In the current study, MARS was only trained from the raw output-runoff time series. As previously stated, while ABC allows a variety of acceptance threshold criteria, using raw output as a criterion directly might result in a very low acceptance rate in the model evaluation process. To avoid this difficulty, the Nash–Sutcliffe efficiency (NSE) metric was used as a summary statistic to measure the global trend of MARS and observation matching. To constrain the ABC procedure, runoff NSEs from three distinct stations were utilized as summary statistics. In the ABC procedure, random parameter combinations were generated from the parameter prior distributions. As described earlier, these parameters were assumed to be uniformly distributed. Then, they were evaluated with the MARS. Only those parameter combinations that met the acceptance criteria were accepted into the posterior parameter distributions. In the present study, the inferential analysis was conducted utilizing ‘abcpmc’, a sophisticated Python implementation of approximate Bayesian computing population Monte Carlo (ABC PMC). This implementation is grounded in advanced sequential Monte Carlo (SMC) methodologies, integrating particle filtering techniques to enhance the robustness of the inferential process. Note that the parameter sets have been normalized to the [0,1] range. The number of particles utilized in the abcpmc setting was 500, and the simplest method of iteratively adapting the threshold was employed, with iterations of 50 and all of the starting values are 1 (for NSE), which are two sufficiently high values that ensure that the acceptance ratio remains appropriate. Using a fixed threshold based on the Nash–Sutcliffe Efficiency (NSE) establishes a foundation for a strategic reduction of the threshold over time. This strategy employs an ‘alpha’ value, generally set at the 75th percentile, to progressively lower the threshold, thereby refining the inferential model's convergence toward the posterior distribution. The final step in the procedure is to generate the posterior ensembles of inferences. The MARS model was used to build inference ensembles in this study, using 500 samples drawn from the posterior parameter distribution of the 50th ABC iteration.

### Constructing parametric posterior distributions through combining the outcomes of multiple MOEAs

To reduce the subjectivity of using a single MOEA, multiple MOEAs were used to derive a distribution of parameters by combining the results of various MOEAs. This study uses the Platypus library to implement 10 multiple MOEAs. A suite of 10 MOEAs nondominated sorting genetic algorithm ((NSGA)-II, NSGA-III, CMAES, GDE3, indicator-based evolutionary algorithm (IBEA), MOEA/D, optimized multi-objective particle swarm optimization (OMOPSO), speed-constrained multi-objective particle swarm optimization (SMPSO), SPEA2, and Epsilon-MOEA) were used to calibrate eight parameters (CANMX, CN2, CH_N2, CH_K2, ALPHA_BNK, SOL_AWC, SOL_K, and SOL_BD). This calibration involved three objective functions, each computed by a unique MARS model. Each algorithm used the same variables (8 input parameters), objective functions (3 NSEs of runoff), a default population size of 100, and a total of 10,000 function evaluations. The outcomes of these 10 multi-objective optimizations were further amalgamated to create parameter posterior distributions, which were then compared with those obtained via the ABC method for a comprehensive analysis. For detailed implementation of the optimization algorithm, such as the upper and lower limits of eight parameters, and default parameter settings for 10 optimization algorithms, please refer to the Software and data availability section.

## RESULTS

### MARS surrogate model performance evaluation

### MARS-assisted ABC-derived posterior parameter distribution

*ε*values and distances of ABC are also shown in Figure 3(b) and 3(c), respectively. Both graphs demonstrate that the

*ε*and distance values converge at the 10th iteration.

### MARS-assisted multiple MOEA-derived posterior parameter distribution

### Reduction of the computation time

A single SWAT execution took an average of 112 s on a standalone machine with an Intel 16-core i7-10700 processor and 32 GB RAM. Thus, 25,000 (50 iterations × 500 samples) standalone model evaluations required by traditional ABC took 46,667 min. In contrast, the training of the MARS model required 6,561 evaluations of the original SWAT model and took 12,247 min. Furthermore, because of the fast processing performance of the MARS, 25,000 surrogate ABC took approximately 489 s on a standalone machine. Overall, the proposed MARS-assisted ABC reduced the number of model evaluations by 18,439 and the computation time by approximately 74%.

### Case study calibration results

## DISCUSSION

MARS can serve as a potentially viable substitute model for complex hydrological models. Our MARS surrogate model performance evaluation showed that MARS is capable of mimicking the forecast capacity of a complex hydrological model like SWAT. The case study showed that the time series of the mean monthly flow for the MC and MARS were almost identical during both the training and validation periods, and the average fit coefficient of determination was larger than 0.90 during the validation period. Previously, MARS was mostly used for raw time series forecasting (Garcia Nieto *et al.* 2011; Raj & Gharineiat 2021; Naser *et al.* 2022). However, in this study, it is evaluated as a surrogate model instead for assessing its calibration performance utilizing optimization techniques such as ABC and MOEAs. The calibration results showed that the MARS was able to produce uncertainty analysis findings by creating a SWAT surrogate, with NSE values of greater than 0.65 present in 31% of the 500 ABC results and 50% of the 805 MOEA results. However, the statistics of 31 and 50% show that only a part of the parameters in the suggested set of parameters is capable of meeting the criteria for calibration (NSE > 0.65). This further demonstrates that a surrogate model such as MARS is not a total replacement for the original model but rather a reduction of it. Nevertheless, the benefit of a surrogate model is to assist, i.e., to keep the considerable accuracy of the original models while speeding up the calculation significantly.

The MARS-assisted ABC approach can accelerate the calibration procedure. MARS has been extensively used to identify critical parameters of the complex model (Bose *et al.* 2021). By achieving parameter dimensionality reduction in this way, the computing effort required for parameter calibration will be reduced accordingly. In addition, to analyze the relationship between a given predicted value and a set of predictive variables flexibly and effectively, MARS can significantly reduce computation time when compared to the original model. This can be confirmed in our case study, where the MARS-assisted ABC reduced the number of model evaluations by 18,439 and the computation time by approximately 74%, at the cost of 6,561 evaluations of the original SWAT model. Because of the fast processing performance of the MARS, 25,000 surrogate ABC took only 489 s on a standalone machine, supporting our expectation that MARS can expedite ABC to calibrate the hydrological model. In this sense, MARS-assisted ABC has the potential to extend the use of ABC in hydrological modeling.

The MARS-assisted ABC approach can be an effective supplement to conventional MOEA calibration strategies. As few trials have been conducted to assess MARS's performance as surrogate models, no research has been conducted that compares the parameter distributions of ABC inference to those obtained by traditional MOEA techniques, which would be beneficial in selecting an appropriate calibration approach. Our work is the first of its kind to compare MARS-assisted ABC-derived posterior parameter distribution with that of MOEAs derived. Unsurprisingly, the comparison of posterior distributions showed no significant relationship between ABC and MOEAs. This implies that the transition of surrogate models such as MARS, ABC, and MOEAs can complement each other in the calibration and optimization of complex models. Because the phenomenon of equifinality is common, the inclusion of more objective functions is believed to improve parameter identifiability. In this sense, calibration of complex hydrological models could benefit from MARS-assisted ABC, because the MARS can reduce the computation time significantly, and most importantly, the ABC does not restrict the inclusion of different types of numerical observations, even qualitative data can be treated as acceptance criteria.

However, there are two possible drawbacks associated with MARS-assisted ABC. This study employed multiple single-output emulators, namely three MARS models, rather than a single multi-output emulator, to surrogate the SWAT model. The previous research, however, claimed that the multiple single-output emulator strategy is inappropriate due to its failure to account for temporal correlations, and implied that multi-output emulators should eventually lead to the successful emulation of time series outputs (Conti & O'Hagan 2010). A similar report was also observed in other studies (Razavi *et al.* 2012). Although there is no clear evidence in our study that multiple single-output emulators lose temporal correlation, we believe that accounting for correlations among outputs that are highly connected in the response surface surrogate, should theoretically lead to enhanced surrogate accuracy. In the future, we will examine the single multi-output emulator, such as a long short-term memory approach (Liang *et al.* 2020), gated recurrent unit (Chen *et al.* 2021), or iTransformer (Liu *et al.* 2023) to evaluate the improvement in its surrogate accuracy. The other possible drawback of the MARS-assisted ABC arises from the standard form of the ABC, where a rejection algorithm is used to derive the posterior parameter distribution (Burr & Skurikhin 2013). This includes defining a distance and a tolerance threshold between acceptance and rejection. Such a threshold, however, is arbitrary and should be adjusted for each specific application (Simola *et al.* 2021). Moreover, a limited number of summary statistics must be required to employ the conventional ABC method, and it is often difficult to determine which statistics are the most pertinent to the available data (Piccioni *et al.* 2022). Therefore, other alternative ABCs, such as the ABC random forest that replaces the rejection algorithm with the random forest machine learning technique (Raynal *et al.* 2019), should be encouraged to be examined in the future.

## CONCLUSIONS

The high computational cost of evaluating complex models limits the application of Bayesian inference in hydrological modeling. In this paper, MARS is used to speed up the ABC calibration process of a complex hydrological model. MARS surrogate model performance evaluation showed that MARS is capable of mimicking the forecast capacity of a complex hydrological model like SWAT; therefore, MARS can be used as a promising surrogate model for complex hydrological models. Moreover, the case study calibration results showed that MARS-assisted ABC can expedite the calibration process because of the fast processing performance of MARS. Furthermore, the results of comparing ABC's capability to quantify parameter uncertainty with that of MOEAs showed no significant relationship between ABC and MOEAs. Therefore, combining the ABC with non-Bayesian MOEAs helps in the selection of an appropriate calibration approach. The MARS-assisted ABC is particularly suitable for complex model calibration, but it is also a generic approach for solving various optimization applications since calibration is just a typical optimization problem. Therefore, it is expected to be an ideal method for complex model optimization.

## SOURCE CODE

Name of the software: MARS-ABC.

Developers: Jinfeng Ma.

Operating systems: Windows 10, CentOS 7.0, or other Linux Operating Systems.

Programming languages: Python.

Source code size: 36.1 KB.

Dependent software: Python 3.7.

Availability: The source codes are available free of charge and can be downloaded from: https://github.com/JinfengM/MARS-ABC.

First available: December 2022.

## ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (72349001, 41925005).

## DATA AVAILABILITY STATEMENT

All relevant data are available from https://github.com/JinfengM/MARS-ABC.

## CONFLICT OF INTEREST

The authors declare there is no conflict.