## Abstract

Replacing complex models with emulators is an increasingly common approach in environmental science, as it greatly reduces computational time, and a variety of approaches exist in the water resources modelling literature. In this work, an emulator is proposed to mimic a hydrological model at the catchment scale, taking into account the effect of land use on the hydrological processes involved in the water balance. The proposed approach is novel in its combination of techniques. The dimension of the temporal model output is reduced via principal component analysis, and this reduced output is approximated using Gaussian process emulators built on a conditioned Latin hypercube design that reflects the constrained land use inputs. Uncertainty from both the model approximation and the dimension reduction is propagated back to the space of the original output. The emulator has been applied to simulate river flow in a rural river basin located in south west England, the Frome at East Stoke Total, but the methodology is general. Results show that the emulator is an effective tool for water resources assessment at the catchment scale, providing accurate estimates of the model output as a function of the land use inputs while substantially reducing the computational burden.

## INTRODUCTION

Environmental modelling, particularly for the simulation of geological and climatic processes, usually requires the use of computationally expensive and long runtime codes. This is an issue for analyses that require a large number of model simulations, such as a sensitivity analysis or model calibration, and such analyses can become prohibitive or even impossible using the original model. Recently, the development of new methodologies has been explored, in order to approximate complex environmental models with simpler and more computationally efficient models (e.g. Castelletti *et al.* 2009; Fu *et al.* 2010; Lima *et al.* 2018). One of the most frequently used approaches is emulation or metamodelling. According to the definition given by Castelletti *et al.* (2012), an emulator can be considered as a lower-order approximation of a complex, process-based model, designed to be used as a surrogate in order to overcome the large computational burden. In general, the purpose of an emulator is to replicate the functions of one system with a different system that attempts to reproduce its behaviour (Abrahart & See 2007; Stanfill *et al.* 2015). A variety of different emulators have been developed and tested in a wide range of applications, such as river basin and groundwater management (Solomatine & Ostfeld 2008; Mirfenderesgi & Mousavi 2015; Roy *et al.* 2016; Christelis *et al.* 2017), water quality modelling (Castelletti *et al.* 2010), computational fluid dynamics (Moonen & Allegrini 2015), fire emission modelling (Katurji *et al.* 2015), and ecological modelling (Leeds *et al.* 2013).

Hydrological modelling is a field which benefits greatly from the use of emulators, not only for facilitating common tasks such as sensitivity analysis or calibration, but also for other applications, such as the investigation of effects of climate or land use change on water resources. Indeed, this kind of analysis requires a great number of model runs in order to assess the uncertainty related to predictions in potential future scenarios. Metamodelling approaches have been widely applied in various hydrological modelling problems, as reported by Razavi *et al.* (2012) who reviewed and analyzed a number of studies on emulators in the water resources modelling field. One of the first applications of emulation in hydrology can be attributed to Romanowicz *et al.* (1996), who developed a distributed flow routing model emulator to explore the accuracy in flood prediction. In order to predict water level in the Severn River basin (UK), Young *et al.* (2009) presented and applied a nonlinear Dynamic Model Emulator (DME) to reproduce the output of the HEC-RAS model (Brunner 2001). Fraser *et al.* (2013) proposed the application of a metamodelling approach to improve the performance of a physically-based model in the assessment of the implications of land use changes on peak flows at the catchment scale. Romanowicz & Osuch (2015) developed an emulator based on the MIKE11 model (DHI 1997), obtaining reliable flow predictions for a basin of the river Vistula in Poland. More recently, Machac *et al.* (2016) and Carbajal *et al.* (2017) investigated the application of emulation in urban drainage modelling. Machac *et al.* (2016) proposed an emulator to simplify the application of the Storm Water Management Model (SWMM) to the urban drainage system of the city of Adliswil (Switzerland). 
Carbajal *et al.* (2017) compared the performance of two different kinds of emulators, mechanistic and data-driven, to mimic the results obtained by the application of an urban drainage model of two small catchments in Switzerland, finding that data-driven emulation outperforms mechanistic emulation where the mechanistic prior is linear.

In this work, a statistical emulator has been developed to mimic and replace a hydrological model based on a soil moisture balance (Rushton *et al.* 2006). This model is quite simplistic in nature and has mainly been chosen to demonstrate the emulator methodology. The proposed approach is unique as it combines a range of techniques to develop an efficient emulator, namely: the use of principal component analysis (Jolliffe 2002) to reduce the high dimensional nature of the model output; the use of models from the field of mixture experiments (Cornell 2011) which are appropriate for the constrained land use model inputs; a novel design strategy called conditioned Latin hypercube sampling (CLHS) which combines traditional mixture design with Latin hypercubes; and the use of Gaussian process emulation (Kennedy & O'Hagan 2001; Rasmussen & Williams 2006) for accurate prediction of the model output as well as uncertainty quantification. The uncertainty from both the emulation of the model and the dimension reduction is also explicitly propagated back to the space of the original outputs. The proposed methodology has been applied to simulate river flow in a rural river basin in south west England, the Frome at East Stoke Total. This work focuses on a hydrological application, but the statistical emulator presented here is a general technique which could be used for a range of environmental models. The approach presented here is similar to the data-driven emulator of Carbajal *et al.* (2017), with the addition of the novel design strategy for the constrained inputs, explicit calculation of the uncertainty propagation, and a different test model.

The paper is organized as follows. In the next section, the area of study and datasets are presented and described. The proposed methodology is then explained. Results are presented and discussed in the subsequent section, and finally the applied methodologies and obtained results are briefly summarized.

## STUDY AREA AND DATA

The study area is the Frome at East Stoke Total river basin, located in south west England. The catchment area is approximately 415 km^{2}. In this area, land cover includes pasture (90%), arable (8.7%), woodlands (1%), and urban areas (0.3%). The prevalent crop in the arable areas is winter wheat. Figure 1 shows the digital elevation model of the basin, the outlet location and the spatial distribution of land cover.

The geology is characterized by Chalk, Upper Greensand and Gault, Lias and Oolites in the headwater areas, while sands, gravels and clays are frequent in the lower part of the catchment. The predominant soil type in the river basin is Rendzina soil, covering 43.2% of the area, whereas the remaining soils are Eutric Cambisols (27.4%), Orthic Podzols (20.8%) and Eutric Gleysols (8.6%).

According to the Köppen classification, the climate of south west England is oceanic, characterized by cool winters and warm summers. The average annual temperature is approximately 10 °C. Summer is the warmest season, with an average temperature of 15–16 °C. Precipitation is distributed throughout the year, and average annual rainfall ranges between 1,000 and 1,100 mm/year.

The average annual flow at the outlet of the Frome river basin is approximately 6.7 m^{3}/s. The natural flow regime is not affected by upstream water abstractions; nevertheless, a considerable amount of water is abstracted from the chalk area via boreholes and wells.

Daily series of precipitation, temperature, solar radiation, relative humidity and wind speed data have been obtained from a dataset based on the global data of National Centers for Environmental Prediction (NCEP) Climate Forecast System Reanalysis (CFSR), available at the website http://globalweather.tamu.edu/. In the website, the selected weather gauges (all located in the same site with coordinates 50°44′13.19″N and 2°29′59.98″W) are identified with the following IDs: p507-25 (rainfall in mm), t507-25 (maximum and minimum temperatures in °C), r507-25 (relative humidity as a fraction), s507-25 (solar radiation in MJ/m^{2}) and w507-25 (wind speed in m/s). Daily river flow data recorded at the East Stoke Total station (9.2 mAOD) have been provided by the National River Flow Archive of UK. In this dataset, the river flow gauge is identified with the ID 44001 and data can be downloaded at the link https://nrfa.ceh.ac.uk/data/station/meanflow/44001. Land cover information in the area of study has been obtained from Feranec *et al.* (2016) (downloadable at https://land.copernicus.eu/pan-european/corine-land-cover/clc-2012?tab=download), while soil information has been extracted from the FAO/UNESCO Soil Map of the World (downloadable at www.fao.org/geonetwork/srv/en/metadata.show%3Fid=14116).

## METHODOLOGY

### Runoff modelling

The applied model was developed by Rushton *et al.* (2006). Based on soil moisture conditions, which are a function of catchment characteristics such as the crop types in arable areas and their growth stages, the model simulates runoff and potential groundwater recharge.

The processes included in the soil moisture balance can be summarized in the following steps (Figure 2): precipitation reaches the ground surface and, based on rainfall intensity and soil moisture conditions, a fraction of it can generate runoff, whereas another part infiltrates the soil zone. The amount of infiltrated water becomes available to supply potential evapotranspiration or to be stored as groundwater. Every time the potential evapotranspiration demand has been met, any further infiltration of water will increase the amount of water stored in the soil. This process will continue until the stored water reaches the field capacity, that is the point at which the soil becomes free draining and groundwater recharge occurs. During dry periods, the soil dries because of water uptake by the crop. The deficit of soil moisture represents the amount of water needed to bring the soil back to the field capacity. Whenever enough water is available, actual evapotranspiration will occur at the potential rate.
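The daily bookkeeping described above can be sketched in code. The function below is an illustrative simplification, not the calibrated Rushton *et al.* (2006) formulation: the constant runoff coefficient and the linear stress rule are placeholder assumptions.

```python
def soil_moisture_step(smd, rain, et_pot, raw, taw, runoff_coef=0.2):
    """Advance the soil moisture deficit (smd, mm) by one day.

    Returns (new_smd, runoff, recharge, actual_et). The constant runoff
    coefficient and the linear stress reduction are illustrative
    simplifications of the model's rules."""
    runoff = runoff_coef * rain            # part of rainfall becomes runoff
    infiltration = rain - runoff           # the rest enters the soil zone
    if smd <= raw:                         # enough readily available water:
        et = et_pot                        # transpire at the potential rate
    else:                                  # crop under stress:
        et = et_pot * max(0.0, (taw - smd) / (taw - raw))
    smd_new = smd - infiltration + et      # deficit shrinks with infiltration
    recharge = 0.0
    if smd_new < 0.0:                      # soil has reached field capacity:
        recharge = -smd_new                # excess water drains as recharge
        smd_new = 0.0
    return smd_new, runoff, recharge, et
```

A wet day that wipes out the deficit produces recharge, while a dry day deepens the deficit, mirroring the seasonal pattern described in the text.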

The soil moisture balance is driven by two soil water contents: the wilting point *θ _{WP}*, that is, the soil moisture content below which the plant roots are not able to extract water, and the field capacity *θ _{FC}*. Rainfall is the main input for the soil moisture balance, while interception and runoff reduce the actual infiltration to the soil zone. Runoff can be assessed by means of coefficients that define the fraction of rainfall which generates runoff. These coefficients are a function of daily rainfall and soil moisture deficit, and are defined so as to reflect the small runoff in summer (higher soil moisture deficit) and the larger runoff in winter (lower soil moisture deficit). The reference crop potential evapotranspiration *ET _{0}* is estimated using the FAO version of the Penman–Monteith equation (Allen *et al.* 1998). The amount of water required by each crop depends on different factors, such as the dates of sowing and harvest, the growth stage of the crop, etc. The potential evapotranspiration of the crop, *ET _{C}*, is a function of *ET _{0}* and is calculated as follows:

*ET _{C}* = *K _{S}* · *K _{C}* · *ET _{0}*

where *K _{S}* is the stress coefficient and *K _{C}* is the crop coefficient (Allen *et al.* 1998). *K _{S}* is calculated as a function of root depth, the moisture holding properties of the soil and the soil moisture deficit, whereas *K _{C}* depends on the growth stage of the crop, with the highest values ascribed to the development and mid stages, when the crop matures. Evaporation from bare soil occurs from crop harvest up to the complete coverage of the ground by the crop.

The readily available water (*RAW*) is introduced in the model in order to represent the condition in which the crop can obtain all the required water from the soil:

*RAW* = *p* · (*θ _{FC}* − *θ _{WP}*) · *Z _{r}*

where *p* is a depletion factor ranging from 0.2 to 0.7 and *Z _{r}* is the root zone depth. Whenever the soil moisture deficit exceeds *RAW*, the crop is under stress conditions and transpiration occurs at a reduced rate; when *RAW* is higher than the soil moisture deficit, the crop transpires at the potential rate. Groundwater recharge occurs on days when the soil moisture deficit becomes zero and the soil becomes free draining, and is equal to the amount of water exceeding that required by the soil to reach the field capacity (see Appendix, available with the online version of this paper). All stages of the soil moisture balance calculations and equations are fully described by Rushton *et al.* (2006) and Ireson & Butler (2013).
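As a small worked example, the two FAO-56 style quantities can be computed with the calibrated parameter values reported later in Table 1. The conversion of the available water to millimetres (water content per metre of root depth) is an assumption of this sketch.

```python
# Readily available water and crop evapotranspiration (Allen et al. 1998),
# using the calibrated values from Table 1. The mm conversion assumes the
# volumetric water contents apply over the root zone depth.
theta_fc, theta_wp = 0.201, 0.096     # field capacity, wilting point (m3/m3)
p, z_r = 0.495, 0.539                 # depletion factor, root zone depth (m)

taw = (theta_fc - theta_wp) * z_r * 1000.0   # total available water (mm)
raw = p * taw                                # readily available water (mm)

def crop_et(et0, k_s, k_c):
    """Potential crop evapotranspiration ET_C = K_S * K_C * ET_0."""
    return k_s * k_c * et0
```

With the Table 1 values this gives a total available water of about 57 mm, of which roughly half is readily available before crop stress begins.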

The model application requires the calibration of six parameters, specifically: the field capacity *θ _{FC}*, the wilting point *θ _{WP}*, the depletion factor *p*, the root zone depth *Z _{r}*, the bypass fraction *BF* and the factor for near surface storage *FRACSTOR*. Model outputs are very sensitive to changes in the *BF* parameter (Ireson & Butler 2013), which represents the proportion of rainfall that becomes bypass flow, that is, the amount of water passing directly to the aquifer through fractures. Bypass flow is responsible for groundwater recharge during summer months, when soil moisture deficits prevent drainage from the soil. *FRACSTOR* is an empirical factor used to define the proportion of the increase in moisture content that becomes near surface soil storage.

The model described above has been calibrated and validated. In order to identify the set of model parameters that gives a good fit between observed and simulated discharges, a Monte Carlo procedure has been applied, consisting of the random generation of parameter values within fixed ranges of variation, so that each calibration simulation is characterized by a different combination of randomly generated parameters. Identifying the best set of model parameters required a great number of simulations (10,000), since many combinations of parameters need to be tested.
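The Monte Carlo search can be sketched as below. Here `run_model` is a stand-in for the soil moisture balance model, and the parameter ranges in any call are placeholders rather than the ones used in the study.

```python
import random

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency between observed and simulated series."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def calibrate(run_model, obs, ranges, n_sim=10_000, seed=1):
    """Random search: keep the parameter set with the highest NS."""
    random.seed(seed)
    best_params, best_ns = None, float("-inf")
    for _ in range(n_sim):
        params = {k: random.uniform(lo, hi) for k, (lo, hi) in ranges.items()}
        ns = nash_sutcliffe(obs, run_model(params))
        if ns > best_ns:
            best_params, best_ns = params, ns
    return best_params, best_ns
```

For example, `calibrate(model, observed_flows, {"BF": (0.0, 0.5)})` would return the best of 10,000 random draws of the bypass fraction alone; the study varies all six parameters jointly.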

The fact that the Rushton *et al.* (2006) model is fast enough to enable a Monte Carlo calibration approach contradicts the need for emulation somewhat. However, the focus of this paper is to outline the methodology of the emulator and its implementation. The aforementioned model is used for demonstration purposes and has been pre-calibrated in order to avoid a biased poor performance of the emulator. Indeed, statistical emulators are commonly employed in the calibration stage itself, but this requires a slightly different approach to the focus here. A number of calibration approaches have been proposed in the emulation literature, including Bayesian calibration (Kennedy & O'Hagan 2001; Wilkinson 2011) and history matching (Craig *et al.* 1996; Williamson *et al.* 2015).

Model performance has been evaluated using the Nash–Sutcliffe efficiency:

*NS* = 1 − Σ _{t}(*Q* _{obs,t} − *Q* _{sim,t})^{2} / Σ _{t}(*Q* _{obs,t} − *Q̄* _{obs})^{2}

where *Q* _{obs,t} and *Q* _{sim,t} are the observed and simulated flows at time *t*, and *Q̄* _{obs} is the average of the observed series. *NS* can range from −∞ to 1. An efficiency equal to 1 is reached when a perfect match between observed and simulated values occurs.

### Model set up and experimental design

With the calibration complete, the aim is to study the relationship between the remaining four land use input variables and the river flow output. The land use variables are defined as shares in the basin and will be denoted as *x* _{1} (woodland), *x* _{2} (arable), *x* _{3} (pasture) and *x* _{4} (urban), or using the vector *x* = (*x* _{1}, *x* _{2}, *x* _{3}, *x* _{4}). The model output is the daily river flow at *T* time points, denoted as the vector *y* = (*y* _{1}, …, *y* _{T}). The Rushton *et al.* (2006) model will be denoted functionally as *y* = *f*(*x*). To reduce the computational burden, the model will be replaced by a fast-to-run statistical emulator, denoted *f̂*. The methodology for building the statistical emulator is presented below under ‘Statistical emulation’. An analysis of the difference in computational time between the model and emulator is given below under ‘Comparing the computational time of model and emulator’. Daily river flow output of the Rushton *et al.* (2006) model for five settings of the four land use inputs is shown in Figure 3.

The statistical emulator must be constructed using *n* runs of the model in an experimental design. The experimental design literature for statistical emulation is vast, but in general it is good practice to run the model at a range of input settings which are space-filling in the region of interest, and Latin hypercubes (McKay *et al.* 1979) are a popular approach. The number of model runs, *n*, depends on the computational budget, but Loeppky *et al.* (2009) suggest the rule *n* = 10*d*, where *d* denotes the input dimension (*d* = 4 here). The land use inputs for the Rushton *et al.* (2006) model are non-standard in that they are shares constrained to sum to one, i.e. *x* _{1} + *x* _{2} + *x* _{3} + *x* _{4} = 1. This is the focus of the field of mixture experiments (Cornell 2011) and several mixture designs have been suggested. However, the majority of mixture designs, e.g. simplex centroid designs and simplex lattice designs, are similar to factorial or gridded designs, which are known to be substandard for building (Gaussian process) statistical emulators (Urban & Fricker 2010). As far as the authors are aware, equivalents of Latin hypercube designs for mixture experiments do not exist in the literature. Here, a novel strategy for mixture design for emulators is proposed which uses the CLHS algorithm (Minasny & McBratney 2006). The algorithm is implemented in the *clhs* package (Roudier 2011) in the *R* programming language (R Core Team 2017). With the design strategy decided, it remains to choose the number of training runs (design size). In this work, a CLHS design of size *n* = 40 is chosen from a set of candidate mixture points, following the *n* = 10*d* rule suggested by Loeppky *et al.* (2009). The experimental design is shown in Figure 4. The rule of thumb is satisfactory in many scenarios, and provides accurate emulator predictions here (see below under ‘Statistical emulation’). However, in cases where there are a large number of inputs or the model is too expensive, it may not be possible to achieve *n* = 10*d* within the computational limitations. A solution in this case would be to reduce the input dimension, for example using active subspaces (Constantine 2015), or to implement smaller experiments over multiple iterations or waves (Williamson *et al.* 2015).
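To make the design idea concrete, the sketch below draws candidate points uniformly on the simplex (so the sum-to-one constraint holds by construction) and greedily selects a subset whose one-dimensional marginals occupy distinct Latin hypercube strata. This is a simplified stand-in for the simulated annealing CLHS algorithm of Minasny & McBratney (2006); the study itself used the *clhs* R package.

```python
import random

def simplex_candidates(n_cand, d, seed=0):
    """Draw n_cand points uniformly on the d-dimensional simplex."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n_cand):
        e = [rng.expovariate(1.0) for _ in range(d)]
        s = sum(e)
        pts.append(tuple(x / s for x in e))   # normalised -> sums to one
    return pts

def stratum_clashes(point, chosen, n):
    """Count coordinates of `point` whose 1/n-wide stratum is occupied."""
    clashes = 0
    for j, x in enumerate(point):
        occupied = {min(int(c[j] * n), n - 1) for c in chosen}
        if min(int(x * n), n - 1) in occupied:
            clashes += 1
    return clashes

def clhs_greedy(candidates, n):
    """Greedily pick n candidates, preferring unoccupied marginal strata."""
    pool, chosen = list(candidates), []
    for _ in range(n):
        best = min(pool, key=lambda p: stratum_clashes(p, chosen, n))
        chosen.append(best)
        pool.remove(best)
    return chosen
```

For instance, `clhs_greedy(simplex_candidates(2000, 4), 40)` yields a design of 40 valid mixtures whose marginals are roughly Latin-hypercube stratified.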

In summary, the experimental design is a CLHS mixture design of size *n* = 40. The output of the Rushton *et al.* (2006) model at the experimental design is the *n* × *T* matrix *Y*, obtained by evaluating the model at each design point. Since the river flow model output is strictly positive, the logarithm of the model output is used for subsequent statistical analysis.

### Dimension reduction

The majority of statistical emulation methods are appropriate for scalar model outputs (Kennedy & O'Hagan 2001). To adhere to this and use standard emulation techniques, the dimension of the temporal river flow model output is reduced. In this work, principal component analysis is employed via the singular value decomposition (SVD) (Jolliffe 2002). Several other decompositions are possible, for example nonnegative matrix factorization (Carbajal *et al.* 2017). The dimension reduction procedure used here is as follows:

1. Subtract the column means from *Y* to obtain the centred matrix *Y* _{c}.

2. Using SVD (or equivalently an eigendecomposition), obtain a column matrix of eigenvectors, *V*, and a diagonal matrix of eigenvalues, *Λ*. The eigenvalues satisfy *λ* _{1} ≥ *λ* _{2} ≥ … ≥ *λ* _{T} ≥ 0.

3. Select the first *K* eigenvectors and eigenvalues to explain a proportion *P* (e.g. 0.9, 0.95, 0.99) of variability in the data, by solving for *K*: (*λ* _{1} + … + *λ* _{K})/(*λ* _{1} + … + *λ* _{T}) ≥ *P*. Denote the first *K* eigenvectors and eigenvalues using the matrices *V* _{K} and *Λ* _{K} respectively. Denote the remaining eigenvectors and eigenvalues, *K* + 1, …, *T*, using the matrices *V* _{−K} and *Λ* _{−K} respectively.

4. Project the data by computing *Z* = *Y* _{c} *V* _{K}. The columns of *Z* are the principal components, denoted *z* _{1}, …, *z* _{K}.

5. Transform back to the original dimension of the data by computing *Z* *V* _{K}^{T} and adding back the column means. Since a logarithmic transformation was used, an exponential transformation will also be needed here.

By using principal component analysis it was found that *K* = 2 eigenvectors could explain almost all of the variability in the data. Usual practice is to select the number of eigenvectors to explain a high proportion of variability in the data (e.g. 0.9, 0.95, 0.99). Since almost all the variability could be explained by only two eigenvectors in this example, it made sense to go beyond this rule. This is very uncommon, and is probably due to the model output being similar across model runs, as can be seen in Figure 3: as the land use input is varied, the flow output is similar, save for a shift effect. Despite explaining a high proportion of variability, the principal component approach does not guarantee accurate reconstruction of the original data, especially if non-linearity is present. However, the reconstruction is shown to be accurate below under ‘Results and discussion’.

### Statistical emulation

Recall that the Rushton *et al.* (2006) model, *f*, is to be replaced with an emulator, *f̂*. Using principal components, the model output has been approximated by the two principal components *z* _{1} and *z* _{2}. To construct a full emulator for the Rushton *et al.* (2006) model, the functional relationships between the land use inputs and the principal components, denoted *f* _{1} and *f* _{2}, need to be replaced with independent emulators, namely *f̂* _{1} and *f̂* _{2}.

A Gaussian process prior distribution is placed on the function *f* _{k} relating the land use inputs to the *k*th principal component:

*f* _{k}(*x*) ~ GP(*m* _{k}(*x*), *c* _{k}(*x*, *x*′))

where *m* _{k}(·) and *c* _{k}(·, ·) denote the mean and covariance functions respectively, and *β* _{k}, *σ*^{2} _{k} and *δ* _{k} are hyperparameters to be estimated. The mean function is used to describe any known global behaviour of the model output as a function of its inputs, and commonly a regression model is used. Recall that for the Rushton *et al.* (2006) model the land use inputs are shares constrained to sum to one. It is therefore appropriate to use a regression model from the field of mixture experiments, the most popular class of which are Scheffé models (Scheffé 1958; Cornell 2011). For simplicity, in this work the following linear Scheffé model is used:

*m* _{k}(*x*) = *β* _{k,1} *x* _{1} + *β* _{k,2} *x* _{2} + *β* _{k,3} *x* _{3} + *β* _{k,4} *x* _{4}

where the hyperparameters *β* _{k,1}, …, *β* _{k,4} are regression coefficients to be estimated. This aligns with the fact that linear regression models are the standard choice for the mean function in Gaussian process emulation (O'Hagan 2006). The covariance function is used to describe the smoothness properties of the model output as a function of its inputs, and there are several choices in the literature. The majority of these define the covariance in the output at two input settings, *x* and *x*′, in terms of the distance between the inputs, *x* − *x*′. In this work, the following squared exponential covariance function is used, which assumes a stationary and separable Gaussian process (Rasmussen & Williams 2006):

*c* _{k}(*x*, *x*′) = *σ*^{2} _{k} exp(−(*x* − *x*′)^{T} *Δ* _{k}^{−1} (*x* − *x*′))

where *Δ* _{k} is a diagonal matrix with elements *δ*^{2} _{k,i}, *i* = 1, …, 4. The hyperparameters *δ* _{k,i} are known as correlation lengths, and determine the smoothness of the output in each input separately. Specifying one correlation length for each input dimension is a technique known as automatic relevance determination (ARD) (Rasmussen & Williams 2006). Satisfying *δ* _{k,i} > 0, the larger the correlation length, the smoother the output is with respect to that input. The hyperparameter *σ*^{2} _{k} is a variance term which determines the extent to which the Gaussian process can deviate from the mean function. The squared exponential covariance function results in a very smooth (infinitely differentiable) Gaussian process, which will not be appropriate in all cases, but is found to be suitable for the principal component output featured here. However, the proposed approach will work with any other choice of covariance function, for example the Matérn class of covariance functions.
It is fundamental to choose the covariance function correctly. To test whether the choice of the squared exponential covariance function with ARD is suitable, the value of the marginal log-likelihood is calculated and compared with the values obtained using Matérn covariance functions with ARD, where the regularity parameter is set to 1, 3 and 5. These are referred to as Matérn 1, Matérn 3 and Matérn 5 respectively. A larger value of the regularity parameter increases the smoothness of the resulting Gaussian process. It is shown below under ‘Statistical emulation’ that the squared exponential covariance function gives the best value of the marginal log-likelihood of the four options. More detail on the choice and role of the mean and covariance functions and on the Matérn class of covariance functions can be found in Rasmussen & Williams (2006).

The posterior distribution of the *k*th principal component output, conditional on estimates of the hyperparameters *β̂* _{k}, *σ̂*^{2} _{k} and *δ̂* _{k}, remains a Gaussian process:

*f* _{k}(*x*) | *z* _{k} ~ GP(*m** _{k}(*x*), *c** _{k}(*x*, *x*′))

where *m** _{k} and *c** _{k} denote the posterior mean and covariance functions respectively. Full details of this calculation can be found in Haylock & O'Hagan (1996). This posterior distribution is known as the Gaussian process emulator, and provides a fast approximation to the model at any input setting. The posterior mean function interpolates the model output at the experimental design points. The posterior uncertainty is captured by the variance *c** _{k}(*x*, *x*), which quantifies the uncertainty introduced by replacing the original model with an emulator. This uncertainty is zero at the experimental design points and grows as we move further away from them.
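The prior covariance and the posterior conditioning can be sketched in NumPy. Here the hyperparameters are simply fixed (the study estimates them by maximum likelihood with DiceKriging in *R*), and a small nugget is added for numerical stability; both are assumptions of this sketch.

```python
import numpy as np

def sqexp_ard(X1, X2, sigma2, delta):
    """ARD squared exponential: sigma2 * exp(-sum_i (x_i - x'_i)^2 / delta_i^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2 / delta ** 2).sum(-1)
    return sigma2 * np.exp(-d2)

def gp_posterior(X, z, Xnew, beta, sigma2, delta, nugget=1e-8):
    """Posterior mean and covariance of the GP at inputs Xnew."""
    m, mnew = X @ beta, Xnew @ beta               # linear mean function
    C = sqexp_ard(X, X, sigma2, delta) + nugget * np.eye(len(X))
    c_new = sqexp_ard(Xnew, X, sigma2, delta)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z - m))
    mean = mnew + c_new @ alpha                   # interpolates training data
    v = np.linalg.solve(L, c_new.T)
    cov = sqexp_ard(Xnew, Xnew, sigma2, delta) - v.T @ v
    return mean, cov
```

Evaluating the posterior at a training point returns the training output with near-zero variance, illustrating the interpolation property described above.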

### Reconstruction and propagation of uncertainty

The two independent emulators must be combined to reconstruct the output of the Rushton *et al.* (2006) model and effectively propagate the uncertainty back to the space of the model output, the river flow time series. To do this, the procedure described in Wilkinson (2011) is followed. Assuming that the two principal components are independent, the predictive distribution of the combined statistical emulators at a new input setting *x* is the following multivariate Gaussian distribution:

(*f̂* _{1}(*x*), *f̂* _{2}(*x*))^{T} ~ N(*m**(*x*), *C**(*x*))

where *m**(*x*) = (*m** _{1}(*x*), *m** _{2}(*x*))^{T} is a mean vector and *C**(*x*) is a diagonal covariance matrix with elements *c** _{k}(*x*, *x*), *k* = 1, 2. This distribution can be propagated to the space of the (logarithm of the) Rushton *et al.* (2006) model output as follows (Wilkinson 2011):

N(*ȳ* + *V* _{K} *m**(*x*), *V* _{K} *C**(*x*) *V* _{K}^{T} + *V* _{−K} *Λ* _{−K} *V* _{−K}^{T})

where the definitions of *ȳ* (the column means), *V* _{K}, *V* _{−K} and *Λ* _{−K} were given above under ‘Dimension reduction’. The term *V* _{K} *m**(*x*) is a propagation of the statistical emulator mean back to the space (of the logarithm) of the river flow output. The two variance terms *V* _{K} *C**(*x*) *V* _{K}^{T} and *V* _{−K} *Λ* _{−K} *V* _{−K}^{T} quantify the uncertainty from replacing the model with an emulator, and from reducing the dimension through *K* principal components, respectively. Like the mean term, they are propagated back to the space of the original model output. An exponential transformation of this distribution completes the reconstruction and propagation.
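Given the pieces defined under ‘Dimension reduction’, the propagation is a few lines of linear algebra. The arrays in the test below are random placeholders with the shapes used in the text, not quantities from the study.

```python
import numpy as np

def reconstruct(ybar, V_K, m_star, C_star, V_rest, lam_rest):
    """Mean and covariance of the reconstructed (log) flow at one input.

    ybar: (T,) column means; V_K: (T, K) retained eigenvectors;
    m_star: (K,) emulator means; C_star: (K, K) emulator covariance;
    V_rest: (T, T-K) discarded eigenvectors; lam_rest: (T-K,) eigenvalues."""
    mean = ybar + V_K @ m_star                         # propagated mean
    cov_emul = V_K @ C_star @ V_K.T                    # emulation uncertainty
    cov_trunc = V_rest @ np.diag(lam_rest) @ V_rest.T  # truncation uncertainty
    return mean, cov_emul + cov_trunc
```

The returned covariance is the sum of the two uncertainty sources named in the text, so credible bands on the flow series account for both emulation and dimension reduction.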

### Validating the statistical emulator

It is important to validate a statistical emulator to check whether it provides an accurate approximation to the model. In this work, leave-one-out cross validation is used to this end (Sammut & Webb 2017). The emulators are validated on the reduced dimensional (principal component) space and on the reconstructed space of the original model output. A procedure for leave-one-out cross validation for the emulator *f̂* _{k} is as follows (for *i* = 1, …, *n*):

1. Remove the *i*th point *x*^{(i)} from the experimental design and the *i*th point from the principal component output *z* _{k}; denote the results as *X* _{−i} and *z* _{k,−i} respectively.

2. Construct a new Gaussian process emulator conditional on *z* _{k,−i}; denote this *f̂* _{k,−i}. Instead of re-estimating the hyperparameters, simply fix them at the values estimated for the complete output *z* _{k}. Denote the posterior mean and covariance functions of *f̂* _{k,−i} as *m** _{k,−i} and *c** _{k,−i} respectively.

3. Predict the principal component output at *x*^{(i)} using the emulator. The predictive distribution is Gaussian with mean *m** _{k,−i}(*x*^{(i)}) and variance *c** _{k,−i}(*x*^{(i)}, *x*^{(i)}).

4. A credible interval for the prediction can be constructed as *m** _{k,−i}(*x*^{(i)}) ± *t* √(*c** _{k,−i}(*x*^{(i)}, *x*^{(i)})), where *t* denotes the appropriate quantile of the Student's *t*-distribution with *n* − *q* degrees of freedom, and *q* is the number of hyperparameters.

Plotting leave-one-out cross validation emulator predictions against the corresponding principal component output provides a useful diagnostic validation tool. It is important to also include credible intervals to assess whether the uncertainty of the emulator is well calibrated. In short, a well-calibrated emulator should have predictions close to the 45° line with small uncertainty, or, if not, the uncertainty should increase appropriately.
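The cross validation loop itself is generic. The sketch below takes any `fit` function that returns a predictor producing means and variances, so the same loop can wrap a Gaussian process emulator with its hyperparameters held fixed; `fit` and its return signature are assumptions of this sketch.

```python
import numpy as np

def loo_cv(X, z, fit):
    """Leave-one-out predictions: fit on n-1 points, predict the held-out one.

    `fit(X_train, z_train)` must return a function mapping an input array
    to (means, variances)."""
    means, variances = [], []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        predict = fit(X[keep], z[keep])      # emulator without point i
        m, v = predict(X[i:i + 1])
        means.append(m[0])
        variances.append(v[0])
    return np.array(means), np.array(variances)
```

Plotting `means` (with intervals built from `variances`) against the held-out outputs gives exactly the diagnostic described above.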

On the reconstructed space, the emulator is validated using the root mean squared error (RMSE) and the mean absolute error (MAE) between the simulated and emulated river flows at each time point *t*.
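The RMSE and MAE metrics used below under ‘Results and discussion’ to assess the reconstruction are straightforward:

```python
import numpy as np

def rmse(sim, emu):
    """Root mean squared error between simulated and emulated flows."""
    d = np.asarray(sim) - np.asarray(emu)
    return float(np.sqrt(np.mean(d ** 2)))

def mae(sim, emu):
    """Mean absolute error between simulated and emulated flows."""
    return float(np.mean(np.abs(np.asarray(sim) - np.asarray(emu))))
```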

## RESULTS AND DISCUSSION

### Runoff modelling

To calibrate the model, daily river flow data recorded over the 2003–2010 period has been used. Table 1 shows the set of parameters that provided the best model performance.

| Parameter | Description | Calibrated value | Measurement unit |
|---|---|---|---|
| *θ* _{FC} | Field capacity | 0.201 | m^{3}/m^{3} |
| *θ* _{WP} | Wilting point | 0.096 | m^{3}/m^{3} |
| *p* | Depletion factor for *RAW* calculation | 0.495 | – |
| *Z* _{r} | Root zone depth | 0.539 | m |
| *BF* | Bypass fraction | 0.220 | – |
| *FRACSTOR* | Factor for near surface storage | 0.081 | – |


At the daily scale, the calibrated model provided an *NS* coefficient equal to 0.71. The model performance improved at the monthly scale, reaching an *NS* value equal to 0.81. Figure 5 shows the comparison between the observed river flow and the river flow simulated by the calibrated model in the 2003–2010 period at the daily (Figure 5(a)) and monthly scale (Figure 5(b)).

According to the previously described results, the model is able to reproduce the time pattern of the daily and average monthly river flow and the model performance can be considered acceptable (Ritter & Muñoz-Carpena 2013). Annual average observed river flow is equal to 6.2 m^{3}/s, while the simulated value is 6.3 m^{3}/s.

The best parameter set was then used to validate the model. Model validation has been performed using data measured from 2011 to 2013. Validation *NS* coefficients were 0.64 and 0.70 for daily and monthly river flow respectively. These values of the efficiency index indicate an acceptable model performance.

### Statistical emulation

In this work, Gaussian process emulators are built for each principal component output using the DiceKriging package (Roustant *et al.* 2012) in *R*. The package implements maximum likelihood to estimate the hyperparameters *β* _{k}, *σ*^{2} _{k} and *δ* _{k} for the *k*th principal component output. Hyperparameter estimates for the emulators of the two principal components, *f̂* _{1} and *f̂* _{2}, are given in Table 2.

| Hyperparameter | *f̂* _{1} | *f̂* _{2} |
|---|---|---|
|  | 15.989 | 1.992 |
|  | 15.237 | –3.832 |
|  | 14.673 | 1.967 |
|  | –40.707 | –0.079 |
|  | 16.722 | 0.708 |
|  | 1.938 | 1.938 |
|  | 1.892 | 0.617 |
|  | 1.691 | 1.691 |
|  | 0.224 | 0.338 |


The marginal log-likelihoods of emulators *f̂* _{1} and *f̂* _{2} with the four covariance function choices are given in Table 3. It is clear that the choice of squared exponential covariance function provides the maximum value of the log-likelihood out of the four options, meaning that the principal component output is very smooth. This is also reflected in the marginal log-likelihood values for the Matérn covariance functions, which increase as the regularity parameter gets larger.

| Covariance function | First PC emulator | Second PC emulator |
|---|---|---|
| Squared exponential | 13.434 | 71.017 |
| Matérn 1/2 | –58.845 | 17.475 |
| Matérn 3/2 | –20.119 | 46.549 |
| Matérn 5/2 | –1.486 | 61.232 |

All covariance functions use automatic relevance determination. The Matérn covariance functions have regularity parameters 1/2, 3/2 and 5/2 respectively.
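The ranking in Table 3 can be reproduced in miniature: for a smooth output, the squared-exponential correlation attains a higher marginal likelihood than a rougher Matérn. The sketch below uses toy one-dimensional data and a zero-mean GP with the process variance profiled out analytically; it is not the paper's actual computation:

```python
import numpy as np

def corr_se(d, ls):
    """Squared-exponential correlation (infinitely differentiable paths)."""
    return np.exp(-0.5 * (d / ls) ** 2)

def corr_matern32(d, ls):
    """Matern correlation with regularity 3/2 (once-differentiable paths)."""
    a = np.sqrt(3.0) * d / ls
    return (1.0 + a) * np.exp(-a)

def profile_log_lik(corr, x, y, ls):
    """Log marginal likelihood of a zero-mean GP with the process variance
    profiled out; returns -inf if the correlation matrix is numerically
    singular at this length-scale."""
    n = len(y)
    d = np.abs(x[:, None] - x[None, :])
    R = corr(d, ls) + 1e-8 * np.eye(n)
    try:
        L = np.linalg.cholesky(R)
    except np.linalg.LinAlgError:
        return -np.inf
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    var_hat = (y @ alpha) / n  # closed-form ML estimate of the variance
    return (-0.5 * n * np.log(var_hat) - np.log(np.diag(L)).sum()
            - 0.5 * n * (1.0 + np.log(2 * np.pi)))

x = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * x)          # a very smooth "output"
grid = np.linspace(0.05, 1.0, 40)  # candidate length-scales

ll_se = max(profile_log_lik(corr_se, x, y, ls) for ls in grid)
ll_m32 = max(profile_log_lik(corr_matern32, x, y, ls) for ls in grid)
```

Because the toy output is analytic, the squared-exponential kernel, whose sample paths are infinitely differentiable, fits it with a much smaller residual variance and hence a larger likelihood, mirroring the pattern in Table 3.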

Leave-one-out cross validation diagnostic plots for the two principal component emulators are shown in Figure 6 with 95% credible intervals. Both emulators accurately predict the principal component output at the withheld design points; in particular, for both components the emulator predictions lie on or very close to the y = x line. In terms of coverage, for the first principal component two of the 40 withheld points lie outside the emulator's credible intervals, giving a coverage of 95%. For the second principal component, four of the 40 withheld points lie outside the credible intervals, so the coverage is 90%. In summary, both emulators approximate the principal component output accurately and are well calibrated.
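The leave-one-out procedure behind Figure 6 can be sketched as follows, with toy inputs and output and fixed covariance hyperparameters (an illustration of the diagnostic, not the paper's code); the 95% interval uses the usual 1.96 standard deviation half-width:

```python
import numpy as np

rng = np.random.default_rng(7)

def corr(X1, X2, ls=0.3):
    """Squared-exponential correlation with a common length-scale."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) / ls) ** 2
    return np.exp(-0.5 * d2.sum(-1))

def gp_predict(Xtr, ytr, Xte):
    """Posterior mean and variance of a zero-mean, unit-variance GP with a
    small nugget (hyperparameters fixed, for illustration)."""
    K = corr(Xtr, Xtr) + 1e-8 * np.eye(len(ytr))
    Ks = corr(Xte, Xtr)
    mean = Ks @ np.linalg.solve(K, ytr)
    var = corr(Xte, Xte).diagonal() - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.maximum(var, 0.0)

# Toy design of 40 runs, mimicking the size of the CLHS design
X = rng.uniform(size=(40, 2))
y = np.sin(3.0 * X[:, 0]) + np.cos(2.0 * X[:, 1])

# Withhold each point in turn, condition on the rest, and check whether
# the withheld value falls inside the 95% credible interval
hits = 0
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    m, v = gp_predict(X[keep], y[keep], X[i:i + 1])
    half = 1.96 * np.sqrt(v[0])  # 95% credible half-width
    hits += (m[0] - half) <= y[i] <= (m[0] + half)

coverage = hits / len(y)  # well-calibrated emulators give about 0.95
```

Plotting each posterior mean against the withheld value, with the intervals as error bars, gives exactly the diagnostic plot described for Figure 6.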

The emulators validate well on the space of the two principal component outputs. However, to be of any use, emulator predictions must be reconstructed and propagated back to the original space of the Rushton *et al.* (2006) model output: the daily river flow. A procedure for doing this was given above under ‘Reconstruction and propagation of uncertainty’. It is useful to check whether the reconstructions accurately predict the daily river flow at unseen land use inputs, and the RMSE and MAE metrics are used to this end. Calculating the RMSE for each of the 40 leave-one-out cross validation points gives values ranging from to (with a mean of ). Similarly, the MAE for each of the 40 leave-one-out cross validation points gives values ranging from to (with a mean of ). Both metrics are measured in the same units as the model output (m^{3}/s). Given that the Rushton *et al.* (2006) model output for the Frome basin ranges approximately between 0 and 25 m^{3}/s, these small RMSE and MAE values indicate a very good match between the simulated and emulated daily river flows. The difference between the simulated and emulated daily river flow for the smallest and largest RMSE values is shown in Figure 7; the difference is plotted because the two time series are almost indistinguishable on an absolute scale. For the smallest RMSE value, the emulator predicts the model output almost exactly. For the largest RMSE value, the emulator mainly overestimates the daily river flow in a seasonal pattern, but the error is small and the uncertainty increases accordingly. Larger RMSE values tended to correspond to flow outputs of larger magnitude, indicating that these outputs are harder to predict; however, this relationship was weak and is not investigated further here.
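The reconstruction step can be illustrated with a toy ensemble: PCA via the SVD gives per-run scores and time-series loadings, emulated scores are mapped back through the loadings, and RMSE/MAE are computed against the simulated series. This is an illustrative sketch with synthetic data, not the paper's ensemble:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy ensemble: 40 model runs, each a 365-day flow series that lies in a
# two-dimensional function space by construction
t = np.linspace(0.0, 2.0 * np.pi, 365)
runs = np.stack([(1.5 + rng.uniform()) * np.sin(t)
                 + rng.uniform() * np.cos(2.0 * t) for _ in range(40)])

# PCA via the SVD of the centred ensemble; keep k = 2 components
mean = runs.mean(axis=0)
U, S, Vt = np.linalg.svd(runs - mean, full_matrices=False)
k = 2
scores = U[:, :k] * S[:k]  # per-run principal component outputs
basis = Vt[:k]             # time-series loadings

def reconstruct(score):
    """Map (emulated) principal component scores back to daily flow."""
    return mean + score @ basis

# Pretend the emulator returned the exact scores of run 0
recon = reconstruct(scores[0])
rmse = np.sqrt(np.mean((runs[0] - recon) ** 2))
mae = np.mean(np.abs(runs[0] - recon))
```

Because the toy runs are exactly rank two after centring, the reconstruction is essentially perfect; for real model output the discarded components contribute an additional error, which is why the paper propagates the dimension-reduction uncertainty alongside the emulator uncertainty.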

### Comparing the computational time of model and emulator

With the emulator validation complete, it is useful to investigate the reduction in computational burden it provides. The time taken for the model and the emulator to run the 40 experiments in the CLHS design was recorded for 100 repetitions, and the average completion times were compared. Experiments were run on a computer with an Intel Core i7 processor running at 3.4 GHz with 32 GB of RAM, on a Windows 7 operating system. The average completion time for the model and emulator was 8.768 and 2.182 seconds respectively, equating to approximately a fourfold speed-up when using the emulator. This reduction is modest, but not surprising given that the Rushton *et al.* (2006) model is a fairly simple hydrological model, used here mainly to demonstrate our emulation approach. Nevertheless, the difference in computational time becomes substantial when performing tasks that require large numbers of simulations, or when emulating more complex models.
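A timing comparison of this kind can be made with Python's `timeit` module; the two functions below are stand-ins for the model and the emulator (assumed for illustration), showing only the measurement procedure:

```python
import timeit

def slow_model(x):
    """Stand-in for an expensive simulation (a long time-stepping loop)."""
    s = 0.0
    for i in range(5000):
        s += (x * i) % 7.0
    return s

def fast_emulator(x):
    """Stand-in for a cheap surrogate prediction."""
    return 2857.0 * x

reps = 100  # as in the paper, average over repeated runs
t_model = timeit.timeit(lambda: slow_model(1.3), number=reps)
t_emu = timeit.timeit(lambda: fast_emulator(1.3), number=reps)
speedup = t_model / t_emu
```

Averaging over repetitions smooths out operating-system scheduling noise, which matters when the individual runs are only a few seconds long.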

## CONCLUSIONS

Due to a rapid increase in the sophistication of environmental modelling, approaches to reduce the computational time of the models have become popular. In this context, statistical emulators are used to provide fast approximations of computationally intensive models to facilitate subsequent analysis. With enough runs of the original model, emulators often offer predictions of comparable accuracy for a fraction of the computational time and, importantly, quantify the uncertainty associated with replacing the model with an emulator.

In this study, a novel emulator methodology has been proposed to provide a fast approximation to a hydrological model available in the literature. The model itself was reasonably simple and fast to run, but provided a good test example to demonstrate the emulator methodology. While emulators have been used for hydrological models before, the approach given here was novel in a number of ways, for example in its utilization and combination of a range of techniques from various fields. Firstly, principal component analysis was used to reduce the dimension of the time series output of the model. Secondly, due to the constrained nature of the land use inputs, the field of mixture experiments was drawn upon: classical mixture models were used alongside a novel mixture design strategy which used a CLHS algorithm to generate a space-filling design tailored for building statistical emulators. Finally, Gaussian process emulators were employed to describe the relationship between the land use inputs and the principal component outputs. The emulator's predictions were then reconstructed back to the space of the daily river flow output of the model. Furthermore, the uncertainty due to the use of an emulator and of dimension reduction techniques was efficiently propagated.

The statistical emulator presented here was applied to a version of the hydrological model calibrated for the Frome basin using measurements from the East Stoke Total station. Under ‘Results and discussion’ above it was shown first that the calibrated model provided a good description of the observed river flow at a daily and monthly scale, via the Nash and Sutcliffe coefficient. Second, the proposed statistical emulator gave very accurate predictions of the calibrated model output as a function of four land use inputs, as tested by a number of leave-one-out cross validation exercises and diagnostics. Possible future directions of this work would be to test the proposed methodology on hydrological models calibrated to different river basins, or for more complex implementations of the model (for example with a larger number of land use inputs). It is anticipated that the approach outlined here would perform well in these scenarios.

## DATA AVAILABILITY

Data and code for the hydrological model and emulator methodology are openly available at https://github.com/neowen7/water-resource-emulation.

## REFERENCES
