Prediction of mean wave overtopping at simple sloped breakwaters using kernel-based methods

The accurate prediction of the mean wave overtopping rate at breakwaters is vital for a safe design. Hence, providing a robust tool as a preliminary estimator can be useful for practitioners. Recently, soft computing tools such as artificial neural networks (ANN) have been developed as alternatives to traditional overtopping formulae. The goal of this paper is to assess the capabilities of two kernel-based methods, namely Gaussian process regression (GPR) and support vector regression (SVR), for the prediction of the mean wave overtopping rate at sloped breakwaters. An extensive dataset taken from the EurOtop database, comprising rubble mound structures with a permeable core and straight slopes, without a berm or crown wall, was employed to develop the models. Different combinations of the important dimensionless parameters representing structural features and wave conditions were tested, based on sensitivity analysis, for developing the models. The obtained results were compared with those of the ANN model and the existing empirical formulae. The modified Taylor diagram was used to compare the models graphically. The results showed the superiority of the kernel-based models, especially the GPR model, over the ANN model and the empirical formulae. In addition, the optimal input combination was identified based on accuracy and the number of input parameters. Finally, the physical consistency of the developed models was investigated, and the results demonstrated the reliability of the kernel-based models in terms of reproducing the physics of the overtopping phenomenon.


INTRODUCTION
Breakwaters are designed to protect harbours and infrastructure against wave attack. Recently, due to the potential impact of climate change and sea-level rise, the safety and performance of breakwaters have become more important for coastal engineers. Excessive overtopping can greatly threaten the stability of a breakwater or cause damage to nearby equipment or properties. Conventionally, the mean wave overtopping rate (q), as one of the important hydraulic responses, needs to be limited.
During recent decades, several methods have been applied to predict wave overtopping at coastal structures, including numerical, empirical, and soft computing methods. Numerical models (e.g. Losada et al. 2008; Neves et al. 2008; Ingram et al. 2009; Zhang et al. 2020) have been used in situations where empirical test data are limited or reliable results may not be obtained (van der Meer et al. 2017). Nevertheless, the application of numerical models is time-consuming and computationally expensive, especially when high accuracy is required.
The existing empirical formulae for predicting the mean overtopping rate (e.g. Owen 1980; van der Meer & Janssen 1995; EurOtop 2007; EurOtop 2018; Shaeri & Etemad-Shahidi 2021) have mostly been derived by regression analysis of small-scale experiments. These formulae correlate the dimensionless mean overtopping rate to dimensionless wave and structural parameters through physical arguments. However, poor predictions of the mean overtopping rate at armoured structures using empirical formulae have been reported in the literature (e.g. Koosheh et al. 2020). Figure 1 displays the performances of the Jafari & Etemad-Shahidi (2011) (hereafter JE) and EurOtop (2018) (mean approach: Equation (6.5)) (hereafter ET) formulae for simple sloped breakwaters. The dimensionless measured and predicted mean overtopping rates, defined as q* = q/(g·H³_m0,t)^(1/2), are shown in this figure. Here, q (m³/s/m) is the dimensional mean overtopping rate per unit width, g (m/s²) is the gravitational acceleration, and H_m0,t (m) is the significant wave height at the toe of the structure. In this figure, the data of rubble mound structures with a permeable core and a simple slope without a crown wall, including both head-on and oblique waves, have been selected from the EurOtop (2018) database. More details are given in the section on the used dataset. As seen, some predictions lie outside the 10-times over-/underestimation lines (dashed). The ET formula remarkably underestimates overtopping rates, which could be misleading in the design procedure.
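As a quick illustration, the non-dimensionalization above can be sketched in Python (a minimal helper of our own, not from the paper; the variable names are assumptions):

```python
import math

def dimensionless_overtopping(q, h_m0t, g=9.81):
    """Dimensionless mean overtopping rate q* = q / sqrt(g * H_m0,t^3)."""
    return q / math.sqrt(g * h_m0t ** 3)

# e.g. a measured rate of 1e-4 m^3/s/m with H_m0,t = 0.15 m at the toe
q_star = dimensionless_overtopping(1e-4, 0.15)
```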
In recent decades, several applications of soft computing techniques (e.g. artificial neural networks (ANN)) to water engineering problems can be found (e.g. Ayoubloo et al. 2010; Kazeminezhad et al. 2010; Cini & Deo 2013; Ghaemi et al. 2013; Moghaddas et al. 2021). These techniques provide quick and cost-effective solutions that can be useful for complicated problems. Due to the complex nature of the overtopping process and the existing limitations of empirical formulae, several soft computing approaches have been implemented as alternative tools to predict the mean overtopping rate for a broad range of coastal structures. Among them, the ANN model initially developed within the CLASH project (De Rouck & Geeraerts 2005) and presented by EurOtop (2007) is the most well-known soft computing tool applicable to a wide range of structures. In the training of this model, dimensional input parameters were used, which may not be appropriate for all cases with different scales. Recently, Zanuttigh et al. (2016) developed an improved neural network for a broad range of coastal structures, released with EurOtop (2018) as EurOtop (2018)-ANN, using dimensionless input parameters based on an extended database (Zanuttigh et al. 2014) mainly derived from the CLASH database. However, in terms of the accuracy of wave overtopping prediction, the recent version does not show a significantly better performance than CLASH-ANN (Formentin et al. 2017; Pillai et al. 2017). Besides the mentioned ANN applicable to a wide range of coastal structures, other studies have focused on specific types of structures using soft computing approaches. For example, Molines & Medina (2016) applied an ANN to derive an explicit wave overtopping formula for breakwaters with a crown wall; however, they achieved the same prediction accuracy as CLASH-ANN. The group method of data handling (GMDH) algorithm was used by Lee & Suh (2019) to develop wave overtopping formulae for inclined seawalls. It was shown that GMDH performs better than the empirical formulae, while its accuracy is similar to that of the EurOtop-ANN model.
This study provides an overview of kernel-based methods, as soft computing tools, and investigates their capabilities for the prediction of the mean wave overtopping rate at simple sloped breakwaters. Gaussian process regression (GPR) (Rasmussen & Williams 2006) and support vector regression (SVR) (Vapnik 1995), as kernel-based methods, are flexible in that they can handle nonlinear problems. These methods have recently been used in different fields of engineering, showing promising performance compared with other soft computing methods (e.g. Ghazanfari-Hashemi et al. 2011; Grbić et al. 2013; Sun et al. 2014; Roushangar & Koosheh 2015; Roushangar et al. 2016; Najafzadeh & Oliveto 2020). To the best of the authors' knowledge, SVR and GPR have not previously been applied to the prediction of the mean wave overtopping rate. To develop the models, an extensive dataset selected from the EurOtop (2018) database was used. To evaluate the performance of the kernel-based methods, the results were compared with those of an ANN, as a benchmark tool for overtopping problems, as well as with recently proposed empirical formulae. Moreover, the key variables of overtopping, representing the structural and wave condition features at rubble mound breakwaters, were determined through sensitivity analysis. To evaluate the reliability of the kernel-based models, the physical consistency between the key input parameter and the mean overtopping rate was investigated. This evaluation was based on a parametric analysis between the most effective input parameter and the output, to recognize the existing trend and compare it with the identified physical pattern of overtopping.

Support vector regression
Initially proposed by Vapnik (1995), the support vector machine (SVM) is commonly implemented for classification in statistical learning problems. In contrast to conventional neural networks, in which the empirical risk is minimized, the SVM approach minimizes an upper bound on the expected risk. This gives SVM a greater ability to generalize, which is the goal of statistical learning (Gunn 1998). In the SVM formulation, the original training data are transformed into a higher-dimensional space using nonlinear mapping functions to make the data more easily separable. The purpose is to find an optimal hyperplane that separates the samples of two classes with the widest margin between them in the new space. SVR is an adaptation of SVM that can be used as a predictive tool for regression problems. SVR seeks the flattest function that deviates from the training targets by at most ε while balancing model complexity and prediction error. Since SVR uses a symmetric loss function with equal penalties for over- and underestimation, a tube of radius ε is formed around the estimation function. Points outside the tube are penalized in proportion to their distance from the function. A significant advantage of SVR is that its computational complexity is independent of the dimensionality of the input space (Awad & Khanna 2015).
The general SVR formulation can be written as follows:

y = w·φ(x) + b (1)

where w represents the weight vector, φ(x) is a nonlinear mapping of the input x into the feature space, and b is the bias. These factors are obtained by minimizing the regularized risk function:

minimize (1/2)‖w‖² + C·Σᵢ L_ε(tᵢ, yᵢ) (2)

The constant C is the cost factor controlling the trade-off between the weight term and the approximation error. The term ‖w‖² represents the norm of the inner product of the w vector and its transpose (wᵀw). L_ε(tᵢ, yᵢ) is the loss function, in which yᵢ is the predicted value and tᵢ is the observed value for sample i. To ensure the convergence of the optimization process within a finite number of steps, the loss function needs to be symmetric and convex. The simplest choice is the ε-insensitive loss function in Equation (3), for which the loss increases linearly for data outside the tube:

L_ε(tᵢ, yᵢ) = 0 if |tᵢ − yᵢ| ≤ ε, and |tᵢ − yᵢ| − ε otherwise (3)

The slack variables (ξᵢ, ξᵢ*) are defined to specify the upper and lower training errors subject to the error tolerance ε. Hence, Equation (2) can be rewritten as:

minimize (1/2)‖w‖² + C·Σᵢ (ξᵢ + ξᵢ*)

subject to:

tᵢ − w·φ(xᵢ) − b ≤ ε + ξᵢ,  w·φ(xᵢ) + b − tᵢ ≤ ε + ξᵢ*,  ξᵢ, ξᵢ* ≥ 0

The dual Lagrangian form can then be obtained by applying the Lagrangian multipliers (αᵢ and αᵢ*) and the Karush-Kuhn-Tucker conditions:

maximize −(1/2)·ΣᵢΣⱼ (αᵢ − αᵢ*)(αⱼ − αⱼ*)·K(xᵢ, xⱼ) − ε·Σᵢ (αᵢ + αᵢ*) + Σᵢ tᵢ·(αᵢ − αᵢ*)

subject to:

Σᵢ (αᵢ − αᵢ*) = 0,  0 ≤ αᵢ, αᵢ* ≤ C

As the inner product of the two mapped vectors φ(xᵢ) and φ(xⱼ), the kernel function K(xᵢ, xⱼ) transforms the data into the new space. Several kernel functions, each with its own parameters adjusting the flexibility of the regression function, have been used in the literature. The selection of the kernel function depends on the nature of the data and the problem. The radial basis function (RBF) was selected for the present study, as it is widely accepted as a good kernel, especially for cases without prior knowledge of the data characteristics (Roushangar & Koosheh 2015):

K(xᵢ, xⱼ) = exp(−‖xᵢ − xⱼ‖²/(2σ²))

where σ stands for the kernel parameter. By calculating αᵢ and αᵢ*, the regression function is obtained as:

y(x) = Σᵢ (αᵢ − αᵢ*)·K(xᵢ, x) + b

The implementation of SVR entails an optimization in which several parameters, such as ε, C, and the kernel variables, need to be adjusted. Hence, the SVR implementation in the MATLAB software, which is equipped with an automated optimization tool, was used.
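The paper's SVR was built with MATLAB's automated optimizer. As a hedged, library-level sketch of the same idea (ε-insensitive RBF-kernel regression on standardized inputs), a scikit-learn equivalent might look like the following, with synthetic data standing in for the EurOtop records and hyper-parameter values chosen only for illustration:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(0.5, 3.5, size=(200, 1))             # e.g. a single dimensionless input
y = -2.6 * X[:, 0] - 0.9 + rng.normal(0, 0.05, 200)  # synthetic log-overtopping trend

# RBF kernel, cost factor C, and tube radius epsilon as in Equations (2)-(3)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
model.fit(X, y)
pred = model.predict(X)
```

In practice, C, ε, and the kernel parameter would be tuned by cross-validation rather than fixed.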

Gaussian process regression
As a generalization of the Gaussian probability distribution, GPR is a nonparametric and probabilistic approach that can be applied to a variety of nonlinear problems (Rasmussen & Williams 2006). Based on the assumption that the learning sample follows the prior probabilities of a Gaussian process (GP), the corresponding posterior probability is calculated. GPR uses a kernel to define the covariance of a prior distribution over the target functions. Here, the covariance function plays an important role, as it encodes the prior assumptions about the underlying process that generated the data (Hu & Wang 2015). Assume X × Y represents the input and output domains, from which n pairs (xᵢ, yᵢ) are drawn independently and identically distributed, and let f be an unknown function mapping X → Y. The regression functional form can be described as:

y = f(x) + ε

where ε is Gaussian noise with variance σₙ². The function f can be expressed as a Gaussian process:

f(x) ~ GP(M(x), K(x, x'))

A GP is a distribution over functions defined by a mean function M(x) and a covariance function K(x, x'). Conventionally, for the basic GPR, M(x) = 0 is assumed to avoid expensive posterior computations (Aye & Heyns 2017). K(x, x') is the covariance (kernel) function, by which the dependence between the function values at input points x and x' is modelled. The expected smoothness and likely patterns in the data should be considered when selecting an appropriate kernel (Schulz et al. 2018). After testing different kernels to find a suitable one leading to accurate results, the ARD (automatic relevance determination) Matern 5/2 kernel was selected for the GPR modelling:

K(xᵢ, xⱼ) = σ_f²·(1 + √5·r + (5/3)·r²)·exp(−√5·r)

where

r = [Σₘ (x_{i,m} − x_{j,m})²/σₘ²]^(1/2)

and σ_f represents the signal standard deviation and σₘ is the length scale for each predictor m (m = 1, 2, …, d).
Given the observation data D = {X, Y}, predictions for a new input X* are obtained by drawing f* from the posterior distribution P(f|D). The distributions of f* and Y, which follow a normal distribution, can be written as:

Y ~ N(0, K(X, X) + σ_e²·I),  f* ~ N(0, K(X*, X*))

where I is the identity matrix and σ_e² stands for the noise level of the observations. By conditioning the joint prior distribution on the observations, the posterior distribution over f* can be derived:

f* | D, X* ~ N(M(f*), cov(f*))

where

M(f*) = K(X*, X)·[K(X, X) + σ_e²·I]⁻¹·Y
cov(f*) = K(X*, X*) − K(X*, X)·[K(X, X) + σ_e²·I]⁻¹·K(X, X*)

After determining the mean and covariance functions, the corresponding hyper-parameters (θ) are still unknown in the GPR formulation and need to be obtained from the training dataset. Maximum likelihood estimation is commonly used to estimate the parameters of the GPR model (Melo 2012). Based on Bayes' rule, the marginal likelihood can be written as:

P(Y|X, θ) = ∫ P(Y|f, X)·P(f|X) df

Maximizing the log-marginal likelihood gives the optimal hyper-parameters (θ'), which can be calculated using the conjugate gradient algorithm (Rasmussen & Williams 2006):

θ' = argmax_θ log P(Y|X, θ)

It should be mentioned that the automatic optimization of GPR was implemented in the MATLAB software for this study.
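For readers outside MATLAB, a hedged scikit-learn sketch of the same setup follows. A Matern kernel with ν = 5/2 and one length scale per predictor reproduces the ARD Matern 5/2 choice; note that scikit-learn maximizes the log-marginal likelihood with L-BFGS rather than conjugate gradients, and the data here are synthetic stand-ins:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform(0.5, 3.5, size=(120, 2))   # two hypothetical dimensionless predictors
y = -2.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 0.05, 120)

# ARD: a separate length scale sigma_m per predictor; WhiteKernel models sigma_n^2
kernel = ConstantKernel(1.0) * Matern(length_scale=[1.0, 1.0], nu=2.5) + WhiteKernel(1e-2)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# GPR returns both the posterior mean M(f*) and its standard deviation
mean, std = gpr.predict(X, return_std=True)
```

The returned standard deviation is a practical benefit of GPR over SVR and ANN: it quantifies the prediction uncertainty directly.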

Artificial neural network
The ANN is a well-known artificial intelligence (AI) model inspired by the biological human nervous system. It can be used for finding a relationship between inputs and an output, i.e. regression. The regression is performed by configuring a flexible ANN architecture consisting of input, hidden, and output layers. These layers are connected by neurons, where the number of neurons in the input and output layers corresponds to the number of parameters used as inputs and output, while the number of neurons in the hidden layer can be varied to find the best architecture for the problem. In the present study, the criteria used by Pourzangbar et al. (2017) for selecting the optimum number of neurons were applied. A three-layer feed-forward (FF) network with the Levenberg-Marquardt back-propagation (BP) training algorithm was used for the modelling. In the feed-forward back-propagation (FFBP) neural network, the FF term describes how the network processes information, with the neurons connected in a forward direction, while the BP term indicates how the weights of the different layers are adjusted during training using the output estimated by the model (Zanganeh et al. 2016). The mathematical expression of this network is:

y_o = f(Σᵢ w_{io}·pᵢ + b_o)

where y_o is the output of neuron o, w_{io} represents the weight vector, pᵢ is the input vector for neuron i (i = 1, …, n), b_o represents the bias for neuron o, and f is the network transfer function. The tangent sigmoid function is selected, which can be defined as follows (Haykin 2009):

f(x) = 2/(1 + e^(−2x)) − 1
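A comparable three-layer feed-forward network with tanh hidden units can be sketched in scikit-learn. Levenberg-Marquardt training is not available there, so the quasi-Newton 'lbfgs' solver is used as a stand-in; the data and layer size are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
X = rng.uniform(0.5, 3.5, size=(200, 1))
y = -2.6 * X[:, 0] - 0.9 + rng.normal(0, 0.05, 200)

# One hidden layer of tanh ("tangent sigmoid") neurons, linear output neuron
ann = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), activation="tanh", solver="lbfgs",
                 max_iter=2000, random_state=0),
)
ann.fit(X, y)
pred = ann.predict(X)
```

The number of hidden neurons (here 8) would be varied and selected by a criterion such as that of Pourzangbar et al. (2017).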

Used dataset
An extensive dataset collected within CLASH (De Rouck & Geeraerts 2005), later updated and reorganized within EurOtop (2018), was employed in the present study. The details of the dimensionless input parameters used are given in Table 1.
These input parameters were selected based on the EurOtop (2018)-ANN (hereafter ET-ANN) inputs without the berm indicators. In Table 1, H_m0,t and L_m−1,0,t represent the significant wave height and wavelength at the toe of the structure, respectively. Here, L_m−1,0,t = 1.56·T²_m−1,0,t, where T_m−1,0,t is the spectral wave period at the toe of the structure. h is the water depth at the toe of the structure, h_t is the toe submergence depth, and B_t is the toe width. The parameters R_c, A_c, and G_c are the crest freeboard, crest height, and crest width, respectively. In addition, D stands for the average size of the structural elements in the run-up/down area. The symbols relevant to overtopping are shown in Figure 2.
To refine the permeable simple sloped breakwaters, small-scale records (H_m0,t ≤ 0.5 m) with roughness factors γ_f < 0.6 (except γ_f = 0.55) and mild slopes (1.33 ≤ cot α ≤ 2) were chosen. The factors RF and CF, varying from 1 to 4, represent the reliability and complexity level of the given data. These factors describe the measurement accuracy of each test and how well the geometry of a structure could be described by the geometrical parameters. Records with the lowest reliability (RF = 4) and the highest level of complexity (CF = 4) were discarded so that only good quality data were considered in the analysis (see also Etemad-Shahidi & Jafari 2014; Shaeri & Etemad-Shahidi 2021). Low overtopping rates (q ≤ 1 × 10⁻⁶ m³/s/m) were also removed, as they may be affected by measurement errors (e.g. Verhaeghe et al. 2008; Etemad-Shahidi et al. 2016). Records with an emerged crest (R_c > 0), simple slopes (cot α_u = cot α_d), without a berm (B = 0), and without a crown wall (R_c = A_c) were selected. In this way, a total of 1,220 small-scale records remained for further analysis: 70% of these were selected for training, and the rest were used for testing. The selected data for permeable simple sloped rubble mound structures include rock permeable straight slopes denoted by the label 'A', armour unit straight slopes denoted by the label 'C', and oblique wave attack denoted by the label 'G' (see Zanuttigh et al. (2016) for details). Table 2 provides the details of the used data.
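The screening steps above can be sketched as a pandas filter. The column names and the toy rows are hypothetical (the EurOtop database uses its own headers); the thresholds mirror those stated in the text:

```python
import pandas as pd

# Three toy records; column names are illustrative, not the database's own
df = pd.DataFrame({
    "Hm0t": [0.12, 0.20, 0.60], "gamma_f": [0.45, 0.55, 1.0],
    "cota": [1.5, 2.0, 3.0], "RF": [2, 4, 1], "CF": [1, 2, 4],
    "q": [2e-4, 5e-7, 1e-3], "Rc": [0.15, 0.10, -0.05],
})

mask = (
    (df["Hm0t"] <= 0.5)                 # small-scale records
    & (df["gamma_f"] < 0.6)             # rough permeable slopes
    & df["cota"].between(1.33, 2.0)     # mild slopes
    & (df["RF"] < 4) & (df["CF"] < 4)   # drop unreliable / overly complex records
    & (df["q"] > 1e-6)                  # drop very low rates (measurement errors)
    & (df["Rc"] > 0)                    # emerged crest
)
selected = df[mask]
```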
As a perfect reproduction of the wave-structure interaction is not possible in small-scale physical models in the laboratory, scale effects are unavoidable. This is because the simultaneous fulfilment of the scaling laws (i.e. Froude and Reynolds similarity) is unachievable in physical modelling. Consequently, a significant difference between field and model measurements of the overtopping rate on rubble mound structures, especially for low rates, has been reported in EurOtop (2018). This difference is more considerable for longer and flatter slopes, where zero overtopping is predicted in the laboratory for an overtopped prototype situation (Koosheh et al. 2021). Thus, using field measurements along with small-scale data to develop the models can lead to model confusion due to scale effects (e.g. Jafari & Etemad-Shahidi 2011; see Supplementary Appendix A for details). For this reason, the large-scale and field measurements were excluded, and only small-scale laboratory tests were selected to develop the models. To generalize the developed models to large-scale and field conditions, the scale effect correction proposed by EurOtop (2018) for rubble mound structures can be applied. This correction is derived from prototype and laboratory observations and suggests an amplification ratio (f_q ≥ 1) for upscaled wave overtopping only when the discharge (q_up) is smaller than about 1 × 10⁻³ m³/s/m. This adjustment factor also depends on the slope of the structure (see EurOtop (2018) for details).

Modelling overview
Wave overtopping occurs when sea waves run up over a coastal structure and water overflows its crest. The mean overtopping rate is defined as the average discharge per metre width of the structure and is commonly expressed in m³/s/m. This parameter, as the response of the structure to the incident waves, depends on its geometrical features, such as crest freeboard and seaward slope, as well as on the local wave conditions, such as wave height, wave period, and water depth. The dimensionless mean overtopping rate (q*) is usually correlated to the dimensionless forms of these parameters to generalize the results. This dimensionless overtopping rate is based on the assumption of critical flow conditions on the crest of the structure (Altomare et al. 2020). The selection of input parameters is an important step in modelling, as an appropriate combination of the most effective parameters can enhance the performance of the models. Table 3 shows the input combinations used to feed the GPR, SVR, and ANN models. These input combinations are taken from ET-ANN (combinations a and b) and from existing empirical formulae such as JE and ET (combinations c and d). Given the specific structure studied, the berm indicators were excluded when configuring input combination a. The key point in configuring the dimensionless input parameters of combination a, as the most comprehensive one, is the use of the significant wave height (H_m0,t) to scale the structure heights (A_c, h_t, and R_c) and the wavelength (L_m−1,0,t) to scale the structure widths (B_t and G_c). The wave dissipation caused by wave breaking at the toe of the structure (h_t) and the possible wave overtopping (R_c and A_c) can be captured using H_m0,t as the height-scaling parameter. In addition, two key processes, breaking by steepness and shoaling, described by S_m−1,0,t and h/L_m−1,0,t, respectively, were considered in this input combination. To achieve the modelling with as few input parameters as possible, the most influential parameters reported in the literature (Pillai et al. 2017) and extracted from the sensitivity analysis were considered to configure input combination b.
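The height/width scaling described above can be sketched as a feature-construction helper. The function and key names are our own shorthand; the feature list mirrors the text (heights scaled by H_m0,t, widths by L_m−1,0,t, plus steepness, relative depth, slope, obliquity, and roughness):

```python
import math

def combination_a(hm0t, tm10t, h, ht, bt, rc, ac, gc, d, cota, beta_deg, gamma_f):
    """Dimensionless inputs in the spirit of combination (a)."""
    lm10t = 1.56 * tm10t ** 2          # deep-water wavelength from spectral period
    return {
        "Rc/Hm0t": rc / hm0t, "Ac/Hm0t": ac / hm0t, "ht/Hm0t": ht / hm0t,
        "D/Hm0t": d / hm0t,                          # heights scaled by Hm0,t
        "Bt/Lm10t": bt / lm10t, "Gc/Lm10t": gc / lm10t,  # widths scaled by Lm-1,0,t
        "Sm10t": hm0t / lm10t,                       # wave steepness (breaking)
        "h/Lm10t": h / lm10t,                        # relative depth (shoaling)
        "tan_alpha": 1.0 / cota,
        "cos_beta": math.cos(math.radians(beta_deg)),
        "gamma_f": gamma_f,
    }

feats = combination_a(hm0t=0.15, tm10t=1.5, h=0.5, ht=0.1, bt=0.2,
                      rc=0.15, ac=0.15, gc=0.3, d=0.04, cota=1.5,
                      beta_deg=0.0, gamma_f=0.45)
```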
The results of the sensitivity analysis carried out to identify the key governing parameters are shown in Table 4. In this analysis, the prediction errors were estimated by eliminating each parameter one by one. If the elimination of a parameter affects the results only marginally, that parameter can be neglected in further modelling; otherwise, the parameter needs to be included. For example, eliminating input parameters such as A_c/H_m0,t, D/H_m0,t, m, B_t/L_m−1,0,t, or h_t/H_m0,t caused no significant change in the accuracy metrics. On the other hand, eliminating parameters such as the wave steepness (S_m−1,0,t) or the oblique wave factor (γ_β) increased the centred pattern root-mean-square error (c-RMSE) by 23 and 50%, respectively. This implies that these parameters should be retained as key parameters in the modelling. Among all input parameters, the relative crest freeboard (R_c/H_m0,t) is the most effective one, as modelling without this parameter leads to large prediction errors (Model 13: c-RMSE = 0.54 and BIAS = −0.06).
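The one-at-a-time elimination procedure can be sketched generically. For brevity, a linear surrogate model and synthetic data replace the GPR model and the overtopping records; the logic (drop each input, refit, record the error change) is the same:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))                                 # 5 candidate inputs
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 300)   # only two matter

def cv_rmse(X, y):
    scores = cross_val_score(LinearRegression(), X, y,
                             scoring="neg_root_mean_squared_error", cv=5)
    return -scores.mean()

baseline = cv_rmse(X, y)
# Eliminate each input in turn and record the increase in prediction error
impact = {j: cv_rmse(np.delete(X, j, axis=1), y) - baseline for j in range(X.shape[1])}
```

Inputs whose removal barely changes the error (here columns 2-4) would be dropped; the input with the largest impact (column 0) plays the role of R_c/H_m0,t in the paper.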
Input combinations c and d were selected to fairly compare the soft computing models with the empirical formulae (JE and ET, respectively) using the same input parameters. The mathematical expressions of the used formulae are given here. For rubble mound structures, EurOtop (2018) proposed a simple exponential formula:

q/(g·H³_m0,t)^(1/2) = 0.09·exp[−(1.5·R_c/(H_m0,t·γ_f·γ_β))^1.3]

where γ_β and γ_f are the reduction factors for wave obliquity and the structure's surface roughness, respectively. For head-on waves, γ_β is assumed equal to one, and for oblique waves with angle of attack β it can be calculated as:

γ_β = 1 − 0.0063·|β|

Likewise, for smooth structures, the roughness factor (γ_f) is equal to one, and for other types it can be determined based on the materials used and the structure's permeability (EurOtop 2018). The roughness factor also needs to be modified when Ir_m−1,0 > 5.0, increasing linearly up to 1 at Ir_m−1,0 = 10:

γ_f,mod = γ_f + (Ir_m−1,0 − 5)·(1 − γ_f)/5

where Ir_m−1,0 is the Iribarren number based on T_m−1,0,t, defined as tan α/√S_m−1,0,t, and S_m−1,0,t is the wave steepness defined by H_m0,t/L_m−1,0,t. The Iribarren number distinguishes the breaking (Ir_m−1,0 < 1.8) and non-breaking (Ir_m−1,0 > 1.8) wave conditions (EurOtop 2018). Accordingly, the roughness factor for the data with Ir_m−1,0 > 5 (around 6% of all data) was modified for all input combinations applied in this study.
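A hedged implementation of the ET estimate, including the obliquity factor and the roughness correction for Ir_m−1,0 > 5, might read as follows (coefficients as in the mean-value approach of EurOtop (2018); the function name and argument order are our own):

```python
import math

def eurotop_mean_q(hm0t, rc, gamma_f, beta_deg=0.0, tan_alpha=0.5, lm10t=None, g=9.81):
    """Mean-value EurOtop (2018) overtopping estimate for rubble mound slopes (m^3/s/m)."""
    # Obliquity factor; 1.0 for head-on waves, capped at its beta = 80 deg value
    gamma_b = max(1.0 - 0.0063 * abs(beta_deg), 1.0 - 0.0063 * 80.0)
    if lm10t is not None:
        ir = tan_alpha / math.sqrt(hm0t / lm10t)   # Iribarren number Ir_{m-1,0}
        if ir > 5.0:                               # roughness correction, linear to 1 at Ir = 10
            gamma_f = min(gamma_f + (ir - 5.0) * (1.0 - gamma_f) / 5.0, 1.0)
    q_star = 0.09 * math.exp(-((1.5 * rc / (hm0t * gamma_f * gamma_b)) ** 1.3))
    return q_star * math.sqrt(g * hm0t ** 3)
```

As a sanity check, doubling the crest freeboard should reduce the predicted discharge sharply, which matches the physical trend the consistency test in this paper looks for.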
Jafari & Etemad-Shahidi (2011) suggested multi-conditional formulae for rubble mound structures derived from the CLASH database, in which H_m0 is the significant wave height at the toe of the structure and R* is a dimensionless freeboard parameter defined therein. As seen, all used input parameters are dimensionless, unifying the whole dataset regardless of the different model scales of the tests. Moreover, employing dimensionless parameters improves the analysis, fitting, and interpretation of the results as well as the generalization capacity of the developed model. It should be mentioned that the dimensionless wave overtopping rate in the form q* = q/(g·H³_m0,t)^(1/2) was selected as the output of the models.

Performance measures
The capabilities of the models were evaluated using the discrepancy ratio (DR) and the accuracy metrics of the modified Taylor diagram (Elvidge et al. 2014), i.e. the standard deviation (σ), Pearson's correlation coefficient (R), c-RMSE, and BIAS. Taylor's diagram, initially proposed by Taylor (2001), graphically illustrates how realistic the models are and simplifies the comparison process. This is achieved through a geometric relation between the standard deviation, c-RMSE, and Pearson's correlation coefficient. c-RMSE, defined as the mean-removed RMSE and represented by E, can be calculated as:

E = {(1/n)·Σᵢ [(Yᵢ − Ȳ) − (Xᵢ − X̄)]²}^(1/2)

where q*_m and q*_p are the dimensionless measured and predicted overtopping rates, n is the number of records, Xᵢ = log q*_m,i and Yᵢ = log q*_p,i, and X̄ and Ȳ are the average values of log q*_m and log q*_p, respectively. In this form, c-RMSE equals the standard deviation of the model error. c-RMSE is always non-negative, and a value of zero represents a perfect fit of the predictions to the measured data. In addition, although c-RMSE is in general a scale-dependent parameter, the target parameter in the present study is dimensionless (q*), so the c-RMSE value is dimensionless.
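A minimal NumPy sketch of this metric follows (base-10 logarithms are assumed here; the paper writes only "log"). Note that because the means are removed, a constant multiplicative bias in the predictions leaves c-RMSE unchanged:

```python
import numpy as np

def c_rmse(q_meas, q_pred):
    """Centred RMSE between log-transformed dimensionless overtopping rates."""
    x = np.log10(q_meas)   # X_i = log q*_m,i
    y = np.log10(q_pred)   # Y_i = log q*_p,i
    return np.sqrt(np.mean(((y - y.mean()) - (x - x.mean())) ** 2))
```

This insensitivity to constant bias is exactly why BIAS is reported alongside c-RMSE in the modified Taylor diagram.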
The metrics used in Taylor's diagram are related by:

E² = σ_p² + σ_m² − 2·σ_p·σ_m·R

where σ_p and σ_m represent the standard deviations of the predicted and measured values, respectively, and R is the Pearson correlation coefficient:

σ_m = {(1/n)·Σᵢ (Xᵢ − X̄)²}^(1/2),  σ_p = {(1/n)·Σᵢ (Yᵢ − Ȳ)²}^(1/2)

R = [Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ)]/(n·σ_m·σ_p)

The standard deviation represents the dispersion of a set of values around the average. A low standard deviation indicates that the values are close to the mean of the set, while a high standard deviation indicates that the values are widely spread around the average. R is a measure of the linear correlation between two sets of data; its value shows the strength of the relationship between the measured and predicted values.
The law of cosines (Pickover 2009), c² = a² + b² − 2ab·cos φ (where a, b, and c are the sides of a triangle and φ is the angle between sides a and b), is the key to forming the geometrical connection between the four quantities (σ_p, σ_m, R, and E) that underlie the Taylor diagram (Figure 3). Elvidge et al. (2014) proposed a modified Taylor diagram in which BIAS is added as a complementary accuracy metric, shown as contours. The BIAS can be calculated as:

BIAS = Ȳ − X̄

BIAS is the systematic error arising when an estimation process does not give accurate results on average. Positive and negative BIAS values indicate overestimation and underestimation, respectively; for a perfect fit, the BIAS value is zero. The other metric used, DR, can be expressed as:

DRᵢ = q*_p,i/q*_m,i

where DRᵢ = 1 means an exact match between the measured and predicted values.
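The geometric identity underpinning the Taylor diagram can be verified numerically. The sketch below computes all four quantities on synthetic log-space data (population standard deviations, base-10 logs assumed) and checks that E² = σ_p² + σ_m² − 2·σ_p·σ_m·R holds:

```python
import numpy as np

def taylor_metrics(q_meas, q_pred):
    x, y = np.log10(q_meas), np.log10(q_pred)
    sm, sp = x.std(), y.std()            # population standard deviations (ddof = 0)
    r = np.corrcoef(x, y)[0, 1]          # Pearson correlation
    e = np.sqrt(np.mean(((y - y.mean()) - (x - x.mean())) ** 2))  # c-RMSE
    bias = y.mean() - x.mean()           # mean log-space over/underestimation
    return sm, sp, r, e, bias

rng = np.random.default_rng(4)
q_m = 10 ** rng.uniform(-6, -2, 500)                 # synthetic measured rates
q_p = q_m * 10 ** rng.normal(0.1, 0.2, 500)          # noisy, slightly biased predictor

sm, sp, r, e, bias = taylor_metrics(q_m, q_p)
check = sp ** 2 + sm ** 2 - 2 * sp * sm * r          # law-of-cosines identity
```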

RESULTS AND DISCUSSION
The scatterplots of the measured values against the predicted ones for all input combinations and the test data, using all applied models (GPR, SVR, and ANN) and empirical formulae (JE and ET), are given in Figure 4. Here, the solid line shows perfect agreement, and the dashed lines represent 10 and 0.1 times over- and underestimation, respectively. As can be seen, for all input combinations, the values predicted by the soft computing methods are less scattered than those obtained by the empirical formulae. The plot for input combination b is slightly more scattered than that for input combination a. For input combinations a and b, all values predicted by the GPR model lie between the over-/underestimation lines, which shows its higher accuracy. Figures 4(c) and 4(d) compare the prediction performance of the soft computing models against the empirical formulae using the same input parameters. As can be seen, the ET formula significantly underestimates the low overtopping rates, while slight underestimations and some overestimations were observed for the JE formula.
The accuracy metrics of the GPR, SVR, and ANN models and the empirical formulae for both the whole and the test datasets are presented in Table 5. Here, the lowercase letters a, b, c, and d correspond to the input combinations given in Table 3, and the capital letters G, S, and N denote the GPR, SVR, and ANN methods, respectively.
According to Table 5, the models using input combinations a and b (taken from EurOtop (2018)-ANN) provide better results than the other models (which use the inputs of the empirical formulae). For example, the G(a) and G(b) models, with c-RMSE = 0.24 and 0.25, are more accurate than the G(c) and G(d) models, with c-RMSE = 0.31 and 0.62, respectively. The models with input combination b, as the reduced form of input combination a, show acceptable accuracy. For the G models, almost identical accuracy metrics were obtained for input combinations a and b, with only a slight difference in the c-RMSE. Since the parameters eliminated in input combination b are mostly the less influential ones in the overtopping process (at least for the studied case), this result could be expected. Input combination b consists of all conventionally known effective parameters, namely R_c/H_m0,t, S_m−1,0,t, tan α, cos β, and γ_f. Besides these parameters, the relative water depth (h/L_m−1,0,t) makes an effective contribution to the modelling, which is supported by the findings of Cheon & Suh (2016) and Pillai et al. (2017). In addition, since breakwaters have a permeable crest, the position of the overtopping collection box on the crest can be a significant factor in the measurement (Jafari & Etemad-Shahidi 2011). This issue can be considered through the relative crest width (G_c/L_m−1,0,t), with increasing values of which the overtopping is expected to decrease as water percolates into the permeable surface (Pillai et al. 2017). The comparison of the models with input combination b shows the highest accuracy for model G (c-RMSE = 0.25); however, the good performance of model S, with BIAS = 0, should not be overlooked. The improvements of model G(b) over N(b) and the JE and ET formulae amount to 22, 60, and 79%, respectively, in terms of the c-RMSE value.
In general, the soft computing models perform better than the empirical formulae. This superiority can be seen even where the same input parameters are used. For example, the models G(c), S(c), and N(c) have c-RMSE values of 0.31, 0.37, and 0.4, while JE gives c-RMSE = 0.63. Comparing the accuracy metrics of the most accurate soft computing model for input combination c, G(c), with those of the JE formula shows improvements of about 51 and 85% in terms of c-RMSE and BIAS, respectively. The models developed using input combination d (taken from the ET formula) also give better accuracy metrics than ET; for example, the c-RMSE is reduced from 1.20 (ET) to 0.62 (G(d)). However, the comparison of Figure 4(d) with the other panels demonstrates the unsuitability of this input combination, as it lacks some key parameters present in the other combinations. It should also be mentioned that for input combinations c and d, the GPR models outperform the SVR and ANN models. Besides the better performance of the JE formula compared with ET, the soft computing models fed by input combination c, taken from the JE formula, perform better than those using the ET parameters (input combination d). This can be explained by the omission from input combination d of parameters such as S_m−1,0,t, tan α, and G_c/L_m−1,0,t. Overall, considering both accuracy and the number of input parameters, input combination b can be introduced as the optimal one. In addition, the results for the test dataset demonstrate the good capability of the kernel-based models compared with the ANN and the empirical formulae; the GPR model also performs slightly better than the SVR model.
Figure 5 shows the modified Taylor diagram, which graphically summarizes the results of the analysis. The advantage of the modified Taylor diagram is that it plots all meaningful accuracy metrics in one diagram, which is helpful for comparing the models. In this diagram, (1) the azimuthal angle shows the correlation coefficient, (2) the radial distance represents the standard deviation of the model predictions, (3) the blue dashed line shows the standard deviation of the measured values, (4) the red circular dashed lines centred on the measured standard deviation display the c-RMSE, and (5) BIAS is presented by contours. Predicted patterns in good agreement with the observations lie nearest the point marked 'Measured' on the horizontal axis. This point represents the highest correlation (R = 1), the lowest c-RMSE (= 0), and a dispersion pattern of the predicted values identical to that of the measured ones (σp = σm = 0.72).
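The geometry of the Taylor diagram rests on a law-of-cosines identity linking the plotted quantities: c-RMSE² = σm² + σp² − 2 σm σp R. The snippet below verifies this identity numerically on synthetic series (the data are random and purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.normal(size=200)                       # synthetic "measured" series
p = 0.8 * m + rng.normal(scale=0.3, size=200)  # synthetic "predicted" series

sm, sp = m.std(), p.std()                      # standard deviations (radial distance)
R = np.corrcoef(m, p)[0, 1]                    # correlation (azimuthal angle)
crmse = np.sqrt(np.mean(((p - p.mean()) - (m - m.mean())) ** 2))

# law-of-cosines identity underlying the diagram
lhs = crmse ** 2
rhs = sm**2 + sp**2 - 2 * sm * sp * R
print(lhs, rhs)
```

This identity is why a single point in the diagram simultaneously encodes correlation, standard deviation, and c-RMSE: the c-RMSE is just the third side of the triangle formed by the two standard-deviation vectors.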
Each diagram (a, b, c, and d) refers to one of the input combinations used in this study. According to diagram (a), the GPR and SVR models agree best with the observations, showing the lowest c-RMSE and the highest correlation coefficient (R), respectively. However, the spatial variability of the ANN model is lower than that of the other models, as it lies closest to the blue dashed line. Also, the ANN and SVR models (in yellow) show slight overestimation, while the GPR model (in green) indicates underestimation. Turning to diagram (b), the relative merit of the kernel-based models, especially GPR, can be inferred from the locations of the models. As seen, the GPR model lies nearest the measured point on the horizontal axis in terms of both radial distance (standard deviation closest to the measured values) and azimuthal angle (lowest c-RMSE and highest correlation coefficient). According to the BIAS contours, the SVR and ANN models show slight overestimation, while GPR underestimates. Given that the differences in BIAS between the models, regardless of the sign of the over-/underestimation, are negligible, the GPR model can be considered the most accurate one for input combination b. In diagrams (c) and (d), the applied soft computing models are compared with the empirical formulae. Based on diagram (c), the location of the JE formula, further from the optimal point on the horizontal axis than those of the applied soft computing models, confirms its unreliable estimation. The superiority of the soft computing models is also clear in diagram (d), where the point representing the ET formula is the furthest from the measured point on the horizontal axis.
Overall, Figure 5 demonstrates the higher capability of the kernel-based models, in comparison to the ANN and the empirical formulae, in predicting wave overtopping at simple sloped breakwaters. G(a) and G(b), as the most accurate models, lie nearest the measured point on the horizontal axis.
The distribution of log(DR) for the values predicted using input combination b was further analysed (Figure 6). The distributions of the predictions of all soft computing methods, especially GPR (−1 < log DR < 1), are narrower than those of the JE (−2 < log DR < 2) and ET (ranging from below −2 up to 2) formulae. Also, as seen, all models except the JE formula have negative skewness, indicating underestimation, while the JE formula, with positive skewness, shows overestimation.
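The discrepancy-ratio analysis above can be sketched in a few lines. The toy values below are hypothetical (not the study's data); they merely illustrate how a narrow log(DR) band and a negative skewness, of the kind reported for the soft computing models, are computed.

```python
import numpy as np

def log_discrepancy(q_measured, q_predicted):
    """log10 of the discrepancy ratio DR = q_predicted / q_measured."""
    return np.log10(q_predicted / q_measured)

def skewness(x):
    # third standardized moment (population form)
    d = x - x.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5

# hypothetical predictions that mildly underestimate the measurements
q_meas = np.array([1e-4, 5e-4, 2e-3, 8e-3, 2e-2])
q_pred = q_meas * np.array([0.4, 0.63, 0.8, 1.0, 1.26])
ldr = log_discrepancy(q_meas, q_pred)
print(ldr.min() > -1 and ldr.max() < 1, skewness(ldr))
```

A log(DR) of 0 means a perfect prediction, +1 a tenfold overestimation, and −1 a tenfold underestimation, so the (−1, 1) band reported for GPR corresponds to predictions within one order of magnitude of the measurements.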
A good model is one whose errors are independent of the input parameters (Sahay & Dutta 2009) and which has no systematic error. Hence, the variation of DR as a function of the relative crest freeboard (Rc/Hm0,t) is shown in Figure 7 for input combination b, as the optimal one, using the different models (JE, ET, ANN, SVR, and GPR). As shown, the relative crest freeboard (Rc/Hm0,t) has been used appropriately in all models except the ET formula, where a systematic error is observed. The DR values of the applied models (GPR, SVR, and ANN) are less sensitive to changes in the relative crest freeboard, and more symmetric, than those of the ET formula. In addition, comparing the dispersion of the data points around DR = 1 for all models indicates the good capability of the soft computing methods, especially GPR, for predicting the overtopping rate.
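One simple way to screen for the systematic error discussed above is to regress DR on the candidate input parameter and inspect the slope: a near-zero slope suggests the parameter's effect has been captured by the model. The sketch below contrasts a synthetic unbiased model with one whose DR drifts with relative freeboard; all data are random and illustrative, not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(1)
rel_freeboard = rng.uniform(0.5, 3.0, 300)  # synthetic Rc/Hm0,t values

# unbiased model: DR scatters around 1 independently of the input
dr_good = 1.0 + rng.normal(scale=0.1, size=300)
# systematic model: DR drifts with relative freeboard (ET-like behaviour)
dr_bad = 1.0 + 0.3 * (rel_freeboard - 1.75) + rng.normal(scale=0.1, size=300)

# slope of the least-squares line DR ~ Rc/Hm0,t
slope_good = np.polyfit(rel_freeboard, dr_good, 1)[0]
slope_bad = np.polyfit(rel_freeboard, dr_bad, 1)[0]
print(slope_good, slope_bad)
```

In this synthetic setting the unbiased model's slope is statistically indistinguishable from zero, while the drifting model's slope recovers the imposed trend, mirroring the visual contrast in Figure 7.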
To investigate the physical consistency of the developed GPR and SVR models, a parametric analysis was conducted showing the relationship between the overtopping rate and the most important input parameter (Rc/Hm0,t), as identified in the literature and extracted from the sensitivity analysis. Figure 8 presents the predicted values of the dimensionless mean overtopping rate against the relative crest freeboard (Rc/Hm0,t) for GPR and SVR using input combination b as the optimal one. As seen, a decreasing

Figure 3 | Geometrical relationship between the metrics plotted on Taylor's diagram, based on the law of cosines.

Figure 4 | Measured versus predicted overtopping rates using JE and ET formulae as well as ANN (N), SVR (S), and GPR (G) models for the input combinations a (a), b (b), c (c), and d (d); test data.

Figure 5 | Modified Taylor's diagrams for the input combinations a (a), b (b), c (c), and d (d); test data. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/hydro.2021.046.

Table 1 | The range and definitions of the dimensionless input parameters

Figure 2 | Schematic diagram of a simple sloped breakwater.

Table 2 | Details of the data extracted from the EurOtop (2018) database

Table 3 | Input combinations used for developing the GPR, SVR, and ANN models

Table 5 | Accuracy metrics of the developed models (GPR, SVR, and ANN) and the empirical formulae for the test (all) data