## Abstract

The calculation of bed sediment load (BSL), which is influenced by hydraulic, hydrological, and sedimentary factors, is vital for informed decision-making in water resource management. Machine learning models, which are gaining popularity due to their accessibility and ability to reveal complex relationships, play a significant role in tackling this challenge. The efficacy of gene expression programming (GEP), support vector machine (SVM), multi-layer perceptron (MLP), and multivariate adaptive regression splines (MARS) models was assessed using 672 measured data points obtained from six rivers, namely Oak Creek, Nahal Yatir, Sagehen Creek, Elbow River, Jacoby River, and Goodwin Creek, from 1954 to 1992. Model performance was evaluated using root mean square error (RMSE), *R*^{2}, Nash–Sutcliffe coefficient (NSE), and developed discrepancy ratio (DDR) as indices. Following normalization to the range 0–1, the data were partitioned into 80% (540 records) for training and 20% (132 records) for testing. Four dimensionless parameters, denoted as Fr = *U*/√*gy*, *U*/*U**, *S*_{e}, and ω = *τ**U*/*γ*_{s}√*gyD*_{s}^{3}, were employed as inputs in the models. The outcomes indicate that the GEP model outperforms the other methods, as evidenced by the following metrics in predicting BSL during the test stage: RMSE = 1.4088, NSE = 0.73054, *R*^{2} = 0.8729, and *Q*_{DDR(max)} = 1.9564.

## HIGHLIGHTS

The present work proposes BSL prediction using field data from six rivers.

Four machine learning models including SVM, GEP, MLP and MARS are employed to forecast BSL.

A variety of input combinations is implemented for each MLM to find the optimum output.

Statistical indices are used to select the superior MLM for BSL prediction.

## INTRODUCTION

When flow conditions exceed the threshold of motion, sediment particles are transported through erodible channels as both suspended load and bedload. The natural process of sediment production within rivers alters river morphology. Undesirable impacts of sediment aggradation and degradation include a decrease in the useful storage capacity of dams, damage to waterworks and equipment installed near rivers, disturbed performance and reduced efficiency of hydraulic systems, endangerment of in-river hydraulic structures, and adverse environmental effects. These impacts make accurate prediction of sediment load crucial in river engineering. Directly measuring bed sediment load (BSL) by establishing gauging stations at all desired locations over long periods is impractical and economically unviable. Although BSL accounts for 5–25% of the total sediment load, differences in fundamental factors such as climate, geological structure, and topography significantly reduce the accuracy of this estimate. Researchers have developed empirical, semi-empirical, and analytical equations as the primary classes of mathematical estimators to predict BSL from river and sediment features. These estimators incorporate one or more prominent factors, such as shear stress, energy slope, water discharge, sediment characteristics, and flow velocity. The main categories of BSL equations are as follows (Graf 1971; Gomez & Church 1989):

Du Boys’ (1879) type: developed on the basis of shear stress, such as the Meyer-Peter & Mueller (1948) equation.

Schoklitsch's (1950) type: developed on the basis of flow discharge, such as the Schoklitsch series of equations.

Einstein's (1950) type: developed on a statistical basis.

Bagnold's (1966) type: developed on the basis of flow power, such as Yalin's (1963) equation.

The complexity and uncertainty of the parameters involved in sediment transport result in imprecise outputs from theoretical equations, with discrepancies of up to 100% reported in some studies. In recent decades, interest in the application of machine learning models (MLMs) has increased substantially. MLMs can simulate intricate phenomena without requiring prior knowledge of the aggradation and degradation processes. These models can uncover concealed, imperceptible, and nonlinear connections between input and output variables in order to accurately predict the desired outcome. Adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN), genetic algorithms, wavelet kernel extreme learning machine (WKELM), granular computing, particle swarm optimization (PSO), support vector machine (SVM), random forest (RF), chi-squared automatic interaction detection (CHAID), support vector regression (SVR), decision tree (DT), gene expression programming (GEP), classification and regression tree (CART), and extreme learning machine (ELM) are examples of MLMs. Table 1 presents a brief list of MLM applications.

The aforementioned mathematical estimators show a substantial discrepancy between estimated and target values, which can be attributed to the intricate nature of sediment transport. The principal objective of this investigation is to assess the capabilities of four established data-driven models, namely GEP, SVM, multi-layer perceptron (MLP)-ANN, and multivariate adaptive regression splines (MARS), in predicting BSL. A noteworthy advantage and innovation of this research lies in the use of field data acquired from six rivers, which addresses the limitations associated with laboratory data and provides a direct representation of sediment load under diverse hydrodynamic conditions. Furthermore, the dataset employed herein has not been utilized in prior research.

## MATERIALS AND METHODS

### Framework of the prediction process

#### Study area and data used

The present paper covers six river datasets, i.e. Oak Creek (Millhous 1973), Nahal Yatir (Reid & Laronne 1995), Sagehen Creek (Andrews 1994), Elbow River (Hollingshead 1971), Jacoby River (Lisle 1989), and Goodwin Creek (Kuhnle 1992). The hydrodynamic and sediment characteristics of each river are summarized in Table 2.

| Property | Index | Oak Creek | Nahal Yatir | Sagehen Creek | Elbow River | Jacoby River | Goodwin Creek |
|---|---|---|---|---|---|---|---|
| Number of samples | – | 66 | 74 | 55 | 19 | 100 | 358 |
| Bed material median size (mm) | D_{50} | 54 | 6 | 58 | 76 | 21 | 12 |
| Channel width (m) | B | 3.6 | 3.5 | 4.85 | 43.5 | 17.2 | 13 |
| Flow depth (m) | Max | 0.590 | 0.600 | 0.620 | 0.840 | 0.595 | 1.213 |
| | Min | 0.120 | 0.110 | 0.350 | 0.630 | 0.133 | 0.366 |
| | Mean | 0.277 | 0.254 | 0.503 | 0.747 | 0.358 | 0.864 |
| | STDEV | 0.128 | 0.142 | 0.097 | 0.065 | 0.104 | 0.213 |
| Flow discharge (m^{3}/s) | Max | 3.398 | 5.000 | 3.120 | 95.160 | 16.082 | 21.580 |
| | Min | 0.153 | 0.300 | 0.990 | 39.497 | 0.603 | 1.370 |
| | Mean | 0.915 | 1.716 | 2.130 | 67.766 | 5.871 | 10.783 |
| | STDEV | 0.831 | 1.349 | 0.781 | 17.321 | 3.417 | 5.369 |
| Hydraulic radius (m) | Max | 0.445 | 0.470 | 0.418 | 0.811 | 0.557 | 1.263 |
| | Min | 0.110 | 0.100 | 0.275 | 0.610 | 0.131 | 0.505 |
| | Mean | 0.234 | 0.221 | 0.359 | 0.723 | 0.343 | 0.920 |
| | STDEV | 0.093 | 0.109 | 0.051 | 0.062 | 0.096 | 0.201 |
| Energy slope (–) | Max | 0.010800 | 0.010100 | 0.012000 | 0.00745 | 0.006317 | 0.003400 |
| | Min | 0.008300 | 0.007000 | 0.009500 | 0.00745 | 0.006286 | 0.001300 |
| | Mean | 0.009585 | 0.008900 | 0.010304 | 0.00745 | 0.006300 | 0.002374 |
| | STDEV | 0.000522 | 0.000757 | 0.000748 | 0.00000 | 0.000007 | 0.000443 |
| Shear stress (N/m^{2}) | Max | 43.219 | 36.900 | 43.480 | 63.850 | 34.400 | 38.320 |
| | Min | 8.934 | 8.700 | 29.670 | 47.880 | 8.090 | 8.590 |
| | Mean | 22.369 | 19.872 | 35.970 | 56.809 | 21.201 | 22.003 |
| | STDEV | 9.691 | 8.872 | 3.192 | 4.932 | 5.927 | 8.197 |
| Sediment discharge (kg/m·s) | Max | 24.333 | 7.050 | 34.900 | 0.924 | 0.402 | 2.985 |
| | Min | 0.000 | 0.200 | 0.499 | 0.039 | 0.000 | 0.000 |
| | Mean | 1.696 | 2.205 | 9.899 | 0.422 | 0.029 | 0.208 |
| | STDEV | 4.567 | 1.841 | 7.646 | 0.267 | 0.061 | 0.490 |


At Oak Creek, the riverbed consists primarily of gravel. An armor layer is present on the bed, and the top layer of particles is uniformly distributed; below the armor layer, the material is finer and well graded. The Nahal Yatir River, an ephemeral river in Israel, has a drainage area of 19 km^{2}. The channel at the Yatir site is straight, and the banks are nearly vertical, with coarse-grained bars and elongated flat areas. Sagehen Creek is a minor tributary that originates on the eastern side of the Sierra Nevada in California and flows into the Truckee River; it covers an area of 27.2 km^{2}. The Elbow River, situated in Alberta, Canada, is a perennial river that is small in size but carries substantial flow discharge due to snowmelt. It originates from a catchment area spanning 1,238 km^{2}. The watershed lies in the rain shadow of the Rocky Mountains, which makes it one of the driest regions in southern Canada. The Jacoby River, located in northern California, is a perennial river with a sinuous channel pattern and a catchment area of 36.3 km^{2}. Goodwin Creek, located in north central Mississippi, is a gravel-bed stream with a relatively steep slope. It drains a basin area of 17.9 km^{2} and exhibits weakly bimodal sediment characteristics. Figure 2 shows the catchments of the six rivers utilized in the present research.

#### Preprocessing of data

Using the SPSS software, the randomness of the collected data was assessed with a runs test about the median of each variable (Table 3). The obtained *P*-values exceed 0.05, indicating that the measured parameters can be regarded as random at the 95% confidence level. For the MLMs, the dataset was divided into a training subset and a testing subset. In all models, 80% of the data (540 records) were allocated to the training stage, while the remaining 20% (132 records) were reserved for the testing stage. Prior to implementing the MLMs, all measured data were normalized to the range 0–1.
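The normalization and partitioning described above can be sketched as follows. This is a minimal illustration with synthetic data, not the authors' procedure: the helper names `min_max_normalize` and `split_train_test` are hypothetical, and the random shuffling before the split is an assumption, since the text does not state how the 540 training records were chosen.

```python
import numpy as np

def min_max_normalize(data):
    """Scale each column of a 2-D array to the range 0-1."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Guard against constant columns (zero range) to avoid division by zero.
    col_range[col_range == 0.0] = 1.0
    return (data - col_min) / col_range

def split_train_test(data, n_train, seed=0):
    """Shuffle the rows (an assumption) and split them into two subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    return data[order[:n_train]], data[order[n_train:]]

# Synthetic stand-in for the 672 field records (4 inputs + 1 target column).
records = np.random.default_rng(42).uniform(0.0, 50.0, size=(672, 5))
scaled = min_max_normalize(records)
train, test = split_train_test(scaled, n_train=540)
print(train.shape, test.shape)
```

With 672 records, 540 training rows correspond to about 80.4% of the data, which matches the reported 540/132 partition.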

| | Depth | Flow discharge | Velocity | Shear velocity | Hydraulic radius | Energy slope | Shear tension | Sediment discharge |
|---|---|---|---|---|---|---|---|---|
| Test value | 0.60 | 6.11250 | 0.94849 | 0.14745 | 0.63050 | 0.00310 | 21.75150 | 0.02670 |
| Cases < test value | 335 | 336 | 335 | 336 | 336 | 323 | 336 | 336 |
| Cases ≥ test value | 337 | 336 | 337 | 336 | 336 | 349 | 336 | 336 |
| Total cases | 672 | 672 | 672 | 672 | 672 | 672 | 672 | 672 |
| Number of runs | 26 | 26 | 49 | 51 | 20 | 30 | 51 | 61 |
| Z | −24.012 | −24.012 | −22.236 | −22.082 | −24.475 | −23.700 | −22.082 | −21.310 |
| Asymp. Sig. (two-tailed) | 0.454 | 0.423 | 0.401 | 0.467 | 0.411 | 0.487 | 0.438 | 0.409 |


| Kernel name | Definition |
|---|---|
| Linear | |
| Polynomial | |
| Radial basis function or RBF | |
| Exponential radial basis function or ERBF | |


#### Overview of SVM

*et al.* 2003). The main goal of the SVM is to find a function *f*(*x*) that satisfies the following fitting regression equation:

$$f(x) = \mathbf{W}^{T}\,\Phi(x) + b$$

where **W** is the coefficient vector, *b* is a constant, and Φ(*x*) is the kernel function. The values of **W**, *b*, and Φ(*x*) are determined through a convex optimization approach based on the structural risk minimization principle. The four kernel functions utilized in the SVM are listed in Table 4. A trial-and-error process is used to select the kernel that gives the best fit between simulated and target datasets (Majedi-Asl *et al.* 2020).
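For orientation, the four kernel families named in Table 4 can be written in their commonly used textbook forms, as in the sketch below. The parameter names `degree`, `coef0`, and `gamma` are assumptions made for illustration; the parametrization adopted in this study may differ.

```python
import numpy as np

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return float(np.dot(x, z))

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """K(x, z) = (x . z + coef0) ** degree"""
    return float((np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def erbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||), i.e. the unsquared distance."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.linalg.norm(diff)))

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
print(linear_kernel(a, b), polynomial_kernel(a, b), rbf_kernel(a, b), erbf_kernel(a, b))
```

The RBF and ERBF kernels differ only in whether the distance between the two points is squared, so the ERBF decays more slowly for distant points at the same `gamma`.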

#### Overview of GEP

*et al.* 2020).

The first stage of the GEP algorithm is the random generation of an initial population made up of a certain number of individuals. The chromosomes are then expressed as expression trees and evaluated against the target values using a fitness function. Next, individuals are selected according to their performance and reproduced with modifications and improvements, leaving behind a new generation with new characteristics. Each generation passes through the same cycle, which includes gene expression, adaptation to a new environment, selection based on fitness, and reproduction with improvement. The process repeats for a certain number of generations until an acceptable solution is reached. During the reproduction process, the genome is copied and transferred to the next generation (Majedi-Asl *et al.* 2020).

#### Overview of MLP-ANN

*W*_{ij} and *W*_{jk} affect the distributed values of all nodes in the hidden layer between input and hidden nodes. They play an interconnecting link role between neurons in successive layers (Aichouri *et al.* 2015).
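The role of the connection weights can be illustrated with a short forward pass. The sketch below assumes the layer sizes and activation functions listed in Table 8 (three hidden layers with 3, 1, and 1 neurons) and four inputs; the weights are random placeholders rather than trained values, and `mlp_forward` is a hypothetical helper name.

```python
import numpy as np

def identity(values):
    """Linear activation: returns its input unchanged."""
    return values

def mlp_forward(x, weights, biases, activations):
    """Propagate inputs through successive fully connected layers."""
    out = np.asarray(x, dtype=float)
    for w, b, act in zip(weights, biases, activations):
        # Each layer computes a weighted sum of the previous layer's outputs.
        out = act(out @ w + b)
    return out

# Four inputs, hidden layers of 3, 1, and 1 neurons, one output neuron.
layer_sizes = [4, 3, 1, 1, 1]
rng = np.random.default_rng(0)
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
activations = [np.tanh, np.tanh, identity, identity]

sample = np.array([[0.2, 0.5, 0.1, 0.7]])  # one normalized input record
prediction = mlp_forward(sample, weights, biases, activations)
print(prediction.shape)
```

Training would adjust `weights` and `biases` by backpropagation; only the forward computation is shown here.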

#### Overview of MARS

*et al.* 2018). The general formulation of MARS is as follows:

$$y = f(x) = c_{0} + \sum_{i=1}^{M} c_{i} \prod_{j=1}^{K_{i}} b_{ji}\left(x_{v(j,i)}\right)$$

in which *y* is the response variable, *c*_{0} is the constant term, *c*_{i} is the coefficient vector of the *M* non-constant BFs, *b*_{ji}(*x*_{v(j,i)}) is the truncated power BF, *v*(*j*,*i*) is the index of the independent input variable of the *i*th term and the *j*th product, and *K*_{i} is the order of interaction limit. The definition of spline *b*_{ji} is as follows:

$$b_{ji}(x) = \left(x - t_{ji}\right)_{+}^{q} = \begin{cases} \left(x - t_{ji}\right)^{q}, & x > t_{ji} \\ 0, & \text{otherwise} \end{cases}$$

where *t*_{ji} is the knot of the spline and *q* is its power (Mallick *et al.* 2020). Forward and backward phases are used in this algorithm. During the first stepwise stage, namely the forward phase, all possible BFs are generated. In the second stage, namely the backward phase, BFs that cause overfitting are identified using the generalized cross-validation (GCV) criterion and eliminated to improve forecasting (Yilmaz *et al.* 2018). The GCV expression is as follows:

$$\mathrm{GCV} = \frac{\dfrac{1}{N}\sum_{i=1}^{N}\left[y_{i} - f(x_{i})\right]^{2}}{\left[1 - \dfrac{C(B)}{N}\right]^{2}}$$

where *N* is the number of data and *C*(*B*) is the penalty function, described as follows:

$$C(B) = (B + 1) + dB$$

where *B* is the number of basis terms and *d* is the penalty term (Mallick *et al.* 2020).
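A compact numerical sketch of the two MARS building blocks discussed above, the truncated basis function and the GCV criterion, is given below. It assumes a first-order (linear) hinge and a penalty term of 3, a value often quoted in the MARS literature; neither is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np

def hinge(x, knot, direction=1):
    """Truncated linear basis: max(0, x - knot), or max(0, knot - x) if direction is -1."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, direction * (x - knot))

def gcv(y_true, y_pred, n_basis, penalty=3.0):
    """Generalized cross-validation score with C(B) = (B + 1) + penalty * B."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    mean_squared_error = np.mean((y_true - y_pred) ** 2)
    complexity = (n_basis + 1) + penalty * n_basis
    return mean_squared_error / (1.0 - complexity / n) ** 2

x = np.array([1.0, 3.0, 5.0])
print(hinge(x, knot=3.0))                # right-hand branch
print(hinge(x, knot=3.0, direction=-1))  # mirrored left-hand branch

score = gcv(np.zeros(100), np.ones(100), n_basis=5)
print(score)
```

Because the denominator shrinks as the number of basis terms grows, a model with more BFs must reduce the mean squared error substantially to obtain a lower GCV, which is what drives the pruning in the backward phase.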

#### Performance assessment

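The root mean square error (RMSE), coefficient of determination (*R*^{2}), and Nash–Sutcliffe coefficient (NSE) were utilized to evaluate the conformity between simulated and target datasets with the following expressions:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_{i} - Y_{i}\right)^{2}}$$

$$R^{2} = \frac{\left[\sum_{i=1}^{N}\left(X_{i} - \bar{X}\right)\left(Y_{i} - \bar{Y}\right)\right]^{2}}{\sum_{i=1}^{N}\left(X_{i} - \bar{X}\right)^{2}\,\sum_{i=1}^{N}\left(Y_{i} - \bar{Y}\right)^{2}}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(X_{i} - Y_{i}\right)^{2}}{\sum_{i=1}^{N}\left(X_{i} - \bar{X}\right)^{2}}$$

Here, *X* and *Y* are measured and predicted values, respectively, overbars denote mean values, and *N* is the total number of data. These assessment criteria express only the mean error of the implemented models. To compensate for this deficiency, the developed discrepancy ratio (DDR) was proposed by Noori *et al.* (2010). The values of DDR are calculated by the following equation:

$$\mathrm{DDR} = \frac{Y}{X} - 1$$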

Plotting the Gaussian function of the DDR values in a standard normal distribution provides a clearer visualization and supports a better judgment. *X*_{DDR}–*Z*_{DDR} is the final scatterplot of this procedure, where *X*_{DDR} is the normalized value of the DDR using the Gaussian function and *Z*_{DDR} is the standardized value of the variables. Higher values of *X*_{DDR} and a stronger tendency of the error distribution toward the centerline denote a more precise model (Noori *et al.* 2010).
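The indices above can be computed in a few lines. The sketch below uses the standard definitions of RMSE and NSE, takes *R*^{2} as the squared Pearson correlation, and takes DDR as the ratio of predicted to measured values minus one; the last two forms are assumptions about the exact expressions used in this study, and the sample values are invented for illustration.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error."""
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency (1 is a perfect fit)."""
    residual = np.sum((observed - predicted) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - residual / variance)

def r_squared(observed, predicted):
    """Squared Pearson correlation coefficient."""
    return float(np.corrcoef(observed, predicted)[0, 1] ** 2)

def ddr(observed, predicted):
    """Discrepancy ratio per record; undefined where the observation is zero."""
    return predicted / observed - 1.0

observed = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.1, 1.9, 3.2, 3.8])
print(rmse(observed, predicted), nse(observed, predicted), r_squared(observed, predicted))
print(ddr(observed, predicted))
```

Since Table 2 reports minimum sediment discharges of zero for some rivers, records with a zero measured value would need separate handling before a ratio-based index such as DDR is evaluated.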

## RESULTS AND DISCUSSION

Utilizing measured sediment and hydraulic parameters, four dimensionless parameters were derived as follows: Fr = *U*/√*gy*, *U*/*U**, *S*_{e}, and ω = *τ**U*/*γ*_{s}√*gyD*_{s}^{3}, where *U* represents the mean flow velocity (m/s), *g* denotes the gravitational acceleration (m/s^{2}), *y* signifies the flow depth (m), *U** represents the shear velocity (m/s), *τ* represents the shear stress (N/m^{2}), *γ*_{s} represents the sediment specific weight (N/m^{3}), and *D*_{s} represents the mean sediment particle size (m). The input combinations employed for the MLMs are enumerated in Table 5. In total, 15 distinct input combinations (M1–M15) were scrutinized for their efficacy in predicting BSL. The performance evaluation results of the simulations with the four machine learning models are summarized in Table 6, which reports the evaluation metrics for both the training and testing phases.
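As a brief worked example of the first two inputs, the sketch below evaluates Fr and *U*/*U** for a single set of values. The numbers are the Table 3 test values for velocity, depth, and shear velocity, used here purely for illustration (they are per-variable medians, not one measured record), and *g* = 9.81 m/s^{2} is an assumed value.

```python
import math

GRAVITY = 9.81  # m/s^2, assumed value of gravitational acceleration

def froude_number(velocity, depth, g=GRAVITY):
    """Fr = U / sqrt(g * y)."""
    return velocity / math.sqrt(g * depth)

def velocity_ratio(velocity, shear_velocity):
    """U / U*, the ratio of mean flow velocity to shear velocity."""
    return velocity / shear_velocity

fr = froude_number(velocity=0.94849, depth=0.60)
ratio = velocity_ratio(velocity=0.94849, shear_velocity=0.14745)
print(round(fr, 3), round(ratio, 3))
```

A Froude number below 1 indicates subcritical flow for these illustrative values.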

| Model's name | Fr | *U*/*U** | *S*_{e} | ω |
|---|---|---|---|---|
| M1 | ✓ | ✓ | ✓ | ✓ |
| M2 | ✓ | ✓ | ✓ | – |
| M3 | ✓ | ✓ | – | – |
| M4 | ✓ | – | – | – |
| M5 | ✓ | – | ✓ | ✓ |
| M6 | ✓ | ✓ | – | ✓ |
| M7 | ✓ | – | – | ✓ |
| M8 | ✓ | – | ✓ | – |
| M9 | – | ✓ | ✓ | ✓ |
| M10 | – | – | ✓ | ✓ |
| M11 | – | ✓ | – | ✓ |
| M12 | – | ✓ | ✓ | – |
| M13 | – | – | – | ✓ |
| M14 | – | – | ✓ | – |
| M15 | – | ✓ | – | – |
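The 15 input combinations correspond to every non-empty subset of the four dimensionless parameters (2^4 − 1 = 15). A minimal sketch of how such a list can be generated is shown below; the short parameter labels are placeholders, and the generated order does not reproduce the M1–M15 numbering of Table 5.

```python
from itertools import combinations

inputs = ["Fr", "U/U*", "Se", "omega"]

# Enumerate subsets from the largest (all four inputs) down to single inputs.
subsets = [combo
           for size in range(len(inputs), 0, -1)
           for combo in combinations(inputs, size)]

for index, combo in enumerate(subsets, start=1):
    print(index, combo)
```

Each subset would then be used to train and test every MLM, so that the best input combination per model can be identified from the statistical indices.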


| MLM's name | Best model | RMSE (training) | NSE (training) | *R*^{2} (training) | *Q*_{DDR(max)} (training) | RMSE (testing) | NSE (testing) | *R*^{2} (testing) | *Q*_{DDR(max)} (testing) |
|---|---|---|---|---|---|---|---|---|---|
| SVM | M1 | 11.2132 | 0.6683 | 0.7686 | 0.7354 | 1.5620 | 0.66872 | 0.7645 | 1.1812 |
| GEP | M5 | 8.4411 | 0.8121 | 0.8641 | 1.4466 | 1.4088 | 0.73054 | 0.8729 | 1.9564 |
| MLP | M1 | 10.7438 | 0.6955 | 0.7401 | 0.5360 | 1.6771 | 0.61809 | 0.7593 | 1.6707 |
| MARS | M1 | 15.1822 | 0.3920 | 0.5386 | 0.6134 | 1.9796 | 0.46793 | 0.7508 | 0.7343 |


For the SVM, the parameters *γ* and *C* were determined to be 38 and 150, respectively. The performance evaluation indices (RMSE, NSE, *R*^{2}, and *Q*_{DDR(max)}) are 11.2132, 0.6683, 0.7686, and 0.7354 during the training period and 1.5620, 0.66872, 0.7645, and 1.1812 in the testing phase (Table 6). A comparison of these values between the two phases suggests that the model was trained appropriately. A scatter plot of the SVM results is presented in Figure 6. The points fall close to the ideal line, although some minor deviations are evident in the training phase.

The second row of Table 6 presents the best performance evaluation indicators for the GEP, obtained with the control parameter values of the most effective GEP model detailed in Table 7. The M5 combination gives the best simulation of BSL within the GEP model. The statistical indices (RMSE, NSE, *R*^{2}, and *Q*_{DDR(max)}) achieved during the training phase are 8.4411, 0.8121, 0.8641, and 1.4466, and during the testing phase, these values are 1.4088, 0.73054, 0.8729, and 1.9564. These metrics affirm the accuracy and goodness of fit of the M5 model in simulating BSL during both the training and testing phases. The scatter plot of the GEP model's performance is depicted in Figure 7, in which the points cluster closely around the ideal line, indicating a favorable fit. Nonetheless, some minor deviations are observable during both the training and testing stages.

| GEP parameter | Value |
|---|---|
| Population size | 110 |
| Number of genes | 3 |
| Gene head length | 7 |
| Gene tail length | 15 |
| Mutation rate | 0.068 |
| Inversion rate | 0.1 |
| Gene transposition rate | 0.1 |
| One-point recombination rate | 0.3 |
| Two-point recombination rate | 0.3 |
| Gene recombination rate | 0.1 |
| Fitness function | Root mean square error |
| Linking function | + |
| Mathematical operations | +, −, /, *, Exp, Inv, Power, Sqrt, Ln |


For the MLP, the training-phase values of RMSE, NSE, *R*^{2}, and *Q*_{DDR(max)} are reported as 10.7438, 0.6955, 0.7401, and 0.5360, respectively. In the testing phase, these values are 1.6771, 0.61809, 0.7593, and 1.6707, respectively. These values were obtained with the optimum architecture of the MLP network, which has three hidden layers, as detailed in Table 8. A scatter plot illustrating the performance of the MLP is presented in Figure 8. The plot suggests that the MLP performed better during the training phase than during the testing phase, as evidenced by the more pronounced deviation from the ideal line in the testing phase.

| Parameter | Description | Value |
|---|---|---|
| Neurons number | Number of neurons for the first hidden layer | 3 |
| | Number of neurons for the second hidden layer | 1 |
| | Number of neurons for the third hidden layer | 1 |
| Activation function | Activation function for the first hidden layer | Hyperbolic tangent |
| | Activation function for the second hidden layer | Hyperbolic tangent |
| | Activation function for the third hidden layer | Linear |
| | Activation function for the output layer | Linear |
| Learning rate | | 0.5 |
| Momentum coefficient | | 0.65 |
| Number of epochs | | 125,000 |


For the MARS, the values of RMSE, NSE, *R*^{2}, and *Q*_{DDR(max)} during the training and testing phases are reported as 15.1822, 0.3920, 0.5386, and 0.6134 and 1.9796, 0.46793, 0.7508, and 0.7343, respectively. These metrics quantify the performance of the M1 model in terms of accuracy, goodness of fit, and discrepancy ratio during both phases. Draft number 21 of the MARS model, built with the BF algorithm, yielded the optimized model with BF number 5 and a GCV value of 0.968. Figure 9 depicts the distribution of simulated and measured data for the M1 combination in the MARS. A notable deviation from the ideal line is apparent in both the training and testing phases. This indicates a significant disparity between the simulated and measured values and points to the limitations of this particular model.

Overall, the GEP attains the highest *Q*_{DDR(max)} values, 1.4466 and 1.9564 during the training and testing stages, respectively. Conversely, the MARS demonstrates the least favorable performance of all models during both phases, with the SVM and the MLP trailing the GEP in the performance hierarchy.

## CONCLUSION

The accurate prediction of BSL is of significant importance in water resource management. This study sought to predict bed load with four machine learning methods, namely the SVM, the GEP, the MLP, and the MARS, utilizing a dataset derived from field measurements in six rivers. Various input scenarios were formulated based on four dimensionless parameters: Fr, *U*/*U**, *S*_{e}, and ω. The following conclusions can be drawn from the findings presented in this study:

It was observed that all the dimensionless parameters generally have an effect on BSL.

Among the machine learning methods, the GEP model offered the best accuracy in predicting BSL and it was followed by the SVM and ANN, respectively.

Compared to the SVM, ANN, and MARS models, the GEP improved prediction accuracy by 9.8, 16, and 28.8% with respect to RMSE in the test stage, respectively.
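The percentages quoted in the last conclusion can be reproduced from the test-stage RMSE values in Table 6. The short check below assumes that the improvement is defined as the relative reduction in RMSE with respect to each competing model, which is consistent with the reported figures.

```python
# Test-stage RMSE values taken from Table 6.
test_rmse = {"GEP": 1.4088, "SVM": 1.5620, "MLP": 1.6771, "MARS": 1.9796}

def relative_improvement(reference, improved):
    """Percentage reduction of `improved` relative to `reference`."""
    return 100.0 * (reference - improved) / reference

improvements = {
    name: round(relative_improvement(value, test_rmse["GEP"]), 1)
    for name, value in test_rmse.items()
    if name != "GEP"
}
print(improvements)  # {'SVM': 9.8, 'MLP': 16.0, 'MARS': 28.8}
```

Here the MLP entry corresponds to the model referred to as ANN in the conclusions.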

The outcomes of the study reveal the complexity of BSL prediction, which depends on several influencing parameters. In this study, only four dimensionless parameters from six rivers were employed as inputs; more data with additional input parameters are needed to improve and generalize the implemented MLMs. More broadly, the outcomes of this research align with findings from analogous studies, which collectively affirm the capacity of machine learning models to predict sediment load effectively.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.
