## Abstract

The aim of this study is to model the relationship between the suspended sediment load and the physiographic characteristics of the Lake Urmia watershed. For this purpose, information from different stations was used to develop the sediment estimation models. Ten physiographic characteristics were used as input parameters in the simulation process. The M5 model tree was used to select the most important features. The results showed that four factors, namely annual discharge, average annual rainfall, form factor and the average elevation of the watershed, were the most important parameters, and the multilinear regression models were created based on these factors. Furthermore, it was concluded that the annual discharge was the most influential parameter. The stations were then divided into two homogeneous classes based on the selected features. To improve the efficiency of the M5 model, the non-stationary rainfall and runoff signals were decomposed into sub-signals by the wavelet transform (WT). With this technique, the trends present in the raw signals were eliminated. Finally, the models were developed by multilinear regressions. The model using all four factors had the best performance (DC = 0.93, RMSE = 0.03, ME = 0.05 and RE = 0.15).

## HIGHLIGHTS

This study links the physiographic characteristics of the watershed to M5 sediment estimation.

M5 model tree selects the most important features of the watershed.

Wavelet transform decomposes the raw main signals into several sub-signals and improves the model performance.

## INTRODUCTION

Soil erosion is a process in which soil particles are separated from their substrate and transported to another place by a transfer factor (Verheijen *et al.* 2009). Materials transported by erosive factors such as water, wind and ice, which settle in layers on the surface of the earth's crust, are referred to as sediment (Toy *et al.* 2002; Yang *et al.* 2022). The total sediment that is carried in suspension, or by sliding or rolling, by the streamflow is called the sediment load. The sediment load of the entire watershed is transferred in three general forms: wash load, suspended load and bed load (Turowski *et al.* 2010; Yang *et al.* 2022). Suspended sediment load can have undesirable effects on catchment areas and watersheds, such as increased erosion, reduced water quality, infrastructure damage, flooding and reduced water storage capacity (Bhattacharya & Dutta 2013).

The physiographic characteristics of a watershed are the set of features whose values remain relatively constant over time and that indicate the appearance and morphology of the watershed. Using the physiographic characteristics together with the climatic conditions of the studied area can provide a relatively accurate picture of the quantitative and qualitative performance of the hydrological system of the watershed (Azizi & Nejatian 2022). The most important physiographic characteristics of a watershed include area, perimeter, length, main waterway, slope, form, elevation, topography and time of concentration (Ziegler *et al.* 2014; Eslami *et al.* 2022).

Estimating the amount of suspended sediment load is very significant in a watershed study. However, direct measurement of the sediment is time-consuming and expensive. It is therefore easier to establish a relationship between the physiographic and environmental characteristics of the watershed and the suspended sediment load, and to use this relationship to estimate the load. Nowadays, black-box models, which can obtain comparable results without costly direct measurements, have become a popular tool (Nourani *et al.* 2019a). Multiple regression methods, cluster analysis and factor analysis are among the common methods for modeling the relationship between sediment rate and watershed characteristics. The multilinear regression model is a common method whose purpose is to express the dependent variable as a mathematical function of the independent variable(s) (Nourani *et al.* 2019c).

In multiple regression, as the number of variables increases, the model becomes increasingly complex and the potential for errors also increases. Therefore, a method for selecting the most significant features is highly beneficial for improving the efficiency and accuracy of the model (Ares *et al.* 2016). Feature selection is one of the common methods that tries to reduce the model input variables to a small number of important variables. In this method, a large number of variables can be reduced to a few factors, and in this way a summary of the main data can be prepared (Grabczewski & Jankowski 2005).

Among the various decision tree algorithms, the M5 model tree can detect useful information in a dataset and select the important features and parameters. In the M5 algorithm, the features with high scores, which are located in the upper nodes of the tree, are selected (Quinlan 1992; Nourani *et al.* 2019d). The M5 model tree assigns a multivariate linear regression model to each leaf node instead of fitting a constant value, so it is analogous to a piecewise linear function (Khosravi *et al.* 2022). The benefits of the M5 model tree can be listed as (Quinlan 1992; Nourani *et al.* 2019c; Sayed *et al.* 2023):

acceptable efficiency in dealing with large multi-dimensional problems and missing data;

requires no trial and error;

being more understandable and much simpler in the training phase than non-linear methods.

In recent years, experimental research has been carried out on modeling the relationship between watershed characteristics and the amount of suspended sediment load in catchments. Kumar & Das (2000) utilized the multivariate regression model to estimate the daily sediment of the Ramganga River in India. Of the 17 variables introduced to the regression model step by step, only four were significant: the rainfall intensity at the time of the event and 2 days before, the discharge 2 days before and the erosion of the previous day. Sarangi & Bhattacharya (2005) used a series of regression relationships to estimate the sediment load of watersheds in Quebec, Canada. The results showed that using the physical parameters of the watershed increased the accuracy of the model, so that the coefficient of determination increased dramatically. Zhu *et al.* (2007) modeled suspended sediment using artificial neural networks and multivariate regression methods in China. The rainfall, temperature, rainfall intensity and discharge characteristics were used to estimate the amount of sediment in their study. The results indicated that the artificial neural network method was relatively more efficient than the multiple regression method. Ares *et al.* (2016) analyzed the factors controlling sediment concentration in the Pampas region of Argentina. In this study, several rainfall events were simulated by the multiple regression method, and the developed linear model was able to explain 85% of the changes in sediment concentration. Lamb & Toniolo (2016) quantified the suspended load of three rivers in the northern region of Alaska. The study area was monitored for 3 years, suspended load was sampled at different depths of the river, and the relationship between the suspended load and the basin parameters was modeled by regression. The results showed that in all three rivers, rainfall parameters and the shape of the basin had a great effect on the amount of suspended load in the basin.

Owing to the multi-resolution nature of raw suspended sediment load signals, the efficiency of models in simulating these highly non-stationary, autoregressive and seasonal signals declines meaningfully. Under these conditions, an appropriate data preprocessing method, such as the wavelet transform (WT), may be a suitable solution to overcome these issues. The significant temporal information and hidden frequencies of the raw suspended sediment load signals may be extracted by WT. Hence, numerous studies have examined the capability of WT to decompose seasonal raw suspended sediment load time series into sub-series at several temporal scales (levels) in order to extract their inherent properties (Shiri & Kisi 2010; Belayneh *et al.* 2014; Nourani *et al.* 2019a, 2019b).

According to the mentioned studies, the importance of estimating the amount of suspended sediment load using the physiographic characteristics of the watershed is undeniable (Yang *et al.* 2022). However, to the best of our knowledge, no comprehensive study has yet been carried out for Lake Urmia to determine the parameters that significantly influence sediment modeling in the area. This study attempts to link the hydrological, environmental and physiographic characteristics of the watershed to the decomposed selected time series in order to model the suspended sediment load.

## METHODOLOGY

### Case study

[…] km^{2}, equivalent to 21.3% of the total area of Iran. Of this area, 9,000 km^{2} are flat and plain areas, 35,200 km^{2} are mountainous areas, and 7,800 km^{2} are made up of Lake Urmia and marginal marshes. In terms of territory, the Lake Urmia watershed covers the central, western and southwestern parts of the East Azerbaijan province (a relatively large part of the province, approximately 19,000 km^{2}), about half of the West Azerbaijan province (the southern half of the province, approximately 21,500 km^{2}), a part of the northern part of Kurdistan province (about 5,000 km^{2}) and a very limited part of Zanjan province. The main source of the water supply is precipitation caused by humid air currents that enter the region from the west and the Mediterranean. The rivers of the watershed originate from high mountains that are covered with snow most days of the year; they are fed by permanent springs and flow either permanently or seasonally.

The climate of the Lake Urmia watershed is largely influenced by its altitude. This catchment has a semi-arid continental climate, and the Mediterranean rainfall regime is the dominant climatic regime of the basin. Its average annual rainfall is 398 mm. The rainiest seasons are winter and early spring, so that about 75% of the total rainfall occurs in the months of December–May. The regime of the rivers is governed by precipitation and snowmelt. The watershed temperature varies between −20 and 0 °C in winter and reaches up to 40 °C in summer. The Lake Urmia watershed is divided into eight sub-basins, of which the Zarineh Rood-Simineh Rood sub-basin is the largest. The other sub-basins are Aji Chai, Nazlochai, Mahabad Chai, Zulachai, Shabestar, Sufi Chai and Tasouj.

The important rivers of this watershed are the Talcheh River, Zarineh River, Simine River, Barandoz Chai, Rozeh Chai and Sufi Chai. The river bed is generally steep and consists of coarse-grained materials that are transported downstream by the flood stream. It should be mentioned that the Lake Urmia watershed has small but fertile plains such as the Sarab plain, Selmas plain, Sufian plain, Tabriz plain, Naqdeh plain, Miandoab plain, etc. Lake Urmia is the center of accumulation and discharge of surface water in this watershed, which was investigated at the local and national levels due to its importance.

It should be noted that the Lake Urmia watershed is one of the most important catchments of Iran in terms of water, energy and agricultural products. Owing to the increasing tension between Iran and the United States of America (USA) over various challenges, including Iran's nuclear issues, some countries (especially the USA) have imposed severe economic sanctions against Iran (Koruzhde 2022; Koruzhde & Popova 2022). This has led a large section of the Iranian people, especially poorer groups such as villagers and farmers, to expand their cultivation and shift their cultivation pattern toward more profitable and better-selling products in order to improve their livelihoods, which are highly dependent on water. This in turn has resulted in the indiscriminate exploitation of groundwater and surface water, which is one of the most important factors intensifying the drying of Lake Urmia. The watershed collects runoff from vast areas of different provinces, and after supplying water to the plains, this runoff joins Lake Urmia. The insufficient number of sediment measurement stations, the limited number of statistical years and the low number of samples taken during river flooding show the importance of the present study in modeling suspended sediment estimation in the Lake Urmia watershed.

[…] (*C* (Equation (1))) and form factor (*FF* (Equation (2))) to imply the physiographic characteristics of the watershed in the estimation of the suspended sediment load (Thakkar & Dhiman 2007).

| Sub-basin | A^{1} (km^{2}) | P^{2} (km) | AMP^{3} | AMR^{4} (m^{3}/s) | MSSL^{5} (Ton/day) | ME^{6} | MS^{7} | T_{c}^{8} | C^{9} | FF^{10} |
|---|---|---|---|---|---|---|---|---|---|---|
| Zarineh Rood-Simineh Rood | 11,840 | 435 | 370.60 | 54.77 | 701.69 | 1,500 | 3 | 63.2 | 1.14 | 1.81 |
| Aji Chai | 9,200 | 383 | 355 | 40.52 | 396 | 1,320 | 2 | 58.1 | 1.12 | 1.52 |
| Nazlochai | 2,030 | 180 | 340 | 6.52 | 300 | 3,000 | 2 | 14.6 | 1.13 | 0.66 |
| Mahabad Chai | 811 | 113 | 318 | 2.50 | 129.26 | 1,779 | 2.5 | 25.5 | 1.11 | 0.89 |
| Zulachai | 960 | 123 | 335 | 2.43 | 312 | 1,400 | 2.1 | 29.4 | 1.10 | 0.91 |
| Shabestar | 1,293 | 143 | 285.60 | 2.50 | 276 | 1,600 | 2.3 | 13.1 | 1.05 | 1.43 |
| Sofi Chai | 1,800 | 170 | 340.02 | 3.68 | 298 | 2,450 | 3.5 | 10.7 | 1.12 | 0.48 |
| Tasuj | 30 | 21 | 260 | 2.21 | 240 | 2,200 | 2.9 | 2.5 | 1.07 | 0.59 |

*Note:* The parameters are abbreviated in the table for compactness: (1) A: area; (2) P: perimeter; (3) AMP: annual mean precipitation; (4) AMR: annual mean runoff; (5) MSSL: mean suspended sediment load; (6) ME: mean elevation; (7) MS: mean slope; (8) T_{c}: time of concentration; (9) C: compactness; (10) FF: form factor.

In the above equations, *C* and *FF* are the compactness and form factor of the watershed (dimensionless), *A* represents the area of the watershed (km^{2}), *P* shows the perimeter of the catchment (km) and *L* is the length of the watershed (km). The average slope of the waterways and catchments was extracted from a DEM map in ArcMap (the main component of Esri's ArcGIS suite of geospatial processing programs). The longitudinal profile was then drawn in Excel, and the weighted slope of the main waterway was calculated. Among the climatic parameters, the average annual precipitation (rainfall) and the average rainfall of the rainy and flood months of the year (December, January, February, March, April and May) were considered. First, the average monthly rainfall data and the elevation of each station were collected, and the spatial distribution of the rainfall curves was extracted using kriging as the most appropriate geostatistical method (Table 1). Then, the average annual and monthly rainfall over 20 years was extracted for each of the sub-basins.
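Equations (1) and (2) are not reproduced in this copy of the text. As a minimal sketch, assuming the widely used Gravelius compactness coefficient, C = P / (2√(πA)) ≈ 0.28 P/√A, and Horton's form factor, FF = A/L², the two indices could be computed as shown here. The function names are illustrative, and the exact formulations used by the authors (in particular for *FF*) may differ.

```python
import math


def compactness(area_km2: float, perimeter_km: float) -> float:
    """Gravelius compactness (assumed form of Equation (1)).

    Ratio of the watershed perimeter to the circumference of a circle
    with the same area; equals 1 for a perfectly circular watershed.
    """
    if area_km2 <= 0 or perimeter_km <= 0:
        raise ValueError("area and perimeter must be positive")
    return perimeter_km / (2.0 * math.sqrt(math.pi * area_km2))


def form_factor(area_km2: float, length_km: float) -> float:
    """Horton form factor A / L^2 (assumed form of Equation (2))."""
    if area_km2 <= 0 or length_km <= 0:
        raise ValueError("area and length must be positive")
    return area_km2 / length_km ** 2
```

For the Zarineh Rood-Simineh Rood sub-basin in Table 1 (A = 11,840 km^{2}, P = 435 km), `compactness(11840, 435)` gives roughly 1.13, near the tabulated value of 1.14. The watershed length *L* is not listed in Table 1, so the form factor cannot be checked in the same way.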

### Proposed methodology

The proposed methodology consists of five steps. First, the physiographic characteristics such as area, perimeter and slope are collected (Step 1). Second, the most important variables are selected by the feature selection property of the M5 model tree algorithm (Step 2). Third, the WT decomposes the main signals into several sub-signals, each of which depicts a specific feature (Step 3). Several mother wavelet functions can be used for this decomposition. Based on previous studies, the db4 mother wavelet is considered more suitable than other wavelet functions for simulating the annual discharge and the suspended sediment load (SSL) (Nourani *et al.* 2019c). Fourth, the selected variables are classified into homogeneous classes to optimize the structure of the model (Step 4). Finally, the M5 model tree fits a linear regression between the independent and dependent variables (Step 5).

### Multilinear regression model construction

In this study, the physiographic characteristics of eight sub-basins were used to estimate the suspended sediment load. The most important variables affecting the amount of suspended sediment load were identified by the M5 model tree. Unlike other black-box algorithms, the M5 model tree can identify and select the most important variables among a set of variables. Then, the Lake Urmia watershed was divided into homogeneous areas by model tree classification. Finally, following the surveyed studies, the suspended sediment load was modeled using multilinear regression for each homogeneous area. The multilinear regression model has been widely used because of its simplicity in the implementation and interpretation of hydrological processes, especially in estimating the suspended sediment load from the physiographic characteristics of the watershed (Gellis 2013; Ziegler *et al.* 2014; Nourani *et al.* 2019c). The validation was carried out with the statistics of the two remaining sub-basins, and the efficiency of the model was evaluated.
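A multilinear regression of this kind is usually fitted by ordinary least squares. The following is a minimal pure-Python sketch of that fitting step, not the authors' implementation (the study used WEKA); the function names `fit_mlr` and `predict` are hypothetical, and any data passed to them in an example should be understood as synthetic rather than taken from the paper.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares fit; returns [intercept, b1, ..., bk]."""
    if len(X) != len(y) or not X:
        raise ValueError("X and y must be non-empty and of equal length")
    rows = [[1.0, *map(float, r)] for r in X]  # prepend intercept column
    m = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(m)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda idx: abs(aug[idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear predictors or too few samples")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] for i in range(m)]


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted linear model for one sample."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))
```

Solving the normal equations directly keeps the sketch dependency-free; for poorly conditioned predictors, a QR- or SVD-based solver from a numerical library would be the more robust choice.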

To improve the accuracy of the multilinear regression model, the WT was employed to eliminate the trend in the raw time series (rainfall and runoff). Then, the M5 model tree classified the dataset samples, and finally a suitable regression model was presented for each class. WEKA software, which implements a range of learning algorithms that can be applied directly to a dataset, was used to check the relations and build the tree model. The tools used in the current study, namely the WT as a preprocessing tool and the M5 model tree, are briefly explained in the following sections.

### Wavelet transform

WT is one of the most efficient and effective mathematical transforms in signal processing. Mathematical transformations are used to obtain additional information from a signal that cannot be obtained from the raw signal itself. Similar to Fourier analysis, which is one of the most famous mathematical transformations, wavelet analysis deals with the expansion of functions, but this expansion is done in terms of wavelets. A wavelet is a specific function with zero mean that, unlike trigonometric polynomials, is localized in space; this provides a closer relation between some functions and their coefficients and more numerical stability in calculations. Any application based on the fast Fourier transform can be formulated using wavelets to obtain more local spatial (or temporal) information (Nourani *et al.* 2019a, 2019b; Lakshmi *et al.* 2022; Anupong *et al.* 2023).

*ψ(x)* is a WT function if and only if its Fourier transform *ψ(ω)* satisfies the following condition (Nourani *et al.* 2019a, 2019b; Lakshmi *et al.* 2022):

[…] *ψ(x)*. The above relationship can be considered equivalent to Equation (4) (Nourani *et al.* 2019a, 2019b; Lakshmi *et al.* 2022; Anupong *et al.* 2023):

*ψ(x)* is considered as the mother WT function, whose location and size are changed along the analyzed signal by the two mathematical operations of translation and scaling. Finally, the WT coefficients at any point of the signal (*b*) and for each value of the scale (*a*) can be calculated as (Nourani *et al.* 2019a, 2019b; Lakshmi *et al.* 2022; Anupong *et al.* 2023):
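The displayed Equations (3)–(5) are missing from this copy. Assuming the authors follow the standard continuous wavelet formulation, which matches the descriptions in the surrounding text, they would take the following form. The symbols $C_\psi$, $f(x)$ and $W(a,b)$ are notation chosen here for illustration, and $\hat{\psi}(\omega)$ denotes the Fourier transform written as *ψ(ω)* in the text.

$$
C_\psi = \int_{-\infty}^{+\infty} \frac{\lvert \hat{\psi}(\omega) \rvert^{2}}{\lvert \omega \rvert}\, \mathrm{d}\omega < \infty \qquad (3)
$$

$$
\int_{-\infty}^{+\infty} \psi(x)\, \mathrm{d}x = 0 \qquad (4)
$$

$$
W(a,b) = \frac{1}{\sqrt{\lvert a \rvert}} \int_{-\infty}^{+\infty} f(x)\, \psi^{*}\!\left(\frac{x-b}{a}\right) \mathrm{d}x \qquad (5)
$$

Here $f(x)$ is the analyzed signal, $a$ is the scale, $b$ is the position along the signal and $\psi^{*}$ is the complex conjugate of the mother wavelet. Equation (4) is the zero-mean property, which corresponds to $\hat{\psi}(0) = 0$ in Equation (3).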

### M5 model tree

[…] *et al.* 2019b).

[…] *et al.* 2019d):

where *T* is a set of input samples in each node, *T*_{i} represents a subset of samples that have the *i*th potential test result, *Sd* indicates the standard deviation, and *i* and *N* show the data number (Figure 3).
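The splitting equation itself is missing from this copy. As a hedged sketch, assuming the standard deviation reduction (SDR) criterion of the original M5 algorithm (Quinlan 1992), SDR = Sd(T) − Σ_i (|T_i|/|T|) Sd(T_i), the choice of a split point on one attribute could look as follows. The function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import pstdev
from typing import List, Optional, Sequence, Tuple


def sdr(parent: Sequence[float], subsets: Sequence[Sequence[float]]) -> float:
    """Standard deviation reduction obtained by splitting `parent` into `subsets`."""
    n = len(parent)
    if n == 0 or sum(len(s) for s in subsets) != n:
        raise ValueError("subsets must partition the parent samples")
    # Weighted standard deviation of the child nodes (empty children are skipped).
    weighted = sum(len(s) / n * pstdev(s) for s in subsets if s)
    return pstdev(parent) - weighted


def best_split(x: Sequence[float], y: Sequence[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, SDR) of the best binary split of target y on attribute x."""
    pairs = sorted(zip(x, y))
    targets: List[float] = [t for _, t in pairs]
    best: Tuple[Optional[float], float] = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        gain = sdr(targets, [targets[:i], targets[i:]])
        if gain > best[1]:
            best = (threshold, gain)
    return best
```

Attributes whose best split yields a large SDR end up near the root of the tree, which is the sense in which the upper nodes indicate the most important features.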

### Efficiency criteria

[…] (*DC*), root mean square error (*RMSE*), mean error (*ME*) and relative error (*RE*) as (Xianzhao & Jiazhu 2008; Nourani *et al.* 2019c):
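The equations for the four criteria are missing from this copy. A minimal sketch under commonly used definitions is given here: DC as the Nash–Sutcliffe-type determination coefficient, RMSE in its usual form, ME as the mean of simulated minus observed values, and RE as the mean absolute relative error. The sign convention of ME, the form of RE and the function name `efficiency` are assumptions; the normalization of RMSE mentioned in the table notes is not specified in the text, so the sketch returns the unnormalized value.

```python
import math
from typing import Dict, Sequence


def efficiency(obs: Sequence[float], sim: Sequence[float]) -> Dict[str, float]:
    """Compute DC, RMSE, ME and RE for observed and simulated series."""
    n = len(obs)
    if n == 0 or n != len(sim):
        raise ValueError("obs and sim must be non-empty and of equal length")
    mean_obs = sum(obs) / n
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))   # squared model errors
    sst = sum((o - mean_obs) ** 2 for o in obs)         # variance of observations
    if sst == 0:
        raise ValueError("DC is undefined for a constant observed series")
    if any(o == 0 for o in obs):
        raise ValueError("RE is undefined when an observation is zero")
    return {
        "DC": 1.0 - sse / sst,
        "RMSE": math.sqrt(sse / n),
        "ME": sum(s - o for o, s in zip(obs, sim)) / n,  # assumed sign: sim - obs
        "RE": sum(abs(o - s) / abs(o) for o, s in zip(obs, sim)) / n,
    }
```

With these definitions, DC equals 1 for a perfect fit and 0 for a model that performs no better than the mean of the observations.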

## RESULTS AND DISCUSSION

At first, 10 independent variables, including physiographic, climatic and hydrological characteristics, which are considered effective in suspended sediment load production, were identified and extracted.

The effective variables were selected by the M5 model tree in order to reduce the computational effort and avoid error growth. After this selection, it can be seen that four factors, namely annual runoff (discharge), average annual rainfall, FF and the average elevation of the watershed, are able to explain most of the variance. Table 2 shows the specific values and the cumulative variance percentage of the influential factors.

| Component | Total | Variance (%) | Cumulative (%) |
|---|---|---|---|
| Annual discharge | 8.11 | 52.77 | 52.77 |
| Annual mean rainfall | 2.59 | 16.49 | 69.26 |
| Form factor | 1.80 | 12.01 | 81.27 |
| Mean elevation | 1.33 | 9.01 | 90.28 |

The average annual discharge has the highest weight on the first factor and explains more than half of the variation in the original data. The average annual rainfall has the highest weight on the second factor. The third factor is the FF and the fourth factor is the average elevation of the watershed. In total, these four factors account for 90.28% of the variance in the original data and are selected for classification.

As mentioned earlier, previous studies indicate that the db4 wavelet function has the essential features needed to decompose the runoff time series. There are several jumps in the runoff time series due to the sudden start and cessation of rainfall over the catchment. Because the shape of the db4 wavelet is similar to that of the runoff time series, it could capture the signal characteristics, especially the peak points, efficiently and led to comparatively good results. Because of the proportional relation between the amount of runoff and SSL, these signals were assumed to have the same seasonality level, and both time series were decomposed by the same wavelet function. The db4 mother wavelet also showed reliable outcomes in decomposing runoff and SSL time series in some previous studies (Nourani *et al.* 2019c).
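To illustrate what a db4 decomposition does to a series, the following sketch implements one level of the discrete wavelet transform with periodic boundary handling in pure Python. It assumes that "db4" refers to the eight-tap Daubechies filter with four vanishing moments, as in common wavelet software; the boundary treatment and the function names are choices made for this illustration, and the software used by the authors for the WT is not stated in the text.

```python
from typing import List, Sequence, Tuple

# Eight-tap Daubechies (db4) scaling filter; coefficients sum to sqrt(2).
DB4_LO = [
    0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
    -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]
# Wavelet (high-pass) filter from the quadrature mirror relation.
DB4_HI = [(-1) ** n * DB4_LO[len(DB4_LO) - 1 - n] for n in range(len(DB4_LO))]


def dwt_db4(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One-level periodized db4 transform: (approximation, detail)."""
    n = len(signal)
    if n % 2 or n < len(DB4_LO):
        raise ValueError("signal length must be even and at least 8")
    approx, detail = [], []
    for k in range(n // 2):
        # Window of 8 samples starting at 2k, wrapping around the end.
        window = [signal[(2 * k + j) % n] for j in range(len(DB4_LO))]
        approx.append(sum(h * v for h, v in zip(DB4_LO, window)))
        detail.append(sum(g * v for g, v in zip(DB4_HI, window)))
    return approx, detail


def decompose(signal: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Repeatedly split the approximation; returns (final approximation, details per level)."""
    approx = list(signal)
    details: List[List[float]] = []
    for _ in range(levels):
        approx, d = dwt_db4(approx)
        details.append(d)
    return approx, details
```

The approximation carries the slowly varying part of the series, while the detail coefficients respond to abrupt changes such as runoff peaks. A maintained wavelet library would normally be preferred for real analyses.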

In general, relatively larger sub-basins were placed in homogeneous class 1. Homogeneous class 2 included smaller sub-basins. The multilinear regression models of the suspended sediment load estimation of the homogeneous classes 1 and 2 are presented in Table 3 for the annual scale by using four factors of the annual discharge, average elevation, FF and average rainfall of the watershed.

| Eq. No. | Independent variable | Eq. | R^{2} |
|---|---|---|---|
| Homogeneous Class 1 | | | |
| (11) | Q^{1}, P^{2}, F^{3} and H^{4} | | 0.93 |
| (12) | Q, P and F | | 0.90 |
| (13) | Q, P and H | | 0.89 |
| (14) | P, F and H | | 0.88 |
| Homogeneous Class 2 | | | |
| (15) | Q and P | | 0.61 |
| (16) | Q | | 0.58 |

*Note:* (1) *Q*: annual discharge; (2) *P*: annual mean rainfall; (3) *F*: form factor; (4) *H*: mean elevation.

All the homogeneous class 1 models have a higher *R*^{2} than the homogeneous class 2 models. Because more parameters are involved in class 1, a higher *R*^{2} is expected, although the amount of error may also increase to some extent. As shown in Table 3, the annual discharge variable is the most significant in both classes, and in class 1, where all the effective factors are involved, *R*^{2} is higher and closer to 1.

The WT was applied to the input signals, and the M5 model tree was then applied to the sub-signals obtained through the WT decomposition. The performance of the wavelet-M5 model tree is presented in Table 4 for the calibration and validation steps. As can be observed in Tables 4 and 5, the use of WT significantly improves the efficiency of the multilinear regression model owing to its ability to handle the non-stationary precipitation and streamflow signals.

| Sub-basin | DC (calibration) | DC (validation) | RMSE (calibration) | RMSE (validation) | ME (calibration) | ME (validation) | RE (calibration) | RE (validation) |
|---|---|---|---|---|---|---|---|---|
| Zarineh Rood | 0.93 | 0.78 | 0.03 | 0.01 | 0.05 | 0.07 | 0.15 | 0.18 |
| Aji Chai | 0.91 | 0.79 | 0.02 | 0.01 | −0.12 | −0.13 | 0.12 | 0.14 |
| Nazlochai | 0.90 | 0.76 | 0.04 | 0.03 | 0.02 | −0.05 | 0.26 | 0.19 |
| Mahabad Chai | 0.88 | 0.71 | 0.05 | 0.04 | 0.07 | 0.11 | 0.32 | 0.29 |
| Zulachai | 0.87 | 0.70 | 0.02 | 0.03 | 0.10 | 0.09 | 0.11 | 0.19 |
| Shabestar | 0.88 | 0.69 | 0.06 | 0.07 | −0.11 | −0.15 | 0.23 | 0.31 |
| Sofi Chai | 0.85 | 0.68 | 0.04 | 0.05 | 0.04 | 0.05 | 0.16 | 0.24 |
| Tasuj | 0.84 | 0.61 | 0.06 | 0.07 | −0.19 | 0.17 | 0.29 | 0.30 |

*Note:* RMSE is normalized.

| Sub-basin | DC (calibration) | DC (validation) | RMSE (calibration) | RMSE (validation) | ME (calibration) | ME (validation) | RE (calibration) | RE (validation) |
|---|---|---|---|---|---|---|---|---|
| Zarineh Rood | 0.72 | 0.63 | 0.05 | 0.03 | 0.07 | 0.17 | 0.25 | 0.28 |
| Aji Chai | 0.60 | 0.49 | 0.04 | 0.04 | −0.15 | −0.16 | 0.32 | 0.24 |
| Nazlochai | 0.59 | 0.42 | 0.06 | 0.06 | 0.09 | −0.09 | 0.16 | 0.19 |
| Mahabad Chai | 0.67 | 0.49 | 0.07 | 0.05 | 0.17 | 0.11 | 0.22 | 0.19 |
| Zulachai | 0.76 | 0.61 | 0.04 | 0.06 | 0.10 | 0.13 | 0.31 | 0.29 |
| Shabestar | 0.67 | 0.57 | 0.08 | 0.09 | −0.12 | −0.16 | 0.33 | 0.21 |
| Sofi Chai | 0.64 | 0.55 | 0.06 | 0.07 | 0.14 | 0.06 | 0.46 | 0.34 |
| Tasuj | 0.73 | 0.60 | 0.09 | 0.09 | −0.12 | 0.13 | 0.19 | 0.35 |

*Note:* RMSE is normalized.

Another notable point is the closeness of the DC values in the calibration and validation phases. This suggests that the wavelet-M5 model is not strongly dependent on the amount of data and is suitable for processes for which long historical records are not available (Table 4).

Figure 4 shows that the wavelet-M5 model tree overcomes the non-stationary features of the suspended sediment load signals because of the benefits of the WT as a preprocessing tool. Owing to its structure, the WT can also handle the signal features, especially the peak values, and achieve comparatively high efficiency.

## CONCLUSION

Regional analysis of the rivers’ suspended sediment load and its relation to the characteristics of the watersheds is significant in estimating the amount of erosion and sedimentation, especially in arid and semi-arid regions. The suspended sediment load can be estimated by exploring and modeling its relationship with the physiographic and environmental characteristics of the watershed. The purpose of the current study was to model the relationship between the environmental characteristics of the Lake Urmia watershed and the amount of the suspended sediment load using a multilinear regression model. Based on the feature selection of the M5 model tree, the results indicated that the annual discharge, average elevation, FF and average rainfall of the watershed were the four most important factors in estimating the amount of the suspended sediment load (see Table 2). The results also showed that the multilinear regression model obtained from all four factors has the highest *R*^{2} (see Tables 3 and 4). Furthermore, using the WT as a preprocessing tool resulted in acceptable efficiency criteria (see Table 4). It can be concluded that the combined use of feature selection and the multilinear regression model performs acceptably in estimating the suspended sediment load. It is recommended that more characteristics of the watershed be considered in future studies and that the performance of the model be compared with other black-box and physically based models.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.