## Abstract

Measured records of suspended sediment concentrations are vital information for supporting water management activities. However, these measured time series are often incomplete and, as such, are not suitable for some analyses. This paper sets out options for modelling suspended sediment concentrations in order to determine them in periods when measurements were not performed. The Danube River profile in Bratislava was selected as the case study. Regression using the least absolute shrinkage and selection operator, support vector regression and a deep learning neural network are compared in the paper to solve this task using various data sources. The results obtained show a significant increase in the precision of modelling suspended sediment concentrations over the standard method, the rating curve. Several variables were used to establish the suspended sediment concentration, because the same data as in this study may not be available everywhere. In particular, the use of climatic (precipitation and temperature) and hydrological (flow) inputs is assessed in order to promote the more general applicability of the work. The authors propose an original method for the modification of input climate data, which significantly increases the accuracy of modelling. They demonstrate that, when the proposed methodology is used, climate data, which are usually more readily available than hydrological data, yield a degree of precision comparable to standard modelling based on river flow data.

## HIGHLIGHTS

The aim is imputation of missing data.

Original method based solely on meteorological data is proposed.

Gridded meteorological databases, which have not yet been tested for this purpose, are used.

Machine learning is enhanced by so-called feature engineering.

Modelling suspended sediment concentrations solely from meteorological data achieves accuracy comparable to that obtained when river flows are used.

### Graphical Abstract

## INTRODUCTION

The term ‘suspended sediments’ refers to the various mineral and organic particles carried by flowing water. These particles originate from the erosive activity of water, wind or other forces in a watershed, or from erosion directly in a riverbed. Particles carried by water are categorized either as bedload sediments, coarse-grained particles that are rolled and moved along the bottom of a river channel, or suspended sediments, fine-grained particles that are transported in suspension in the flowing water (Roushangar & Shahnazi 2020). The quantification of suspended sediments supports an evaluation of the scope of erosive processes in watersheds and a prediction of their impact on waterworks (Xu *et al.* 2019) and reservoirs, or on shipping in a watercourse. The amount of suspended sediments in rivers must, therefore, be quantified, and information on how the transport of sediments develops over time is needed. Suspended sediments form the majority of the solid substances carried by flowing water (Walling & Fang 2003) and usually constitute about 85–95% of their overall amount (Babiński 2005). This paper therefore deals with a quantitative analysis of suspended sediments.

The basis for quantifying the amount of suspended sediments is the utilization of measurements. However, a thorough sampling of suspended sediments is a demanding activity for various reasons (Muhammad *et al.* 2019). Measurements were not always performed daily or even hourly, especially in the past, due to the lack of fully developed measurement technology. Such data have irregular time steps between measurements and consequently are not suitable for some analyses (Walling & Fang 2003; Melesse *et al.* 2011). In this article, the authors propose a possible solution to this situation.

Various types of models can be used for modelling suspended sediments with the aim of interpolating incomplete data series. The ‘rating curve’ method is used in engineering practice. However, its precision is limited due to the low amount of input information (Özger & Kabataş 2015). More sophisticated models with distributed parameters have been used in various works for the simulation of erosion and the subsequent movement of sediments in a stream network as well as their accumulation and consequent impact on stream morphology. The scope of the data in the form of Geographic Information Systems layers and the complexity of the driving processes and the phenomenon of sediment transport itself lead to the introduction of many simplifying assumptions. These, in some cases, bring significant uncertainty into the results (Her & Heatwole 2016).

An alternative to the above-mentioned modelling is provided by machine learning, which is the primary tool applied in this paper. This methodology does not require detailed (and often unavailable) information on the mechanism of accompanying physical processes or the physical characteristics of the system. The application of machine learning requires only a time series of the inputs that cause the formation of suspended sediments (i.e., the driving forces).

Kisi & Shiri (2012) used artificial neural networks (ANNs) for the estimation of daily suspended sediment concentrations in the Eel River in California. They achieved very satisfactory results, with a coefficient of determination (*R*^{2}) from 0.82 to 0.95. When predicting the daily suspended sediment flux in the Mississippi, Missouri and Rio Grande rivers in the USA, Melesse *et al.* (2011) found that their ANN results (0.65 ≤ *R*^{2} ≤ 0.96) were in most cases better than those obtained from other models, e.g., multiple linear/non-linear regressions or the autoregressive moving average. In the Longchuanjiang River in the upper reaches of the Yangtze River in China, the monthly sediment flux was satisfactorily modelled using an ANN, and in the validation phase, the coefficient of determination was reported to be between 0.66 and 0.89 (Zhu *et al.* 2007). Other papers can also be cited; for an overview of this theme, review articles are available (e.g., Afan *et al.* 2016).

In this paper, the authors have sought options for modelling suspended sediments with the aim of interpolating incomplete data series using machine learning models. To the knowledge of the authors, this specific type of modelling has not yet been presented in the professional literature on suspended sediment concentration. The above-mentioned papers propose innovations in machine learning methods that may provide more precise results (e.g., Kisi & Yaseen 2019; Tabatabaei *et al.* 2019; Asheghi *et al.* 2020). However, the experience of the authors of this paper, and of the machine learning community in general, shows that to improve modelling results, each step in the data mining process must be carefully considered, one of the most important being the preparation of the input data (Afan *et al.* 2017; Unnikrishnan & Jothiprakash 2018). To further analyse the benefits of this aspect of modelling, the authors, unlike previous works, focused more on the preparation and transformation of the data, i.e., on so-called feature engineering (Kuhn & Johnson 2019), which can significantly improve the accuracy of modelling, than on innovations in machine learning algorithms.

Moreover, the authors evaluate the effect of using various input data on the precision of the modelling. The variations of the input data are aimed at a generalization of the knowledge acquired, as not all kinds of data are available everywhere. The authors show that the above-mentioned preparation of the data is especially important when only meteorological time series are available. They propose an efficient method for the utilization and transformation of climate data that can improve the performance of machine learning models for suspended sediment concentrations. To the best of the authors' knowledge, it has not yet been demonstrated that using climate data alone as inputs for modelling suspended sediments can lead to results comparable to those obtained when flow data are used. This has practical consequences, as meteorological data are more readily available than hydrological data, which are only available for a limited number of river profiles (Dumedah & Coulibaly 2014). In addition, precipitation and temperature data can be obtained from climate models for future periods and then used to assess a river's sediment transport in the future (Shivam *et al.* 2019).

In the following section, the case study and the acquisition of the data are described. In line with the intention of the study, a dedicated part of the ‘Methodology’ section is devoted to the preparation of the data for modelling; the methods applied in this study are also briefly explained there. In the section ‘Results and Discussion’, the settings of the experimental computations are described, and the results are evaluated and discussed. Finally, the ‘Conclusion’ section summarizes the main findings of this study.

## CASE STUDY AND DATA USED

The upper central part of the Danube River (Europe), i.e., the area around the confluence of the Morava and Danube Rivers, was selected for testing the modelling of suspended sediment concentrations (Figure 1). The measurements of the suspended sediments have been performed in Bratislava (from the Lafranconi Bridge) for several years (Holubova & Szolgay 1999). These measurements were performed irregularly, which limits their use for some analyses. The aim of the case study is their interpolation.

Several variables were used to determine the suspended sediment concentrations: other measurements of suspended sediment concentrations from nearby measurement points on the Danube and Morava Rivers, streamflows from several gauging stations on these rivers, and the precipitation and air temperature in the Danube watershed. Within this paper, a comparison of the precision of the calculation of the suspended sediments using various subsets of these data is given. The aim was to acquire knowledge that can be generalized, because some of the data used in this study may not be available elsewhere.

The daily values of suspended sediment concentrations were mostly acquired from measurements performed by Slovak institutions: the Water Research Institute (WRI; Holubova & Szolgay 1999) and the Slovak Hydrometeorological Institute (SHI; Borodajkevycova 2015). Some data were acquired from ‘eHYD’, a website of the Austrian Federal Ministry of Agriculture, Regions and Tourism, at https://ehyd.gv.at/. Table 1 includes data on suspended sediment concentrations for the period 2008–2015. This range is determined by the start of measurements at the Bratislava profile (2008) and by the last date of measurements available from the above-mentioned Austrian website. For this period, about 17% of the data are available for the Bratislava station; the data from the other stations are complete. The study aims to calculate the unmeasured 83% of the data at the Bratislava station.

| Statistic | Bratislava | Hainburg |
|---|---|---|
| Number of data | 496 | 2,922 |
| Percentage of data available | 17 | 100 |
| Mean | 69.3 | 52.0 |
| Median | 28.0 | 24.6 |
| Minimum | 1.00 | 5.86 |
| Maximum | 1,842.7 | 1,817.3 |
| Data range | 1,841.7 | 1,811.5 |
| Standard deviation | 156.1 | 102.2 |
| Skewness | 7.6 | 8.4 |
| Kurtosis | 72.0 | 100.8 |
| Lower quartile (0.25) | 13.0 | 13.6 |
| Upper quartile (0.75) | 69.5 | 51.4 |


In Table 2, the flows from the different stations are summarized for the investigated time period of 2008–2015. The daily flow series were provided by the Slovak Hydrometeorological Institute or from the website ‘eHYD’.

| Statistic | Fischamend | Hainburg | Bratislava | Medvedov | Moravsky Jan | Zahorska Ves |
|---|---|---|---|---|---|---|
| Number of data | 2,922 | 2,922 | 2,922 | 2,922 | 2,922 | 2,922 |
| Mean | 1,801.5 | 1,851.9 | 1,989.9 | 1,914.4 | 99.0 | 102.4 |
| Median | 1,591.0 | 1,635.0 | 1,763.8 | 1,707.8 | 70.7 | 72.4 |
| Minimum | 745.0 | 779.0 | 808.6 | 743.5 | 10.2 | 13.4 |
| Maximum | 10,461.0 | 10,812.0 | 10,500.0 | 10,018.0 | 847.0 | 901.8 |
| Data range | 9,716.0 | 10,033.0 | 9,691.4 | 9,274.5 | 836.8 | 888.4 |
| Standard deviation | 840.0 | 861.1 | 925.7 | 879.5 | 88.4 | 88.8 |
| Skewness | 2.95 | 2.86 | 2.56 | 2.43 | 2.94 | 2.97 |
| Kurtosis | 17.40 | 16.19 | 12.29 | 11.40 | 13.04 | 14.28 |
| Lower quartile (0.25) | 1,231.0 | 1,271.0 | 1,358.0 | 1,312.2 | 43.5 | 47.0 |
| Upper quartile (0.75) | 2,127.0 | 2,182.0 | 2,359.0 | 2,260.2 | 122.6 | 127.7 |


It can be assumed that the concentrations at Bratislava correlate well with the flows at the upstream stations and at the previous time steps (Figure 2). The correlation matrix in this figure shows the correlation coefficients between pairs of variables, which are named on the diagonal of the matrix. Each correlation coefficient relates the two diagonal variables found in its row and its column. The figure serves as a quick overview of the data; therefore, mini histograms and mini scatter plots are also included, with the variables assigned in the same way as for the correlation coefficients. Based on Figure 2, the flow variables with the highest correlation with the suspended sediments in Bratislava are the flows at the upstream Fischamend and Hainburg stations and the streamflow at the Bratislava station. In addition, Figure 2 shows a relatively high correlation of the concentration in Bratislava with the next-day streamflow at Medvedov.

Further data used in this paper for the modelling of the suspended sediment concentrations include the time series of precipitation and temperature. As the part of the Danube watershed that ends at the Bratislava profile is relatively large, the data were obtained from the freely accessible European Climate Assessment & Dataset (ECA&D; Klein Tank *et al.* 2002). Similar databases are also known for other parts of the world and have recently been the subject of attention in the hydrological literature (e.g., Chowdhury 2019). The ECA&D climate data are provided in a time range from 1950 to the present on a daily and monthly basis. The data cover the area of 25°N–75°N and 40°E–75°E. It is a regular point grid with spatial resolutions of 0.25° × 0.25° and 0.5° × 0.5°, with a time series of the climatic variables available for each point. The data are available in the NetCDF-4 format. Figure 3 shows the points of the ECA&D grid with the climate time series in the part of the Danube watershed ending in Bratislava. There are a total of 258 points.

## METHODOLOGY

### Data preparation

Data preparation is an essential part of the machine learning process, which significantly affects its success and thus is considered as a part of the methodology. The following subchapters briefly describe what was performed with each type of data.

#### Flows and suspended sediment concentrations

The distributions of the streamflow and suspended sediment concentration data show a distinct skewness to the right. Such data can be a problem for some modelling methods and were therefore normalized using a logarithmic transformation and the ordered quantile method. The effect of the normalization of the suspended sediment concentrations at Bratislava is shown as an example in Figure 4.
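As an illustration, the two transformations can be sketched on synthetic right-skewed data. The rank-to-normal-quantile mapping below is one common implementation of ordered quantile normalization, not necessarily the exact variant used in the study:

```python
import numpy as np
from scipy import stats

def ordered_quantile_normalize(x):
    """Rank-based normalization: map ranks to standard-normal quantiles."""
    ranks = stats.rankdata(x)               # average ranks for ties
    u = (ranks - 0.5) / len(x)              # ranks -> (0, 1)
    return stats.norm.ppf(u)                # standard normal quantile function

# Synthetic right-skewed 'concentrations' (a hypothetical stand-in for the data)
rng = np.random.default_rng(42)
c = rng.lognormal(mean=3.3, sigma=1.2, size=500)

log_c = np.log(c)                           # logarithmic transform
oq_c = ordered_quantile_normalize(c)        # approximately N(0, 1), skewness ~ 0
```

Both transforms remove the strong right skew; the ordered quantile variant additionally forces the result to an approximately standard normal shape regardless of the original distribution.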

Based on the correlation analysis, the following flows and concentrations were included in the input data (*Q* refers to the flow; *C* the concentration; then, the abbreviation of the station name follows and a digit, which indicates the time shift): *Q*.devin-1, *Q*.devin-2, *Q*.hainburg-1, *Q*.hainburg-2, *Q*.fischamend-1, *Q*.fischamend-2, *Q*.morJan-1, *Q*.zahVes-1, *Q*.medv-1, *Q*.medv + 1, *C*.hainburg-1.

#### Meteorological data

Not all of the ECA&D points in Figure 3 with meteorological data were used as distinct variables. The reason for the reduction proposed herein is to limit the correlation of the input data. A two-step procedure for the aggregation of the ECA&D data points was used: (1) the whole watershed above Bratislava was split into four parts, whose outlet points are approximately evenly located on the Danube, and (2) within each of these parts, the ECA&D data points were additionally grouped by a cluster analysis. The division of the whole watershed into four parts is based on criteria similar to those of the subsequent clustering: an effort to maximize the geographical and hydrological similarity within the sub-catchments and the differences between the sub-catchments (Nadoushani *et al.* 2018). Another principle was to obtain parts of approximately similar area. With possible subsequent calculations in mind, the authors also sought to end the sub-catchments at places on the Danube where high-quality, long-term observations of flows and sediments are available. Based on these principles, the Regensburg sub-catchment represents the upper Danube; the Linz sub-catchment collects water and sediments from the large right-bank alpine tributaries of the Danube (Isar and Inn); the Vienna sub-catchment has the significant right-bank tributaries Traun and Enns from the eastern and lower Alps and to a lesser extent also includes flat territory; and the sub-catchment ending in Bratislava has more lowlands than the upper sub-catchments and includes the important left-bank tributary Morava, which has a significantly different character from the previous Danube tributaries. A data model with four catchment parts was thus created (Figure 5), each with a specific number of clusters: Bratislava (six clusters), Vienna (two clusters), Linz (two clusters) and Regensburg (three clusters), i.e., a total of 13 clusters.
This is also the number of variables, because each cluster is represented by its centre, i.e., by the average precipitation or temperature in the cluster. The precipitation in the cluster on the previous or following days (time-lagged data) was subsequently derived from these variables. An example of a split of a partial watershed into clusters is shown in Figures 6 and 7. In Figure 7, differences in average precipitation and temperatures can be seen. The cluster analysis was performed using the *k*-means algorithm, with the number of clusters determined by a silhouette plot (Rousseeuw 1987); an example of this plot is given in Appendix Figure A3.
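The cluster-count selection can be sketched as follows, assuming hypothetical grid-point features (coordinates plus climatological means). The study chose the number of clusters from a silhouette plot; the sketch below simply compares the mean silhouette score across candidate counts:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical grid points described by (lon, lat, mean precip, mean temp);
# three well-separated groups stand in for climatically distinct areas
rng = np.random.default_rng(0)
centres = np.array([[16.0, 48.0, 2.0, 9.0],
                    [17.0, 48.5, 3.0, 7.0],
                    [15.0, 47.5, 4.0, 5.0]])
pts = np.vstack([rng.normal(loc=c, scale=0.2, size=(40, 4)) for c in centres])

# Try several cluster counts and keep the one with the best mean silhouette
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pts)
    scores[k] = silhouette_score(pts, labels)
best_k = max(scores, key=scores.get)
```

In practice the coordinates would usually be scaled against the climatological variables before clustering; the unscaled version here keeps the sketch short.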

The above-mentioned time-lagged precipitation data provide information for up to 4 days before the prediction of the suspended sediments. However, even older data can help to describe the climatic history of the watershed and thus serve as an indicator of the saturation of the basin with water. Rain in a water-saturated catchment causes a higher outflow than in a dry catchment; therefore, the amount of sediments will also increase (and *vice versa*). Aggregated precipitation variables were introduced for this purpose, i.e., the sums of the precipitation over the previous 8, 14 and 20 days. In this way, an additional 3 × 13 = 39 constructed variables (three cumulative variables multiplied by the number of clusters) were added to the input data to describe the precipitation.
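The construction of such antecedent-precipitation features can be sketched with pandas on synthetic data (the window widths mirror the text; the series and column names are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2008-01-01", periods=60, freq="D")
precip = pd.Series(rng.gamma(shape=0.8, scale=3.0, size=60), index=dates)

# Antecedent-precipitation features: sums over the previous 8, 14 and 20 days;
# shift(1) ensures only days strictly before the target day are summed
features = pd.DataFrame({
    f"p_sum_{w}d": precip.shift(1).rolling(window=w).sum()
    for w in (8, 14, 20)
})
```

The same pattern would be applied per cluster centre, yielding the 3 × 13 aggregated variables described above.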

Similarly, as with the precipitation, the temperature time series were obtained from the ECA&D dataset and are available for the same points in the watershed (Figure 3). The same strategy as for the precipitation was chosen regarding the number of variables describing the temperature, i.e., splitting the whole watershed into four parts and subsequently into clusters. The temperature in the regression calculations is thus represented by 13 average time series in the clusters.

To account for snow accumulation and melt, the snowmelt model of Walter *et al.* (2005) was used, which is based on the following energy balance:

*λ*ΔSWE = *S* + *L*_{a} − *L*_{t} + *H* + *E* + *G* + *P* − SWE(*C*Δ*T*_{s})

where *λ* is the latent heat of fusion, ΔSWE is the change in the snowpack's water equivalent, *S* is the net incident solar radiation, *L*_{a} is the atmospheric long-wave radiation, *L*_{t} is the terrestrial long-wave radiation, *H* is the sensible heat exchange, *E* is the energy flux associated with the latent heats of vaporization and condensation at the surface, *G* is the ground heat conduction to the bottom of the snowpack, *P* is the heat added by rainfall, and SWE(*C*Δ*T*_{s}) is the change in the snowpack's heat storage. In Walter's paper, methods are proposed to estimate these terms using only the day of the year, the daily maximum and minimum temperature, and the geographic latitude.

In their paper, Walter *et al.* showed that although the method is simple, it provides good results in the context of hydrological modelling. The snowmelt values were derived from the precipitation, the minimum and maximum temperatures, constants based on the geographic location and a digital elevation model, using the snowmelt model from the R package *EcoHydRology* (Fuka *et al.* 2020), which is based on the approach from Walter's paper cited above. The amount of melting snow obtained was added to the liquid precipitation. This variable was calculated from the precipitation and temperature at the ECA&D points. Again, the data from all the points were not used as separate variables; the whole watershed area was (1) divided into sub-catchments and subsequently (2) into clusters, as in the case of the precipitation and temperature. The resulting clusters were slightly different from the clusters based on precipitation. Time series from the previous days and aggregated variables over the previous 8, 14 and 20 days were derived from the base data on snowmelt, as in the case of the precipitation.

#### Interaction of variables

The precision of the modelling was improved by embedding the variables' interactions into the input data. An interaction means the conjoint effect of two or more variables, e.g., rain at various locations in the watershed simultaneously, or subsequently at the same place. Interactions were represented by new variables that are products of the interacting variables. Considering all possible products of two variables would create a large number of new combined variables; consequently, such a variable was retained only if its interaction was significant enough, which was determined directly by the machine learning algorithms.

#### Splitting the data into training and testing subsets

A total of 265 explanatory variables were available for the calculations. This number does not include variables reflecting the interaction. In the case of the precipitation data, this would be another 5,671 variables. This high number shows why the authors sought to reduce the number of variables using the cluster analysis described above.

To keep the distinct calculations mutually comparable, data from the same days were used for training all the models. The testing was run on data which were not used for the creation of the models, and also data from the same days were used for all the testing procedures.

A total of 496 days with complete data (i.e., with the measured suspended sediment concentrations in Bratislava) were therefore randomly divided into training and testing days. The training dataset included 345 days and the testing dataset 149 days. Figure 8 compares these datasets, and the statistical features of the testing and training vectors of the concentrations are presented in Table 3. These two sets of data should be relatively similar, which can be confirmed based on this figure and the table.
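The random split can be reproduced in outline as follows (the indices and the random seed are illustrative; the sizes mirror the text):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Indices of the 496 days with measured concentrations (hypothetical stand-in)
days = np.arange(496)
train_days, test_days = train_test_split(
    days, train_size=345, test_size=149, random_state=0, shuffle=True
)
```

The two index sets are disjoint, so the testing days never enter model training.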

| Statistic | All data | Training | Testing |
|---|---|---|---|
| Minimal value | 1.0 | 1.0 | 1.0 |
| Lower quartile (0.25) | 13.0 | 13.0 | 14.0 |
| Median | 28.0 | 28.0 | 31.0 |
| Upper quartile (0.75) | 69.5 | 67.0 | 69.0 |
| Maximal value | 1,842.7 | 1,763.7 | 1,842.7 |
| Interquartile range | 56.3 | 54.0 | 55.0 |
| Number of data | 496 | 345 | 149 |


### Methods used in suspended sediment modelling

Several methods were compared in the presented work to solve the interpolation of the suspended sediments. A standard method for the calculation of suspended sediment concentrations, which was also used in this paper, is the rating curve. The rating curve computations serve herein as a baseline for evaluating the improvement achieved by the newly proposed methods. In the rating curve method, the suspended sediment concentration *C* is calculated solely from the streamflow *Q* using the function *C* = *aQ*^{*b*}. The coefficients *a* and *b* are determined by non-linear optimization.
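Fitting the rating curve can be sketched on synthetic data. The text describes a direct non-linear optimization; the version below uses the common log-linearization shortcut, which turns the same power-law fit into a linear least-squares problem:

```python
import numpy as np

rng = np.random.default_rng(3)
q = rng.uniform(800.0, 10000.0, size=200)        # synthetic daily flows
# Hypothetical 'true' power-law relation with multiplicative noise
c_obs = 1e-4 * q ** 1.6 * rng.lognormal(sigma=0.2, size=200)

# Fit C = a * Q^b: taking logs gives log C = log a + b log Q
b_hat, log_a = np.polyfit(np.log(q), np.log(c_obs), 1)
a_hat = np.exp(log_a)
```

A direct non-linear fit (e.g., with `scipy.optimize.curve_fit`) weights the large concentrations more heavily; the log-space fit shown here is often more robust when the noise is multiplicative.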

Some variants of multiple linear regression (MLR) were applied too. The independent variables of MLR should not be too strongly correlated, which can occur here considering the nature and amount of the data used. To remove such multicollinearity, the least absolute shrinkage and selection operator (LASSO) was used for the regularization of MLR. It was introduced by Tibshirani (1996). In LASSO regularization, some coefficients are zeroed, so a simpler model is created. The regularization usually provides better generalization, i.e., better results on data which were not used in the creation of the model (which is the aim of a proper regression analysis).
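The coefficient-zeroing behaviour of LASSO can be illustrated on synthetic data (the penalty strength and the data-generating process are arbitrary choices for this sketch):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 20))
# Only three of the twenty candidate predictors actually matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 1.5 * X[:, 9] + rng.normal(scale=0.5, size=200)

Xs = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(Xs, y)
nonzero = np.flatnonzero(lasso.coef_)    # most coefficients shrink to exactly 0
```

The surviving coefficients identify the informative predictors, which is exactly the behaviour that makes LASSO suitable for the highly correlated, high-dimensional inputs used here.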

The authors used two machine learning models as well – support vector machine (SVM) and deep learning neural network (DLNN).

An SVM is a supervised machine learning method that can also be used for regression tasks; in that case, the term ‘support vector regression (SVR)’ is used. Its characteristic feature is the kernel trick, i.e., a mapping of the original training data of a non-linear problem into a higher-dimensional space, where it is transformed into a linear problem. The objective function of SVR minimizes the L2-norm of the coefficient vector, not the squared error as in the case of linear regression. The error term is handled in the constraints, which require the absolute error to be less than or equal to a specified margin, called the maximum error (*ɛ*). Moreover, SVR uses the concept of slack variables: for any value that falls outside of *ɛ*, its deviation from the margin is denoted as *ξ*, and these deviations are minimized by adding them to the objective function of SVR. The mathematical formulation of this method has been described many times in various papers, including in the hydrological domain, as SVR is quite a popular method because of its good generalization ability (Samadianfard *et al.* 2019).
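A minimal SVR example showing the roles of *C* and *ɛ* (the kernel and parameter values are illustrative, not those tuned in the study):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = rng.uniform(-3.0, 3.0, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

# epsilon defines the error tube; C trades off flatness against slack penalties
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
svr.fit(X, y)
r2 = svr.score(X, y)
```

Points inside the *ɛ*-tube contribute no loss; only the slack of points outside it is penalized, scaled by *C*.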

The last model used in this work utilizes the h2o machine learning framework (Cook 2016). The h2o deep learning used herein is based on a multi-layer feedforward ANN trained by stochastic gradient descent using backpropagation. A feedforward ANN model, also known as a multi-layer perceptron (MLP), is the most common type of deep neural network (DNN). The network can contain many hidden layers consisting of neurons with *tanh*, *rectifier* and *maxout* activation functions. Advanced features such as an adaptive learning rate, rate annealing, momentum training, dropout and L1 or L2 regularization enable high predictive accuracy. Several other types of DNNs, such as convolutional and recurrent neural networks, are popular as well. MLPs were chosen for this work because they work well on the tabular data used here.
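As an illustration of a feedforward MLP for tabular regression, the sketch below uses scikit-learn's `MLPRegressor` as a stand-in for the h2o implementation described in the text (the architecture and parameters are arbitrary):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 8))
y = X[:, 0] ** 2 + np.tanh(X[:, 1]) + rng.normal(scale=0.1, size=500)

mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64),    # two hidden layers
                 activation="relu",              # rectifier units
                 alpha=1e-3,                     # L2 regularization
                 max_iter=2000, random_state=0),
)
mlp.fit(X, y)
r2 = mlp.score(X, y)
```

h2o adds features such as dropout and adaptive learning rates on top of this basic architecture, but the feedforward structure is the same.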

Each model has its own specifics and parameters that need to be set up. The tuning process requires quite detailed knowledge of the algorithms and their parameters. The authors wanted to avoid the mathematical details of the algorithms used, as these are explained in the informatics literature; only a brief description of the procedure applied to tune the models follows, since a practical user of the algorithms must deal with it.

If there are *n* tunable hyperparameters of the model, an *n*-dimensional matrix is designed with a vector of the expected values of one parameter on each dimension. The matrix as such defines all their possible combinations. The best combination of parameters can be found by trying all these combinations (this is called a ‘grid search’) with *k*-fold cross-validation. In *k*-fold cross-validation, the training data are randomly split into *k* folds (groups of data). Calculations with every single combination of parameters are performed *k* times. In every run, a different fold is kept for testing, and the model is trained on the data from the remaining *k* − 1 folds. Using some criterion for the evaluation of the precision, one can evaluate: (1) the accuracy of the calculations in the testing folds, which is the anticipated precision of the model trained with a particular combination of the parameters, and (2) a summary of the precision of the model in the *k* − 1 training folds. The best result obtained from the individual testing folds indicates the most suitable combination of parameters. Too large a difference between the training and testing results is a signal of overtraining, i.e., a lack of ability to generalize, meaning that the model will not perform satisfactorily on new data.
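The grid search with *k*-fold cross-validation described above can be sketched as follows (the model, grid values and *k* are illustrative):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 3))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# n-dimensional grid of candidate values; every combination is cross-validated
param_grid = {"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1, 0.5]}
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=cv, scoring="r2")
search.fit(X, y)
best_params = search.best_params_
best_cv_r2 = search.best_score_
```

Here 3 × 3 = 9 combinations are each evaluated 5 times, and the combination with the best mean testing-fold score is retained and refitted on the full training data.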

The evaluation of the models is based on the success of the model with the testing data, i.e., on the data which were not used in the calibration of the model. Several statistical indicators were used. Table 4 shows their abbreviations and names. The mathematical formulation of these indicators is available in the standard statistical literature (e.g., Montgomery & Runger 2018). The input values for the calculation of the indicators are the series of calculated and measured values of the suspended sediment concentration. Each indicator evaluates a different aspect of the conformity of the measured and calculated data.
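Hedged implementations of several of the indicators in Table 4 follow; sign conventions for ME and PBIAS vary in the literature, so one common choice (error = simulated − observed) is shown:

```python
import numpy as np

def mean_error(obs, sim):
    return float(np.mean(sim - obs))

def rmse(obs, sim):
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def pbias(obs, sim):
    return float(100.0 * np.sum(sim - obs) / np.sum(obs))

def nse(obs, sim):
    return float(1.0 - np.sum((sim - obs) ** 2)
                 / np.sum((obs - np.mean(obs)) ** 2))

def coef_persistence(obs, sim):
    # benchmark model: yesterday's observation as today's prediction
    return float(1.0 - np.sum((sim[1:] - obs[1:]) ** 2)
                 / np.sum((obs[1:] - obs[:-1]) ** 2))
```

A perfect model gives NSE = cp = 1 and ME = RMSE = PBIAS = 0; values of NSE or cp below 0 mean the model performs worse than the respective benchmark (the observed mean, or persistence).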

| Abbreviation | Name | Comments |
|---|---|---|
| ME | Mean error | Basic statistical indicator |
| RMSE | Root mean square error | Gives the standard deviation of the model prediction error |
| PBIAS | Percentage bias | Average deviation in percent (quantifying the average underestimation/overestimation of the model) |
| NSE | Nash–Sutcliffe efficiency | Standard in hydrology; the primary indicator in this work |
| cp | Coefficient of persistence | Compares the model's performance against a simple model using the observed value of the previous day as the prediction for the current day |
| R^{2} | Coefficient of determination | Proportion of the variance in the dependent variable that is predictable from the independent variables; ranges from 0 to 1, where a value of 1 indicates that the regression predictions perfectly fit the data |


## RESULTS AND DISCUSSION

Although several models were used, the primary aim of this paper is an evaluation of the effect of the selection of the input data on the computation of the suspended sediments. Using data selection, the authors performed a kind of ‘what if’ analysis, specifically an analysis of what happens if, for example, hydrological data are not available at a given location. The subchapters below are therefore organized according to the inputs used.

### Computation based on the flow at the location of the determination of suspended sediments (rating curve)

The rating curve is a standard method for the modelling of suspended sediment concentrations. Its evaluation herein is used for comparison. The rating curve is graphically assessed on the plot on the left side of Figure 9. Further statistical indicators evaluating this model (on the testing data) are given in Table 5. The success rate of the other models can be assessed by comparing them with these values.
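The sediment rating curve is conventionally the power law C = aQ^b, fitted by least squares in log–log space. The paper does not spell out its fitting procedure, so the sketch below shows only this common approach, on synthetic data.

```python
import math

def fit_rating_curve(Q, C):
    """Fit C = a * Q**b by least squares on log-transformed data.

    Q: streamflow series, C: suspended sediment concentration series.
    """
    x = [math.log(q) for q in Q]
    y = [math.log(c) for c in C]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # slope of the log-log regression is the exponent b
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = math.exp(my - b * mx)       # intercept back-transformed to a
    return a, b

# Synthetic example: data generated exactly from C = 0.5 * Q**1.8
Q = [500, 1000, 2000, 4000, 8000]
C = [0.5 * q ** 1.8 for q in Q]
a, b = fit_rating_curve(Q, C)
# a ≈ 0.5, b ≈ 1.8
```

Note that fitting in log space introduces a retransformation bias on real, noisy data; correction factors for this are discussed in the rating curve literature.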

| Model | ME | RMSE | PBIAS % | NSE | cp | R^{2} |
|---|---|---|---|---|---|---|
| Rating curve (Bratislava flow) | −16.2 | 140.8 | −19.3 | 0.545 | 0.611 | 0.668 |
| Rating curve (Fischamend flow) | −17.1 | 106.4 | −20.4 | 0.740 | 0.778 | 0.814 |


The correlation matrix in Figure 2 shows that the suspended sediment concentration has its highest correlation not with the streamflow in Bratislava (correlation = 0.708) but with the flow at the Austrian Fischamend station (correlation = 0.786), which is 38.5 km upstream. After the calibration of the rating curve with these data, a more precise model was obtained (Table 5). A similar situation may also occur elsewhere, and it is therefore recommended to check the use of flows from close upstream stations for the computation of the suspended sediment concentration.
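Such a screening of candidate stations amounts to computing the Pearson correlation of each flow series with the measured concentrations and picking the highest; a minimal sketch:

```python
def pearson(x, y):
    """Pearson correlation, for screening candidate upstream flow series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# station whose flow series correlates best with the concentrations
def best_station(flow_series, conc):
    return max(flow_series, key=lambda name: pearson(flow_series[name], conc))
```

The station names and dictionary layout here are illustrative; in practice the series must first be aligned on common dates without gaps.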

### Calculation of concentrations using more flows

In this part, several streamflow variables are considered as inputs for the determination of the suspended sediment concentration. The input flow data were taken from the Hainburg and Fischamend gauging stations, which lie above the investigated river profile, and Bratislava's own time series of flows was also used. Further data were taken from the Medvedov gauging station below the investigated river profile and from the Moravsky Jan and Zahorska Ves gauging stations on the Morava, a Danube tributary. Flows from the given day, the previous day and, in the case of the downstream stations (Medvedov, Komarno), also from the next day were used in the computation.

#### Linear methods

The LASSO model proved quite sensitive to the requirement that its dependent variable have an approximately normal distribution; the dependent variable was therefore normalized (using a logarithm). When computed without any normalization, the result was better than the rating curve using the flow in Bratislava, but worse than the rating curve using the flow at the Fischamend station. The reason is that the rating curve is a non-linear model, whereas the LASSO model is, in fact, a form of MLR, i.e., a linear model. Table 6 shows computations with the interaction variables included (which did not significantly influence the results) and with the logarithm of the dependent variable, which improved the results significantly, as shown by the higher Nash–Sutcliffe efficiency (NSE) value. The functions from the R package *glmnet* were used for the computation of the LASSO model (Friedman *et al.* 2010).

| Model | ME | RMSE | PBIAS % | NSE | cp | R^{2} |
|---|---|---|---|---|---|---|
| LASSO* | −10.7 | 116.1 | −12.8 | 0.691 | 0.735 | 0.797 |
| LASSO with interactions** | −14.9 | 110.8 | −17.8 | 0.719 | 0.759 | 0.804 |
| LASSO with log of response variable | −52 | 108.4 | −62.3 | 0.731 | 0.770 | 0.841 |
| LASSO inter. and log of response*** | −17.8 | 89.7 | −21.2 | 0.815 | 0.842 | 0.840 |


\**λ* = 1.83, \*\**λ* = 3.15, \*\*\**λ* = 0.051.

The *lambda* regularization parameter was optimized, and its value is shown in the table footnote. An example of tuning this parameter for LASSO regression with a logarithm of the dependent variable and the interaction of the explanatory variables is given in Appendix A, Figure A1.
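To illustrate what the *lambda* parameter controls, a minimal cyclic coordinate-descent LASSO is sketched below. It minimizes the usual objective (1/2n)·‖y − Xb‖² + λ·Σ|b_j| via the soft-thresholding operator; this is a conceptual sketch of the method behind *glmnet*, not the paper's actual implementation.

```python
def soft_threshold(z, g):
    """Soft-thresholding operator, the core update of LASSO coordinate descent."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, iters=200):
    """Minimal LASSO via cyclic coordinate descent (X as list of rows).

    Larger lam shrinks coefficients toward zero and drops weak predictors.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residual that excludes feature j's own contribution
            r = [y[i] - sum(b[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            norm = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / norm
    return b
```

On an orthogonal toy design, the shrinkage is visible directly: with λ = 1 a true coefficient of 3 is recovered as 2, and an irrelevant predictor is set exactly to zero.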

#### Non-linear machine learning models

In SVR, the selection of the kernel, its parameters and the parameter *C* (cost of constraint violations) was optimized. SVR alternatives with the radial basis kernel and with the linear kernel were compared. A normalization of the computed variable and the inclusion of new variables reflecting the interactions (products of all possible pairs of variables) were also performed. Strong interactions were identified, mainly in the Linz and Regensburg sub-catchments (Figure 5), between precipitation events that follow each other in time. The results of the computations are presented in Table 7, and, as shown by the NSE (the primary indicator of precision), the manipulations of the data described above helped to increase the accuracy. The optimal parameters of the models were identified using cross-validation on the training data and are stated in the note of the table. The functions from the R package *kernlab* were used for the computation of the SVR (Karatzoglou *et al.* 2004). The tuning progress of the most successful model is given in Appendix A, Figure A2.

| Model | ME | RMSE | PBIAS % | NSE | cp | R^{2} | VE |
|---|---|---|---|---|---|---|---|
| SVR linear kernel**** | −20.1 | 130.3 | −24.0 | 0.611 | 0.667 | 0.787 | 0.516 |
| SVR radial basis kernel*** | −17.9 | 114.2 | −21.3 | 0.701 | 0.744 | 0.805 | 0.625 |
| SVR with log of response** | −15.3 | 108.1 | −18.3 | 0.732 | 0.771 | 0.738 | 0.645 |
| SVR with log of response and interactions* | −15.5 | 90.2 | −18.6 | 0.813 | 0.840 | 0.835 | 0.656 |


\*\*\*\**C* = 8; \*\*\**C* = 185, *σ* = 0.0033; \*\**C* = 0.1, linear kernel; \**C* = 0.1, linear kernel.
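The interaction variables used here and in the LASSO models are simply the products of all possible pairs of input variables. A minimal sketch of this feature-engineering step (the variable names are illustrative):

```python
from itertools import combinations

def add_interactions(rows, names):
    """Append the product of every pair of variables as new features."""
    pairs = list(combinations(range(len(names)), 2))
    new_names = names + [f"{names[i]}*{names[j]}" for i, j in pairs]
    new_rows = [row + [row[i] * row[j] for i, j in pairs] for row in rows]
    return new_rows, new_names

rows, names = add_interactions([[2.0, 3.0, 4.0]], ["qa", "qb", "qc"])
# names -> ["qa", "qb", "qc", "qa*qb", "qa*qc", "qb*qc"]
# rows  -> [[2.0, 3.0, 4.0, 6.0, 8.0, 12.0]]
```

With *p* original variables, this adds *p*(*p* − 1)/2 columns, which is one reason the paper later resorts to variable reduction (clustering, principal components).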

For the computations using the *Deep Learning Neural Network* (DLNN), the h2o platform was used (LeDell *et al.* 2020). The efficiency of deep learning is typically demonstrated on tasks with massive amounts of data, such as computer vision, speech recognition and the management of complex processes; the dataset used in this paper is smaller by comparison. However, the premise regarding the DLNN was that a larger number of layers and neurons is not the only improvement of DLNNs over the classic multi-layer perceptrons of the 1990s: it is not always necessary to apply a DLNN with many layers and neurons, yet the benefits of the developments in this branch of AI can still be employed. Two alternatives of the DLNN were proposed in this work. The first used a relatively small net (one hidden layer with 5–15 neurons), while the second used a more complex net with several layers and a larger number of neurons. In the first alternative, the model is 'regularized' only by its simplicity; in the second, advanced regularization methods such as L1 and L2 regularization and dropout (Srivastava *et al.* 2014) were applied. The results of the computations are shown in Table 8. Appendix B describes the proposed solution to one technical detail: because DLNN training uses randomly generated numbers, parallel computations with the same settings can otherwise yield different results in subsequent runs.
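Of the regularization techniques mentioned, dropout is the least familiar outside neural networks; a minimal sketch of (inverted) dropout applied to one layer's activations is shown below. This is a conceptual illustration only, not the h2o implementation.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during
    training and rescale the survivors so the expected sum is unchanged."""
    if not training or rate == 0.0:
        return list(activations)        # inference: pass through unchanged
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Randomly silencing units at every training step prevents co-adaptation of neurons, which is why dropout acts as a regularizer; the rescaling by 1/keep means no change is needed at prediction time.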

| Model | ME | RMSE | PBIAS % | NSE | cp | R^{2} | VE |
|---|---|---|---|---|---|---|---|
| SVR with log of response and interactions* | −15.5 | 90.2 | −18.6 | 0.813 | 0.840 | 0.835 | 0.656 |
| Small DLNN | −6.9 | 90.3 | −8.3 | 0.813 | 0.840 | 0.814 | 0.552 |
| DLNN – bigger, using regularization | 9.8 | 71.4 | 13.3 | 0.834 | 0.863 | 0.897 | 0.607 |


\**C* = 0.1, linear kernel.

The DLNN model is more difficult to interpret than, for example, linear regression. Nevertheless, there is intensive research in this direction; Figure 10 displays one way to interpret the relative importance of the input flows in the calculation of suspended sediment concentrations. It can be seen from the figure that the most important flows for determining the concentration of suspended sediments are those at the Fischamend, Medvedov, Hainburg and Bratislava gauging stations. With this example, the authors wish to point out that the interpretation of machine learning models is being studied intensively and that certain possibilities for this purpose already exist.

### Concentrations in the Hainburg measuring station as inputs

For the interpolation of the concentrations in the Danube profile at Bratislava, it is an advantage that this variable is also measured at the nearby upstream Hainburg gauging station (15.2 km). As could be expected, these two series of measurements are highly correlated. The regression models that use the data from Hainburg are evaluated in Table 9 and provide results of quite high precision. Only two models are given in this table, as more complicated models are not relevant here because of the small number of variables. Concentrations from the previous day were also used as an input. The models based on only these two variables achieved quite a high degree of precision, and the small number of input variables is a further advantage of these models. However, the availability of such a nearby site with concentration data is generally an exception. The probable reason that the accuracy is not even higher is that the Morava, a tributary of the Danube, flows into the Danube between Hainburg and Bratislava (Figure 1).

| Model | ME | MAE | RMSE | PBIAS % | NSE | r | R^{2} |
|---|---|---|---|---|---|---|---|
| Linear regression | −3.273 | 23.38 | 84.56 | −3.9 | 0.836 | 0.936 | 0.877 |
| SVR* | −1.128 | 21.85 | 78.13 | −1.3 | 0.860 | 0.937 | 0.878 |


\**C* = 385, *σ* = 4 × 10^{−4}, radial basis kernel.

### Computations using climate data

Flows are usually used as the standard input data for the prediction of suspended sediments. In this paper, the use of meteorological inputs (derived from precipitation and temperatures) was also investigated. However, these are less direct factors than the flow of the water that contains and carries the suspended sediments, so a lower precision of the computation is expected when solely meteorological inputs are used. The authors wanted to demonstrate that, despite this expectation, a relatively high degree of precision can be achieved using solely meteorological data when feature engineering is applied. Such a result may be useful at other sites where flow measurements are not available.

The historic meteorological data used consist of 74 daily series of combined snowmelt and liquid precipitation totals. The creation of this combined variable from temperature and precipitation data is described in the methodology part; in the warm period of the year, when there is no snow, its value equals the precipitation. The data also include 45 cumulative values of this variable for previous periods. The computation also takes the interactions of the variables into account, as concurrent rain (or melting snow) at various locations of a watershed, or with consecutive timing at one site, can evidently have a significant synergistic effect. The interaction variables were not supplied to the h2o neural net algorithms, as these have their own mechanisms for capturing interactions (see Cook 2016).
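The exact construction of the combined variable is given in the methodology part; conceptually, it is a liquid-water input series in which sub-freezing precipitation is stored as snow and later released as melt. The degree-day sketch below illustrates the idea; the threshold temperature and melt factor are illustrative assumptions, not the paper's values.

```python
def liquid_water_input(precip, temp, t_melt=0.0, ddf=3.0):
    """Daily liquid-water input from precipitation and temperature.

    precip : daily precipitation totals (mm)
    temp   : daily mean temperatures (°C)
    t_melt : melt threshold (°C), illustrative
    ddf    : degree-day melt factor (mm per °C per day), illustrative
    """
    pack, out = 0.0, []
    for p, t in zip(precip, temp):
        if t > t_melt:
            melt = min(pack, ddf * (t - t_melt))    # melt limited by the pack
            pack -= melt
            out.append(p + melt)                    # rain plus snowmelt
        else:
            pack += p                               # precipitation stored as snow
            out.append(0.0)
    return out
```

In a snow-free warm period the output equals the precipitation, matching the behaviour described above; in winter, precipitation is withheld and released during later thaws.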

Several linear statistical and machine learning models were used for the computations, i.e., linear regression with LASSO, DLNN and SVR. The results are described more succinctly in this case, as the computation procedures are similar to the previous ones. The evaluation of the results of the individual algorithms is presented in Table 10. For comparison, Table 11 evaluates calculations that use temperatures and precipitation separately. Better precision is achieved when the constructed variable combining snowmelt and precipitation is used than when it is not (see, e.g., the better NSE in all computations in Table 10 compared with Table 11).

| Model | ME | RMSE | PBIAS % | NSE | cp | R^{2} |
|---|---|---|---|---|---|---|
| LASSO with interactions | −5.9 | 103.7 | −7.1 | 0.753 | 0.789 | 0.858 |
| DLNN h2o small model | −7.2 | 127.4 | −8.6 | 0.628 | 0.682 | 0.662 |
| DLNN – larger, regularized | 1.4 | 100.9 | 1.7 | 0.766 | 0.800 | 0.800 |
| SVR with principal components | −22.6 | 89.6 | −27.0 | 0.816 | 0.843 | 0.867 |
| SVR with interactions | −16.7 | 96.9 | −19.9 | 0.785 | 0.816 | 0.831 |


| Model | ME | RMSE | PBIAS % | NSE | cp | R^{2} |
|---|---|---|---|---|---|---|
| LASSO with interactions | −9.1 | 110.5 | −10.9 | 0.720 | 0.761 | 0.831 |
| DLNN h2o small model | −7.1 | 135.4 | −12.6 | 0.608 | 0.612 | 0.620 |
| DLNN h2o larger, regularized | −2.6 | 103.5 | −3.1 | 0.754 | 0.790 | 0.770 |
| SVR with principal components | −22.6 | 92.8 | −27.0 | 0.803 | 0.831 | 0.883 |
| SVR with interactions | −16.7 | 96.9 | −19.9 | 0.785 | 0.816 | 0.831 |


The LASSO regularization of the linear regression was used due to the correlation of the climate variables in the geographically close locations of the watershed. Two alternatives of the h2o Deep Learning Neural Net were used (as previously described when using flows as inputs), i.e., a smaller net with one hidden layer and a more complex net, with several layers and a larger number of neurons and with regularization. The smaller neural network had one hidden layer with 21 neurons, and the activation function of the neuron was the rectifier. The number of neurons and the type of activation function were found by cross-validation. In the larger regularized network, there were four hidden layers with 50, 25, 15 and 10 neurons. The rectifier was again used as the activation function, and the regularization was implemented using input dropout and L1-regularization. Ten-fold cross-validation was used for training both models.

The selection of the kernel, its parameters and the *C*-value (cost of constraint violations) was performed during the SVR computations. The SVR was computed either with the radial basis kernel or with the linear kernel. Better results were obtained when the computed variable was normalized (by logarithm) and when the interactions of the variables were used. Although an initial variable reduction was performed using cluster analysis, the dataset remains relatively large, especially in terms of the number of columns (120 variables). This somewhat high number of variables (relative to the number of rows) is caused by the inclusion of data from previous days and of the cumulative variables described in the data section. Therefore, a computational experiment was performed with a further reduction of the input values using principal components, which, as can be seen from Table 10, was quite successful. Moreover, the principal components are linear combinations of the original variables and could, as a side effect, capture the interactions of the variables. The fraction of variance explained by the principal components, which is the parameter needed for this preprocessing, was optimized by trial and error; in the final principal component regression, its value was set to 0.98. This result confirms the usefulness of the variable adjustments (feature engineering) used and the thesis of this article that not only the choice of the machine learning model (or a suitable combination of models) but also the manner of data preparation is essential for accuracy. The results in Table 10 and the presented case study confirm the possibility of using exclusively meteorological data in suspended sediment modelling, which is useful for river profiles where flow measurements are not available.
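Choosing the number of principal components from a target fraction of explained variance (0.98 in the final model) reduces to a cumulative sum over the component variances, e.g.:

```python
def n_components(variances, frac=0.98):
    """Smallest number of principal components whose cumulative share of
    total variance reaches `frac` (0.98 as in the final model here)."""
    total = sum(variances)
    cum = 0.0
    for k, v in enumerate(sorted(variances, reverse=True), start=1):
        cum += v
        if cum / total >= frac:
            return k
    return len(variances)

# e.g. component variances from a hypothetical decomposition
k = n_components([6.0, 2.5, 1.0, 0.4, 0.1], frac=0.98)
# k -> 4: the first four components explain 99% of the variance
```

The component variances themselves come from the eigendecomposition of the covariance matrix of the (standardized) inputs; the eigenvalue list above is purely illustrative.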

The best results with the different input data are compared in Figure 11. Since the individual models show quite satisfactory and similar accuracy, while sufficient diversity between their results is guaranteed (because they use different input data), future work could verify the use of ensemble modelling with these models.

## CONCLUSIONS

This paper sets out the options for modelling suspended sediment concentrations with the goal of their interpolation, i.e., their determination on days when measurements were not performed. This work confirmed that the computation of suspended sediment concentrations can be fundamentally improved in comparison with the standard rating curve method. In the case study presented (the Bratislava profile of the Danube River), the best results were provided by the computations that used as inputs the suspended sediment concentrations measured at the nearby Hainburg station in Austria. However, the objective of the authors was to provide more general findings, as such proximity between suspended sediment measurement points as exists between Hainburg and Bratislava is rare. For this reason, the computations were also performed with streamflows and with solely meteorological data.

The standard model, i.e., the rating curve, achieves a precision by the NSE of 0.545. The best result using the suspended sediment concentration time series measured at Hainburg was achieved by SVR, with NSE = 0.860, and the best result using the time series of flows was NSE = 0.879 (with the DLNN). Both of these results are significantly better than the standard method.

The best model with solely meteorological data as inputs, achieved by SVR, had NSE = 0.816. In this SVR, the input data were transformed by principal component analysis. Before the computation itself, the temperatures and precipitation were transformed into an aggregated variable combining melting snow and rain; this transformation reduces the number of variables and makes the data more tractable for machine learning. The proposed feature engineering method based on clustering was applied in this computation, and the interactions of the variables were included. The authors consider this result quite promising, because meteorological data are usually more readily available than the other data that can be used for suspended sediment modelling.

## ACKNOWLEDGEMENT

This work was supported by the Slovak Research and Development Agency under Contract No. APVV-15-0489 and by the Scientific Grant Agency of the Ministry of Education of the Slovak Republic and the Slovak Academy of Sciences, Grant No. 1/0662/19.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## REFERENCES

H_{2}O: Powerful, Scalable Techniques for Deep Learning and AI