This study applies the GRAINet CNN approach to UAV optical aerial imagery to analyze and predict grain size characteristics, specifically the mean diameter (dm), along a gravel river point bar in Šumava National Park (Šumava NP), Czechia. Using a digital line sampling technique with manual annotations as ground truth, GRAINet offers an innovative solution for particle size analysis. Eight UAV overflights were conducted between 2014 and 2022 to monitor changes in grain size dm across the river point bar. The resulting dm prediction maps were reasonably accurate, with Mean Absolute Error (MAE) values ranging from 1.9 to 4.4 cm in tenfold cross-validations. Mean Squared Error (MSE) and Root Mean Square Error (RMSE) values varied from 7.13 to 27.24 cm² and from 2.49 to 4.07 cm, respectively. Most models underestimated grain size, with around 68.5% of observations falling within 1σ and 90.75% within 2σ of the predicted GRAINet mean dm. However, deviations from actual grain sizes were observed, particularly for grains smaller than 5 cm. The study highlights the importance of a large manually labeled training dataset for the GRAINet approach, which eliminates the need for user-parameter tuning and improves its suitability for large-scale applications.

  • Assessing the effectiveness of a cutting-edge deep learning algorithm in predicting mean diameter (dm) from an individual UAV-based orthophoto.

  • Creating robust maps to predict spatial and temporal variations in mean dm across one entire point bar.

  • The method enhances efficient decision-making capabilities by reducing reliance on laborious and resource-intensive field probing techniques.

  • The method demonstrates robustness against light condition impacts.

Detecting changes in fluvial environments is challenging, mainly due to limited spatial and temporal observation of crucial fluvial characteristics, such as grain size data from gravel bars and river channels. Analyzing grain size data provides valuable insights into the interactions between water flow, sediment transport, and the dynamic evolution of rivers. Therefore, it is crucial to map the sediment composition of river-bed surfaces, focusing on gravel- or cobble-bed streams, and conduct comprehensive analyses of river dynamics. This understanding is essential for studying how rivers respond to environmental factors and human activities, especially in the Anthropocene era (Crutzen 2016).

Traditional field-based techniques, such as mechanical sieving, grid-by-number, pebble count method, or line sampling, have historically dominated river surveys (Harvey et al. 2022). However, these techniques have limitations that restrict their application at the network scale. They often struggle to ensure accurate and repeatable monitoring assessments (Purinton & Bookhagen 2019) and may pose risks to human safety. Consequently, fluvial scientists and geomorphologists seek more efficient and reproducible methods for grain size measurements.

In recent years, the use of unmanned aerial vehicles (UAVs) in photogrammetry has revolutionized environmental monitoring techniques and gained popularity. UAVs offer numerous advantages, including affordability, user-friendliness, consistency, and reliability. These benefits bridge the gap between traditional field observations and remote sensing from aircraft or satellite platforms (Rakha & Gorodetsky 2018). Researchers have explored various approaches that leverage UAV imagery and image-processing techniques to advance granulometric analysis from remotely sensed observations. These efforts encompass a range of methodologies, including earlier techniques such as photo sieving and other image-based methods, which provide a quick, scalable, non-contact, and cost-effective means of estimating the particle size distribution (PSD) or specific grain size characteristics (e.g., D50 or the mean particle diameter, dm).

However, these methods introduce errors arising from the nature of the observations used. Remotely sensed imagery typically provides direct information only about the material's surface, thus measuring only visible grains, and is limited by image complexity (e.g., vegetation obstructions, variation in color and texture) (Harvey et al. 2022; Manashti et al. 2023). Image analysis techniques can be categorized as direct or indirect. Direct methods employ image segmentation for grain size detection, isolating and measuring the visible axes of individual grains in an image. Examples of direct methods include those applied by Graham et al. (2005, 2010), Storz-Peretz & Laronne (2013), Detert & Weitbrecht (2012) (BASEGRAIN software), and Purinton & Bookhagen (2019, 2021) (PebbleCounts software), as well as several commercial software packages.

As listed above, many commercial software packages use direct methods to determine PSD. Indirect methods, in contrast, extract textural features from 2D or 3D images and estimate PSD from the textural characteristics and spatial arrangement of pixel intensities. Indirect methods can be further classified into statistical, local-pattern, and transform-based texture approaches. Statistical methods utilize properties of pixel intensity statistics, such as Haralick features (Haralick et al. 1973). Local pattern features establish relationships between the gray level of each pixel and its neighboring pixels, including techniques such as local binary patterns (Ojala et al. 2002), local configuration patterns (Guo et al. 2011), and completed local binary patterns (Guo et al. 2010). Transform-based features describe image texture in the frequency domain, employing techniques such as wavelet transforms (e.g., Buscombe 2013), Fourier transforms (e.g., Szeliski 2011; Yaghoobi et al. 2019), and Gabor filters (e.g., Tuceryan & Jain 1993; Yaghoobi et al. 2019). Most of these approaches involve comparing manually sieved grain size data with well-sorted and clean sediment imagery (Ohm & Hryciw 2014) and typically require site-specific calibration to establish relationships between texture and grain size at each location.
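To illustrate the local-pattern family, a basic 8-neighbour local binary pattern can be sketched in a few lines of NumPy. This is a simplified, non-rotation-invariant variant of the operator of Ojala et al., shown only to make the idea of encoding each pixel's neighbourhood concrete:

```python
import numpy as np

def local_binary_pattern(img):
    """Basic 8-neighbour LBP code for each interior pixel of a 2-D
    grayscale array (a simplified, non-rotation-invariant variant)."""
    img = np.asarray(img)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # The 8 neighbours, ordered clockwise from the top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[1:h - 1, 1:w - 1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.uint8) << np.uint8(bit)
    return codes
```

Histograms of such codes over an image patch then serve as the texture descriptor that is related to grain size.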

Suboptimal environmental conditions can be overcome by combining indirect methods with artificial neural networks (ANNs) to estimate grain sizes. Ghalib et al. (1998) employed Haralick features to predict particle size, while Yaghoobi et al. (2019) utilized Fourier transforms, Gabor filters, and wavelet transforms for PSD prediction. In a comparative study by Manashti et al. (2021) involving nine feature extraction methods, ANNs were used to predict PSD, achieving RMSE values ranging from 4.8 to 6.6%. Some researchers have also integrated direct methods with ANNs by incorporating segmented area statistics for PSD determination (Hamzeloo et al. 2014). However, traditional ANNs often require preprocessing techniques to extract relevant parameters, such as textural features, from imagery, which can be time-consuming. Additionally, these methods require large amounts of input data.

More recently, there has been a growing interest in utilizing Convolutional Neural Networks (CNNs) for indirect PSD determination. CNNs offer powerful capabilities for image processing by leveraging convolutions with trained filter banks and non-linear activation functions. These Deep Learning (DL) methods, with their millions of weight parameters, have demonstrated significant success in various image analysis domains (e.g., Buscombe 2020; McFall et al. 2020; Lang et al. 2021). Notably, recent advancements have led to the development of highly effective CNN architectures such as AlexNet (Krizhevsky et al. 2012), GoogLeNet (Szegedy et al. 2015), ResNet (He et al. 2016), and others, showcasing exceptional performance in object recognition tasks. This progress has sparked further exploration of CNNs for PSD estimation, pushing the boundaries of indirect analysis techniques using UAV imagery.

SediNet, a CNN developed by Buscombe (2020), is a notable recent addition to sediment image classification methods. SediNet is specifically designed to classify sediment images based on their size characteristics. The network architecture consists of four convolutional blocks trained using manual segmentation of the images as ground truth for PSD estimation. While SediNet achieved reasonably accurate predictions, with Root Mean Square Error (RMSE) values ranging from 16 to 45% for grain size percentiles and millimeter sieve sizes, it is essential to note that the model may be prone to overfitting due to the limited size of the dataset. In a study by McFall et al. (2020) focusing on predicting the PSD of beach sands, several types of analyses were compared, including direct analysis, transform-based feature extraction, and a CNN. The results showed that direct analysis had a mean percentage error of 34.2% on D50, which was reduced to 13.0% with certain modifications. The wavelet feature extraction method had a mean percentage error of 36.4% on D50, while the CNN approach achieved a mean percentage error of 22% on D50. As is evident from the literature, CNNs tend to deliver better estimation results than their alternatives.

Introducing a groundbreaking approach in CNN-based sediment analysis, GRAINet, developed by Lang et al. (2021), revolutionizes the prediction of PSD and the mapping of grain size metrics, such as mean diameters (dm), using UAV images. This innovative research tackles the challenge of estimating dm across extensive gravel bars in riverbeds, providing high spatial resolution over a large-scale coverage. GRAINet surpasses many limitations of previous studies by delivering grain size curves and maps encompassing a wide geographical area. By extracting global features that transcend human image interpretation and traditional photosieving methods, GRAINet demonstrates exceptional capability in accurately assessing grain size characteristics. Notably, the recent work by Purinton & Bookhagen (2019, 2021) requires capturing individual grains below a specific threshold, while statistical approaches such as Carbonneau et al. (2004) are restricted by input resolution, limiting their predictive power to D50 values above 3 cm. However, GRAINet successfully overcomes these limitations, marking a significant advancement in the field.

Despite the considerable progress made in computer vision, digital granulometric analysis still has substantial room for improvement. One of the primary challenges arises from variability in the quality of the input photographic data. Factors such as varying lighting conditions, different imaging heights, and the use of various sensors during field surveys can yield inconsistent data quality, variable estimation accuracy, and thus irreproducible results. Considering these challenges, our study explores the potential and practicalities of utilizing low-altitude UAV orthoimagery with the end-to-end indirect granulometric analysis approach GRAINet (Lang et al. 2021). We focus on automating the estimation of the mean particle diameter (dm) on an exposed river point bar located in the Javoří Brook stream within the Šumava National Park (Šumava NP) in Czechia.

To assess the reliability of the GRAINet algorithm in grain size estimation under variable conditions, we selected eight scenarios spanning the years 2014–2022. These scenarios encompass UAV imagery captured under different lighting conditions. Our objectives are:

  1. to test the method's performance in grain size estimation,

  2. to evaluate the influence of individual input data on the estimation quality, and

  3. to assess the practicality and potential of employing the GRAINet method for automated granulometric analysis over large spatial domains, utilizing UAV imagery.

By conducting this comprehensive analysis, we aim to contribute to the understanding of utilizing UAV orthoimagery along with advanced CNN techniques in granulometric studies.

Study site

The study site chosen for this research is an exposed point bar in Javoří Brook, a mid-mountain meandering stream in Šumava NP, Czechia (Figure 1). With a catchment area of 11 km2, the stream exhibits a moderate slope and is characterized by substantial spring snowmelt during April–May, as documented by Vlasák (2003). The selected study site, situated at the basin divide with relatively flat topography, typically experiences lower flood extents in the catchment compared to the lowland segments of the streams. Despite this, the site provides a realistic depiction of fluvial processes and encompasses various bed sediments with varying grain sizes. The point bar, exposed to significant fluvial activity as indicated by the hydrological conditions recorded at the Roklanský Brook gauging station (Figure 2(a)), exhibits dynamic characteristics that evolve significantly over time. In addition, Figure 2(a) visually represents the conducted UAV campaigns, which were primarily scheduled following extraordinary floods. These campaigns aimed to monitor and evaluate changes in grain size characteristics. Flash floods were frequent during the study period, with the annual maximum discharge reaching 13 m3/s, and occasional severe flash floods occurred. One of the most severe flash floods occurred in November 2015, leading to complete stream inundation, erosion of the point bar's flat topography, and channel bank erosion (Langhammer & Vacková 2018). The flood peak discharge during this event reached 51.2 m3/s, corresponding to a 5–10-year flood at the Vydra-Modrava station and a 50-year flood at the downstream station Otava-Písek (CHMI 2008). Although subsequent flash floods before and after 2015 occurred, their intensities remained lower than that of the 2015 event but still exceeded the annual maximum discharge threshold of 13 m3/s. 
The cumulative count of filtered flash floods surpassing the 13 m3/s threshold is presented in Figure 2(b), indicating that peak flows occur mainly from winter to spring, driven primarily by snowmelt. However, heavy rainfall events during summer and autumn can also contribute to peak flows, resulting in rapid runoff. During flash floods, sediment and debris accumulate rapidly on the inner point bar, raising the river's channel bed and reducing the channel's water capacity during high-flow events. This can lead to complete flooding or exacerbate the impact of flash floods by altering flow patterns. Therefore, close monitoring and effective management of these areas are crucial to mitigate the downstream effects of flash floods.
Figure 1

Location of the study area (left) and point bar in Javoří Brook (right), Šumava National Park (Šumava NP), Czechia.

Figure 2

(a) Hydrograph of flood events observed at the Roklanský Brook gauging station, located 2 km downstream of the monitoring site (data: Charles University, 2014); the black dashed lines mark the conducted UAV campaigns, while the red line marks the 13 m3/s threshold. (b) The selected events exceeding the 13 m3/s threshold, summarized from 2014 to 2022, with flood peaks predominantly in winter, followed by two peaks in summer and late summer and one in autumn. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/hydro.2023.079.


Image acquisition

Between 2014 and 2022, we conducted eight surveys using commonly available UAV platforms, namely the MikroKopter Okto XL, DJI Inspire 1 Pro, and DJI Matrice 210 RTK (Figure 3(a)–3(c)); their specifications are summarized in Table 1. The flights were conducted at different times of day and under varying lighting conditions to capture a wide range of image qualities, resulting in a ground sampling distance (GSD) between 0.13 cm/pixel and 0.28 cm/pixel (Table 2). The flights were performed at low altitudes, approximately 6–8 m above the terrain, and targeted an 80% overlap between individual images. We distributed ground control points (GCPs) over the target river point bar and measured them with a Topcon HiPer SR geodetic global navigation satellite system (GNSS) receiver using real-time kinematic (RTK) methods and a virtual reference station. We captured nadir-viewing red, green, and blue (RGB) images of the sediment surface in JPEG format. We used the DJI GS PRO software for mission planning, flight, and image acquisition parameters to ensure consistency across all imaging campaigns.
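The relationship between flight altitude and GSD can be sketched as follows. The sensor and lens values used in the example are illustrative assumptions only, not the exact campaign parameters (those are listed in Tables 1 and 2):

```python
def ground_sampling_distance(sensor_width_mm, focal_length_mm,
                             image_width_px, altitude_m):
    """Approximate nadir GSD: (sensor width x altitude) / (focal length x image width),
    converted from metres/pixel to mm/pixel."""
    gsd_m_per_px = (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)
    return gsd_m_per_px * 1000.0

# Illustrative only: a Micro Four Thirds sensor (~17.3 mm wide), a 24 mm lens,
# a 4,608 px image width, and a 7 m flight altitude give a GSD of roughly
# 1.1 mm/pixel, of the same order as the 1.3-2.8 mm/pixel reported in Table 2.
```

This back-of-the-envelope check explains why the low 6–8 m flight altitudes were needed to resolve individual gravel grains.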
Table 1

Specifications of the used UAVs

MikroKopter Okto XL | DJI Inspire 1 Pro | DJI Matrice 210 RTK
Since 2014 | Since 2015 | Since 2018
Panasonic Lumix GX7 camera | Zenmuse X5 camera | Zenmuse X4s camera
MFT sensor | MFT sensor | 1″ sensor
16 MPx, 4,608 × 3,456 | 16 MPx, 4,608 × 3,456 | 20 MPx
Panasonic 24 mm lens | Olympus M. Zuiko ED 24 mm lens | Fixed 28 mm lens
GLONASS + GPS | GLONASS + GPS | RTK navigation
Table 2

Overview of the data collection, including the tile names, drone platforms used during the campaign, acquisition dates, tile sizes, and the GSD measured in mm/pixel

Tile name | Platform | Acquisition date | Tile size (pixels) | GSD (mm/pixel)
Ortho 1 | MikroKopter XL | 20.05.2014 | 284 × 712 | ∼1.75
Ortho 2 | MikroKopter XL | 04.12.2015 | 427 × 1,073 | ∼1.40
Ortho 3 | DJI Inspire 1 Pro | 14.09.2016 | 356 × 891 | ∼1.40
Ortho 4 | DJI Inspire 1 Pro | 10.05.2017 | 335 × 839 | ∼1.49
Ortho 5 | DJI Matrice 210 RTK | 31.10.2018 | 269 × 676 | ∼1.85
Ortho 6 | DJI Matrice 210 RTK | 05.12.2018 | 222 × 559 | ∼2.80
Ortho 7 | DJI Matrice 210 RTK | 12.11.2021 | 319 × 800 | ∼1.56
Ortho 8 | DJI Matrice 210 RTK | 22.10.2022 | 373 × 935 | ∼1.33
Figure 3

Summary of UAV platforms. (a) Mikrokopter Okto XL, (b) DJI Inspire 1 Pro, and (c) DJI Matrice 210 RTK.


Photogrammetric processing

Using Agisoft Metashape (v1.6 Pro) software, we created topographic RGB Structure-from-Motion (SfM) models. We eliminated blurry and distorted images from our dataset by estimating their quality within Metashape; images with a quality score below 0.6 were removed from further analysis. We chose the highest quality settings available in Metashape for the initial alignment and the subsequent filtering of the tie point clouds. We used GCPs to align the models and standardized the camera model with the standard parameters: focal length (f), radial distortion (k1, k2, k3), principal point offset (cx, cy), and one decentering parameter (p1), excluding p2. RGB dense clouds were converted into orthomosaics based on the Digital Surface Models (DSMs) and georeferenced to the EPSG:4326 coordinate system (WGS 84). The orthophoto mosaics were exported as GeoTiff files to facilitate digital grain size measurements, specifically the digital dm of the entire inner river point bar and its temporal changes, excluding vegetation-covered sections.
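The distortion part of this camera model can be written out explicitly. The sketch below assumes one common Brown–Conrady convention with p2 fixed to zero, as in the calibration above; Metashape's exact parameterization may differ in detail:

```python
def distort(x, y, k1=0.0, k2=0.0, k3=0.0, p1=0.0):
    """Map undistorted normalized coordinates (x, y) to distorted ones using
    the radial terms k1-k3 and a single decentering term p1 (p2 fixed at 0),
    following one common Brown-Conrady convention."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + p1 * (r2 + 2.0 * x * x)
    y_d = y * radial + 2.0 * p1 * x * y
    return x_d, y_d
```

With all coefficients at zero the mapping is the identity, which is a convenient sanity check on the sign conventions.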

GRAINet architecture

The open-source CNN Python package GRAINet (Lang et al. 2021) is specifically designed to estimate granulometric grading curves (cumulative volume distributions) and grain size characteristics (e.g., dm) directly from georeferenced UAV orthoimagery. The architecture adopts the network proposed by Sharma et al. (2020), based on the ResNet50 presented by He et al. (2016). The general architecture design follows the guidelines documented in CS231n: Convolutional Neural Networks for Visual Recognition (n.d.). An in-depth overview and description of the final GRAINet CNN architecture is provided by Lang et al. (2021).

To summarize, GRAINet uses a relatively simple network structure (Krizhevsky & Hinton 2009; Krizhevsky 2014) that has proven effective in various image analysis tasks. It includes a single 3 × 3 entry convolutional layer followed by six residual blocks (three convolutional blocks and three identity blocks). Each block contains three convolutional layers followed by batch normalization and ReLU layers. The network ends with a 1 × 1 convolutional layer, average pooling, and a softmax layer. The advantage of the GRAINet model architecture is that it demands neither a pixel-accurate object mask nor a count map for training, as used by others (e.g., Sharma et al. 2020), which are laborious to annotate manually. Instead, GRAINet is trained to regress the PSD end-to-end, so labeling new training data becomes much more efficient because pixel-accurate object labels are no longer needed. The GRAINet model learns to assess object size frequencies by looking at large image tiles without access to explicit object counts or locations (Lang et al. 2021). Moreover, thanks to batch normalization, the GRAINet architecture promises faster training, tolerance of higher learning rates, and less careful initialization (Ioffe & Szegedy 2015). For the reasons mentioned above, we applied this open-access tool to generate dm estimation variability maps directly from UAV-generated orthoimages, considering only image tiles and eliminating the need for manual sieving and classification of sediment samples. As we encountered difficulties running the Python package via the Anaconda environment, as advised, we dockerized the installation environment by pre-installing dependencies and then ran the model instances in the Docker container (GitHub link: https://github.com/veethahavya-CU-cz/psd_analysis_grainet.git). By employing containerization, we achieved a more efficient and reliable execution of the model, ensuring a streamlined process throughout.

Model setup and learning

To test the applied methodology, we created a 0.5 × 1.25 m grid for each orthomosaic to divide the orthomosaics into smaller, manageable image tiles for analysis using QGIS software (Figure 4). To keep the grain size distribution consistent within each tile, we used the stream flow direction to determine whether the tiles should be cut horizontally or vertically. In our study, all tiles were cut vertically, since the stream flow direction followed a north-northwest (NNW) to south-southeast (SSE) orientation (Figure 4). After choosing tiles for manual annotation (Figure 5(a)) using the digital line sampling method proposed by Lang et al. (2021), we drew polygons around 40–150 grains along the centerline of each tile (Figure 5(b) and 5(c)), and each tile was saved in GeoTIFF format. The manual annotation took, on average, 20–30 min per tile. Afterwards, the minor axis of every annotated grain per tile was measured by fitting a rectangle to each grain shape using the bounding box tool in QGIS (Figure 5(c)). Measuring each annotated grain's minor axis yields the dm for each tile. The minor axis corresponds to the shortest distance between two parallel lines enclosing the grain's extent in the direction perpendicular to the major axis (i.e., the longest distance between two parallel lines enclosing the grain's extent). The minor axis is commonly used as a proxy for grain size in granular materials and is often taken as the grain's equivalent diameter.
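The bounding-box step can be approximated programmatically. The following sketch (our own illustration, not the QGIS tool itself) recovers the b-axis as the shorter side of the minimum-area rotated rectangle of a convex grain outline:

```python
import numpy as np

def b_axis(points):
    """Shorter side of the minimum-area rotated bounding rectangle of a convex
    grain outline, used here as the b-axis; `points` is an (N, 2) array of
    vertices. (For concave outlines, pass the convex hull instead.)"""
    pts = np.asarray(points, dtype=float)
    best_area, best_b = np.inf, 0.0
    for i in range(len(pts)):
        # Candidate rectangle orientation: align with edge i -> i+1.
        edge = pts[(i + 1) % len(pts)] - pts[i]
        norm = float(np.hypot(edge[0], edge[1]))
        if norm == 0.0:
            continue
        ux, uy = edge / norm
        rot = np.array([[ux, uy], [-uy, ux]])   # rotates the edge onto the x-axis
        proj = pts @ rot.T
        w, h = proj.max(axis=0) - proj.min(axis=0)
        if w * h < best_area:
            best_area, best_b = w * h, min(w, h)
    return float(best_b)
```

The tile's dm is then simply the mean of the b-axis values of its 40–150 annotated grains.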
Figure 4

Example image tiles (0.5 × 1.25 m) from each orthomosaic with slightly varying image resolutions and lighting conditions. Each of the eight tiles is taken from the same gravel bar at different dates.

Figure 5

Overview example of the used digital ground truth line sampling procedure. (a) Randomly chosen tiles at the original RGB orthoimage using the grid (0.5 × 1.25 m) for tile extraction in the stream flow direction (ortho 1). (b) An example of a section chosen along the river point bar with manually labeled ground truth grains, and (c) a digital line sample with 40–150 grains with automatic extraction of the b-axis (purple rectangles around the grains).


To facilitate the generation of input data for the GRAINet model and enable accurate predictions for individual rasters, we needed to create an npz file containing dm-frequency histograms, image tiles, tile names, and the mean dm values extracted through manual annotation. These components are crucial for the algorithm to process the data accurately and estimate grain sizes for each raster. Since the algorithm's authors did not provide an npz-file generator, we created our own version, accessible through the GitHub link above.

In our script, we quantized the b-axis measurements of all rectangles extracted from one image tile into 21 bins to create a histogram, following the relative frequency distribution of grain sizes described by Lang et al. (2021). Additionally, the image tiles had to be precisely cropped so that they were all the same size and resolution. We saved them in GeoTiff format in a single folder with sequential enumeration, such as "test_tile_1", "test_tile_2", and so on. This careful preparation of the image tiles simplifies the encoding of the metric scale into the CNN output, making it easier for the model to learn and estimate the dm and grain size distribution. Suitable tools in QGIS or other image-processing software may be used to accomplish this. The dimensions of the image tiles varied from a minimum of 222 × 559 pixels to a maximum of 427 × 1,073 pixels, depending on the resolution of the orthoimage used to create the 0.5 × 1.25 m grid (Figure 4). We also saved the corresponding shapefiles containing the mean dm values extracted through polygon annotation; these shapefiles were stored in a separate folder, with their attribute tables exported to MS Excel format. Additionally, individual masks were created manually for each raster to consider only the regions covered by grains and exclude obstructions such as vegetation. The final input for the GRAINet model thus comprised the npz file, the orthoimage, and the mask.
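A minimal sketch of how one annotated tile could be packed follows. The function name and the equal-width bin edges are our own illustrative assumptions; GRAINet's reference bin edges may differ:

```python
import numpy as np

def build_training_sample(tile_rgb, b_axes_cm, n_bins=21, bin_edges=None):
    """Pack one annotated tile: a normalized n_bins histogram of b-axis
    measurements, the tile image, and the tile's mean dm."""
    if bin_edges is None:
        # Hypothetical equal-width bins up to 30 cm; GRAINet's reference
        # bin edges may differ.
        bin_edges = np.linspace(0.0, 30.0, n_bins + 1)
    hist, _ = np.histogram(b_axes_cm, bins=bin_edges)
    hist = hist / hist.sum()   # relative frequency distribution
    return {"image": tile_rgb, "histogram": hist, "dm": float(np.mean(b_axes_cm))}

# Samples from all annotated tiles could then be stacked and written with
# np.savez(...) to form the model input.
```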

For the GRAINet tool, a standard preprocessing step involves normalizing the RGB bands of the image tiles. This process scales the pixel intensities of the image to a standard range, aiding faster and more stable model convergence during training. By centering the data around a mean value close to 0 and scaling it to have a standard deviation of 1, the pixel value distribution conforms to a standard normal distribution. As Lang et al. (2021) described, this normalization technique can enhance the performance of gradient-based optimization algorithms employed in training CNNs. Further, to successfully apply the GRAINet model, choosing appropriate hyperparameters related to the CNN architecture and optimization procedure is essential. These hyperparameters include the learning rate, batch size, and the number of epochs. The learning rate determines the step size for updating the model parameters during training, with random batches drawn from the training data. The batch size represents the number of training examples presented in a single batch. The number of training epochs specifies the complete passes through the training dataset during gradient descent. For training the GRAINet model, the Adam optimizer is employed. Adam is an adaptive stochastic optimization technique specifically designed for training deep neural networks. It adjusts the magnitude and speed at which each parameter is updated, controlled by the learning rate (Kingma & Ba 2017). The learning rate influences how the gradient of the loss function modifies the network weights. Selecting an appropriate learning rate is crucial. If the learning rate is initially set too large, the training process may oscillate and exhibit instability. Each iteration could overshoot and diverge on both sides of the optimal value (Wang et al. 2022). Conversely, if the learning rate is too low, the convergence speed is slow, and overfitting may occur (Wang et al. 2022). 
The advantage of the Adam optimizer lies in its adaptive adjustment of individual parameters. During training, we kept the Adam hyperparameters constant at their proposed default values; specifically, we maintained a fixed learning rate of 0.0003 throughout training. A sensitivity analysis of these hyperparameters is beyond the scope of this study; we refer the reader to the original study for details.
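For reference, a single Adam update at the fixed learning rate of 0.0003 can be written out explicitly. This is a generic sketch of the optimizer of Kingma & Ba (2017), not GRAINet's training code:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One parameter update of the Adam optimizer (Kingma & Ba 2017)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy run: minimizing f(x) = x**2 from x = 1.0 at the fixed rate of 0.0003;
# each step moves x by roughly lr, because Adam normalizes the gradient scale.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2.0 * x, m, v, t)
```

The toy run illustrates why the learning rate governs the effective step size regardless of the raw gradient magnitude.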

We conducted experiments with different parameter configurations to optimize the model's performance. For example, we tested larger batch sizes, such as six or eight image tiles, and extended the training to 150 epochs. However, these alternative settings reduced the accuracy of the results and increased the computational cost. As a result, we maintained a batch size of four and trained the models for 100 epochs, even though individual model runs took hours or even days to train without parallelization across multiple GPUs.
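The epoch/batch scheme can be illustrated with a simple index generator (our own sketch, not part of GRAINet):

```python
import numpy as np

def epoch_batches(n_tiles, batch_size=4, seed=0):
    """Yield shuffled index batches for one training epoch."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_tiles)
    for start in range(0, n_tiles, batch_size):
        yield order[start:start + batch_size]

# With, e.g., 30 annotated tiles and a batch size of four, one epoch
# consists of 8 batches (the last one holding the remaining 2 tiles);
# 100 such epochs make up a full training run.
```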

Model performance evaluation

The loss function is a crucial element of a CNN, as it represents the objective against which the model's performance is measured. The current loss is computed at each iteration, estimating the present state of the model's performance. Different error metrics exist to approximate ground truth distributions; here, we focused on one popular and intuitive metric. Although the GRAINet environment offers several, we used the well-performing Kullback–Leibler divergence (KLD), also called relative entropy, introduced by Kullback & Leibler (1951), which measures how one probability distribution diverges from a second, expected distribution. The KLD from Q to P is denoted $D_{\mathrm{KL}}(P \,\|\, Q)$ and is calculated as the difference between the cross-entropy and the entropy, described by the formula
D_KL(P‖Q) = Σ_x P(x) log(P(x)/Q(x))    (1)
where P(x) and Q(x) are discrete probabilities. KLD is not a distance measure in the mathematical sense, meaning that D_KL(P‖Q) is not equal to D_KL(Q‖P). A divergence of 0 indicates that the two distributions are identical. To optimize and evaluate CNN variants that directly predict scalar values, such as the direct prediction of the mean dm, GRAINet uses the following indicators to evaluate model performance:
  • the mean absolute error (MAE), also known as the L1 loss function,

  • the mean squared error (MSE), also known as the L2 loss function, and

  • the root-mean-square error (RMSE).

The mathematical formulations of the error metrics are described in the following equations.
MAE = (1/n) Σ_{i=1}^{n} |y_i − y_i^pred|    (2)
MSE = (1/n) Σ_{i=1}^{n} (y_i − y_i^pred)²    (3)
RMSE = √[(1/n) Σ_{i=1}^{n} (y_i − y_i^pred)²]    (4)
Here, n denotes the number of samples, y_i is the target variable of sample i, and y_i^pred is the predicted target variable of sample i. MAE, or L1, is the average absolute difference (error) between target and estimated values. The MAE value ranges between 0 and +∞, with lower values indicating higher prediction accuracy. One advantage of MAE is that it has the same unit as the original data, making it simple to calculate and comprehend. Additionally, MAE is often used as a symmetrical L1 loss function, as stated by Flores (1986) and Sanders (1997). MSE, or L2, is the average squared difference (error) between the target and estimated values. The range of MSE is (0, +∞); the smaller the MSE value, the higher the prediction accuracy, and a value of 0 indicates a perfect prediction model. The difference between these two loss functions is their sensitivity to errors: MAE treats all errors equally and returns more interpretable values, while MSE amplifies large anomalies and is used where outliers should be detected (Jadon et al. 2022). The RMSE returns the MSE to the original unit by taking its square root while maintaining the property of penalizing higher errors. As with MSE, the range of RMSE is (0, +∞), and smaller values indicate higher prediction accuracy; in contrast with MSE, the units of RMSE are the same as the original units, making RMSE more interpretable. Furthermore, GRAINet evaluates model bias with the mean error (ME):
ME = (1/n) Σ_{i=1}^{n} (y_i^pred − y_i)    (5)
where a positive ME indicates the prediction is greater than the ground truth (Lang et al. 2021). Hence, these four performance indices were used to assess the goodness of the applied GRAINet approach in this study. To validate our models, we followed the random tenfold cross-validation used by Lang et al. (2021): each model was trained 10 times, each time holding out a randomly selected 10% of the data, and overall performance was assessed by combining the results of all ten folds. It is important to note that the predictive accuracy of the models relies on the quality of the labels used for training. Labeling noise degrades model performance and is partially caused by the labeling method itself: grain annotation in image tiles is somewhat subjective and thus differs across annotators, including in our own case study.
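The KLD loss (Equation (1)) and the four evaluation indices (Equations (2)–(5)) can be computed with a minimal numpy sketch; the function and variable names are illustrative and not taken from the GRAINet code base.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)) for discrete
    distributions (Equation (1)); eps guards against log(0), and the
    inputs are renormalized to sum to 1."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def evaluation_indices(y_true, y_pred):
    """MAE, MSE, RMSE, and ME as in Equations (2)-(5); a positive ME
    means the model overestimates on average."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    return {"MAE": mae, "MSE": mse, "RMSE": mse ** 0.5, "ME": float(np.mean(err))}
```

Note that `kl_divergence(p, q)` generally differs from `kl_divergence(q, p)`, reflecting that KLD is not a distance in the mathematical sense, and that it returns 0 for identical distributions.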

Model performance compared with digital ground truth data

We first tested the GRAINet approach to predict the dm grain size characteristics of an entire gravel point bar between 2014 and 2022 to track its changes. A comprehensive labeling process was undertaken to produce the annotated ground truth data used for model training and evaluation. A total of 65,836 grains, encompassing a diverse spectrum of sizes, were meticulously annotated across 761 tiles (Table 3). As displayed in Table 3, the smallest observed particle in the dataset had a dm of approximately 1.4 cm, while the largest particle measured nearly 10 cm. The mean dm values per orthoimage ranged from 3.5 cm to 5.6 cm, with an overall mean dm of 4.15 cm for all particles, and the range of particle sizes within the individual orthoimages varied between 3.3 and 8.0 cm. Due to the limited sensitivity of the algorithm in predicting particles smaller than gravel, only samples of gravel to cobble size were identified and labeled, even though sandy patches were present away from the channel and a sand-dominated area lay down-bar around the bend. To train a robust CNN, we deliberately chose challenging samples that were not clean, including natural disturbances such as grass, moss, small wooden branches, and slightly flooded river-bed zones.

Table 3

Overview of the investigated gravel point bar, including the tile names, the number of labeled tiles using digital line sampling, and the corresponding sample sizes

Tile name | Number of labeled tiles | Sample size (n) | Mean dm (cm) | Min dm (cm) | Max dm (cm) | Range dm (cm)
Ortho 1 | 70 | 7,740 | 4.5 | 2.1 | 6.5 | 4.4
Ortho 2 | 112 | 8,212 | 4.5 | 2.3 | 7.4 | 5.1
Ortho 3 | 125 | 10,935 | 4.0 | 1.4 | 6.5 | 5.1
Ortho 4 | 90 | 7,726 | 3.9 | 1.9 | 9.9 | 8.0
Ortho 5 | 102 | 9,178 | 3.8 | 1.8 | 5.9 | 4.1
Ortho 6 | 100 | 8,472 | 3.5 | 1.5 | 8.1 | 6.6
Ortho 7 | 100 | 7,896 | 5.6 | 3.9 | 7.2 | 3.3
Ortho 8 | 62 | 5,677 | 3.7 | 1.8 | 6.0 | 4.2

Additionally, the ground truth statistics of the characteristic mean diameter (dm) are provided.

The performance of dm estimated by GRAINet was compared to the manually annotated tiles across the entire dataset. For 547 tiles, approximately 68.5% of the dataset, the predicted values fell within 1σ (one standard deviation); for 726 tiles, approximately 90.75% of the dataset, the predicted values fell within 2σ (two standard deviations). Table 4 displays the corresponding percentages of estimated dm values lying within 1σ and 2σ for each orthoimage.
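Such within-kσ percentages can be computed with a short sketch like the one below. Treating σ as the per-tile standard deviation of the annotated grain sizes is an assumption of this sketch, since the exact definition is not restated in this section.

```python
import numpy as np

def fraction_within_k_sigma(y_true, y_pred, sigma, k=1):
    """Fraction of tiles whose predicted dm lies within k*sigma of the
    annotated mean dm. How sigma is defined (here: per-tile standard
    deviation of the annotated grain sizes) is an assumption."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) <= k * sigma))
```

Applying this per orthoimage with k=1 and k=2 would reproduce the structure of Table 4.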

Table 4

Summary of the percentage of tiles whose predicted dm lies within 1σ and 2σ, per orthoimage

Tile name | 1σ (%) | 2σ (%)
Ortho 1 | 74.29 | 97.14
Ortho 2 | 90.18 | 97.32
Ortho 3 | 77.6 | 95.2
Ortho 4 | 79.41 | 95.1
Ortho 5 | 60.0 | 99.0
Ortho 6 | 65.22 | 94.57
Ortho 7 | 70.0 | 94.44
Ortho 8 | 54.84 | 100

Figure 6 illustrates the performance of GRAINet compared to manual annotation, showing the level of agreement between manual annotation and GRAINet's dm estimates for the different cases. The orthomosaic from 2015 (ortho 2) achieved the best fit, with a difference of 0.07 cm, indicating a high level of similarity between manual annotation and GRAINet's dm predictions. Similarly, the case from 2018 exhibited a relatively small difference of 0.25 cm, indicating a close match. The remaining data points exhibit greater differences, indicating larger disparities between manual annotation and GRAINet's dm predictions for those orthomosaics. These disparities range within a few centimeters, with the highest difference observed in the case from 2017, reaching 3.6 cm and implying a reduced level of agreement.
Figure 6

Comparison of human-annotated dm performance to the predicted GRAINet dm.

Figure 7 illustrates the underestimations and overestimations of GRAINet's dm estimates across the entire dataset for grain size ranges spanning from 1 to 10 cm, visually representing the degree of deviation from the actual grain sizes as given by the manually annotated tiles. The analysis reveals the following insights. A high degree of underestimation is observed for the smallest grains, especially those <1 cm in mean diameter (64.81%), with no overestimation in this range. The degree of underestimation progressively decreases as the grain size increases. Overestimation is first observed for grains in the 1–2 cm range, where its degree is also the highest (12.88%). Much like the underestimation, this disparity decreases as the grain size increases, demonstrating GRAINet's reliable estimation confidence for grains over 3 cm.
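Per-size-range under- and overestimation shares like those in Figure 7 can be tabulated with a sketch such as the following; the bin edges here are illustrative, not the exact bins used in the figure.

```python
import numpy as np

def error_rates_by_size(y_true, y_pred, bins=(0, 1, 2, 3, 5, 10)):
    """Share of underestimated and overestimated samples per grain-size
    bin (cm). Bin edges are illustrative assumptions, not the study's.

    Returns a dict mapping a bin label to (underestimated share,
    overestimated share) among the samples falling in that bin.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    idx = np.digitize(y_true, bins) - 1  # bin index of each true dm
    out = {}
    for b in range(len(bins) - 1):
        sel = idx == b
        if not sel.any():
            continue  # skip empty bins
        under = float(np.mean(y_pred[sel] < y_true[sel]))
        over = float(np.mean(y_pred[sel] > y_true[sel]))
        out[f"{bins[b]}-{bins[b + 1]} cm"] = (under, over)
    return out
```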
Figure 7

Underestimations and overestimations of GRAINet's estimated dm.


GRAINet model prediction accuracies

We trained the GRAINet models using the KLD loss function. The algorithm generated results for each test fold individually; to comprehensively evaluate the model's performance, we averaged the results of all ten cross-validation folds for each model. This approach allows us to assess performance across multiple folds and provides a reliable estimate by considering various error metrics, namely MAE, MSE, RMSE, and ME. By incorporating different metrics, we could evaluate the model's accuracy, precision, and overall performance from different perspectives, enhancing the robustness of the assessment. Based on this approach, Figure 8 illustrates the accuracies of the eight cases from different years. The performance evaluation indicates that the MAE ranges from 1.96 to 4.45 cm, the MSE from 7.13 to 27.24 cm², and the RMSE from 2.49 to 4.72 cm. Most of GRAINet's dm estimates tend to be underestimated, as illustrated by negative MEs (from −3.38 to 0.11 cm), except for the slightly overestimating model from 2021 (ortho 7; ME = 0.11 cm), as depicted in Figure 8. Based on the evaluation results, the estimation errors of the eight models can be ranked from highest to lowest as follows: ortho 3 > ortho 4 > ortho 5 > ortho 1 > ortho 7 > ortho 6 > ortho 8 > ortho 2. A summary of the dm prediction performance of GRAINet, based on random tenfold cross-validation, can be found in Table A1. Additionally, scatterplots comparing the estimated dm values (cm) from the GRAINet outputs to the digital ground truth data are provided in the Supplements (Figure S1).
Figure 8

Box plots illustrate the accuracy performance of all models, showing the values for mean absolute error (MAE), mean squared error (MSE), RMSE, and ME in cm.


GRAINet's dm prediction maps

To achieve optimal performance, we trained the GRAINet CNN architecture with substantial training data, which is considered more critical than the image quality (Van Horn et al. 2015). Using a single example of a selected gravel bar, we generated eight high-resolution prediction maps of particle size characteristics, specifically the dm, along the inner bend deposits of the Javoří Brook. These maps, shown in Figures 9 and 10, highlight the spatial and temporal variations of the mean dm over multiple years (2014–2022). Alongside the estimated GRAINet dm values, Figures 9 and 10 also show the input orthomosaics and histograms of the distribution of the estimated dm values. The predicted GRAINet dm maps exhibit favorable outcomes, revealing spatial variability within the entire point bar and plausible differences in dm magnitude between the years. The resulting maps also show a distinct sediment deposition pattern, with coarser material settling at the bar head, where sediment bedload energy is higher, and finer sediment accumulating towards the downstream end of the point bar, where energy is lower. They vividly illustrate the movement of gravel bedload material and the resulting changes in sediment distribution, leading to a narrower sediment accumulation over time. These changes have facilitated the establishment of vegetation on both sides of the river point bar.
Figure 9

Right: maps of the GRAINet dm predictions with histograms showing its dm distribution in cm. Left: Orthoimages from 2014 to 2017.

Figure 10

Right: maps of the GRAINet dm predictions with histograms showing its dm distribution in cm. Left: orthoimages from 2018 to 2022.


However, upon closer examination of the distribution histograms of dm (Figure 9), we gain insights into the possible discrepancies when using this algorithm for primary mapping purposes. For example, the predicted dm map of 2014 (ortho 1) shows a relatively normal distribution of grain sizes, suggesting infrequent or stable flood occurrences before 2014. However, it is essential to note that Mirijovský & Langhammer (2015) documented a significant flood event in June 2013, equivalent to a 2–5-year flood. This event was triggered by intense rainfall in the basin, which was already saturated due to snowmelt, and might have resulted in the downstream flushing of coarser sediment, temporarily reducing the coarser sediment load during that period. Subsequent peak flow events triggered by summer rainstorms were brief and did not reach a discharge level of 13 m³/s.

In contrast, in 2015 (ortho 2; Figure 9), several flash floods impacted the stream, leading to significant terrain changes. These changes are evident in the noticeable differences between the predicted dm maps of 2014 (ortho 1) and 2015 (ortho 2), indicating a slight increase in the coarser sediment load (Figure 9) and following a distinct sediment deposition pattern with coarser material at the bar head and finer sediment settling downstream. However, there are instances where the predicted dm maps show discrepancies with the actual sediment distribution, suggesting the need for improved accuracy. In particular, the algorithm faced challenges in accurately identifying particles in the samples, particularly in ortho 3 and ortho 4, where higher sediment heterogeneity over the point bar (Figure 9) resulted in mispredictions. In 2018 (ortho 5), a noticeable increase in vegetation cover was observed, while the predicted dm maps show relatively stable grain size characteristics (Figure 10). The reduced morphological activity of the point bar over time, with narrower sediment accumulation, is even more evident in 2021 (ortho 7; Figure 10). As of 2022, the observable dynamics remain low, indicating that the morphological changes of the bar have stabilized (ortho 8; Figure 10).

Error analysis

Our study's main findings demonstrate that the GRAINet algorithm's predictions for dm showed reasonable results when evaluating specific cases on a single river point bar at the Javoří Brook over multiple periods. However, it is important to note that not all models performed equally well, and there were cases of misleading predictions. Several factors could contribute to these discrepancies.

The sources of error in image-based grain size measurements can be diverse, as discussed in the study by Chen et al. (2022). One fundamental error arises from estimating the three-dimensional (3D) properties of grains based on their two-dimensional (2D) representation in images. For example, accurately determining the vertical axis of a grain solely from an image is challenging. Algorithmic errors can also occur due to limitations in the image-processing algorithm used. Environmental errors may arise from suboptimal conditions during the image procurement, such as vegetation, noise between grains, and inadequate lighting. These factors can lead to misidentifying grain boundaries and result in measurement errors. Image quality errors can be introduced by the size of image tiles and the image resolution. The image resolution affects the ability to detect small grain sizes, and the choice of image tile size can impact measurement accuracy. Additionally, errors can arise from grain characteristics, including the distribution of grain sizes, irregular grain shapes, and image distortions. A wide distribution of grain sizes can lead to larger errors, especially when accurately detecting smaller grains. While the fundamental errors have been extensively discussed in previous literature (Graham et al. 2010), their impact on the final aggregated prediction results of grain size characterization is likely to be minimal (Sime & Ferguson 2003; Graham et al. 2005; Detert & Weitbrecht 2012; Lang et al. 2021).

In this discussion, we focus on the benefits and drawbacks of CNN, manual labeling, and training techniques utilized by GRAINet. Additionally, we will examine how environmental errors might impact the model's ability to make accurate predictions.

Using CNNs with high-resolution UAV imagery for granulometric analysis in fluvial geomorphology is an efficient and promising approach, providing several advantages over conventional and optical methods. CNN-based models, such as GRAINet, can handle large amounts of data effectively (Yan et al. 2016). GRAINet, specifically, can rapidly process vast volumes of UAV orthoimagery with high spatial resolution, reducing the reliance on manual field measurements and enabling analysis of extensive study areas (Lang et al. 2021). The CNN-based GRAINet model capitalizes on the inherent capability of CNNs to capture complex spatial features and patterns, resulting in accurate classification and segmentation of grain sizes within UAV imagery and leading to improved granulometric measurements (Krizhevsky et al. 2012). In contrast, traditional optical granulometry methods often employ simple image-processing techniques, such as edge detection and thresholding, which may struggle to capture the nuanced grain size variations observed in natural environments (Buscombe et al. 2016). CNN-based approaches offer high adaptability and transferability, allowing the trained model to be fine-tuned for different granulometric settings with minimal additional training data, facilitating cost-effective model deployment in various river segments and conditions across river basins (Pan & Yang 2010; Lecun et al. 2015). However, it should be noted that CNNs, including the GRAINet algorithm, necessitate a substantial amount of labeled training data, which can be time-consuming and resource-intensive to collect and digitally annotate. Additionally, manual labeling introduces subjectivity, leading to variability and inconsistencies in interpreting grain boundaries (Lang et al. 2021). It requires significant expertise from the operator, yet it is still considered the most robust and reliable annotation method when applied to fluvial environments.
One potential reason for the variability in model performance is the availability of comprehensive training data. For their evaluation and validation, Lang et al. (2021) collected a dataset of 1,491 digitized image tiles from 25 gravel bars across six Swiss rivers; the average performance of the model across all 25 gravel bars resulted in an MAE of 0.3 cm and an ME of 0.1 cm. However, creating such a large, manually labeled training dataset is laborious. Assuming we had enough data input for training, we still missed an opportunity to enhance the robustness, reduce bias, and improve the accuracy and reliability of our predictions by not employing the geographical cross-validation technique used by Lang et al. (2021), which might have reduced the constant underestimation of grains by the GRAINet algorithm. As with other DL methods, the predictive ability is highly dependent on the quality of the training dataset, such as the number and diversity of training images.

The model's predictive ability might also be influenced by the image tiles' size, as Chen et al. (2021) highlighted. Larger image tiles may under-split due to limitations in GPU memory, while smaller tiles may over-split due to limitations imposed by the size of the largest grain that needs to be detected. Finding the optimal image tile size ensures accurate and reliable predictions. In our study, we employed various image tile sizes based on the resolution of the orthoimagery, as detailed in Table 2. Interestingly, we observed that the models using the largest image tiles exhibited the best predictive performance but required the longest processing time. This finding suggests that the choice of image tile size can significantly impact the models' accuracy and computational efficiency. However, it is essential to note that our study has limitations, and further research is warranted to thoroughly investigate the optimal image tile sizes in granulometric analysis. These findings should be considered in future studies to enhance the understanding and improve the methodology in this field.

Influence of environmental errors

Additionally, the tested GRAINet approach does not require parameter tuning by the user (Lang et al. 2021), suggesting that the algorithm is robust to varying conditions; therefore, we did not tune our input image data. However, a key parameter of all optical methods, including optical granulometry, is the lighting conditions and the uniformity of scene illumination, which pose significant challenges in image analysis. For instance, insufficient lighting can lead to dark or low-contrast images, where details may be difficult to discern. On the other hand, unfavorable lighting conditions, such as hard shadows or harsh highlights, can result in uneven exposure and adversely affect overall image quality. These variations in lighting can introduce bias and impact the accuracy of optical granulometry measurements. This is a rather fundamental limitation in natural environments, where research is carried out in different seasons and under different atmospheric conditions. Likewise, when monitoring remote areas where imaging campaigns cannot be repeated frequently and reproducibly, it is necessary to consider this limitation and use methods that can at least partially compensate for the influence of different atmospheric conditions. Assuming that these factors may introduce bias and impact the algorithm's performance, we performed additional statistical analyses to address this concern, focusing on various image properties: shadow proportion, hue, saturation, contrast, and overexposure proportion (Figure 11).
Figure 11

Metrics related to image quality assessment are expressed in percentages except for the hue.


A higher proportion of shadows (cast shadows, self-shadows, interstitial shadows, or odd shadowing during afternoon or late-afternoon flights) in the input orthoimage could cause false-positive identification of grains and their eventual misclassification, leading to erroneous prediction of grain sizes. The hue, a color property in the HSV (hue, saturation, value) color model, provides insights into the dominant colors in the image. There is no inherent meaning of 'high' or 'low' hue in these statistics, but the hue values can be interpreted to determine whether specific colors dominate or are lacking in the image (Figure 11). A color cast, i.e., a dominant hue in the image, affects the chromatic fidelity of the image, rendering it monotone and thus causing loss of chromatic detail. Saturation measures the intensity and fidelity of the colors in the image: high saturation means the colors are vivid and intense, while low saturation means the colors are more muted; very low saturation likewise renders the image monotone, leading to a loss of detail. Contrast refers to the difference in brightness between an image's light and dark areas. High-contrast images show a significant difference between light and dark areas, resulting in more vivid and well-defined features, whereas low-contrast images have less difference, resulting in a more muted appearance and less well-defined features. Orthoimages suffering from a high degree of overexposed highlights can lose detail, especially in saturation. Overexposure may cause the model to fail to classify individual grains or even to identify a cluster of grains as a single grain, leading to erratically erroneous dm predictions in the former case and a systematic overestimation of dm in the latter.
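One plausible way to compute such image-quality statistics from an RGB orthoimage tile is sketched below (hue omitted for brevity). The thresholds and formulas here are assumptions of this sketch, as the exact definitions behind Figure 11 are not given in this section.

```python
import numpy as np

def image_quality_metrics(rgb, shadow_thr=0.2, overexp_thr=0.95):
    """Shadow, overexposure, saturation, and contrast statistics for an
    (H, W, 3) RGB array with values in [0, 1]. Thresholds and the RMS
    contrast definition are assumptions, not the study's definitions."""
    v = rgb.max(axis=2)                # HSV value channel
    c = v - rgb.min(axis=2)            # chroma
    sat = np.where(v > 0, c / np.maximum(v, 1e-8), 0.0)  # HSV saturation
    gray = rgb.mean(axis=2)
    return {
        "shadow_pct": float(np.mean(v < shadow_thr) * 100),      # dark pixels
        "overexposed_pct": float(np.mean(v > overexp_thr) * 100),  # blown highlights
        "saturation_pct": float(sat.mean() * 100),
        "contrast_pct": float(gray.std() * 100),                 # RMS contrast
    }
```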

Analyzing these lighting variations and comparing the results shows that the image properties have a minimal impact on overall performance. The best-performing models, from 2015 (ortho 2), 2018 (ortho 6), 2021 (ortho 7), and 2022 (ortho 8), were acquired under contrasting image conditions, yet all achieve the highest performance. Based on these observations, we conclude that the GRAINet algorithm is fairly robust against such disturbances.

Apart from the above-mentioned sources of uncertainty, other environmental parameters are beyond operators' control and can contribute to potential model bias. These parameters include vegetation, obstacles, human impacts, and technical issues such as motion blur, camera sensor noise, incorrect exposure settings, sharpness, intergranular and unseen grains, and image resolution.

In our study, we deliberately selected a river point bar with some vegetation. Throughout the study, there was a gradual increase in vegetation coverage on the bar. Masking out the vegetation within the orthomosaics, mainly moss and individual tussocks (Figure 12(a) and 12(b)), posed a significant challenge. Additionally, leaves, stems, or plant debris can obstruct smaller particles, limiting particle identification algorithms and leading to false detection of grain sizes (Figure 12(c)). In the case of conventional optical granulometry based on manual individual imaging, the imaged area can be manually cleaned before image capture. In the case of UAV imaging, cleaning the imaging area of vegetation debris is often not even possible due to the extent of the area or its physical inaccessibility. Therefore, extensive post-processing is required when dealing with automatically acquired data. To overcome this limitation, vegetation must be correctly masked, and the number of image tiles could be increased to augment the training dataset for manual annotation. This requires a substantial investment of time and resources and may result in inaccuracies stemming from the subjective annotation of image tiles.
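Where manual masking is infeasible, a simple automated first pass could use a greenness index such as Excess Green (ExG). This is a hypothetical stand-in for the masking described above, not the procedure used in the study, and the threshold is an assumption.

```python
import numpy as np

def vegetation_mask(rgb, thr=0.1):
    """Mask vegetation with the Excess Green index (ExG = 2g - r - b on
    chromaticity-normalized bands). A hypothetical stand-in for manual
    vegetation masking; the 0.1 threshold is an assumption.

    rgb: (H, W, 3) float array in [0, 1]. Returns a boolean mask that is
    True where a pixel is classified as vegetation.
    """
    total = rgb.sum(axis=2, keepdims=True)
    chrom = rgb / np.maximum(total, 1e-8)   # r + g + b = 1 per pixel
    exg = 2 * chrom[..., 1] - chrom[..., 0] - chrom[..., 2]
    return exg > thr
```

Such a mask would catch strongly green vegetation like moss and tussocks, but not brown plant debris, which would still require manual cleanup.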
Figure 12

Factors leading to wrong predictions. (a) Wrong masking of vegetation (the white color is the mask, and annotated grains are colored) with appearing moss between grains, (b) tussocks, (c) vegetation debris, (e) small-scale warping and artifacts, and (f) sandy patches wrongly predicted along with a line of annotated grains.


Additionally, human impact should be considered when assessing changes in the granulometry of fluvial accumulations, even in natural areas away from settlements or stream modifications (Figure 12(d)). Granulometric analysis of point bars is performed at a detailed scale and with very high resolution, at the level of individual stones; thus, even low-intensity activity can change grain size composition and distribution. An example is the study area itself, located in a protected area of an NP but accessible to tourists who regularly alter the riverine accumulation environment. For instance, the human construction of stone cairns or other artificial structures can displace and remove gravels, resulting in dark or mixed gradients and significant heterogeneity in the gravel accumulation, ultimately affecting the accuracy of dm predictions.

Regarding the technical component, motion blur plays a significant role in image capture and can notably impact the quality and accuracy of the resulting images. Motion blur occurs when there is movement during image capture, caused for example by wind or the motion of the camera or subject, leading to blurring or smudging of details and ultimately affecting the sharpness and clarity of the image. Camera sensor noise, in turn, refers to unwanted random variations in brightness or color caused by the camera's electronic components, which can degrade image quality and interfere with subsequent image processing or analysis. Incorrect exposure settings occur when the camera's parameters, such as aperture, shutter speed, or ISO sensitivity, are not appropriately adjusted for the prevailing lighting conditions; this can result in overexposed or underexposed images, compromising the visibility of details and potentially leading to the information loss already discussed. In addition to these technical issues, adequate image resolution is crucial for accurately identifying and separating individual clasts and detecting their parameters. Lower-resolution images may lack sufficient texture information for precise PSD estimations. Poor-quality orthoimage maps further exacerbate the challenge, making it difficult to outline sediment objects accurately, as was partly the case in our study. Issues like small-scale warping, artifacts, incorrect image stitching, blurred gravel boundaries, and irregular grain shapes can arise from undistorted single nadir images (Figure 12(e)). To address these issues, it is important to conduct UAV flyovers slowly and steadily, to monitor scenes repeatedly by increasing the number of images and reconstructions, and to choose appropriate image acquisition formats such as RAW instead of JPEG.

In line with the findings of Lang et al. (2021), GRAINet's effectiveness diminishes for coarser gravel bars. This is corroborated by our study, where we observed underestimation and overestimation of grains within the range of 3–10 cm. The observed decrease in performance for grains <3 cm can be attributed to several factors. Firstly, the dataset has greater variability in this range, which can challenge accurate predictions (Lang et al. 2021). Additionally, the substantial influence of larger individual grains can affect the algorithm's overall performance (Lang et al. 2021), and an imbalanced data distribution may contribute to decreased performance. Another potential reason is the algorithm's limited ability to estimate dm at lower image resolutions, as highlighted by Lang et al. (2021). This limitation can impact the accuracy of the predictions, particularly when dealing with image resolutions up to 2 cm GSD, as in our case. To enhance the accuracy and expand the application of GRAINet, it is recommended to incorporate digital line samples that include finer sediment (Figure 12(f)) in the training and testing datasets; this additional data can help improve the algorithm's performance and enable more reliable predictions.

Limitations and future work

We tested the GRAINet algorithm using UAV orthoimagery to predict the grain size characteristics of an entire river point bar in terms of dm. Although the model was trained with adequate training data and the overall performance of GRAINet was satisfactory, grain sizes were consistently underestimated, particularly for grains smaller than 3 cm. This underestimation can be attributed to a lack of training data for smaller grains, leading the algorithm to misidentify cohesive sands, vegetation, and human impacts as grains. To address this limitation, additional training datasets from diverse environments are recommended. Nonetheless, the preparation of such datasets involves manual labeling and is time-consuming; even experienced operators may face challenges identifying grains, especially small grains <1 cm, in images with dense moss and vegetation debris. As with other DL methods, the smallest grain size identifiable by the GRAINet algorithm is constrained by the image resolution, and the size of the image tiles limits the grain pattern learned by the model. Further research is needed to explore these limitations in more depth. An additional improvement could be the creation of a dedicated digital data library for training, consisting of a diverse collection of river environmental scenarios that have not been previously encountered. By utilizing such data in training, the algorithm can learn to generalize and make accurate predictions in unseen river environments, improving its adaptability and performance in real-world applications.

Additionally, the steep learning curve involved in installing the model may deter researchers in the field from using it. Containerizing the model, as in this study (with Docker), and building extensive example applications along with robust documentation could lower this barrier.

Conclusions

This study assessed the practical field potential of the GRAINet CNN approach for predicting grain size characteristics, specifically mean diameter (dm), from UAV optical aerial imagery. By comparing model-estimated dm with manual annotations obtained through digital line sampling, GRAINet offers an innovative and efficient solution for particle size analysis on river-bed gravel bars. The research comprised eight UAV campaigns between 2014 and 2022, covering an entire gravel river point bar on the Javoří Brook, Šumava NP, Czech Republic, and the subsequent evaluation of granulometric changes on this dynamically shifting point bar using the GRAINet model. The resulting dm prediction maps were relatively accurate: the MAE (L1) from tenfold cross-validations ranged from 1.9 to 4.4 cm, while the MSE (L2) and RMSE varied between 7.13 and 27.24 cm² and 2.49 and 4.07 cm, respectively. Most models tended towards underestimation, particularly for smaller grains with dm <3 cm or even <1 cm, which proved challenging for the algorithm even at the high resolution (<2 cm) of the orthoimages. Nonetheless, the model captured the large-scale morphodynamics with agreeable accuracy. Given its ability to make predictions over large domains, the method and model show great promise for drastically reducing the time and resources needed to assess and analyze river morphodynamics at large scales compared with less automated alternatives.
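The three accuracy metrics reported here are standard regression losses; note that MSE carries squared units (cm²), while MAE and RMSE stay in cm. A minimal sketch with hypothetical per-tile dm values (not data from this study):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE (L1), MSE (L2) and RMSE for predicted vs. observed dm (cm)."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mae = np.mean(np.abs(err))   # mean absolute error, cm
    mse = np.mean(err ** 2)      # mean squared error, cm^2
    rmse = np.sqrt(mse)          # root mean square error, cm
    return mae, mse, rmse

# Hypothetical per-tile dm values (cm)
y_true = [4.0, 6.5, 9.0, 12.5]
y_pred = [3.5, 6.0, 8.0, 11.5]
mae, mse, rmse = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.2f} cm, MSE={mse:.2f} cm^2, RMSE={rmse:.2f} cm")
```

Because RMSE squares the errors before averaging, it penalizes large per-tile deviations more heavily than MAE; comparing the two hints at whether errors are uniform or dominated by outliers.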

Despite encountering limitations common to optical granulometry methods, such as varying light conditions (shadows, saturation, and overexposure), the GRAINet algorithm proved relatively robust. However, vegetation, cohesive sand, waterlogged image tiles, human impacts, small-scale warping, and image artifacts presented significant challenges. These factors can lead to misidentification of grains and introduce predictive errors, especially for fine sediment. Awareness of these complexities is crucial for addressing them and potentially enhancing the accuracy and reliability of the algorithm.

This method, and by extension the model, may also be combined with the more robust manual methods discussed in the first section, so as to assess river morphodynamics over a large model domain with agreeable accuracy while retaining increased accuracy at the point scale where necessary.

Creating a large, manually labeled training dataset is time-consuming but essential for the GRAINet CNN approach. Once trained, however, GRAINet requires no user-parameter tuning, which is particularly beneficial for transferring the model to different channel morphologies and granulometric properties.

Funding

The research was funded by the Czech Science Foundation project 22-12837S 'Hydrological and hydrochemical responses of montane peat bogs to climate change' and the Technology Agency of the Czech Republic project SS02030040.

Data availability statement

All relevant data are included in the paper or its Supplementary Information.

Conflict of interest

The authors declare no conflict of interest.

References

Buscombe D. 2020 Sedinet: a configurable deep learning model for mixed qualitative and quantitative optical granulometry. Earth Surface Processes and Landforms 45 (3), 638–651. https://doi.org/10.1002/esp.4760.
Buscombe D., Grams P. E. & Smith S. M. 2016 Automated riverbed sediment classification using low-cost sidescan sonar. Journal of Hydraulic Engineering 142 (2), 06015019.
Carbonneau P. E., Lane S. N. & Bergeron N. E. 2004 Catchment-scale mapping of surface grain size in gravel bed rivers using airborne digital imagery. Water Resources Research 40 (7). https://doi.org/10.1029/2003WR002759.
Chen X., Hassan M. A. & Fu X. 2021 CNN for image-based sediment detection applied to a large terrestrial and airborne dataset. Earth Surface Dynamics Discussions 2021, 1–30.
CHMI 2008 Water quality monitoring database. Czech Hydrometeorological Institute, Prague. http://chmi.cz. Accessed 20.6.2008.
Crutzen P. J. 2016 Geology of mankind. In: Paul J. Crutzen: A Pioneer on Atmospheric Chemistry and Climate Change in the Anthropocene (Crutzen, P. J. & Brauch, H. G., eds.). Springer, Switzerland, pp. 211–215.
CS231n n.d. Convolutional Neural Networks for Visual Recognition. Available from: https://cs231n.github.io/ (accessed 9 March 2023).
Detert M. & Weitbrecht V. 2012 Automatic object detection to analyze the geometry of gravel grains – a free stand-alone tool. In: River Flow 2012: Proceedings of the International Conference on Fluvial Hydraulics (Munoz, R. M., ed.). Taylor & Francis Group, London, pp. 595–600.
Ghalib A. M., Hryciw R. D. & Shin S. C. 1998 Image texture analysis and neural networks for characterization of uniform soils. In: Computing in Civil Engineering (Clayton, M. J., ed.). ASCE, Reston, VA, pp. 671–682.
Graham D. J., Rice S. P. & Reid I. 2005 A transferable method for the automated grain sizing of river gravels. Water Resources Research 41 (7). https://doi.org/10.1029/2004WR003868.
Graham D. J., Rollet A. J., Piégay H. & Rice S. P. 2010 Maximizing the accuracy of image-based surface sediment sampling techniques. Water Resources Research 46 (2). https://doi.org/10.1029/2008WR006940.
Guo Z., Zhang L. & Zhang D. 2010 A completed modeling of local binary pattern operator for texture classification. IEEE Transactions on Image Processing 19 (6), 1657–1663. https://doi.org/10.1109/TIP.2010.2044957.
Guo Y., Zhao G. & Pietikäinen M. 2011 Texture classification using a linear configuration model based descriptor. In: BMVC, pp. 1–10.
Hamzeloo E., Massinaei M. & Mehrshad N. 2014 Estimation of particle size distribution on an industrial conveyor belt using image analysis and neural networks. Powder Technology 261, 185–190. https://doi.org/10.1016/j.powtec.2014.04.038.
Haralick R. M., Shanmugam K. & Dinstein I. H. 1973 Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics (6), 610–621. https://doi.org/10.1109/TSMC.1973.4309314.
Harvey E. L., Hales T. C., Hobley D. E., Liu J. & Fan X. 2022 Measuring the grain-size distributions of mass movement deposits. Earth Surface Processes and Landforms 47 (6), 1599–1614.
He K., Zhang X., Ren S. & Sun J. 2016 Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. https://doi.org/10.1109/CVPR.2016.90.
Ioffe S. & Szegedy C. 2015 Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 448–456.
Jadon A., Patil A. & Jadon S. 2022 A comprehensive survey of regression based loss functions for time series forecasting. arXiv preprint arXiv:2211.02989.
Kingma D. P. & Ba J. 2017 Adam: a method for stochastic optimization (arXiv:1412.6980). arXiv. https://doi.org/10.48550/arXiv.1412.6980.
Krizhevsky A. & Hinton G. 2009 Learning multiple layers of features from tiny images. https://api.semanticscholar.org/CorpusID:18268744.
Krizhevsky A. 2014 One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997.
Krizhevsky A., Sutskever I. & Hinton G. E. 2012 ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, 1097–1105.
Kullback S. & Leibler R. A. 1951 On information and sufficiency. The Annals of Mathematical Statistics 22 (1), 79–86.
Lang N., Irniger A., Rozniak A., Hunziker R., Wegner J. D. & Schindler K. 2021 GRAINet: mapping grain size distributions in river beds from UAV images with convolutional neural networks. Hydrology and Earth System Sciences. https://doi.org/10.5194/HESS-25-2567-2021.
Langhammer J. & Vacková T. 2018 Detection and mapping of the geomorphic effects of flooding using UAV photogrammetry. Pure and Applied Geophysics 175 (9), 3223–3245. https://doi.org/10.1007/s00024-018-1874-1.
LeCun Y., Bengio Y. & Hinton G. 2015 Deep learning. Nature 521 (7553), 436–444.
Manashti J., Pirnia P., Manashty A., Ujan S., Toews M. & Duhaime F. 2023 PSDNet: determination of particle size distributions using synthetic soil images and convolutional neural networks. arXiv preprint arXiv:2303.04269. https://doi.org/10.48550/arXiv.2303.04269.
McFall B. C., Young D. L., Fall K. A., Krafft D. R., Whitmeyer S. J., Melendez A. E. & Buscombe D. 2020 Technical Feasibility of Creating a Beach Grain Size Database with Citizen Scientists. ERDC Coastal and Hydraulics Laboratory, Vicksburg, MS.
Mirijovský J. & Langhammer J. 2015 Multitemporal monitoring of the morphodynamics of a mid-mountain stream using UAS photogrammetry. Remote Sensing 7 (7). https://doi.org/10.3390/rs70708586.
Ohm H. S. & Hryciw R. D. 2014 Size distribution of coarse-grained soil by sedimaging. Journal of Geotechnical and Geoenvironmental Engineering 140 (4), 04013053. https://doi.org/10.1061/(ASCE)GT.1943-5606.0001075.
Ojala T., Pietikainen M. & Maenpaa T. 2002 Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7), 971–987. https://doi.org/10.1109/TPAMI.2002.1017623.
Pan S. J. & Yang Q. 2010 A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22 (10), 1345–1359.
Purinton B. & Bookhagen B. 2019 Introducing PebbleCounts: a grain-sizing tool for photo surveys of dynamic gravel-bed rivers. Earth Surface Dynamics 7 (3), 859–877. https://doi.org/10.5194/esurf-7-859-2019.
Purinton B. & Bookhagen B. 2021 Beyond vertical point accuracy: assessing inter-pixel consistency in 30 m global DEMs for the arid central Andes. Frontiers in Earth Science 9, 758606. https://doi.org/10.3389/feart.2021.758606.
Rakha T. & Gorodetsky A. 2018 Review of unmanned aerial system (UAS) applications in the built environment: towards automated building inspection procedures using drones. Automation in Construction 93, 252–264. https://doi.org/10.1016/j.autcon.2018.05.002.
Sanders N. R. 1997 Measuring forecast accuracy: some practical suggestions. Production and Inventory Management Journal 38 (1), 43.
Sharma K., Gold M., Zurbruegg C., Leal-Taixé L. & Wegner J. D. 2020 HistoNet: predicting size histograms of object instances. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3637–3645.
Sime L. C. & Ferguson R. I. 2003 Information on grain sizes in gravel-bed rivers by automated image analysis. Journal of Sedimentary Research 73 (4), 630–636.
Storz-Peretz Y. & Laronne J. B. 2013 Morphotextural characterization of dryland braided channels. Geological Society of America Bulletin 125 (9–10), 1599–1617. https://doi.org/10.1130/B30773.1.
Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V. & Rabinovich A. 2015 Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 7–12 June 2015. IEEE, New York, pp. 1–9. https://doi.org/10.48550/arXiv.1409.4842.
Szeliski R. 2011 Image processing. In: Computer Vision: Algorithms and Applications. Springer, pp. 87–180.
Tuceryan M. & Jain A. K. 1993 Texture analysis. In: Handbook of Pattern Recognition and Computer Vision (Chen, C. H., Pau, L. F. & Wang, P. S. P., eds.). World Scientific, Singapore, pp. 235–276.
Van Horn G., Branson S., Farrell R., Haber S., Barry J., Ipeirotis P., Perona P. & Belongie S. 2015 Building a bird recognition app and large scale dataset with citizen scientists: the fine print in fine-grained dataset collection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 7–12 June 2015. IEEE, New York, pp. 595–604.
Vlasák T. 2003 Overview and classification of historical floods in the Otava river basin. Acta Universitatis Carolinae – Geographica 38 (2), 49–64.
Wang H., Dalton L., Fan M., Guo R., McClure J., Crandall D. & Chen C. 2022 Deep-learning-based workflow for boundary and small target segmentation in digital rock images using UNet++ and IK-EBM. Journal of Petroleum Science and Engineering 215, 110596. https://doi.org/10.1016/j.petrol.2022.110596.
Yaghoobi H., Mansouri H., Farsangi M. A. E. & Nezamabadi-Pour H. 2019 Determining the fragmented rock size distribution using textural feature extraction of images. Powder Technology 342, 630–641. https://doi.org/10.1016/j.powtec.2018.10.006.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).
