Abstract
This study uses the GRAINet CNN approach on UAV optical aerial imagery to analyze and predict grain size characteristics, specifically the mean diameter (dm), along a gravel river point bar in Šumava National Park (Šumava NP), Czechia. By employing a digital line sampling technique and manual annotations as ground truth, GRAINet offers an innovative solution for particle size analysis. Eight UAV overflights were conducted between 2014 and 2022 to monitor changes in grain size dm across the river point bar. The resulting dm prediction maps were reasonably accurate, with Mean Absolute Error (MAE) values ranging from 1.9 to 4.4 cm in tenfold cross-validations. Mean Squared Error (MSE) and Root Mean Square Error (RMSE) values varied from 7.13 to 27.24 cm² and from 2.49 to 4.07 cm, respectively. Most models underestimated grain size, with around 68.5% of predictions falling within 1σ and 90.75% within 2σ of the manually annotated mean dm. However, deviations from actual grain sizes were observed, particularly for grains smaller than 5 cm. The study highlights the importance of a large manually labeled training dataset for the GRAINet approach, which eliminates the need for user-parameter tuning and improves its suitability for large-scale applications.
HIGHLIGHTS
Assessing the effectiveness of a cutting-edge deep learning algorithm in predicting mean diameter (dm) from an individual UAV-based orthophoto.
Creating robust maps to predict spatial and temporal variations in mean dm across an entire point bar over time.
The method supports efficient decision-making by reducing reliance on laborious and resource-intensive field sampling techniques.
The method demonstrates robustness against light condition impacts.
INTRODUCTION
Detecting changes in fluvial environments is challenging, mainly due to limited spatial and temporal observation of crucial fluvial characteristics, such as grain size data from gravel bars and river channels. Analyzing grain size data provides valuable insights into the interactions between water flow, sediment transport, and the dynamic evolution of rivers. Therefore, it is crucial to map the sediment composition of river-bed surfaces, focusing on gravel- or cobble-bed streams, and conduct comprehensive analyses of river dynamics. This understanding is essential for studying how rivers respond to environmental factors and human activities, especially in the Anthropocene era (Crutzen 2016).
Traditional field-based techniques, such as mechanical sieving, grid-by-number, pebble count method, or line sampling, have historically dominated river surveys (Harvey et al. 2022). However, these techniques have limitations that restrict their application at the network scale. They often struggle to ensure accurate and repeatable monitoring assessments (Purinton & Bookhagen 2019) and may pose risks to human safety. Consequently, fluvial scientists and geomorphologists seek more efficient and reproducible methods for grain size measurements.
In recent years, utilizing unmanned aerial vehicles (UAVs) in photogrammetry has revolutionized environmental monitoring techniques and gained popularity. UAVs offer numerous advantages, including affordability, user-friendliness, consistency, and reliability. These benefits bridge the gap between traditional field observations and remote sensing methods using aircraft or satellite platforms (Rakha & Gorodetsky 2018). Researchers have explored various approaches that leverage UAV imagery and image-processing techniques to advance granulometric analysis based on remotely sensed observations. These efforts encompass a range of methodologies, including earlier techniques like photo sieving or image-based methods, which provide a quick, scalable, non-contact, and cost-effective means of granulometric analysis, such as the estimation of particle size distribution (PSD) or specific grain size characteristics (e.g., D50 or the mean particle diameter, dm).
However, these methods introduce errors that arise from the nature of the observations used for analysis. Remotely sensed imagery typically provides direct information about the material's surface alone, thus measuring only visible grains, and is limited by image complexity (e.g., vegetation obstructions, variation in color and texture) (Harvey et al. 2022; Manashti et al. 2023). Image analysis techniques can be categorized as direct or indirect methods. Direct methods employ image segmentation for grain size detection by isolating and measuring the visible axes of individual grains in an image. Examples of direct methods include those applied by Graham et al. (2005, 2010), Storz-Peretz & Laronne (2013), Detert & Weitbrecht (2012; BASEGRAIN software), Purinton & Bookhagen (2019, 2021; PebbleCounts software), and several commercial software packages.
As listed above, many commercial software packages use direct methods to determine PSD. On the other hand, indirect methods extract textural features from 2D or 3D images to estimate PSD based on textural characteristics and the spatial arrangement of pixel intensities. Indirect methods can be further classified into statistical, local-pattern, and transform-based texture approaches. Statistical methods utilize properties of pixel intensity statistics, such as Haralick features (Haralick et al. 1973). Local pattern features establish relationships between the gray level of each pixel and its neighboring pixels, including techniques such as local binary patterns (Ojala et al. 2002), local configuration patterns (Guo et al. 2011), and completed local binary patterns (Guo et al. 2010). Transform-based features describe image texture in the frequency domain, employing techniques such as wavelet transforms (e.g., Buscombe 2013), Fourier transforms (e.g., Szeliski & Szeliski 2011; Yaghoobi et al. 2019), and Gabor filters (e.g., Tuceryan & Jain 1993; Yaghoobi et al. 2019). Most of these approaches involve comparing manually sieved grain size data with well-sorted and clean sediment imagery (Ohm & Hryciw 2014) and typically require site-specific calibration to establish relationships between texture and grain size at each location.
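To illustrate the statistical branch of these texture-based approaches, the short sketch below computes gray-level co-occurrence (Haralick-type) features from a grayscale tile using scikit-image; the tile path and the chosen offsets are hypothetical, and such features would still require site-specific calibration against measured grain sizes, as noted above.

```python
# Example of statistical (GLCM / Haralick-type) texture features from a grayscale tile,
# computed with scikit-image (function names per skimage >= 0.19; hypothetical tile path).
import numpy as np
from skimage import io, color
from skimage.feature import graycomatrix, graycoprops

tile = io.imread("example_tile.tif")                     # hypothetical RGB tile
gray = (color.rgb2gray(tile) * 255).astype(np.uint8)

# Gray-level co-occurrence matrix for two pixel offsets and two orientations.
glcm = graycomatrix(gray, distances=[1, 3], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)

features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}
print(features)  # texture statistics that indirect methods relate to grain size
```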
Suboptimal environmental conditions can be partially overcome by combining indirect methods with artificial neural networks (ANNs) to estimate grain sizes. Ghalib et al. (1998) employed Haralick features to predict particle size, while Yaghoobi et al. (2019) utilized Fourier transforms, Gabor filters, and wavelet transforms for PSD prediction. In a comparative study by Manashti et al. (2021) involving nine feature extraction methods, ANNs were used to predict PSD, achieving RMSE values ranging from 4.8 to 6.6%. Some researchers have also integrated direct methods with ANNs by incorporating segmented area statistics for PSD determination (Hamzeloo et al. 2014). However, traditional ANNs often require preprocessing techniques to extract relevant parameters, such as textural parameters, from imagery, which can be time-consuming. Additionally, these methods necessitate large amounts of input data.
More recently, there has been a growing interest in utilizing Convolutional Neural Networks (CNNs) for indirect PSD determination. CNNs offer powerful capabilities for image processing by leveraging convolutions with trained filter banks and non-linear activation functions. These Deep Learning (DL) methods, with their millions of weight parameters, have demonstrated significant success in various image analysis domains (e.g., Buscombe 2020; McFall et al. 2020; Lang et al. 2021). Notably, recent advancements have led to the development of highly effective CNN architectures such as AlexNet (Krizhevsky et al. 2012), GoogLeNet (Szegedy et al. 2015), ResNet (He et al. 2016), and others, showcasing exceptional performance in object recognition tasks. This progress has sparked further exploration of CNNs for PSD estimation, pushing the boundaries of indirect analysis techniques using UAV imagery.
SediNet, a CNN developed by Buscombe (2020), is a notable recent addition to sediment image classification methods. SediNet is specifically designed to classify sediment images based on their size characteristics. The network architecture consists of four convolutional blocks trained using manual segmentation of the images as ground truth for PSD estimation. While SediNet achieved reasonably accurate predictions, with Root Mean Square Error (RMSE) values ranging from 16 to 45% for grain size percentiles and millimeter sieve sizes, it is essential to note that the model may be prone to overfitting due to the limited size of the dataset. In a study by McFall et al. (2020) focusing on predicting the PSD of beach sands, several analysis approaches were compared, including direct analysis, transform-based (wavelet) feature extraction, and a CNN. The results showed that direct analysis had a mean percentage error of 34.2% on D50, which was reduced to 13.0% with certain modifications. The wavelet feature extraction method had a mean percentage error of 36.4% on D50, while the CNN approach achieved a mean percentage error of 22% on D50. As is evident from the literature, CNNs tend to deliver better estimation results relative to their alternatives.
Introducing a groundbreaking approach in CNN-based sediment analysis, GRAINet, developed by Lang et al. (2021), revolutionizes the prediction of PSD and the mapping of grain size metrics, such as the mean diameter (dm), using UAV images. This research tackles the challenge of estimating dm across extensive gravel bars in riverbeds, providing high spatial resolution over large-scale coverage. GRAINet surpasses many limitations of previous studies by delivering grain size curves and maps encompassing a wide geographical area. By extracting global features that transcend human image interpretation and traditional photosieving methods, GRAINet demonstrates exceptional capability in accurately assessing grain size characteristics. Notably, the recent work by Purinton & Bookhagen (2019, 2021) cannot capture individual grains below a specific size threshold, while statistical approaches such as Carbonneau et al. (2004) are restricted by input resolution, limiting their predictive power to D50 values above 3 cm. GRAINet successfully overcomes these limitations, marking a significant advancement in the field.
Despite the considerable progress made in computer vision, digital granulometric analysis still has a good deal of room for improvement. One of the primary challenges arises from the variability in the quality of the input photographic data. Factors such as varying lighting conditions, different imaging heights, and the use of various sensors during field surveys can lead to inconsistent data quality, variations in estimation accuracy, and thus poorly reproducible results. Considering these challenges, our study aims to explore the potential and practicalities of utilizing low-altitude UAV orthoimagery with the end-to-end indirect granulometric analysis approach GRAINet (Lang et al. 2021). We focus on automating the estimation of the mean particle diameter (dm) on an exposed river point bar located on the Javoří Brook stream within the Šumava National Park (Šumava NP) in Czechia.
To assess the reliability of the GRAINet algorithm in grain size estimation under variable conditions, we selected eight scenarios spanning the years 2014–2022. These scenarios encompass UAV imagery captured under different lighting conditions. Our objectives are:
1. to test the method's performance in grain size estimation,
2. to evaluate the influence of individual input data on the estimation quality, and
3. to assess the practicality and potential of employing the GRAINet method for automated granulometric analysis over large spatial domains, utilizing UAV imagery.
By conducting this comprehensive analysis, we aim to contribute to the understanding of utilizing UAV orthoimagery along with advanced CNN techniques in granulometric studies.
MATERIALS AND METHODS
Study site
Image acquisition
| Mikrokopter OctoXL | DJI Inspire 1 Pro | DJI Matrice 210 RTK |
|---|---|---|
| Since 2014 | Since 2015 | Since 2018 |
| Panasonic Lumix GX7 | Zenmuse X5 camera | Zenmuse X4s camera |
| MFT sensor | MFT sensor | 1″ sensor |
| 16 MPx, 4,608 × 3,456 | 16 MPx, 4,608 × 3,456 | 20 MPx |
| Panasonic lens 24 mm | Olympus M. Zuiko ED lens 24 mm | Fixed lens 28 mm |
| GLONASS + GPS | GLONASS + GPS | RTK navigation |
| Tile name | Platform | Acquisition date | Tile size (pixels) | GSD (mm/pixel) |
|---|---|---|---|---|
| Ortho 1 | MikroKopter XL | 20.05.2014 | 284 × 712 | ∼1.75 |
| Ortho 2 | MikroKopter XL | 04.12.2015 | 427 × 1,073 | ∼1.40 |
| Ortho 3 | DJI Inspire 1 Pro | 14.09.2016 | 356 × 891 | ∼1.40 |
| Ortho 4 | DJI Inspire 1 Pro | 10.05.2017 | 335 × 839 | ∼1.49 |
| Ortho 5 | DJI Matrice 210 RTK | 31.10.2018 | 269 × 676 | ∼1.85 |
| Ortho 6 | DJI Matrice 210 RTK | 05.12.2018 | 222 × 559 | ∼2.80 |
| Ortho 7 | DJI Matrice 210 RTK | 12.11.2021 | 319 × 800 | ∼1.56 |
| Ortho 8 | DJI Matrice 210 RTK | 22.10.2022 | 373 × 935 | ∼1.33 |
Photogrammetric processing
Using Agisoft Metashape (v1.6 Pro) software, we created topographic RGB Structure-from-Motion (SfM) models. We eliminated blurry and distorted images from our dataset by estimating their quality within Metashape; images with a quality score below 0.6 were identified and removed from further analysis. We chose the highest quality settings available in Metashape for the initial alignment and the subsequent filtering of the tie point clouds. To align the models and standardize camera modeling, we used GCPs and optimized the standard camera calibration parameters: focal length (f), radial distortion (k1, k2, k3), principal point offset (cx, cy), and one decentering parameter (p1), excluding p2. RGB dense clouds were converted into orthomosaics based on the Digital Surface Models (DSMs) and were georeferenced to the EPSG:4326 coordinate system (WGS 84). The orthophoto mosaics were exported as GeoTiff files to facilitate digital grain size measurements, specifically the digital dm of the entire inner river point bar and its temporal changes, excluding vegetation-covered sections.
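For readers who script this workflow rather than use the GUI, the sketch below outlines the equivalent steps in the Metashape Python API, including the 0.6 image-quality threshold described above. The call names follow the v1.6 Pro API as we recall it and may differ between versions, so treat the snippet as an assumption-laden illustration rather than the exact processing script used here.

```python
# Sketch of the SfM workflow described above, written against the Metashape Pro Python API.
# Method names and defaults vary between Metashape versions; treat this as illustrative only.
import glob
import Metashape

doc = Metashape.Document()
chunk = doc.addChunk()
chunk.addPhotos(glob.glob("flight_2022/*.JPG"))   # hypothetical image folder

# Estimate per-image quality and disable blurry/distorted frames (threshold 0.6, as in the text).
chunk.analyzePhotos()
for camera in chunk.cameras:
    if float(camera.meta["Image/Quality"]) < 0.6:
        camera.enabled = False

# Alignment (highest quality settings selected in our workflow), dense reconstruction,
# DSM and orthomosaic generation; GCP markers and the camera calibration parameters
# (f, k1-k3, cx, cy, p1) were handled in the GUI and are omitted here.
chunk.matchPhotos()
chunk.alignCameras()
chunk.buildDepthMaps()
chunk.buildDenseCloud()
chunk.buildDem()
chunk.buildOrthomosaic()
chunk.exportRaster("ortho_2022.tif", source_data=Metashape.OrthomosaicData)  # GeoTiff export
```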
GRAINet architecture
The open-source CNN Python package GRAINet (Lang et al. 2021) is specifically designed to estimate granulometric grading curves (cumulative volume distributions) and grain size characteristics (e.g., dm) directly from georeferenced UAV orthoimagery. Its architecture adopts the network proposed by Sharma et al. (2020), which is based on the ResNet50 presented by He et al. (2016). The general architecture design follows the guidelines documented in the CS231n Convolutional Neural Networks for Visual Recognition course notes (n.d.). An in-depth overview and description of the final GRAINet CNN architecture is provided by Lang et al. (2021).
To summarize, GRAINet uses a relatively simple network structure (Krizhevsky & Hinton 2009; Krizhevsky 2014) that has proven effective in various image analysis tasks. It includes a single 3 × 3 entry convolutional layer followed by six residual blocks (three convolutional blocks and three identity blocks). Each block contains three convolutional layers followed by batch normalization and ReLU layers. The network ends with a 1 × 1 convolutional layer, average pooling, and a SoftMax layer that produces the output distribution. The advantage of the GRAINet model architecture is that it demands neither a pixel-accurate object mask nor a count map for training, as used by others (e.g., Sharma et al. 2020), which are laborious to annotate manually. Instead, GRAINet is trained to regress the PSD end-to-end. Labeling new training data thus becomes much more efficient because there is no longer a need to obtain pixel-accurate object labels. The GRAINet model learns to assess object size frequencies by looking at large image tiles without access to explicit object counts or locations (Lang et al. 2021). In addition, the batch normalization used in the architecture allows faster training, higher learning rates, and less careful initialization (Ioffe & Szegedy 2015). For these reasons, we applied this open-access tool to generate dm estimation variability maps directly from UAV-generated orthoimages, considering only image tiles and eliminating the need for manual sieving and classification of sediment samples. As we encountered issues running the Python package via the Anaconda environment, as advised, we dockerized the installation environment by pre-installing dependencies and then ran the model instances in a Docker container (GitHub link: https://github.com/veethahavya-CU-cz/psd_analysis_grainet.git). By employing containerization, we achieved a more efficient and reliable execution of the model, ensuring a streamlined process throughout.
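To make the structure described above concrete, the following is a simplified Keras sketch of a GRAINet-style network: a 3 × 3 entry convolution, a stack of residual blocks, and a 1 × 1 convolution followed by average pooling and a softmax over the histogram bins. The block layout, filter counts, and the assumption of 21 output bins are simplifications for illustration; the reference implementation is the authors' GRAINet package.

```python
# Simplified GRAINet-style network (illustrative only; the reference implementation
# is the authors' GRAINet repository).
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Residual block: three conv layers with batch normalization and ReLU."""
    shortcut = x
    y = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 1, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if shortcut.shape[-1] != filters:                 # project shortcut when widths differ
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    return layers.ReLU()(layers.Add()([y, shortcut]))

def build_grainet_like(n_bins=21, tile_shape=(None, None, 3)):
    inputs = layers.Input(shape=tile_shape)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)  # 3x3 entry conv
    for filters in (64, 64, 128, 128, 256, 256):                          # six residual blocks
        x = residual_block(x, filters)
    x = layers.Conv2D(n_bins, 1, padding="same")(x)                       # 1x1 conv to n_bins maps
    x = layers.GlobalAveragePooling2D()(x)                                 # spatial average pooling
    outputs = layers.Softmax()(x)                                          # relative frequency per bin
    return tf.keras.Model(inputs, outputs)

model = build_grainet_like()
model.summary()
```

In such a histogram formulation, a mean diameter can be derived as the frequency-weighted average of the bin centers.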
Model setup and learning
To facilitate the generation of input data for the GRAINet model and enable accurate predictions for individual rasters, we needed to create an npz-file containing dm-frequency histograms, image tiles, tile names, and mean dm values extracted through manual annotation. These components are crucial for the algorithm to process the data accurately and estimate grain sizes for each raster. Since the algorithm's authors did not provide an npz-file generator, we created our own version, accessible through the GitHub link given earlier.
In our script, we quantized the b-axis measurements of all rectangles extracted from one image tile into 21 bins to create a histogram. This process follows the relative frequency distribution of grain sizes described by Lang et al. (2021). Additionally, the image tiles needed to be precisely cropped to ensure they were all the same size and resolution. We saved them in GeoTiff format in a single folder with sequential enumeration, such as "test_tile_1," "test_tile_2," and so on. This meticulous preparation of the image tiles simplifies the encoding of the metric scale into the CNN output, making it easier for the model to learn and estimate the dm and grain size distribution. Suitable tools in QGIS or other image-processing software may be used to accomplish this. The dimensions of the image tiles varied, ranging from a minimum of 222 × 559 pixels to a maximum of 427 × 1,073 pixels, depending on the resolution of the orthoimage used to create the 0.5 × 1.25 m grid (Figure 4). We also saved the corresponding shapefiles containing the mean dm values extracted through polygon annotation; these values were additionally stored in a separate folder in MS Excel format. Individual masks were manually created for each raster to consider only the regions covered by grains and not by obstructions such as vegetation. The final input for the GRAINet model thus comprised the npz-file, the orthoimage, and the mask.
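Because the original package does not ship an npz generator, the sketch below shows the core of our binning-and-packaging step: quantizing the annotated b-axes into a 21-bin relative frequency histogram per tile and bundling it with the tile image, tile name, and mean dm. The array key names, bin edges, and file names are illustrative assumptions; the full script is available in the GitHub repository linked above.

```python
# Minimal sketch of building the training npz from annotated tiles (illustrative;
# key names, bin edges, and file names are assumptions -- see our repository for the full script).
import numpy as np
import rasterio  # assumed available for reading the GeoTiff tiles

N_BINS = 21
bin_edges = np.linspace(0.0, 10.5, N_BINS + 1)   # cm; assumed upper bound for this site

def tile_record(tile_path, b_axes_cm):
    """Return the image array, dm-frequency histogram, and mean dm for one tile."""
    with rasterio.open(tile_path) as src:
        image = np.moveaxis(src.read(), 0, -1)   # (bands, H, W) -> (H, W, bands)
    hist, _ = np.histogram(b_axes_cm, bins=bin_edges)
    hist = hist / hist.sum()                     # relative frequency distribution
    return image, hist, float(np.mean(b_axes_cm))

# Example for two hypothetical tiles with their manually digitized b-axes (cm):
annotations = {
    "test_tile_1.tif": [2.1, 3.4, 4.0, 5.2],
    "test_tile_2.tif": [1.8, 2.9, 3.3, 6.1],
}
images, hists, dms, names = [], [], [], []
for path, b_axes in annotations.items():
    img, hist, dm = tile_record(path, b_axes)
    images.append(img); hists.append(hist); dms.append(dm); names.append(path)

np.savez("training_data.npz",
         images=np.stack(images), histograms=np.stack(hists),
         mean_dm=np.array(dms), tile_names=np.array(names))
```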
For the GRAINet tool, a standard preprocessing step involves normalizing the RGB bands of the image tiles. This process scales the pixel intensities of the image to a standard range, aiding faster and more stable model convergence during training. By centering the data around a mean value close to 0 and scaling it to a standard deviation of 1, the pixel value distribution conforms to a standard normal distribution. As Lang et al. (2021) describe, this normalization technique can enhance the performance of gradient-based optimization algorithms employed in training CNNs. Furthermore, to successfully apply the GRAINet model, it is essential to choose appropriate hyperparameters related to the CNN architecture and optimization procedure. These hyperparameters include the learning rate, batch size, and number of epochs. The learning rate determines the step size for updating the model parameters during training, with random batches drawn from the training data. The batch size is the number of training examples presented in a single batch. The number of epochs specifies the number of complete passes through the training dataset during gradient descent. For training the GRAINet model, the Adam optimizer is employed. Adam is an adaptive stochastic optimization technique specifically designed for training deep neural networks; it adjusts the magnitude and speed at which each parameter is updated, controlled by the learning rate (Kingma & Ba 2017). The learning rate influences how the gradient of the loss function modifies the network weights, so selecting an appropriate learning rate is crucial. If the learning rate is set too large, the training process may oscillate and become unstable, with each iteration overshooting and diverging on either side of the optimal value (Wang et al. 2022). Conversely, if the learning rate is too low, convergence is slow and overfitting may occur (Wang et al. 2022). The advantage of the Adam optimizer lies in its adaptive adjustment of different parameters. During the training process, we kept the hyperparameters for the Adam optimizer constant, adhering to the proposed default values; specifically, we maintained a fixed learning rate of 0.0003 throughout training. A sensitivity analysis of these hyperparameters, however, falls outside the scope of this study; for details, we refer the reader to the original study.
We conducted experiments with different parameter configurations to optimize the model's performance. For example, we tested larger batch sizes, such as six or eight image tiles, and extended training to 150 epochs. However, we found that these alternative settings reduced the accuracy of the results and increased the computational cost. As a result, we maintained a batch size of four and trained the models for 100 epochs, even though individual model runs took several hours or even days to train without parallelization across multiple GPUs.
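The band normalization described above and the final training configuration (Adam with a fixed learning rate of 0.0003, a batch size of four, and 100 epochs) can be summarized in a Keras call as below, reusing the `model` from the architecture sketch above. The dummy arrays and the generic loss are placeholders only; in practice, GRAINet's own data loader and loss functions are used.

```python
# Hyperparameters used in this study, expressed as a Keras compile/fit call
# (illustrative; GRAINet's own training loop and losses are used in practice).
import numpy as np
from tensorflow.keras.optimizers import Adam

def normalize_rgb(tiles):
    """Standardize each band to zero mean and unit variance (preprocessing step above)."""
    tiles = tiles.astype("float32")
    mean = tiles.mean(axis=(0, 1, 2), keepdims=True)
    std = tiles.std(axis=(0, 1, 2), keepdims=True)
    return (tiles - mean) / (std + 1e-8)

# Placeholder data standing in for the npz contents (image tiles and 21-bin histograms).
x = normalize_rgb(np.random.randint(0, 255, size=(8, 64, 160, 3)))
y = np.random.dirichlet(np.ones(21), size=8)

model.compile(optimizer=Adam(learning_rate=3e-4),   # fixed learning rate of 0.0003
              loss="mean_absolute_error")            # placeholder loss for illustration
model.fit(x, y, batch_size=4, epochs=100)            # four tiles per batch, 100 epochs
```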
Model performance evaluation
Model performance was evaluated using three standard error metrics (computed as in the sketch following this list):

- the mean absolute error (MAE), also known as the L1 loss function,
- the mean squared error (MSE), also known as the L2 loss function, and
- the root-mean-square error (RMSE).
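A minimal sketch of how these three metrics are computed from annotated and predicted mean diameters follows; `y_true` and `y_pred` are hypothetical per-tile arrays of ground-truth and predicted dm in centimeters.

```python
# MAE (L1), MSE (L2) and RMSE between annotated and predicted mean diameters.
import numpy as np

def error_metrics(y_true, y_pred):
    residuals = np.asarray(y_pred) - np.asarray(y_true)
    mae = np.mean(np.abs(residuals))          # mean absolute error, cm
    mse = np.mean(residuals ** 2)             # mean squared error, cm^2
    rmse = np.sqrt(mse)                       # root-mean-square error, cm
    return mae, mse, rmse

mae, mse, rmse = error_metrics([4.5, 3.8], [4.1, 4.0])  # example values in cm
```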
RESULTS
Model performance compared with digital ground truth data
We first tested the GRAINet approach to predict the dm grain size characteristics of an entire gravel point bar between 2014 and 2022 to track its changes. A comprehensive labeling process was undertaken to provide an overview of the annotated ground truth data utilized for model training and evaluation. A total of 65,836 grains, encompassing a diverse spectrum of sizes, were meticulously annotated across 761 tiles (Table 3). As displayed in Table 3, the smallest observed particle in the dataset had a dm of approximately 1.4 cm, while the largest particle measured nearly 10 cm in dm. The mean dm values ranged from 3.5 to 5.6 cm, with an overall mean dm of 4.15 cm for all particles. The per-orthoimage range of particle sizes varied between 3.3 and 8.0 cm in dm. Due to the limited sensitivity of the algorithm in predicting particles smaller than gravel, only samples of gravel to cobble size were identified and labeled, even though sandy patches were present away from the channel and a sand-dominated area occurred down-bar around the bend. We chose challenging samples that were not clean, including natural disturbances such as grass, moss, small wooden branches, and slightly flooded river-bed zones, to train a robust CNN.
| Tile name | Number of labeled tiles | Sample size (n) | Mean dm (cm) | Min dm (cm) | Max dm (cm) | Range dm (cm) |
|---|---|---|---|---|---|---|
| Ortho 1 | 70 | 7,740 | 4.5 | 2.1 | 6.5 | 4.4 |
| Ortho 2 | 112 | 8,212 | 4.5 | 2.3 | 7.4 | 7.4 |
| Ortho 3 | 125 | 10,935 | 4.0 | 1.4 | 6.5 | 5.1 |
| Ortho 4 | 90 | 7,726 | 3.9 | 1.9 | 9.9 | 8.0 |
| Ortho 5 | 102 | 9,178 | 3.8 | 1.8 | 5.9 | 4.1 |
| Ortho 6 | 100 | 8,472 | 3.5 | 1.5 | 8.1 | 6.6 |
| Ortho 7 | 100 | 7,896 | 5.6 | 3.9 | 7.2 | 3.3 |
| Ortho 8 | 62 | 5,677 | 3.7 | 1.8 | 6.0 | 4.2 |
Additionally, the ground truth statistics of the characteristic mean diameter (dm) are provided.
The performance of dm estimated by GRAINet was compared to the manually annotated tiles, and the estimated values for the entire dataset were analyzed. The results show that for 547 tiles, accounting for approximately 68.5% of the dataset, the predicted values fell within 1σ (one standard deviation) of the annotated values. Furthermore, for 726 tiles, representing approximately 90.75% of the dataset, the predicted values fell within 2σ (two standard deviations). Table 4 displays the corresponding percentages of estimated dm values lying within 1σ and 2σ for each orthoimage; a sketch of how these shares can be computed follows the table.
| Tile name | 1σ (%) | 2σ (%) |
|---|---|---|
| Ortho 1 | 74.29 | 97.14 |
| Ortho 2 | 90.18 | 97.32 |
| Ortho 3 | 77.6 | 95.2 |
| Ortho 4 | 79.41 | 95.1 |
| Ortho 5 | 60.0 | 99.0 |
| Ortho 6 | 65.22 | 94.57 |
| Ortho 7 | 70.0 | 94.44 |
| Ortho 8 | 54.84 | 100 |
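The shares reported above and in Table 4 can be reproduced with a simple tolerance check, sketched below. Here `dm_true`, `dm_pred`, and `sigma` are hypothetical per-tile arrays of annotated mean dm, predicted mean dm, and the standard deviation of the annotated grain sizes in each tile; how σ is defined per tile is assumed here for illustration.

```python
# Fraction of tiles whose predicted mean dm lies within 1 and 2 standard deviations
# of the annotated value (sigma assumed per tile from the annotated b-axis distribution).
import numpy as np

def sigma_coverage(dm_true, dm_pred, sigma):
    err = np.abs(np.asarray(dm_pred) - np.asarray(dm_true))
    within_1 = np.mean(err <= 1.0 * np.asarray(sigma)) * 100  # percent within 1 sigma
    within_2 = np.mean(err <= 2.0 * np.asarray(sigma)) * 100  # percent within 2 sigma
    return within_1, within_2
```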
GRAINet model prediction accuracies
GRAINet's dm prediction maps
However, upon closer examination of the distribution histograms of dm (Figure 9), we gain insights into possible discrepancies when using this algorithm for primary mapping purposes. For example, the predicted dm map of 2014 (ortho 1) shows a relatively normal distribution of grain sizes, suggesting infrequent or stable flood occurrences before 2014. However, it is essential to note that Mirijovský & Langhammer (2015) documented a significant flood event in June 2013, equivalent to a 2–5-year flood. This event was triggered by intense rainfall in the basin, which was already saturated due to snowmelt, and might have flushed coarser sediment downstream, temporarily reducing the coarse sediment load during that period. Subsequent peak flow events triggered by summer rainstorms were brief and did not reach a discharge of 13 m³/s.
In contrast, in 2015 (ortho 2; Figure 9), several flash floods impacted the stream, leading to significant terrain changes. These changes are evident in the noticeable differences between the predicted dm maps of 2014 (ortho 1) and 2015 (ortho 2), indicating a slight increase in the coarser sediment load (Figure 9) and following a distinct sediment deposition pattern, with coarser material at the head of the bar and finer sediment settling downstream. However, there are instances where the predicted dm maps show discrepancies with the actual sediment distribution, suggesting the need for improved accuracy. In particular, the algorithm faced challenges in accurately identifying particles in the samples, particularly in ortho 3 and ortho 4, resulting in higher sediment heterogeneity over the point bar (Figure 10) and mispredictions. In 2018 (ortho 5), a noticeable increase in vegetation cover was observed, while the predicted dm maps show relatively stable grain size characteristics (Figure 10). The reduced morphological activity of the point bar over time, with narrower sediment accumulation, is even more evident in 2021 (ortho 7; Figure 10). As of 2022, the observable dynamics remain low, indicating that the morphological changes of the bar have stabilized (ortho 8; Figure 10).
DISCUSSION
Error analysis
Our study's main findings demonstrate that the GRAINet algorithm's predictions for dm showed reasonable results when evaluating specific cases on a single river point bar at the Javoří Brook over multiple periods. However, it is important to note that not all models performed equally well, and there were cases of misleading predictions. Several factors could contribute to these discrepancies.
The sources of error in image-based grain size measurements can be diverse, as discussed in the study by Chen et al. (2022). One fundamental error arises from estimating the three-dimensional (3D) properties of grains based on their two-dimensional (2D) representation in images. For example, accurately determining the vertical axis of a grain solely from an image is challenging. Algorithmic errors can also occur due to limitations in the image-processing algorithm used. Environmental errors may arise from suboptimal conditions during the image procurement, such as vegetation, noise between grains, and inadequate lighting. These factors can lead to misidentifying grain boundaries and result in measurement errors. Image quality errors can be introduced by the size of image tiles and the image resolution. The image resolution affects the ability to detect small grain sizes, and the choice of image tile size can impact measurement accuracy. Additionally, errors can arise from grain characteristics, including the distribution of grain sizes, irregular grain shapes, and image distortions. A wide distribution of grain sizes can lead to larger errors, especially when accurately detecting smaller grains. While the fundamental errors have been extensively discussed in previous literature (Graham et al. 2010), their impact on the final aggregated prediction results of grain size characterization is likely to be minimal (Sime & Ferguson 2003; Graham et al. 2005; Detert & Weitbrecht 2012; Lang et al. 2021).
In this discussion, we focus on the benefits and drawbacks of CNN, manual labeling, and training techniques utilized by GRAINet. Additionally, we will examine how environmental errors might impact the model's ability to make accurate predictions.
Using CNNs with high-resolution UAV imagery for granulometric analysis in fluvial geomorphology is an efficient and promising approach, providing several advantages over conventional and optical methods. CNN-based models, such as GRAINet, can handle large amounts of data effectively (Yan et al. 2016). GRAINet, specifically, can rapidly process vast volumes of UAV orthoimagery with high spatial resolution, reducing the reliance on manual field measurements and enabling analysis of extensive study areas (Lang et al. 2021). The CNN-based GRAINet model capitalizes on the inherent capability of CNNs to capture complex spatial features and patterns, resulting in accurate classification and segmentation of grain sizes within UAV imagery and thus improved granulometric measurements (Krizhevsky et al. 2012). In contrast, traditional optical granulometry methods often employ simple image-processing techniques, such as edge detection and thresholding, which may struggle to capture the nuanced grain size variations observed in natural environments (Buscombe et al. 2016). CNN-based approaches also offer high adaptability and transferability, allowing the trained model to be fine-tuned for different granulometric settings with minimal additional training data, facilitating cost-effective model deployment in various river segments and conditions across river basins (Pan & Yang 2010; Lecun et al. 2015).

However, CNNs, including the GRAINet algorithm, require a substantial amount of labeled training data, which can be time-consuming and resource-intensive to collect and digitally annotate. Manual labeling also introduces subjectivity, leading to variability and inconsistencies in interpreting grain boundaries (Lang et al. 2021); it requires significant expertise from the operator, yet it is considered the most robust and reliable reference method when applied to fluvial environments. One potential reason for the variability in model performance is therefore the availability of comprehensive training data. In the evaluation and validation conducted by Lang et al. (2021), a dataset of 1,491 digitized image tiles from 25 gravel bars across six Swiss rivers was collected, and the average performance across all 25 gravel bars resulted in a mean absolute error (MAE) of 0.3 cm and a mean error (ME) of 0.1 cm. Creating such a large, manually labeled training dataset is, however, laborious. Although we had sufficient input data for training, we missed an opportunity to enhance the robustness, reduce bias, and improve the accuracy and reliability of our predictions by not employing the geographical cross-validation technique used by Lang et al. (2021), which might have reduced the consistent underestimation of grain sizes by the GRAINet algorithm. Like other DL methods, the predictive ability is highly dependent on the quality of the training dataset, particularly the number and diversity of training images.
The model's predictive ability might also be influenced by the size of the image tiles, as Chen et al. (2021) highlighted. Larger image tiles may be under-split due to limitations in GPU memory, while smaller tiles may be over-split due to limitations imposed by the size of the largest grain that needs to be detected. Finding the optimal image tile size is therefore important for accurate and reliable predictions. In our study, we employed various image tile sizes based on the resolution of the orthoimagery, as detailed in Table 2. Interestingly, we observed that the models using the largest image tiles exhibited the best predictive performance but required the longest processing time. This finding suggests that the choice of image tile size can significantly affect the models' accuracy and computational efficiency. However, our study has limitations, and further research is warranted to thoroughly investigate optimal image tile sizes in granulometric analysis. These findings should be considered in future studies to enhance the understanding and improve the methodology in this field.
Influence of environmental errors
A higher proportion of shadows (cast shadows, self-shadows, interstitial shadows, or odd shadowing during afternoon or late-afternoon flights) in the input orthoimage could cause false-positive identification of grains and their eventual classification, leading to erroneous predictions of grain sizes. Hue, a color property in the HSV (hue, saturation, value) color space, provides insight into the dominant colors in the image. There is no inherent meaning of "high" or "low" hue in the context of these statistics, but the hue values can be interpreted to determine whether specific colors dominate or are lacking in the image (Figure 11). A color cast, i.e., a significant dominant hue in the image, affects the chromatic fidelity of the image, rendering it monotone and thus causing a loss of chromatic detail. Saturation measures the intensity and fidelity of the colors in the image: a high degree of saturation means the colors are vivid and intense, while low saturation means the colors are more muted. Low saturation values, too, render the image monotonic, thus leading to a loss of detail. Contrast refers to the difference in brightness between an image's light and dark areas. High-contrast images show a large difference between light and dark areas, resulting in more vivid and well-defined features, whereas low-contrast images have less difference between light and dark areas, resulting in a more muted appearance and less well-defined features. Orthoimages suffering from a high degree of overexposed highlights lose detail, especially in saturation; this may cause the model to fail to identify individual grains or even to classify a cluster of grains as a single grain, leading to erratically erroneous dm predictions in the former case and a systematic overestimation of dm in the latter.
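The image properties discussed here (hue, saturation, contrast, overexposure) can be quantified per orthoimage with a short script such as the sketch below, which uses OpenCV's HSV conversion and takes the grayscale standard deviation as a simple contrast proxy; the file path, the 250-level overexposure cut-off, and the contrast proxy are illustrative assumptions rather than the exact procedure behind Figure 11.

```python
# Summarize hue, saturation, value and a simple contrast proxy for an orthoimage tile.
import cv2
import numpy as np

def lighting_stats(path):
    bgr = cv2.imread(path)                               # OpenCV loads images as BGR
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    hue, sat, val = cv2.split(hsv)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    return {
        "mean_hue": float(hue.mean()),                   # dominant hue tendency (0-179 in OpenCV)
        "mean_saturation": float(sat.mean()),            # color vividness
        "mean_value": float(val.mean()),                 # overall brightness
        "contrast_std": float(gray.std()),               # brightness spread as a contrast proxy
        "overexposed_frac": float((val >= 250).mean()),  # share of near-clipped pixels
    }

print(lighting_stats("ortho_2_tile.tif"))                # hypothetical tile path
```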
Analyzing these lighting variations and comparing the results shows that image properties have a minimal impact on the overall performance. The best-performing models from 2015 (ortho 2), 2018 (ortho 6), 2021 (ortho 7), and 2022 (ortho 8) were obtained under contrasting image conditions, yet all achieve the highest performance. Based on these observations, we can say that the GRAINet algorithm is fairly robust against such disturbances.
Apart from the above-mentioned sources of uncertainty, other environmental parameters are beyond operators' control and can contribute to potential model bias. These parameters include vegetation, obstacles, human impacts, and technical issues such as motion blur, camera sensor noise, incorrect exposure settings, sharpness, intergranular and unseen grains, and image resolution.
Additionally, the human impact should be considered when assessing changes in the granulometry of fluvial accumulations, even in natural areas outside settlement areas or stream modifications (Figure 12(d)). Granulometric analysis of point bars is performed at a detailed scale and with very high resolution at the level of individual stones. Thus, even low-intensity activity can influence grain size composition and distribution changes. An example is the study area, located in a protected area of an NP but accessible to tourists who regularly alter the riverine accumulation environment. For instance, human construction of stone cairns or other artificial structures can displace and eliminate gravels, resulting in dark/mixed gradients and significant heterogeneity in gravel accumulation, ultimately affecting the accuracy of dm predictions.
Regarding the technical component, motion blur plays a significant role in image capture and can notably impact the quality and accuracy of the resulting images. Motion blur occurs when there is movement during image capture, such as that caused by wind or the motion of the camera or subject, leading to blurring or smudging of details and ultimately affecting the overall sharpness and clarity of the image. Camera sensor noise, on the other hand, refers to unwanted random variations in brightness or color caused by the camera's electronic components, which can degrade image quality and interfere with subsequent image processing or analysis. Incorrect exposure settings occur when the camera's parameters, such as aperture, shutter speed, or ISO sensitivity, are not appropriately adjusted for the prevailing lighting conditions; this can result in overexposed or underexposed images, compromising the visibility of details and potentially leading to the information loss already discussed. In addition to these technical issues, adequate image resolution is crucial for accurately identifying and separating individual clasts and detecting their parameters. Lower-resolution images may lack sufficient texture information for precise PSD estimations. Poor-quality orthoimage maps further exacerbate the challenge, making it difficult to outline sediment objects accurately, as was partly the case in our study. Issues like small-scale warping, artifacts, incorrect image stitching, blurred gravel boundaries, and irregular grain shapes can arise from undistorted single nadir images (Figure 12(e)). To address these issues, it is important to conduct UAV flyovers slowly and steadily, monitor scenes repeatedly by increasing the number of images and reconstructions, and choose appropriate image acquisition formats such as RAW instead of JPEG.
In line with the findings of Lang et al. (2021), GRAINet's effectiveness diminishes for coarser gravel bars. This may be corroborated by our study, where we observed underestimation or overestimation of grains within the range of 3–10 cm. The observed decrease in performance for grains <3 cm can be attributed to several factors. Firstly, the dataset has greater variability, which can challenge accurate predictions (Lang et al. 2021). Additionally, the substantial influence of larger individual grains can affect the algorithm's overall performance (Lang et al. 2021). Furthermore, an imbalanced data distribution may contribute to decreased performance. Another potential reason is the algorithm's limited ability to estimate dm at lower image resolutions, as highlighted by Lang et al. (2021). This limitation can impact the accuracy of the predictions, particularly when dealing with image resolutions up to 2 cm GSD, as in our case. Incorporating digital line samples that include finer sediment (Figure 12(f)) in the training and testing datasets is recommended to enhance the accuracy and expand the application of GRAINet; this additional data can help improve the algorithm's performance and enable more reliable predictions.
Limitations and future work
We tested the GRAINet algorithm using UAV orthoimagery to predict the grain size characteristics of an entire river point bar in terms of dm. As the model was trained with adequate training data, the overall performance of GRAINet was satisfactory. However, grain sizes were consistently underestimated, particularly for grains smaller than 3 cm. This underestimation can be attributed to a lack of training data for smaller grains, resulting in the algorithm misidentifying cohesive sands, vegetation, and human impacts as grains. To address this limitation, additional training datasets from diverse environments are recommended. Nonetheless, the preparation of such datasets involves manual labeling and is time-consuming, and even experienced operators may face challenges identifying grains, especially small grains <1 cm, in images with dense moss and vegetation debris. As with other DL methods, the smallest grain size identifiable by the GRAINet algorithm is constrained by the image resolution, and the size of the image tiles limits the grain pattern learned by the model. Further research is needed to explore these limitations in more depth. An additional step could be creating a dedicated digital data library for training, consisting of a diverse collection of river environmental scenarios that have not been previously encountered. By utilizing such data in training, the algorithm can learn to generalize and make accurate predictions in unseen river environments. This approach would contribute to improving the algorithm's adaptability and enhancing its performance in real-world applications.
Additionally, the steep learning curve involved in installing the model may act as a barrier keeping researchers in the field from utilizing it. Containerization of the model, as done in this study (with Docker), together with extensive example applications and robust documentation, could alleviate this barrier.
CONCLUSIONS
This study aimed to assess the practical field potential of the GRAINet CNN approach in utilizing UAV optical aerial imagery to predict grain size characteristics, specifically the mean diameter (dm). GRAINet offers an innovative and efficient solution for particle size analysis on river-bed gravel bars, evaluated here by comparing model-estimated dm with manual annotations obtained through digital line sampling. The research involved eight UAV campaigns between 2014 and 2022, covering an entire gravel point bar on the Javoří Brook, Šumava NP, Czechia, and a subsequent evaluation of granulometric changes on this dynamically shifting point bar using the GRAINet model. The study revealed relatively high accuracies in the resulting dm prediction maps. The MAE (L1) from tenfold cross-validations ranged from 1.9 to 4.4 cm, while the MSE (L2) and RMSE varied between 7.13 and 27.24 cm² and 2.49 and 4.07 cm, respectively. Most models tended towards underestimation, particularly for smaller grains with dm <3 cm or even <1 cm, which proved challenging for the algorithm at the given orthoimage resolution. Nonetheless, the large-scale morphodynamics were captured by the model with acceptable accuracy. Considering its ability to make predictions over large domains, the method and the model show considerable promise for drastically reducing the time and resources needed to assess and analyze river morphodynamics at large scales compared with less automated alternatives.
Despite encountering limitations common to optical granulometry methods, such as varying light conditions, including shadows, saturation, and overexposure, the GRAINet algorithm exhibited relative robustness. However, vegetation, cohesive sand, waterlogged image tiles, human impacts, small-scale warping, and image artifacts presented significant challenges. These factors could lead to the misidentification of grains and introduce predictive errors, especially for fine sediment prediction. Awareness of these complexities is crucial for addressing them and potentially enhancing the accuracy and reliability of the algorithm.
This method, and by extension the model, may also be used in combination with the more robust manual methods discussed in the introduction, both to assess river morphodynamics over a large model domain with acceptable accuracy and, where necessary, with increased accuracy at the point scale.
Creating a large, manually labeled training dataset is time-consuming but essential for the GRAINet CNN approach. Nonetheless, the GRAINet approach eliminates the need for user-parameter tuning, making it particularly beneficial for the transferability of the model to different channel morphologies and granulometric properties.
ACKNOWLEDGEMENTS
The research was funded by the Czech Science Foundation project 22-12837S Hydrological and hydrochemical responses of montane peat bogs to climate change, and the Technology Agency of the Czech Republic project SS02030040.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.