Abstract
The assessment of visual blockages in cross-drainage hydraulic structures, such as culverts and bridges, is crucial for ensuring their efficient functioning and preventing flash flooding incidents. The extraction of blockage-related information through computer vision algorithms can provide valuable insights into the visual blockage. However, the absence of comprehensive datasets has posed a significant challenge in effectively training computer vision models. In this study, we explore the use of synthetic data, the synthetic images of culvert (SIC) and the visual hydraulics lab dataset (VHD), in combination with a limited real-world dataset, the images of culvert openings and blockage (ICOB), to evaluate the performance of a culvert opening detector. The Faster Region-based Convolutional Neural Network (Faster R-CNN) model with a ResNet50 backbone was used as the culvert opening detector. The impact of synthetic data was evaluated through two experiments. The first involved training the model with different combinations of synthetic and real-world data, while the second involved training the model with a reduced number of real-world images. The results of the first experiment revealed that structured training, where SIC was used for initial training and ICOB was used for fine-tuning, slightly improved detection performance. The second experiment showed that using synthetic data in conjunction with a reduced number of real-world images significantly reduced the degradation in detection performance.
HIGHLIGHTS
Culverts are prone to blockage and often cause flooding in urban regions; regular maintenance is therefore essential.
Computer vision and artificial intelligence models are proposed to assess culverts for visual blockage, automating manual inspections that are otherwise unsafe and expensive.
Proposes the use of artificially generated images to train computer vision models and reports the resulting insights.
INTRODUCTION
The assessment of culvert visual blockage is a crucial task that is performed by flood management teams for the purpose of maintenance and to mitigate the risk of flash floods (Weeks et al. 2013; Iqbal et al. 2021a, 2022a, 2022b). The identification of the blockage status, degree of visual blockage, and the type of debris that is blocking the culvert is of utmost importance in the decision-making process regarding maintenance activities (Iqbal et al. 2021a). Traditionally, these assessments are carried out through manual site visits, which can be infeasible both economically and from a safety standpoint, particularly during flooding conditions. To address this challenge, the application of real-time monitoring systems using visual sensors has been proposed as a potential solution (Barthelemy et al. 2020), motivated by the recent success of computer vision solutions in other water-related problems (e.g., water level monitoring (Sermet & Demir 2023), sewer defect detection (Myrans et al. 2019; Zhou et al. 2022), waterbody visual analysis (Erfani & Goharian 2023), and flood inundation detection (Han et al. 2021)). This approach, however, requires the management of a large volume of visual data that must be transmitted to a control station for assessment, presenting an additional constraint. To overcome these limitations, a computer vision-based solution has been proposed under the Smart Water Project, which aims to extract the relevant blockage information using state-of-the-art hardware and to transmit only the blockage status, rather than a complete video stream. This solution promises to offer a more efficient and effective approach to culvert blockage assessment (Barthelemy et al. 2020).
In the field of visual blockage assessment of culverts, deep learning algorithms have emerged as a promising solution to extract blockage-related information from culvert images. Object classification (He et al. 2016; Tan & Le 2019), detection (Girshick 2015; Ren et al. 2015; Redmon et al. 2016), and segmentation (He et al. 2017) techniques are widely used for such purposes, as these algorithms have demonstrated high performance for various real-world problems. The detection of culvert openings, the culvert region, and the debris material present in the image can provide valuable information to flood management professionals for determining the visual blockage of culverts. However, the performance of deep learning algorithms is greatly influenced by the size and quality of the training dataset (Alhaija et al. 2018; Nowruzi et al. 2019). Culvert visual blockage assessment is a highly specific problem, and there is currently a shortage of large, comprehensive visual datasets in the literature, which limits the effectiveness of training deep learning models. Conventional data augmentation approaches based on image transformation operations are not expected to be significantly useful because of the complexity of the problem.
The development of new datasets for custom tasks can be a challenging and costly endeavor, requiring extensive collection and annotation of new data instances, as well as verification (Alhaija et al. 2018; Nowruzi et al. 2019; Liu et al. 2020a, 2020b). To overcome the limitations of small training datasets, researchers have explored various methods, such as domain adaptation (Tzeng et al. 2017; Wang & Deng 2018; Choudhary et al. 2020) and few-shot learning (Zhang et al. 2018; Sun et al. 2019; Hu et al. 2021). However, these methods have proven insufficient to completely replace the need for a comprehensive annotated dataset. In recent years, there has been a growing trend towards generating synthetic datasets using gaming engines and photo-realistic technologies. These datasets can serve as an alternative to real datasets by imitating real-world data. Two common approaches to generating synthetic datasets are augmenting original images with synthetic objects of interest (Alhaija et al. 2018) and using games (Johnson-Roberson et al. 2017; Richter et al. 2017; Wu et al. 2018) to collect synthetic images. However, both of these methods yield limited samples with limited diversity, and still require partial manual labeling for bounding box and segmentation annotations.
In this research, we followed two approaches to enhance the existing culvert blockage assessment dataset:
We developed a scaled physical model of the culvert in the hydraulics laboratory and simulated the multiple blockage scenarios under controlled flooding conditions. All these experiments were recorded using two cameras directed at different orientations. The images were extracted from the recorded experiments and organized into a dataset, referred to as Visual Hydraulics Lab Dataset (VHD).
We developed a 3D application using the Unity gaming engine (i.e., Synthetic Culvert Blockage Images Generator) to generate a comprehensive automatically annotated dataset for culvert visual blockage assessment. This application provides highly customizable functions to simulate any blockage scenario by picking and placing debris into the scene. The dataset generated using this application is referred to as synthetic images of culvert (SIC) (Iqbal 2022).
In the context of detecting the culvert opening for determining the visual blockage status, the impact of developed synthetic datasets on the performance of the culvert opening detector is assessed using two sets of experiments:
In the first experiment, different combinations of SIC, VHD, and ICOB are investigated for culvert opening detection performance.
In the second experiment, the SIC dataset is used in combination with the reduced ICOB dataset to investigate performance degradation by reducing the real dataset.
In summary, the following are the anticipated contributions of the manuscript:
1. Development of synthetic datasets: We developed synthetic datasets using laboratory experiments and computer applications to facilitate the training of deep learning models for visual blockage assessment. These synthetic datasets aim to address the scarcity of comprehensive and diverse training data, providing valuable resources for training data-intensive deep learning models in the field of culvert visual blockage assessment.
2. Investigating the impact of synthetic data on model training: We performed experiments to understand and report the impact of synthetic datasets on the training of deep learning models for the culvert opening detection problem. These investigations aim to highlight the benefits and limitations of using synthetic data for training deep learning models.
SYNTHETIC CULVERT BLOCKAGE IMAGES GENERATOR
In an effort to generate a comprehensive, automatically annotated dataset for culvert visual blockage assessment, a 3D application (i.e., Synthetic Culvert Blockage Images Generator) using the Unity gaming engine has been developed at the Digital Living Lab, SMART Infrastructure Facility, University of Wollongong, Australia (Iqbal 2022). In its current state, the application offers a number of features for generating a diverse and comprehensive culvert blockage dataset of synthetic images. Brief summaries of each configurable feature are given below.
Culvert type: it is equipped with the capability to generate five distinct types of culverts by maintaining the constancy of all other variants. These implemented configurations consist of a pipe culvert, a single circular culvert, a double circular culvert, a single box culvert, and a triple box culvert. This feature provides a wide range of diversity in the generated dataset.
Culvert background: it has been designed with a ‘Natural Background’ featuring trees and mountains, providing a realistic and naturalistic setting for the generated culvert blockage images. However, to further increase the diversity of the dataset, additional backgrounds, such as ‘Urban Residential’ and ‘Urban Commercial’, are currently in the process of being implemented, which will provide a wider range of environments and contexts.
Time of the day and shadow: it is equipped with the capability to switch between various times of the day, each accompanied by the appropriate lighting conditions. In addition, to increase the realism of the generated images, corresponding shadowing effects are implemented based on the selected time of day, providing a more comprehensive representation of culvert blockage in various lighting scenarios.
Water level and water profile: in order to simulate a diverse range of flooding scenarios, it includes the option to select between different water levels, ranging from a low level to a fully submerged condition. This feature provides a comprehensive representation of culvert blockage in various flooding conditions. Additionally, the application offers the choice between a ‘clear’ and a ‘flooded’ water profile, further increasing the realism and diversity of the generated images.
Weather: it provides the ability to select between various weather conditions, including ‘sunny’, ‘rain’, and ‘fog’. Additionally, the severity of the selected weather can be adjusted through the use of a slide bar, providing a comprehensive representation of culvert blockage in varying weather scenarios.
Vegetated debris materials: it includes a range of vegetated debris that can be selected and positioned within the scene. These materials encompass a variety of common debris types, including channel grass vegetation, various branches from trees, leaf debris, and tree trunks of different sizes.
Gravel debris material: it provides the capability to place various types of gravel debris within the scene, including gravel beds, rocks, and bricks, through the use of a drop-down menu.
Urban debris material: it includes a range of urban debris that can be positioned within the scene to simulate blockage, including shopping carts, cars, doors, chairs, cans, bins, construction materials, drums, and shipping containers.
Camera viewpoints: it provides the ability to vary the camera viewpoint in terms of sideways movement and zoom. This feature enables the capture of images from multiple angles, resulting in a diverse dataset.
Batch screenshot: it includes the capability to formulate a comprehensive visual dataset through the implementation of a batch screenshot feature. This feature employs nested loops to capture all possible configurations, including culvert configurations, time of day, water levels, water color, and camera viewpoints, for a given simulated blockage scenario. The batch screenshot feature allows for the systematic and efficient capture of a diverse range of images, providing a comprehensive representation of culvert blockage in various conditions.
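The nested loops behind the batch screenshot feature amount to a Cartesian product over the scene settings. The sketch below illustrates this with `itertools.product`; apart from the five culvert types, the axis values are hypothetical placeholders (the actual generator exposes its options through the Unity UI, not a Python API):

```python
from itertools import product

# Only the culvert types are taken from the text; the remaining
# axis values are illustrative placeholders.
CULVERTS = ["pipe", "single_circular", "double_circular",
            "single_box", "triple_box"]
TIMES_OF_DAY = ["morning", "noon", "evening"]
WATER_LEVELS = ["low", "medium", "fully_submerged"]
WATER_PROFILES = ["clear", "flooded"]
VIEWPOINTS = ["left", "centre", "right"]

def enumerate_configurations():
    """Yield every scene configuration for one simulated blockage
    scenario, mirroring the nested loops of the batch screenshot feature."""
    keys = ("culvert", "time_of_day", "water_level",
            "water_profile", "viewpoint")
    for combo in product(CULVERTS, TIMES_OF_DAY, WATER_LEVELS,
                         WATER_PROFILES, VIEWPOINTS):
        yield dict(zip(keys, combo))

configs = list(enumerate_configurations())
# With these placeholder axes: 5 * 3 * 3 * 2 * 3 = 270 screenshots
# per simulated blockage scenario.
```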
Bounding box annotations: it includes the capability to automatically generate bounding box annotations for each object in the scene, including the culvert structure and culvert openings. This feature is integrated with the batch screenshot capability and saves the .json annotations for each saved image with the same name. The automatic generation of bounding box annotations greatly enhances the efficiency and accuracy of the image annotation process and provides a comprehensive representation of the objects present in each image.
DATASETS
To study the impact of synthetic data on the performance of culvert opening detectors for assessing visual blockage, two synthetic datasets (i.e., SIC and VHD) were used along with the real dataset (i.e., ICOB). The real dataset serves as the benchmark, while the synthetic datasets were developed to enhance it.
Images of culvert openings and blockage
Synthetic images of culverts
Visual hydraulics lab dataset
Evaluation dataset
The evaluation dataset was an essential component of this investigation, serving as the benchmark to assess the performance of the culvert opening detection models. The selection of real-world images of culverts was carefully made to ensure a high degree of diversity, reflecting the varied conditions that the models would encounter in practical applications. The diversity of the dataset in terms of culvert types, lighting conditions, capturing angles, scaling, and blockage status, among other factors, provided a comprehensive evaluation of the models' capabilities in a real-world context.
The 450 images were manually annotated with 965 bounding box annotations, accurately depicting the culvert openings, using LabelImg, a widely used annotation tool. The annotations were verified for accuracy, ensuring that the evaluation dataset was of high quality and accurately represented the target objects. The evaluation dataset was kept completely separate from the training and validation datasets, with no overlap in images or annotations. This practice, commonly referred to as ‘out-of-sample testing’, ensured that the results of the model evaluation were not influenced by any prior exposure to the images or annotations.
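LabelImg exports Pascal VOC XML files by default. Assuming the default VOC format was used (the paper does not state the export format), a minimal loader for such annotations might look like:

```python
import xml.etree.ElementTree as ET

def load_voc_boxes(xml_path):
    """Parse a Pascal VOC annotation file (LabelImg's default output)
    into a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(float(bb.findtext("xmin"))),
                      int(float(bb.findtext("ymin"))),
                      int(float(bb.findtext("xmax"))),
                      int(float(bb.findtext("ymax")))))
    return boxes
```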
FASTER R-CNN
The Faster R-CNN model, proposed by Ren et al. (2015), addresses the high computational cost of object detection by introducing a novel region proposal network (RPN). The model is built on the idea of sharing features between the feature extraction network and the detection network, reducing computational cost. The Fast R-CNN (Girshick 2015) and RPN networks are integrated through a shared CNN feature representation and an attention mechanism. In the RPN, the problem of multiple object scales and aspect ratios is addressed by using anchors placed at the center of each spatial window. The generated proposals are parameterized relative to the anchors, resulting in a unified single model with two modules: the RPN deep CNN and the Fast R-CNN detector.
EXPERIMENTAL DESIGN AND EVALUATION MEASURES
Experiments were conducted to evaluate the effectiveness of a culvert opening detector for visual blockage assessment. The performance of the detector was evaluated using a combination of real (i.e., ICOB) and synthetic (i.e., SIC, VHD) datasets. The object detection model used for culvert opening detection was Faster R-CNN with a ResNet50 backbone. The input images were resized and fed to the model one image per batch (i.e., a batch size of 1). The model was trained for 50 epochs with a stepped learning rate schedule: the initial learning rate of 0.0003 was reduced to 0.00003 and 0.000003 at 60 and 80% of the epochs, respectively. To ensure the validity of the training results, the ICOB, VHD, and SIC datasets were each randomly divided into training and validation sets with an 80:20 split.
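The stepped schedule can be expressed as a simple function of the epoch index. The exact implementation used in the study is not given, so this is an illustrative sketch of the rates described above:

```python
def learning_rate(epoch, total_epochs=50, base_lr=3e-4):
    """Stepped schedule from the paper: the base rate of 3e-4 is cut
    tenfold at 60% of the epoch budget and again at 80% (epochs 30
    and 40 of 50)."""
    m1, m2 = round(0.6 * total_epochs), round(0.8 * total_epochs)
    if epoch >= m2:
        return base_lr / 100   # ~3e-6
    if epoch >= m1:
        return base_lr / 10    # ~3e-5
    return base_lr             # 3e-4

# With 50 epochs: epochs 0-29 train at 3e-4, 30-39 at 3e-5, 40-49 at 3e-6.
```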
The reported experiments were designed to account for systematic bias at the data and model levels. At the data level, bias was handled by following standardized data collection and annotation procedures. For ICOB, a diverse and challenging set of culvert images was assembled, covering variations in lighting conditions, backgrounds, scaling, blockage types, and opening types. The VHD and SIC datasets were developed using controlled tools; however, the realism of the simulated culvert blockage process was reviewed by experts in hydraulics and water resource engineering to ensure scientific consistency, and the known limitations of each simulated dataset were clearly reported. The annotation process involved the review of labels by at least one other expert to ensure correctness and to remove bias, and any conflicts were resolved through discussion. At the model level, standard training and optimization procedures were adopted to address any potential bias. A standard dataset split was used for training, while a deliberately challenging evaluation dataset was developed to eliminate bias that might inflate the model performance estimates.
Experiments were performed in two main categories: combinations of synthetic datasets and reduction of dataset size. Both experimental protocols are detailed as follows:
• Experiment 1 – dataset combinations: In this category, experiments were carried out to investigate the influence of incorporating synthetic data (i.e., SIC, VHD) into the ICOB dataset on the performance of culvert opening detection. Two main cases were studied: (a) a random mixture of real and synthetic datasets was used to train the model, and (b) the synthetic dataset was used to train the model, and ICOB was utilized to perform fine-tuning. The objective of this experiment was to compare the effects of a random mixture and a scheduled transfer learning approach. The results of these experiments aimed to provide insights into the most effective method for improving the performance of culvert opening detection by leveraging synthetic and real data. In summary, the following investigations are reported under experiment 1:
o Training the model with ICOB.
o Training the model with SIC.
o Training the model with VHD.
o Training the model with random mix of ICOB and SIC.
o Training the model with random mix of ICOB and VHD.
o Training the model with SIC and Fine-Tuning using ICOB.
o Training the model with VHD and Fine-Tuning using ICOB.
• Experiment 2 – dataset reduction: In this category, experiments were designed to assess the effect of reducing the size of the ICOB dataset on the performance of culvert opening detection. The dataset was reduced to 50 and 25% to evaluate the impact of data reduction. Furthermore, the impact of this reduction was compared to the scenario in which the model was trained using the SIC dataset and fine-tuned using the ICOB dataset, with the aim of determining if the incorporation of SIC could improve the performance. The results of these experiments aimed to provide valuable insights into the relationship between the size of the training dataset and the performance of culvert opening detection, as well as the potential benefits of using synthetic data in conjunction with real data to improve performance. In summary, the following investigations are reported under experiment 2:
o Training the model with 100% ICOB.
o Training the model with 50% ICOB.
o Training the model with 25% ICOB.
o Training the model with SIC and Fine-Tuning using 100% ICOB.
o Training the model with SIC and Fine-Tuning using 50% ICOB.
o Training the model with SIC and Fine-Tuning using 25% ICOB.
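The 50 and 25% subsets in experiment 2 amount to drawing a reproducible random sample from the ICOB training set. A hypothetical sketch (the file names are placeholders, and the sampling procedure is assumed, not taken from the paper):

```python
import random

def reduce_dataset(items, fraction, seed=0):
    """Return a reproducible random subset containing `fraction` of
    `items`, mirroring the 50% and 25% ICOB reductions."""
    rng = random.Random(seed)
    k = round(len(items) * fraction)
    return rng.sample(items, k)

full = [f"icob_{i:04d}.jpg" for i in range(1000)]  # placeholder file names
half = reduce_dataset(full, 0.50)
quarter = reduce_dataset(full, 0.25)
```

Fixing the seed keeps the smaller subsets comparable across training runs, so any performance change is attributable to the data volume rather than a different draw.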
The performance of the culvert opening detector was evaluated using the COCO evaluation metrics, a commonly used benchmark for measuring the performance of an object detector. The COCO metrics are based on the mean average precision (mAP), which summarizes the overall performance of the detector by taking into account both precision and recall. Precision is the fraction of correct detections among all detections made by the algorithm, while recall is the fraction of ground truth objects in the dataset that are correctly detected. The mAP is calculated at different intersection-over-union (IoU) thresholds, which represent the degree of overlap between the predicted bounding boxes and the ground truth boxes; the mAP score is the average of the precision scores at the different IoU levels, and a higher score indicates better performance. For the performed investigations, results are reported as mAP@50, mAP@75, mAP@50–95, mAP@small, mAP@medium, and mAP@large.
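All of the mAP variants rest on the IoU between a predicted and a ground-truth box, which can be computed directly from the corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A detection counts as a true positive for mAP@50 when its IoU with an
# unmatched ground-truth box is at least 0.5, and for mAP@75 at 0.75;
# mAP@50-95 averages over thresholds from 0.5 to 0.95 in steps of 0.05.
```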
RESULTS
Experiment 1 – dataset combinations
| Training data | Training loss | Validation loss | Validation mAP@50 | Validation mAP@75 |
|---|---|---|---|---|
| ICOB | 0.02511 | 0.1994 | 0.9559 | 0.7802 |
| SIC | 0.00497 | 0.0150 | 1.0000 | 1.0000 |
| VHD | 0.00170 | 0.0897 | 0.9991 | 0.9711 |
| SIC + ICOB | 0.00480 | 0.0548 | 0.9848 | 0.9355 |
| VHD + ICOB | 0.00256 | 0.0901 | 0.9796 | 0.9085 |
| Trained on SIC + fine-tuned on ICOB | 0.00908 | 0.2077 | 0.9669 | 0.7985 |
| Trained on VHD + fine-tuned on ICOB | 0.05015 | 0.1745 | 0.9636 | 0.7929 |
The quantitative summary of test results is presented in Table 2. The detection model trained on real data (i.e., ICOB) achieved a benchmark mAP@50 of 0.782 and mAP@75 of 0.491. Individual training results indicate that the models trained only on synthetic datasets performed poorly and were unable to generalize to the real evaluation dataset (i.e., mAP@50 of 0.147 for SIC and 0.176 for VHD). This is likely due to the limited diversity, inadequate color adaptation, and insufficient photo-realism of the synthetic datasets. This result highlights the need to re-examine the SIC data generation process and improve it in terms of diversity and photo-realism. Addressing these issues is essential to ensure that the synthetic dataset captures the complexity and variability of the real data well enough to train the model effectively; failing to do so yields models that are ineffective in real-world scenarios, rendering the synthetic data inadequate for training deep learning models. For mixed training, it was expected that the model would learn general trends from the synthetic data and use ICOB to adapt to the real domain; however, there is no particular schedule by which the model learns in mixed training. Despite that, it was interesting to observe that mixing synthetic data with ICOB degraded the performance only marginally (as also observed by Kishore et al. (2021)), contradicting the results of Nowruzi et al. (2019) and Liu et al. (2020b), where mixing in synthetic data degraded the performance significantly.
| Training data | mAP@50 | mAP@75 | mAP@50–95 | mAP@small | mAP@medium | mAP@large |
|---|---|---|---|---|---|---|
| ICOB | 0.782 | 0.491 | 0.456 | 0.054 | 0.394 | 0.463 |
| SIC | 0.147 | 0.002 | 0.031 | 0.000 | 0.047 | 0.031 |
| VHD | 0.176 | 0.021 | 0.053 | 0.000 | 0.067 | 0.053 |
| SIC + ICOB | 0.808 | 0.471 | 0.452 | 0.039 | 0.380 | 0.459 |
| VHD + ICOB | 0.779 | 0.426 | 0.425 | 0.064 | 0.365 | 0.432 |
| Trained on SIC + fine-tuned on ICOB | 0.828 | 0.508 | 0.481 | 0.015 | 0.394 | 0.490 |
| Trained on VHD + fine-tuned on ICOB | 0.775 | 0.483 | 0.449 | 0.064 | 0.381 | 0.457 |
In addition to individual training with synthetic and real data, the study also tested a more structured approach, in which the model was initially trained on synthetic data and then fine-tuned on the ICOB dataset. This approach, when trained on SIC and fine-tuned on ICOB, produced slightly better results, indicating that the model was able to extend its learning by transferring knowledge from the SIC dataset. Moreover, from a real-world deployment perspective, the structured learning approach can save resources in scenarios where only the real data are updated: fine-tuning a model pre-trained on a comprehensive synthetic dataset using a small real dataset is relatively fast compared to mixed training, where both synthetic and real data are used together. As a result, the structured learning approach can reduce the time and resources required for model training and deployment, making it a practical solution for real-world applications. The improved performance under structured training is in line with Nowruzi et al. (2019), who reported a similar trend.
In the literature, two distinct categories of studies can be identified. The first group covers cases where the use of synthetic data immediately enhances the overall performance of a model, representing an ideal scenario. This group includes studies by Shafaei et al. (2016), Kim et al. (2022) and Allken et al. (2019), who observed that incorporating or generating synthetic datasets led to improved performance on real-world datasets. However, it is important to note that these favorable outcomes tend to occur when the synthetic data share significant visual features with real-world data and lack challenging background variations. An illustrative example is Shafaei et al. (2016), where the synthetic data are generated from a game and comprise road views with surrounding vehicles, people, and buildings. Given the advanced physics and realism achieved in contemporary games, superior performance is expected in such cases.
On the other hand, the second group, including the works of Beery et al. (2020), Żarski et al. (2022) and Kishore et al. (2021), reports that synthetic data are not always beneficial unless employed meaningfully. Synthetic data lacking realistic visuals and variations can introduce dataset bias and lead to degraded performance, as observed for the VHD-only and SIC-only cases. Żarski et al. (2022) clearly demonstrate the detrimental impact of a simulated dataset that lacks challenging variations and realism, resulting in severe overfitting: the authors used synthetic data for both training and validation, achieving high accuracies that did not reflect real-world performance. Kishore et al. (2021) observed a similar trend, wherein training the model solely on synthetic data led to the worst performance on real test datasets, a classic example of overfitting.
Experiment 2 – dataset size reduction
| Training data | Training loss | Validation loss | Validation mAP@50 | Validation mAP@75 |
|---|---|---|---|---|
| ICOB | 0.02511 | 0.1994 | 0.9559 | 0.7802 |
| ICOB 50% | 0.0083 | 0.2683 | 0.9236 | 0.6190 |
| ICOB 25% | 0.1879 | 0.3201 | 0.8780 | 0.5430 |
| SIC trained + ICOB 100% fine-tuned | 0.0090 | 0.2077 | 0.9669 | 0.7985 |
| SIC trained + ICOB 50% fine-tuned | 0.00791 | 0.2524 | 0.9500 | 0.6558 |
| SIC trained + ICOB 25% fine-tuned | 0.1101 | 0.3491 | 0.9153 | 0.6430 |
| Training data | mAP@50 | mAP@75 | mAP@50–95 | mAP@small | mAP@medium | mAP@large |
|---|---|---|---|---|---|---|
| ICOB | 0.782 | 0.491 | 0.456 | 0.054 | 0.394 | 0.463 |
| ICOB 50% | 0.747 | 0.424 | 0.420 | 0.000 | 0.300 | 0.431 |
| ICOB 25% | 0.648 | 0.303 | 0.333 | 0.000 | 0.257 | 0.341 |
| SIC trained + ICOB 100% fine-tuned | 0.828 | 0.508 | 0.481 | 0.015 | 0.394 | 0.490 |
| SIC trained + ICOB 50% fine-tuned | 0.816 | 0.479 | 0.459 | 0.000 | 0.359 | 0.470 |
| SIC trained + ICOB 25% fine-tuned | 0.766 | 0.364 | 0.394 | 0.000 | 0.307 | 0.403 |
DISCUSSIONS
The experimental results presented in Section 6 indicate that the synthetic datasets (i.e., SIC, VHD) were relatively simple for CNN models to fit. The models clearly overfitted (i.e., high training performance but poor test performance), evidence that they failed to learn generalizable features from the synthetic datasets.
Regarding VHD, the likely reason for the poor performance is the lack of visual diversity in comparison to the real dataset (as identified by Man & Chahl (2022)). All the images in the dataset were captured with the same background and the same flume setup; the images share all their visual features, varying only in blockage conditions and water levels. Although the scaled-model experiments successfully captured multiple blockage scenarios, they lacked the visual variation of ICOB, where a number of natural and urban backgrounds are present. The question, then, is how to enhance the existing dataset. Several potential solutions exist to improve the performance of the culvert opening detector on the VHD dataset. One approach is to leverage neural style transfer models to transform the laboratory images and improve their visual realism. Alternatively, simplifying the problem by focusing only on the culvert region in both real and laboratory images could improve model performance and benefit from the use of synthetic data. Another approach involves using blending models to merge culvert regions with many different natural backgrounds to match the visual features of the ICOB dataset. However, implementing these solutions may require manual annotation, which could limit their feasibility, and the limited number of blockage scenarios means that collecting a comprehensive dataset would require significant effort.
The poor performance of the SIC synthetic dataset can be attributed to the lack of photo-realism and the presence of a single natural background, resulting in overfitting of the CNN models (a problem also identified by Man & Chahl 2022). However, the advantages of this approach, such as the ability to generate a virtually unlimited number of images with complete control over the blockage scenarios and culverts, make it a worthwhile long-term investment. The application can be designed to address multiple problems in the same domain, with the option to add more features and update the dataset over time. Additionally, the application can be made user-friendly with an interactive graphical user interface (GUI), requiring no technical knowledge to operate. To improve the performance of the SIC synthetic dataset, several solutions can be considered. First, considering only the culvert region of the images, rather than the whole image, may improve the results; since the dataset is generated through a gaming engine, it should be feasible to generate the labels automatically. Second, applying ray-tracing technology can improve the photo-realism of the images, which is expected to enhance the visual quality of the dataset. Lastly, adding more natural and urban scenes will increase dataset diversity, which is anticipated to improve performance.
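The "culvert region only" idea above could be prototyped as a preprocessing step: crop each image to its annotated culvert bounding box, expanded by a margin and clipped to the image bounds. This is a minimal sketch under assumed conventions (pixel coordinates, boxes as (x_min, y_min, x_max, y_max)); it is not the pipeline used in this study.

```python
def crop_to_culvert_region(box, image_w, image_h, margin=0.1):
    """Expand a culvert bounding box by a relative margin, clipped to the image.

    box: (x_min, y_min, x_max, y_max) in pixels; margin is a fraction
    of the box width/height added on each side.
    Returns the expanded, clipped crop window as integer coordinates.
    """
    x_min, y_min, x_max, y_max = box
    dx = (x_max - x_min) * margin
    dy = (y_max - y_min) * margin
    return (
        max(0, int(x_min - dx)),
        max(0, int(y_min - dy)),
        min(image_w, int(x_max + dx)),
        min(image_h, int(y_max + dy)),
    )


# A 200x100 px culvert box near the left edge of a 640x480 image:
# the expanded window is clipped at x = 0.
print(crop_to_culvert_region((10, 50, 210, 150), 640, 480))  # (0, 40, 230, 160)
```

The margin keeps some surrounding context (headwall, waterline) in the crop, which detection models typically benefit from, while discarding the background that caused the overfitting discussed above.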
In addition to the discussed approaches, other potential options can be explored in the future. One such approach is the use of Generative Adversarial Networks (GANs) (Wang et al. 2017; Creswell et al. 2018) to generate synthetic data. However, in the case of the blockage problem, the limited number of available images and the complexity of the problem make it challenging to use GANs. The requirement for a large number of training images, as well as the need for significant computational resources, may make this solution cost-prohibitive. Moreover, the generation of annotations remains only a partially automated process, which presents an additional challenge for building large annotated datasets. The approach reported by Saleh et al. (2018), in which synthetic images were used for semantic segmentation in a unique way, can also be considered. They argued that background and foreground classes respond differently to synthetic data and proposed a detection approach for foreground objects, on the basis that even if an object's realism differs in synthetic data, its shape remains the same.
Recently, the NVIDIA Omniverse platform has gained popularity, and its ISAAC Replicator Sim can be used to generate high-quality synthetic datasets. This platform can help reduce the effort required to create custom scenes with custom objects and generate numerous images with annotations. While the level of control may not be as high as that of the SIC approach, the NVIDIA Omniverse platform offers a quick and efficient solution suitable for common problems.
CONCLUSION
In conclusion, this manuscript contributed to the field of culvert visual blockage assessment by developing synthetic datasets and investigating their impact on training deep learning models. The development of these datasets addressed the limitations of existing data sources, facilitating the training of data-intensive models. Through experimental analyses, this study shed light on the benefits, implications, and limitations of incorporating synthetic data into the training process, advancing the understanding and application of deep learning techniques for culvert opening detection. Investigations were performed in two sets of experiments to assess dataset combinations and dataset size reduction. Experiment 1 investigated the impact of different dataset combinations on the performance of the culvert opening detector. The results showed that, individually, both synthetic datasets were much simpler for the CNN to fit, and mixing them randomly with ICOB did not improve performance. However, when the datasets were used in a structured way, with synthetic data for initial training and ICOB for fine-tuning, performance improved. Experiment 2 investigated the impact of reducing the ICOB dataset and of using SIC alongside the reduced ICOB. The results showed that reducing the dataset to 50% degraded performance far less than reducing it to 25%. Further, when SIC was incorporated, the degradation rate improved, indicating that synthetic data can be used with very limited real data without significantly compromising performance. Potential future work in this area could involve the use of ray-tracing to improve visual quality, the NVIDIA ISAAC Replicator to generate synthetic data, the learning using privileged information (LUPI) framework for better use of synthetic data, and style transfer models to improve the relevance of synthetic data.
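The degradation rate referred to above can be expressed as the relative drop in a detection metric (e.g., mAP) when the real-world training set is reduced. The sketch below is a generic illustration of that comparison; the metric values are placeholders for demonstration, not results from this study.

```python
def degradation_rate(baseline_score: float, reduced_score: float) -> float:
    """Relative performance drop (as a fraction) when training data is reduced."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (baseline_score - reduced_score) / baseline_score


# Placeholder mAP values: a reduction to 50% of the real data hurts less
# than a reduction to 25%; pretraining on synthetic data would be expected
# to soften the drop further.
print(round(degradation_rate(0.80, 0.72), 3))  # 0.1
print(round(degradation_rate(0.80, 0.56), 3))  # 0.3
```

Reporting this ratio, rather than raw scores alone, makes the 50% and 25% reduction experiments directly comparable across dataset combinations.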
ACKNOWLEDGEMENTS
The authors would like to thank the Wollongong City Council (WCC) for supporting this investigation. This research was funded by the Smart Cities and Suburbs Program (Round Two) of the Australian Government, grant number SCS69244.
The experimental configuration and scripts are available at: https://github.com/umairchoudhry8/Synthetic_Data_Analysis-Culvert_Opening_Detection_Case.git
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.