ABSTRACT
To address the localization and identification of fish in complex fishway environments and to improve the accuracy of fish detection, this paper proposes an object detection algorithm, YOLORG, and a fishway fish detection dataset (FFDD). The FFDD contains 4,591 images collected from the web and from laboratory shots, labeled with the LabelImg tool and covering fish in a wide range of complex scenarios. The YOLORG algorithm, based on YOLOv8, improves the traditional FPN–PAN network into a C2f Multi-scale feature fusion network with a Gather-and-Distribute mechanism, which solves the information loss that accompanies the fusion of feature maps of different sizes. We also propose the C2D Structural Re-parameterization module, with rich gradient flow and good performance, to further improve the detection accuracy of the algorithm. Experimental results show that YOLORG improves mAP50 and mAP50-95 by 1.2 and 1.8% over the original network on the joint VOC dataset, performs very well in accuracy compared with other state-of-the-art object detection algorithms, and, after training on the FFDD, is able to detect fish in very turbid environments.
HIGHLIGHTS
We propose the fishway fish detection dataset (FFDD).
We propose a Structural Re-parameterization convolution module C2D.
We propose a C2f Multi-scale feature fusion network to solve the problem of information loss in the YOLOv8 network.
We propose a YOLORG-series model constructed by C2D Structural Re-parameterization module and C2f Multi-scale feature fusion network.
The proposed method has fewer parameters.
INTRODUCTION
Fish are an important part of ecosystems and human culture, with more than 300 million people worldwide relying on them as a staple food (Vianna et al. 2020). At present, the destruction of fish habitats has led to diminishing fishery resources, and fishways and other fish-passage facilities have become an important measure for mitigating the adverse impacts of water conservancy projects. Biologists and fisheries practitioners estimate the presence and abundance of fish from videos and images taken underwater to help understand the natural environment and support the industry. However, manually analyzing large amounts of underwater video and imagery is a tedious and time-consuming task (Li et al. 2015). It is therefore highly practical to use object detection techniques to detect fish automatically in videos and images, saving practitioners valuable time.
Traditional fish detection methods rely on sonar equipment, analyzing, extracting, and studying the collected data via spatial and temporal sampling plans to detect fish. Digital image processing techniques have also been applied to fish detection, distinguishing fish by background subtraction and other classification methods. Object detection methods are now used frequently for fish detection, but mostly with two-stage or early single-stage detectors to identify and localize fish. Sonar-based fish detection, limited by expensive sonar equipment, is very unfriendly to small-scale smart fishery projects. Fish detection based on digital image processing performs poorly in detection and classification and cannot meet the needs of modern intelligent fisheries. Fish detection based on early object detection methods still falls short of modern smart fishery requirements in inference speed and detection accuracy when analyzing images and videos automatically. In this paper, we propose an improved object detection algorithm, YOLORG (YOLO Re-parameterization and Gather-and-Distribute Network), for fish detection work that is currently limited by equipment, method, and performance. YOLORG is an end-to-end fish detector that automatically detects multiple types of fish in images and videos, helping to build smart fisheries and smart fishways.
The YOLO series of algorithms is a cornerstone of the object detection field, and many YOLO variants improve network performance by modifying the network structure and designing new loss functions. What they fail to notice, however, is that fusing feature maps at different scales is often accompanied by information loss, even though FPN–PAN networks have greatly alleviated this problem (Wang et al. 2023a). To this end, we designed a C2f Multi-scale feature fusion network; it fuses feature maps at different scales through a Gather-and-Distribute mechanism (consisting of a Feature Alignment Module, FAM; an Information Fusion Module, IFM; and an Information Injection Module, Inject_C2f), which reduces information loss in the fusion phase and improves network performance. In addition, we apply a Structural Re-parameterization modification to C2f, the main module in the YOLOv8 network. The improved C2D module has a complex multi-branch structure in the training phase, which can be converted into a single convolution in the inference phase; it retains the good performance of the multi-branch structure and can be equivalently embedded into the network to improve accuracy. Finally, we propose the fishway fish detection dataset (FFDD), containing 3,428 training images and 1,163 test images and covering motion pictures of fish in complex environments.
Overall, the contributions of this paper are as follows:
- (1)
We propose the FFDD, containing 3,428 training images and 1,163 test images and covering motion pictures of fish in a variety of complex environments. The FFDD enriches underwater object detection data, fills a gap in fish images from special environments, and expands the scope of underwater object detection.
- (2)
We propose a Structural Re-parameterization convolution module, C2D, which possesses a complex multi-branch structure during training and converts it into a single convolution during inference, so that it can be equivalently embedded into the network to improve accuracy. The C2D Structural Re-parameterization method provides ideas for optimizing detector performance and can be applied equivalently to other detection algorithms.
- (3)
We propose a C2f Multi-scale feature fusion network to solve the problem of information loss: the FAM module aligns feature maps of different scales, the IFM module fuses the aligned feature maps, and the Inject_C2f module injects the fused information into different levels. The C2f Multi-scale feature fusion network provides ideas for solving the information loss problem in the field of object detection.
- (4)
We propose the YOLORG series of models, constructed from the C2D Structural Re-parameterization module and the C2f Multi-scale feature fusion network. The YOLORG-n model achieves an excellent 82.6% mAP50 and 63.9% mAP50-95 on the joint VOC dataset, outperforming existing YOLO-series object detectors and related fish detection methods. Multiple experimental results demonstrate that we contribute an excellent detection algorithm to the field of object detection.
RELATED WORK
Structural Re-parameterization
In recent years, Structural Re-parameterization has been one of the most popular research hotspots in the field of CNNs. Structural Re-parameterization obtains better performance without adding any extra parameters by training a multi-branch structure and fusing the branches into a single convolution at inference time. Ioffe & Szegedy (2015), Szegedy et al. (2015), Szegedy et al. (2016) and Szegedy et al. (2017) found that multi-branch network structures can effectively enrich the feature space, demonstrating the importance of different connections and combinations of multiple branches. Hu et al. (2018) and Wang et al. (2020) proposed the SE and ECA channel attention modules, which effectively improve the representation ability of modules. Ding et al. (2019), Ding et al. (2021a) and Ding et al. (2021c) proposed convolutional modules with multi-branch structures that effectively extract spatial feature information and can be converted into a single convolution during inference, improving detection accuracy without introducing additional parameters. Ding et al. (2021b) first applied Structural Re-parameterization to fully connected (FC) layers by constructing convolutional layers inside RepMLP during training and merging them into FC layers for inference; combining the global representation capability and location awareness of FC layers with the local prior of convolution improves the performance and location patterns of the neural network. Meng et al. (2021) found that Structural Re-parameterization can only be applied to linear blocks, while nonlinear layers (ReLU) must be placed outside the residual connections; they therefore proposed the RM operation, which removes residual connections containing nonlinear layers while keeping the model's results unchanged. The work in this paper applies a Structural Re-parameterization modification to the main module of an object detection algorithm and uses it in a real environment to detect fish, improving detector accuracy.
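The following minimal sketch (ours, not from the YOLORG code) shows the basic transformation these works build on: folding a BatchNorm into the preceding convolution, so that a conv+BN pair trained as two layers runs as one convolution at inference:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm2d into the preceding Conv2d (inference-time re-parameterization)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    # BN(x) = gamma * (x - mean) / sqrt(var + eps) + beta; fold gamma / sqrt(var + eps)
    # into the conv weights, and the remainder into the bias.
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight.data / std
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = bn.bias.data + (conv_bias - bn.running_mean) * scale
    return fused

# Sanity check: the fused conv matches conv followed by BN in eval mode.
conv, bn = nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16).eval()
x = torch.randn(1, 8, 32, 32)
assert torch.allclose(fuse_conv_bn(conv, bn)(x), bn(conv(x)), atol=1e-5)
```

Because the folded layer is mathematically identical to the trained pair, the accuracy gained from the richer training-time structure carries over to inference at zero extra cost.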
Multi-scale feature fusion
With the development of object detection, people have gradually realized that the Neck network in a detector plays a very important role in small object recognition and in improving detector accuracy. Redmon & Farhadi (2018) first used an FPN network in the YOLOv3 algorithm to fuse feature maps of different scales downward, dramatically improving detector performance. Bochkovskiy et al. (2020) first used the FPN–PAN network in the YOLOv4 algorithm to address the severe loss of shallow information after passing through a multilayer network, drastically improving detection accuracy. Li et al. (2022b) designed BiC modules with bi-directional linkages for the FPN–PAN network and applied the Re-parameterization method to it, proposing the RepBi–PAN Multi-scale fusion network; the network introduces a bottom-up information flow into the top-down delivery path, allowing shallow information to participate efficiently in the fusion. Xu et al. (2022) proposed a new Light-Backbone Heavy-Neck structure, which uses Efficient RepGFPN as the Neck network so that high-level semantic information and low-level spatial information can be fully exchanged, achieving SOTA performance. Li et al. (2022a) found that large models struggle to meet real-time detection requirements on in-vehicle edge computing platforms, while lightweight models built from large numbers of depthwise separable convolutions cannot achieve sufficient accuracy; they therefore proposed the Slim-Neck design paradigm, which better balances model accuracy and speed.
Although the above detectors achieve excellent accuracy, they still suffer from information loss when fusing different feature maps, even though this loss is greatly mitigated by the many FPN–PAN designs. Wang et al. (2023a) proposed a Gather-and-Distribute Multi-scale feature fusion network implemented with convolution and self-attention, which greatly alleviates information loss and enhances network performance. The work in this paper draws on the Gather-and-Distribute Multi-scale fusion network of the Gold-YOLO algorithm to design a C2f Multi-scale feature fusion network that improves object detection accuracy.
Application of object detection in fisheries
Smart fisheries can save a great deal of manpower and resources and have become the main trend in fishery development. Paspalakis et al. (2020) used small, low-cost autonomous underwater vehicles instead of traditional divers for periodic inspection of fishing nets, with significant cost savings. Dong et al. (2023) proposed a new method to localize fish keypoints based on an object detection and regression model; through YOLOv5 and perceptual strategies, the method efficiently detects individual fish and estimates keypoints. Yu et al. (2023) used a detection model to measure fish size instead of acoustic methods that require high-cost equipment and deliver low detection accuracy. Kandimalla et al. (2022) used YOLOv3 and Mask R-CNN to detect and classify eight fish species in a high-resolution DIDSON imaging sonar dataset and integrated the Norfair object tracking framework to track and count fish. Zhang et al. (2020) proposed an automatic fish counting method to estimate the population of farmed Atlantic salmon using machine vision and a new hybrid deep neural network model. Li et al. (2015) trained a Fast R-CNN algorithm with 24,277 ImageCLEF fish images to detect and recognize fish in underwater images. Kay & Merrifield (2021) set up the Fishnet.AI website, which continuously collects fish detection data covering multiple fish categories and publishes it online. Xu & Matzner (2018) annotated 34,316 fish images extracted from frames of three fish videos and used the dataset to train the YOLOv3 algorithm to automatically identify fish in underwater videos. Muksit et al. (2022) proposed a fish detection model, YOLO-Fish, for detecting fish in real underwater environments; by fixing the up-sampling step problem in YOLOv3 and adding a spatial pyramid pooling module, they increased the model's ability to detect fish in dynamic environments and achieved good results. Li et al. (2023) proposed a new fish detection model, RC_YOLOv5, which introduces the Res2Net residual structure and a coordinate attention mechanism to achieve fast and accurate fish detection. Wang et al. (2022) proposed a fish anomaly detection algorithm that adds multi-level features and feature mapping to the YOLOv5 algorithm for accurate fish detection and uses the single-object tracking algorithm SiamRPN++ to track abnormal fish individuals.
Although the above work on fish detection has made good progress, these methods still suffer from false and missed detections, and most algorithms involve heavy computation and many parameters, which makes them inconvenient to deploy in actual projects. The reason is that these detectors are still lacking in image preprocessing, network structure, loss calculation, speed-accuracy balance, anchor assignment, etc., and most face information loss during feature fusion. The work in this paper applies an object detector incorporating state-of-the-art methods to fishway fish detection. Meanwhile, we address the information loss problem present in the detector, propose a Structural Re-parameterization method that losslessly improves detector performance, and contribute a fish detection dataset, the FFDD, containing 3,428 training images and 1,163 test images.
METHOD
YOLO re-parameterization and Gather-and-Distribute network
Compared with other fish detection algorithms, YOLORG incorporates other advanced research results from the field of object detection. YOLORG uses Mosaic image preprocessing to enhance the generalization performance of the detector, uses the more geometrically logical CIoU to calculate the fish localization loss, and uses the Anchor-Free idea and the TAL label assignment strategy to reduce the impact of anchors on fish detection. Compared with other fish detection algorithms, YOLORG converges faster during training and detects fish with better generalization, higher classification and localization accuracy, and fewer false and missed detections. YOLORG is therefore a very good method to apply to fish detection.
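For reference, a minimal sketch of the CIoU loss as commonly defined is shown below; the box format (x1, y1, x2, y2) and the function name are our choices, not taken from the YOLORG code:

```python
import math
import torch

def ciou_loss(box1: torch.Tensor, box2: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """CIoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4)."""
    # Intersection and union
    inter_w = (torch.min(box1[:, 2], box2[:, 2]) - torch.max(box1[:, 0], box2[:, 0])).clamp(0)
    inter_h = (torch.min(box1[:, 3], box2[:, 3]) - torch.max(box1[:, 1], box2[:, 1])).clamp(0)
    inter = inter_w * inter_h
    w1, h1 = box1[:, 2] - box1[:, 0], box1[:, 3] - box1[:, 1]
    w2, h2 = box2[:, 2] - box2[:, 0], box2[:, 3] - box2[:, 1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # Squared center distance over squared diagonal of the smallest enclosing box
    cw = torch.max(box1[:, 2], box2[:, 2]) - torch.min(box1[:, 0], box2[:, 0])
    ch = torch.max(box1[:, 3], box2[:, 3]) - torch.min(box1[:, 1], box2[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((box1[:, 0] + box1[:, 2] - box2[:, 0] - box2[:, 2]) ** 2 +
            (box1[:, 1] + box1[:, 3] - box2[:, 1] - box2[:, 3]) ** 2) / 4
    # Aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - (iou - rho2 / c2 - alpha * v)
```

The center-distance and aspect-ratio terms are what make CIoU "more geometrically logical" than plain IoU: non-overlapping boxes still receive a meaningful gradient.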
The network structure of the YOLORG series of models is written in a Yaml configuration file, which is parsed and instantiated by code at runtime; the parsing pseudo-code is shown in Algorithm 1.
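Algorithm 1 itself is not reproduced here; the sketch below shows, under our own assumptions, what such a Yaml layout and runtime parser typically look like in the Ultralytics style that YOLOv8 (and hence YOLORG) follows. The config excerpt, the MODULES mapping, and all layer arguments are illustrative stand-ins, not the actual YOLORG configuration:

```python
import yaml
import torch
import torch.nn as nn

# Hypothetical excerpt of a model Yaml in the Ultralytics layer format;
# each entry reads [from, repeats, module, args].
CFG = """
backbone:
  - [-1, 1, Conv, [64, 3, 2]]
  - [-1, 3, C2D, [128]]
"""

# Toy stand-ins; LazyConv2d infers in_channels, keeping the sketch short.
MODULES = {
    "Conv": lambda c, k, s: nn.Sequential(nn.LazyConv2d(c, k, s, k // 2), nn.SiLU()),
    "C2D": lambda c: nn.Sequential(nn.LazyConv2d(c, 3, 1, 1), nn.SiLU()),
}

def parse_model(cfg_text: str) -> nn.Sequential:
    """Simplified stand-in for the runtime parser sketched in Algorithm 1."""
    cfg = yaml.safe_load(cfg_text)
    layers = []
    for _from, n, name, args in cfg["backbone"]:
        block = MODULES[name]
        layers.extend(block(*args) for _ in range(n))   # repeat the block n times
    return nn.Sequential(*layers)

model = parse_model(CFG)
print(model(torch.randn(1, 3, 64, 64)).shape)   # e.g. torch.Size([1, 128, 32, 32])
```

In the real parser, the 'from' field wires skip connections and the repeat count is usually passed into the block rather than stacking copies; the point here is only the config-to-module instantiation loop.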
C2D Structural Re-parameterization module
Algorithm 2: C2D Module Pseudo-Code
Input: Input_tensor; in_channel; out_channel; number = 1; expansion = 0.5;
Output: Output_tensor;
Begin Initialization
c = int(out_channel * expansion)
cv1 = dbb(in_channel, 2 * c, 1, 1)
cv2 = dbb((2 + number) * c, out_channel, 1)
m = nn.ModuleList(BottleneckD(c, c, k=(3, 3), e=1.0) for _ in range(number))
End Initialization
Begin Calculation
x = list(cv1(Input_tensor).chunk(2, 1))
x.extend(module(x[-1]) for module in m)
return cv2(torch.cat(x, 1))
End Calculation
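Read as PyTorch, Algorithm 2 corresponds to a module like the following sketch. Plain convolutions stand in here for the dbb and BottleneckD blocks, so only the dataflow is shown, not the re-parameterization itself; the class name is ours:

```python
import torch
import torch.nn as nn

class C2DSketch(nn.Module):
    """Dataflow of Algorithm 2; plain convs stand in for the DBB branches."""
    def __init__(self, in_channel, out_channel, number=1, expansion=0.5):
        super().__init__()
        self.c = int(out_channel * expansion)
        self.cv1 = nn.Conv2d(in_channel, 2 * self.c, 1)
        self.cv2 = nn.Conv2d((2 + number) * self.c, out_channel, 1)
        self.m = nn.ModuleList(
            nn.Sequential(nn.Conv2d(self.c, self.c, 3, padding=1), nn.SiLU())
            for _ in range(number))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, 1))       # split into two c-channel halves
        y.extend(m(y[-1]) for m in self.m)      # chain bottlenecks on the last half
        return self.cv2(torch.cat(y, 1))        # concatenate all branches and project

x = torch.randn(1, 64, 80, 80)
print(C2DSketch(64, 128, number=2)(x).shape)   # torch.Size([1, 128, 80, 80])
```

The chunk/extend/cat pattern is what gives C2D (like C2f) its multiple gradient-flow branches: every intermediate bottleneck output reaches the final projection directly.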
The C2f module is the main module in the YOLOv8 algorithm. It references the C3 module in YOLOv5 (Jocher et al. 2021) and the ELAN module in YOLOv7 (Wang et al. 2023b). The C2f module has multiple gradient flow branches during computation, which not only extracts spatial feature information effectively but also keeps the parameter count small and the computation fast. The structure of the C2f network is shown in Figure 3.
Six transformations are used in the DBB module, as shown in Figure 4. All are based on the homogeneity and additivity of convolution and ultimately yield a single K×K convolution. Through these six transformations, the multiple branches in the DBB module are progressively 'simplified' and eventually merged into one K×K convolution. The network structure of the DBB module is shown in Figure 5. During training, the DBB has a multi-branch structure similar to Inception; these branches contain convolutions of different sizes and therefore obtain receptive fields of different sizes. The DBB module can generate richer feature spaces and enhance the representation of a single convolution by combining multi-branch structures. In the inference stage, the single-convolution parameters corresponding to the multi-branch parameters are computed as a linear combination and deployed in the model.
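As a concrete instance of the additivity these transformations rely on, the sketch below (ours, not the full DBB implementation) merges a parallel 1×1 branch into a 3×3 convolution by zero-padding the 1×1 kernel to 3×3 and summing the kernels:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 32, 32)
k3 = torch.randn(16, 8, 3, 3)   # K x K branch
k1 = torch.randn(16, 8, 1, 1)   # 1 x 1 branch

# Branch outputs computed separately, then added...
y_branches = F.conv2d(x, k3, padding=1) + F.conv2d(x, k1)

# ...equal one convolution whose kernel is the 1x1 kernel zero-padded
# to 3x3 (centered) plus the 3x3 kernel.
k_merged = k3 + F.pad(k1, [1, 1, 1, 1])   # pad the last two dims to 3x3
y_merged = F.conv2d(x, k_merged, padding=1)

assert torch.allclose(y_branches, y_merged, atol=1e-4)
```

The same principle, applied branch by branch, is how the whole Inception-like training-time structure collapses into one K×K convolution for deployment.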
C2f Multi-scale feature fusion network
Information loss is a problem in many detection algorithms. Different branches are responsible for detecting objects of different sizes. Initially, their feature maps all carry abundant high-level semantic information, but this information dissipates during propagation and fusion, so the detection head receives incomplete spatial information from the feature maps, ultimately affecting algorithm performance.
The C2f Multi-scale feature fusion network works because it does not have to obtain information from other layers indirectly or recursively like FPN–PAN networks. It replaces the downward and upward fusion operations of the FPN–PAN network with the LowGD and HighGD modules, which means that all information in the C2f Multi-scale feature fusion network is acquired directly, with less loss and more efficiency, improving detector accuracy.
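A minimal sketch of this gather-and-distribute pattern is given below. It is our simplification (FAM as bilinear resizing to a common scale, IFM as a 1×1 fusing convolution, injection as addition); the actual modules in Gold-YOLO and YOLORG use richer convolution and attention blocks:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatherDistributeSketch(nn.Module):
    """Align -> fuse -> inject, so every level receives global information directly."""
    def __init__(self, channels=(64, 128, 256), fused=128):
        super().__init__()
        self.fuse = nn.Conv2d(sum(channels), fused, 1)              # IFM: fuse aligned maps
        self.inject = nn.ModuleList(nn.Conv2d(fused, c, 1) for c in channels)

    def forward(self, feats):
        target = feats[1].shape[-2:]                                # FAM: align all levels
        aligned = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
                   for f in feats]
        fused = self.fuse(torch.cat(aligned, dim=1))
        # Inject: hand each level the globally fused information directly,
        # instead of passing it level by level as in FPN-PAN.
        return [f + inj(F.interpolate(fused, size=f.shape[-2:], mode="bilinear",
                                      align_corners=False))
                for f, inj in zip(feats, self.inject)]

feats = [torch.randn(1, c, s, s) for c, s in zip((64, 128, 256), (80, 40, 20))]
print([o.shape for o in GatherDistributeSketch()(feats)])   # same shapes as the inputs
```

Because every level reads from one globally fused representation, no feature map depends on a chain of neighbor-to-neighbor hand-offs where information can dissipate.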
EXPERIMENT
Experimental environment and datasets
The experimental environment uses four NVIDIA RTX 3060 graphics cards, an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20 GHz, 128 GB of memory, Ubuntu 20.04.1, CUDA 11.3, Python 3.8, and PyTorch 1.12.1; all algorithm results are tested in this environment. Detailed training parameters are shown in Table 1, and all datasets use this set of training parameters.
| Model | Epochs | Batch size | Optimizer | Learning rate | GPUs (RTX 3060) |
|---|---|---|---|---|---|
| YOLOv8-n | 500 | 64 | SGD | 1e-2 | 0,1,2,3 |
| YOLOv8-n C2D | 500 | 64 | SGD | 1e-2 | 0,1,2,3 |
| YOLOv8-n Multi-scale | 500 | 64 | SGD | 1e-2 | 0,1,2,3 |
| YOLORG-n | 500 | 64 | SGD | 1e-2 | 0,1,2,3 |
| YOLOv5-s | 500 | 64 | SGD | 1e-2 | 0,1,2,3 |
| YOLOv6-n | 500 | 64 | SGD | 1e-2 | 0,1,2,3 |
| YOLOv7-tiny | 500 | 64 | SGD | 1e-2 | 0,1,2,3 |
| YOLORG-s | 500 | 16 | SGD | 1e-2 | 0,1,2,3 |
The homemade FFDD contains 3,428 training images and 1,163 test images. A portion of the fish images was collected from the web, and another portion was taken with underwater cameras installed in the laboratory. The laboratory environment is mixed with a large amount of sediment and water plants to simulate a real fish passage environment, so visibility is very low. All images in the dataset were labeled using the LabelImg tool, and compared with other underwater datasets the FFDD covers many motion images of fish in complex environments.
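LabelImg saves Pascal VOC XML by default, while YOLO-style trainers expect normalized txt labels, so a conversion step along the following lines is typically needed (the class list is a placeholder and this is not the exact script used for the FFDD):

```python
import xml.etree.ElementTree as ET
from typing import List

CLASSES = ["fish"]   # placeholder class list

def voc_xml_to_yolo_txt(xml_path: str) -> List[str]:
    """Convert one LabelImg VOC annotation to YOLO 'cls cx cy w h' lines (normalized)."""
    root = ET.parse(xml_path).getroot()
    W = float(root.findtext("size/width"))
    H = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.findtext("name"))
        x1 = float(obj.findtext("bndbox/xmin")); y1 = float(obj.findtext("bndbox/ymin"))
        x2 = float(obj.findtext("bndbox/xmax")); y2 = float(obj.findtext("bndbox/ymax"))
        cx, cy = (x1 + x2) / 2 / W, (y1 + y2) / 2 / H   # normalized box center
        w, h = (x2 - x1) / W, (y2 - y1) / H             # normalized box size
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```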
In the COCO2017 public dataset, the training set contains 118,000 images and the test set more than 40,000 images; the dataset contains a total of 1.5 million objects across 80 detection categories and is also widely used in other computer vision fields (Lin et al. 2014).
The joint VOC dataset is VOC2007+VOC2012; its training set contains about 16,000 images with more than 40,000 detection targets, and its test set contains about 5,000 images with 12,000 detection targets, covering 20 common object classes such as airplanes, bicycles, people, and various small animals (Everingham et al. 2010).
The DeepFish dataset contains approximately 40,000 images from 20 habitats in tropical Australian marine environments. It provides classification, localization, and segmentation labels and can meet a wide range of underwater research needs (Saleh et al. 2020).
Ablation experiment
To verify the effectiveness of our proposed C2D Structural Re-parameterization module and C2f Multi-scale feature fusion network, we evaluated each module of YOLORG-n independently, focusing on mAP, parameter count, and computation. With the addition of the C2D module, the YOLOv8 model improves mAP50 and mAP50-95 by 0.2 and 0.3%, and owing to the nature of Structural Re-parameterization this introduces no additional parameters or computation. After replacing the traditional FPN–PAN network with the C2f Multi-scale feature fusion network, YOLOv8 alleviates the information loss problem and improves detection accuracy by 0.6 and 1.4% for mAP50 and mAP50-95, respectively. On this basis, YOLORG gains 1.2 and 1.8% in mAP50 and mAP50-95. The experimental results are shown in Table 2 and confirm that the proposed C2D Structural Re-parameterization module and C2f Multi-scale feature fusion network can indeed effectively improve detector performance.
| Model | C2D | C2f Multi-scale feature fusion network | mAP50 | mAP50-95 | FLOPs (inference) | Params |
|---|---|---|---|---|---|---|
| YOLOv8-n | | | 81.4% | 62.1% | 8.1 GFLOPs | 3M |
| YOLOv8-n | ✓ | | 81.6% (+0.2%) | 62.4% (+0.3%) | 8.1 GFLOPs | 3M |
| YOLOv8-n | | ✓ | 82.0% (+0.6%) | 63.5% (+1.4%) | 12.3 GFLOPs | 6.1M |
| YOLORG-n | ✓ | ✓ | **82.6% (+1.2%)** | **63.9% (+1.8%)** | 12.4 GFLOPs | 6.1M |
Bold values indicate the best results.
Comparison experiments
To validate the performance of the YOLORG algorithm, we selected state-of-the-art object detectors from the current field of object detection, controlled their computation and parameter counts, and trained them in the same experimental environment. The experimental results show that after training on the joint VOC dataset, YOLORG achieves the highest detection accuracy on the test set, proving that the proposed YOLORG algorithm does have excellent detection capability. The results are shown in Table 3.
| Model | mAP50 | mAP50-95 | FLOPs (inference) | Params |
|---|---|---|---|---|
| YOLOv5-s 7.0 (Jocher et al. 2021) | 79.8% | 57.2% | 15.9 GFLOPs | 7M |
| YOLOv6-n 0.2.0 (Li et al. 2022b) | 82.2% | 60.4% | 11.08 GFLOPs | 4.3M |
| YOLOv7-tiny (Wang et al. 2023b) | 79.2% | 55.3% | 13.2 GFLOPs | 6M |
| YOLORG-n | **82.6%** | **63.9%** | 12.4 GFLOPs | 6.1M |
Bold values indicate the best results.
Detection of fish in the fish passage
Table 4 compares the YOLORG algorithm with other related fish detection algorithms. Kay & Merrifield (2021) used a RetinaNet-ResNet101-FPN model for fish detection; their article reports only mAP50-95 after training on the COCO2017 dataset and does not report the model's computation or parameter counts. To be fair, we did not choose a very large model but simply the YOLORG-s model, which improves mAP50-95 by 4.7% over RetinaNet-ResNet101-FPN. Xu & Matzner (2018) trained the YOLOv3 model on a homemade dataset; unfortunately, we could not find that dataset, so we resized the Ultralytics YOLOv3 algorithm to the same size as YOLORG-n and trained it on the joint VOC dataset, where YOLORG-n shows 1.4 and 2.9% improvements in mAP50 and mAP50-95, respectively. Muksit et al. (2022) trained two improved YOLOv3 algorithms on the DeepFish dataset, and we also trained the YOLORG-n algorithm on DeepFish. Although Muksit et al. (2022) do not report computation data for YOLO-Fish-1 and YOLO-Fish-2, our parameter count is nearly 10 times smaller than these two algorithms while improving accuracy by nearly 2%. According to the experimental results, YOLORG is superior to the algorithms used in other related fish research articles.
| Model | Dataset | mAP50 | mAP50-95 | FLOPs (inference) | Params |
|---|---|---|---|---|---|
| YOLOv3-n ultralytics | VOC2007+2012 | 81.2% | 61% | 12.8 GFLOPs | 4.2M |
| YOLORG-n | VOC2007+2012 | **82.6% (+1.4%)** | **63.9% (+2.9%)** | 12.4 GFLOPs | 6.1M |
| RetinaNet-ResNet101-FPN (Kay & Merrifield 2021) | COCO2017 | – | 40.4% | – | – |
| YOLORG-s | COCO2017 | 61.9% | **45.1% (+4.7%)** | 33.6 GFLOPs | 14.2M |
| YOLOv3 (Muksit et al. 2022) | DeepFish | 96.01% | – | – | 61.58M |
| YOLO-Fish-1 (Muksit et al. 2022) | DeepFish | 96.15% | – | – | 61.6M |
| YOLO-Fish-2 (Muksit et al. 2022) | DeepFish | 95.74% | – | – | 62.61M |
| YOLORG-n | DeepFish | **98%** | 67% | 12.4 GFLOPs | 6.1M |
Bold values indicate the best results.
A further comparison of YOLORG-n with other lightweight YOLO detectors on the fish detection task, under the same training settings as Table 1, is shown below; YOLORG-n again achieves the best accuracy.

| Model | mAP50 | mAP50-95 | FLOPs (inference) | Params |
|---|---|---|---|---|
| YOLOv5-n 7.0 (Jocher et al. 2021) | 90.7% | 64% | 4.2 GFLOPs | 1.7M |
| YOLOv6-n 0.2.0 (Li et al. 2022b) | 90.4% | 67% | 11.08 GFLOPs | 4.3M |
| YOLOv7-tiny (Wang et al. 2023b) | 90.8% | 63.8% | 13.2 GFLOPs | 6M |
| YOLOv8-n | 91.1% | 68.6% | 8.1 GFLOPs | 3M |
| YOLORG-n | **91.3%** | **69.4%** | 12.4 GFLOPs | 6.1M |
Bold values indicate the best results.
CONCLUSION
In this paper, to improve the accuracy of fish detection and increase the amount of data available in the field, we propose the FFDD and the YOLORG series of fish detection models. Multiple experiments prove that the proposed YOLORG algorithm achieves advanced results.
The FFDD is well organized and can be easily downloaded and used for training to help fish projects in need. The C2D Structural Re-parameterization module can be easily embedded into any existing fish detection model without changing the original network structure, improving detection performance without adding parameters. The C2f Multi-scale feature fusion network solves the information loss problem in fish detection models and provides an idea for improving the accuracy of fish detection algorithms. The YOLORG series is available in several versions, which can achieve a speed-accuracy balance according to the needs of an actual project. Moreover, the YOLORG-n model has so few parameters that it can run directly on embedded devices.
The FFDD and the YOLORG model can provide data and ideas for fish detection research and offer practical help to real projects. In the future, we will further expand the FFDD and propose more effective fish detection methods.
ACKNOWLEDGEMENTS
This work was funded in part by the Scientific Research Fund of Hunan Provincial Education Department of China under Grant 22A0200, and in part by the Scientific and Technological Innovation Project of Quanmutang Reservoir under Grant 2022430119001440.
DATA AVAILABILITY STATEMENT
All relevant data are available from https://github.com/wannabetter/YOLORG.
CONFLICT OF INTEREST
The authors declare there is no conflict.