Tsiflitzi Anna

2025 Diploma Thesis Title: FINE TUNING OF YOLOv3 TRAINING FOR OBJECT DETECTION IN IMAGES RECORDED BY A UAV

Abstract

This thesis focuses on optimizing the training of the YOLOv3 (You Only Look Once) model for object detection using UAV (Unmanned Aerial Vehicle)-captured imagery. It demonstrates the importance of hyperparameter selection for training effectiveness. Specifically, through extensive analysis, we identified important hyperparameters that influence the trained model’s performance. By adjusting these hyperparameters, we fine-tuned the model’s training to achieve higher precision in detecting objects.
The study utilized annotated UAV datasets that were preprocessed to align with YOLOv3’s requirements. These publicly available datasets comprised the UA Vehicle Detection Dataset, the Stanford Dataset, and the VisDrone2019-DET dataset.
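In practice, aligning an annotated dataset with YOLOv3 means converting each annotation to the Darknet label format: one text file per image, with one line per object containing the class id and the box center, width, and height normalized to the image size. The following is a minimal sketch of such a conversion; the function name and example values are illustrative and not taken from the thesis.

```python
# Minimal sketch: convert one pixel-space bounding box to the Darknet/YOLO
# label format "class_id x_center y_center width height" (all in [0, 1]).
# Names and example numbers are illustrative, not the thesis's actual pipeline.

def to_yolo_label(class_id, x_min, y_min, box_w, box_h, img_w, img_h):
    """Convert one pixel-space box to a YOLO label line, normalized by image size."""
    x_center = (x_min + box_w / 2.0) / img_w
    y_center = (y_min + box_h / 2.0) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w / img_w:.6f} {box_h / img_h:.6f}"

# Example: a 100x50 px box at (200, 150) in a 1920x1080 frame, class 0 (e.g. "vehicle").
print(to_yolo_label(0, 200, 150, 100, 50, 1920, 1080))
```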
Training optimization was approached by classifying hyperparameters into two categories. The first included hyperparameters that were set according to the characteristics of the training dataset and kept invariant throughout the analysis: max batches, number of classes, filters, and steps. The second category contained the hyperparameters we selected to adjust: image resolution, backbone network, anchor box dimensions, dilated convolution, box loss, and data augmentation techniques.
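For the first, dataset-dependent category, the values typically follow directly from the number of classes, using the widely cited Darknet (AlexeyAB) configuration guidelines. The sketch below shows these rules of thumb; whether the thesis applied exactly these formulas is an assumption made here for illustration.

```python
# Sketch of deriving the "fixed" hyperparameters from the dataset, following the
# commonly used Darknet (AlexeyAB) guidance. Assumed for illustration only.

def darknet_fixed_params(num_classes):
    max_batches = max(6000, num_classes * 2000)               # ~2000 iterations per class, at least 6000
    steps = (int(max_batches * 0.8), int(max_batches * 0.9))  # learning-rate decay at 80% and 90% of max_batches
    filters = (num_classes + 5) * 3                           # conv filters before each [yolo] layer
    return {"classes": num_classes, "max_batches": max_batches,
            "steps": steps, "filters": filters}

# Example with a hypothetical 4-class dataset (the thesis's class count is not stated here).
print(darknet_fixed_params(4))
```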
A full-factorial experimental design was employed to generate 96 (2^5 × 3) distinct combinations of these key hyperparameters. The training process was executed twice for each combination, resulting in a total of 192 trained models. During training, validation was performed every 100 iterations. Finally, after training, the testing process was conducted to evaluate model performance.
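A full-factorial design simply enumerates every combination of the chosen factor levels. The sketch below reproduces the 2^5 × 3 = 96 combinations and the two replicates per combination; apart from the levels named explicitly in the abstract (832x832, Darknet-53, DIoU, mosaic, default anchors), the level labels are placeholders, not the thesis's actual settings.

```python
# Sketch of enumerating the full-factorial design (2^5 x 3 = 96 combinations,
# each trained twice, for 192 runs). Factor levels marked "assumed" are placeholders.

from itertools import product

factors = {
    "resolution":   ["416x416", "608x608", "832x832"],  # assumed 3-level factor
    "backbone":     ["Darknet-53", "alternative"],       # second level assumed
    "anchors":      ["default", "recomputed"],           # second level assumed
    "dilated_conv": ["no", "yes"],
    "box_loss":     ["DIoU", "baseline"],                # second level assumed
    "augmentation": ["mosaic", "baseline"],              # second level assumed
}

names = list(factors)
combinations = list(product(*factors.values()))
print(len(combinations))   # 96 distinct hyperparameter combinations

runs = [dict(zip(names, combo), replicate=r)
        for combo in combinations for r in (1, 2)]
print(len(runs))           # 192 training runs in total
```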
Each of the 192 experiments produced outputs consisting of the highest mAP achieved during validation and testing. The results of these experiments were analyzed using ANOVA, which revealed that all hyperparameters significantly influence model performance. Among them, the most impactful hyperparameters are the backbone network, data augmentation, and image resolution. Additionally, two significant two-way interactions were observed: a) between the backbone network and data augmentation, and b) between the backbone network and dilated convolution.
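An analysis of this kind can be carried out with a factorial ANOVA that includes all main effects and two-way interactions. Below is a minimal sketch using statsmodels; the results file and column names are hypothetical, not the thesis's actual artifacts.

```python
# Sketch of a factorial ANOVA over the experiment results. Assumes the 192 runs
# are stored in a CSV with one row per run; file and column names are hypothetical.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

results = pd.read_csv("experiment_results.csv")   # hypothetical results file

# Main effects of all six factors plus every two-way interaction.
formula = ("mAP ~ (C(resolution) + C(backbone) + C(anchors) + "
           "C(dilated_conv) + C(box_loss) + C(augmentation)) ** 2")
model = ols(formula, data=results).fit()
anova_table = sm.stats.anova_lm(model, typ=2)     # Type II ANOVA table
print(anova_table.sort_values("PR(>F)"))          # most significant effects first
```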
The best-performing model achieved mAP values of 60.99% during training/validation and 52.51% during testing. This performance corresponds to the following hyperparameter combination:
  • Image Resolution: 832x832
  • Dilated Convolution: No
  • Box Loss: DIoU
  • Anchor Dimensions: Default
  • Backbone: Darknet-53
  • Data Augmentation: Mosaic
On the other hand, the performance of the worst-performing models was very low, indicating that hyperparameter selection and tuning play an important role and can lead to significant improvements in YOLOv3’s detection performance.
The study demonstrates the important role of tailored training processes, dataset preparation, and hyperparameter selection and tuning in enhancing YOLOv3’s effectiveness for object detection.