This thesis focuses on optimizing the training process of YOLOv4-p6 to detect and classify persons, small vehicles, large vehicles, and ships for surveillance applications in warehouses and ports. The optimization process involved tuning the training hyperparameters of YOLOv4-p6 to maximize mean Average Precision (mAP). Several configurations achieved higher mAP than the default settings. Consequently, this thesis examines the impact of each hyperparameter on YOLOv4-p6's training performance.
Specifically, we trained and tested YOLOv4-p6 using publicly available annotated UAV image datasets, namely Aerial vehicle, DOTA, VisDrone-DET, Stanford drone, and DAC-SDC. These datasets were modified to contain only the selected classes and then combined into a single dataset of 76,872 images with 876,388 annotated objects. Subsequently, we divided the combined UAV dataset into training (80% of the combined dataset), validation (10%), and testing (10%) subsets.
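As an illustration only, the 80/10/10 partition described above could be sketched as follows. The sketch assumes a simple seeded random split over image identifiers; the abstract does not state how the split was actually drawn, and the function name `split_dataset` and the integer image IDs are hypothetical.

```python
import random


def split_dataset(items, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items and split them into train/validation/test subsets.

    Assumption: a plain random split; the actual procedure is not specified.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


# Hypothetical stand-in for the 76,872 image identifiers of the combined dataset.
image_ids = range(76872)
train, val, test = split_dataset(image_ids)
```

Because the subset sizes are truncated to whole images, any rounding remainder ends up in the test subset.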
The training hyperparameters of the YOLOv4-p6 algorithm were divided into two subsets. The first subset comprised the hyperparameters whose values depend on the characteristics of the training set. The values of these hyperparameters were kept fixed throughout the tuning experiments. The second subset comprised the hyperparameters to be tuned in order to optimize the training performance of YOLOv4-p6. Specifically, this subset included five two-level hyperparameters, which led to thirty-two experiments under the Full-Factorial method (2^5 = 32). For each of the 32 combinations, we ran the training and testing sessions twice to support the analysis of the results using Analysis of Variance (ANOVA). This resulted in 64 trained models.
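The arithmetic of the experimental design (2^5 = 32 combinations, two replicates each, 64 training runs) can be sketched as below. This is a minimal enumeration under stated assumptions: the factor names are taken from the hyperparameters mentioned in the analysis, the coded levels -1/+1 are placeholders for the actual low/high settings, and the run ordering does not necessarily match the model numbering used in the thesis.

```python
from itertools import product

# Assumed factor names; the concrete low/high settings are not given here.
FACTORS = [
    "image_resolution",
    "activation_function",
    "anchor_dimensions",
    "nms",
    "data_augmentation",
]
LEVELS = (-1, +1)   # coded low/high levels (placeholders)
REPLICATES = 2      # each combination is trained and tested twice


def full_factorial(factors, levels=LEVELS):
    """Return every combination of levels as a list of {factor: level} dicts."""
    return [dict(zip(factors, combo))
            for combo in product(levels, repeat=len(factors))]


design = full_factorial(FACTORS)

# One entry per training run: (configuration id, replicate number, settings).
runs = [(cfg_id, rep, cfg)
        for cfg_id, cfg in enumerate(design, start=1)
        for rep in range(1, REPLICATES + 1)]
```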
The ANOVA of the mAP results revealed three statistically significant hyperparameters: image resolution, activation function, and anchor dimensions. Furthermore, a three-way interaction among Non-Maximum Suppression, data augmentation, and anchor dimensions was identified as significant.
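For readers unfamiliar with two-level factorial analysis, the quantity that ANOVA tests for each hyperparameter is its main effect: the mean response at the high level minus the mean response at the low level. The sketch below computes this estimate only, not the significance test itself. It uses a small synthetic response over three hypothetical factors (`A`, `B`, `C`) purely for illustration; none of the numbers are results from this thesis.

```python
from itertools import product
from statistics import mean


def main_effect(runs, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for levels, y in runs if levels[factor] == +1]
    low = [y for levels, y in runs if levels[factor] == -1]
    return mean(high) - mean(low)


# Synthetic toy design: three hypothetical factors at coded levels -1/+1.
factors = ["A", "B", "C"]
runs = []
for combo in product((-1, +1), repeat=len(factors)):
    levels = dict(zip(factors, combo))
    # Made-up response: A raises it, B lowers it slightly, C has no influence.
    y = 50 + 3 * levels["A"] - 1 * levels["B"]
    runs.append((levels, y))

effects = {f: main_effect(runs, f) for f in factors}
```

In a replicated design such as the one used here, ANOVA compares each such effect against the run-to-run variability between replicates to decide whether it is statistically significant.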
The two best-performing models (the 25th and 29th) achieved an average mAP of 52% in validation and 53.3% in testing. In contrast, the lowest-performing models achieved mAP values of 39.8% in validation and 44% in testing. This supports our thesis that careful tuning of the hyperparameters during training may yield major improvements in model effectiveness.
We also tested the best-performing models on a new UAV dataset developed by the DeOPSys lab. They performed exceptionally well, achieving average mAP values of up to 77.6% and 76.3%, respectively. This independent testing validates the quality of the trained models. More importantly, it validates that the proposed hyperparameter tuning method enables effective training of high-performance YOLO models.