A COMPREHENSIVE EVALUATION OF YOLOv5s AND YOLOv5m FOR DOCUMENT LAYOUT ANALYSIS
Abstract
Document Layout Analysis (DLA) of document images is a highly active area of computer vision. Deep learning architectures, particularly YOLOv5s and YOLOv5m, are currently at the forefront of addressing this challenge. This paper examines their performance both qualitatively and quantitatively, measured by Average Precision (AP) on COCO datasets. Fine-tuning on specific datasets, notably books in Arabic and English, yields significant improvements. We then compare YOLOv5m and YOLOv5s for DLA. Although YOLOv5s reaches an impressive 123 frames per second (FPS), 2 FPS faster than YOLOv5m, YOLOv5m proves to be the preferred model for DLA systems: it achieves an mAP of 94.2%, the highest of all models evaluated in this study. Its slightly lower FPS still provides a practical detection speed, making it a pragmatic choice for real-world applications where accuracy is paramount.
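To put the 2 FPS gap in perspective, the short sketch below converts the reported throughput into average per-image latency. It assumes that FPS reflects sequential single-image inference, so latency is simply 1000 / FPS milliseconds, and it infers YOLOv5m's rate as 123 - 2 = 121 FPS from the abstract. Batching, preprocessing, and hardware effects are ignored, and the helper name latency_ms is hypothetical.

```python
# Reported throughput (frames per second). YOLOv5m's 121 FPS is inferred
# from "YOLOv5s ... surpassing YOLOv5m by 2 units" (assumption).
FPS = {"YOLOv5s": 123.0, "YOLOv5m": 121.0}


def latency_ms(fps: float) -> float:
    """Hypothetical helper: average per-image latency in ms, assuming
    images are processed one at a time (latency = 1 / throughput)."""
    return 1000.0 / fps


for name, fps in FPS.items():
    print(f"{name}: {fps:.0f} FPS -> {latency_ms(fps):.2f} ms/image")

# Extra time YOLOv5m spends per image compared with YOLOv5s.
gap = latency_ms(FPS["YOLOv5m"]) - latency_ms(FPS["YOLOv5s"])
print(f"Extra latency of YOLOv5m: {gap:.3f} ms/image")
```

Under these assumptions the script prints about 8.13 ms/image for YOLOv5s and 8.26 ms/image for YOLOv5m, a difference of roughly 0.134 ms per image. That gap is small enough to support the claim that YOLOv5m's speed remains practical.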