How do Darknet weights pre-trained on COCO adapt to a different output size (a different number of classes)?

Question

I've been working with Darknet for a few months now, and there is a mystery I still cannot solve. It concerns the Darknet YOLO .weights files, which seem to adapt regardless of the number of classes, even though the number of filters in the final layers depends on that number.

Let's take an example. I want to train a model on pool images to detect waste, using yolov4 weights pre-trained on COCO.
This file contains 162 layers, which is the total number of layers in the original YOLOv4 model (CSP-Darknet53+SPP-net/PA-net). Since it was trained on COCO (80 classes), it should have 3x(5+80) = 255 filters in each convolutional layer that precedes a [yolo] layer, namely layers 138, 149 and 160.

I then used these weights on my custom dataset, after changing all the relevant files accordingly. I used only one class for detection (named "plastic"), and miraculously, everything worked well. The detection layers concerned are the following, each containing 3x(5+1) = 18 filters:

138 conv 18 1 x 1/ 1
149 conv 18 1 x 1/ 1
160 conv 18 1 x 1/ 1
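
For reference, here is the arithmetic behind the two filter counts as a small Python sketch. The function name `yolo_head_filters` is my own, not something from Darknet, and it assumes 3 anchors per detection scale, each predicting 4 box coordinates, 1 objectness score and one score per class.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters in the conv layer just before a [yolo] layer.

    Hypothetical helper (not part of Darknet): each anchor predicts
    4 box coordinates + 1 objectness score + one score per class.
    """
    return anchors_per_scale * (5 + num_classes)


print(yolo_head_filters(80))  # COCO, 80 classes
print(yolo_head_filters(1))   # my dataset, 1 class ("plastic")
```

This gives 255 for COCO and 18 for my single-class setup, which is exactly the mismatch I am asking about.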

My question: COCO detection requires 255 filters in these layers, whereas my setup requires only 18, yet training works with the COCO pre-trained weights. Does this mean Darknet discarded these 3 layers from the pre-trained weights and initialized them randomly with the appropriate number of filters (18)? Or did it use the first 18 (out of 255) filters from the yolov4.weights file?
