I am a little confused about a few things, and I was wondering if I could get some help.
The necessity of softmax layers: I thought that in classification models the softmax layer converts the outputs into a probability for each class, which is necessary for classification. But DenseNet and the other pre-made architectures I've looked at don't have any softmax layers; they don't even end in a Dense layer. What am I missing?
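For reference, here is a minimal pure-Python sketch of what a softmax does to a layer's raw outputs (logits). It rescales them into non-negative values that sum to 1 without changing which class scores highest. The five logit values are made up for illustration and do not come from any real model.

```python
import math

def softmax(logits):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtract the max score before exponentiating for numerical stability;
    # the result is unchanged because softmax is shift-invariant.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a final Dense layer for 5 classes.
logits = [2.0, 1.0, 0.1, -1.0, 0.5]
probs = softmax(logits)

# Index of the highest score, before and after softmax.
best_logit = max(range(len(logits)), key=lambda i: logits[i])
best_prob = max(range(len(probs)), key=lambda i: probs[i])

print([round(p, 3) for p in probs])
print(sum(probs))
print(best_logit == best_prob)
```

Because softmax preserves the ordering of the scores, the predicted class (the argmax) is the same with or without it; softmax only changes whether the outputs can be read as probabilities.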
Global average pooling: it must have the same number of channels as the output layer, right? If so, why does the model summary show 1024 channels in the GAP layer but only 5 in the final Dense layer when I add it?
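A minimal pure-Python sketch of global average pooling followed by a Dense layer, assuming channels-last (height x width x channels) feature maps. The 1024 channels and 5 classes are chosen to mirror the numbers in the summary; the 7 x 7 spatial size and the random values are arbitrary assumptions, not taken from an actual model. GAP averages each channel over all spatial positions, so its output has one value per input channel. It is the Dense layer's weight matrix (1024 x 5 here) that maps those features to the number of classes.

```python
import random

def global_average_pool(feature_map):
    """Average an H x W x C feature map (channels-last) over H and W.

    Returns a list of length C: one value per channel. The channel
    count is unchanged; only the spatial dimensions are collapsed.
    """
    h = len(feature_map)
    w = len(feature_map[0])
    c = len(feature_map[0][0])
    pooled = [0.0] * c
    for row in feature_map:
        for pixel in row:
            for k in range(c):
                pooled[k] += pixel[k]
    return [v / (h * w) for v in pooled]

def dense(inputs, weights, biases):
    """Fully connected layer: maps len(inputs) features to len(biases) outputs.

    weights[i][j] connects input feature i to output unit j.
    """
    return [
        sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        for j in range(len(biases))
    ]

random.seed(0)
H, W, C, NUM_CLASSES = 7, 7, 1024, 5  # assumed sizes, see lead-in

# Hypothetical backbone output: random values standing in for real features.
features = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]

pooled = global_average_pool(features)  # length C (1024)
weights = [[random.gauss(0, 0.01) for _ in range(NUM_CLASSES)] for _ in range(C)]
biases = [0.0] * NUM_CLASSES
scores = dense(pooled, weights, biases)  # length NUM_CLASSES (5)

print(len(pooled), len(scores))
```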
I know this is kinda long, but I would really appreciate some help :)