I am relatively new to the subject and have been doing loads of reading. What I am particularly confused about is how a CNN learns its filters for a particular labeled feature in a training data set.
Is the cost calculated on a pixel-by-pixel basis, according to which outputs should or shouldn't be active? And if that is the case, how are the activations mapped to the labeled data after downsampling?
I apologize for any poor assumptions or general misunderstandings. Again, I am new to this field and would appreciate all feedback.