I was just watching a video on transfer learning (training a model on a larger, similar dataset when your own dataset is small). I am confused about how the two datasets' different labels do not interfere with transfer learning.
I understand that transfer learning is typically used when there is only a small amount of data (let's call this Dataset A) for your target task (say, blurry cat photos), but a large dataset of similar data (let's call this Dataset B, a set of professionally taken, non-blurry wolf photos) whose lower-level features could be reused in learning Dataset A (the intuition being that the same edge and curve detection and other lower-level features that help in detecting wolves in Dataset B could also help in detecting cats in Dataset A).
From what I understand, you would first train the neural network on Dataset B, then set the weights of the last layers to random values and, keeping all other parameters constant, retrain on Dataset A.
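To make sure I have the mechanics right, here is a rough sketch of how I imagine that procedure (my own guess, written with PyTorch; the tiny model, the class counts, and the omitted training loops are placeholders I made up, not anything from the video):

```python
import torch
import torch.nn as nn

# Hypothetical label counts -- the real numbers would come from the two datasets.
NUM_WOLF_CLASSES = 10   # Dataset B (wolves)
NUM_CAT_CLASSES = 5     # Dataset A (cats)

# --- Step 1: train on the large Dataset B ---
# Tiny placeholder backbone; any convolutional network would do here.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, NUM_WOLF_CLASSES),  # output head sized for Dataset B's labels
)
# ... standard supervised training loop over Dataset B would go here ...

# --- Step 2: replace the Dataset-B-specific output layer ---
# The old head's weights are tied to the wolf labels, so it is discarded and
# re-created with random weights, now sized for Dataset A's label set.
model[-1] = nn.Linear(16, NUM_CAT_CLASSES)

# --- Step 3: freeze the earlier layers and retrain only the new head on Dataset A ---
for param in model.parameters():
    param.requires_grad = False       # keep the learned low-level features fixed
for param in model[-1].parameters():
    param.requires_grad = True        # only the new head gets updated

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
# ... standard training loop over Dataset A (blurry cat photos) would go here ...
```

The part I want to check is Step 2: the old output layer, which was built around Dataset B's label scheme, is thrown away entirely rather than reused.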
But given that the label scheme for Dataset B would be for wolves, while the labels for Dataset A are for cats, wouldn't the difference in labeling cause a problem?