Supervised fine-tuning adds an extra output layer to the pre-trained model.
Does this extra layer alter the probabilities of words that are unrelated to the fine-tuning data?
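
To make the question concrete, here is a minimal sketch of one reason such a shift seems likely. It assumes the output layer ends in a softmax over the whole vocabulary; the four-word vocabulary and the logit values are made up for illustration and do not come from any real model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and logits (illustrative numbers only).
vocab = ["cat", "dog", "tax", "audit"]
base_logits = [2.0, 1.5, 0.5, 0.0]

# Assumption: fine-tuning raised only the logit of "tax" (index 2).
tuned_logits = list(base_logits)
tuned_logits[2] += 2.0

base_probs = softmax(base_logits)
tuned_probs = softmax(tuned_logits)

# Print each word's probability before and after the change.
for word, p0, p1 in zip(vocab, base_probs, tuned_probs):
    print(f"{word:>5}: {p0:.3f} -> {p1:.3f}")
```

In this toy case the probabilities of "cat", "dog" and "audit" all drop even though their logits were never touched, because softmax normalises over every word at once. Their odds relative to each other stay the same here, but that only holds because a single logit was edited by hand. In an actual fine-tuned model the updated weights are shared across the vocabulary, so the logits of unrelated words may move as well.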