
I am doing transfer learning with Google AudioSet embeddings. According to the documentation,

the embedding layer does not include a final non-linear activation, so the embedding value is pre-activation

I want to train and test a new model on top of this embedding layer, using the embedding data. I plan to do the following:

  1. Create new dense layers.
  2. Convert the embeddings from byte strings to tensors, and split them into train, validation, and test datasets.
  3. Feed these tensors into the new model (a sketch of steps 1–3 follows this list).
  4. Validate and test the model using the validation and test datasets.
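
Concretely, here is a rough sketch of steps 1–3, assuming the embeddings are the 128-dimensional quantized AudioSet features stored as byte strings in TFRecord files (the feature key, file pattern, and number of classes below are placeholders for my actual setup):

    import tensorflow as tf

    NUM_CLASSES = 10  # placeholder for my actual number of classes

    def parse_embedding(serialized):
        # One quantized 128-D embedding per frame is stored as a byte
        # string inside a SequenceExample (feature key assumed here).
        _, sequences = tf.io.parse_single_sequence_example(
            serialized,
            sequence_features={
                "audio_embedding": tf.io.FixedLenSequenceFeature([], tf.string)
            },
        )
        # Decode each byte string into 128 uint8 values, then de-quantize
        # to floats in [0, 1]. Label parsing is omitted for brevity.
        emb = tf.io.decode_raw(sequences["audio_embedding"], tf.uint8)
        emb = tf.cast(emb, tf.float32) / 255.0
        # Average per-frame embeddings into one 128-D clip-level vector.
        return tf.reduce_mean(emb, axis=0)

    dataset = (
        tf.data.TFRecordDataset(tf.io.gfile.glob("embeddings/*.tfrecord"))
        .map(parse_embedding)
    )

    # New dense layers on top of the embeddings. Because the embeddings
    # are pre-activation, the non-linearity is applied before the new layers.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(128,)),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])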

I have two points of confusion with this implementation:

  • Is using the embeddings as input to the new layers enough for transfer learning? I have seen some transfer-learning implementations that load pre-trained weights into the new model and freeze the layers containing those weights. But those implementations train on new data, not on embeddings from the pre-trained model, and I am confused about how that works.
  • Is it okay to split the embeddings into train, validation, and test datasets? I am not sure whether all of the embeddings were used to train the pre-trained model. If they all were, does it make sense to use some of them as validation and test data?
Sabid Habib
    I’m voting to close this question because it is not about programming as defined in the [help] but about DL theory and/or methodology - please see the intro and NOTE in https://stackoverflow.com/tags/deep-learning/info – desertnaut Mar 28 '22 at 10:20

1 Answer


Is using the embeddings as input to the new layers enough for transfer learning?

This should work as expected. However, keep in mind that generalization to genuinely unseen data points may be lower than your evaluation suggests. Usually, when using a pre-trained model, every data point in your dataset is unseen by the original network; in your case, some of your data points may have been used to train the pre-trained model, so their measured performance can be unrealistically high compared with data the pre-trained model has never seen.
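
To illustrate the weight-loading variant you mention: there, the pre-trained network itself becomes part of the new model, its weights are loaded and frozen, and raw data (not embeddings) is fed in, so the frozen layers compute the embeddings on the fly during training. A minimal Keras sketch of that pattern (the model file and output size are placeholders):

    import tensorflow as tf

    # Placeholder: a pre-trained network loaded together with its weights.
    pretrained_model = tf.keras.models.load_model("pretrained_model.h5")
    pretrained_model.trainable = False  # freeze all pre-trained layers

    # Raw inputs pass through the frozen base, then through new trainable layers.
    inputs = tf.keras.Input(shape=pretrained_model.input_shape[1:])
    features = pretrained_model(inputs, training=False)  # inference mode
    outputs = tf.keras.layers.Dense(10, activation="softmax")(features)  # 10 classes as a placeholder
    model = tf.keras.Model(inputs, outputs)

Pre-computing the embeddings, as you plan, is equivalent to running the frozen base exactly once in advance; it trades away the option of later fine-tuning for much faster training.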

Is it okay to split the embeddings into train, validation, and test datasets?

This is a good approach, and it also mitigates the issue from the previous point. If you don't know which data points were used to train the pre-trained model, you can additionally use cross-validation, creating multiple splits to reduce the impact of this issue.
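
For instance, a 5-fold evaluation over the pre-computed embeddings could look like this sketch (the random X and y stand in for your actual embeddings and labels, and build_model for your dense head):

    import numpy as np
    import tensorflow as tf
    from sklearn.model_selection import KFold

    # Placeholder data standing in for the pre-computed embeddings and labels.
    X = np.random.rand(1000, 128).astype("float32")
    y = np.random.randint(0, 10, size=1000)

    def build_model():
        # A fresh, untrained copy of the dense head for each fold.
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(128,)),
            tf.keras.layers.ReLU(),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    kfold = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = []
    for train_idx, val_idx in kfold.split(X):
        model = build_model()
        model.fit(X[train_idx], y[train_idx], epochs=5, verbose=0)
        _, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
        scores.append(acc)

    # Averaging over several splits reduces the impact of any single split
    # accidentally concentrating examples the pre-trained model already saw.
    print(f"Mean accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")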

aaossa