I'm training a transformer model with OpenNMT-py on MIDI music files, but the results are poor because I only have a small dataset in the style I want to study. To help the model learn something useful, I would like to pre-train it on a much larger dataset of other musical styles and then fine-tune the resulting model on the small dataset.
I was thinking of freezing the encoder side of the transformer after pre-training and leaving the decoder free to train during fine-tuning. How would one do this with OpenNMT-py?
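
To make the idea concrete, here is a minimal sketch of what I mean in plain PyTorch, not OpenNMT-py. It uses `torch.nn.Transformer` as a stand-in for the pre-trained model, and the layer sizes, dummy tensors, and loss are made up for illustration. I'm assuming the OpenNMT-py model exposes its encoder as a submodule in a similar way, which I have not verified.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the pre-trained model. The `encoder` / `decoder` attribute
# names are those of nn.Transformer; the OpenNMT-py model may differ.
model = nn.Transformer(d_model=32, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, dim_feedforward=64)

# Freeze the encoder: its parameters no longer receive gradients.
model.encoder.requires_grad_(False)

# Hand the optimizer only the parameters that are still trainable.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

# Snapshot the weights so the effect of one step can be inspected.
encoder_before = [p.detach().clone() for p in model.encoder.parameters()]
decoder_before = [p.detach().clone() for p in model.decoder.parameters()]

# One dummy fine-tuning step on random data (placeholder loss).
src = torch.rand(10, 2, 32)  # (source length, batch, d_model)
tgt = torch.rand(7, 2, 32)   # (target length, batch, d_model)
loss = model(src, tgt).pow(2).mean()
loss.backward()
optimizer.step()
```

After the step, the encoder weights should be identical to their snapshot while the decoder weights have moved. What I don't know is where to hook something like this into the OpenNMT-py training flow.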