When I train my spaCy model as follows:

spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy

the model is trained on the train.spacy data file and scored on dev.spacy. output_updated/model-best is then the model with the highest score.

Is this best model ultimately trained on a combination of both the train and dev data? I understand that it makes sense to split the datasets to avoid overfitting, but given how little training data I have, I would like the final model to be trained on all the data at hand.

dzieciou

1 Answer

No, spaCy does not automatically merge your datasets before training model-best; that model is trained only on the data passed as --paths.train. If you want a model trained on everything, you need to create a new combined training set yourself.
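One way to build such a combined file is sketched below, assuming both files are ordinary DocBin files saved with the same attributes (`DocBin.merge` raises an error if they differ). The helper name `merge_docbins` and the output file name `train_and_dev.spacy` are made up for illustration and are not part of spaCy.

```python
from spacy.tokens import DocBin


def merge_docbins(paths, output_path):
    """Combine several .spacy files into one and return the number of docs.

    Hypothetical helper: assumes every file was serialized with the same
    DocBin attributes, otherwise DocBin.merge raises an error.
    """
    # Start from the first file so the merged DocBin inherits its attributes.
    merged = DocBin().from_disk(paths[0])
    for path in paths[1:]:
        # Append the docs of each remaining file to the merged DocBin.
        merged.merge(DocBin().from_disk(path))
    merged.to_disk(output_path)
    return len(merged)


# Example call, assuming the two files from the question exist:
# merge_docbins(["./train.spacy", "./dev.spacy"], "./train_and_dev.spacy")
```

You could then pass the combined file as --paths.train. Note that spacy train still needs a --paths.dev file, and if the dev data is also in the training set, the scores used to pick model-best no longer tell you how well the model generalizes.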

That said, if you have so little data that this seems like a good idea, you should probably prioritize getting more data.

polm23