
I know that more data is generally better, but what would be a reasonable minimum amount of data required to train SyntaxNet?

user2161903

1 Answer


Based on some trial and error, I have arrived at the following minimums:

  • Train corpus - 18,000 tokens (anything less than that, and step 2, preprocessing with the tagger, fails)
  • Test corpus - 2,000 tokens (anything less than that, and step 2, preprocessing with the tagger, fails)
  • Dev corpus - 2,000 tokens

    Please note that with these sizes I've only managed to get the steps in the NLP pipeline to run; I haven't actually managed to get anything usable out of it.
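If you want to check whether your corpora meet these minimums, a quick sketch of a token counter for CoNLL-formatted files (the format SyntaxNet's training scripts consume) might look like the following. In CoNLL files each token occupies one line, sentences are separated by blank lines, and lines beginning with `#` are comments; the file names here are placeholders, not SyntaxNet's actual defaults.

```python
def count_conll_tokens(path):
    """Count token lines in a CoNLL-formatted corpus.

    Skips blank lines (sentence separators) and '#' comment lines,
    so the result is the number of tokens, not the number of lines.
    """
    tokens = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                tokens += 1
    return tokens

# Example check against the minimums above (file names are hypothetical):
# count_conll_tokens("train.conll") >= 18000
# count_conll_tokens("test.conll")  >= 2000
# count_conll_tokens("dev.conll")   >= 2000
```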

  • bulbul