
I know that more data is generally better, but what would be a reasonable minimum amount of data required to train SyntaxNet?

user2161903

1 Answer


Based on some trial and error, I have arrived at the following minimums:

  • Train corpus - 18,000 tokens (anything less than that, and step 2, preprocessing with the tagger, fails)
  • Test corpus - 2,000 tokens (anything less than that, and step 2, preprocessing with the tagger, fails)
  • Dev corpus - 2,000 tokens

    Please note that with these sizes I've only managed to get the steps in the NLP pipeline to run; I haven't actually managed to get anything usable out of it.
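If you want to check whether your corpora meet these minimums, a quick sketch of a token counter for CoNLL-formatted files (the format SyntaxNet's training scripts consume) might look like the following. In CoNLL files each token occupies one line, sentences are separated by blank lines, and lines beginning with `#` are comments; the file names here are placeholders, not SyntaxNet's actual defaults.

```python
def count_conll_tokens(path):
    """Count token lines in a CoNLL-formatted corpus.

    Skips blank lines (sentence separators) and '#' comment lines,
    so the result is the number of tokens, not the number of lines.
    """
    tokens = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                tokens += 1
    return tokens

# Example check against the minimums above (file names are hypothetical):
# count_conll_tokens("train.conll") >= 18000
# count_conll_tokens("test.conll")  >= 2000
# count_conll_tokens("dev.conll")   >= 2000
```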

  • bulbul