
I have tried Parsey McParseface, the pre-trained POS tagger that comes with SyntaxNet, and it does a good job of tagging sentences that have proper capitalization.

I would like to tag sentences that are all lower case, such as "i grew up in toronto", and then parse them to identify named entities such as cities (in this case, toronto).

I have a couple of questions:

  • Is there a pre-trained case-insensitive POS tagger for SyntaxNet that I can use?
  • How should I go about training my own case-insensitive POS tagger for SyntaxNet?
  • Does training the SyntaxNet POS tagger require a substantial amount of CPU/GPU power, or can it be done on regular servers I could rent on Amazon or similar services?
  • Is the dataset that Google used to train Parsey McParseface available for public use?
  • You can see this on their GitHub page: "The included English parser, Parsey McParseface, was trained on the standard corpora of the Penn Treebank and OntoNotes, as well as the English Web Treebank, but these are unfortunately not freely available." You can look at Universal Dependencies for more sources. I trained my parser for French using those and it works well. I also found https://github.com/dsindex/syntaxnet very helpful. As for training time, it always depends on the machine you are using. TensorFlow with GPU support will be faster than a traditional CPU-only machine. – ElCapitaine Aug 04 '16 at 03:30
  • I trained mine without a GPU on a 4-core machine. It took a couple of weekends to train my model and the results were good. It also depends on your training set size. – ElCapitaine Aug 04 '16 at 03:32
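
One possible way to approach the case-insensitive training question is to lowercase the word forms in a freely available treebank (for example, a Universal Dependencies corpus in CoNLL-U format, as suggested in the comments) and then train the tagger on the lowercased copy. The sketch below only covers that preprocessing step. It assumes tab-separated CoNLL-U input with the word form in the second column; the function name `lowercase_conllu_forms` is hypothetical and is not part of SyntaxNet, and whether this yields good accuracy on lowercase input is untested here.

```python
def lowercase_conllu_forms(text: str) -> str:
    """Return CoNLL-U text with the FORM column (2nd field) lowercased.

    Comment lines (starting with '#') and blank sentence separators are
    passed through unchanged. Tags, lemmas, and dependency columns are
    left as they are, so the gold annotations still line up.
    """
    out_lines = []
    for line in text.split("\n"):
        # Keep sentence separators and comment/metadata lines as-is.
        if not line.strip() or line.startswith("#"):
            out_lines.append(line)
            continue
        cols = line.split("\t")
        if len(cols) >= 2:
            cols[1] = cols[1].lower()  # FORM column only
        out_lines.append("\t".join(cols))
    return "\n".join(out_lines)


if __name__ == "__main__":
    # Tiny hand-written sample in CoNLL-U layout (illustrative only).
    sample = (
        "# text = I grew up in Toronto\n"
        "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_\n"
        "2\tgrew\tgrow\tVERB\tVBD\t_\t0\troot\t_\t_\n"
        "5\tToronto\tToronto\tPROPN\tNNP\t_\t2\tobl\t_\t_\n"
        "\n"
    )
    print(lowercase_conllu_forms(sample))
```

Lowercasing only the FORM column keeps the gold POS tags (such as PROPN for "toronto") intact, which is the signal the tagger would need to learn without relying on capitalization.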
