view file of concatenated protocol buffers: Stanford CoreNLP

Question

I'm trying to replicate the results of a paper that uses Stanford CoreNLP. In its documentation, the authors state:

the fully annotated sentences are provided in a file of concatenated
  protocol buffers:

delimitedSentences.proto.bz

This file should be read with the Java function
  `CoreNLPProtos.Sentence.parseDelimitedFrom(<input stream>)`,
  or in other languages taking into consideration that every protocol buffer is
  prepended with the size of the buffer, as a VarInt.
Each proto contains all the annotations for the MIML-RE featurizer, in addition to
  some useful additions (e.g., antecedent for every token).
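The "other languages" route in that quote can be sketched in Python as follows. This is a minimal sketch, not CoreNLP's own reader. It only splits the stream into the raw bytes of each message by reading the VarInt size prefix and then that many bytes. It assumes the `.bz` file is bzip2-compressed, which is inferred from the extension. Turning each chunk into a `Sentence` object still needs classes generated by `protoc` from CoreNLP's `.proto` definition. The `CoreNLP_pb2` module named in the comment below is hypothetical.

```python
import bz2
import io
from typing import BinaryIO, Iterator, Optional


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one base-128 VarInt; return None if the stream ends cleanly first."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # clean end of stream: no more messages
            raise EOFError("stream ended in the middle of a VarInt")
        b = byte[0]
        result |= (b & 0x7F) << shift  # low 7 bits carry the payload
        if not b & 0x80:  # high bit clear: this was the last byte
            return result
        shift += 7
        if shift >= 64:
            raise ValueError("VarInt is too long")


def iter_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the raw bytes of each size-prefixed protocol buffer in the stream."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated message")
        yield payload


def iter_messages_from_file(path: str) -> Iterator[bytes]:
    # Assumption: the .bz file is bzip2-compressed.
    with bz2.open(path, "rb") as f:
        yield from iter_delimited(f)
        # With generated classes (hypothetical module name), each chunk would be
        # decoded roughly like this:
        #   sentence = CoreNLP_pb2.Sentence()
        #   sentence.ParseFromString(raw_bytes)


if __name__ == "__main__":
    # Tiny hand-built stream: VarInt 3 + b"abc", then VarInt 2 + b"hi".
    demo = io.BytesIO(b"\x03abc\x02hi")
    print(list(iter_delimited(demo)))  # [b'abc', b'hi']
```

The framing logic is kept separate from decompression and decoding so it can be checked against any byte stream. In Java, `parseDelimitedFrom` performs both the size read and the parse in a single call.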

I've scoured the code for the `CoreNLPProtos.Sentence.parseDelimitedFrom(<input stream>)` function, but it's nowhere to be found.

I'm not very familiar with protocol buffers.

What am I supposed to do with this?

score 0 · Answer 1 · answered Mar 04 '15 at 19:29

Hopefully the generated protocol buffer classes will be in the next release of CoreNLP. In the meantime, the file is on the public GitHub at: https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/pipeline/CoreNLPProtos.java

Let me know if you come across other problems using the data! I can fix bugs as they come up, so hopefully the process is smoother for future users.
