I'm trying to replicate the results of a paper that uses Stanford CoreNLP. The documentation states:

the fully annotated sentences are provided in a file of concatenated
  protocol buffers:

delimitedSentences.proto.bz

This file should be read with the Java function
  `CoreNLPProtos.Sentence.parseDelimitedFrom(<input stream>)`,
  or in other languages taking into consideration that every protocol buffer is
  prepended with the size of the buffer, as a VarInt.
Each proto contains all the annotations for the MIML-RE featurizer, in addition to
  some useful additions (e.g., antecedent for every token).

I've scoured the code for the `CoreNLPProtos.Sentence.parseDelimitedFrom(<input stream>)` function, but it's nowhere to be found.

I'm not very familiar with protocol buffers.

What am I supposed to do with this?

Christopher Manning
smatthewenglish

1 Answer

Hopefully the protobuf classes will be in the next release of CoreNLP. In the meantime, the file is on the public GitHub at: https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/pipeline/CoreNLPProtos.java
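If it helps to see what "prepended with the size of the buffer, as a VarInt" means in practice, here is a minimal sketch that splits such a stream into raw message payloads using only the Java standard library. The class name `DelimitedFrames` and its methods are made up for illustration and are not part of CoreNLP; the sketch also assumes the input stream has already been decompressed, and it does not decode the protos themselves.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of CoreNLP): splits a stream of
// varint-length-prefixed records into raw payloads.
final class DelimitedFrames {

    /** Reads one length-prefixed record; returns null on a clean end of stream. */
    static byte[] readFrame(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            return null; // no more records
        }
        // Decode the base-128 varint: 7 payload bits per byte, least
        // significant group first; the high bit means "more bytes follow".
        int size = b & 0x7F;
        int shift = 7;
        while ((b & 0x80) != 0) {
            if (shift >= 32) {
                throw new IOException("length prefix is too long");
            }
            b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended inside a length prefix");
            }
            size |= (b & 0x7F) << shift;
            shift += 7;
        }
        if (size < 0) {
            throw new IOException("record is too large for this sketch");
        }
        // Read exactly `size` bytes of payload.
        byte[] payload = new byte[size];
        int off = 0;
        while (off < size) {
            int n = in.read(payload, off, size - off);
            if (n < 0) {
                throw new EOFException("stream ended inside a record");
            }
            off += n;
        }
        return payload;
    }

    /** Reads every record until the stream is exhausted. */
    static List<byte[]> readAll(InputStream in) throws IOException {
        List<byte[]> frames = new ArrayList<>();
        byte[] frame;
        while ((frame = readFrame(in)) != null) {
            frames.add(frame);
        }
        return frames;
    }
}
```

With `CoreNLPProtos.java` and the protobuf runtime on the classpath, you should not need any of this in Java: calling `CoreNLPProtos.Sentence.parseDelimitedFrom(in)` in a loop until it returns `null` does the same framing and decodes each message. The sketch is mainly useful as a reference for porting the framing to another language.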

Let me know if you come across other problems using the data! I can fix bugs as they come up, so hopefully the process is smoother for future users.

Gabor Angeli