3

I want to use the Stanford Parser to create a .conll file for further processing. So far I have managed to parse the test sentence with this command:

stanford-parser-full-2013-06-20/lexparser.sh  stanford-parser-full-2013-06-20/data/testsent.txt > output.txt

Instead of a .txt file I would like to get a .conll file. I'm pretty sure this is possible, as it is mentioned in the documentation (see here). Can I somehow modify my command, or will I have to write Java code?

Thanks for your help!

Frakcool
Rattlesnake

3 Answers

9

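If you're looking for dependencies printed in CoNLL-X (CoNLL 2006) format, try the two commands below from the command line. The first writes Penn Treebank parse trees to testsent.tree; the second converts those trees to dependencies and prints them in CoNLL-X format: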

java -mx150m -cp "stanford-parser-full-2013-06-20/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz stanford-parser-full-2013-06-20/data/testsent.txt >testsent.tree

java -mx150m -cp "stanford-parser-full-2013-06-20/*:" edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile testsent.tree -conllx

Here's the output for the first test sentence:

1       Scores        _       NNS     NNS     _       4       nsubj        _       _
2       of            _       IN      IN      _       0       erased       _       _
3       properties    _       NNS     NNS     _       1       prep_of      _       _
4       are           _       VBP     VBP     _       0       root         _       _
5       under         _       IN      IN      _       0       erased       _       _
6       extreme       _       JJ      JJ      _       8       amod         _       _
7       fire          _       NN      NN      _       8       nn           _       _
8       threat        _       NN      NN      _       4       prep_under   _       _
9       as            _       IN      IN      _      13       mark         _       _
10      a             _       DT      DT      _      12       det          _       _
11      huge          _       JJ      JJ      _      12       amod         _       _
12      blaze         _       NN      NN      _      15       xsubj        _       _
13      continues     _       VBZ     VBZ     _       4       advcl        _       _
14      to            _       TO      TO      _      15       aux          _       _
15      advance       _       VB      VB      _      13       xcomp        _       _
16      through       _       IN      IN      _       0       erased       _       _
17      Sydney        _       NNP     NNP     _      20       poss         _       _
18      's            _       POS     POS     _       0       erased       _       _
19      north-western _       JJ      JJ      _      20       amod         _       _
20      suburbs       _       NNS     NNS     _      15       prep_through _       _
21      .             _       .       .       _       4       punct        _       _
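Since the goal is further processing, here is a minimal sketch of reading rows like the ones above back into Java objects. It relies on the standard CoNLL-X column order (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). The class and method names (ConllxReader, Token, parseSentence) are made up for illustration and are not part of the Stanford Parser. It also assumes that no token contains whitespace, so splitting on runs of whitespace works for both tab-separated and space-aligned output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Stanford Parser API.
public class ConllxReader {

    /** One token row; only the columns that are filled in the output above are kept. */
    public static final class Token {
        public final int id;         // column 1: ID
        public final String form;    // column 2: FORM
        public final String posTag;  // column 5: POSTAG
        public final int head;       // column 7: HEAD (0 for the root and for "erased" tokens)
        public final String depRel;  // column 8: DEPREL

        Token(int id, String form, String posTag, int head, String depRel) {
            this.id = id;
            this.form = form;
            this.posTag = posTag;
            this.head = head;
            this.depRel = depRel;
        }
    }

    /** Parses the rows of one sentence, skipping blank lines. */
    public static List<Token> parseSentence(List<String> lines) {
        List<Token> tokens = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Assumption: tokens contain no whitespace, so this handles tabs and padding alike.
            String[] cols = trimmed.split("\\s+");
            if (cols.length != 10) {
                throw new IllegalArgumentException(
                        "Expected 10 columns but found " + cols.length + ": " + line);
            }
            tokens.add(new Token(
                    Integer.parseInt(cols[0]), cols[1], cols[4],
                    Integer.parseInt(cols[6]), cols[7]));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two rows copied from the output above.
        List<String> sample = Arrays.asList(
                "1       Scores        _       NNS     NNS     _       4       nsubj        _       _",
                "4       are           _       VBP     VBP     _       0       root         _       _");
        for (Token t : parseSentence(sample)) {
            System.out.println(t.id + " " + t.form + " " + t.depRel + " head=" + t.head);
        }
    }
}
```

For a file containing several sentences, the rows would first need to be grouped by sentence; in CoNLL-X files, sentences are separated by a blank line.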
Keith Flower
4

I'm not sure whether you can do this from the command line, but here is a Java version:

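// Assumes lp is a loaded LexicalizedParser and gsf is a GrammaticalStructureFactory,
// e.g. new PennTreebankLanguagePack().grammaticalStructureFactory().
// Pass the file path directly: wrapping it in a StringReader would tokenize the file name itself.
for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
        Tree parse = lp.apply(sentence);

        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        // The fourth argument (conllx = true) selects CoNLL-X output.
        GrammaticalStructure.printDependencies(gs, gs.typedDependencies(), parse, true, false);
}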
Dana
0

There is also a conll2007 output format; see the TreePrint documentation for all options.

Here is an example using version 3.8 of the Stanford parser. It assumes an input file with one sentence per line (-sentences newline), outputs Stanford Dependencies rather than Universal Dependencies (-originalDependencies), applies no propagation or collapsing (basicDependencies), keeps punctuation (includePunctuationDependencies), and prints the result in conll2007 format:

java -Xmx4g -cp "stanford-corenlp-full-2017-06-09/*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -sentences newline -outputFormat conll2007 -originalDependencies -outputFormatOptions "basicDependencies,includePunctuationDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz input.txt
Andrew Olney