I am trying to run the following lines of code:

import os
os.environ['JAVAHOME'] = 'path/to/java.exe'
os.environ['STANFORD_PARSER'] = 'path/to/stanford-parser.jar'
os.environ['STANFORD_MODELS'] = 'path/to/stanford-parser-3.8.0-models.jar'

from nltk.parse.stanford import StanfordDependencyParser
dep_parser = StanfordDependencyParser(model_path="path/to/englishPCFG.ser.gz")
sentence = "sample sentence ..."

# Dependency Parsing:
print("Dependency Parsing:")
print([parse.tree() for parse in dep_parser.raw_parse(sentence)])

and at the line:

print([parse.tree() for parse in dep_parser.raw_parse(sentence)])

I get the following error:

Traceback (most recent call last):
  File "C:/Users/Norbert/PycharmProjects/untitled/StanfordDependencyParser.py", line 21, in <module>
    print([parse.tree() for parse in dep_parser.raw_parse(sentence)])
  File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\parse\stanford.py", line 134, in raw_parse
    return next(self.raw_parse_sents([sentence], verbose))
  File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\parse\stanford.py", line 152, in raw_parse_sents
    return self._parse_trees_output(self._execute(cmd, '\n'.join(sentences), verbose))
  File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\parse\stanford.py", line 218, in _execute
    stdout=PIPE, stderr=PIPE)
  File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\internals.py", line 135, in java
    print(_decode_stdoutdata(stderr))
  File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\internals.py", line 737, in _decode_stdoutdata
    return stdoutdata.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xac in position 3097: invalid start byte

Any idea what could be wrong? I am not even dealing with any non-UTF-8 text.

Uther Pendragon
  • Is "sample sentence ..." the sentence under which you are seeing the error? – gimg1 Jul 27 '17 at 19:55
  • @gimg1 No, I just put that as a placeholder. I tried about 5 different sentences containing only plain a-zA-Z letters, and each one gives me the same error. – Uther Pendragon Jul 27 '17 at 21:09
  • Can you try encoding the string to utf-8, just to be sure there is no character in there causing the error? `sentence.encode('utf-8').strip()` – gimg1 Jul 28 '17 at 23:30

1 Answer

I can print a few things by doing the following. It may not be exactly what you wanted, but it is a start.

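print("Dependency Parsing:")
result = dep_parser.raw_parse(sentence)  # dep_parser as defined in the question
dep = next(result)  # take the first parse from the iterator
# print(dep)  # uncomment to see the entire output
print(list(dep.triples()))

Uncomment the print(dep) line if you want to see the entire output. (Calling print(next(result)) before the assignment would consume that parse from the iterator before dep gets it.)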
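
As for the error itself: the traceback shows the exception being raised in nltk\internals.py, at print(_decode_stdoutdata(stderr)), so the 0xac byte appears to come from what the Java process wrote to stderr and not from your sentence. Here is a minimal sketch of the same failure. The bytes are made up to stand in for the real stderr, and decode_lenient is a hypothetical helper of mine, not part of NLTK:

```python
# Hypothetical stand-in for bytes a Java subprocess wrote to stderr.
# 0xac is a UTF-8 continuation byte, so it cannot start a character.
raw_stderr = b"Exception in thread \xac main"


def decode_lenient(data, encoding="utf-8"):
    """Decode bytes, replacing undecodable bytes with U+FFFD."""
    return data.decode(encoding, errors="replace")


# A strict decode fails the same way as in the traceback.
try:
    raw_stderr.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict decode failed:", exc.reason)

# A lenient decode keeps the readable part of the message.
readable = decode_lenient(raw_stderr)
print(readable)
```

If that reading is right, looking at the Java output in a lenient way like this (or running the same Java command by hand in a console) might show the underlying message that the UnicodeDecodeError is hiding.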

Joe9008