
I'm trying to run pycorenlp on a long text and get a "CoreNLP request timed out. Your document may be too long." error message. How can I fix it? Is there any way to increase Stanford CoreNLP's timeout?

I don't want to segment the text into smaller texts.

Here is the code I use:

'''
From https://github.com/smilli/py-corenlp/blob/master/example.py
'''
from pycorenlp import StanfordCoreNLP
import pprint

if __name__ == '__main__':
    nlp = StanfordCoreNLP('http://localhost:9000')
    # Read the whole document; the context manager closes the file afterwards.
    with open("long_text.txt") as fp:
        text = fp.read()
    output = nlp.annotate(text, properties={
        'annotators': 'tokenize,ssplit,pos,depparse,parse',
        'outputFormat': 'json'
    })
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(output)

The Stanford CoreNLP server was launched using:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer 9000
Franck Dernoncourt

1 Answer


You can add 'timeout': '50000' (the unit is milliseconds) to the properties dictionary:

output = nlp.annotate(text, properties={
    'timeout': '50000',
    'annotators': 'tokenize,ssplit,pos,depparse,parse',
    'outputFormat': 'json'
})
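If the same properties are reused across several calls, the timeout can be merged in with a small helper. This is only a sketch: `with_timeout` and `base_properties` are hypothetical names introduced here, not part of pycorenlp, and the value is kept as a string simply because the snippet above passes it that way.

```python
def with_timeout(properties, timeout_ms):
    """Return a copy of `properties` with a 'timeout' entry in milliseconds.

    Hypothetical helper for illustration; it is not part of pycorenlp.
    """
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    merged = dict(properties)  # copy, so the caller's dict is left untouched
    # Stored as a string, mirroring the 'timeout': '50000' form used above.
    merged['timeout'] = str(int(timeout_ms))
    return merged


base_properties = {
    'annotators': 'tokenize,ssplit,pos,depparse,parse',
    'outputFormat': 'json',
}
properties = with_timeout(base_properties, 50000)

# Usage with pycorenlp, assuming a server is listening on localhost:9000:
#   from pycorenlp import StanfordCoreNLP
#   nlp = StanfordCoreNLP('http://localhost:9000')
#   output = nlp.annotate(text, properties=properties)
```

The helper copies the dictionary instead of mutating it, so one set of base properties can be paired with different timeouts for short and long documents.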

Alternatively, you can specify the timeout when launching the Stanford CoreNLP server:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 50000

(The documentation doesn't mention the timeout parameter; perhaps it was simply left out. It is at least present in stanford-corenlp-full-2015-12-09, a.k.a. 3.6.0, which is the latest public release.)
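If you would rather start the server from Python than from a shell, the launch command above can be assembled as an argument list. This is a rough sketch: `corenlp_server_command` and `start_server` are hypothetical helpers, and `corenlp_dir` is assumed to be the unpacked CoreNLP directory containing the jar files.

```python
import subprocess


def corenlp_server_command(port=9000, timeout_ms=50000, memory="4g"):
    """Build the java command line shown above as an argument list."""
    return [
        "java", f"-mx{memory}", "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]


def start_server(corenlp_dir, **kwargs):
    """Start the server in `corenlp_dir` (assumed to hold the CoreNLP jars).

    No shell is involved, so the "*" classpath reaches Java unexpanded,
    which is the same effect the quoted "*" has on the command line.
    """
    return subprocess.Popen(corenlp_server_command(**kwargs), cwd=corenlp_dir)


command = corenlp_server_command(timeout_ms=50000)
```

`start_server` returns the `Popen` handle, so the caller can call `terminate()` on it once the annotations are done.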

Franck Dernoncourt
  • 1
    Seems like they do now but it appears that there is no unlimited timeout and they also do not specify a maximum value in the docs. – Stefan Falk Oct 01 '16 at 16:44
  • 1
    @displayname One step at a time :/ – Franck Dernoncourt Oct 01 '16 at 16:46
  • 2
    Yeah, unfortunately [the problem I'm currently having](http://stackoverflow.com/questions/39809061/edu-stanford-nlp-io-runtimeioexception-could-not-connect-to-server) might have something to do with a timeout issue or something like that. – Stefan Falk Oct 01 '16 at 16:48