A few weeks ago I had tika-python working without any issues on Windows 10. Today I had to re-create my virtualenv and upgraded tika to version 1.19, but when I tried to use it as usual I got 502 and 504 errors every time.

I also tried it on Ubuntu 18.04 and with previous tika versions, and nothing changed.

Can anyone help? (I'm not a native English speaker, so sorry if my English is not very good.)

from tika import parser
parsed_data = parser.from_buffer(buffer)

2019-05-25 20:40:42,446 [MainThread  ] [INFO ]  Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.15/tika-server-1.15.jar.md5 to /tmp/tika-server.jar.md5.
Traceback (most recent call last):
    File "/home/ohm/Documentos/TFG/venv/lib/python3.7/site-packages/tika/tika.py", line 651, in getRemoteJar
        urlretrieve(urlOrPath, destPath)
    File "/usr/lib/python3.7/urllib/request.py", line 247, in urlretrieve
        with contextlib.closing(urlopen(url, data)) as fp:
    File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
        return opener.open(url, data, timeout)
    File "/usr/lib/python3.7/urllib/request.py", line 531, in open
        response = meth(req, response)
    File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
    File "/usr/lib/python3.7/urllib/request.py", line 569, in error
        return self._call_chain(*args)
    File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
        result = func(*args)
    File "/usr/lib/python3.7/urllib/request.py", line 649, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 504: Gateway Time-out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/home/ohm/Documentos/TFG/venv/lib/python3.7/site-packages/tika/parser.py", line 51, in from_buffer
        {'Accept': 'application/json'}, False)
    File "/home/ohm/Documentos/TFG/venv/lib/python3.7/site-packages/tika/tika.py", line 506, in callServer
        serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath)
    File "/home/ohm/Documentos/TFG/venv/lib/python3.7/site-packages/tika/tika.py", line 557, in checkTikaServer
        if not checkJarSig(tikaServerJar, jarPath):
    File "/home/ohm/Documentos/TFG/venv/lib/python3.7/site-packages/tika/tika.py", line 572, in checkJarSig
        getRemoteJar(tikaServerJar + ".md5", jarPath + ".md5")
    File "/home/ohm/Documentos/TFG/venv/lib/python3.7/site-packages/tika/tika.py", line 661, in getRemoteJar
        urlretrieve(urlOrPath, destPath) 
    File "/usr/lib/python3.7/urllib/request.py", line 247, in urlretrieve
        with contextlib.closing(urlopen(url, data)) as fp:
    File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
        return opener.open(url, data, timeout)
    File "/usr/lib/python3.7/urllib/request.py", line 531, in open
        response = meth(req, response)
    File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
    File "/usr/lib/python3.7/urllib/request.py", line 569, in error
        return self._call_chain(*args)
    File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
        result = func(*args)
    File "/usr/lib/python3.7/urllib/request.py", line 649, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 502: Bad Gateway
  • It seems that the website you are retrieving the file from (search.maven.org) is down. You can check by pasting the URL from your log into your browser; you will get the same error. – wasmachien May 25 '19 at 19:13
  • Yes, I got the same response from the browser. By the way, on one of my many tries the download did start (just once, and never again so far), so I guess I'll wait a while and try again later. Thanks! – Olga Herranz Macías May 25 '19 at 19:41
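
The "wait and try again" approach from the comments can be sketched with the standard library alone. This is a minimal illustration, not part of tika-python: `fetch_with_retry`, `TRANSIENT_CODES`, and the `opener`/`sleep` parameters are hypothetical names introduced here, and the choice of which status codes count as temporary is an assumption.

```python
import time
import urllib.error
import urllib.request

# Assumption: these status codes indicate a temporary upstream problem.
TRANSIENT_CODES = {502, 503, 504}


def fetch_with_retry(url, attempts=5, delay=30.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the body at `url`, retrying when the server answers 502/503/504.

    `opener` and `sleep` are injectable so the helper can be exercised
    without touching the network.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on non-transient errors or once attempts are exhausted.
            if err.code not in TRANSIENT_CODES or attempt == attempts:
                raise
            # Wait a little longer after each failed attempt.
            sleep(delay * attempt)
```

Calling `fetch_with_retry` with the `.md5` URL from the log would show whether the Maven host has recovered before `parser.from_buffer` is retried. It only probes the download host and does not change how tika-python fetches the jar.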
