Questions tagged [tika-server]
90 questions
2 votes, 2 answers
Python-Tika returning "None" content for PDFs, but works with TIFFs
I have a PDF that I'm trying to get Tika to parse. The PDF has not been OCR'd. Tesseract is installed on my machine.
I used ImageMagick to convert file.pdf to file.tiff, so the TIFF file I am parsing is a direct conversion of the PDF.
Tika parses the…

Jonathan Coe
2 votes, 1 answer
How to change the language parameter that Tika passes to Tesseract OCR?
Currently I'm using tika-app-1.16.jar (combined with Tesseract) to OCR my PDFs:
java -jar tika-app-1.16.jar /tmp/testing/input.pdf
However, by default it only supports English, and I would like to find a way to pass a different language.
As to…
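For the language question, a minimal sketch that uses tika-server instead of the tika-app jar shown above. It assumes a server on `localhost:9998` and relies on tika-server's `X-Tika-OCRLanguage` request header, which takes Tesseract language codes (several can be joined with `+`); the matching Tesseract traineddata files must be installed. The helper names `ocr_headers` and `extract_text` are illustrative only.

```python
import http.client


def ocr_headers(lang: str) -> dict:
    """Headers asking tika-server to run Tesseract with the given language(s)."""
    return {
        "Accept": "text/plain",
        # Tesseract language code(s), e.g. "deu" or "eng+deu"; traineddata must be installed
        "X-Tika-OCRLanguage": lang,
    }


def extract_text(pdf_bytes: bytes, lang: str, host: str = "localhost", port: int = 9998) -> str:
    """PUT the document to tika-server's /tika endpoint and return the plain text."""
    conn = http.client.HTTPConnection(host, port, timeout=600)  # OCR can be slow
    try:
        conn.request("PUT", "/tika", body=pdf_bytes, headers=ocr_headers(lang))
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"tika-server returned HTTP {resp.status}")
        return body.decode("utf-8")
    finally:
        conn.close()
```

Usage, with a running server: `extract_text(open("/tmp/testing/input.pdf", "rb").read(), "eng+fra")`. `http.client` is used rather than `urllib.request` because it sends header names exactly as written.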

Gugols
2 votes, 0 answers
Output for TikaBatch via tika-app-X.Y.jar
I am trying to extract the text from a batch of documents (.pdf, .doc, etc.) in an "Input" folder using (in Cygwin):
java -jar tika-app-1.14.jar -t -i /Inputfolder -o /Outputfolder
The causeForTermination is "COMPLETED_NORMALLY" but I can't see any…
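If the batch mode keeps producing no visible output, one workaround is to drive the extraction from a short script. The sketch below is only that workaround. It does not explain the empty output folder. It mirrors the input tree and writes one `<name>.txt` per file. `batch_extract` is a hypothetical helper, and `extract` is any function you supply that turns file bytes into text, for example a PUT to tika-server's `/tika` endpoint.

```python
from pathlib import Path
from typing import Callable, List


def batch_extract(input_dir, output_dir, extract: Callable[[bytes], str]) -> List[Path]:
    """Write <name>.txt into output_dir for every file under input_dir; return the written paths."""
    src, dst = Path(input_dir), Path(output_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue  # skip directories; rglob yields them too
        rel = path.relative_to(src)
        out = dst / rel.parent / (rel.name + ".txt")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(extract(path.read_bytes()), encoding="utf-8")
        written.append(out)
    return written
```

Passing `extract` in as a parameter keeps the directory walking separate from the HTTP call, so the traversal can be checked without a server running.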

alapalak
1 vote, 0 answers
Why is the NER NamedEntityParser not appearing in my list of available parsers in Tika (2.8.0)?
I am trying to get named entity recognition to work within Tika. I have followed the guidelines that are provided here by David Meikle as well as the guide within the tika-docker examples git repo. I can get Tika server deployed and processing files…

David Corkill
1 vote, 1 answer
Apache Tika Server Password protected pdf file parsing
I am using Tika server 2.5. When I try to parse a password-protected PDF document, I get an EncryptedDocumentException. Is there any way to parse this document, or to send the password to Tika server for parsing?
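A minimal sketch of sending the password along with the document. It assumes tika-server reads a `Password` request header. For passwords that are not plain ASCII, it also assumes a `PasswordBase64UTF8` header carrying the UTF-8 password in Base64, because HTTP header values cannot carry arbitrary Unicode. Both header names should be checked against the server version in use. The helper names are illustrative.

```python
import base64
import http.client


def password_headers(password: str) -> dict:
    """Build request headers that pass the document password to tika-server."""
    headers = {"Accept": "text/plain"}
    if password.isascii():
        headers["Password"] = password
    else:
        # assumption: tika-server's Base64 variant for non-ASCII passwords
        encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")
        headers["PasswordBase64UTF8"] = encoded
    return headers


def extract_encrypted(pdf_bytes: bytes, password: str,
                      host: str = "localhost", port: int = 9998) -> str:
    """PUT an encrypted PDF to /tika with its password and return the text."""
    conn = http.client.HTTPConnection(host, port, timeout=300)
    try:
        conn.request("PUT", "/tika", body=pdf_bytes, headers=password_headers(password))
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            # e.g. a wrong or missing password ends up here as a server-side error
            raise RuntimeError(f"tika-server returned HTTP {resp.status}")
        return body.decode("utf-8")
    finally:
        conn.close()
```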

BChe
1 vote, 1 answer
How to use Apache Tika Server for NER
I am checking out Tika for an NER task and running the NER example. I can get my file meta data by hitting the documented meta endpoint:
curl -T test.txt http://localhost:9998/meta --header "Accept: application/json" | jq
How do I do NER?
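A sketch of the `curl` call above in Python, plus a filter for named-entity fields. It assumes that Tika's NamedEntityParser writes its results as metadata keys prefixed with `NER_` (for example `NER_PERSON`). It also assumes that the server was started with a tika-config that actually enables that parser, which is what the NamedEntityParser question above is about. Without that config, no such keys appear. `fetch_metadata` and `ner_entities` are hypothetical helper names.

```python
import http.client
import json


def fetch_metadata(data: bytes, host: str = "localhost", port: int = 9998) -> dict:
    """Equivalent of: curl -T file http://localhost:9998/meta --header "Accept: application/json"."""
    conn = http.client.HTTPConnection(host, port, timeout=120)
    try:
        conn.request("PUT", "/meta", body=data, headers={"Accept": "application/json"})
        resp = conn.getresponse()
        payload = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"tika-server returned HTTP {resp.status}")
        return json.loads(payload)
    finally:
        conn.close()


def ner_entities(metadata: dict) -> dict:
    """Keep only the (assumed) NER_* keys, normalising single values to lists."""
    entities = {}
    for key, value in metadata.items():
        if key.startswith("NER_"):
            entities[key] = value if isinstance(value, list) else [value]
    return entities
```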

Tom
1 vote, 1 answer
How to enable debug logs in Apache Tika 2.4.0
I want to enable debug logs in the Apache Tika container. I tried the following Tika configuration through tika-config.xml, but I do not see any debug logs being printed.
…

Manjunath D
1 vote, 1 answer
Apache Tika: Convert Apache Tika server REST endpoints (JAX-RS) from HTTP to HTTPS
We use Apache Tika to extract data from files (multiple formats). We call the Tika server REST endpoints internally from our .NET code to do the data extraction. We are researching whether we can add SSL/TLS support to the Tika server…

Rakesh Gourineni
1 vote, 0 answers
Error while reading file using resume_parser
I'm getting a Tika server JAR file error while reading a file with the resume_parser Python module. The file format is pdf/doc/docx. It's throwing a warning:
2021-05-22 18:12:05,899 [MainThread ] [INFO ] Retrieving…

P Krishnama Naidu
1 vote, 1 answer
Tika server with Python returns None for a large file but works fine with small PDFs
I have some small and large PDFs that I'm trying to parse into strings using Python Tika. I'm running a Tika server locally, and the conversion works fine with files of around 200 MB, but now I have a 1.3 GB PDF. So when I try to convert it…
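The excerpt does not show why the 1.3 GB PDF comes back as `None`, and the sketch below does not diagnose it. It streams the file from disk to tika-server's `/tika` endpoint with a long timeout and raises on any non-200 status, so the underlying error becomes visible instead of an empty result. For files this size, the server's JVM heap (`-Xmx`) may also be worth checking. The host, port, timeout, and helper names are assumptions.

```python
import http.client
import os


def upload_headers(path: str) -> dict:
    # An explicit Content-Length lets http.client stream the file object
    # block by block instead of falling back to chunked transfer encoding.
    return {"Accept": "text/plain", "Content-Length": str(os.path.getsize(path))}


def extract_large_file(path: str, host: str = "localhost", port: int = 9998,
                       timeout: float = 3600) -> str:
    """Stream a (possibly huge) file to /tika and return the extracted text."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        with open(path, "rb") as f:
            # passing the file object avoids loading 1.3 GB into client memory
            conn.request("PUT", "/tika", body=f, headers=upload_headers(path))
            resp = conn.getresponse()
            body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"tika-server returned HTTP {resp.status}: {body[:200]!r}")
        return body.decode("utf-8")
    finally:
        conn.close()
```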

A l w a y s S u n n y
1 vote, 1 answer
Empty parsers tika python
When I run a simple command against Tika, I get empty parsers.
from tika import parser
url = 'mygroovyurl'
string_parsed = parser.from_buffer('Good evening, Dave', serverEndpoint=url)
string_parsed
I get back
{'metadata': {'Content-Type':…
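To see what the server actually returns, it can help to call the endpoint directly. This sketch assumes that tika-python's `from_buffer` uses tika-server's recursive-metadata endpoint `/rmeta/text`. That endpoint returns a JSON array with one object per document, and the extracted text sits under the `X-TIKA:content` key. If that key is missing, only metadata came back. The helper names and the default host and port are hypothetical.

```python
import http.client
import json


def fetch_rmeta(data: bytes, host: str = "localhost", port: int = 9998) -> str:
    """PUT raw bytes to /rmeta/text and return the JSON response body."""
    conn = http.client.HTTPConnection(host, port, timeout=120)
    try:
        conn.request("PUT", "/rmeta/text", body=data, headers={"Accept": "application/json"})
        resp = conn.getresponse()
        payload = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"tika-server returned HTTP {resp.status}")
        return payload.decode("utf-8")
    finally:
        conn.close()


def split_rmeta(payload: str):
    """Return (metadata, content) for the container document; content is None if absent."""
    docs = json.loads(payload)
    if not docs:
        return {}, None
    first = dict(docs[0])
    content = first.pop("X-TIKA:content", None)
    return first, content
```

Usage, with a running server: `split_rmeta(fetch_rmeta("Good evening, Dave".encode("utf-8")))`.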

mlanier
1 vote, 0 answers
Apache TIKA: org.apache.cxf.interceptor.Fault: XML_WRITE_EXC
Apache Tika 1.24.
Tika runs in server mode as follows:
java -Xmx3G -jar tika-server.jar -spawnChild --host=hostname.domain.com
I'm observing the following error in Tika Server log. What could cause it?
rmeta/text (autodetecting type)
ERROR Problem…

freeAR
1 vote, 1 answer
Apache TIKA: Tried to allocate an array of length 1835606, but 1000000 is the maximum for this record type
Running Apache Tika 1.24.1 as follows:
java -Xmx3G -Djava.io.tmpdir=/mytmp/tmp -spawnChild -taskPulseMillis 240000 -jar tika-server.jar --host=hostname.domain.com
Can the array length be changed to not get this…
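In the command above, `-spawnChild` and `-taskPulseMillis` appear before `-jar`. In that position the `java` launcher parses them as JVM options, and it normally refuses to start with an "Unrecognized option" error. Tika-server options belong after the jar path. This sketch builds the command with that ordering. It does not address the record-length limit itself, and `tika_server_command` is an illustrative helper name.

```python
import shlex


def tika_server_command(jar, heap="3G", tmpdir=None, server_args=()):
    """JVM options go before -jar; tika-server options go after the jar path."""
    cmd = ["java", f"-Xmx{heap}"]
    if tmpdir:
        cmd.append(f"-Djava.io.tmpdir={tmpdir}")
    cmd += ["-jar", jar, *server_args]
    return cmd


cmd = tika_server_command(
    "tika-server.jar",
    tmpdir="/mytmp/tmp",
    server_args=["-spawnChild", "-taskPulseMillis", "240000", "--host=hostname.domain.com"],
)
print(shlex.join(cmd))
# prints: java -Xmx3G -Djava.io.tmpdir=/mytmp/tmp -jar tika-server.jar -spawnChild -taskPulseMillis 240000 --host=hostname.domain.com
```

Passing the list to `subprocess.Popen(cmd)` would launch the server without shell quoting issues.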

freeAR
1 vote, 0 answers
Apache Tika version upgrade causes ClassCastException
I'm struggling with an issue (Java/Scala web project with Gradle): I have to upgrade Apache Tika from 1.19.1 to at least version 1.22, because the previous versions have security vulnerabilities. But when I try to change the version (even to lower…

Maciej Wadowski
1 vote, 0 answers
Tika Parser is unable to parse Greek Characters
I am trying to parse a .doc file that contains Greek characters such as alpha, beta, and gamma using Apache Tika, and the result from Tika is completely different from what I expected. I am using the code below to parse the .doc file:
FileInputStream…

Akhil