Questions tagged [tika-server]

90 questions
0 votes
0 answers

Can't write/read a text string extracted from a PDF

I have extracted the whole text from a PDF and saved it in a variable "CCR". I can print it and it shows me the text fine. But when I try to read its lines or save it to a .txt file, it just shows/saves blank output/nothing. Any ideas? Example when I print my…
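A possible cause, offered as an assumption rather than a diagnosis: text returned by tika-python can begin with a long run of blank lines, so the first lines you read back look empty, and writing a file with the platform's default encoding can fail on characters extracted from a PDF. A minimal Python sketch, assuming the extracted text is already in a string such as CCR; the helper names `non_blank_lines` and `save_text` are hypothetical:

```python
import tempfile
from pathlib import Path
from typing import List


def non_blank_lines(text: str) -> List[str]:
    """Return only the lines of `text` that contain non-whitespace characters."""
    return [line for line in text.splitlines() if line.strip()]


def save_text(text: str, path: str) -> None:
    """Write `text` to `path` as UTF-8 instead of the platform default encoding."""
    Path(path).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical stand-in for the extracted PDF text, leading blank lines included.
    CCR = "\n\n\n\nFirst real line\n\nSecond real line\n"
    print(non_blank_lines(CCR))  # ['First real line', 'Second real line']
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "ccr.txt"
        save_text(CCR, str(target))
        print(non_blank_lines(target.read_text(encoding="utf-8")))
```

If the file on disk looks empty only in some editors, opening it with an explicit UTF-8 setting is worth trying before changing the extraction code.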
0 votes
0 answers

Getting a 504 error when trying to parse text with Tika in Python

A few weeks ago I had tika-python working without any issues on Windows 10. Today I had to re-create my virtualenv and upgraded tika to version 1.19, but when I tried to use it as usual I got 502 and 504 errors all the time. I tried to use it in…
0 votes
0 answers

How to read individual slides from a PPT using the tika package in Python?

I want to compare the data in two pptx files and show the differences, if any, using Python. I have tried the code below, but it gives me all the content in a single file, with no way to separate the data by slide. I am able to read all content of the pptx using…
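One possible approach, sketched under assumptions: ask Tika for XHTML instead of plain text (tika-python's `parser.from_file` accepts `xmlContent=True` for this, and the markup is then in the result's `"content"` entry), and split that markup per slide. The sketch assumes each slide's text is wrapped in a `<div class="slide-content">` element, which matches the XHTML Tika's PowerPoint parsers commonly emit; confirm it against your own Tika version's output first. `slides_from_xhtml` and `diff_slides` are hypothetical helpers:

```python
from html.parser import HTMLParser
from typing import List


class SlideTextCollector(HTMLParser):
    """Collect the text of every <div class="slide-content"> element, in order."""

    def __init__(self) -> None:
        super().__init__()
        self.slides: List[str] = []
        self._depth = 0  # div nesting depth inside the current slide; 0 = outside
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth > 0:
            self._depth += 1  # a div nested inside the current slide
        elif "slide-content" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag != "div" or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            # Join text pieces with spaces and collapse whitespace runs.
            self.slides.append(" ".join(" ".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def slides_from_xhtml(xhtml: str) -> List[str]:
    collector = SlideTextCollector()
    collector.feed(xhtml)
    collector.close()
    return collector.slides


def diff_slides(old: List[str], new: List[str]) -> List[str]:
    """List the slides (1-based) whose normalized text differs between two decks."""
    report = []
    for i in range(max(len(old), len(new))):
        a = old[i] if i < len(old) else ""
        b = new[i] if i < len(new) else ""
        if a != b:
            report.append(f"slide {i + 1}: {a!r} -> {b!r}")
    return report


if __name__ == "__main__":
    # Hypothetical XHTML for two decks; in practice this would come from Tika.
    old = slides_from_xhtml('<div class="slide-content"><p>Intro</p></div>'
                            '<div class="slide-content"><p>Q1 results</p></div>')
    new = slides_from_xhtml('<div class="slide-content"><p>Intro</p></div>'
                            '<div class="slide-content"><p>Q2 results</p></div>')
    print(diff_slides(old, new))  # ["slide 2: 'Q1 results' -> 'Q2 results'"]
```

Comparing slide by slide position is the simplest choice; if slides get inserted or reordered, a sequence diff over the slide lists would report changes more usefully.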
0 votes
0 answers

Passing data in chunks to Apache Tika for Parsing

Is there a way to configure Apache Tika to parse data in chunks? Let's say the data is divided into 10 chunks. Can it parse each chunk as it receives it, or can it only parse once it has all 10 chunks? public OutputStream parse(InputStream instream)…
user10543142
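Tika parsers consume an `InputStream`, so chunks can be handed to them as one continuous stream; whether anything is extracted before the last chunk arrives depends on the format, since container formats such as PDF or ZIP-based Office files generally need the complete file. A minimal Python sketch (the class name `ChunkStream` is hypothetical) that exposes an iterable of byte chunks as a single file-like stream, for example to feed an upload to tika-server without first concatenating everything in memory:

```python
import io
from typing import Iterable


class ChunkStream(io.RawIOBase):
    """Present an iterable of byte chunks as one readable, file-like stream."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks = iter(chunks)
        self._pending = b""  # unread remainder of the current chunk

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Pull the next non-empty chunk when the current one is used up.
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # end of stream
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n


if __name__ == "__main__":
    # Hypothetical chunks arriving one at a time (e.g. from a network source).
    chunks = [b"%PDF-1.4\n", b"", b"...rest of the document..."]
    stream = io.BufferedReader(ChunkStream(chunks))
    print(stream.read())
```

Wrapping the raw stream in `io.BufferedReader` gives the usual `read`/`readline` behaviour; the consumer still sees a single document, not ten separate ones.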
0 votes
0 answers

Apache Tika keeps dying

I am using openEdgar to parse SEC filing data, and it uses Apache Tika to parse HTML, XML and XBRL content. I am running this on a box with 4 GB of memory and it keeps dying on me. I ended up starting it this way: java…
abolotnov
0 votes
1 answer

Apache Tika in Python extracts text from a PDF on a MacBook Pro but not on Windows Server

As above, I am extracting text from multiple documents using tika in Python. For one particular PDF, it extracts the text on my development machine (MacBook Pro) but not on Windows Server 2012, where it returns a 'NoneType'. Very confusing,…
Hairy
0 votes
1 answer

Getting a 422 response from Apache Tika with Python 2

Can someone please help me solve this error? I uninstalled tika and reinstalled it, but I still get the error. I have no idea how to solve it.
0 votes
0 answers

422 Tika server response? Tika-Python

I have been trying to get Apache Tika to work with this Python package: https://github.com/chrismattmann/tika-python I have the following code in my Python program: #!/usr/bin/env python import tika tika.initVM() from tika import parser parsed =…
Ryan Fasching
0 votes
0 answers

Define a MIME type for .TXT files for Tika

I want to define the MIME type of *.txt files: text/txt, so that Tika can apply a more specific parser than the one used for text/plain files. The glob *.txt is included in the definition of the type text/plain in tika-mimetypes.xml. Moreover, it…
mbl
0 votes
0 answers

Parse a selection of types with Tika

I want Tika to parse only zip files and pdf files. With the following tika_config.xml:
mbl
0 votes
0 answers

Apache Tika configuration to include space between divs

I need to know how to configure Apache Tika. Right now we use it to parse our HTML files and then run searches over the parsed data returned by the Apache Tika parser. Issue: Apache Tika is actually merging the data available from…
Girish kumar
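A post-processing workaround, sketched under assumptions rather than as a Tika configuration option: request XHTML output (for example with tika-python's `xmlContent=True`) and extract the text yourself, emitting a line break at every block-level element so adjacent `div`s can no longer run together. The tag sets and the helper `html_to_text` are illustrative choices, not Tika's own:

```python
from html.parser import HTMLParser
from typing import List

# Illustrative choices, not Tika's own lists.
BLOCK_TAGS = {"div", "p", "br", "li", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"head", "script", "style"}


class BlockAwareText(HTMLParser):
    """Collect text from (X)HTML, breaking lines at block-level elements."""

    def __init__(self) -> None:
        super().__init__()
        self._parts: List[str] = []
        self._skip_depth = 0  # > 0 while inside <head>, <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self) -> str:
        # Collapse whitespace runs inside each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(markup: str) -> str:
    parser = BlockAwareText()
    parser.feed(markup)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<html><body><div>Price</div><div>42</div></body></html>"
    print(html_to_text(sample))  # two lines: "Price" and "42"
```

Search indexing then sees "Price" and "42" as separate tokens instead of "Price42".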
0 votes
0 answers

Get all metadata of a file with Apache Tika (TikaJAXRS)

Hi, I deployed https://wiki.apache.org/tika/TikaJAXRS to a server, and when I upload a file and call /meta I get the response below for a docx file: u'{"Content-Encoding":"UTF-16LE","Content-Type":"application/json; …
Rob Smith
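The `u'{...}'` in the question suggests the response body is being handled as a raw JSON string. A minimal sketch, assuming `/meta` was asked for JSON (for example with an `Accept: application/json` header): decode the body with `json.loads`, then read keys from the resulting dict. Some metadata keys can hold a list of values, which the hypothetical helper `first_value` smooths over; `parse_tika_meta` is likewise a hypothetical name:

```python
import json
from typing import Any, Dict, Optional, Union


def parse_tika_meta(body: Union[str, bytes]) -> Dict[str, Any]:
    """Decode a JSON metadata response body into a dict."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return json.loads(body)


def first_value(meta: Dict[str, Any], key: str) -> Optional[str]:
    """Return a key's value, taking the first element when it is a list."""
    value = meta.get(key)
    if isinstance(value, list):
        return value[0] if value else None
    return value


if __name__ == "__main__":
    # Hypothetical response body; real keys depend on the file and Tika version.
    body = b'{"Content-Type": "text/plain", "X-Parsed-By": ["ParserA", "ParserB"]}'
    meta = parse_tika_meta(body)
    for key in sorted(meta):
        print(key, "=", first_value(meta, key))
```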
0 votes
0 answers

CURL in PHP to call a Tika server with a remote file

I've been stuck on this for quite a while now. I want to parse a PDF to text using Tika hosted on an external server dedicated to this. It should work with any remote PDF URL and any Tika server (currently using this free test some amazing guy…
Alfredo Gago
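The same flow is sketched here in Python rather than PHP, to keep one language across these examples: download the remote PDF, then send its bytes to tika-server with `PUT /tika` and an `Accept: text/plain` header. `http://localhost:9998` is tika-server's default address and only a placeholder for your own server; the helper names are hypothetical, and the code assumes the server replies in UTF-8:

```python
import urllib.request
from typing import Optional


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download a remote document (such as a PDF) into memory."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def build_tika_request(tika_base: str, data: bytes,
                       content_type: Optional[str] = "application/pdf") -> urllib.request.Request:
    """Build a PUT /tika request that asks tika-server for plain-text output."""
    headers = {"Accept": "text/plain"}
    if content_type:
        headers["Content-Type"] = content_type
    return urllib.request.Request(tika_base.rstrip("/") + "/tika",
                                  data=data, headers=headers, method="PUT")


def extract_remote_pdf(pdf_url: str, tika_base: str = "http://localhost:9998") -> str:
    """Fetch `pdf_url`, send it to tika-server, and return the extracted text."""
    request = build_tika_request(tika_base, fetch_bytes(pdf_url))
    with urllib.request.urlopen(request, timeout=120) as response:
        # Assumes the server answers in UTF-8.
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Offline check of the request shape; no network traffic happens here.
    req = build_tika_request("http://localhost:9998", b"%PDF-1.4 ...")
    print(req.get_method(), req.full_url, req.header_items())
```

Sending the bytes from the client, instead of passing the remote URL to Tika, keeps the approach independent of whether a given Tika server can reach the PDF's host.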
0 votes
2 answers

Apache Tika: parsing docx files via REST in Java

I'm using Apache Tika in server mode. I need to develop a Java REST client for parsing files. For PDF file upload I'm using this code: fileBody = new FileBody(file, "application/pdf"); multiPartEntity.addPart("uploaded_file",…
Vakhtang
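For `.docx` uploads, the MIME type to use in place of `"application/pdf"` in the `FileBody` call is `application/vnd.openxmlformats-officedocument.wordprocessingml.document`. Sketched in Python for consistency with the other examples; the lookup table is a small, hand-picked assumption rather than an exhaustive list, and returning `None` stands for omitting the header so the server can try to detect the type itself. `content_type_for` is a hypothetical name:

```python
from pathlib import Path
from typing import Optional

# A few common MIME types (hand-picked and illustrative, not exhaustive).
MIME_BY_EXTENSION = {
    ".pdf": "application/pdf",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def content_type_for(path: str) -> Optional[str]:
    """Return the MIME type for `path`, or None to leave detection to the server."""
    return MIME_BY_EXTENSION.get(Path(path).suffix.lower())


if __name__ == "__main__":
    for name in ("contract.docx", "scan.PDF", "archive.7z"):
        print(name, "->", content_type_for(name))
```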
0 votes
0 answers

Tika app failed with causeForTermination='MAIN_LOOP_EXCEPTION_NO_RESTART'

I am using tika-app-1.14.jar to convert my PDF and image files to text from the command line: java -jar tika-app-1.14.jar -t -i /inputFolder -o /OutputFolder It runs well, but when I run the same script from the automation tool it fails, saying…
Amu