Questions tagged [tika-server]

90 questions
0
votes
1 answer

How can I change the context path of Tika server?

I want to run the Tika server Docker image in OpenShift. This works fine out of the box, but as soon as I run other services at the same time I need a context path to determine which service should be addressed in the…
0
votes
1 answer

How to use Apache Tika Server 2.5 as an API and call it from .NET 6?

I am planning to use Apache Tika Server 2.5 with .NET 6. How can we use it and call it from a .NET component?
BChe
0
votes
1 answer

Latest Tesseract in Tika

The newest available version of Tesseract is 5.x, but the latest Tika still uses 4.x. Is it possible to upgrade the version of Tesseract OCR in Tika?
0
votes
1 answer

How to configure the Tika service within a k8s cluster

We are using Tika to extract text from a lot of documents, and for this we need to give the Tika service a custom config file (XML). In Docker you can do it exactly as shown in the Tika Docker image instructions: docker run -d -p 9998:9998 -v…
NNH
0
votes
0 answers

Start and run Apache Tika in a Dockerfile

I want to install and run Apache Tika in a Docker container. To do that, I need to specify everything inside a Dockerfile. How exactly do I do that?
Yoshi
0
votes
0 answers

Apache Tika Server - How to allow it to handle large documents

I am testing Apache Tika Server (v2.4.1) and I see that it fails on large documents with this error: Error 500 Server Error HTTP ERROR 500 Server Error URI:/rmeta/form/text STATUS:500 MESSAGE:Server Error SERVLET:- CAUSED…
user2173353
0
votes
2 answers

How to deal with a large PDF?

I'm trying to extract text from a large PDF using this code (my file comes from a blob on Azure; the PDF is 7.3 MB, has 140 pages, and all of them are images), and it always hits the timeout. os.environ['TIKA_SERVER_ENDPOINT'] =…
Tau n Ro
0
votes
1 answer

Tika server returned a 500 status code when processing a PDF file

Code: dd = parser.from_file(r"file_path"). Line 554 in tika.py: resp = verbFn(serviceUrl, encodedData, **effectiveRequestOptions). The reason in resp was INKApi Error. I am running Tika server on my system.
shobhna
0
votes
1 answer

Tika server fails to start in Airflow (from the fourth simultaneous run) deployed on Kubernetes

I wanted to ask if any of you have encountered a similar error. I work at a company where we use Airflow, deployed on Azure Kubernetes. We have a DAG in charge of extracting some information from different documents. Among many of the…
Tau n Ro
0
votes
1 answer

How to PUT a file to Tika server in Node.js

The scenario: I am running a Vue.js client, a Node.js Restify API server, and a Tika server from the official Docker image. A user makes a POST call with formData containing a PDF file to be parsed. The API server receives the POST call and I save…
Dent7777
0
votes
1 answer

How to extract inline images from PDF using Apache Tika Server and save them as files?

Is there a way to do this? I'm using the following headers in my PUT request to http://localhost:9998/tika "Content-Type", "application/pdf" "X-Tika-OCRLanguage", "eng" "X-Tika-PDFextractInlineImages", "true" "X-Tika-PDFOcrStrategy", "no_ocr" Will…
erotavlas
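For readers trying to picture the request described in the excerpt above, here is a minimal Python sketch that assembles a PUT to http://localhost:9998/tika with the same four headers. The URL and header values come from the question; the function names `build_tika_put` and `send` are hypothetical, and the sketch only shows how the request is put together. It does not address whether inline images come back as separate files.

```python
import urllib.request

# Endpoint taken from the question excerpt; adjust if the server runs elsewhere.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_put(pdf_bytes: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request with the headers quoted in the question."""
    headers = {
        "Content-Type": "application/pdf",
        "X-Tika-OCRLanguage": "eng",
        "X-Tika-PDFextractInlineImages": "true",
        "X-Tika-PDFOcrStrategy": "no_ocr",
    }
    return urllib.request.Request(url, data=pdf_bytes, headers=headers, method="PUT")


def send(request: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With a server listening on port 9998, something like `send(build_tika_put(open("sample.pdf", "rb").read()))` would submit the file, where `sample.pdf` is a placeholder name.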
0
votes
0 answers

Tika - Compute the Content-Encoding of a document

I'm using Tika 1.26 to extract the metadata of a document. I first tried Tika Server and then switched to the programmatic API. Nevertheless, even though the documentation states that the Content-Encoding of a document should be returned…
verodigiorgio
0
votes
0 answers

How can I use Tika to parse a PDF without having Java on my PC (in Python)?

Tika needs Java 8 or higher to work in Python. It creates a server.jar in the temp folder. I was wondering whether we can put it in the folder where my Python file is kept, so the user doesn't need to have Java installed.
Frosty Boi FN
0
votes
0 answers

Is there a way to make tika-server.jar permanent if I clean "temp" later?

I'm using the Tika parser to extract text from PDFs, but it downloads a tika-server.jar into C:\Users\User\AppData\Local\Temp. Is there a way to make this permanent if I clean "temp" later? Can we use the Tika parser in production? Tried other libraries…
Abhay
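As a rough illustration of the situation in the excerpt above, and assuming the question refers to the tika-python package (the excerpt does not say so explicitly), that package documents a `TIKA_PATH` environment variable naming the directory where the server jar is stored. A minimal sketch, with the helper name `configure_tika_jar_dir` and the directory `tika_cache` both hypothetical:

```python
import os
from pathlib import Path


def configure_tika_jar_dir(directory: str) -> Path:
    """Create the directory and point TIKA_PATH at it.

    Assumption: tika-python reads TIKA_PATH when the package is imported,
    so this should run before `from tika import parser`.
    """
    jar_dir = Path(directory).expanduser().resolve()
    jar_dir.mkdir(parents=True, exist_ok=True)
    os.environ["TIKA_PATH"] = str(jar_dir)
    return jar_dir


if __name__ == "__main__":
    # "tika_cache" is a placeholder; any writable, non-temporary folder would do.
    print(configure_tika_jar_dir("tika_cache"))
```

Setting the variable in the process environment before starting Python would have the same effect; the helper only keeps the step inside the script.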
0
votes
1 answer

Apache Tika Docker Image on Google Cloud Run (Heap Size Issue)

I am trying to run the Apache Tika server on Google Cloud Run. These are the steps followed: I copied Apache Tika's Docker image from DockerHub onto GCP Artifactory using the command gcrane cp apache/tika:1.24.1-full…