Questions tagged [solr-cell]

Solr Content Extraction Library: a Solr contrib module responsible for converting the raw content of a rich document into something usable by Solr.

Solr Cell's main component is the ExtractingRequestHandler, which uses Tika so that users can upload binary files to Solr and have Solr extract the text from them and then index it.
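As a rough illustration of how a request to the ExtractingRequestHandler is usually shaped, here is a minimal sketch that only builds the request URL. The host, port, collection name, document id, and the `category_s` field are assumptions made up for this example, and `build_extract_url` is a hypothetical helper, not part of any Solr client. The `literal.*`, `fmap.*`, `uprefix`, and `commit` parameters are standard Solr Cell request parameters; the file itself would be sent separately as the POST body.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, collection, doc_id, literals=None, commit=True):
    """Build a URL for Solr Cell's /update/extract handler (hypothetical helper)."""
    # literal.<field>=<value> attaches a caller-supplied value to the document.
    params = [("literal.id", doc_id)]
    for field, value in (literals or {}).items():
        params.append((f"literal.{field}", value))
    # uprefix prepends a prefix to extracted fields that the schema does not define,
    # so they can be caught by a dynamic field such as attr_* (assumed to exist).
    params.append(("uprefix", "attr_"))
    # fmap.content maps the extracted body text to the named field.
    params.append(("fmap.content", "content"))
    if commit:
        params.append(("commit", "true"))
    return f"{base_url.rstrip('/')}/{collection}/update/extract?{urlencode(params)}"


# Assumed values for illustration only.
url = build_extract_url(
    "http://localhost:8983/solr", "mycollection", "doc1", {"category_s": "manual"}
)
print(url)
```

The resulting URL could then be used as the target of a POST that carries the binary file, for example with curl's `-F` option or an HTTP client of your choice.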

71 questions
0
votes
1 answer

How can I add data to dynamic fields when using Solr's extract functionality?

I'm using a PHP library called solr-php-client (http://code.google.com/p/solr-php-client/) to interface with my Solr server. I can extract data from the document, store it, and search on it, but I can't seem to get it to allow me to add my own data…
Travis
  • 599
  • 2
  • 6
  • 16
0
votes
0 answers

solr /update/extract 404 Not Found

I'm encountering an issue while trying to upload documents to Solr via the endpoint /update/extract. I run Solr 8.5.2 and ZooKeeper 3.5.8 in Docker and could index data before via ... solr.add(solr_documents) My Setup: The Filesystem (the django…
0
votes
0 answers

Solr cell avoid metadata in fmap.content

I am using Solr Cell for content extraction from PDFs. I am storing the extracted content of the PDFs in a field named content. Inside this field I also get metadata in addition to the content itself, that I…
0
votes
0 answers

Solr Cell turn off metadata extraction

I am indexing documents with Solr Cell, but I am not interested in metadata at all. Is it possible to turn off metadata extraction by Solr Cell? If so, how can I change the request handler settings for this?
0
votes
0 answers

Indexing a PDF document and providing additional JSON data using Solr Cell

I'm using Solr Cell to index PDF documents in my Solr collection. I also have additional metadata in JSON format that I want to associate with each indexed PDF document. Is it possible to index both the PDF document and the JSON data in a single…
0
votes
0 answers

Solarium PHP Extract query setFile()

I am using the Solarium library for Solr. I am getting files from a database in blob format and I want to send them to Solr Cell using an extract query. However, the setFile() method in the Extract Query expects a local file path as input, but I want to…
0
votes
0 answers

Indexing PDFs and MS Office documents with Solr while running schemaless mode

After starting Solr in schemaless mode via "solr start -e schemaless", I proceeded to index some documents (PDFs and docx). Indexing seems to have succeeded, as can be seen below. However, when running queries from within the Solr Admin UI, I only get…
hbha
  • 1
0
votes
1 answer

Does Solr Cell in any way limit the number of characters imported into a solr.TextField?

I'm indexing with Solr Cell a large HTML page using a curl command with a Windows command prompt like so: curl http://localhost:8987/solr/myexample/update/extract -d @test.html -H 'Content-type:html' I have found that I'm missing data (text) in my…
ImTrying
  • 45
  • 7
0
votes
1 answer

Solr Cell / ExtractingRequestHandler cannot parse some *.doc files

I need to index content of doc/docx/pdf files uploaded by users and use Solr (1.4.1) ExtractingRequestHandler component (817165) for that. If that matters, I don't request indexing from it - the component is always called with extractOnly parameter…
Yuriy
  • 1,964
  • 16
  • 23
0
votes
1 answer

How to configure Tika 0.9 with Solr 3.1

Can you give me the steps to configure Tika 0.9 with Solr 3.1?