Questions tagged [solr-cell]

Solr Content Extraction Library: a Solr contrib module responsible for converting the raw content of a rich document into something usable by Solr.

Solr Cell's main component is the ExtractingRequestHandler, which uses Tika to let users upload binary files to Solr and have Solr extract their text and index it.
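
As a rough illustration of how a client typically addresses the ExtractingRequestHandler, the sketch below only builds the request URL for posting one file. It assumes the handler is registered at the conventional /update/extract path and uses the literal.id, extractOnly and commit request parameters; the host, the core name "mycore" and the helper name build_extract_url are hypothetical and depend on your own setup.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, core, doc_id, extract_only=False, commit=True):
    """Build the URL for posting one rich document to Solr Cell.

    Assumes the ExtractingRequestHandler is mapped to /update/extract
    (the conventional path; check solrconfig.xml for your installation).
    """
    # literal.id supplies the unique key for the document being uploaded.
    params = {"literal.id": doc_id}
    if extract_only:
        # Return the extracted text to the caller without indexing it.
        params["extractOnly"] = "true"
    elif commit:
        # Make the document searchable right after the upload.
        params["commit"] = "true"
    return f"{solr_base.rstrip('/')}/{core}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and core name, for illustration only.
    print(build_extract_url("http://localhost:8983/solr", "mycore", "doc1"))
```

The file itself would then be sent as the body of a POST to that URL, for example with curl or SolrJ. Calling the helper with extract_only=True is a convenient way to check what Tika extracts from a file before anything is indexed.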

71 questions

votes

1 answer

Solr ExtractingRequestHandler giving empty content for pdf documents

I am using ExtractingRequestHandler in Solr to extract document content and index it. It works fine for all Microsoft documents, but for PDFs the extracted content is empty. I have also tried extractOnly=true with curl, and that also…

asked Dec 30 '09 at 22:34

aseem

votes

4 answers

Get page numbers of searchresult of a pdf in solr

I'm building a web application where users can search for PDF documents and view them with pdf.js. I would like to display the search results with a short snippet of the paragraph where the search term was found and a link to open the document at…

pdf solr full-text-search apache-tika solr-cell

asked Feb 27 '13 at 15:41

Gesh

votes

0 answers

Getting date metadata using SolrCell

I'm using Solr 3.6 to index many different types of documents. I have several fields that define common information for all the documents, one of them being 'date' (ideally last modified date, just something to indicate how recent a document…

solr metadata apache-tika solr-cell

asked Sep 27 '12 at 20:46

The Doge Prince

votes

1 answer

Solr open document after searching a keyword

I am trying to index some PDF documents and then create a search UI. This question is somewhat related to Solr Index PDF documents and post them to a remote server. 1) Indexing PDF docs -> I use the Tika jar to convert PDFs to text files and then use…

solr full-text-search apache-tika solr-cell

asked Jul 25 '11 at 18:54

Balaji.N.S

votes

1 answer

How to index pdf's content with SolrJ?

I'm trying to index a few PDF documents using SolrJ as described at http://wiki.apache.org/solr/ContentStreamUpdateRequestExample. The code is below: import static org.apache.solr.handler.extraction.ExtractingParams.LITERALS_PREFIX; import…

java solr solr-cell

asked Apr 17 '11 at 13:06

alessmar

votes

1 answer

Indexing pdf documents

What is the best way to index PDF documents? Should I convert them to txt first, or is there a better way to index PDF files?

pdf solr full-text-indexing apache-tika solr-cell

asked Sep 17 '10 at 21:34

Ahsan Iqbal

votes

2 answers

Using Zend Lucene to search Office 2003 or older files

I know there are already objects supporting Office 2007 files, but is there any native support for Office 2003 or earlier?

php zend-framework solr lucene solr-cell

asked Oct 30 '09 at 05:50

Amadeus45

vote

1 answer

No results when searching indexed PDF with Solr Cell

I've been working with Solr for a while. I recently tried the solr-cell component and I'm indexing some PDFs; however, I'm having the exact same problem presented in this thread. When I search for *:* in the admin console, the PDFs are listed. However…

pdf solr solr-cell

asked Feb 06 '12 at 23:03

jag

vote

0 answers

Solr Get Paragraphs of Documents

I've been working with Solr for a couple of days, and I need to split a document into its paragraphs and then search each one of them. I tried a lot of things, but Solr just doesn't want to capture paragraphs correctly; either it captures…

java solr solr-cell

asked Dec 31 '11 at 13:40

user1124347

vote

2 answers

Tika Solr Metadata mapping ignore document title

I have the following config file for solr: