Highest Voted 'solr-cell' Questions

1

vote

3 answers

Solr ExtractingRequestHandler giving empty content field

I'm using Solr 6.2.1 and ExtractingRequestHandler (already included in Solr 6.2.1) to index PDF and Word documents. All documents (PDF and Word) are indexed with metadata (title, date, cp_revision, compagny, ...) but the content field is always…

asked Oct 20 '16 at 14:38

Marine Msg

23
2

1

vote

1 answer

SOLR Tika: add text of file to existing record (ExtractingRequestHandler)

I am indexing posts in SOLR with "name", "title", and "description" fields. I'd like to later be able to add a file (like a Word doc or a PDF) using Tika / the ExtractingRequestHandler. I know I can add documents like so: (or through other…

solr full-text-search apache-tika solr-cell

asked Jul 27 '10 at 20:12

Matt Hampel

5,088
12
52
78

1

vote

1 answer

Ways to send binary/structured documents to SOLR?

I am using SOLR's ExtractingRequestHandler to ingest the text of documents. The examples in the documentation all use curl to stream documents, like so: curl 'http://.../extract?literal.id=doc1&commit=true' -F "myfile=@tutorial.html" That works…

search solr full-text-search apache-tika solr-cell

asked Jul 27 '10 at 16:31

Matt Hampel

5,088
12
52
78

1

vote

0 answers

Solr: Perform stemming on a field and get the sorted list of stemmed words which were most frequent

Is there a way to use stemming on a field at index time and then retrieve a sorted list of stemmed words by frequency of their original occurrence at query time? For example, assume my 'text' field has the contents of a document and contains only…

solr4 stemming word-frequency solr-cell

asked Nov 17 '14 at 19:30

shaffooo

1,478
23
28

1

vote

0 answers

How to remove a lot of "\n" in text extracted from a Word file using Solr?

When I index a .docx document with Apache Solr 4.9 (Solr Cell), it extracts the text with a lot of "\n". Is there some way to either clean the field content or remove the "\n"? The field content looks like: "content": [ " \n \n \n \n \n \n …

java solr indexing solrj solr-cell

asked Aug 26 '14 at 02:24

kinopio

21
7

1

vote

5 answers

Can we search for .txt files in Solr search engine?

I am using the Solr search engine for document retrieval in my project. My dataset is in .txt file format, but Solr only offers options for JSON, XML, PDF and some other file formats. There is no option for text files. Do I need some modifications…

solr solr-cell

asked Apr 04 '14 at 16:14

Madhusudan

435
2
9
26

1

vote

1 answer

Error while indexing .xml files in solr

I am trying to index XML files in the Solr search engine using the following command: java -Durl=http://10.1.11.143:8080/solr/#/ -jar post.jar solr.xml But I am getting the following error: SimplePostTool version 1.5 Posting files to base url…

solr solr-cell

asked Mar 21 '14 at 06:45

Madhusudan

435
2
9
26

1

vote

1 answer

Setting maximum string length in ExtractingRequestHandler ("Solr Cell") .. setMaxStringLength()

I'm using Solr and ExtractingRequestHandler to index documents but I do not know how to do the equivalent of Tika setMaxStringLength(). It appears to be indexing all of the smaller documents but not all of the text of a large document, which might…

solr solr-cell

asked May 23 '13 at 21:29

mlevy

87
6

1

vote

1 answer

Solr: Excluding certain HTML tags or only including certain tags within indexes

I'm currently using Solr-Cell to grab the contents of several HTML pages and index them. The issue is that I have a menu in the header which is shown on all the pages. This menu and all its items appear in the search results. I don't want…

apache solr solr-cell

asked Mar 04 '13 at 22:44

mangesh

100
5

1

vote

0 answers

how to get date strings from content of pdf with apache solr

Hi all, I am new to Apache Solr. I have a PDF that contains date information like - bla bla bla 2012-11-23 11:11:12 bla bla ...- and I want to get all dates from the content. I read some documentation…

apache solr solr-cell

asked Nov 23 '12 at 09:25

user1847011

11
1

1

vote

1 answer

how to make a association by using lucene/solr import record from database and doc file at same time

I store information about binary documents (file metadata) in a database, and store the binary documents themselves in the filesystem. The file name associates each file with its information in the database. Now I want to import all of this data (file metadata and full-text content in binary…

solr lucene solr-cell

asked May 03 '12 at 12:41

EeE

665
5
12
27

1

vote

0 answers

#500 Internal Server Error when trying to add PDF to Solr index with extraction

I am a first-time Solr user, using v3.5 with Tomcat 7 on a Windows 7 system. I went through the XML example in example-docs with no problems. However, I'm going to need to use extraction with HTML and PDF files, and when I try to Post a PDF file…

solr solr-cell

asked Apr 12 '12 at 04:02

user1263226

250
3
12

0

votes

1 answer

Apache Solr - indexing PDF files

Hi, I have tried doing this with the binary distribution and have also compiled the source code myself. I tried running this with Apache Tomcat as well. But I always get the following error when I use a PDF file for indexing. I am using…

solr lucene solr-cell

asked Mar 29 '12 at 21:46

SarfarazSoomro

413
4
8

0

votes

3 answers

NoClassDefFoundError MimeTypeException with PDF extraction

I am getting an exception trying to use update/extract with PDF files. My setup is: Ubuntu Server 11.10, Tomcat 6, Solr 3.5.0.2011.11.22.15.54.38. I can browse to solr/admin OK. I have put all the contrib/extract and apache-solr-cell3.5.0.jar libraries…

solr apache-tika solr-cell

asked Dec 09 '11 at 11:39

paulusm

786
6
19

0

votes

1 answer

Solr ExtractingRequestHandler pdf text extraction

I have a problem with Solr's PDF text extraction. Solr uses Apache Tika for extracting the text of a PDF file, and Tika uses PDFBox for that. When I send my PDF file to Solr it extracts the text successfully, but the text is totally messed…

solr pdfbox apache-tika solr-cell

asked Nov 07 '11 at 20:28

itsme

852
1
10
23
