Questions tagged [solr-cell]

Solr Content Extraction Library: a Solr contrib module responsible for converting the raw content of a rich document into something usable by Solr.

Solr Cell's main component is the ExtractingRequestHandler, which uses Tika so that users can upload binary files to Solr and have Solr extract their text and index it.
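
As a rough illustration of the upload described above, the sketch below only builds the request URL that a client might POST a binary file to. It assumes the handler is registered at `/update/extract` (this depends on solrconfig.xml) and that Solr runs at `http://localhost:8983/solr/mycore`; the core name `mycore` and the helper `build_extract_url` are hypothetical. `literal.<field>` and `commit` are request parameters of the ExtractingRequestHandler.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, literals, commit=True):
    """Build a URL for the ExtractingRequestHandler.

    Assumes the handler is mapped to /update/extract in solrconfig.xml.
    Each entry in `literals` becomes a literal.<field>=<value> parameter,
    which sets that field on the indexed document alongside the text
    Tika extracts from the uploaded file.
    """
    params = [("literal." + name, value) for name, value in literals.items()]
    if commit:
        params.append(("commit", "true"))
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


# Hypothetical host and core name; adjust to the actual installation.
url = build_extract_url("http://localhost:8983/solr/mycore", {"id": "doc1"})
print(url)
```

The file itself would then be sent as the body of a POST request to this URL, for example with SolrJ's ContentStreamUpdateRequest, which several of the questions below mention.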

71 questions
0
votes
1 answer

Solr; What does this mean?

At the end of the README.txt file which is located in the example directory under solr, I find this line: NOTE: This Solr example server references SolrCell jars outside of the server directory with statements in the solrconfig.xml. If you make…
user188962
0
votes
1 answer

Solrj ContentStreamUpdateRequest fails to save all literal fields unless they are dynamic

I am using the Extracting Request Handler to index html and pdf files. Along with what tika finds I want to add metadata above and beyond content from tika. To do this I use the literal.= support. Unless I use dynamic fields "*_s" the data is not…
whomer
  • 575
  • 9
  • 21
0
votes
1 answer

Result of Solr Search Engine

When I enter a query in the query box of the Solr search engine and ask for results, it shows that some number of documents were found (numFound), but it shows only ten documents per page. How can I see the other retrieved documents? There is no link like "next…
Madhusudan
  • 435
  • 2
  • 9
  • 26
0
votes
1 answer

Is there a way to integrate spring-data-solr with Tika?

Is there a way, via configuration, to use spring-data-solr with Tika? Otherwise, is there some alternative to solrj’s ContentStreamUpdateRequest+addfile for spring-data-solr? Currently I am using Solrj + Tika in this manner: SolrServer server = new…
Osy
  • 1,613
  • 5
  • 21
  • 35
0
votes
1 answer

Set multivalued fields with ContentStreamUpdateRequest in Solr

I'm using SolrJ+SolrCell to index the contents of various Word/Excel/PDF files, but there are some fields (e.g. id, name) that I want to be able to set myself: ContentStreamUpdateRequest req = new…
benjammin
  • 537
  • 5
  • 23
0
votes
1 answer

solr extractingrequesthandler is not a org.apache.solr.request.SolrRequestHandler

I'm trying to use post.jar to index a folder of PDF files. I have added the request handler, but I'm getting an error on startup. To me it seems that it could be a version conflict or a duplicate class load, and that it is therefore not recognized as a…
user2436745
  • 17
  • 1
  • 6
0
votes
1 answer

Solr metadata index

I am new to Solr and I am extracting metadata from binary files through URLs stored in my database. I would like to know what fields are available for indexing from PDFs (the ones that would be initiated as column=””). I would also like to know…
Luis
  • 1
  • 1
  • 1
0
votes
1 answer

Can Solr retain the formatting of the HTML documents which were fed to it in its results?

How do I maintain the original formatting of the HTML documents in the results given by Solr? I am trying to provide search functionality on one of my company's websites, which has millions of documents, not all of which have similar formatting,…
Mantra
  • 316
  • 3
  • 16
0
votes
1 answer

Solr - How to add meta data to indexed binary files that were indexed through Solr Cell?

I'm creating a PHP app that allows the user to search for files, using Solr to power the search. This is mainly because the app requires content searching of Word docs and PDFs. The app also uses a MySQL database to keep track of the files. I'm…
jd182
  • 3,180
  • 6
  • 21
  • 30
0
votes
2 answers

Upload a file to solr with my own parameters added

I would like to upload a file (for instance, an MS Word document) to Solr, but I would like to add my own fields to this upload, like the userId of the person who uploaded it or a number of tags. The content of the file must be parsed and…
Ronald
  • 346
  • 1
  • 2
  • 12
0
votes
2 answers

Index every whitespace-delimited word of a text file in Solr?

I am implementing Solr 3.6 in my application. I have the data below in my text file: ** date=2011-07-08 time=10:55:06 timezone="IST" device_name="CR1000i" device_id=C010600504-TYGJD3 deployment_mode="Route" log_id=031006209001 log_type="Anti…
Asha Koshti
  • 2,763
  • 4
  • 22
  • 30