I am a first-time Solr user, running v3.5 under Tomcat 7 on Windows 7. I went through the XML example in example-docs with no problems. However, I need to use extraction with HTML and PDF files, and when I try to post a PDF file for indexing I get the following:

SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8080/solr/update/extract?literal.id=doc2..
SimplePostTool: POSTing file test.pdf
SimplePostTool: FATAL: Solr returned an error #500 Internal Server Error

The command I used is:

java -Durl=http://localhost:8080/solr/update/extract?literal.id=doc2 -Dtype=application/pdf -jar post.jar test.pdf
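For debugging, it can help to send the same request without post.jar and read the body of the 500 response, since that is usually where the error detail ends up. Below is a minimal sketch of that idea in Python, not a replacement for post.jar. The helper names `build_extract_url` and `post_file` are made up for illustration, the base URL is the one from the command above, and adding `commit=true` is an assumption about what you want (without a commit, changes would not show up in the admin page).

```python
import urllib.error
import urllib.parse
import urllib.request


def build_extract_url(base_url, doc_id, commit=True):
    """Build the /update/extract URL with literal.id and an optional commit flag."""
    params = {"literal.id": doc_id}
    if commit:
        # Assumption: we want the document to be visible right after the request.
        params["commit"] = "true"
    return base_url.rstrip("/") + "/update/extract?" + urllib.parse.urlencode(params)


def post_file(url, path, content_type="application/pdf"):
    """POST the raw file bytes and return (status, body), also for HTTP errors."""
    with open(path, "rb") as fh:
        data = fh.read()
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": content_type}
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Keep the error body instead of discarding it; it may name the real cause.
        return err.code, err.read().decode("utf-8", "replace")


# Example use (requires a running Solr and a local test.pdf):
#   url = build_extract_url("http://localhost:8080/solr", "doc2")
#   status, body = post_file(url, "test.pdf")
#   print(status)
#   print(body)
```

If the printed body mentions a missing class or handler, that points at the solrconfig.xml or lib setup rather than at the PDF itself.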

My solr home directory is C:\solr, where I have done the following so far:

  • Copied the contents of the solr download package's example/solr folder
  • Copied the solr download package's contrib/extraction/lib folder to C:\solr\lib
  • Copied the solr download package's dist/apache-solr-cell-3.5.0.jar to C:\solr\dist\apache-solr-cell-3.5.0.jar
  • Modified the appropriate "lib" tags in C:\solr\conf\solrconfig.xml to <lib dir="lib" /> and <lib dir="dist/" regex="apache-solr-cell-\d.*\.jar" />
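One thing that can be checked independently of Solr is whether the regex in the second "lib" tag actually matches the jar that was copied. The sketch below is only a sanity check under assumptions: `matching_jars` is a hypothetical helper, it applies the pattern as a full match on the file name (whether Solr matches the same way is not verified here), and it assumes the relative `dist/` path is meant to resolve to C:\solr\dist.

```python
import re
from pathlib import Path

# Same pattern as the regex attribute in the <lib> tag from the question.
CELL_JAR_PATTERN = re.compile(r"apache-solr-cell-\d.*\.jar")


def matching_jars(directory, pattern=CELL_JAR_PATTERN):
    """Return sorted file names in `directory` whose whole name matches `pattern`."""
    folder = Path(directory)
    if not folder.is_dir():
        # A missing directory is reported as "no jars" rather than an exception.
        return []
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.is_file() and pattern.fullmatch(entry.name)
    )


# Example use on the layout described above (path taken from the question):
#   print(matching_jars(r"C:\solr\dist"))
```

An empty list here would suggest the jar is not where the "lib" tag expects it, which would be worth ruling out before looking elsewhere.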

What else do I need to do to make this work for PDF and HTML files? I've read multiple tutorials and "Getting Started" guides but can't seem to understand what's wrong. I'm also a Tomcat beginner and as far as I can tell, none of this is showing up in Tomcat's logs ... so I'm pretty much stuck. Again, I'm not having any problem with the XML example, so Tomcat itself is running fine and recognizes solr (I can see the solr admin page). Any help is appreciated.

user1263226
  • Try it without the `-Dtype` parameter; if that fails, then try with a `curl` command (please google uploading documents with curl). Also `curl` will return detailed error messages (html body of the `500`). Do update this question with that error message. – Jesvin Jose Apr 12 '12 at 05:42
  • Or you can have a look at your solr log file and post here the relevant error. – javanna Apr 12 '12 at 08:39
  • Where is the solr log? Does it need to be turned on, or does it run by default? – user1263226 Apr 12 '12 at 18:04
  • @aitchnyu, same results without -Dtype – user1263226 Apr 12 '12 at 18:06
  • Do download CURL for Windows and run `curl "http://localhost:8983/solr/update/extract?literal.docid=DOC_ID" -F "file=@MYFILE.DOC" ` from terminal. – Jesvin Jose Apr 13 '12 at 05:10
  • @user1263226 The solr.log goes by default within System out. Have a look at your catalina.out. – javanna Apr 13 '12 at 08:31
  • @aitchnyu thanks for informing me about cURL, it is working for me ... I'd still like to understand why post.jar isn't though ... not even for deletes ... for which I don't get an error, but looking at the Solr admin page the changes aren't reflected (and cURL works for this too) – user1263226 Apr 13 '12 at 20:41
  • @javanna, I'm new to Tomcat, can you explain what you mean by "look at your catalina.out"? – user1263226 Apr 13 '12 at 20:43
  • @user1263226 Well, it's just a file within your `tomcat/logs` directory. Not sure of the suffix `.out` if you're using windows. – javanna Apr 16 '12 at 07:40
  • possible duplicate of [Error while indexing .xml files in solr](http://stackoverflow.com/questions/22552021/error-while-indexing-xml-files-in-solr) – kenorb Mar 29 '15 at 14:21
