I am a first-time Solr user, running v3.5 with Tomcat 7 on Windows 7. I went through the XML example in exampledocs with no problems. However, I need to use extraction with HTML and PDF files, and when I try to post a PDF file for indexing I get the following:
SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8080/solr/update/extract?literal.id=doc2..
SimplePostTool: POSTing file test.pdf
SimplePostTool: FATAL: Solr returned an error #500 Internal Server Error
The command I used is:
java -Durl=http://localhost:8080/solr/update/extract?literal.id=doc2 -Dtype=application/pdf -jar post.jar test.pdf
My Solr home directory is C:\solr, where I have done the following so far:
- Copied the contents of the Solr download package's example/solr folder
- Copied the Solr download package's contrib/extraction/lib folder to C:\solr\lib
- Copied the Solr download package's dist/apache-solr-cell-3.5.0.jar to C:\solr\dist\apache-solr-cell-3.5.0.jar
- Modified the appropriate "lib" tags in C:\solr\conf\solrconfig.xml to
<lib dir="lib" />
and
<lib dir="dist/" regex="apache-solr-cell-\d.*\.jar" />
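To make the setup easier to check, here is a rough sketch of how I understand the relevant parts of solrconfig.xml fit together. The two lib lines are the ones I changed. The request handler block is an assumption on my part: it is what I believe the stock example config contains for /update/extract, and I have not modified it. As far as I understand, relative dir values are resolved against the Solr instance directory, which would be C:\solr here.

```xml
<!-- Sketch only: relevant parts of C:\solr\conf\solrconfig.xml -->
<config>
  <!-- Tika and its dependencies, copied from contrib/extraction/lib.
       Relative paths should resolve against the instance directory (C:\solr). -->
  <lib dir="lib" />

  <!-- The Solr Cell jar, copied from dist/ -->
  <lib dir="dist/" regex="apache-solr-cell-\d.*\.jar" />

  <!-- Assumption: handler as shipped in the example config, left unchanged.
       The defaults shown are what I recall from that file and may differ. -->
  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.content">text</str>
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>
    </lst>
  </requestHandler>
</config>
```

If the handler is loaded lazily as sketched, I would expect a missing jar to show up only on the first request to /update/extract, which might explain why the XML example works while the PDF post fails.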
What else do I need to do to make this work for PDF and HTML files? I've read multiple tutorials and "Getting Started" guides but can't work out what's wrong. I'm also a Tomcat beginner, and as far as I can tell the error is not showing up in Tomcat's logs, so I'm pretty much stuck. Again, I'm not having any problem with the XML example, so Tomcat itself is running fine and recognizes Solr (I can see the Solr admin page). Any help is appreciated.