I'm trying to set up my local environment so that I can store PDFs in Solr, but I can't get it to work. For now I'm working with the files in the example folder that ships with Solr.
I did not modify solrconfig.xml in Solr-3.6.0/example/solr/conf because it already seems to be configured as described on the ExtractingRequestHandler wiki page. Specifically, it already contains these lines:
<lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
and this handler definition:
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
I'm running Solr from the example directory with this command:
java -jar start.jar
I'm trying to send the PDF to Solr with this command:
java -Durl=http://localhost:8983/solr/update/extract -Dauto -jar /Applications/Solr-3.6.0/example/exampledocs/post.jar /path/to/pdf/mypdf.pdf
If I leave /Solr-3.6.0/example/solr/conf/schema.xml unchanged, I get this error:
FATAL: Solr returned an error #400 [doc=null] missing required field: id
If I change the "required" attribute of the id field in schema.xml to false, I get:
FATAL: Solr returned an error #400 Document is missing mandatory uniqueKey field: id
I would have expected that setting required="false" on a field would let me send documents that don't contain that field, but apparently that is not the case.
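For reference, the relevant declarations in the example schema.xml look roughly like the excerpt below. This assumes an unmodified Solr 3.6 example schema, and all other fields and types are omitted. The id field has its own required attribute, but it is also named separately as the uniqueKey. The second error message suggests that Solr enforces the uniqueKey on every added document, whatever the required attribute says.

```xml
<!-- Excerpt from example/solr/conf/schema.xml; surrounding <fields> and <schema> elements omitted -->
<!-- The required attribute here is a per-field check on this field declaration -->
<field name="id" type="string" indexed="true" stored="true" required="true" />

<!-- Declared separately; judging by the error above, documents without this field are rejected -->
<uniqueKey>id</uniqueKey>
```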
I have also tried adding -Dparams=literal.id=mypdf1 to the command that sends the PDF, but that doesn't help either. Any thoughts?