
I'm trying to implement deduplication in Solr by updating solrconfig.xml and schema.xml according to this link: https://lucene.apache.org/solr/guide/7_6/de-duplication.html

Solr generates a signature for each document, but every signature is set to 0000000000000000 (16 zeros). Another post asks the same question, but no one answered it: Solr Deduplication (dedupe) giving all zeros in signatureField

My file setup:

solrconfig.xml:

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,content</str>
    <str name="signatureClass">solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.update.LogUpdateProcessorFactory" />
  <processor class="solr.update.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>


<requestHandler name="/update" class="solr.UpdateRequestHandler" >
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>

schema.xml:

<field name="signature" type="string" stored="true" indexed="true" multiValued="false" />

Any help is appreciated! :)

n4rush0
  • What's the definition for `name` and `content`? Are either copyfields? Have you tried providing the `update.chain` name manually? What is the content when indexing? The default signature is `0` if hashing never occurs in the code, so initial guess is that it never gets any data to update the signature with. – MatsLindh Aug 20 '19 at 06:54
  • Thank you! I was not using any of the actual fields for my files, such as name, content, cat, fields, etc. It works now after I specified the correct field names. – n4rush0 Aug 20 '19 at 15:14
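
For readers who hit the same symptom, here is a minimal sketch of the fix described in the comments above: the `fields` list has to name fields that are actually present in the documents being indexed, otherwise nothing is hashed and the signature stays at zero. The field names `title` and `body` are hypothetical placeholders, not taken from the question; everything else is copied from the configuration in the question.

```xml
<!-- solrconfig.xml: sketch only; class names are copied from the question above -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <!-- Hypothetical field names: list fields that are present in the documents you send -->
    <str name="fields">title,body</str>
    <str name="signatureClass">solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.update.LogUpdateProcessorFactory" />
  <processor class="solr.update.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With a list like this, a document that carries values in the named fields should receive a non-zero signature. To rule out the handler default not being applied, the chain can also be named explicitly on the update request with the `update.chain=dedupe` parameter, as the first comment suggests.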
