I am setting up a Solr-based de-duplication system that returns the documents matching the search criteria. I have used the DataImportHandler to pull data from a database and index it as documents on the Solr server.
My Solr schema is as follows:
<field name="customer_id" type="int" indexed="true" stored="true" required="true" />
<field name="fname" type="phonetic" indexed="true" stored="true" />
<field name="lname" type="phonetic" indexed="true" stored="true"/>
<field name="address" type="text_en" indexed="true" stored="true" />
<field name="city" type="string" indexed="true" stored="true" />
<field name="state" type="string" indexed="true" stored="true" />
<field name="zipcode" type="string" indexed="true" stored="true" />
<field name="telephone" type="string" indexed="true" stored="true" />
As shown above, I have set the type of the first name (fname) and last name (lname) fields to phonetic so that they are searched phonetically using DoubleMetaphoneFilterFactory. The phonetic field type is defined as follows:
<fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" >
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
<filter class="solr.DoubleMetaphoneFilterFactory" inject="true"/>
</analyzer>
</fieldtype>
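Because this field type has a single `<analyzer>` block, the same chain is applied at both index time and query time. To show how much a single name expands, here is a minimal Python sketch (an illustration only, not Solr's implementation) of the grams an NGram filter with minGramSize="1" and maxGramSize="15" would produce for one lowercased token. The function name `ngrams` is hypothetical, and Lucene's emission order may differ from the order used here. Each gram is then passed through DoubleMetaphoneFilterFactory, so if the query-side grams end up OR-ed together, a name clause may match any document that shares even a single-letter gram.

```python
def ngrams(token: str, min_gram: int = 1, max_gram: int = 15) -> list[str]:
    """Return every substring of `token` whose length is between min_gram and max_gram.

    Hypothetical helper that approximates the *set* of grams an NGram filter
    emits for one token; it does not reproduce Lucene's exact ordering.
    """
    token = token.lower()  # mirrors LowerCaseFilterFactory running before the NGram filter
    grams = []
    # Shortest grams first, then longer ones, sliding across the token.
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


if __name__ == "__main__":
    grams = ngrams("Smith")
    print(len(grams))
    print(grams)
```

A five-letter surname such as "Smith" yields 15 grams, five of them single letters, which is why a gram-expanded name clause can be far less selective than the original word.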
I want my searches to return only the documents that match all of the specified query fields, not documents that match just one of them.
My problem is that when I search on fname, lname, or address alone, the results are quite relevant, but when I combine a filter query (fq) with the main query (q), the results contain the union of the matches for both criteria rather than their intersection.
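For reference, Solr intersects the main query (q) with every filter query (fq), so a union-like result more likely comes from clauses inside a single q or fq being combined with the default operator, which is OR unless configured otherwise. The following is a minimal Python sketch, assuming a hypothetical core named `customers` on `localhost:8983` and a hypothetical helper `build_select_query`, that builds a `/select` query string requiring every criterion by joining per-field clauses with an explicit AND. It only builds the string and sends nothing to Solr.

```python
from urllib.parse import urlencode


def build_select_query(criteria: dict[str, str], rows: int = 10) -> str:
    """Return a Solr /select query string that requires every field criterion.

    Clauses are joined with an explicit AND, so the result does not depend on
    the default operator. Field names and values are placeholders.
    """
    clauses = []
    for field, value in criteria.items():
        # Parentheses keep all words of the value scoped to this one field.
        clauses.append(f"{field}:({value})")
    params = [
        ("q", " AND ".join(clauses)),
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return urlencode(params)


if __name__ == "__main__":
    qs = build_select_query({"fname": "john", "lname": "smith"})
    # Hypothetical core URL; adjust host, port, and core name to your setup.
    print("http://localhost:8983/solr/customers/select?" + qs)
```

An explicit AND controls how the per-field clauses combine, but it does not change how each clause is analyzed. With the NGram filter in the phonetic type, a single fname or lname clause may still expand into many grams that match broadly.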
Can somebody please point out what I am doing wrong? Also, are there any best practices to keep in mind when designing a Solr schema for a de-duplication system that identifies duplicate customer records for a bank?
Thanks in advance!!