I'm implementing a Solr MoreLikeThis handler to find similar customers.
I have two customers with different names who live at the same address. I want to give an entity_id to Solr and get back all customers with similar names or addresses. The client will then be able to link both customers together with the click of a button.
I'm using the SolariumBundle to do this in code, but getting the raw query to work is enough; once that works, I can adapt it to Solarium myself.
This is my solrconfig.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <luceneMatchVersion>LUCENE_36</luceneMatchVersion>
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  <updateHandler class="solr.DirectUpdateHandler2" />
  <requestDispatcher handleSelect="true" >
    <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
  </requestDispatcher>
  <!-- request handlers -->
  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
  <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
    <lst name="defaults">
      <int name="mlt.mintf">2</int>
      <int name="mlt.mindf">1</int>
      <int name="mlt.minwl">5</int>
      <int name="mlt.maxwl">1000</int>
      <int name="mlt.maxqt">50</int>
      <int name="mlt.maxntp">50000</int>
      <bool name="mlt.boost">true</bool>
      <str name="mlt.fl">customer_data,entity_data,street</str>
      <bool name="mlt.match.include">false</bool>
    </lst>
  </requestHandler>
  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
  <!-- config for the admin interface -->
  <admin>
    <defaultQuery>solr</defaultQuery>
  </admin>
</config>
The relevant part of my schema.xml is:
<fields>
  <!-- general -->
  <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
  <field name="type" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
  <field name="entity_id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
  <field name="sort_id" type="int" indexed="true" stored="true" multiValued="false"/>
  <field name="external_id" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="status" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="language" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="created" type="int" indexed="true" stored="true" multiValued="false"/>
  <field name="name" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="email" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="city" type="string" indexed="true" stored="false" multiValued="false"/>
  <field name="country" type="string" indexed="true" stored="false" multiValued="false"/>
  <field name="street" type="string" indexed="true" stored="false" multiValued="false"/>
  <field name="zipcode" type="string" indexed="true" stored="false" multiValued="false"/>
  <field name="entity_data" type="text_ngrm" indexed="true" stored="true" multiValued="true"/>
  <field name="customer_data" type="text_ngrm" indexed="true" stored="true" multiValued="true" termVectors="true" />
  <!-- Entity data filling -->
  <copyField source="entity_id" dest="entity_data"/>
  <copyField source="briljant_id" dest="entity_data"/>
  <copyField source="name" dest="entity_data"/>
  <copyField source="email" dest="entity_data"/>
  <!-- End entity data -->
  <!-- Customer data -->
  <copyField source="name" dest="customer_data"/>
  <copyField source="email" dest="customer_data"/>
  <copyField source="city" dest="customer_data"/>
  <copyField source="country" dest="customer_data"/>
  <copyField source="street" dest="customer_data"/>
  <copyField source="zipcode" dest="customer_data"/>
  <!-- End customer data -->
</fields>
I currently execute this query: http://localhost:8983/solr/core0/mlt?q=entity_id%3A50&wt=json&indent=true&mlt.fl:customer_data
and that does return results for customers that have a similar name.
For example, if the customer with entity_id:50 (the one I'm querying for) has the name "Foo Bar", it returns customers with the names "Foo Bar", "Bar Foo", and "John Foo". The similarity on street / country / zipcode doesn't work, however.
In the parsedquery section of the debug output I can see different mutations of the name terms: customer_data:Foo customer_data:Bar customer_data:oo Bar, ...
but nothing on the address part.
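For reference, here is a minimal sketch (Python, standard library only) of how the same request could be built with every parameter written as an explicit key=value pair. The URL above ends in `mlt.fl:customer_data`, which is not a key=value pair, so presumably Solr does not read it as `mlt.fl` and the default from solrconfig.xml applies instead. The helper name `build_mlt_url` is made up for illustration, and the host and core name are simply taken from the URL above; `debugQuery=true` is the standard Solr parameter that adds the parsedquery output to the response.

```python
from urllib.parse import urlencode

# Assumption: Solr runs locally with a core named core0, as in the URL above.
BASE_URL = "http://localhost:8983/solr/core0/mlt"


def build_mlt_url(entity_id, fields=("customer_data",)):
    """Build the /mlt request URL for one entity_id (hypothetical helper)."""
    params = {
        "q": f"entity_id:{entity_id}",   # document to find similar customers for
        "wt": "json",
        "indent": "true",
        "mlt.fl": ",".join(fields),      # sent as mlt.fl=..., not mlt.fl:...
        "debugQuery": "true",            # include parsedquery in the response
    }
    # urlencode percent-encodes reserved characters such as ':' and ','
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_mlt_url(50))
```

Opening the printed URL in a browser should give the same kind of response as before, with the debug section showing which customer_data terms end up in the generated query.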
How can I make sure that the generated query also covers the address terms, i.e. becomes: customer_data:Foo customer_data:Bar customer_data:teststreet customer_data:Antwerp?