Clusters produced by Solr and Carrot2 Workbench not consistent

Question

I'm trying to tune clustering in Solr using Carrot2 Workbench. Workbench produces good results, but Solr does not: its clusters are very different.

My flow:

Prepare a set of doc IDs and query on them alone (using fq)
Tune and export the XML config from Workbench
Restart Solr to make sure it's all picked up
Repeat the same query (I also made sure it's exactly the same as the one from Workbench by checking the Solr logs)
Compare the clusters

This last step is where I'm lost. The clusters are completely different, even in structure. Workbench produces longer, more complex labels, while the Solr labels are very simple.
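For the comparison step, it can help to match clusters by the documents they contain rather than by label text, since the labels themselves differ. Below is a minimal sketch of that idea. The two dictionaries are made-up sample data (label mapped to doc IDs), and `jaccard` and `best_matches` are hypothetical helper names, not part of Solr or Carrot2.

```python
def jaccard(a, b):
    """Overlap of two doc-id collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_matches(left, right):
    """For each cluster in `left`, find the cluster in `right` with the
    highest doc-id overlap. Both arguments map label -> list of doc ids."""
    report = []
    for label, docs in left.items():
        best_label, best_score = None, 0.0
        for other_label, other_docs in right.items():
            score = jaccard(docs, other_docs)
            if score > best_score:
                best_label, best_score = other_label, score
        report.append((label, best_label, best_score))
    return report


# Made-up sample data, for illustration only.
workbench = {
    "Data Mining Software": ["1", "2", "3"],
    "Knowledge Discovery": ["4", "5"],
}
solr = {
    "Mining": ["1", "2"],
    "Knowledge": ["4", "5"],
    "Other": ["3"],
}

report = best_matches(workbench, solr)
for wb_label, solr_label, score in report:
    print(f"{wb_label!r} ~ {solr_label!r} (overlap {score:.2f})")
```

If the document groupings mostly agree and only the labels differ, that would point at label generation (for example, a different algorithm or different input text) rather than at the data itself.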

I tried tweaking parameters both in the XML and in the query, but with very little effect. The effect was enough, however, to confirm that the configs are being picked up.

Another thing I checked was the Carrot2 CLI tool. I exported the data from Solr to XML and ran the CLI with the config I exported from Workbench to produce clusters. The CLI output is consistent with Workbench.

That leaves Solr as the odd one out. I use Carrot2 v3.15.1 and Solr 7.2.1.

What am I missing? Why is Solr producing different clusters from the same data and configuration?

One thing to check would be if the Solr plugin clustering is applied to full documents or to the contextual snippets only (the carrot.produceSummary parameter). Also, is it possible that in Workbench you're using the Lingo clustering algorithm while for some reason Solr uses the STC algorithm? (the difference in labels would suggest that) — Stanislaw Osinski, Apr 04 '18 at 10:28
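One way to test both suggestions from the comment is to send the same filtered query while setting the clustering parameters explicitly, so nothing is left to the handler defaults. The sketch below only builds the request URL. The handler path, the core name `mycore`, the `id` field, and the engine name `lingo` are assumptions that depend on the local solrconfig.xml. `clustering.engine` and `carrot.produceSummary` are request parameters of the Solr clustering component.

```python
from urllib.parse import urlencode


def clustering_query(base_url, doc_ids, engine, produce_summary):
    """Build a Solr clustering request URL with explicit settings.

    base_url        -- clustering handler URL (assumed; depends on your setup)
    doc_ids         -- ids to restrict the query to via fq
    engine          -- engine name as declared in solrconfig.xml (assumed)
    produce_summary -- False clusters full field content instead of
                       highlighter-produced snippets
    """
    params = [
        ("q", "*:*"),
        # Assumes the unique key field is called "id".
        ("fq", "id:(" + " OR ".join(doc_ids) + ")"),
        ("rows", str(len(doc_ids))),
        ("clustering", "true"),
        ("clustering.engine", engine),
        ("carrot.produceSummary", "true" if produce_summary else "false"),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


url = clustering_query(
    "http://localhost:8983/solr/mycore/clustering",  # hypothetical URL
    ["a1", "b2"],
    "lingo",  # hypothetical engine name
    False,
)
print(url)
```

Running the query once with `produce_summary=True` and once with `False`, and once per configured engine, should show which of the two settings accounts for the simpler labels.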
