I'm using the Solarium PHP library to connect to a Solr instance. I have an index with around 3.5 million documents. Searching and filtering work great, but there is one thing I can't get to perform well in Solr.

The documents describe companies. I want to know how many unique phone numbers are in the index for a given query. Some companies are related and share a phone number; some don't have a phone number at all.

Facets are not really an option since they are limited to 100 results per request. For 3.5 million documents that would mean a lot of requests. I tried the getStats() option, but that was slow too. I finally resorted to GroupComponent queries, which seem to do the job.

Still, if the result set is large (100k+ documents), the query runs for a very long time and eventually crashes Solr. I increased the memory limits to prevent the crashes, but it still does not finish within a reasonable time. This is my code:

    $groupComponent = $select->getGrouping();
    $groupComponent->addField('phone');
    $groupComponent->setNumberOfGroups(true);
    $groupComponent->setLimit(0);
    $groupComponent->setTruncate(true);
    $groupComponent->setFormat('simple');
    $groupComponent->setFacet(true);

    $resultset = $this->client->execute($select);
    $groups = $resultset->getGrouping();

I actually only need the counts, not the results. I set the limit to 0, but I'm not sure whether that means zero or unlimited in this case. Setting it to 1 doesn't make any difference, so I'm not sure it is possible to get just the counts. I have also tried adding `$groupComponent->setMainresult(true);`, but that doesn't make it faster and seems to always return 0 for the number of phone numbers.

If anybody has a suggestion for speeding up the process in Solarium or directly in Solr, I'd love to hear it. Thanks!

Frank
  • The number of facet values returned can be adjusted with `facet.limit` (on a per-field basis with `f.fieldname.facet.limit` if necessary); make sure you have docValues enabled for the field (and a field type that supports docValues). However, the JSON Facet API supports more functionality than the older facet API, and it has a `unique()` function (a "stat facet") that will give you the count of unique values in a facet bucket. That sounds like exactly what you want: https://solr.apache.org/guide/solr/latest/query-guide/json-facet-api.html - `json.facet={"unique_phone_nos": "unique(phone_no)"}` – MatsLindh Jul 11 '22 at 10:42
  • I thought the facet.limit option was also limited to 100 just like with regular queries, but I see in the documentation that is not the case. I will try that and dive into the unique function, sounds like the latter will improve performance. Thanks! – Frank Jul 11 '22 at 19:52
  • `facet.limit=-1` will give you all facets; you'll have to then count the number of returned elements for a possibly very long response - using `unique` will offload that to the server instead. – MatsLindh Jul 11 '22 at 20:33
  • Thanks, it works flawlessly and quickly. For Solarium, this code is required: `$select->getFacetSet()->createJsonFacetAggregation('phone')->setFunction('unique(phone)');` – Frank Jul 14 '22 at 13:08
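
For reference, the raw Solr request behind the `unique()` approach from the comments can be sketched in plain PHP without Solarium. This is only a minimal sketch under assumptions: the core URL and the facet label `unique_phones` are hypothetical, and the field name `phone` is taken from the code in the question.

```php
<?php
// Hypothetical core URL; adjust host, port, and core name to your setup.
$baseUrl = 'http://localhost:8983/solr/companies/select';

/**
 * Build the query string for a count-only request that asks Solr for the
 * number of distinct values of $field among the documents matching $query.
 */
function buildUniqueCountQuery(string $query, string $field): string
{
    $params = [
        'q'          => $query,
        'rows'       => 0, // no documents needed, only the aggregate
        // "unique_phones" is an arbitrary label chosen for this sketch.
        'json.facet' => json_encode(['unique_phones' => "unique($field)"]),
    ];

    // http_build_query URL-encodes the keys and values.
    return http_build_query($params);
}

$url = $baseUrl . '?' . buildUniqueCountQuery('*:*', 'phone');
echo $url, PHP_EOL;
```

Fetching that URL (for example with curl) should return the count under the `facets` key of the JSON response, so no per-value buckets have to be transferred to the client and counted there, which is the difference from `facet.limit=-1` pointed out above.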
