0

Im not sure if its possible as I am currently learning Solr and am very much a novice. I am querying a dataset with a few hundred thousand pieces to it and I'm returning the stats on the price of those sets. Most of my data has a price that is well above 0 (ideally it would be over 150 or 500), but with some queries there are far more 0's.

Is there some way to exclude the stats from making a min, max, average, and all that with values that are below a set value (0 in this case)?

I have tried /select?q=*%3A*&wt=json&indent=true&stats=true&stats.field={!min=5000}PRICE&indent=true but it doesn't appear to be that easy.

An example of the data I'm seeing is as follows

"stats":{
    "stats_fields":{
        "PRICE":{
            "min":0.0,
            "max":81977.0,
            "count":163,
            "missing":0,
            "sum":799747.0,
            "sumOfSquares":5.9263205819E10,
            "mean":4906.423312883436,
            "stddev":18482.44220103939
        }
    }
}

2 Answers2

0

The stats component works with the set of documents retrieved, so if you don't want those with 0 to be counted, only request those with a higher value:

?q={0 TO *]

This will fetch any value larger than 0 (so 0.00001 will still be included). If your actual values start at 1, you can use q=[1 TO *] to include the 1 ({ is an exclusive range, while [ is inclusive).

You can also apply filters through fq to filter the result set, or use JSON Facets to calculate stats for a set of documents.

MatsLindh
  • 49,529
  • 4
  • 53
  • 84
  • that sounds like a pretty good idea, is there a way to do it without modifying the results of the query like doing `select?q=*%3A*&wt=json&indent=true&stats=true&stats.field=PRICE{0 TO *]&indent=true`? – McMuffin Aug 13 '18 at 14:46
  • Not that I know of, but using a [JSON Facet with a Facet Query](https://lucene.apache.org/solr/guide/7_4/json-facet-api.html) should work without modifying the main query. – MatsLindh Aug 13 '18 at 18:34
0

As already stated, the stats component works with the set of documents retrieved. However, I needed to exclude values from the statistics without removing them from the results retrieved. This question is almost 4 years old but thought I'd add my solution to help others in a similar situation.

Instead of relying on solr, I calculated the statistics myself in PHP by passing the values to an array, excluding the values I didn't need and then running the calculations.

Example:

$array = [0, 1, 2, 2, 2, 3, 3, 3, 4, 5, 6, 7, 8, 9];
$array = array_unique($array); // remove duplicate items
$exclude = [0, 2, 4, 6, 8]; // List of values to exclude
$array = array_diff($array, $exclude); // exclude items from main array

//Calculate statistics
$min = min($array);
$max = max($array);
$range = (max($array)-min($array))+1; // value is inclusive
// $count and $mean will have different values depending on whether or not you remove duplicates
$count = count($array);
$mean = array_sum($array)/$count;

This returns: Min: 1 Max: 9 Range: 9 Count: 5 Mean: 5

drew
  • 3
  • 3