Is it possible to bucket on the count() of aggregates? The Select parameter language grammar seems to suggest that it is, but I could be interpreting it wrong.

My rough interpretation:
predefined([expr = (aggr = (count())], bucket(...))

( "predefined" "(" exp "," "(" bucket ( "," bucket )* ")" ")" ) |
exp        ::= ( "+" | "-") ( "$" identifier [ "=" math ] ) | ( math ) | ( aggr )
aggr       ::= ( ( "count" "(" ")" ) |
                 ( "sum" "(" exp ")" ) |
                 ( "avg" "(" exp ")" ) |
                 ( "max" "(" exp ")" ) |

My attempt (fails with "Expression 'count()' not applicable for single hit."):

 all(group(predefined(status, bucket["field1"])) each(
       all(group(predefined(count(), bucket[0,10>, bucket[11,20>)) each(
         output(count() as(count))
       ))
     ))
Narin
  • As a follow-up: the grouping reference states very clearly what can be grouped (https://docs.vespa.ai/documentation/reference/grouping-syntax.html#group), but I'll wait for any feedback that might suggest a solution. – Narin Feb 19 '19 at 19:57

1 Answer

Creating predefined buckets of count() (or other aggregators) is not supported. Count in general (i.e., when counting subgroups rather than hits) would be a bit tricky because it is computed across the nodes as a data sketch, whose output would then need to be sent back down for bucketing.
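To illustrate why a sketch-based count is only known after the per-node results are combined, here is a minimal HyperLogLog-style sketch in Python. It is a toy illustration, not Vespa's implementation: the class name `HyperLogLog`, the precision `p`, and the SHA-1 hashing are all assumptions made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog sketch (hypothetical, not Vespa's code)."""

    def __init__(self, p=10):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # Hash the value to 64 bits; the top p bits pick a register.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Combining two nodes' sketches is a register-wise maximum.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b)
                            for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# Two "content nodes" see overlapping sets of subgroup keys.
node_a, node_b = HyperLogLog(), HyperLogLog()
for i in range(3000):
    node_a.add(f"group-{i}")
for i in range(2000, 5000):
    node_b.add(f"group-{i}")

combined = node_a.merge(node_b)
print(round(combined.estimate()))   # approximate count of 5000 distinct keys
```

Neither node can place the final count into a bucket on its own, because the estimate exists only after the merge; that is the round trip the answer describes.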

Is this something you need to do? If so, create a ticket for it at https://github.com/vespa-engine/vespa/issues

Jon
  • Thanks much, I've submitted a feature request: https://github.com/vespa-engine/vespa/issues/8566 – Narin Feb 20 '19 at 15:23
  • Would it be possible to name a class or even a package that would be a good place to start looking? ExpressionCountAggregationResult.java? I imagine it's not as easy as that, but I want to research this to get a better understanding of how counting/bucketing/grouping works. – Narin Mar 08 '19 at 04:29
  • Yes, [ExpressionCountAggregationResult.java](https://github.com/vespa-engine/vespa/blob/master/searchlib/src/main/java/com/yahoo/searchlib/aggregation/ExpressionCountAggregationResult.java) is where the sketch data (computed by the C++ code on the content nodes) is combined into an estimate. I think you need to use those estimates as input to another round of grouping. See [GroupingExecutor.java](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/grouping/vespa/GroupingExecutor.java) on setting this up. – Jon Mar 11 '19 at 07:43
  • Thank you. I was researching the HyperLogLog algorithm just to see what it does, as trivial algorithms don't really translate well to big data, it seems. – Narin Mar 11 '19 at 18:59
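
Until bucketing on count() is supported, one possible workaround is to run a plain grouping such as `all(group(status) each(output(count())))` and bucket the returned counts on the client. The sketch below assumes the per-group counts have already been parsed out of the query response into a dictionary; the function name `bucket_counts` and the sample data are hypothetical.

```python
from bisect import bisect_right


def bucket_counts(group_counts, edges):
    """Assign each group to a half-open bucket [lo, hi) by its count.

    group_counts: dict of group label -> count (assumed already parsed
                  from the grouping result; hypothetical input shape).
    edges:        sorted bucket boundaries, e.g. [0, 10, 20].
    Groups whose count falls outside all buckets are dropped.
    """
    buckets = {(lo, hi): [] for lo, hi in zip(edges, edges[1:])}
    for group, count in group_counts.items():
        i = bisect_right(edges, count) - 1
        if 0 <= i < len(edges) - 1:
            buckets[(edges[i], edges[i + 1])].append(group)
    return buckets


# Hypothetical counts per status value.
counts = {"active": 3, "pending": 12, "closed": 25}
print(bucket_counts(counts, [0, 10, 20]))
# {(0, 10): ['active'], (10, 20): ['pending']}
```

This only works when the number of groups is small enough to return to the client, and counts that come from a sketch are estimates, so a group near a bucket boundary may land in the neighbouring bucket.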