
I'm exploring RediSearch and decided to try the aggregations feature, but I've hit a roadblock.

I can't get the grouped counts I expect.

For testing purposes I created a basic index/schema like so:

FT.CREATE test SCHEMA field TEXT

FT.ADD test 1A 1 FIELDS field hello
FT.ADD test 2A 1 FIELDS field hello
FT.ADD test 3A 1 FIELDS field hello
FT.ADD test 4A 1 FIELDS field world

Next, I issued the following query:

FT.AGGREGATE test "*" GROUPBY 1 @field REDUCE COUNT 0 AS agg

I expected a result showing that hello occurs three times and world occurs once, but instead I got the following:

1) (integer) 1
2) 1) "field"
   2) (nil)
   3) "agg"
   4) "4"

I thought it was pretty straightforward, but I'm obviously doing something wrong.

Also, the following is the output from the MODULE LIST command:

1) 1) "name"
   2) "ft"
   3) "ver"
   4) (integer) 10300
2) 1) "name"
   2) "ReJSON"
   3) "ver"
   4) (integer) 10001

Any help would be super.

Thanks!

Tombatron
  • Please try after defining the field as `SORTABLE` during schema creation. – Itamar Haber Oct 31 '18 at 00:16
  • Also, consider upgrading to RediSearch v1.4.1 that was released earlier today – Itamar Haber Oct 31 '18 at 00:19
  • @ItamarHaber I updated to the latest version of Redis and RediSearch AND made the property I was aggregating on `SORTABLE`. Making the property `SORTABLE` seems to have done the trick. Did I miss something in the documentation? – Tombatron Oct 31 '18 at 09:18
  • Another way to go about this, w/o reverting to the use of sortable fields, is to tune the (currently global) timeout - that's basically the classic space-time trade-off – Itamar Haber Oct 31 '18 at 16:27

1 Answer


It turns out that I should have read the documentation better.

The aggregations documentation, in the section describing the FT.AGGREGATE command parameters, says this about LOAD {nargs} {property}:

Load document fields from the document HASH objects. This should be avoided as a general rule of thumb. Fields needed for aggregations should be stored as SORTABLE, where they are available to the aggregation pipeline with very low latency. LOAD hurts the performance of aggregate queries considerably since every processed record needs to execute the equivalent of HMGET against a redis key, which when executed over millions of keys, amounts to very high processing times.

The query in the original question was:

FT.AGGREGATE test "*" GROUPBY 1 @field REDUCE COUNT 0 AS agg

Since the schema didn't define field as SORTABLE, I would have to LOAD it before I could aggregate on it:

FT.AGGREGATE test "*" LOAD 1 @field GROUPBY 1 @field REDUCE COUNT 0 AS agg
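To see why the original query collapsed everything into a single nil group, here is a toy Python model of the pipeline. It is not how RediSearch is implemented; the function name `aggregate_count` and its `available` parameter are made up for illustration. The only behaviour it encodes is the one described in the documentation quoted above: a field reaches GROUPBY only if it is SORTABLE or was named in LOAD, and otherwise reads as nil.

```python
from collections import Counter

# The four documents from the question, keyed by document id.
DOCS = {
    "1A": {"field": "hello"},
    "2A": {"field": "hello"},
    "3A": {"field": "hello"},
    "4A": {"field": "world"},
}


def aggregate_count(docs, group_field, available):
    """Toy GROUPBY + REDUCE COUNT over `docs`.

    `available` is the set of field names visible to the pipeline
    (a hypothetical stand-in for "SORTABLE or named in LOAD").
    A field outside that set reads as None, like (nil) in redis-cli.
    """
    counts = Counter()
    for fields in docs.values():
        key = fields.get(group_field) if group_field in available else None
        counts[key] += 1
    return dict(counts)


# Field not loaded: every row groups under None, as in the question.
print(aggregate_count(DOCS, "field", available=set()))        # {None: 4}
# Field loaded (or sortable): rows group by their actual value.
print(aggregate_count(DOCS, "field", available={"field"}))    # {'hello': 3, 'world': 1}
```

The first call mirrors the (nil) / "4" reply from the question; the second mirrors what the LOAD variant of the query is expected to return.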

However, since the documentation says LOAD hurts performance, I should instead have defined the field I want to aggregate on as SORTABLE:

FT.CREATE test SCHEMA field TEXT SORTABLE

With the schema defined this way, my original query works.
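For anyone who wants to replay the whole sequence from a script, here is a minimal sketch in Python. It assumes the third-party redis-py package, a Redis server on localhost:6379 with the RediSearch module loaded, and a database that does not already contain an index named test. The helper names `build_commands` and `run` are my own for illustration, not part of any library.

```python
def build_commands(index="test", field="field"):
    """Return the command sequence from this answer as argument lists."""
    values = {"1A": "hello", "2A": "hello", "3A": "hello", "4A": "world"}
    # Schema with the grouped field declared SORTABLE.
    commands = [["FT.CREATE", index, "SCHEMA", field, "TEXT", "SORTABLE"]]
    # Same four documents as in the question, each with score 1.
    for doc_id, value in values.items():
        commands.append(["FT.ADD", index, doc_id, "1", "FIELDS", field, value])
    # The original, unmodified aggregation query.
    commands.append(
        ["FT.AGGREGATE", index, "*",
         "GROUPBY", "1", "@" + field,
         "REDUCE", "COUNT", "0", "AS", "agg"]
    )
    return commands


def run(commands, host="localhost", port=6379):
    """Send each command to Redis and return the raw replies."""
    # redis-py is imported lazily so build_commands works without it installed.
    import redis

    client = redis.Redis(host=host, port=port, decode_responses=True)
    return [client.execute_command(*args) for args in commands]
```

Calling `run(build_commands())` against such a server sends the commands in order; the last reply is the raw FT.AGGREGATE result, which should now contain one group per distinct value of field.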

Tombatron