0

I have recently started learning Redis Search and encountered a few issues while working on a specific example. I am storing multiple entries, each containing a tag called accountID and a size. My goal is to calculate the sum of sizes for each account.

Here's my hash template:

HSET my-key:<<unique_ID>> accountID <<acc>> size <<the size>>

Example:

HSET my-key:1 accountID acc1 size 100
HSET my-key:2 accountID acc1 size 100
HSET my-key:3 accountID acc2 size 100

To calculate the total size per accountID, I created the following index:

FT.CREATE storageUsageIdx ON hash PREFIX 1 "my-key:" SCHEMA accountID TAG SORTABLE size NUMERIC

And here's my aggregation query:

FT.AGGREGATE storageUsageIdx "*" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage

This query works well when the number of entries is below 500k.

FT.AGGREGATE storageUsage "*" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage
1) "1"
2) 1) "accountID"
   2) "acc1"
   3) "total_usage"
   4) "880800"

However, when the number of entries exceeds a certain threshold, the result is always null:

FT.AGGREGATE storageUsageIdx "*" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage
1) "1"
2) 1) "accountID"
   2) "null"
   3) "total_usage"
   4) "0"

Interestingly, when I filter by accountID, I get the expected result:

FT.AGGREGATE storageUsageIdx "*" FILTER "@accountID == 'acc1'" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage
1) "1"
2) 1) "accountID"
   2) "acc1"
   3) "total_usage"
   4) "100000000"

There are 1M entries for accountID =acc1 with a size of 100, so the result is correct. This query takes around 3 seconds to execute.

I also added 100 more entries for accountID =acc2, and the query to calculate the total size also takes the same amount of time. This leads me to believe that my index configuration might be incorrect?

I would greatly appreciate any guidance on resolving this aggregate issue and optimizing my index configuration. Thank you for your assistance.

Valdemar
  • 2,560
  • 3
  • 18
  • 14
  • 1
    Hi there - I was initially thinking you might be hitting the query timeout, but 3 seconds would be longer than the default 500ms for that (see https://redis.io/docs/stack/search/configuring/) . Similarly I wondered if you had hit the MAXSEARCHRESULTS value (default 1000000) but I guess not. Then l thought maybe its MAXAGGREGATERESULTS that you are hitting but default for that seems to be unlimited too. Can you run FT.CONFIG GET * and post your output? Thsi may help with diagnosis. – Simon Prickett Apr 21 '23 at 13:29
  • Additionally would suggest removing RedisGear from the title here and going with RediSearch. Redis Gears is a different product that's not involved in this question. – Simon Prickett Apr 21 '23 at 13:31
  • Sorry about the misleading title. I'm fixing it. Regarding FT.CONFIG, I just noticed it is not available. I'm using Redis Enterprise and they disabled a few redis commands. – Valdemar Apr 21 '23 at 14:09
  • Can you try to use SlowLog https://redis.io/commands/slowlog-get/ and see what you get? – BSB Apr 21 '23 at 19:44
  • @SimonPrickett I think you were right. Redis Enterprise is blacklisting a ton of useful redis comands. One is the `FT.CONFIG GET`. I just did a test on a community based image and I had no issues getting the result that I wanted once I increased the timeout: `FT.CONFIG SET TIMEOUT 3000` – Valdemar Apr 22 '23 at 16:37

1 Answers1

2

In your last query, you have a problem; when you use FILTER, your filter will be applied after aggregation, so your query is faced with all of your data and filter that after aggregation. According to documentation,ft.aggregate

FILTER {expr}: filters the results using predicate expressions relating to values in each result. They are applied post-query and relate to the current state of the pipeline.

So, you can change your last query as follows:

FT.AGGREGATE storageUsageIdx "@accountID:{acc1}" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage

This query will be significantly faster.

iman safari
  • 194
  • 1
  • 1
  • 8