
I'm using RediSearch in a project whose index holds over 13 million documents. When users provide no filter, I need to fetch the latest documents. My index schema has a NUMERIC field with the SORTABLE flag, and I've tried running the following query.

FT.SEARCH media * SORTBY media_id DESC LIMIT 0 10

It doesn't return a response for a long time, and I usually end up terminating the query.

Is there a way to get the latest documents in an acceptable time?

ozgurky

2 Answers


I was able to reproduce the behavior you describe by inserting documents with increasing values for the numeric field. I then created a FlameChart to see which part of the code consumes the CPU.

The culprit is the sorting heap we use, which is an expensive data structure. In my experiment, every numeric value was inserted into the heap, which results in a lengthy query time. Given how the query is written, this is the expected behavior.
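To make the heap cost concrete, here is a minimal sketch of a bounded top-k heap in Python. It is only an illustration of the general technique and not RediSearch's actual implementation; the function name `top_k_desc` and the update counter are assumptions made for this example. It shows that when values arrive in increasing order, every single value forces a heap update.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_desc(values: Iterable[int], k: int) -> Tuple[List[int], int]:
    """Return the k largest values (descending) and the number of heap updates."""
    heap: List[int] = []  # min-heap holding the k largest values seen so far
    updates = 0
    for v in values:
        if len(heap) < k:
            # Heap not full yet: every value goes in.
            heapq.heappush(heap, v)
            updates += 1
        elif v > heap[0]:
            # New value beats the smallest kept value: replace it.
            heapq.heapreplace(heap, v)
            updates += 1
    return sorted(heap, reverse=True), updates


# Increasing input: each value is larger than everything kept so far.
_, worst = top_k_desc(range(1000), k=10)
# Decreasing input: only the first k values ever enter the heap.
_, best = top_k_desc(range(999, -1, -1), k=10)
print(worst, best)
```

With 1000 increasing values and k=10 the sketch performs 1000 heap updates, while the same values in decreasing order need only 10.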

As a workaround, you can first run the query with LIMIT 0 1, which reduces the heap work to almost nothing, and then use the value it returns to run a second query with a filter on that field and LIMIT 0 10.

We are considering ways to optimize such queries, but for now there is no fix.
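The two-step approach could be sketched in Python as follows. The helper names and the `window` parameter are assumptions for illustration: the answer does not say how wide the filter should be, so the window has to be chosen to cover at least 10 documents for your ID distribution. The helpers only build the command arguments; sending them is left to the client.

```python
from typing import List


def newest_one_args(index: str, field: str) -> List[str]:
    """Step 1: ask only for the single newest document (LIMIT 0 1)."""
    return ["FT.SEARCH", index, "*", "SORTBY", field, "DESC", "LIMIT", "0", "1"]


def newest_page_args(index: str, field: str, max_value: int,
                     window: int, count: int = 10) -> List[str]:
    """Step 2: restrict the sort to a numeric range just below the maximum.

    `window` is a hypothetical tuning knob, not something the answer specifies.
    """
    lower = max_value - window
    query = f"@{field}:[{lower} +inf]"  # RediSearch numeric range filter
    return ["FT.SEARCH", index, query, "SORTBY", field, "DESC",
            "LIMIT", "0", str(count)]


# Example: the first query reported media_id 5000 as the maximum.
print(newest_one_args("media", "media_id"))
print(newest_page_args("media", "media_id", max_value=5000, window=100))
```

With redis-py, each argument list can be sent with `client.execute_command(*args)`; extracting `media_id` from the first reply depends on the reply format of your client and server version, so it is left out here.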

Cheers

FlameChart

Ariel

A short-term workaround might be to store the latest document ID in a Redis string as you update the index. Run both commands in a pipeline to avoid an unnecessary network round trip:

SET LASTEST_DOCUMENT_ID $docId
HSET $docId KEY VALUE....

Then you can simply GET LASTEST_DOCUMENT_ID when there are no search parameters.
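A minimal Python sketch of that pipeline, assuming a redis-py style pipeline object (one that offers `set`, `hset` with a `mapping` argument, and `execute`). The function name `index_document` and the example key `media:42` are hypothetical; the string key is spelled exactly as in the commands above.

```python
from typing import Any, Dict, List

# Key name copied verbatim from the answer's SET command.
LATEST_KEY = "LASTEST_DOCUMENT_ID"


def index_document(pipe: Any, doc_id: str, fields: Dict[str, Any]) -> List[Any]:
    """Queue the pointer update and the hash write, then send both at once."""
    pipe.set(LATEST_KEY, doc_id)        # remember the newest document ID
    pipe.hset(doc_id, mapping=fields)   # write the document that gets indexed
    return pipe.execute()               # one network round trip for both


# Usage with redis-py (assumes a reachable server, so it is commented out):
# import redis
# client = redis.Redis(decode_responses=True)
# index_document(client.pipeline(), "media:42", {"media_id": 42})
# latest = client.get(LATEST_KEY)
```

Note that this returns only the single newest ID; fetching the last 10 documents would still need a follow-up lookup or filter based on that value.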

namizaru
  • @namirazu Thanks. I'm currently using MySQL to fetch the max ID and then sending it to RediSearch as a numeric filter, for example [max_id - 10000, +inf]. Your workaround seems simpler, though, so I'll most probably switch to it. – ozgurky Dec 14 '21 at 06:13