I am trying to read the values stored under all keys in Redis that match a pattern. I tried using SCAN to collect the keys so that I can later use MGET to get all the values. The problem is:

SCAN 0 MATCH "foo:bar:*" COUNT 1000

does not return any value whereas

SCAN 0 MATCH "foo:bar:*" COUNT 10000

returns the desired keys. How do I force SCAN to look through all the existing keys? Do I have to look into Lua for this?

DarthSpeedious
  • 9
    Forcing SCAN to cover the entire keyspace in one go is equivalent to running KEYS. Note that SCAN was introduced exactly for that purpose: to avoid running KEYS. – Itamar Haber Oct 16 '15 at 11:07
  • 1
    @ItamarHaber does SCAN also block the keyspace (for one iteration) like the KEYS command does for the complete iteration? – DarthSpeedious Feb 06 '16 at 07:25
  • 1
    Yes - almost all operations are blocking. – Itamar Haber Feb 06 '16 at 12:12
  • 4
    Is there any recommendation for maximum value of count parameter? – Adam Szecowka Jul 22 '16 at 06:07
  • "Since these commands allow for incremental iteration, returning only a small number of elements per call, they can be used in production without the downside of commands like KEYS or SMEMBERS that may block the server for a long time (even several seconds) when called against big collections of keys or elements." – ives Oct 22 '19 at 21:25
  • https://redis.io/commands/scan – ives Oct 22 '19 at 21:25

3 Answers

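The command below starts at cursor 0 and examines roughly the first 1000 keys:

SCAN 0 MATCH "foo:bar:*" COUNT 1000

The reply contains a new cursor, which you pass to the next call:

SCAN YOUR_NEW_CURSOR MATCH "foo:bar:*" COUNT 1000

This examines roughly the next 1000 keys. COUNT limits how many keys each call examines, not how many it returns, and MATCH is applied only after the keys have been examined. That is why a call can return nothing when none of the examined keys match, and why raising COUNT from 1000 to 10000 examines more keys per call and, in your case, finds the matching ones.

To scan the entire keyspace, keep calling SCAN until the cursor in the reply is zero, which means the scan is complete.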
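To illustrate why a small COUNT can come back empty, here is a toy model in plain Python. It is only a sketch: `toy_scan` is a hypothetical helper, and it treats the cursor as a list index, which real Redis does not do. It only mimics the "examine up to COUNT keys, then filter with MATCH" behaviour described above.

```python
from fnmatch import fnmatchcase


def toy_scan(keys, cursor, pattern, count):
    """Toy model of SCAN: examine `count` keys starting at `cursor`, then filter.

    Assumption: the cursor is a plain list index here. Real Redis cursors are
    opaque values and must not be interpreted this way.
    """
    window = keys[cursor:cursor + count]
    # A returned cursor of 0 signals that the whole keyspace has been covered.
    next_cursor = cursor + count if cursor + count < len(keys) else 0
    return next_cursor, [k for k in window if fnmatchcase(k, pattern)]


# 5000 unrelated keys, followed by the 3 keys we are actually looking for.
keys = [f"other:{i}" for i in range(5000)] + [f"foo:bar:{i}" for i in range(3)]

print(toy_scan(keys, 0, "foo:bar:*", 1000))   # (1000, [])
print(toy_scan(keys, 0, "foo:bar:*", 10000))  # (0, ['foo:bar:0', 'foo:bar:1', 'foo:bar:2'])
```

The first call examines 1000 keys, none of which match, so it returns an empty list together with a non-zero cursor. That is a signal to continue, not a sign that there are no matches.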

To do it in a single call, use the INFO command to find the number of keys, which is reported in a line like:

db0:keys=YOUR_AMOUNT_OF_KEYS,expires=0,avg_ttl=0

Then pass that number as COUNT:

SCAN 0 MATCH "foo:bar:*" COUNT YOUR_AMOUNT_OF_KEYS
Jason Law
khanou
  • 2
    How do I force SCAN to look through all the existing keys to see if there is a match in one go? – DarthSpeedious Oct 16 '15 at 10:05
  • This would be too slow if there are a lot of keys in Redis, right? I guess I will have to reconsider my approach. Which would be better: searching for the keys in one go, or a loop that iterates over the cursor? – DarthSpeedious Oct 16 '15 at 10:23
  • 8
    Yes, this is not the right approach in production. It is better to iterate over the cursor, which is what SCAN is made for. Iterate until the cursor returns zero for a full scan. Each call will scan about COUNT keys. – khanou Oct 16 '15 at 10:26
  • 1
    Beware that on large production databases you might still have performance issues with SCAN, as your app will need to make multiple (or even lots of) calls to Redis. For example, check this discussion: https://github.com/xetorthio/jedis/issues/1338 – AbstractVoid Feb 27 '18 at 10:54
  • Agree. How do we decide `COUNT` batch size? should it be 100 or 500 or 1000 or 5000? – roottraveller Dec 31 '20 at 06:20
29

Just going to put this here for anyone interested in how to do it using the Python redis library:

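import redis

# Replace host and port with your own Redis connection settings.
redis_server = redis.StrictRedis(host="localhost", port=6379, db=0)

mid_results = []
cur = 0
while True:
    # Each call returns the next cursor and a (possibly empty) batch of keys.
    cur, results = redis_server.scan(cursor=cur, match='foo:bar:*', count=1000)
    mid_results += results
    if cur == 0:  # a cursor of 0 means the iteration is complete
        break

# SCAN may return the same key more than once, so deduplicate.
final_uniq_results = set(mid_results)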

It took me a few days to figure this out, but basically each scan will return a tuple.

Examples:

(cursor, results_list)

(5433L, [... keys here ...])
(3244L, [... keys here, maybe ...])
(6543L, [... keys here, duplicates maybe too ...])
(0L, [... last items here ...])
  • Keep scanning until the cursor returns to 0.
  • The cursor is guaranteed to return to 0 as long as the number of keys stays bounded.
  • This holds even if some scans return an empty results_list along the way.
  • However, as noted by @Josh in the comments, SCAN is not guaranteed to terminate if keys are being inserted at the same time and the keyspace keeps growing faster than the iteration progresses.
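The points above can be folded into a small helper. This is a minimal sketch, assuming `client` is a redis-py client whose `scan` method returns a `(cursor, keys)` tuple. The function name `scan_unique_keys` and the `max_calls` cap are hypothetical additions for illustration, not Redis or redis-py features.

```python
def scan_unique_keys(client, pattern, count=1000, max_calls=None):
    """Follow the SCAN cursor until it returns to 0 and collect distinct keys.

    `max_calls` is a hypothetical safety cap for the case where the keyspace
    keeps growing and the scan might never finish. None means no cap.
    """
    seen = set()  # a set absorbs the duplicates SCAN may return
    cursor = 0
    calls = 0
    while True:
        cursor, batch = client.scan(cursor=cursor, match=pattern, count=count)
        seen.update(batch)  # an empty batch is normal, so just keep going
        calls += 1
        if cursor == 0:  # the full iteration is complete
            return seen
        if max_calls is not None and calls >= max_calls:
            raise RuntimeError("SCAN did not finish within max_calls calls")
```

With a real connection it could be called as `scan_unique_keys(redis_server, 'foo:bar:*')`. Whether a cap makes sense, and how large it should be, depends on how quickly keys are being added.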

I had a hard time figuring out what the cursor number meant, and why I would randomly get an empty list or repeated items even though I knew I had just put items in.

After reading the Redis documentation for SCAN, it made more sense, but there is still some deep programming magic and some compromises involved in iterating the sets.

jmunsch
  • 1
    +1 Thank you very much for this. I will add though, that `SCAN` is not guaranteed to terminate. In most cases it will, but if the number of items in the DB continues to grow and outpace the iterator, it won't. From the official Redis docs: `...is guaranteed to terminate only if the size of the iterated collection remains bounded to a given maximum size`. – Josh Nov 15 '18 at 02:38
  • 1
    @Josh awesome point, thankfully I didn't come across that issue, but it makes a lot of sense. – jmunsch Nov 15 '18 at 02:41
5

If your use case involves Python, or if you just want to get the values once and have Python installed on your machine, this is a trivial task with the scan_iter method of the redis Python library:

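from redis import StrictRedis

# REDIS_URI is your connection URL, defined elsewhere.
client = StrictRedis.from_url(REDIS_URI)

# scan_iter follows the cursor for you and yields keys one at a time.
keys = []
for key in client.scan_iter(match='foo:bar:*', count=1000):
    keys.append(key)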

In the end, keys will contain all the keys you would get by applying @khanou's method.

This is also more efficient than using shell scripts, since those spawn a new client on each iteration of the loop.
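Since the original goal was to read the values with MGET, the key iteration can be combined with batched MGET calls. The following is a rough sketch, assuming `client` is a redis-py client. The function name `fetch_values` and the `batch_size` parameter are hypothetical, and the batch size of 500 is an arbitrary choice, not a recommendation.

```python
from itertools import islice


def fetch_values(client, pattern, scan_count=1000, batch_size=500):
    """Return a dict of key -> value for every string key matching `pattern`."""
    values = {}
    # Deduplicate first, because SCAN may yield the same key more than once.
    keys = iter(set(client.scan_iter(match=pattern, count=scan_count)))
    while True:
        batch = list(islice(keys, batch_size))
        if not batch:
            return values
        # MGET returns one entry per requested key, in the same order.
        for key, value in zip(batch, client.mget(batch)):
            # None means the key disappeared after the scan or does not hold a string.
            if value is not None:
                values[key] = value
```

Fetching in batches keeps each MGET call short, in the same spirit as iterating SCAN instead of running KEYS.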

João Haas