
I suddenly ran into performance problems when trying to read 1M records from a Redis sorted set. I used ZSCAN with a cursor and a batch size of 5K.
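
For reference, one ZSCAN step through eredis looks like this (a sketch; eredis returns the cursor and the member/score list as binaries):

%% Reply shape: {ok, [NextCursor, Elements]}, where Elements
%% alternates member and score as binaries.
{ok, [NextCursor, _Elements]} =
    eredis:q(Conn, ["ZSCAN", "if:push:sset:test", "0", "COUNT", "5000"]).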

The code was executed using Erlang R14 on the same machine that hosts Redis. Receiving each 5K-element batch takes nearly 1 second. Unfortunately, I failed to compile Erlang R16 on this machine, but I don't think that matters.

For comparison, Node.js code with node_redis (hiredis parser) does 1M in 2 seconds. I get similar results with Python and PHP.

Maybe I'm doing something wrong?

Thanks in advance.

Here is my Erlang code:

-module(redis_bench).
-export([run/0]).

-define(COUNT, 5000).

run() ->
    {ok, Conn} = connect_to_redis(),
    read_from_redis(Conn).

connect_to_redis() ->
    eredis:start_link("host", 6379, 0, "pass").

%% eredis returns the cursor as a binary, so the terminal clause
%% must match <<"0">>, not the integer 0 (which would never match
%% and leave the loop scanning forever).
read_from_redis(_Conn, <<"0">>) ->
    ok;
read_from_redis(Conn, Cursor) ->
    {ok, [Cursor1|_]} = eredis:q(Conn, ["ZSCAN", "if:push:sset:test", Cursor, "COUNT", ?COUNT]),
    io:format("Batch~n"),
    read_from_redis(Conn, Cursor1).

read_from_redis(Conn) ->
    {ok, [Cursor|_]} = eredis:q(Conn, ["ZSCAN", "if:push:sset:test", 0, "COUNT", ?COUNT]),
    read_from_redis(Conn, Cursor).
Vitaly Chirkov

2 Answers


Switching to redis-erl decreased the read time for 1M keys to 16 seconds. Not fast, but acceptable.

Here is the new code:

-module(redis_bench2).
-export([run/0]).

-define(COUNT, 200000).

run() ->
    io:format("Start~n"),
    redis:connect([{ip, "host"}, {port, 6379}, {db, 0}, {pass, "pass"}]),
    read_from_redis().

%% The cursor comes back as a binary; <<"0">> means the scan is done.
read_from_redis(<<"0">>) ->
    ok;
read_from_redis(Cursor) ->
    [{ok, Cursor1}|_] = redis:q(["ZSCAN", "if:push:sset:test", Cursor, "COUNT", ?COUNT]),
    io:format("Batch~n"),
    read_from_redis(Cursor1).

read_from_redis() ->
    [{ok, Cursor}|_] = redis:q(["ZSCAN", "if:push:sset:test", 0, "COUNT", ?COUNT]),
    read_from_redis(Cursor).
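
Worth noting: besides the client change, this version uses COUNT 200000 rather than the question's 5000, so the 1M records come back in far fewer round trips.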
Vitaly Chirkov

9 out of 10 times, slowness like this is the result of a badly written driver rather than the system itself. In this case, the ability to pipeline requests to Redis is going to be important. A client like redo can do pipelining and may be faster.
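
For illustration, a minimal pipelining sketch using eredis's qp/2 (the helper name and the ZCARD commands are hypothetical; ZSCAN itself can't be pipelined, since each call needs the cursor from the previous reply):

%% Send several independent commands in one round trip;
%% qp/2 returns the {ok, Value} replies in order.
pipelined_counts(Conn, Keys) ->
    Pipeline = [["ZCARD", Key] || Key <- Keys],
    eredis:qp(Conn, Pipeline).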

Also, beware of measuring only one process/thread. Fast concurrent access is often traded off against fast sequential access.
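
A rough sketch of what that kind of concurrent access could look like (hypothetical helper; one eredis connection per worker process, each handed its own chunk of keys):

concurrent_reads(KeyChunks) ->
    Parent = self(),
    Pids = [spawn_link(fun() ->
                {ok, Conn} = eredis:start_link("host", 6379, 0, "pass"),
                Replies = [eredis:q(Conn, ["ZCARD", K]) || K <- Chunk],
                Parent ! {self(), Replies}
            end) || Chunk <- KeyChunks],
    %% Collect the results in spawn order.
    [receive {Pid, Replies} -> Replies end || Pid <- Pids].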

I GIVE CRAP ANSWERS
  • Thanks for the answer. In my case pipelining won't improve anything, because I'm forced to wait for the cursor before sending the next command during a sequential read. And as long as my DB isn't sharded, I can't spawn more readers. But I hope to do that soon. – Vitaly Chirkov Mar 22 '14 at 16:53
  • _forced to wait a cursor_: you might be able to solve that by storing the cursor (temporarily) in Redis. Normally I'd suggest a Lua script, but you have the SCAN command. Can't you redesign your data structure? SCAN is very useful, but avoiding it is usually better. I don't know Erlang, but it also looks to me like you're recursing into a very deep stack. Could be wrong here though. – Tw Bert Mar 22 '14 at 19:47
  • I need to walk through all the keys. It's OK for now that the reads are sequential; there's no bottleneck, it's a periodic background task. Next, I'll perform HTTP requests for all of these keys in separate processes ;) 16 seconds for 1M keys is enough. – Vitaly Chirkov Mar 22 '14 at 20:35