
I need to create a C# application (a Windows Service) which runs on a 5-second interval and generates around 20 million values.

I need to insert these 20 million values into Redis (one key/value pair each) in under 5 seconds, making sure the inserts have finished before the next interval starts.

Note: I only have to keep 7 cycles in Redis => 20 million * 7 => 140 million keys in Redis

I am using C#'s `System.Threading.Tasks` to call a function 20 million times, so that the calls are processed in parallel (asynchronously).

I have even created a pool for Redis clients for my process to be able to execute Redis queries also in parallel.

Here is the C# part calling the function 20 Million times:

    List<Task> tasksList = new List<Task>();

    foreach (object k in ListOf20MillionData)
    {
        tasksList.Add(
            Task.Factory.StartNew(() =>
            {
                GenerateValue(k);
                // Inside 'GenerateValue' data is generated and pushed to redis
            })
        );
    }

Here is a section of the code inside `GenerateValue` which gets a redis client from a pool of clients, executes the insert, and releases the client back into the pool.

    RedisClient redisClientObj = RedisPool.GetNextAvailableClient();

    redisClientObj.Add("SomeKey", "SomeValue");

    RedisPool.ReleaseRedisClient(redisClientObj);
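For context, a pool along the lines described here can be sketched with a `ConcurrentBag`. Note that `RedisClient` below is a stand-in stub (the real type would come from ServiceStack.Redis or StackExchange.Redis), and `GetNextAvailableClient`/`ReleaseRedisClient` simply mirror the names used above; this illustrates the pattern the question assumes, not a recommendation (see the answer below on why pooling alone won't buy throughput):

```csharp
using System;
using System.Collections.Concurrent;

// Stand-in stub for a real redis client; in reality this type would come
// from a redis client library and Add would make a network call.
class RedisClient
{
    public void Add(string key, string value) { /* network call in reality */ }
}

// A minimal thread-safe pool matching the RedisPool calls shown above.
static class RedisPool
{
    static readonly ConcurrentBag<RedisClient> _clients =
        new ConcurrentBag<RedisClient>();

    // Reuse a released client if one is available, else create a new one.
    public static RedisClient GetNextAvailableClient()
        => _clients.TryTake(out var client) ? client : new RedisClient();

    // Return a client to the pool for reuse.
    public static void ReleaseRedisClient(RedisClient client)
        => _clients.Add(client);
}

class Demo
{
    static void Main()
    {
        var c1 = RedisPool.GetNextAvailableClient();
        RedisPool.ReleaseRedisClient(c1);
        var c2 = RedisPool.GetNextAvailableClient(); // takes the released client
        Console.WriteLine(ReferenceEquals(c1, c2)); // True
    }
}
```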

My Concerns and challenges:

  1. Is my concept of Redis Pools ok?
  2. How many Client connections can Redis handle?
  3. Is my request even possible to achieve using C# and Redis?
  4. Any advice or recommendation is highly appreciated.
Nisho
    Please explain why and how you expect your CPU to be able to handle 20 million Threads at the same time, or your network card to handle 20 million connections. – Camilo Terevinto Apr 04 '19 at 12:00
  • Mostly your code is incorrect. To point out a few issues: why would you generate the data on separate tasks, and how does `GenerateValue(k);` work, is it async or in-memory? You should do the redis inserts asynchronously so the IO call isn't blocking; that would make this efficient. Currently you are generating 20 million tasks, which will bring down your system. Process the tasks asynchronously. Also, which RedisClient is this? – Mrinal Kamboj Apr 04 '19 at 12:04
  • a) Use StackExchange's Redis library rather than ServiceStack's one (it performs better at scale). b) Benchmark, benchmark, benchmark. c) https://stackoverflow.com/questions/27796054/pipelining-vs-batching-in-stackexchange-redis – mjwills Apr 04 '19 at 12:04
  • @CamiloTerevinto tasks != threads; and nobody mentioned 20M connections – Marc Gravell Apr 04 '19 at 12:04
  • @MarcGravell I'm aware of the difference, not too aware of how Redis works though – Camilo Terevinto Apr 04 '19 at 12:05
  • @CamiloTerevinto The GenerateValue function is a simple function In-Memory. You could say it's a Mathematical model. – Nisho Apr 04 '19 at 12:10
  • @mjwills The GenerateValue function is a simple function In-Memory. You could say it's a Mathematical model. – Nisho Apr 04 '19 at 12:10
  • So, at the risk of asking a stupid question - why not just leave it in memory? Why do you need it in Redis? The data is only surviving in Redis for 7 cycles (35 seconds) anyway - why not just do all of the work in the RAM of the computer rather than involve Redis? – mjwills Apr 04 '19 at 12:11
  • Why don't you write a normal synchronous code to insert data? – Dan Nguyen Apr 04 '19 at 12:12
  • @mjwills as far as processing, the 20 Million calls are easily executed under 1 second. Now My problem is inserting this 20 Million values into Redis. – Nisho Apr 04 '19 at 12:12
  • @mjwills Exactly. I started working in Memory only (C#), then a request came that other machines needs to access this data and not only the machine executing the application. Thus I thought of inserting them onto redis, so that other machines can access it too. – Nisho Apr 04 '19 at 12:13
  • @mjwills I can get it down to 16 seconds locally... (edit: 13.9s with async) – Marc Gravell Apr 04 '19 at 12:21

1 Answer

  1. Is my concept of Redis Pools ok?

Not really. Pools don't give you more throughput. They separate logical connection scopes for sequential commands, and they allow simple concurrency... but the redis core is single-threaded, so you should be looking to saturate the network, not threads.

  2. How many Client connections can Redis handle?

Tons, but adding more won't help you if you can't saturate them; in fact, having lots of connections increases the overhead.

  3. Is my request even possible to achieve using C# and Redis?

Only on very beefy boxes with a huge network; you might increase throughput with "cluster", but that also increases packet fragmentation.

  4. Any advice or recommendation is highly appreciated.

Batch. Batch like crazy, to minimize round-trips. Fat batches with tiny responses make very effective use of the network and don't require complex code. The redis `MSET` command is optimized for exactly that: fat batches with tiny responses.

Locally, with the same machine inventing the data on a single thread and being the redis server, for me it still takes 34 seconds, though:

    static void Main()
    {
        using (var conn = ConnectionMultiplexer.Connect("127.0.0.1:6379"))
        {
            var db = conn.GetDatabase();
            var watch = Stopwatch.StartNew();
            foreach(var batch in InventData(20000000).Batchify(5000))
            {
                db.StringSet(batch);
            }
            watch.Stop();
            Console.WriteLine(watch.ElapsedMilliseconds);
        }
    }

or if I use Parallel, i.e.

            var watch = Stopwatch.StartNew();
            Parallel.ForEach(InventData(20000000).Batchify(5000),
                batch => db.StringSet(batch));
            watch.Stop();

it takes 16 seconds.

and (see comments) if I combine Parallel with async:

            var watch = Stopwatch.StartNew();
            Parallel.ForEach(InventData(20000000).Batchify(5000),
                batch => db.StringSetAsync(batch));
            watch.Stop();

then it takes just under 14s.

with

    static IEnumerable<KeyValuePair<RedisKey, RedisValue>> InventData(int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        string dictionary = "abcdefghijklmnopqrstuvwxyz _@:0123456789";
        int dLen = dictionary.Length;
        var rand = new Random(12345);
        const int KEY_LEN = 10, MAX_VAL_LEN = 50;
        char[] keyData = new char[KEY_LEN];
        char[] valueData = new char[MAX_VAL_LEN];
        while (count-- != 0)
        {
            for (int i = 0; i < keyData.Length; i++)
                keyData[i] = dictionary[rand.Next(dLen)];
            var len = rand.Next(10, MAX_VAL_LEN);
            for(int i = 0; i < len; i++)
                valueData[i] = dictionary[rand.Next(dLen)];

            yield return new KeyValuePair<RedisKey, RedisValue>(
                new string(keyData), new string(valueData, 0, len));
        }
    }

    // note: an extension method like this must be declared inside a static class
    static IEnumerable<T[]> Batchify<T>(this IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach(var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                var arr = batch.ToArray();
                batch.Clear();
                yield return arr;
            }
        }
        if (batch.Count != 0) yield return batch.ToArray(); // trailers
    }
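One caveat on the `Parallel` plus `StringSetAsync` version above: the writes are fired without being awaited, so the stopwatch stops before the last batches are confirmed. A pattern that still overlaps the writes but waits for all of them to complete is to collect the tasks and `Task.WhenAll` them. The sketch below shows that pattern generically, with a simulated async write standing in for the real `StringSetAsync` call so it runs without a redis server:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Demo
{
    // Simulated stand-in for db.StringSetAsync(batch); in real code this
    // would be the StackExchange.Redis call against a live server.
    static async Task WriteBatchAsync(int[] batch)
    {
        await Task.Yield(); // pretend this is network I/O
    }

    public static async Task Main()
    {
        var pending = new List<Task>();
        foreach (var batch in Batchify(Data(100000), 5000))
        {
            pending.Add(WriteBatchAsync(batch)); // overlap the writes...
        }
        await Task.WhenAll(pending); // ...but wait for every batch to finish
        Console.WriteLine(pending.Count); // 20
    }

    // Dummy data source standing in for InventData.
    public static IEnumerable<int> Data(int count)
    {
        for (int i = 0; i < count; i++) yield return i;
    }

    // Same batching logic as the answer's Batchify, specialized to int.
    public static IEnumerable<int[]> Batchify(IEnumerable<int> source, int batchSize)
    {
        var batch = new List<int>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch.ToArray();
                batch.Clear();
            }
        }
        if (batch.Count != 0) yield return batch.ToArray(); // trailers
    }
}
```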
Marc Gravell
  • Marc, aren't there `Async` APIs to keep posting the requests to the `Redis Server`? – Mrinal Kamboj Apr 04 '19 at 12:27
  • @MrinalKamboj there *are*, but I judged that *in this specific scenario*, we probably don't want/need to do that - but: just to be sure, I've tried it with `StringSetAsync` instead of `StringSet` (in the `Parallel.ForEach` version) - and it does indeed reduce the time to 13.9s – Marc Gravell Apr 04 '19 at 12:30