
I am using StackExchange.Redis to insert a dictionary of key-value pairs into Redis using a batch, as below:

private static StackExchange.Redis.IDatabase _database;
public void SetAll<T>(Dictionary<string, T> data, int cacheTime)
{
    lock (_database)
    {
        TimeSpan expiration = new TimeSpan(0, cacheTime, 0);
        var list = new List<Task<bool>>();
        var batch = _database.CreateBatch();               
        foreach (var item in data)
        {
            string serializedObject = JsonConvert.SerializeObject(item.Value, Formatting.Indented,
                new JsonSerializerSettings { ContractResolver = new SerializeAllContractResolver(), ReferenceLoopHandling = ReferenceLoopHandling.Ignore });

            var task = batch.StringSetAsync(item.Key, serializedObject, expiration);
            list.Add(task);
            serializedObject = null;
        }
        batch.Execute();

        Task.WhenAll(list.ToArray());
    }
}

My problem: it takes around 7 seconds to set just 350 items from the dictionary.

My question: Is this the right way to set bulk items into Redis or is there a quicker way to do this? Any help is appreciated. Thanks.

User3250
  • Would this link help: [Bulk create keys in Redis - ServiceStack C#](https://stackoverflow.com/a/39515188/6741868) – Keyur PATEL Jun 16 '17 at 09:14
  • @KeyurPATEL Nope, I am using StackExchange not ServiceStack. – User3250 Jun 16 '17 at 09:25
  • @KeyurPATEL it is my "informed guess" that the key issues here are serialization cost and bandwidth cost - which means it won't actually matter *which* tooling you use (I say "informed guess" because I can't be sure without an actual repro, but: I am very experienced in both serialization and redis, so I'd wager that my hunch is a good one) – Marc Gravell Jun 16 '17 at 09:26
  • I tell you what I'd love to know: the total payload size, meaning: if you do `long totalChars = 0;` and then in the loop `totalChars += item.Key.Length + serializedObject.Length + 25;`, what is `totalChars` at the end? Obviously this isn't *quite* the same as bytes (UTF-8 being variable length), but: it would be a really quick and easy way of understanding how much data you're transporting here; the `+25` is for the transport overheads per command (assuming `*3\r\n$3\r\nSET\r\n$X\r\n...\r\n$Y\r\n...\r\n`); a sketch of this measurement follows after these comments – Marc Gravell Jun 16 '17 at 09:29
  • @MarcGravell Let me check on this. – User3250 Jun 16 '17 at 09:30
  • @User3250 (note I added a `+25`) – Marc Gravell Jun 16 '17 at 09:31
  • @MarcGravell Just checked `totalChars` is around **3639111**. – User3250 Jun 16 '17 at 09:49
  • @User3250 k; so you're transporting *at least* 3.6 MB and generating *at least* 3.6MB of garbage to collect (probably much more of both); that's not *vast* by today's throughput ability, but it could be a key factor. I strongly suggest getting "serialization only" timings; I also strongly suggest putting some serious thought towards what I say in the second of the two answers I've posted – Marc Gravell Jun 16 '17 at 09:58
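
For reference, a minimal sketch of the payload-size measurement described above, assuming the same loop as in the question (with `_redisJsonSettings` standing in for the inline serializer settings):

long totalChars = 0;
foreach (var item in data)
{
    string serializedObject = JsonConvert.SerializeObject(item.Value, Formatting.Indented, _redisJsonSettings);
    // key + value lengths, plus roughly 25 characters of RESP framing per SET command
    totalChars += item.Key.Length + serializedObject.Length + 25;
}
Console.WriteLine($"approximate payload: {totalChars} chars"); // ~bytes for mostly-ASCII text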

2 Answers


"just" is a very relative term, and doesn't really make sense without more context, in particular: how big are these payloads?

however, to clarify a few points to help you investigate:

  • there is no need to lock an IDatabase unless that is purely for your own purposes; SE.Redis deals with thread safety internally and is intended to be used by competing threads
  • at the moment, your timing of this will include all the serialization code (JsonConvert.SerializeObject); this will add up, especially if your objects are big; to get a decent measure, I strongly suggest you time the serialization and redis times separately
  • the batch.Execute() method uses a pipeline API and does not wait for responses between calls, so: the time you're seeing is not the cumulative effect of latency; that leaves just local CPU (for serialization), network bandwidth, and server CPU; the client library tools can't impact any of those things
  • there is a StringSet overload that accepts a KeyValuePair<RedisKey, RedisValue>[]; you could choose to use this instead of a batch, but the only difference here is that it is the variadic MSET rather than multiple SET commands; either way, you'll be blocking the connection for other callers for the duration (since the purpose of batch is to make the commands contiguous); a sketch of this overload follows after this list
  • you don't actually need to use CreateBatch here, especially since you're locking the database (although, as noted above, I suggest you don't need that lock at all); the purpose of CreateBatch is to make a sequence of commands contiguous, but I don't see that you need that here; you could just call _database.StringSetAsync for each item in turn, which has the added advantage that serialization of the next item runs in parallel with the previous command being sent - it lets you overlap serialization (CPU bound) and redis ops (IO bound) with no work other than deleting the CreateBatch call; it also means you don't monopolize the connection away from other callers
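
For illustration, a minimal sketch of that multi-key overload (using the `_redisJsonSettings` field shown in the next snippet); note that MSET carries no expiry, so any TTL would have to be applied separately, for example with a KeyExpire call per key:

// build one KeyValuePair<RedisKey, RedisValue> per entry and send a single MSET
var pairs = data.Select(item => new KeyValuePair<RedisKey, RedisValue>(
        item.Key,
        JsonConvert.SerializeObject(item.Value, Formatting.Indented, _redisJsonSettings)))
    .ToArray();
_database.StringSet(pairs);

// MSET cannot set a TTL, so expiry (if needed) is a separate call per key
foreach (var pair in pairs)
{
    _database.KeyExpire(pair.Key, expiration);
}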

So; the first thing I would do would be to remove some code:

private static StackExchange.Redis.IDatabase _database;
static JsonSerializerSettings _redisJsonSettings = new JsonSerializerSettings {
    ContractResolver = new SerializeAllContractResolver(),
    ReferenceLoopHandling = ReferenceLoopHandling.Ignore };

public void SetAll<T>(Dictionary<string, T> data, int cacheTime)
{
    TimeSpan expiration = new TimeSpan(0, cacheTime, 0);
    var list = new List<Task<bool>>();
    foreach (var item in data)
    {
        string serializedObject = JsonConvert.SerializeObject(
            item.Value, Formatting.Indented, _redisJsonSettings);

        list.Add(_database.StringSetAsync(item.Key, serializedObject, expiration));
    }
    Task.WaitAll(list.ToArray()); // block until all the SETs complete (WhenAll alone returns a task that would need awaiting)
}

The second thing I would do would be to time the serialization separately from the redis work.
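
For illustration, a minimal sketch of separating the two timings with Stopwatch, reusing `expiration` and `_redisJsonSettings` from the snippet above:

var sw = System.Diagnostics.Stopwatch.StartNew();

// 1: serialization only
var serialized = new List<KeyValuePair<string, string>>(data.Count);
foreach (var item in data)
{
    serialized.Add(new KeyValuePair<string, string>(item.Key,
        JsonConvert.SerializeObject(item.Value, Formatting.Indented, _redisJsonSettings)));
}
Console.WriteLine($"serialize: {sw.ElapsedMilliseconds}ms");

// 2: redis only
sw.Restart();
var tasks = serialized
    .Select(pair => _database.StringSetAsync(pair.Key, pair.Value, expiration))
    .ToArray();
Task.WaitAll(tasks);
Console.WriteLine($"redis: {sw.ElapsedMilliseconds}ms");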

The third thing I would do would be to see if I can serialize to a MemoryStream instead, ideally one that I can re-use - to avoid the string allocation and UTF-8 encode:

using(var ms = new MemoryStream())
{
    foreach (var item in data)
    {
        ms.Position = 0;
        ms.SetLength(0); // erase existing data

        // JsonConvert has no overload that writes to a Stream, so wrap the
        // MemoryStream in a JsonTextWriter (leaving the stream open for re-use)
        using (var writer = new StreamWriter(ms, new UTF8Encoding(false), 1024, leaveOpen: true))
        using (var json = new JsonTextWriter(writer) { Formatting = Formatting.Indented })
        {
            JsonSerializer.Create(_redisJsonSettings).Serialize(json, item.Value);
        }

        list.Add(_database.StringSetAsync(item.Key, ms.ToArray(), expiration));
    }
}
Marc Gravell
  • The object in this context is big with huge infos in string props and many nested classes. Let me try your suggestions. Will get back soon. Thanks. – User3250 Jun 16 '17 at 09:29
  • @User3250 great; in that case I think it is *even more important* to separate out the serialization cost from the network cost, just so you know which one you are measuring; a really cheap way of doing this would be to just comment out the redis code so you **only** serialize it to a string or stream (but don't do **anything** with it, except maybe to call `GC.KeepAlive` to prevent the JIT doing anything clever) - see how long *that* takes. That time is the serialization cost – Marc Gravell Jun 16 '17 at 09:39
  • `JsonConvert.SerializeObject(ms,item.Value, Formatting.Indented, _redisJsonSettings);` isn't working. Overload not found. I tried using the former code `list.Add(_database.StringSetAsync(item.Key, serializedObject, expiration));` and it works really great! Time reduced to a whopping ~100ms. – User3250 Jun 16 '17 at 10:08
  • @User3250 one sec, I'll find the correct Json.NET code for using with a stream - every little helps :) update: boo! it only accepts `TextWriter` or `JsonWriter`... meh; not worth the messing – Marc Gravell Jun 16 '17 at 10:09
  • Yes, so it seems `_database.StringSetAsync` did the trick. I wish I could give your answers +25 upvotes :) – User3250 Jun 16 '17 at 10:42

This second answer is kinda tangential, but based on the discussion it sounds as though the main cost is serialization:

The object in this context is big with huge infos in string props and many nested classes.

One thing you could do here is not store JSON. JSON is relatively large, and being text-based is relatively expensive to process both for serialization and deserialization. Unless you're using rejson, redis just treats your data as an opaque blob, so it doesn't care what the actual value is. As such, you can use more efficient formats.

I'm hugely biased, but we make use of protobuf-net in our redis storage. protobuf-net is optimized for:

  • small output (dense binary without redundant information)
  • fast binary processing (absurdly optimized with contextual IL emit, etc)
  • good cross-platform support (it implements Google's "protobuf" wire format, which is available on just about every platform available)
  • designed to work well with existing C# code, not just brand new types generated from a .proto schema

I suggest protobuf-net rather than Google's own C# protobuf library because of the last bullet point, meaning: you can use it with the data you already have.

To illustrate why, I'll use this image from https://aloiskraus.wordpress.com/2017/04/23/the-definitive-serialization-performance-guide/:

[image: serializer performance comparison chart from the linked post]

Notice in particular that the output size of protobuf-net is half that of Json.NET (reducing the bandwidth cost), and the serialization time is less than one fifth (reducing local CPU cost).

You would need to add some attributes to your model to help protobuf-net out (as per How to convert existing POCO classes in C# to google Protobuf standard POCO), but then this would be just:

using(var ms = new MemoryStream())
{
    foreach (var item in data)
    {
        ms.Position = 0;
        ms.SetLength(0); // erase existing data
        ProtoBuf.Serializer.Serialize(ms, item.Value);

        list.Add(_database.StringSetAsync(item.Key, ms.ToArray(), expiration));
    }
}

As you can see, the code change to your redis code is minimal. Obviously you would need to use Deserialize<T> when reading the data back.
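
For completeness, a hypothetical sketch of what an attributed model and the read path might look like (the type and property names here are invented purely for illustration):

[ProtoContract]
public class CachedProduct        // hypothetical example type
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public decimal Price { get; set; }
    [ProtoMember(3)] public List<CachedProduct> Related { get; set; }
}

// reading a value back and deserializing it
public T Get<T>(string key)
{
    byte[] blob = _database.StringGet(key);
    if (blob == null) return default(T);
    using (var ms = new MemoryStream(blob))
    {
        return ProtoBuf.Serializer.Deserialize<T>(ms);
    }
}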


If your data is dominated by text, you might also consider running the serialization through GZipStream or DeflateStream; text-heavy payloads compress very well.
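
A minimal sketch of how that compression could be layered over the protobuf-net serialization (the helper names here are invented for illustration):

static byte[] SerializeCompressed<T>(T value)
{
    using (var ms = new MemoryStream())
    {
        // the GZipStream must be disposed (flushed) before reading the buffer back
        using (var gzip = new GZipStream(ms, CompressionLevel.Fastest, leaveOpen: true))
        {
            ProtoBuf.Serializer.Serialize(gzip, value);
        }
        return ms.ToArray(); // e.g. _database.StringSetAsync(key, SerializeCompressed(value), expiration)
    }
}

static T DeserializeCompressed<T>(byte[] blob)
{
    using (var ms = new MemoryStream(blob))
    using (var gzip = new GZipStream(ms, CompressionMode.Decompress))
    {
        return ProtoBuf.Serializer.Deserialize<T>(gzip);
    }
}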

Marc Gravell
  • I will surely check on this. Thanks. – User3250 Jun 16 '17 at 10:04
  • Hi @Marc just a side question, do you use gzip? If so, how much of a performance hit did you experience compared to "normal" serialization? I see huge performance hits when using it +json where serialization takes ~3 times longer. And second, the MS Bond (binary) serializer is actually even faster than protobuf in some scenarios, ever tried it? Thanks, m – MichaC Jun 17 '17 at 10:58
  • @michaC I don't have exact numbers on overhead there; we're happy to pay it to minimize IO (network, per key per read) and RAM (server, per key); on Bond - I've looked but not played with it. Any info on the scenarios it really excels at? I'm working aggressively on protobuf-net - maybe I can "up my game" in those scenarios :) – Marc Gravell Jun 17 '17 at 11:53
  • @MarcGravell I've created a small github repo https://github.com/MichaCo/SerializationBenchmarks if you want to take a look / continue the conversation ;) For now, there is only a very simple list of strings benchmark, I might add more – MichaC Jun 17 '17 at 16:07
  • @MichaC awesome - I will certainly look. Will focus on shipping 2.3.0 first though. I'll add an issue to remind myself: done https://github.com/mgravell/protobuf-net/issues/266 – Marc Gravell Jun 17 '17 at 16:55