My Redis database currently has a large hash table - approx. 130m key/value pairs, where each key and value is an integer (stored as a string, of course). The hash works fine in the database and consumes < 10GB of the 64GB total system memory, per `INFO`:
'used_memory_human': '9.05G'
It's computationally expensive to create this hash, taking ~5.5 hours across 16 cores. For this reason I want to be able to dump the hash table to disk and restore it when I stand up the application. I was able to dump the hash table to disk successfully using:
import redis

rdb = redis.Redis()

def dump_lookup_table():
    with open('redis_dumps/full_hash.rdb', 'wb') as full_hash_file:
        full_hash_file.write(rdb.dump("my_redis_hash"))
This results in a 2.4GB file saved to the relevant folder. The problem is when I try to restore the hash from the file:
def restore_lookup_table():
    with open('redis_dumps/full_hash.rdb', 'rb') as full_hash_file:
        rdb.restore("my_redis_hash", 0, full_hash_file.read())
ConnectionError: Error 104 while writing to socket. Connection reset by peer.
I have tested with a sub-section of the hash (about 25m records), which is a ~900MB file on disk. This restores successfully, so the issue appears to be a limit on how large a payload Redis will accept for restoration.
Reviewing the docs, it seems I'm hitting the query buffer hard limit described here, which caps the client query buffer at 1GB. Unless I'm misreading the docs and/or the redis.conf file, this doesn't appear to be configurable, so I can't raise it to allow the restore of the 2.4GB rdb dump.
One potential workaround I've found is to create an intermediate temporary hash, copy a subset of keys into it, dump that hash to an RDB file, and then delete it. I can then restore the temporary hashes individually and copy their keys back into the single desired hash. This works, but it's very, very slow compared with a simple dump/restore of the single hash.
Is there a generally accepted way to dump / restore relatively large hashes like this? Or is there any way to override or work around the 1GB hard limit on the client buffer?