I'm developing an API using Go and Redis. The problem is that Redis RAM usage is disproportionately high, and I can't find the root cause.
TL;DR version
There are hundreds to thousands of hash objects. Each of these ~1 KB objects (key + value) takes ~0.5 MB of RAM. However, INFO does not show significant memory fragmentation.
Also, dump.rdb is about 70 times smaller than the in-memory dataset (360 KB dump.rdb vs. 25 MB of RAM for 50 objects, and 35.5 MB vs. 2.47 GB for 5000 objects).
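The "about 70 times" figure can be rechecked from the exact used_memory values in the INFO output and the dump.rdb sizes given in the long version. A small sketch; it assumes the reported "kB" means 1024 bytes, and `ram_to_dump_ratio` is just an illustrative helper.

```python
# used_memory (bytes) and dump.rdb size (kB) as reported in this post.
# Assumption: "kB" here means 1024 bytes.
KIB = 1024


def ram_to_dump_ratio(used_memory_bytes: int, dump_kb: float) -> float:
    """How many times larger Redis' used_memory is than the dump.rdb file."""
    return used_memory_bytes / (dump_kb * KIB)


ratio_50 = ram_to_dump_ratio(27405872, 360)        # 50 tasks
ratio_5000 = ram_to_dump_ratio(2647515776, 35553)  # 5000 tasks
print(f"50 tasks: {ratio_50:.1f}x, 5000 tasks: {ratio_5000:.1f}x")
# prints: 50 tasks: 74.3x, 5000 tasks: 72.7x
```

With decimal kilobytes (1000 bytes) the ratios come out at about 76x and 74x instead.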
Long version
The Redis instance is filled mostly with task:123 hashes of the following kind:
"task_id" : int
"client_id" : int
"worker_id" : int
"text" : string (0..255 chars)
"is_processed" : boolean
"timestamp" : int
"image" : byte array (1 kbyte)
Also, there are a couple of integer counters, one list and one sorted set (both contain task_ids).
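To put the "~1 KB per object" figure in perspective, the raw payload of one task hash (field names plus values, in the form a client sends them) can be estimated by summing byte lengths. A rough sketch: the sample values and `payload_bytes` are hypothetical, booleans are assumed to be stored as "0"/"1", and Redis' own per-entry overhead is deliberately ignored.

```python
def encode_value(value) -> bytes:
    """Approximate how a client sends a hash value: bytes as-is, the rest as text."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    if isinstance(value, bool):
        # Assumption: booleans are stored as "1"/"0".
        return b"1" if value else b"0"
    return str(value).encode("utf-8")


def payload_bytes(task: dict) -> int:
    """Sum of field-name and value lengths; ignores Redis' internal overhead."""
    return sum(len(field.encode("utf-8")) + len(encode_value(value))
               for field, value in task.items())


# Hypothetical task: empty text, 1 KB image.
sample_task = {
    "task_id": 123,
    "client_id": 42,
    "worker_id": 7,
    "text": "",
    "is_processed": False,
    "timestamp": 1500000000,
    "image": bytes(1024),
}
print(payload_bytes(sample_task))  # prints: 1096
```

Even with the maximum 255-character text, this sample stays at 1351 bytes, several hundred times below ~0.5 MB per task.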
RAM usage grows linearly with the number of task objects.
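Taking the used_memory values from the two INFO outputs below, the per-task cost is the slope of the line through both measurements. With only two points this is simply the line through them, so it illustrates the growth rate rather than proving linearity; `linear_fit` is an illustrative helper.

```python
# (number of tasks, used_memory in bytes) from the two INFO outputs in this post
POINTS = [(50, 27405872), (5000, 2647515776)]


def linear_fit(points):
    """Slope and intercept of the straight line through two (x, y) points."""
    (x0, y0), (x1, y1) = points
    slope = (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0


bytes_per_task, baseline = linear_fit(POINTS)
print(f"~{bytes_per_task:,.0f} bytes per task, intercept ~{baseline:,.0f} bytes")
# prints: ~529,315 bytes per task, intercept ~940,115 bytes
```

That is roughly 0.53 MB of used_memory for every additional task.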
INFO output for 50 tasks:
# Memory
used_memory:27405872
used_memory_human:26.14M
used_memory_rss:45215744
used_memory_peak:31541400
used_memory_peak_human:30.08M
used_memory_lua:35840
mem_fragmentation_ratio:1.65
mem_allocator:jemalloc-3.6.0
And for 5000 tasks:
# Memory
used_memory:2647515776
used_memory_human:2.47G
used_memory_rss:3379187712
used_memory_peak:2651672840
used_memory_peak_human:2.47G
used_memory_lua:35840
mem_fragmentation_ratio:1.28
mem_allocator:jemalloc-3.6.0
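mem_fragmentation_ratio is used_memory_rss divided by used_memory, which can be confirmed by parsing the output above. A minimal pure-Python sketch; `parse_info` is a hypothetical helper (against a live server, redis-py's `info("memory")` already returns a parsed dict).

```python
def parse_info(text: str) -> dict:
    """Parse 'key:value' lines of INFO output, skipping '# Section' headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        # Keep numbers as numbers, everything else (e.g. "2.47G") as strings.
        try:
            info[key] = int(value)
        except ValueError:
            try:
                info[key] = float(value)
            except ValueError:
                info[key] = value
    return info


def fragmentation_ratio(info: dict) -> float:
    """RSS as seen by the OS divided by memory Redis thinks it has allocated."""
    return info["used_memory_rss"] / info["used_memory"]


INFO_5000 = """# Memory
used_memory:2647515776
used_memory_human:2.47G
used_memory_rss:3379187712
mem_fragmentation_ratio:1.28
mem_allocator:jemalloc-3.6.0
"""

info = parse_info(INFO_5000)
print(round(fragmentation_ratio(info), 2))  # prints: 1.28
```

Because the ratio only compares RSS with used_memory, a value close to 1 does not by itself explain why used_memory is 2.47 GB.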
The size of dump.rdb is 360 kB for 50 tasks and 35553 kB for 5000 tasks.
Every task object has a serializedlength of ~7 KB:
127.0.0.1:6379> DEBUG OBJECT task:2000
Value at:0x7fcb403f5880 refcount:1 encoding:hashtable serializedlength:7096 lru:6497592 lru_seconds_idle:180
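The DEBUG OBJECT reply above is a space-separated list of key:value pairs, so it is easy to parse and aggregate. A sketch with a hypothetical `parse_debug_object` helper (redis-py's `debug_object()` method returns a similar dict when called on a live connection):

```python
def parse_debug_object(reply: str) -> dict:
    """Split a DEBUG OBJECT reply into key/value pairs ('Value at:...' becomes 'at')."""
    fields = {}
    for token in reply.split():
        key, sep, value = token.partition(":")
        if not sep:
            continue  # skips the leading word "Value"
        fields[key] = int(value) if value.isdigit() else value
    return fields


REPLY = ("Value at:0x7fcb403f5880 refcount:1 encoding:hashtable "
         "serializedlength:7096 lru:6497592 lru_seconds_idle:180")

obj = parse_debug_object(REPLY)
print(obj["encoding"], obj["serializedlength"])  # prints: hashtable 7096
```

Multiplying out, 5000 × 7096 = 35,480,000 bytes, which is in line with the 35553 kB dump.rdb for 5000 tasks.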
I've written a Python script to try to reproduce the problem:
import redis
import time
import os
from random import randint

img_size = 1024 * 1  # 1 KB
r = redis.StrictRedis(host='localhost', port=6379, db=0)

for i in range(0, 5000):
    values = {
        "task_id": randint(0, 65536),
        "client_id": randint(0, 65536),
        "worker_id": randint(0, 65536),
        "text": "",
        # redis-py 3.x raises DataError for bool values, so store the flag as 0/1
        "is_processed": 0,
        "timestamp": int(time.time()),
        # pass plain bytes: redis-py 3.x rejects bytearray values
        "image": os.urandom(img_size),
    }
    key = "task:" + str(i)
    # hmset() is deprecated since redis-py 3.5; hset() with mapping= replaces it
    r.hset(key, mapping=values)
    if i % 500 == 0:
        print(i)
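To compare the repro with the INFO figures on equal terms, one option is to record used_memory before and after inserting the tasks rather than looking at process RAM. A sketch under stated assumptions: it needs redis-py and a scratch Redis on localhost:6379 (db 0) that has no task:* keys yet, since it overwrites task:0 onward; `bytes_per_task` and `main` are hypothetical helpers.

```python
import os
import time
from random import randint


def bytes_per_task(used_before: int, used_after: int, tasks: int) -> float:
    """Average growth of Redis used_memory per inserted task."""
    if tasks <= 0:
        raise ValueError("tasks must be positive")
    return (used_after - used_before) / tasks


def main(tasks: int = 5000, img_size: int = 1024) -> None:
    # Assumption: scratch Redis on localhost:6379, db 0, with no task:* keys;
    # this overwrites task:0 .. task:<tasks-1>.
    import redis

    r = redis.Redis(host="localhost", port=6379, db=0)
    before = r.info("memory")["used_memory"]
    for i in range(tasks):
        r.hset("task:" + str(i), mapping={
            "task_id": randint(0, 65536),
            "client_id": randint(0, 65536),
            "worker_id": randint(0, 65536),
            "text": "",
            "is_processed": 0,
            "timestamp": int(time.time()),
            "image": os.urandom(img_size),
        })
    after = r.info("memory")["used_memory"]
    print(f"{bytes_per_task(before, after, tasks):,.0f} bytes of used_memory per task")
```

Calling `main()` gives a per-task used_memory figure that can be compared directly with the values from the Go API's instance in the INFO output above.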
And it consumes only 80 MB of RAM!
I would appreciate any ideas on how to figure out what's going on.