I'm using RedisTimeSeries to read time-series data that was previously stored in a CSV file.
The problem: reading the same data set from the Redis server is far slower than reading the CSV file with pandas.
Here is an MWE that shows the issue. I generate random samples, each consisting of a Unix timestamp and a number, then fill both the CSV file and Redis with the same data so I can measure only the READING time (I'm not concerned about writing in this scenario).
import csv
import random
import time
from datetime import datetime, timedelta

import pandas as pd
import redis


def random_date(start, end):
    """Return a random datetime between start and end."""
    delta = end - start
    int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
    random_second = random.randrange(int_delta)
    return start + timedelta(seconds=random_second)


with open('justcsv.csv', mode='w', newline='') as file:
    file_writer = csv.writer(
        file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # Init Redis
    r = redis.Redis(host="localhost", port=6379)
    r.flushall()

    # Create the time series key with a label
    r_tsname = "TESTKEY"
    label = {"label": r_tsname}
    key_name = "TESTKEY1"
    r.ts().create(key_name, labels=label)

    # Bounds for the random timestamps
    d1 = datetime.strptime('1/1/2008 1:30 PM', '%m/%d/%Y %I:%M %p')
    d2 = datetime.strptime('1/1/2022 4:50 AM', '%m/%d/%Y %I:%M %p')

    for x in range(30000):
        dt = random_date(d1, d2)
        timestamp = int(dt.timestamp())
        random_number = round(random.uniform(1.5, 1000.9), 2)
        # write current row in CSV
        file_writer.writerow([timestamp, random_number])
        # write current row in REDIS
        r.ts().add(key_name, timestamp, random_number)

# READ data from CSV with pandas and benchmark it
start_csv = time.time()
df = pd.read_csv('justcsv.csv')  # benchmark
end_csv = time.time()
print("CSV READING TIME IS: " + str(end_csv - start_csv))

# READ data from Redis and benchmark it
thelabel = "label=" + "TESTKEY"
mrange_filters = [thelabel]  # unused here; kept for an mrange() variant
start_redis = time.time()
full_range = r.ts().range("TESTKEY1", "-", "+")  # benchmark
end_redis = time.time()
print("REDIS READING TIME IS: " + str(end_redis - start_redis))
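As an aside, for intervals this short `time.perf_counter()` is generally preferred over `time.time()`, since it uses a higher-resolution monotonic clock. A minimal, self-contained timing helper (the `timed` function and the `sum` workload are just illustrative, not part of the benchmark above):

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Example with a cheap stand-in workload:
result, elapsed = timed(sum, range(1_000_000))
print(result, elapsed)
```

The same pattern applies unchanged to `pd.read_csv(...)` or `r.ts().range(...)`.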
Benchmark result:
10000 iterations - slower x2
CSV READING TIME IS: 0.0124
REDIS READING TIME IS: 0.052
20000 iterations - slower x4
CSV READING TIME IS: 0.025
REDIS READING TIME IS: 0.102
30000 iterations - slower x10
CSV READING TIME IS: 0.0139
REDIS READING TIME IS: 0.153
I used the latest Docker image from: https://hub.docker.com/r/redislabs/redistimeseries
My observations:
- From what I understand, Redis should be much faster at this task, especially since RedisTimeSeries provides a built-in data structure for timestamped data;
- Redis should also be faster because it is in-memory, whereas pandas reads the CSV file from disk;
- the gap with CSV grows rapidly as the data size increases;
- even querying the data via redis-cli doesn't change the elapsed time.
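For reference, the redis-cli equivalent of the benchmarked call is roughly the following (`TESTKEY1` is the key created by the script above; `time` is the shell built-in, not a Redis command):

```shell
# Time the raw TS.RANGE over the full range of the key from the CLI.
time redis-cli TS.RANGE TESTKEY1 - +
```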
My questions:
- Why is Redis slower (and this slow)?
- Am I missing something?
- Is there a way to fix this?