0

Is there a way to calculate the SUM of scores saved under 24 hour respecting performance of the Redis server ? (For around 1Million new rows added per day)

What is the right format to use in order to store timestamp and score of users using sorted sets ?

Actually I am using this command:

ZADD allscores 1570658561 20

As score, it is the actual time in seconds ... and other field is the real score.

But, there is a problem here ! When another user get the same score (20), it is not added since it's already present - Any solution for this problem ?

I am thinking to use a LUA script, but there is 2 headaches:

  • The LUA script will block other commands from working until it is finished the job (Which is not a good practice for my case since the script have to work 24/24 7/7 meanwhile many users have to fetch datas in the same time from the Redis cache server like users scores, history infos ect.) - Plus, the LUA script have to deal each time with many records saved each day inside a specific key - So, while the Lua script is working, users can't fetch datas ... knowing that the Lua script will work in loop all time.

  • Second, it is related to the first problem that do not let me store same score if I use timestamp as score in the command so I can return 24 hour datas.

If you are in my case, how will you deal with this ? Thanks

1 Answers1

0

Considering that the data is needed for last 24 hours(Sliding window) and the number of rows possible is 1 million. We cannot use sorted set data structure to compute sum with high performance.

High performance design and also solving your duplicate score issue:

Instead with a little decision on the accuracy, you can have a highly performant system by crunching the data within a window.

Sample Input data: input 1: user 1 wants to add time: 11:10:01 score: 20 input 2: user 2 wants to add time: 11:11:02 score: 20 input 3: user 1 wants to add time: 11:17:04 score: 50

You can have 1 minute, 5 minutes or 1 hour accuracy and decide window based on that.


If you accept an approximation of 1 hour data, you can have this while insertion, for input 1 :

INCRBY SCORES_11_hour 20

for input 2:

INCRBY SCORES_11_hour 20

for input 3:

INCRBY SCORES_11_hour 20

To get the data for last 24 hours, you need to sum up only 24 hourly keys.

MGET SCORES_previous_day_12_hour SCORES_previous_day_13_hour SCORES_previous_day_14_hour .... SCORES_current_day_10_hour SCORES_current_day_11_hour


If you accept an approximation of 5 minutes, you can have this while insertion, along with incrementing the hourly keys, you need to store the 5 minute window data.

for input 1 :

INCRBY SCORES_11_hour 20

INCRBY SCORES_11_hour_00_minutes 20

for input 2:

INCRBY SCORES_11_hour 20

INCRBY SCORES_11_hour_00_minutes 20

for input 3:

INCRBY SCORES_11_hour 20

INCRBY SCORES_11_hour_05_minutes 20

To get the data for last 24 hours, you need to sum up only 23 hour keys(whole hours data) + 12 five minute window keys


If the time added is based on the current time, you can optimize it further. (Assuming that if it is 11th hour and the data for 10th, 9th and the previous hours wont change at all).

As you told it is going to be 24/7, we can use some computed values from the previous iterations too.

Say it is computed on 11th hour, you would've got the values for past 24 hours. If it is again computed on 12th hour, you can reuse the sum for 22 intermediate hours whose data is unchanged and get only the missing 2 hours data from redis.

Similarly further optimisations can be applied based on your need.

Community
  • 1
  • 1
vkm
  • 111
  • 1
  • 8