
Django 2.2

I need to fetch 4000-10000 data rows from a particular table (let's call it commonsites), among others, to display a webpage.

I can narrow these 4000-10000 rows down to just 3 fields (id, name, business_id).

My traffic is low, but I was wondering whether it's a good idea to use caching to fetch these 4000-10000 rows.

The data for these rows are unlikely to change. But in case they do change or get deleted, how do I update/remove individual rows in the cache, rather than the entire cache?

Or is this even a good idea?

My installs are:

Update

Adding more clarity: the webpage is a retrieve. Once the page request is made, the JavaScript frontend will make an API call. This API call will then fetch these 4000-10000 data rows from the table.

So these data rows are pre-existing data.

The data is sent as a list in the API's JSON response.

Just to be clear, the data will not be paginated; it will all be displayed. And I haven't measured the data size, so I cannot say how large it is, but I doubt it will be more than 5 MB.
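
For reference, the fetch I have in mind is roughly the following - a sketch only, where the model, serializer, and app names are placeholders for the real commonsites table:

# Sketch of the DRF endpoint (CommonSite, CommonSiteSerializer, and myapp
# are placeholder names, not actual code from the project).
from rest_framework import serializers
from rest_framework.response import Response
from rest_framework.views import APIView

from myapp.models import CommonSite  # hypothetical model for commonsites


class CommonSiteSerializer(serializers.ModelSerializer):
    class Meta:
        model = CommonSite
        fields = ("id", "name", "business_id")


class CommonSiteListView(APIView):
    def get(self, request):
        # Defer everything except the three fields the page displays.
        rows = CommonSite.objects.only("id", "name", "business_id")
        serializer = CommonSiteSerializer(rows, many=True)
        return Response(serializer.data)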

Kim Stacks
  • What data types do you use for these rows (lists, hashes, sets, etc.)? How do you fetch them (lrange, zrangebyscore, hget, etc.)? – Ersoy May 25 '20 at 10:18
  • Sorry, I forgot to add that this is a REST endpoint, so the data is sent as a JSON list. > How do you fetch? I intend to use MyModel.objects.all().only("id", "name", "business_id"), then the serializer will convert it to the JSON format. – Kim Stacks May 25 '20 at 10:47
  • Just to be clear: you add your data (after JSON serializing) with `lpush`/`rpush` to a single `redis list`, then `lrange` to fetch it all and filter on the application side? – Ersoy May 25 '20 at 11:09
  • Just to be clear, this webpage is a retrieve; the 4000-10000 rows are pre-existing data in the database. So on every page refresh, it will call an API endpoint (done using DRF), and this API endpoint will then fetch these data rows, which can number between 4000 and 10000. – Kim Stacks May 25 '20 at 14:30
  • Last question: how are you going to display 4K~10K rows in the front-end; do you plan to paginate? Because when you need to fetch it all, the network will be critical. If it is a 5MB string, whenever n users are on that page it will cause a problem on the Redis server side, since Redis is single threaded. – Ersoy May 25 '20 at 21:18
  • > If it is a 5MB string, whenever n users are on that page it will cause a problem on the Redis server side, since Redis is single threaded. I wasn't aware of this. And I haven't measured the data size. No, I do not plan to paginate; I do intend to display it all. It's a user requirement. – Kim Stacks May 26 '20 at 04:46

1 Answer


Since we did some Q&A in the comments, I can give a complete answer.

Keeping ~4K-10K rows in a single string value may not be good practice, but you don't plan any pagination on the front-end side, so you need all of them at once. Even if you reduce the fields to id, name, business_id, a value of that size may not be good, for the following reasons (see the sketch after this list):

  • You need to get the value as a whole, which may cause network problems. Especially as traffic grows, every single visitor will fetch that whole chunk of data.
  • If you want to update/delete some of the rows, you need to get+update+set (or re-set) the value as a whole (network again).
  • You can't invalidate part of the value.
  • You can't set a partial TTL - you either expire or persist the whole value.
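
As a sketch of the second point (assuming redis-py and a single string key named "rows"; the key and function names are illustrative), touching one row means shipping the entire payload over the network twice:

# Illustrative only: updating one row inside a single JSON string value.
import json
import redis

r = redis.Redis()

def update_one_row(row_id, new_name):
    raw = r.get("rows")               # GET the entire 4K-10K row payload
    rows = json.loads(raw)            # parse everything just to touch one row
    for row in rows:
        if row["id"] == row_id:
            row["name"] = new_name
    r.set("rows", json.dumps(rows))   # SET the entire payload back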

The data for these rows are unlikely to change. But in case they do change or get deleted, how do I update/remove individual rows in the cache, rather than the entire cache?

Since you don't need pagination and want to keep everything under a single key, you can use a hash instead of a string to address the points listed above.

127.0.0.1:6379> hset row 1 "some-json"
(integer) 1
127.0.0.1:6379> hset row 2 "some-json2"
(integer) 1
127.0.0.1:6379> hset row 3 "some-json3"
(integer) 1
127.0.0.1:6379> hset row 4 "some-json4"
(integer) 1
127.0.0.1:6379> hgetall row
1) "1"
2) "some-json"
3) "2"
4) "some-json2"
5) "3"
6) "some-json3"
7) "4"
8) "some-json4"
127.0.0.1:6379> hset row 3 "some-other-json"
(integer) 0
127.0.0.1:6379> hgetall row
1) "1"
2) "some-json"
3) "2"
4) "some-json2"
5) "3"
6) "some-other-json"
7) "4"
8) "some-json4"
127.0.0.1:6379> hdel row 3
(integer) 1
127.0.0.1:6379> hgetall row
1) "1"
2) "some-json"
3) "2"
4) "some-json2"
5) "4"
6) "some-json4"
127.0.0.1:6379>

You may use hset to update one or more rows at once - note in the transcript above that it returns 1 when creating a new field and 0 when overwriting an existing one. You may delete an individual row from the cache via hdel. You can still get all the data in a single call using hgetall. There is no partial TTL for hash fields (sorted sets can emulate that with some application-layer code), but a hash is more suitable than a string for your use case.
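
From the Django side, a minimal sketch of the same pattern with redis-py (the key name and helper functions are illustrative, not from your code):

# One hash field per row: update, evict, or fetch without rewriting the blob.
import json
import redis

r = redis.Redis()
KEY = "commonsites"

def cache_row(row):
    # hset inserts or overwrites just this one field of the hash.
    r.hset(KEY, row["id"], json.dumps(row))

def evict_row(row_id):
    # Invalidate a single row without touching the rest of the hash.
    r.hdel(KEY, row_id)

def fetch_all_rows():
    # Still one round trip to build the full API payload.
    return [json.loads(value) for value in r.hgetall(KEY).values()]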

Ersoy
  • Thanks for the detailed answer, but I think I will take your initial recommendation and avoid doing this. I will go back to square one and rethink this through. I will give you 50 pts as my way of saying thanks. – Kim Stacks May 28 '20 at 09:20
  • Thanks @KimStacks - if you have any further questions related to this answer, write them down and I will try to answer. Best – Ersoy May 28 '20 at 09:22
  • 1
    Something went wrong when I tried to award the bounty. Will try tomorrow instead – Kim Stacks May 28 '20 at 09:38
  • On my laptop now, but I still cannot award it. Not sure why. Did you receive the additional 50 pts? – Kim Stacks May 28 '20 at 16:36
  • @KimStacks no, I didn't receive it, but it is totally okay (I didn't notice there was a bounty on the question when I answered) - maybe there is a minimum wait time etc. Thank you for your time. – Ersoy May 28 '20 at 16:59
  • 1
    I figured out why. There's a min 24 hr wait time after starting bounty. Tomorrow I will award then. Thanks – Kim Stacks May 28 '20 at 17:08
  • 1
    Bounty Awarded. Thank you – Kim Stacks May 30 '20 at 08:21