If you're just storing article data, I find that you shouldn't store each article property in a per-article hash; instead, you should build a single hash where the key is the article identifier and the value is a JSON-serialized object string.
Usually you use hashes when you need to access particular properties of some object, but I guess you're obtaining these articles to list them in some UI, so there's no reason to use a hash per article. Anyway, both approaches can coexist: the hash-per-article if you need to access a particular article property without getting the entire object, and the all-articles hash of JSON strings for getting an entire object or listing objects.
Just imagine how many calls to Redis you can avoid with this approach: you get all the article identifiers from the list, and then you use a single `hmget` command to get all the articles in one round trip. Since you're using `lrange`, I understand you're not getting all the articles at once but paginating them.
Your API gets all the JSON objects as strings and returns them to the API client directly.
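For illustration, here's a minimal sketch of that retrieval path in Python, assuming the redis-py client and the `articles` hash / `articles:ids` list key names that I describe in the update below (the function name and page arguments are just placeholders):

```python
import redis  # assumes the redis-py client

r = redis.Redis(decode_responses=True)

def get_articles_page(start, count):
    """Fetch one page of article ids, then every article body in a single HMGET."""
    # LRANGE bounds are inclusive, hence start + count - 1
    ids = r.lrange("articles:ids", start, start + count - 1)
    if not ids:
        return []
    # One round trip: HMGET returns the JSON strings in the same order as the ids
    return r.hmget("articles", ids)

# The returned JSON strings can be handed to the API client as-is.
```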
A concern about your API resource URI
I checked your statement. Given this request:
`api/v1/app/1/articles/20`
in REST, `articles/20` would mean "get the article with id 20" rather than "get 20 articles".
Let me suggest two approaches to address the range issue:
- Using the query string: `api/v1/app/1/articles?startFrom=0&max=20` (parameter names are just my suggestion...).
- Using HTTP headers. You can send an HTTP header along with your request like `MyApi-Range: 0 20`, where 0 is the start position and 20 the maximum page size (i.e. maximum results).
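As a rough sketch (the function and parameter names are my own, not part of any framework), either style can be translated into inclusive `lrange` bounds like this:

```python
def range_from_query(start_from, max_results):
    """Translate ?startFrom=0&max=20 into inclusive LRANGE bounds."""
    start, count = int(start_from), int(max_results)
    return start, start + count - 1

def range_from_header(header_value):
    """Translate a 'MyApi-Range: 0 20' header value into inclusive LRANGE bounds."""
    start, count = (int(part) for part in header_value.split())
    return start, start + count - 1

# Both yield (0, 19), i.e. the first 20 articles.
print(range_from_query("0", "20"))
print(range_from_header("0 20"))
```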
Update: some details about the approach.
The OP said in a comment:
We only keep 20 articles at any given time. So when an app pushes a
new article, the last one drops from the list and new one gets added
to left of list. Then we remove the artice:{ID} hash. With your
solution, I need to read the json serial string, remove the
article:{ID} property, add the new one, then save it (and override the
previous key). Some more work on the backend side. Is there no other
way do get those hashes in a faster way apart from keeping them as a
json serial? I know LUA could help Redis do it in one command, but I'm
not sure if the load on Redis would remain same.
My approach is:
- Articles are stored in a hash `articles`, where keys are article ids and values are JSON-serialized article objects:
  [1] => {title: "Hello", description: "World"}
  [2] => {title: "Hello 2", description: "World 2"}
  ....
Also, you should keep insertion order by adding article ids to a list called, for example, `articles:ids`:
  [1, 2]
When you want to store a new article, you serialize the article object, add it to the `articles` hash using `hset`, and add the article id to the `articles:ids` list using `lpush`. Do it inside a `MULTI` command to be sure the operation is done atomically!
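A minimal sketch of that write path, again assuming redis-py (the `add_article` helper and its arguments are hypothetical names):

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def add_article(article_id, article):
    """Store the serialized article and record its id atomically (MULTI/EXEC)."""
    payload = json.dumps(article)
    pipe = r.pipeline(transaction=True)  # transaction=True wraps the queued commands in MULTI/EXEC
    pipe.hset("articles", article_id, payload)
    pipe.lpush("articles:ids", article_id)
    pipe.execute()

add_article(3, {"title": "Hello 3", "description": "World 3"})
```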
If you want to get the articles in insertion order, you get the article ids from `articles:ids` and use `hmget` to get all the articles.
When there are already 20 articles, as you said in your comment, you need to pop the oldest article id (the tail of `articles:ids`, since new ids are pushed to the head) using the `rpop` command, and use the `hdel` command to remove that article object from the `articles` hash. Do it inside a `MULTI` command to be sure the operation is done atomically!
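Here's a sketch of that eviction step under the same assumptions (the 20-article limit comes from your comment; since `rpop`'s result is needed for `hdel`, the id is peeked with `lindex` first, and a fully race-free version would use `WATCH` or the Lua script you mentioned):

```python
import redis

r = redis.Redis(decode_responses=True)

def evict_oldest_if_needed(max_articles=20):
    """Once the list holds more than max_articles, drop the oldest article."""
    if r.llen("articles:ids") <= max_articles:
        return
    # Peek at the tail: the oldest id, since new ids are LPUSHed to the head...
    oldest_id = r.lindex("articles:ids", -1)
    if oldest_id is None:
        return
    # ...then remove it from both structures in one MULTI/EXEC block.
    pipe = r.pipeline(transaction=True)
    pipe.rpop("articles:ids")
    pipe.hdel("articles", oldest_id)
    pipe.execute()
```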
Update 2: Some clarifications
The OP said:
How would I retrieve the articles, using HMGET? How well would that
hash scale when it contains around a million of keys?
Retrieving the articles using `hmget` is easy: you get the list items (maybe using `lrange`) and you pass all the obtained ids as arguments to `hmget` to get the whole articles from the hash.
Concerning how well the hash would scale with around a million keys: `hget` time complexity is O(1), so the number of keys stored doesn't affect the access time, while `hmget` (since it's a multi-field get) is O(n), where n is the number of fields being requested (rather than the total number of fields stored in the hash).
BTW, since Redis 3.x has gone gold and provides a big improvement in terms of scalability thanks to Redis Cluster, you should learn more about this new feature and how sharding can help in the case of large data sets.