
I have a LIST containing pointers to some HASH data. Something like:

[LIST] app:1 ["article1", "article2", "article3" ...]
[HASH] article1 {title: "Hello", description: "World"}
[HASH] article2 {title: "Hello", description: "World"}
[HASH] article3 {title: "Hello", description: "World"}

Upon receiving this request:

api/v1/app/1/articles/20

I do the following:

$pointers = $this->redis->lrange($appID, 0, $request->articles - 1); // LRANGE's end index is inclusive
$articles = [];

foreach($pointers as $pointer) {
   $articles[] = $this->redis->hgetall($pointer);
}

So I end up with one lrange call, then $request->articles hgetall calls. May I ask what the fastest solution for this would be?

I thought about:

  1. Doing HMGET

  2. Doing MULTI/EXEC (rough sketch below)

  3. Writing this functionality in Lua and getting everything back in a single command.
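
For example, I guess option 2 would look roughly like this with a Predis-style pipeline (an untested sketch; wrapping the same calls in MULTI/EXEC would work too):

$articles = $this->redis->pipeline(function ($pipe) use ($pointers) {
    // queue every hgetall and flush them all in a single round trip
    foreach ($pointers as $pointer) {
        $pipe->hgetall($pointer);
    }
});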

Any ideas?

Aristona

2 Answers


If you're just storing article data, I find that you shouldn't store each article in a per-article hash; instead, build a single hash where each field is the article identifier and the value is a JSON-serialized object string.

Usually you use hashes when you need to access particular properties of some object, but I guess you're obtaining these articles to list them in some UI, so there's no reason to use a hash per article. Anyway, both approaches can coexist: the hash-per-article when you need to access a particular article property without getting the entire object, and the single hash of JSON strings for getting an entire object or listing objects.

Just imagine how many calls to Redis you can avoid using this approach: you get all article identifiers from the list, and then you use a single hmget command to get all articles in a single trip. Since you're using lrange, I understand you're not getting all articles at once but paginating them.

Your API gets all the JSON objects as strings and returns them to the API client directly.
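
For illustration, here's a minimal sketch with a Predis-style client like the one in your question (the articles hash name is just my suggestion):

// One trip for the page of ids, one more trip for all the JSON strings.
// A single hmget replaces the N hgetall calls of the original code.
$ids = $this->redis->lrange($appID, 0, $request->articles - 1);
$articles = $ids ? $this->redis->hmget('articles', $ids) : [];
// $articles now holds JSON strings, ready to return to the API client.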

Some concerns about your API resource URI

I checked your statement:

Upon receiving this request:

api/v1/app/1/articles/20

In REST, articles/20 would mean "get the article with id 20" rather than "get 20 articles".

Let me suggest two approaches to address the range:

  • Using query string: api/v1/app/1/articles?startFrom=0&max=20 (parameter names are just my suggestion...).
  • Using HTTP headers. You can send an HTTP header along with your request like MyApi-Range: 0 20, where 0 is the start position and 20 the maximum page size (i.e. maximum results).

Update: some details about the approach.

OP said in some comment:

We only keep 20 articles at any given time. So when an app pushes a new article, the last one drops from the list and the new one gets added to the left of the list. Then we remove the article:{ID} hash. With your solution, I need to read the JSON serial string, remove the article:{ID} property, add the new one, then save it (and override the previous key). That's more work on the backend side. Is there no other way to get those hashes faster apart from keeping them as a JSON serial? I know Lua could help Redis do it in one command, but I'm not sure if the load on Redis would remain the same.

My approach is:

  • Articles are stored in a hash articles where keys are article ids and values are JSON-serialized article objects:
[1] => {title: "Hello", description: "World"}
[2] => {title: "Hello 2", description: "World 2"}
....
  • Also, you should keep insertion order by adding article ids to a list called, for example, articles:ids:

    [1, 2]

  • When you want to store a new article, you serialize the article object, add it to the articles hash using hset, and add the article id to the articles:ids list using lpush. Do it inside a MULTI command to be sure the operation is done atomically (see the sketch after this list).

  • If you want to get articles in insertion order, you get the article ids from articles:ids and use hmget to fetch all the articles at once.

  • When there are already 20 articles, as you said in your comment, you get the oldest article id from articles:ids using the rpop command, and then use the hdel command to remove that article's object from the articles hash. Note that hdel needs the id rpop returns, so the two commands can't simply be queued in the same MULTI; the sketch after this list shows one way to handle it.
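
A minimal sketch of both operations, assuming a Predis-style client (transaction() is the client's MULTI/EXEC wrapper; key names as above):

// Store: write the JSON and record insertion order atomically.
$this->redis->transaction(function ($tx) use ($id, $article) {
    $tx->hset('articles', $id, json_encode($article));
    $tx->lpush('articles:ids', $id);
});

// Evict: hdel needs the id that rpop returns, so this runs as two
// commands; a short Lua script could make the pair atomic if needed.
$oldest = $this->redis->rpop('articles:ids');
if ($oldest !== null) {
    $this->redis->hdel('articles', [$oldest]);
}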

Update 2: Some clarifications

OP said:

How would I retrieve the articles, using HMGET? How well would that hash scale when it contains around a million keys?

About how you would retrieve the articles using hmget: it's easy. You get the list items (maybe using lrange) and you pass all the obtained ids as arguments to hmget to get the whole articles from the hash.

Concerning how well the hash would scale with a million keys: note that hget's time complexity is O(1), meaning the total number of keys doesn't affect access time, while hmget (since it's a multiple-field hash get) is O(n), where n is the number of fields being requested (rather than the total number of fields stored in the hash).

BTW, since Redis 3.x has gone gold and brings a big improvement in terms of scalability thanks to Redis Cluster, you should learn more about this new feature and how sharding can help in case of large data sets.

Matías Fidemraizer
  • So you're telling me to keep all articles as a serialized json object in a single string? I feel like it cannot happen, sadly. Because we keep ~20 articles, when I push a new article into the LIST, I remove the last one. If I keep them as a serial, I need to rewrite it whenever it updates, otherwise I'll end up having garbage values in the serial. Also, initially I went with this structure because it is more organized and it's the purpose of hashes. I thought it would be faster than returning a long json serial each time. – Aristona May 26 '15 at 16:40
  • As for the endpoint, ignore it. :) It's actually a POST endpoint that listens those requests, but it had to be done like that to avoid some problems we got with CORS/older jQuery versions/sites with unicode characters in the URI. It's an internal API, anyway, with no public access. – Aristona May 26 '15 at 16:42
  • @Aristona Hey, BTW I don't understand the thing of pushing a new article into the list and removing the last one, and I don't understand the difference between dropping a key in a hash and dropping whole hashes – Matías Fidemraizer May 26 '15 at 21:42
  • @Aristona About the "long JSON", don't you feel that returning a string is faster than accessing key by key in a hash and returning it? – Matías Fidemraizer May 26 '15 at 21:43
  • We only keep 20 articles at any given time. So when an app pushes a new article, the last one drops from the list and the new one gets added to the left of the list. Then we remove the article:{ID} hash. With your solution, I need to read the JSON serial string, remove the article:{ID} property, add the new one, then save it (and override the previous key). That's more work on the backend side. Is there no other way to get those hashes faster apart from keeping them as a JSON serial? I know Lua could help Redis do it in one command, but I'm not sure if the load on Redis would remain the same. – Aristona May 27 '15 at 07:56
  • @Aristona IMHO, I believe your approach is wrong. You don't need to remove `id` property from the JSON. You simply drop the entire JSON and you store the new article JSON as a new key in the hash. Lua here wouldn't help at all... – Matías Fidemraizer May 27 '15 at 08:03
  • @Aristona Check my update on my answer to get further details about the approach... – Matías Fidemraizer May 27 '15 at 08:25
  • Oh I thought you were going to do SET articles:{appID} "json serial containing all articles of this site". Your approach sounds good. How would I retrieve the articles, using HMGET? How well would that hash scale when it contains around a million keys? – Aristona May 27 '15 at 14:53
  • @Aristona Finally you caught it!!!!!!! :) Check my #2 update on my answer where I give you a hint about these concerns. – Matías Fidemraizer May 27 '15 at 15:14
  • Thanks for the clarification. :) Appreciated. – Aristona May 28 '15 at 11:52
  • @Aristona You're welcome and also welcome to the exciting world of Redis :) – Matías Fidemraizer May 28 '15 at 12:45

Change your hash key from article1 to app1:article1
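
For example (a hypothetical sketch; key and field names are illustrative):

// Namespace every per-article hash by app id:
$this->redis->hmset("app{$appID}:article{$articleID}", [
    'title'       => 'Hello',
    'description' => 'World',
]);

That way app1:article1, app1:article2, ... all share the app1: prefix.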

halil
  • I cannot do it, I need to keep track of article/app relations in the fastest way possible. It would end up making Redis scan the whole database, since I cannot do something like HGETALL *app1*, unless you're talking about something different. – Aristona May 27 '15 at 14:24
  • Ok, I misunderstood your question. – halil May 28 '15 at 05:39