
Let's assume I have a front-end app for a blog and I stored the blog posts in an Elasticsearch instance (this is a hypothetical example).

I want multiple users to be able to mark some blog posts as favorite and the super users to be able to flag blog posts. For marking as favorite, only the user that did the marking is able to see it as marked. For flagging, if one user flags a post, all the other users see it as flagged.

I was thinking about adding a boolean field for the flagging and an array field with the user ids for the marking. This way I can use a term query on the boolean field to find flagged posts, and a term query on the user-id field to find the favorite posts of a given user.
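A minimal sketch of that design, assuming the hypothetical field names `flagged` and `favorited_by`. Elasticsearch has no dedicated array type: any field can hold a list of values, so the user-id "array" is just a multi-valued `keyword` field, and a `term` query matches if any of its values equals the user id. The bodies below are plain dicts that could be sent with any client.

```python
# Hypothetical mapping for the blog index (body for PUT /<index>).
# "favorited_by" holds a list of user ids; Elasticsearch has no separate
# array type, so a multi-valued keyword field is used.
BLOG_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "created_at": {"type": "date"},
            "flagged": {"type": "boolean"},
            "favorited_by": {"type": "keyword"},
        }
    }
}


def flagged_posts_query() -> dict:
    """Posts flagged by any super user (the same result for every reader)."""
    # filter context: no relevance scoring, and the clause is cacheable
    return {"query": {"bool": {"filter": [{"term": {"flagged": True}}]}}}


def user_favorites_query(user_id: str) -> dict:
    """Posts that this particular user marked as favorite."""
    # matches when ANY value of the multi-valued field equals user_id
    return {"query": {"bool": {"filter": [{"term": {"favorited_by": user_id}}]}}}
```

One trade-off to keep in mind: documents in Elasticsearch are immutable, so every favorite click rewrites the whole blog post document, and a popular post accumulates a very long `favorited_by` list.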

I'm pretty new to Elasticsearch, so I'm not sure if this will perform well enough on millions/billions of posts. What other options are there?

Edit: I forgot to mention that I would also like to have paging for the blog posts and be able to filter the flagged or marked posts in or out. For example, I want the first 10 blog posts (ordered by creation date) that are marked as favorite, or the last 10 flagged blog posts.
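A sketch of how such a page could be requested with the single-index design above, assuming the same hypothetical `flagged` and `favorited_by` fields plus a hypothetical unique `post_id` keyword field used as a sort tiebreaker. Paging uses `size` together with `search_after`, which takes the `sort` values of the last hit of the previous page.

```python
from typing import Optional


def page_query(
    flagged: Optional[bool] = None,
    user_id: Optional[str] = None,
    favorite: Optional[bool] = None,
    size: int = 10,
    newest_first: bool = False,
    search_after: Optional[list] = None,
) -> dict:
    """Build the search body for one page of blog posts ordered by creation date."""
    filters, must_not = [], []
    if flagged is True:
        filters.append({"term": {"flagged": True}})
    elif flagged is False:
        # must_not also keeps posts that have no "flagged" value at all
        must_not.append({"term": {"flagged": True}})
    if user_id is not None and favorite is not None:
        clause = {"term": {"favorited_by": user_id}}
        # favorite=True keeps this user's favorites, favorite=False excludes them
        (filters if favorite else must_not).append(clause)
    order = "desc" if newest_first else "asc"
    body = {
        "size": size,
        "query": {"bool": {"filter": filters, "must_not": must_not}},
        # "post_id" is a hypothetical unique keyword field that breaks date ties
        "sort": [{"created_at": order}, {"post_id": order}],
    }
    if search_after is not None:
        # the "sort" array of the last hit on the previous page
        body["search_after"] = search_after
    return body
```

For example, `page_query(flagged=True, newest_first=True)` asks for the last 10 flagged posts, and `page_query(user_id="u1", favorite=True)` for the first 10 favorites of user `u1`.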

Foryah



One way to build a favorites system is to store the data in a separate index, with the fields blog_id, user_id and created_at.

This way you can easily add, remove and search favorites.
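A possible mapping for that separate index, sketched under assumptions: the index name `favorite` and the date format are chosen to match the example document in this answer, and `blog_created_at` is the extra field used later for sorting. `keyword` fields support exact-match `term`/`terms` filters and sorting.

```python
# Hypothetical mapping for the separate "favorite" index (body for PUT /favorite).
DATE_FORMAT = "yyyy-MM-dd HH:mm:ss"  # matches values such as "2019-10-02 12:00:02"

FAVORITE_INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "user_id": {"type": "keyword"},
            "blog_id": {"type": "keyword"},
            "created_at": {"type": "date", "format": DATE_FORMAT},
            # copy of the blog's own creation date, used to sort favorites
            "blog_created_at": {"type": "date", "format": DATE_FORMAT},
        }
    }
}


def user_favorites_filter(user_id) -> dict:
    """Search body listing every favorite of one user (no scoring needed)."""
    return {"query": {"bool": {"filter": [{"term": {"user_id": user_id}}]}}}
```

Adding a favorite is one index request and removing it is one delete request, so neither touches the blog documents themselves.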

I want multiple users to be able to mark some blog posts as favorite

When user 1 clicks the favorite link of blog 2, the system stores this document in the "favorite" index: {"user_id":1, "blog_id": 2, "created_at": "2019-10-02 12:00:02", "blog_created_at": "2019-01-01 09:10:11"}
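A small sketch of building that document together with a deterministic document id of the form `user_id-blog_id` (the id scheme is an assumption taken from the suggestion below). With one id per (user, blog) pair, favoriting twice overwrites the same document instead of creating a duplicate, and the document could be written with `PUT /favorite/_doc/1-2` and removed with `DELETE /favorite/_doc/1-2`.

```python
from datetime import datetime

DATE_FMT = "%Y-%m-%d %H:%M:%S"  # same layout as "2019-10-02 12:00:02"


def favorite_doc_id(user_id: int, blog_id: int) -> str:
    """Deterministic id: one document per (user, blog) pair."""
    return f"{user_id}-{blog_id}"


def favorite_doc(user_id: int, blog_id: int,
                 blog_created_at: datetime, now: datetime) -> dict:
    """Document stored in the favorite index when a user clicks 'favorite'."""
    return {
        "user_id": user_id,
        "blog_id": blog_id,
        "created_at": now.strftime(DATE_FMT),
        "blog_created_at": blog_created_at.strftime(DATE_FMT),
    }
```

Calling `favorite_doc(1, 2, datetime(2019, 1, 1, 9, 10, 11), datetime(2019, 10, 2, 12, 0, 2))` reproduces the example document above, stored under id `"1-2"`.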

only the user that did the marking is able to see it as marked.

If you use the concatenation user_id-blog_id as the document id, you can fetch the record with a get by id; otherwise you can search on blog_id and user_id. If the record exists, the blog you display is marked as favorite by the user who is reading it. The same works for list pages: you know the user_id, so once you have built the list of blog_ids you will display, you can run one search for them and use the result when you display your list of blogs.

This solution should perform well even for billions of posts, since each lookup is an exact-match filter on two fields.
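Both lookups for a list page can be sketched as request bodies, assuming the `user_id-blog_id` id scheme and the `favorite` index from this answer: a multi-get (`GET /favorite/_mget`) by the concatenated ids, or one search filtering on `user_id` and the page's `blog_id` values. Either way, a single request covers the whole page.

```python
def favorite_mget_body(user_id, blog_ids) -> dict:
    """Body for GET /favorite/_mget using the user_id-blog_id id scheme."""
    return {"ids": [f"{user_id}-{b}" for b in blog_ids]}


def found_blog_ids(mget_response: dict) -> set:
    """Blog ids that exist as favorites in an _mget response."""
    return {d["_source"]["blog_id"] for d in mget_response["docs"] if d.get("found")}


def favorites_on_page_query(user_id, blog_ids) -> dict:
    """Alternative: one search restricted to the blogs shown on this page."""
    return {
        "size": len(blog_ids),
        "_source": ["blog_id"],  # only the ids are needed to render the marks
        "query": {"bool": {"filter": [
            {"term": {"user_id": user_id}},
            {"terms": {"blog_id": list(blog_ids)}},
        ]}},
    }
```

The returned set is then checked for each blog while rendering the list.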

If you have flags, you can store them the same way and add a category field.

Depending on how many flags and which kinds of flags you will have, you can consider saving them in the same index with a category field ['favorite', 'flag', ...] or saving them in different indices.
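A sketch of the single-index variant, assuming a hypothetical `category` keyword field and an id scheme that includes the category. Favorites are filtered by user, while flags are looked up per blog without a user filter, because a flag set by any super user is visible to everyone.

```python
def mark_doc_id(category: str, user_id, blog_id) -> str:
    """Hypothetical id scheme: one mark per (category, user, blog)."""
    return f"{category}-{user_id}-{blog_id}"


def marks_query(category: str, user_id=None, blog_ids=None) -> dict:
    """Search body for marks of one category, optionally narrowed down."""
    filters = [{"term": {"category": category}}]
    if user_id is not None:
        # favorites are private, so they are always looked up per user
        filters.append({"term": {"user_id": user_id}})
    if blog_ids is not None:
        filters.append({"terms": {"blog_id": list(blog_ids)}})
    return {"query": {"bool": {"filter": filters}}}
```

For example, `marks_query("flag", blog_ids=[2, 3])` finds flags on those posts regardless of which super user set them.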

Another thing to consider is using periodic indices (monthly, weekly or daily), depending on the number of documents you will store and how many updates (added/removed favorites) you will have. Later you can roll them up into yearly indices once they see less activity. And one last thing: consider using a cache to absorb frantic clicking on the favorite button. Each click adds or removes a document, which increases the number of deleted documents in your index, and that can make your index slow.
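A sketch of both ideas, with hypothetical names throughout: a monthly index name (several such indices can be searched together with a pattern like `favorite-*`), and an in-memory buffer that collects clicks and, on flush, reports only the net change per (user, blog) pair. A burst of clicks that ends where it started then produces no index or delete request at all.

```python
from datetime import datetime


def monthly_index(ts: datetime, prefix: str = "favorite") -> str:
    """Hypothetical monthly index name, e.g. favorite-2019.10."""
    return f"{prefix}-{ts:%Y.%m}"


class FavoriteToggleBuffer:
    """Collects favorite clicks and flushes only the net change."""

    def __init__(self, persisted: set):
        self.persisted = set(persisted)  # (user_id, blog_id) pairs already indexed
        self.pending = {}                # (user_id, blog_id) -> desired state

    def click(self, user_id, blog_id) -> None:
        key = (user_id, blog_id)
        current = self.pending.get(key, key in self.persisted)
        self.pending[key] = not current  # each click toggles the desired state

    def flush(self):
        """Return (pairs to index, pairs to delete) and reset the buffer."""
        to_index, to_delete = [], []
        for key, wanted in self.pending.items():
            exists = key in self.persisted
            if wanted and not exists:
                to_index.append(key)
                self.persisted.add(key)
            elif not wanted and exists:
                to_delete.append(key)
                self.persisted.discard(key)
        self.pending.clear()
        return to_index, to_delete
```

The two lists returned by `flush()` could then be sent as one bulk request, so a user clicking the button five times in a second costs at most one write.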

Edit in response to the edit in the question:

For example I want first (ordered by creation date) 10 blog posts that are marked as favorite, or last 10 blog posts flagged.

You can add the blog creation date to your favorite records as "blog_created_at" (I updated the example document). Then you can sort by blog creation date and limit the search to 10 results if you want the first 10.
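A sketch of that search against the favorite index, assuming the field names from the example document. A plain search with `sort` and `size` is enough here; `blog_id` is added as a tiebreaker so that `search_after` paging stays stable when two posts share a timestamp.

```python
from typing import Optional


def first_favorites_query(user_id, size: int = 10, oldest_first: bool = True,
                          search_after: Optional[list] = None) -> dict:
    """One user's favorites, ordered by the creation date of the blog post."""
    order = "asc" if oldest_first else "desc"
    body = {
        "size": size,
        "_source": ["blog_id"],  # only the ids are needed for the follow-up lookup
        "query": {"bool": {"filter": [{"term": {"user_id": user_id}}]}},
        # blog_id breaks ties between posts created at the same second
        "sort": [{"blog_created_at": order}, {"blog_id": order}],
    }
    if search_after is not None:
        body["search_after"] = search_after  # sort values of the previous page's last hit
    return body
```

For flags stored the same way, the user filter would be replaced by a category filter; if several super users can flag the same post, the results may contain the same blog_id more than once.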

For the other case in your comment:

If I want to get just 10 blog posts, ordered by date, which are not marked as favorite or not flagged

You can add a field to your blog documents, something like "has_favorite" or "has_flag", and set it to True when the blog gets a favorite. You set it to True the first time the blog is marked as favorite; if it is already a favorite, you do nothing. Then you can search against this field to filter the blogs that don't have any favorite.

If somebody removes a favorite, you can count how many favorites the blog still has and, if the count is 0, set has_favorite to False. Only this case generates an update, but it is maybe 0.001% of cases, so it is better to focus on the other 99%. If it increases, the solution will need to be adapted.
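A sketch of the two request bodies involved, assuming the hypothetical blog-level field `has_favorite`. The partial update uses the `_update` API's `doc` form; by default Elasticsearch detects when such an update would not change the document and skips the write, which fits "if it's already favorite you do nothing".

```python
def mark_has_favorite_update() -> dict:
    """Body for POST /<blog-index>/_update/<blog_id> on the first favorite."""
    # an unchanged document is reported as a no-op instead of being rewritten
    return {"doc": {"has_favorite": True}}


def blogs_without_favorites_query(size: int = 10, newest_first: bool = True) -> dict:
    """Blogs that nobody has marked as favorite, ordered by creation date."""
    return {
        "size": size,
        # must_not also matches blogs where has_favorite was never set
        "query": {"bool": {"must_not": [{"term": {"has_favorite": True}}]}},
        "sort": [{"created_at": "desc" if newest_first else "asc"}],
    }
```

Note that this field answers "not favorited by anyone", which is a blog-level property, not "not favorited by me".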
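The removal path can be sketched as a count followed by an optional update, assuming the same hypothetical `favorite` index and `has_favorite` field. Because searches and counts are near real-time, the delete should be made visible first (for example with `refresh=wait_for` on the delete request), otherwise the just-deleted favorite may still be counted.

```python
from typing import Optional


def remaining_favorites_count_body(blog_id) -> dict:
    """Body for GET /favorite/_count after a favorite was deleted."""
    return {"query": {"bool": {"filter": [{"term": {"blog_id": blog_id}}]}}}


def has_favorite_update_after_removal(remaining: int) -> Optional[dict]:
    """Partial update for the blog document, or None when nothing changes."""
    if remaining == 0:
        # last favorite removed: the blog no longer counts as favorited
        return {"doc": {"has_favorite": False}}
    return None
```

In the common case (other favorites remain) this costs one count request and no write to the blog index.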

Gabriel
  • I thought about this as well but then I'm not sure how to manage pagination. I guess I forgot to mention this in the question so I'll update it in a second. If I want to get just 10 blog posts, ordered by date, which are not marked as favorite or not flagged, how do I do that? In a relational database you would need a join for this, not sure how I can do this in Elasticsearch and if it has good performance. – Foryah Oct 02 '19 at 07:25
  • I updated the answer to cover this case and the edit. – Gabriel Oct 04 '19 at 03:56
  • Thanks for the update! The thing I'm trying to do is that the "marks" (as in "mark as favorite") have to be at the user level and the "flags" would be at the blog post level. So, when I want to get 10 blog posts that are marked as favorite, these will be different depending on the user making the request. For the flagged ones, they will be the same. So I cannot use a "has_favorite" flag, since it's not at the blog post level. Not sure if I explained the issue clearly enough. – Foryah Oct 05 '19 at 17:47
  • @Foryah Adding has_favorite at the blog level is only for the request to get the 10 blog posts which are not marked as favorite. – Gabriel Oct 07 '19 at 03:53
  • I understand, but it's not a blog-level property. Say I have 2 users, user1 and user2. Say I have 20 blog posts, and user1 marked the first 10 as favorite and user2 marked the last 10 as favorite. If user1 makes a request for 10 blog posts which are *not* favorite, he will get the last 10. If user2 makes the same request, he'll get the first 10. So this will have to be (in RDBMS terms) a join between the two indexes: the blog index and the favorite index. – Foryah Oct 07 '19 at 07:38
  • Sorry for the late reply. You don't have to think in terms of databases and relations; it may make things more confusing. Keeping the same example you gave: if you get the first 10 results from the favorite index, you get a list of blog ids, so you can make a second search by those ids to get the blog documents (title, summary, etc.), and your problem is solved. – Gabriel Oct 28 '19 at 02:26
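The two-step approach from the last comment can be sketched as follows, assuming the blog documents use the blog id as their Elasticsearch `_id` (an assumption, not stated in the thread). Step 1 pages through the favorite index, step 2 fetches those blogs with an `ids` query and restores the order from step 1, since the `ids` query does not preserve it. The same idea covers the per-user "not favorite" case raised above: collect all of the user's favorite ids first, then exclude them with `must_not`; for users with very many favorites that exclusion list can become large.

```python
def favorites_page_query(user_id, size: int = 10) -> dict:
    """Step 1: one page of a user's favorites from the favorite index."""
    return {
        "size": size,
        "_source": ["blog_id"],
        "query": {"bool": {"filter": [{"term": {"user_id": user_id}}]}},
        "sort": [{"blog_created_at": "asc"}, {"blog_id": "asc"}],
    }


def blog_ids_from_hits(hits: list) -> list:
    """Blog ids in the order returned by step 1."""
    return [h["_source"]["blog_id"] for h in hits]


def blogs_by_ids_query(blog_ids: list) -> dict:
    """Step 2: fetch the blog documents (assumes _id == blog id)."""
    return {"size": len(blog_ids), "query": {"ids": {"values": [str(b) for b in blog_ids]}}}


def in_favorite_order(blog_hits: list, blog_ids: list) -> list:
    """Re-sort step-2 hits into step-1 order; the ids query ignores it."""
    by_id = {h["_id"]: h for h in blog_hits}
    return [by_id[str(b)] for b in blog_ids if str(b) in by_id]


def not_favorite_query(all_favorite_ids: list, size: int = 10) -> dict:
    """Per-user 'not favorite' page: exclude ALL of the user's favorites."""
    return {
        "size": size,
        "query": {"bool": {"must_not": [{"ids": {"values": [str(b) for b in all_favorite_ids]}}]}},
        "sort": [{"created_at": "desc"}],
    }
```

Each page then costs two searches, and neither index has to store the other's data.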