
I'm trying to build my own social network / forum application, where people can add posts and like each other's posts. I'm using DynamoDB as my database, with a single table. For the post-liking functionality I'm using a Lambda function in combination with DynamoDB Streams, which aggregates the like attribute.

Currently I'm working on a ranking mechanism for these user posts. With it, I want to make sure my users can list the posts in a forum that are interesting at that point in time.
For that purpose, I read how Reddit handles its ranking algorithm on this page.
I also read this question on Stack Overflow, which is close to mine but has no good answer, in my opinion.

My question is: how would one solve this problem with the help of the AWS ecosystem (maybe even with DynamoDB and Lambda functions alone)?

EDIT:
My database schema looks something like this:

Partitionkey                                     Sortkey             likes       ...
----------                                       --------            ------
forum#soccer                                     01.08.19 13:15
forum#baseball                                   22.08.19 20:11
post#soccer#Do you think FC Barcelona wins?      05.08.19 10:20       203
post#soccer#Which club is your favorite ?        05.08.19 10:20       2
like#Which club is your favorite ?               John Wick
like#Which club is your favorite ?               Walter White
...

Each insert of an item whose key starts with like# triggers a Lambda function, which updates the likes column of the corresponding post entry.
My aim is to query the posts that are trending at the current time. This should be possible with the available information, such as the creation time and like count of each post. Currently my query just returns the newest posts.

osazemeu
Ahmet K
  • What is your actual question, how do I organize my DDB table so I can query it for posts with the most likes? If so, please describe your current table design. – Ashaman Kingpin Oct 15 '19 at 19:00
  • Your table structure would be helpful, it is difficult to understand it from the description. – dmigo Oct 15 '19 at 19:59
  • @dmigo I edited my question. Is it clear now ? Thank you for your help – Ahmet K Oct 16 '19 at 17:56
  • @AshamanKingpin I edited my question :) – Ahmet K Oct 16 '19 at 17:58
  • What exactly do you want the query to return? A list of posts ordered by the number of likes? Would the list of posts be for all time? The last hour? The last day? – matt helliwell Oct 19 '19 at 13:41
  • @matthelliwell It should return the posts with the highest trending score. The score could be calculated with this formula: `Trending Score = (p - 1) / (t + 2)^1.5` where **p** is the number of likes the post has received and **t** is the time since submission in hours – Ahmet K Oct 20 '19 at 15:16

1 Answer


I'll provide a possible solution using only DynamoDB and Lambda (and maybe AWS SQS). If it doesn't fit, we can consider other solutions, such as Amazon ElastiCache.


Algorithm:

  1. Your DynamoDB table will have an item whose partition key (NOTE 1) is named trending#posts, or just trending (it's up to you), and whose sort key is a date or a post type (or anything else you want to sort by). A date sort key lets you analyze trends over time; a post-type sort key lets you filter trending posts by type. If you don't need filters, you can use a single fixed value.

  2. Each like on a post will trigger a Lambda that maintains the trending posts (NOTE 2).

  3. When triggered, the Lambda will receive the liked post and perform these steps:

    1. Read all N trending posts saved in your table.

    2. Read the number of likes and the post time of those posts.

    3. Compute the trending score of the current N posts and, if the liked post is not already among them, of the liked post too.

    4. Sort the posts again and save the N with the highest scores in your table.
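
Steps 3.3 and 3.4 can be sketched in Python as below. This is only a minimal illustration, not a complete Lambda: it uses the formula from the asker's comment, `(p - 1) / (t + 2)^1.5`, and assumes each post is a dict with hypothetical `id`, `likes` and `created_at` fields. Reading from and writing to DynamoDB is left out.

```python
from datetime import datetime, timedelta, timezone


def trending_score(likes, created_at, now):
    """Score from the question's comment: (p - 1) / (t + 2)^1.5, t in hours."""
    hours = max((now - created_at).total_seconds() / 3600.0, 0.0)
    return (likes - 1) / (hours + 2) ** 1.5


def update_trending(current, liked_post, now, n=10):
    """Merge the liked post into the saved list and keep the n best scores.

    `current` is the list read from the table (steps 3.1 and 3.2).
    The field names "id", "likes" and "created_at" are assumptions.
    """
    # Index by id so a post that is already trending is replaced, not duplicated.
    candidates = {post["id"]: post for post in current}
    candidates[liked_post["id"]] = liked_post

    ranked = sorted(
        candidates.values(),
        key=lambda post: trending_score(post["likes"], post["created_at"], now),
        reverse=True,
    )
    return ranked[:n]
```

For instance, a post with 203 likes that is 2 hours old scores (203 - 1) / (2 + 2)^1.5 = 25.25. The list returned by `update_trending` is what would be saved back under the trending item.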


NOTE 1: You don't need the exact score at every moment, only the ranking. For example, if you save the trending list at 9 A.M., you don't need the correct scores at 1 P.M., just the positions (1st, 2nd, ...). You only need to recompute the scores when a new like occurs.

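NOTE 2: I said "and maybe AWS SQS" because users may like posts at the same time, so the Lambda could run concurrently and cause consistency problems when several executions update the trending list at once. With AWS SQS, each like pushes an event to a queue, and the queue triggers the Lambda. Note that a queue alone does not guarantee serial execution: you would also need something like a FIFO queue with a single message group, so that the events are processed one after another.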
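
A rough sketch of what the queue-triggered Lambda could look like is below. Lambda delivers SQS messages in an event with a `Records` list, each record carrying the message text in `body`. Everything else here is an assumption for illustration: the body is taken to be a JSON object describing the like, and `apply_like` is a hypothetical callback standing in for the trending update described in the algorithm.

```python
import json


def extract_likes(event):
    """Decode every like carried by the SQS batch, in delivery order."""
    # Assumption: each message body is a JSON object describing one like.
    return [json.loads(record["body"]) for record in event.get("Records", [])]


def lambda_handler(event, context=None, apply_like=print):
    """Apply the likes of one batch strictly one after another.

    `apply_like` is a hypothetical hook; a real function would read the
    trending item, recompute the scores and write the result back.
    """
    likes = extract_likes(event)
    for like in likes:
        apply_like(like)
    return {"processed": len(likes)}
```

Processing the batch in a plain loop keeps the updates within one invocation sequential; whether two invocations can overlap still depends on how the queue and the trigger are configured.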

Pedro Arantes