
I have a large set of users in my project, around 50 million.

I need to create a playlist for each user every day. To do this, I'm currently using this method:

I have a column in my users table that holds the last time a playlist was created for that user; I call it last_playlist_created_at.

I run a query on the users table that selects the top 1000 users whose last_playlist_created_at is more than one day in the past, sorted in ascending order by last_playlist_created_at.

After that, I loop over the result and publish a message for each user to my message broker.
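For concreteness, here is a minimal sketch of this producer step (assuming PostgreSQL via psycopg2 and RabbitMQ via pika; the queue name `playlist_jobs` is illustrative):

```python
import psycopg2  # assumption: PostgreSQL; any SQL database would do
import pika      # assumption: RabbitMQ; any AMQP broker would do

BATCH_SIZE = 1000

def fetch_stale_users(conn):
    """Select the next batch of users whose playlist is more than a day old."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id FROM users
            WHERE last_playlist_created_at < NOW() - INTERVAL '1 day'
            ORDER BY last_playlist_created_at ASC
            LIMIT %s
            """,
            (BATCH_SIZE,),
        )
        return [row[0] for row in cur.fetchall()]

def dispatch(channel, user_ids):
    """Publish one message per user to the broker."""
    for user_id in user_ids:
        channel.basic_publish(exchange="", routing_key="playlist_jobs",
                              body=str(user_id))
```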

Behind the message broker, I run around 64 workers that process the messages (create a playlist for the user) and update last_playlist_created_at in the users table.
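Conceptually, each worker runs a consumer loop like the one below (again assuming pika; `create_playlist` and `update_last_playlist_created_at` are hypothetical helpers standing in for my actual logic):

```python
import pika  # assumption: RabbitMQ via pika, as in the sketch above

def create_playlist(user_id):
    ...  # the playlist-generation algorithm itself (out of scope here)

def update_last_playlist_created_at(user_id):
    ...  # e.g. UPDATE users SET last_playlist_created_at = NOW() WHERE id = %s

def handle_message(ch, method, properties, body):
    user_id = int(body)
    create_playlist(user_id)
    update_last_playlist_created_at(user_id)
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after both steps succeed

# Each of the ~64 workers runs this loop.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="playlist_jobs")
channel.basic_qos(prefetch_count=1)  # deliver one message at a time per worker
channel.basic_consume(queue="playlist_jobs", on_message_callback=handle_message)
channel.start_consuming()
```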

Once the message broker's queue is empty, I repeat these steps (in a while / do-while loop).


I think the processing side is good enough and can scale, but the method we use to create the message for each user is not scalable!

How should I dispatch such a large set of messages, one for each of my users?

Mahdi Youseftabar
  • With this many users (and I'm assuming this number will only increase), why don't you leverage systems like `kafka`, for example, and have separate producer and consumer logic tailored to your use case? – vish4071 Mar 17 '22 at 23:26
  • I'm not aware of your algorithm for creating playlists, but why don't you use a graph, for example neo4j, to hold a cloud of playlist items that lets you choose the best fit for each user based on their interests? Instead of making 50m playlists, you would only make one huge one and then choose a subset for each user. – Saeed Falsafin Mar 18 '22 at 09:05
  • The algorithm for creating a playlist is not important here ... the problem is running that algorithm for all the users! @SaeedFalsafin – Mahdi Youseftabar Mar 18 '22 at 10:11
  • I'm using a message broker in my system ... Kafka is a message broker! You can also use any AMQP server or ... @vish4071 – Mahdi Youseftabar Mar 18 '22 at 10:13
  • "I run a query on the users table ... sorted in ascending order by last_playlist_created_at" -> what is the trigger condition for doing this / when do you do this? – Phenomenal One Mar 18 '22 at 11:30
  • In a while(true) loop in a process @PhenomenalOne – Mahdi Youseftabar Mar 18 '22 at 12:57

1 Answer


OK, so my answer is based entirely on your comment where you mentioned that you use while(true) to check whether playlists need to be updated, which does not seem ideal.

Although this is a design question and there are multiple solutions, here's how I would solve it.

First up, think of updating the playlist for a user as a job.

Now, in your case this is a scheduled job, i.e. it runs once a day.

  1. So, use a scheduler to schedule each user's next job time.
  2. Write a scheduled-job handler that pushes due jobs to a message queue. This part exists to handle many jobs at the same time and lets you control the flow.
  3. Generate the playlist for the user based on the job, then create a schedule event for the next day.
  4. Persist the scheduled-job data to avoid race conditions (a rough sketch of these steps follows the list).
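Here is a rough sketch of how these steps could fit together, in Python (`db` and `queue` are hypothetical stand-ins for your persisted schedule store and broker client, and `fetch_due_jobs`, `reschedule`, and `publish` are illustrative names, not a real API):

```python
import time
from datetime import datetime, timedelta

def run_scheduler(db, queue):
    while True:
        now = datetime.utcnow()
        # Steps 1-2: pick up jobs whose scheduled time has arrived
        # and push each one to the message queue.
        for job in db.fetch_due_jobs(now):
            queue.publish("playlist_jobs", job.user_id)
            # Step 3: register the next schedule event, one day out.
            # Step 4: the schedule store persists this, avoiding races.
            db.reschedule(job, now + timedelta(days=1))
        time.sleep(60)  # poll the schedule store periodically instead of spinning
```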
Phenomenal One
  • I really liked your solution ... to check that I really understood it: (1) for the first time, I should dispatch the jobs myself, right? (2) if we lose our message broker data, do we have to manually dispatch the jobs for all users again? (3) if something goes wrong and the next event isn't registered, how can we detect that and recover the job? – Mahdi Youseftabar Mar 25 '22 at 19:03
  • Yes, for the first time, maybe with a script. For job control, use a strongly consistent DB that persists the last run time each day. If some events are getting missed, you can simply query the DB for the misses. – Phenomenal One Mar 26 '22 at 07:06
  • It would be great to add your comment to the answer :) – Mahdi Youseftabar Mar 26 '22 at 12:51