I'm looking for the best way to retrieve the most relevant results for each user.
I've simplified my models to UserProfile and Group, as shown below.
-Model Name: UserProfile
styles: ['a', 'b', 'f', 'r'] <- ('styles' is the field name)
-Group 1
styles: ['a', 'f']
-Group 2
styles: ['g', 'a', 'h']
...
-Group 1,000,000
styles: ['s', 'w', 'x']
(Let's say we have millions of Groups)
I want to sort and retrieve groups by how many styles they share with the user. In this case, 'Group 1' scores 2 because it shares the styles 'a' and 'f', and 'Group 2' scores 1 because it shares only the style 'a'.
We can't store the scores in our main database because they differ for every user: each user has different styles.
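To make the scoring rule concrete, here is a minimal sketch in plain Python. The function names `style_score` and `rank_groups` are made up for illustration, and a plain dict stands in for the Group table, so this only shows the intended result, not how it should be queried.

```python
def style_score(user_styles, group_styles):
    """Number of distinct styles shared by the user and the group."""
    return len(set(user_styles) & set(group_styles))


def rank_groups(user_styles, groups):
    """Return (group_name, score) pairs, highest score first.

    `groups` is a dict of name -> list of styles (a stand-in for the Group table).
    """
    scored = [(name, style_score(user_styles, styles))
              for name, styles in groups.items()]
    # sorted() is stable, so groups with equal scores keep their original order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


user_styles = ['a', 'b', 'f', 'r']
groups = {
    'Group 1': ['a', 'f'],
    'Group 2': ['g', 'a', 'h'],
    'Group 1,000,000': ['s', 'w', 'x'],
}
ranking = rank_groups(user_styles, groups)
```

With the example data above, 'Group 1' comes first with a score of 2, then 'Group 2' with 1, and the last group with 0.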
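- My Approach 1: rank the whole table every time a user makes a request (the code below is conceptual)
views.py
# Conceptual only: assumes group.styles and user.styles behave like lists.
groups = list(Group.objects.all())
for group in groups:
    # score = number of styles the group shares with the user
    group.style_count = len(set(group.styles) & set(user.styles))
# the score only exists in memory, so sort in Python, highest score first
list_view_output = sorted(groups, key=lambda g: g.style_count, reverse=True)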
- Approach 2: cache the ranking. Execute the query once, store the output (with the rank and the user ID) in Redis, an in-memory cache, and retrieve the results whenever that user requests them.
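Roughly, I imagine Approach 2 looking like the sketch below. A plain dict stands in for Redis so the snippet runs on its own, and the key layout `group_rank:<user_id>` and the helper names are hypothetical, not taken from existing code.

```python
import json

# A plain dict stands in for Redis here (assumption); with a real client the
# write and the read below would be a SET and a GET on the same string key.
fake_cache = {}


def cache_key(user_id):
    # Hypothetical key layout, chosen only for this sketch
    return f"group_rank:{user_id}"


def store_ranking(user_id, ranking):
    """Serialize [(group_id, score), ...] and store it under the user's key."""
    fake_cache[cache_key(user_id)] = json.dumps(ranking)


def load_ranking(user_id):
    """Return the cached ranking, or None on a cache miss."""
    raw = fake_cache.get(cache_key(user_id))
    if raw is None:
        return None
    # JSON has no tuples, so the pairs come back as lists; convert them back
    return [tuple(pair) for pair in json.loads(raw)]


# Example: user 42's ranking is group 1 (score 2), then group 2 (score 1)
store_ranking(42, [(1, 2), (2, 1)])
```

Storing one serialized list per user keeps the key count at one per user, but each value still holds an entry for every ranked group.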
Problems I have in mind:
- The query seems quite expensive: O(n) to iterate over the groups, times O(min(len(user.styles), len(group.styles))) for each intersection. How can I do better? Maybe I can do something at the model level?
- If we have a million groups and 1,000 users, I would need to store a billion rows in the Redis cache, and I definitely can't afford that (I think I can have 8 GB at most, or maybe a little more).
- Maybe I don't need to store every user's ranking in the cache, because some users have the same styles. Do you know of any AI approach for this?
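One direction I'm considering for the first and third problems, sketched below in plain Python: an inverted index from style to group IDs, so that only groups sharing at least one style with the user are scored, plus a cache key built from the style set rather than the user ID, so users with identical styles share one cached ranking. This is not what my code does today; all names (`build_style_index`, `rank_with_index`, `styles_cache_key`) are hypothetical, and the key format assumes style names never contain a comma.

```python
from collections import Counter, defaultdict


def build_style_index(groups):
    """Map each style to the set of group IDs that have it (an inverted index)."""
    index = defaultdict(set)
    for group_id, styles in groups.items():
        for style in set(styles):
            index[style].add(group_id)
    return index


def rank_with_index(user_styles, index, limit=None):
    """Score only the groups that share at least one style with the user."""
    scores = Counter()
    for style in set(user_styles):
        # .get() avoids creating empty entries for styles no group has
        for group_id in index.get(style, ()):
            scores[group_id] += 1
    # most_common() orders by score, highest first; limit=None returns all
    return scores.most_common(limit)


def styles_cache_key(user_styles):
    """Same set of styles -> same key, regardless of order or duplicates.

    Assumes style names contain no commas.
    """
    return "rank:" + ",".join(sorted(set(user_styles)))


groups = {1: ['a', 'f'], 2: ['g', 'a', 'h'], 1000000: ['s', 'w', 'x']}
index = build_style_index(groups)
top = rank_with_index(['a', 'b', 'f', 'r'], index)
```

Groups with a score of 0 never appear in the result, and passing a `limit` would keep only the top entries, which should also shrink what has to be cached.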
Also, could you give me any advice on how to build this better?
Thank you!