I am working on a way to find the most popular likes in my friend network. "Most popular in my friend network" is defined as "having the most number of likes by my friends".
Suppose each friend has an unique id and has a number of liked pages. So, given an array of such friends, I want to find the likes which is liked by the most number of friends, and also the friends who liked this thing. Essentially, I want to show something like "Your friend X, Y & Z likes this."
My first solution is to use a Map (for storing the reverse mapping: like->set)and a Priority Queue (for finding the top N). Here is my algorithm (using C++ STL):
map< like, set<friend> > like2friendsMap;
for each friend {
for each like {
like2friendsMap[like].insert(friend); //populate the map
}
}
priority_queue< pair<like, int> > pq;
for each like in like2friendsMap {
int count = like2friendsMap[like].size(); //no. of friends who like this or "popularity"
pq.push(like, count); //count is the priority
}
map< like, set<friend> > result
for i in 1 to N { //N is how many popular items I want
result = pq.top(); //gives me the element with highest priority (most popular like)
pq.pop();
}
As STL internally uses Red Black Tree for implementing map and min/max heap for priority queue, this approach seems pretty fast to me. But if I have 100s of friends and each have 100s of likes, the memory usage would be huge. Of course, instead of storing the whole objects, I should use friend id and like id for all calculations which would reduce memory usage a lot.
What other algorithm or data structure can be used to improve efficiency (increase speed, reduce memory)? For some reason, I can't store the list of friends against each like, it must be computed in run time. I am developing this using C++, so solutions using STL or boost would be even better.