
Let's say I have the following data about users:

User1: {location: "Topeka, KS", school: "University of Texas", interests: ["running"] }
User2: {location: "Austin, TX", school: "University of Texas", interests: ["knitting", "running"] }
User3: {location: "Topeka, KS", school: "University of Kansas", interests: ["kayaking"] }

Given this information, I'm writing a matching algorithm that pairs together the "best" users. There are a few criteria:

  1. Not all properties are weighted equally. Let's say "location" is weighted far more heavily than any other property. In the example above, even though Users 1 and 2 share two properties (school and "running"), the best match for User 1 is still User 3 because of the high weight of location.

  2. The algorithm should perform reasonably well at scale. This means I'd like to avoid comparing each user individually to every other user, which for N users is an O(N^2) operation. Ideally I'd like to develop some sort of "score" that I can generate for each user in isolation, since this involves looping through all users only once. Then I can find other users with similar scores and determine the best match based on that.

  3. The list of interests, locations, schools, etc. is not known ahead of time. The values are provided by an external API and could be any string.
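
To make the three criteria concrete, here is a minimal Python sketch of one possible approach, not a known or definitive algorithm. The weights in `WEIGHTS` are made up; only the ordering (location outweighs everything else) comes from the example above. Every property value is treated as an opaque string, and an inverted index is built in a single pass so that each user is scored only against users who share at least one value. The names `features`, `build_index`, and `best_match` are hypothetical.

```python
from collections import defaultdict

# Hypothetical weights: the only requirement taken from the example is that
# "location" outweighs the other properties combined.
WEIGHTS = {"location": 10.0, "school": 3.0, "interests": 1.0}

users = {
    "User1": {"location": "Topeka, KS", "school": "University of Texas", "interests": ["running"]},
    "User2": {"location": "Austin, TX", "school": "University of Texas", "interests": ["knitting", "running"]},
    "User3": {"location": "Topeka, KS", "school": "University of Kansas", "interests": ["kayaking"]},
}


def features(user):
    """Flatten a user record into a set of (property, value) pairs."""
    feats = set()
    for prop, value in user.items():
        values = value if isinstance(value, list) else [value]
        feats.update((prop, v) for v in values)
    return feats


def build_index(users):
    """Single pass over all users: map each (property, value) to its users."""
    index = defaultdict(set)
    for name, user in users.items():
        for feat in features(user):
            index[feat].add(name)
    return index


def best_match(name, users, index):
    """Score only the users that share at least one feature with `name`."""
    scores = defaultdict(float)
    for feat in features(users[name]):
        prop = feat[0]
        for other in index[feat]:
            if other != name:
                # Unknown properties fall back to a weight of 1.0 (an assumption).
                scores[other] += WEIGHTS.get(prop, 1.0)
    if not scores:
        return None  # no other user shares any value with this one
    return max(scores, key=scores.get)


index = build_index(users)
print(best_match("User1", users, index))
```

With the sample data, User 1 is matched to User 3: the shared location scores 10, while User 2 only reaches 3 + 1 = 4 for the shared school and interest. This sketch is still quadratic in the worst case, when a single value is shared by almost every user, and it picks each user's best candidate independently rather than producing a global one-to-one pairing.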

Is there any sort of known algorithm that optimizes pairing in this way?

Thanks!

user2490003
  • In your case, I would go with a clustering algorithm, setting different weights on your criteria. – Whitefret May 09 '16 at 07:52
  • Thanks! A quick look at the Wikipedia page tells me there are at least half a dozen *types* of clustering algorithms, each with multiple algorithmic implementations. Any thoughts on a more specific type of clustering algorithm I could investigate? Thanks again! https://en.wikipedia.org/wiki/Cluster_analysis – user2490003 May 09 '16 at 07:58
  • I don't really know which one will suit best because I have only used the k-means algorithm for that. It might be a good start. – Whitefret May 09 '16 at 08:13
  • I am looking for a solution to a similar problem. Did you find one? Could you share it, please? Thanks – Dilip Lilaramani Jun 25 '18 at 09:37

0 Answers