Let's say I have the following data about users:
User1: {location: "Topeka, KS", school: "University of Texas", interests: ["running"] }
User2: {location: "Austin, TX", school: "University of Texas", interests: ["knitting", "running"] }
User3: {location: "Topeka, KS", school: "University of Kansas", interests: ["kayaking"] }
Given this information, I'm writing a matching algorithm that pairs together the "best" users. There are a few criteria:
- Not all properties are weighted equally. Let's say "location" is weighted far more heavily than any other property. In the example above, even though Users 1 and 2 share two properties (school and the "running" interest), the best match for User 1 is still User 3 because of the high weight on location.
- The algorithm should be reasonably performant at scale, which means I'd like to avoid comparing each user individually to every other user; for N users that's an O(N^2) operation. Ideally I'd like to derive some sort of "score" for each user in isolation, since that only requires a single pass over all users. Then I can find other users with similar scores and pick the best match from those (see the sketch after this list).
- The list of interests, locations, schools, etc. is not known ahead of time. The values come from an external API and could be literally any string.
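
To make the second criterion concrete, here's a minimal Python sketch of the kind of one-pass scoring I have in mind. The `WEIGHTS` dict and the hash-based scoring are placeholders I made up to illustrate the shape of the solution, not something I think actually works; the hashing part is exactly where I get stuck, since similar strings don't produce similar scores.

```python
from typing import Any, Dict

# Hypothetical weights -- "location" dominates the other properties
WEIGHTS = {"location": 10.0, "school": 2.0, "interests": 1.0}

def score(user: Dict[str, Any]) -> float:
    """Collapse a user's properties into a single number in one pass."""
    total = 0.0
    for prop, weight in WEIGHTS.items():
        value = user.get(prop)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            # hash() turns an arbitrary string into a number, but similar
            # strings do NOT get similar numbers -- this is the broken part
            total += weight * (hash(v) % 1000)
    return total

users = [
    {"location": "Topeka, KS", "school": "University of Texas", "interests": ["running"]},
    {"location": "Austin, TX", "school": "University of Texas", "interests": ["knitting", "running"]},
    {"location": "Topeka, KS", "school": "University of Kansas", "interests": ["kayaking"]},
]

# One pass to score everyone (O(N)), then sort and pair up neighbors (O(N log N))
ranked = sorted(users, key=score)
pairs = list(zip(ranked[::2], ranked[1::2]))
```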
Is there any sort of known algorithm that optimizes pairing in this way?
Thanks!