14

I am working on an Android mobile app that needs to find matches based on interests and location. Many dating apps already do something similar; for example, Tinder matches on location, gender, age, etc.

I do not want to reinvent the wheel if this has already been done. I searched on Google, and some results suggested using a clustering algorithm for this: "Algorithm for clustering people with similar interests" and "User similarities algorithm".

Let's say I have user data in this JSON format:

User1: {location: "Delhi, India", interests: ["Jogging", "Travelling", "Praying"] }
User2: {location: "Noida, India", interests: ["Running", "Eating", "Praying"] }
User3: {location: "Bangalore, India", interests: ["Exercise", "Visiting new places", "Chanting"] }

I am writing a matching algorithm that should meet the following criteria:

  1. If User1 has an interest in "Jogging" and User2 has an interest in "Running", the two profiles should match, since jogging and running are both kinds of exercise. The results should also take location into account, with the nearest users ranked first.

  2. The algorithm, when running at scale, should be fairly performant. This means I'd like to avoid comparing each user individually to each other user. For N users this is an O(N^2) operation. Ideally, I'd like to develop some sort of "score" that I can generate for each user in isolation since this involves looping through all users only once. Then I can find other users with similar scores and determine the best match based off that.
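One way to avoid the all-pairs comparison is to normalize each user's interests in isolation and store them in an inverted index, so that a lookup only touches users who share at least one interest. The following is a minimal sketch under stated assumptions: the `CANONICAL` synonym table and the function names are hypothetical, written by hand for the three sample users, and nothing here is a Firebase API.

```python
from collections import defaultdict

# Hypothetical, hand-written synonym table (an assumption for illustration):
# it maps raw interest labels to a broader canonical category.
CANONICAL = {
    "jogging": "exercise",
    "running": "exercise",
    "exercise": "exercise",
    "travelling": "travel",
    "visiting new places": "travel",
    "praying": "spirituality",
    "chanting": "spirituality",
    "eating": "food",
}

def canonical_interests(interests):
    """Normalize one user's interests without looking at any other user."""
    normalized = (label.strip().lower() for label in interests)
    return {CANONICAL.get(label, label) for label in normalized}

def build_index(users):
    """Single pass over all users: canonical interest -> set of user ids."""
    index = defaultdict(set)
    for user_id, profile in users.items():
        for interest in canonical_interests(profile["interests"]):
            index[interest].add(user_id)
    return index

def candidates(user_id, users, index):
    """Rank other users by how many canonical interests they share."""
    shared = defaultdict(int)
    for interest in canonical_interests(users[user_id]["interests"]):
        for other in index[interest]:
            if other != user_id:
                shared[other] += 1
    # Most shared interests first; ties broken by user id for stable output.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))

users = {
    "User1": {"location": "Delhi, India", "interests": ["Jogging", "Travelling", "Praying"]},
    "User2": {"location": "Noida, India", "interests": ["Running", "Eating", "Praying"]},
    "User3": {"location": "Bangalore, India", "interests": ["Exercise", "Visiting new places", "Chanting"]},
}
index = build_index(users)
matches = candidates("User1", users, index)
```

With this table, User1 shares three canonical interests with User3 and two with User2. Location is deliberately left out here; it would have to be applied as a separate filter or tie-breaker.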

Can anyone suggest how I could implement this with firebase-cloud-function and firebase-database?

N Sharma

2 Answers

3

I think hard-coding similarity is the wrong approach. FYI, none of the major search engines rely on such mappings.

A better approach is to be more data-driven. Start with an ad hoc method and, once you have sufficient data, build machine learning models to rank matches. That way you do not have to assume anything.
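As one possible ad hoc starting point, before any model exists, matches could be ranked by the Jaccard overlap of the raw interest labels, which rewards exact matches only. This is a sketch, not a recommendation of a final method; the function name `jaccard` and the sample lists are assumptions taken from the question's example data.

```python
def jaccard(interests_a, interests_b):
    """Share of labels two users have in common: |A ∩ B| / |A ∪ B|."""
    a = {label.strip().lower() for label in interests_a}
    b = {label.strip().lower() for label in interests_b}
    if not a and not b:
        return 0.0  # two empty profiles: treat as no evidence of similarity
    return len(a & b) / len(a | b)

# Sample profiles from the question.
user1 = ["Jogging", "Travelling", "Praying"]
user2 = ["Running", "Eating", "Praying"]
user3 = ["Exercise", "Visiting new places", "Chanting"]

score_1_2 = jaccard(user1, user2)  # only "praying" is shared, out of 5 labels
score_1_3 = jaccard(user1, user3)  # no label is shared exactly
```

Scores like these can be logged alongside what users actually do with the suggested matches, which is the data a later ranking model would need.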

For the location, use some kind of radius (preferably one the user can set) and match people within that radius.
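The radius check can be sketched with the haversine great-circle distance. This assumes each profile stores latitude and longitude rather than a place name; the coordinates below are rough, hand-entered approximations for the three cities in the question, and `COORDS` and `within_radius` are hypothetical names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate coordinates (assumptions; the question only gives city names).
COORDS = {
    "User1": (28.61, 77.21),  # Delhi
    "User2": (28.54, 77.39),  # Noida
    "User3": (12.97, 77.59),  # Bangalore
}

def within_radius(user_id, radius_km):
    """Other users within radius_km of user_id, nearest first."""
    lat, lon = COORDS[user_id]
    nearby = []
    for other, (other_lat, other_lon) in COORDS.items():
        if other == user_id:
            continue
        distance = haversine_km(lat, lon, other_lat, other_lon)
        if distance <= radius_km:
            nearby.append((other, distance))
    return sorted(nearby, key=lambda pair: pair[1])

nearby = within_radius("User1", 50)
```

This loop still scans every user. At scale, a coarse spatial bucket such as a geohash prefix would typically be used to narrow the candidates before computing exact distances.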

ElKamina
  • Do you mean that I first need a database of profiles, start by matching randomly and by location, and then, once I have user data, start matching on the basis of interests? – N Sharma Apr 24 '17 at 05:58
  • Don't start completely random; use some broad heuristics. For example, give exact matches slight priority. Once you have enough data, abandon the heuristics and adopt a machine learning based model. – ElKamina Apr 24 '17 at 15:07
  • Got it. I am new to these machine learning based models. Do you have an idea how I should start in my case? Should I run a script on my server as a cron job that analyzes all profile data and records in the database which profiles match, e.g. a matches attribute on each profile holding the ids of the matching profiles? – N Sharma Apr 24 '17 at 15:10
3

First of all, I would get rid of the redundant features in your dataset: jogging and running could be one feature instead of two. After that, you can use the K-means algorithm to group the data in an unsupervised way. To learn more about K-means, see: https://www.coursera.org/learn/machine-learning/lecture/93VPG/k-means-algorithm
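To make the grouping step concrete, here is a small pure-Python sketch of K-means (Lloyd's algorithm) over one-hot interest vectors, written only to illustrate the idea; a library implementation would normally be used instead. The four profiles, the merged feature vocabulary, and the choice of initial centroids are all assumptions made up for this example.

```python
def one_hot(interests, vocabulary):
    """Encode a set of merged interest features as a 0/1 vector."""
    return [1.0 if term in interests else 0.0 for term in vocabulary]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, point in enumerate(points):
            distances = [squared_distance(point, c) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Merged features: "jogging" and "running" are already folded into "exercise".
VOCABULARY = ["exercise", "travel", "spirituality", "food"]

# Hypothetical profiles, invented for illustration.
profiles = {
    "A": {"exercise", "travel"},
    "B": {"exercise", "travel", "spirituality"},
    "C": {"food", "spirituality"},
    "D": {"food"},
}
names = sorted(profiles)
points = [one_hot(profiles[name], VOCABULARY) for name in names]

# Fixed initial centroids keep the run deterministic; real code would
# usually pick them randomly or with k-means++.
labels, centroids = kmeans(points, [points[0], points[3]])
clusters = dict(zip(names, labels))
```

Here A and B end up in one cluster and C and D in the other. Matching would then only compare users inside the same cluster, which keeps the candidate set small.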

Also, since you're building an online system, it has to improve itself every day. To learn a bit more about online learning, see: https://www.coursera.org/learn/machine-learning/lecture/ABO2q/online-learning

Stochastic gradient descent will also be helpful to know: https://www.coursera.org/learn/machine-learning/lecture/DoRHJ/stochastic-gradient-descent

These are conceptual videos; you do not need to implement anything yourself. You can always use a library like TensorFlow: https://www.tensorflow.org/

I know this looks a bit hard to understand, but you'll need this knowledge in order to build your own custom recommendation system.

Ankit Arora
  • By reducing the features I mean that, instead of treating jogging and running as two features, you can treat them as one when you're analyzing the data; I am not saying that you should remove them for users. – Ankit Arora Apr 26 '17 at 03:07
  • "you can use K-means algorithm to group data" - Do you have an example or code snippet that demonstrates the grouping? – N Sharma Apr 26 '17 at 05:02
  • I don't have anything handy of my own right now, but you can see an implementation in SciPy here: http://glowingpython.blogspot.in/2012/04/k-means-clustering-with-scipy.html – Ankit Arora Apr 26 '17 at 06:50