I am currently working on a concept for a matching algorithm based on a huge amount of data, and it's my first time doing something like this.

Here is the situation:

  • we've got X objects of type "House" with features like size, location, and so on
  • we have people looking for houses; their search criteria include size, location, and so on

=> we want to match houses to people based on their preferences (size, location, ..)

Which is the better approach?

1) Cluster all houses and check which cluster a person (who wants to buy) belongs to, i.e. match people to houses with the same feature values like size and location
2) Build a recommender, which would also require data on many people who bought houses in the past in our HDFS

Which technology stack should I use for the better approach?

I am currently thinking of: Hadoop/Hive (storage) - Sqoop (getting data into storage) - Mahout (analysis)

Your help is much appreciated! Thanks in advance!

Dennis Ruske

1 Answer

Given that you have no user purchase history yet to train a recommender on, I would suggest clustering the houses. Once you have consistent clusters, assign a class label to every cluster, reducing the problem to a classification one.
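
Here is a minimal sketch of that cluster-then-classify idea, written in Python with scikit-learn purely for illustration (the Mahout stack you mention provides k-means and classifier implementations that play the same roles on Hadoop). The feature layout and values are hypothetical:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Toy house data: [size_sqm, latitude, longitude] -- hypothetical values
houses = np.array([
    [120, 52.52, 13.40],
    [80,  52.50, 13.35],
    [200, 48.14, 11.58],
    [95,  48.13, 11.57],
])

scaler = StandardScaler()
X = scaler.fit_transform(houses)  # put features on a comparable scale

# Step 1: cluster the houses (k is chosen arbitrarily here; in practice
# you would evaluate several values of k, e.g. with silhouette scores)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Step 2: treat the cluster labels as classes and train a classifier,
# reducing the matching problem to classification
clf = RandomForestClassifier(random_state=0)
clf.fit(X, cluster_labels)
```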

With respect to the stack, it depends largely on personal preference and the hardware available to you.

shirowww
  • Thanks for your answer. Do you also have an idea of how to make it real-time? So if a user comes to the website, can I calculate the clusters in real time and tell him which houses match his needs? – Dennis Ruske Sep 23 '15 at 09:44
  • Once the class labels are assigned to the clusters, the problem is reduced to a classification task, as stated. You then only have to train a classifier model over your dataset (the clustering output will do as training data). For every arriving user, extract the same features used for the houses by asking for their preferences, then classify the user with your trained model (sketched below). – shirowww Sep 23 '15 at 11:13
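
A minimal sketch of that real-time step, continuing the earlier example (it reuses `scaler`, `clf`, `cluster_labels`, and `np` from there); the function name and preference values are hypothetical:

```python
def recommend_houses(preferred_size, preferred_lat, preferred_lon):
    # Turn the visitor's stated preferences into the same feature
    # vector used for houses, scaled with the same scaler
    user_vec = scaler.transform([[preferred_size, preferred_lat, preferred_lon]])
    # Classify the user into one of the house clusters
    cluster = clf.predict(user_vec)[0]
    # Return the indices of all houses in the user's cluster
    return np.where(cluster_labels == cluster)[0]

matching = recommend_houses(100, 52.51, 13.38)
```

Since the clustering and classifier training happen offline, the only work per request is one `predict` call plus a cluster lookup, which is cheap enough to run while the user waits.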