
I'm new to Mahout and still trying to figure things out.

I'm trying to run a KNN-based recommender on a Hadoop cluster (a distributed recommender) using Mahout 0.8. In 0.8 the KNN recommender is deprecated, but it is still usable (at least when I call it from Java code).

I have several questions:

  1. Is it true that there are basically two Mahout implementations: distributed (run from the command line) and non-distributed (run from a jar file)?

  2. Assuming (1) is correct, does Mahout support running a KNN-based recommender from the command line? Can someone point me in the right direction?

  3. Assuming (1) is wrong, how can I build a recommender in Java (I'm using Eclipse) that runs on a Hadoop cluster (distributed)?

Thanks!

Daniel W

1 Answer


KNN is being deprecated because it is being replaced by the item-based and user-based cooccurrence recommenders and the ALS-WR recommender, which are better, more modern approaches.

  1. Yes, but not all code has a CLI interface. For the most part, the CLI jobs in Mahout are Hadoop/distributed jobs that write their output as files in HDFS. These can also be run from jar files with your own code wrapping them, as you must do with the local/non-distributed/non-Hadoop versions, which have no CLI. The in-memory recommenders require you to pass in a user ID to get recommendations, so you have to write code to do that. The Hadoop versions do have a CLI, since they precalculate all recommendations for all users and write them to files; you'll probably insert those into your DB or serve them up some other way.
  2. No. To my knowledge, only the user-based, item-based, and ALS-WR recommenders are supported from the command line, and the CLI runs the Hadoop/distributed version of each. This can still work on a single machine, even using the local filesystem, since Hadoop can be set up that way.
  3. For the in-memory recommenders, just write your driver code and run it in Eclipse; since Hadoop is not involved, it works fine. If you want to use the Hadoop versions, set up Hadoop on your dev machine to run locally against the local filesystem, and everything works fine in Eclipse too. Once you have things debugged, move it to your Hadoop cluster. You can also debug remotely on the cluster, but that is another question altogether.
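To make point 3 concrete, here is a minimal sketch of an in-memory (Taste) user-based recommender driver using the Mahout 0.8 API. The input file name, the neighborhood size, and the user ID are placeholder assumptions; the CSV format is the standard `userID,itemID,preference` that `FileDataModel` expects:

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class InMemoryRecommenderExample {
  public static void main(String[] args) throws Exception {
    // prefs.csv holds one "userID,itemID,preference" triple per line
    // (the file name is a placeholder -- point it at your own data)
    DataModel model = new FileDataModel(new File("prefs.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // 10-nearest-neighbor neighborhood; tune the size for your data
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(10, similarity, model);
    Recommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);
    // This is the "you have to write code" part: you pass in a user ID
    // yourself and get back the top-N recommendations for that user.
    List<RecommendedItem> recs = recommender.recommend(1L, 5);
    for (RecommendedItem rec : recs) {
      System.out.println(rec.getItemID() + " : " + rec.getValue());
    }
  }
}
```

Everything here runs in the JVM with no Hadoop involved, which is why it works directly in Eclipse.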
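For point 2, the Hadoop item-based recommender is driven from the command line roughly like this. The paths and the similarity measure are placeholders; run `mahout recommenditembased --help` for the full option list:

```shell
# Distributed item-based RecommenderJob; input is userID,itemID[,preference]
# lines in HDFS, output is precalculated recs for every user.
mahout recommenditembased \
  --input /user/me/prefs.csv \
  --output /user/me/recs \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD \
  --numRecommendations 10 \
  --tempDir /user/me/tmp

# For point 3's local debugging: with MAHOUT_LOCAL set, the mahout script
# runs the job without a Hadoop cluster, against the local filesystem.
export MAHOUT_LOCAL=true
mahout recommenditembased --input prefs.csv --output recs \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD
```

Once the local run looks right, unset `MAHOUT_LOCAL` and point the same command at your cluster.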

The latest thing in Mahout recommenders is one that is trained in the background using Hadoop, with the output then indexed by Solr. You query Solr with items the user has expressed a preference for; there is no need to precalculate all recommendations for all users, since they are returned from a Solr query in near real time. This is in Mahout 1.0-SNAPSHOT's mahout/examples/, or here: https://github.com/pferrel/solr-recommender

BTW, this code is being integrated with Mahout 1.0 and moved to run on Spark instead of Hadoop, so even the training step will be much, much faster.

Update: I've clarified what can be run from the CLI above.

pferrel
  • Thanks for answering! I'm choosing KNN since I figure the algorithm is pretty easy to understand, so it will be a good place to start; definitely gonna check out cooccurrence & ALS-WR. I'm not familiar with "in-memory recommenders" — could you elaborate more on that? Also, it's a bit off topic, but do you know how to implement a user-based recommender on Hadoop (the distributed version)? – Daniel W May 11 '14 at 04:12
  • Some of the recommenders are implemented "in-memory" in the sense that they are data structures and classes you call from your app to return recs. The alternative implementations run on Hadoop as mapreduce jobs and produce files in HDFS. You would likely take those files and put them in a database for your app to query. BTW, if only because of the deprecation, I would avoid KNN. The item- and user-based recommenders are neighborhood methods too, just not exactly like KNN. – pferrel May 11 '14 at 22:33