I'm working on a project where I need to implement an article/news recommendation engine. I'm thinking of combining different methods (item-based CF, user-based CF, model-based CF) and have a question about which tool to use.

From my research, Lucene is definitely the tool for text processing, but for the recommendation part it's not so clear. If I want to implement item-based CF on articles using text similarity:

- I've seen case studies using Mahout but also Solr (http://fr.slideshare.net/lucenerevolution/building-a-realtime-solrpowered-recommendation-engine). As this is really close to a search problem, I would think that Solr may be the better fit. Am I right?
- What are the differences in processing time between the two tools? (I think Mahout is more batch-oriented and Solr more real-time.)
- Can I get a text distance directly from Lucene? (It's not really clear to me what the added value of Solr is compared to Lucene.)
- For more advanced methods (model-based, using matrix factorization), I would use Mahout, but is there any SVD-like function in Solr for concept/tag discovery?

Thanks for your help.

Taryn
Alex

1 Answer

It depends on your requirements. If you only need offline recommendations, Mahout is a good fit. For online recommendations, I am still testing it myself. I have tested Lucene and Mahout together, and they work fine. As for Solr, I'm not so sure; all I know is that it uses Lucene as its core, so the heavy lifting is still done by Lucene. In my case, I combined Mahout and Lucene in my Java program: Lucene does the preprocessing and primitive similarity calculations, and the result is then sent to Mahout for further analysis.
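To make "primitive similarity calculations" concrete, here is a minimal sketch of a TF-IDF cosine similarity between articles. It is plain Java and deliberately does not use the Lucene or Mahout APIs; it is not the code from the program described above. The class name `TextSimilaritySketch`, the regex tokenizer (a stand-in for a Lucene analyzer), and the smoothed IDF formula are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class: not part of Lucene, Solr, or Mahout.
public class TextSimilaritySketch {

    // Lowercase and split on non-alphanumeric characters
    // (an assumed stand-in for a real analyzer).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Build one sparse TF-IDF vector per document.
    // Assumed weighting: idf = ln((1 + N) / (1 + df)) + 1.
    static List<Map<String, Double>> tfidf(List<String> docs) {
        List<List<String>> tokenized = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>();
        for (String d : docs) {
            List<String> tokens = tokenize(d);
            tokenized.add(tokens);
            // Count each term once per document for the document frequency.
            for (String term : new HashSet<>(tokens)) df.merge(term, 1, Integer::sum);
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> tokens : tokenized) {
            Map<String, Double> vec = new HashMap<>();
            for (String term : tokens) vec.merge(term, 1.0, Double::sum);
            for (Map.Entry<String, Double> e : vec.entrySet()) {
                double idf = Math.log((1.0 + docs.size()) / (1.0 + df.get(e.getKey()))) + 1.0;
                e.setValue(e.getValue() * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    // Cosine similarity between two sparse vectors; 0.0 if either is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        List<String> articles = Arrays.asList(
                "Solr search engine",
                "Lucene search library",
                "football match results");
        List<Map<String, Double>> vectors = tfidf(articles);
        // Print the pairwise item-item similarity matrix.
        for (int i = 0; i < vectors.size(); i++) {
            for (int j = 0; j < vectors.size(); j++) {
                System.out.printf("%.3f ", cosine(vectors.get(i), vectors.get(j)));
            }
            System.out.println();
        }
    }
}
```

A pairwise matrix like this is the kind of item-item similarity that could then be handed to a separate recommendation step. In a real setup the term statistics would more likely come from the index itself rather than being recomputed in memory.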

ikel
  • Do you mind sharing your code? I am also trying to cluster a bunch of news articles saved in a Lucene index. What kind of clustering from Mahout did you use? How well does it scale? Thank you! – nilsi Apr 01 '14 at 14:08