
How would I build a recommendation engine with Amazon Redshift as the data source? Is there a Mahout data model for Amazon Redshift or S3?

Sravan K Reddy

1 Answer


Mahout uses Hadoop to read data, except for a few supported NoSQL and JDBC databases. Hadoop in turn can use S3. You'd have to configure Hadoop to use the S3 filesystem, and then Mahout should work fine reading from and writing to S3.

Redshift is a data warehousing solution based on Postgres that supports JDBC/ODBC. Mahout 0.9 supports data models stored in JDBC-compliant stores, so although I haven't tried it, Redshift should work.

The Mahout v1 recommenders run on Spark, and input and output are text by default. All I/O goes through Hadoop, so S3 data is fine for input. The models created are also text, and they need to be indexed and queried with a search engine like Solr or Elasticsearch. You can fairly easily write a reader to get data from any other store (such as Redshift), but you might not want to save the models in a data warehouse, since they need to be indexed by Solr and served with fast, search-engine-style retrieval.
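
As a rough illustration of the "write a reader" idea, here is a minimal Python sketch that dumps interaction rows from a SQL store into delimited text. Everything named here is hypothetical: the `purchases` table, its columns, and the `export_interactions` helper. `sqlite3` only stands in for the warehouse so the sketch runs as-is; for Redshift you would swap in a PostgreSQL-compatible DB-API driver. The two-column `user,item` layout is an assumption about what the Spark job accepts, so check it against the Mahout documentation before relying on it.

```python
import csv
import io
import sqlite3


def export_interactions(conn, query, out):
    """Write (user, item) rows from a DB-API connection as delimited text.

    Hypothetical helper: the query must return exactly two columns.
    Returns the number of rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    cur = conn.cursor()
    cur.execute(query)
    count = 0
    for user_id, item_id in cur:
        writer.writerow([user_id, item_id])
        count += 1
    return count


# Stand-in data: sqlite3 plays the role of the warehouse here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id TEXT, item_id TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("u1", "iphone"), ("u1", "ipad"), ("u2", "ipad")],
)

buffer = io.StringIO()
rows_written = export_interactions(
    conn,
    "SELECT user_id, item_id FROM purchases ORDER BY user_id, item_id",
    buffer,
)
exported_text = buffer.getvalue()
```

In practice `out` would be a file that you then copy to S3 (or wherever Hadoop reads from), rather than an in-memory buffer.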

pferrel
  • Thanks pferrel. My data is in SQL Server, and we built a data warehouse on Redshift. We are not using Hadoop or EMR clusters. We want to use Mahout for real-time (or near-real-time) recommendations. Please give me some ideas; my tech stack is Redshift, SQL Server, S3, Mahout, and R. – Sravan K Reddy Nov 14 '14 at 06:38
  • For NRT recommendations, use Mahout v1 (which uses Spark) plus Solr or Elasticsearch. At runtime, the query sent to Solr is the current user's history of preferences; this is very fast and returns an ordered list of items to recommend. The model you index in Solr is created by Mahout v1's "spark-itemsimilarity". References here: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html Presentations and blog posts here: https://occamsmachete.com/ml Short book on the subject here: https://www.mapr.com/practical-machine-learning – pferrel Nov 15 '14 at 16:55
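
To make the "query with the user's history" idea from the last comment concrete, here is a toy, in-memory Python sketch. It is not Mahout or Solr code: `build_indicators` and `recommend` are hypothetical names, raw co-occurrence counts stand in for the log-likelihood filtering that spark-itemsimilarity applies, and a simple match count stands in for a search engine's relevance scoring. It only shows the shape of the approach: each item is stored with a list of "indicator" items, and the user's history is matched against those lists.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_indicators(interactions, top_n=2):
    """Map each item to the items it most often co-occurs with across users."""
    by_user = defaultdict(set)
    for user, item in interactions:
        by_user[user].add(item)

    counts = defaultdict(Counter)
    for items in by_user.values():
        # Count every ordered pair of distinct items seen for the same user.
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1

    return {
        item: [
            other
            for other, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        ]
        for item, c in counts.items()
    }


def recommend(indicators, history, limit=3):
    """Rank unseen items by how many history items appear in their indicator list."""
    seen = set(history)
    scores = Counter()
    for item, similar in indicators.items():
        if item in seen:
            continue
        scores[item] = len(seen.intersection(similar))
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked if score > 0][:limit]


# Made-up interaction data for illustration only.
interactions = [
    ("u1", "iphone"), ("u1", "ipad"),
    ("u2", "ipad"), ("u2", "nexus"),
    ("u3", "iphone"), ("u3", "ipad"), ("u3", "galaxy"),
]
indicators = build_indicators(interactions)
recs = recommend(indicators, ["iphone"])
```

In the real setup the `indicators` mapping would be the text output of the Spark job, indexed as one document per item, and `recommend` would be a single search query whose terms are the user's history.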