Questions tagged [mahout]

Apache Mahout is an open-source, scalable machine learning project.

This topic covers questions related to Apache Mahout, a scalable machine learning project written in Java and largely based on Apache Hadoop, with implementations of algorithms for:

1171 questions
5
votes
2 answers

How to install mahout using ambari server

I have created a Hadoop cluster with 3 slaves and 1 master using Ambari server (Hortonworks). I need to install Mahout 0.9 on the master machine in order to run Mahout jobs on the cluster. How do I do that? I am using Ambari 1.5.1 and HDP 2.1.
DesirePRG
  • 6,122
  • 15
  • 69
  • 114
5
votes
1 answer

Scaling up Cassandra and Mahout with Hadoop

Is it possible to configure Mahout to retrieve input data from a Cassandra cluster while executing a Recommender Job over Hadoop? I have found some resources on this topic - see…
Dumitru P.
  • 63
  • 4
5
votes
1 answer

How to vectorize text file in mahout?

I have a text file with labels and tweets: positive,I love this car negative,I hate this book positive,Good product. I need to convert each line into a vector. If I use the seq2sparse command, the whole document gets converted…
user2175315
  • 69
  • 1
  • 5
5
votes
2 answers

Run cvb in mahout 0.8

The current Mahout 0.8-SNAPSHOT includes a Collapsed Variational Bayes (cvb) version for Topic Modeling and removed the Latent Dirichlet Allocation (lda) approach, because cvb can be parallelized much better. Unfortunately there is only documentation…
JoKnopp
  • 171
  • 1
  • 9
5
votes
5 answers

Mahout: CSV to vector and running the program

I'm analysing the k-means algorithm with Mahout. I'm going to run some tests, observe performance, and do some statistics with the results I get. I can't figure out the way to run my own program within Mahout. However, the command-line interface…
Eduard Gamonal
  • 8,023
  • 5
  • 41
  • 46
5
votes
2 answers

Using the Apache Mahout machine learning libraries

I've been working with the Apache Mahout machine learning libraries in my free time a bit over the past few weeks. I'm curious to hear about how others are using these libraries.
Brian
  • 1,337
  • 9
  • 25
5
votes
2 answers

Most effective similarity measure for list-ranked items

We're trying to find similarity between items (and later users) where the items are ranked in various lists by users (think Rob, Barry and Dick in High Fidelity). A lower index in a given list implies a higher rating. I suppose a standard approach…
Tom Martin
  • 2,498
  • 3
  • 29
  • 37
5
votes
1 answer

Converting CSV to SequenceFile

I have a CSV file which I would like to convert to a SequenceFile, which I would ultimately use to create NamedVectors to use in a clustering job. I've been using the seqdirectory command to try to make a SequenceFile, and then fed that output into…
Alison
  • 99
  • 2
  • 7
5
votes
1 answer

Recently SVM implementation was added into Mahout & I am planning to use SVM. Anyone tried it yet?

Are there any new developments around SVM (Support Vector Machines) in Mahout (machine learning with Hadoop)? An SVM implementation was recently added to Mahout, and I am planning to use it. Has anyone tried it yet? Very little information…
rashid
  • 264
  • 2
  • 13
4
votes
2 answers

Why Mahout doesn't yet have Linear Regression

I am just starting to work with Mahout, and one thing which perplexed me a great deal is the lack of Linear Regression. Even logistic regression, which is much harder, is supported to some degree with research going on, but it's all silent on linear…
KalEl
  • 8,978
  • 13
  • 47
  • 56
4
votes
3 answers

Recommendation engine for Alfresco?

I want to implement Amazon-like recommendations in Alfresco. For instance, if an employee searches for "financial reports 2007", the search UI will show related documents, for instance documents that were downloaded/viewed by users who previously…
Nicolas Raoul
  • 58,567
  • 58
  • 222
  • 373
4
votes
1 answer

Continuous collaborative filtering using Mahout

I am in the process of evaluating Mahout as a collaborative-filtering-recommendation engine. So far it looks great. We have almost 20M boolean recommendations from 12M different users. According to Mahout's wiki and a few threads by Sean Owen, one…
Daniel Zohar
  • 1,962
  • 2
  • 13
  • 19
4
votes
4 answers

How to solve product recommendation issue like: User __bought__ XXX also __viewed__ YYY

I am currently learning about recommender systems and have learned something about collaborative filtering (User CF, Item CF). It is obvious how to use these algorithms to solve problems like: 1) User bought XXX also bought YYY 2) User viewed XXX also viewed YYY My…
James.Xu
  • 8,249
  • 5
  • 25
  • 36
4
votes
3 answers

Datasets for Apache Mahout

I am looking for datasets that can be used to implement a recommendation-system use case with Apache Mahout. I know only of the MovieLens data sets from the GroupLens Research group. Does anyone know of any other datasets that can be used for recommendation system…
Harsha Hulageri
  • 2,810
  • 1
  • 22
  • 23
4
votes
1 answer

Can I use logistic regression algorithm to predict an ETA for a given task based on historical data?

Can I use a logistic regression algorithm to predict an ETA for a given task based on historical data? I have some tasks that take a variable amount of time depending on a few factors like task type, weather, season, time of request, etc. Today we capture…