Questions tagged [mahout]

Apache Mahout is an open source, scalable machine learning project.

This topic covers questions related to Apache Mahout, a scalable machine learning project written in Java and largely based on Apache Hadoop, with implementations of algorithms for:

1,171 questions
6 votes, 2 answers

Mahout for sentiment analysis

Using Mahout, I am able to classify the sentiment of data, but I am stuck on the confusion matrix. I am using the Mahout 0.7 naive Bayes algorithm to classify the sentiment of tweets. I use the trainnb and testnb naive Bayes classifiers to train the classifier…
Vanitha Reddy
6 votes, 1 answer

Mahout: how to make recommendations for new users

We plan to use Mahout for a movie recommendation system, and we also plan to use SVD for model building. When a new user arrives, we will require them to rate a certain number of movies (say 10). The problem is that, in order to make a…
Ahmet Yılmaz
6 votes, 2 answers

Web page recommender system

I am trying to build a recommender system that recommends webpages to a user based on their actions (Google searches, clicks, and optional explicit ratings of webpages). To give an idea, the way Google News does it, it displays news articles from the…
Rajan Soni
6 votes, 1 answer

Computing user similarity using Mahout MapReduce

I am using Mahout clustering and I have large clusters, each with around 100k users, and each user has 5 features. In the next step I need to compute the Pearson correlation to find the similarity between the users in each cluster. Currently I have a Python…
learner
6 votes, 1 answer

Is there any seqFileDir option for "clusterdump" in the latest Apache Mahout library?

I am trying to do a "clusterdump" on the output of the Mahout k-means clustering example (the synthetic_control example), but I am getting the following error: > ~/MAHOUT/trunk/bin/mahout clusterdump --seqFileDir clusters-10-final --pointsDir…
6 votes, 3 answers

How to directly send the output of a mapper-reducer to another mapper-reducer without saving the output to HDFS

Problem solved eventually; check my solution at the bottom. Recently I have been trying to run the recommender example in chapter 6 (listings 6.1 to 6.4) of Mahout in Action, but I encountered a problem. I have googled around but I can't find the…
dotcomXY
5 votes, 3 answers

How to maintain data entry id in Mahout K-means clustering

I'm using Mahout to run k-means clustering, and I have a problem identifying the data entries when clustering. For example, I have 100 data entries:
id data
0 0.1 0.2 0.3 0.4
1 0.2 0.3 0.4 0.5
... ...
100 0.2 0.4 0.4…
Breakinen
5 votes, 1 answer

Using Neo4j as Mahout Datastore

Has anyone successfully integrated Apache Mahout with Neo4j as a datastore? If so, how much work was involved, and what was the performance like?
DeejUK
5 votes, 4 answers

Interpreting output from Mahout clusterdumper

I ran a clustering test on crawled pages (more than 25K docs; a personal data set). I ran a clusterdump:
$MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-1/ --output clusteranalyze.txt
The output after running the cluster dumper is…
lucif
5 votes, 1 answer

Can Apache Mahout ALS work without Hadoop?

I tried using ParallelALSFactorizationJob, but it crashes here:
Exception in thread "main" java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    …
Stepan Yakovenko
5 votes, 4 answers

How to fix the error "Test source folder 'src/test/java' in project must have an output folder that is not also used for main source"?

I have installed Mahout and imported the existing Maven project apache-mahout-distribution-0.12.2 into Eclipse IDE for Java Developers, but it does not build and I have not been able to fix these problems. Please share your knowledge!
Mulualem M
5 votes, 4 answers

Just how much Java does one need to use Hadoop and Mahout effectively?

I'm a PHP developer. Let's just get that out of the way now. But Hadoop, and Mahout in particular, have piqued my interest. I'm ready to take the dive into Java in order to use them. So from people experienced enough to know, just how much Java…
Josh Smith
5 votes, 3 answers

Using Apache Mahout with Ruby on Rails

I have a Ruby on Rails application, and I would like to implement recommendations in it. I came to know about Apache Mahout through Stack Overflow. Now, if I want to use Mahout, what do I have to do? Since it is a Java…
felix
5 votes, 2 answers

How to do multi-label classification in Apache Spark

I want to do multi-label text classification on a big data set, and it seems that big data machine learning tools such as Apache Mahout or Spark MLlib do not currently support that. I would like to know whether anyone has done a multi-label…
HHH
5 votes, 3 answers

How to use Mahout in a Windows environment?

I am trying to use Mahout in an application running on Windows. I want to build clusters from a Lucene index using k-means. As soon as I have to create sequence files (creating vectors from a Lucene index), I get a Hadoop exception, since Hadoop…
user249210