Questions tagged [mahout]

Apache Mahout is an open-source, scalable machine learning project.

This topic covers questions related to Apache Mahout, a scalable machine learning project written in Java and largely based on Apache Hadoop, with implementations of algorithms for:

1171 questions
7 votes, 4 answers

Getting an IOException when running sample code from “Mahout in Action” on mahout-0.6

I'm learning Mahout and reading "Mahout in Action". When I tried to run the chapter 7 sample code, SimpleKMeansClustering.java, an exception popped up: Exception in thread "main" java.io.IOException: wrong value class: 0.0: null is not class…
Nebulach
7 votes, 1 answer

Deploying Mahout on a Hadoop cluster

I want to run Mahout's K-Means example on a Hadoop cluster of 5 machines. Which Mahout jar files do I need to keep on all the nodes for K-Means to be executed in a distributed manner?
Venkiram
7 votes, 5 answers

Py4J has bigger overhead than Jython and JPype

After searching for an option to run Java code from a Django application (Python), I found out that Py4J is the best option for me. I tried Jython, JPype and Python subprocess, and each of them has certain limitations: Jython. My app runs in…
HIP_HOP
7 votes, 2 answers

Why is Maven trying to compile my code as -source 1.3?

I get this error when running mvn -e package on Ubuntu 12.04: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project HadoopSkeleton: Compilation failure: Compilation failure: [ERROR]…
Jesvin Jose
6 votes, 5 answers

Amazon EC2 vs PiCloud

We are students trying to handle a data set of about 140 million records and run a few machine learning algorithms. We are new to cloud solutions and Mahout implementations. Currently we have set them up in a PostgreSQL database…
Sree Aurovindh
6 votes, 1 answer

How to find programmatically whether a URL belongs to an ecommerce or non-ecommerce website?

In a project there is a module that takes a URL and determines whether it belongs to an "Ecommerce" or "NON-Ecommerce" website. I have tried the following approaches: Using Apache Mahout, Classification: URL ---> Take HTML dump ---> pre-process the HTML dump by…
geek
6 votes, 2 answers

How to perform k-means clustering in Mahout with vector data stored as CSV?

I have a file containing vectors of data, where each row contains a comma-separated list of values. I am wondering how to perform k-means clustering on this data using Mahout. The example provided in the wiki mentions creating SequenceFiles, but…
Dan Q
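As a rough illustration of the first step in the question above, the sketch below parses one comma-separated row into a plain array of doubles. The class and method names (CsvVectorParser, parseRow) are hypothetical. The Mahout-specific steps, wrapping the values in a Mahout vector and writing them to a SequenceFile, are deliberately not shown, since they depend on the Mahout and Hadoop versions in use.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Mahout): turns one CSV row into a double[]. */
final class CsvVectorParser {

    /** Parses a row such as "1.0, 2.5,3" into {1.0, 2.5, 3.0}. */
    static double[] parseRow(String row) {
        String[] fields = row.split(",");
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            // trim() tolerates spaces around the separators;
            // a non-numeric field raises NumberFormatException
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String[] rows = {"1.0, 2.5,3", "4,5,6"};
        for (String row : rows) {
            System.out.println(Arrays.toString(parseRow(row)));
        }
    }
}
```

Each resulting array would then still have to be converted into whatever vector type the clustering job expects.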
6 votes, 3 answers

Mahout: reading a custom input file

I was playing with Mahout and found that the FileDataModel accepts data in the format userId,itemId,pref (long,long,Double). I have some data in the format String,long,double. What is the best/easiest method to work with this…
learner
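One possible approach to the question above, sketched under assumptions: rewrite each String,long,double row so that the string user ID is replaced by a numeric ID, giving the long,long,double layout the question says FileDataModel accepts. The class StringIdMapper and its methods are hypothetical names for illustration and are not part of Mahout. IDs are simply assigned in first-seen order and kept in memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: rewrites "String,long,double" rows as "long,long,double". */
final class StringIdMapper {

    // Remembers the numeric ID assigned to each string user ID, in first-seen order.
    private final Map<String, Long> ids = new LinkedHashMap<>();

    /** Returns the numeric ID for a user, assigning the next free one if unseen. */
    long idFor(String user) {
        Long id = ids.get(user);
        if (id == null) {
            id = (long) ids.size();
            ids.put(user, id);
        }
        return id;
    }

    /** Converts one "user,itemId,pref" line to "numericUser,itemId,pref". */
    String convertLine(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        long item = Long.parseLong(parts[1].trim());
        double pref = Double.parseDouble(parts[2].trim());
        return idFor(parts[0].trim()) + "," + item + "," + pref;
    }

    public static void main(String[] args) {
        StringIdMapper mapper = new StringIdMapper();
        String[] lines = {"alice,101,4.5", "bob,101,3.0", "alice,102,2.0"};
        for (String line : lines) {
            System.out.println(mapper.convertLine(line));
        }
    }
}
```

The mapping would need to be persisted somewhere if recommendations later have to be translated back to the original string IDs.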
6 votes, 2 answers

Recommendations using R with SimpleDB or BigQuery or using PHP with SimpleDB

I am currently working on a system that generates product recommendations like those on Amazon: "People who bought this also bought this..." Current scenario: extract the client's Google Analytics data and insert it into a database. On the website…
samridhi
6 votes, 4 answers

In practice, how many machines do you need in order for Hadoop / MapReduce / Mahout to speed up very parallelizable computations?

I need to do some heavy machine learning computations. I have a small number of machines idle on a LAN. How many machines would I need in order for distributing my computations using Hadoop / MapReduce / Mahout to be significantly faster than…
user334911
6 votes, 3 answers

Mahout runs out of heap space

I am running NaiveBayes on a set of tweets using Mahout. There are two files, one 100 MB and one 300 MB. I changed JAVA_HEAP_MAX to JAVA_HEAP_MAX=-Xmx2000m (earlier it was 1000). But even then, Mahout ran for a few hours (2, to be precise) before it…
crazyaboutliv
6 votes, 5 answers

Choice of Machine Learning Platform

I have a data set of users and their loan repayment metrics (how long they took, how many installments etc). Now I want to analyse a user's past loan history and say, "If we loan them X they will most likely repay over Y installments, over Z…
Ngetha
6 votes, 4 answers

Spark - How to use the trained recommender model in production?

I am using Spark to build a recommendation system prototype. After going through some tutorials, I have been able to train a MatrixFactorizationModel from my data. However, the model trained by Spark MLlib is just a Serializable. How can I use this…
shihpeng
6 votes, 3 answers

Is Hadoop 2.2.0 compatible with Mahout 0.8?

I have a Hadoop 2.2.0 cluster running with Mahout 0.8. Are they compatible? Because whenever I run this command: bin/mahout recommenditembased --input mydata.dat --usersFile user.dat --numRecommendations 2 --output output/ --similarityClassname…
fsi
6 votes, 1 answer

How to classify images using Apache Mahout?

How do I perform image classification with Mahout? How do I convert an image to a form that is accepted by Mahout's classification algorithms? Is there any starter code to begin with? Please share some starter tutorials. Is Mahout a good library for…
Suren Raju