Questions tagged [mahout]

Apache Mahout is an open source, scalable machine learning project.

This topic covers questions related to Apache Mahout, a scalable machine learning project written in Java and largely based on Apache Hadoop, with implementations of algorithms for:

1171 questions
12 votes, 6 answers

Is it worth purchasing Mahout in Action to get up to speed with Mahout, or are there other better sources?

I'm currently a very casual user of Apache Mahout, and I'm considering purchasing the book Mahout in Action. Unfortunately, I'm having a really hard time getting an idea of how worthwhile this book is -- and seeing as it's a Manning Early Access…
Gabriel Reid
12 votes, 2 answers

Mahout plugin for Ruby on Rails

I want to use Apache Mahout in my Ruby on Rails project to implement recommendations and collaborative filtering. In particular, my requirements are: suggesting related tags; suggesting related articles; based on user's preferences prompt him…
12 votes, 3 answers

Mahout Lucene document clustering how-to?

I've read that I can create Mahout vectors from a Lucene index, which can then be used to apply the Mahout clustering algorithms. http://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text I would like to apply K-means clustering…
maiky
12 votes, 1 answer

Clustering -- Sparse vector and Dense Vector

For clustering, Mahout input needs to be in vector form. There are two types of vector implementation: one is Sparse Vector and the other is Dense Vector. What is the difference between the two? What are the usage scenarios for Sparse and Dense?
user1261215
12 votes, 6 answers

Recommendation Engines for Java applications

I was wondering if there is any open source recommendation engine available. It should make suggestions the way Amazon and Netflix do. I have heard of a framework called Apache Mahout - Taste. I am trying it next week. It would be great if you can share…
SomaSekhar
11 votes, 2 answers

Classify data using Apache Mahout

I am trying to solve a simple classification problem. The problem: I have a set of texts and I have to categorize them based on their content. Solution using Mahout: I understood that I have to convert the input to a sequence file to generate…
vkris
11 votes, 3 answers

Production architecture for big data real time machine learning application?

I'm starting to learn about big data, with a strong focus on predictive analysis, and I have a case study I would like to implement: I have a dataset of server health information that is polled every 5 seconds. I want to show the data…
AlfaTeK
10 votes, 2 answers

How to acquire or generate test data for a recommender system

I'm currently researching recommender systems and would like to know how other researchers acquire or generate test data to evaluate the systems' performance.
Ullr
10 votes, 4 answers

K-means with really large matrix

I have to perform k-means clustering on a really huge matrix (about 300,000 x 100,000 values, which is more than 100 GB). I want to know if I can use R or Weka to do this. My computer is a multiprocessor with 8 GB of RAM and hundreds of GB…
Delphine
9 votes, 1 answer

User profiling with Mahout from categorized user behavior

I'm trying to cluster and classify users with Mahout. At the moment I am in the planning phase, my mind is a jumble of ideas, and since I'm relatively new to the area I'm stuck at the data formatting. Let's say we have two data tables (big…
Turcia
8 votes, 2 answers

Full utilization of all cores in Hadoop pseudo-distributed mode

I am running a task in pseudo-distributed mode on my 4-core laptop. How can I ensure that all cores are effectively used? Currently my job tracker shows that only one job is executing at a time. Does that mean only one core is used? The following…
Nemo
8 votes, 2 answers

Apache Mahout Performance Issues

I have been working with Mahout for the past few days, trying to create a recommendation engine. The project I'm working on has the following data: 12M users, 2M items, and 18M user-item boolean recommendations. I am now experimenting with 1/3 of the full…
Daniel Zohar
8 votes, 3 answers

Using Mahout and Hadoop

I am a newbie trying to understand how Mahout and Hadoop can be used for collaborative filtering. I have a single-node Cassandra setup, and I want to fetch data from Cassandra. Where can I find clear installation steps for Hadoop first and then…
deggi
8 votes, 2 answers

How to do an item-based recommendation in Spark MLlib?

In Mahout, there is support for item-based recommendation using the API method: ItemBasedRecommender.mostSimilarItems(int productid, int maxResults, Rescorer rescorer) But in Spark MLlib, it appears that the APIs within ALS can fetch recommended…
8 votes, 4 answers

How do I build/run this simple Mahout program without getting exceptions?

I would like to run this code, which I found in Mahout in Action: package org.help; import java.io.IOException; import java.util.ArrayList; import java.util.List; import org.apache.hadoop.conf.Configuration; import…
dranxo