Highest Voted 'apache-spark-mllib' Questions

1 vote · 2 answers

What is the reason for compilation errors if different versions of Spark-core and Spark-mllib are mixed?

I am copying and pasting the exact Spark MLlib LDA example from here: http://spark.apache.org/docs/latest/mllib-clustering.html#latent-dirichlet-allocation-lda I am trying the Scala sample code, but I am having the following errors when I am trying…

scala apache-spark apache-spark-mllib

asked Sep 17 '15 at 09:42

Rami · 8,044 reputation · badges: 18, 66, 108

1 vote · 1 answer

Spark python MLlib Random Forest out of memory error

I am running Spark 1.2.1 to train a random forest. I have a master and a worker node set up on AWS EC2, with a total of 96GB of memory allocated to Spark. I played with various parallelism values (32, 64, 6400) and I keep getting the same error. According…

python apache-spark random-forest apache-spark-mllib

asked Sep 16 '15 at 21:49

foboi1122 · 1,727 reputation · badges: 4, 19, 36

1 vote · 1 answer

Spark PySpark MLlib model: when the prediction RDD is generated using map, it throws an exception on collect()

I am using Spark 1.2.0 (I cannot upgrade, as I don't have control over it). I am using MLlib to build a model: points = labels.zip(tfidf).map(lambda t: LabeledPoint(t[0], t[1])) train_data, test_data = points.randomSplit([0.6, 0.4], 17) iterations =…

apache-spark pyspark rdd apache-spark-mllib

asked Aug 26 '15 at 08:12

Abhishek · 33 reputation · badges: 6

1 vote · 1 answer

Broadcast Random-Forest Model in PySpark

I'm using Spark 1.4.1. When I try to broadcast a random forest model, I get this error: Traceback (most recent call last): File "/gpfs/haifa/home/d/a/davidbi/codeBook/Nice.py", line 358, in broadModel = sc.broadcast(model) File…

apache-spark pyspark broadcast random-forest apache-spark-mllib

asked Aug 18 '15 at 14:14

dadibiton · 13 reputation · badges: 4

1 vote · 1 answer

How to serialize Apache Spark's MatrixFactorizationModel in Java

I am building a recommendation system using Apache Spark MLlib and Java. Once the MatrixFactorizationModel is built, I serialize it as a Java object, but when I retrieve the model, I get the following exception. Caused by:…

java serialization apache-spark apache-spark-mllib collaborative-filtering

asked Aug 17 '15 at 10:57

madawa · 496 reputation · badges: 6, 24

1 vote · 1 answer

Spark 1.4 Mllib LDA topicDistributions() returning wrong number of documents

I have an LDA model running on a corpus of 12,054 documents with a vocabulary of 9,681 words and 60 clusters. I am trying to get the topic distribution over documents by calling .topicDistributions() or .javaTopicDistributions(). Both of these…

cluster-analysis apache-spark-mllib lda apache-spark-1.4

asked Aug 14 '15 at 21:57

smannan · 136 reputation · badges: 1, 1, 4

1 vote · 1 answer

How to save a Spark LogisticRegressionModel model?

I am using MLlib 1.1.0 and struggling to find a way to save my model. The docs do not seem to support such a feature in this version. Any ideas?

apache-spark apache-spark-mllib

asked Aug 12 '15 at 10:26

user706838 · 5,132 reputation · badges: 14, 54, 78

1 vote · 1 answer

MLlib and PySpark bag-of-words model for multiple text documents

I have 150 text documents (training set) that I would like to convert to a "bag of words" representation with PySpark and the MLlib "feature" package. From there I have another 150 text documents (testing set) that I would like to also convert each…

python apache-spark pyspark apache-spark-mllib tf-idf

asked Aug 10 '15 at 18:18

Matt · 1,196 reputation · badges: 1, 9, 22

1 vote · 3 answers

How to extract data from Spark MLlib FP Growth model

I am running the Spark master and slaves in standalone mode, with no Hadoop cluster. Using spark-shell, I can quickly build an FPGrowthModel with my data. Once the model is built, I am trying to look at the patterns and frequencies captured within the model,…

hadoop apache-spark apache-spark-mllib

asked Aug 10 '15 at 17:26

emily · 198 reputation · badges: 2, 10

1 vote · 0 answers

Java heap space error while running the SVMWithSGD algorithm in MLlib

My fnl2 dataset is of the form: scala> fnl2.first() res4: org.apache.spark.mllib.regression.LabeledPoint =…

apache-spark apache-spark-mllib

asked Aug 10 '15 at 09:27

user706838 · 5,132 reputation · badges: 14, 54, 78

1 vote · 1 answer

How to convert an RDD to Vector in Spark

I have an RDD of type RDD[(Int,Double)] in which the first element of each pair is the index and the second is the value, and I'd like to convert this RDD to a Vector to use for classification. Could someone help me with that? I have the following…

scala apache-spark apache-spark-mllib

asked Aug 05 '15 at 18:28

HHH · 6,085 reputation · badges: 20, 92, 164

1 vote · 1 answer

How to convert Mahout VectorWritable to Vector in Spark

I have a VectorWritable (org.apache.mahout.math.VectorWritable) that comes from a sequence file generated by Mahout, and I would like to convert it into the Vector (org.apache.spark.mllib.linalg.Vectors) type in Spark. How can I do that in Scala?

scala apache-spark mahout apache-spark-mllib

asked Jul 31 '15 at 00:05

HHH · 6,085 reputation · badges: 20, 92, 164

1 vote · 1 answer

"main" java.lang.ClassCastException: [Lscala.Tuple2; cannot be cast to scala.Tuple2 in Spark MLlib LDA

I'm using the Spark 1.3.0 (Scala 2.10.X) MLlib LDA algorithm with the Spark Java API. I have the following issue when I try to read the document-topic distribution from the LDA model at runtime. "main" java.lang.ClassCastException: [Lscala.Tuple2; cannot…

java scala apache-spark apache-spark-mllib lda

asked Jul 29 '15 at 11:00

Jay · 63 reputation · badges: 8

1 vote · 1 answer

Issue with Zeppelin on a Spark-Cassandra system: ClassNotFoundException

I have recently started working with Zeppelin on top of a Spark-Cassandra cluster (master + 3 workers) to run simple machine learning algorithms using the MLlib library. Here are the libraries that I loaded to…

apache-spark cassandra classloader apache-spark-mllib

asked Jul 28 '15 at 09:55

Med3 · 11 reputation · badges: 4

1 vote · 1 answer

Distributed BlockMatrix out of Spark Matrices

How do I make a distributed BlockMatrix out of Matrices (of the same size)? For example, let A and B be two 2-by-2 mllib.linalg.Matrices as follows: import org.apache.spark.mllib.linalg.{Matrix, Matrices} import…

scala apache-spark apache-spark-mllib

asked Jul 27 '15 at 07:48

Ehsan M. Kermani · 912 reputation · badges: 2, 12, 26
