Highest Voted 'apache-spark-mllib' Questions

1 vote, 0 answers

java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream with Spark on local mode

I have used Spark before in yarn-cluster mode and it's been good so far. However, I wanted to run it in "local" mode, so I created a simple Scala app, added Spark as a dependency via Maven and then tried to run the app like a normal application. However,…

scala maven hadoop apache-spark apache-spark-mllib

asked Jul 22 '15 at 23:18

MV23


1 vote, 2 answers

Can a trained classification model be stored in Apache Spark?

I'm going to train a naive Bayes classifier on a bunch of training documents using Apache Spark (or Mahout in Hadoop). I'd like to use this model when I receive new documents to classify. I wonder whether there is any possibility to store the…

apache-spark mahout apache-spark-mllib

asked Jul 09 '15 at 21:51

HHH

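A minimal sketch of the persistence idea, assuming the trained parameters can be pulled out as plain arrays (the `NaiveBayesParams` alias and the `ModelPersistence` object are hypothetical). As far as I know, Spark 1.3 and later also expose `NaiveBayesModel.save(sc, path)` and `NaiveBayesModel.load(sc, path)` directly; the code below only shows a library-agnostic round trip through Java serialization.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object ModelPersistence {
  // Hypothetical container for trained naive Bayes parameters:
  // (class log-priors, per-class feature log-likelihoods).
  type NaiveBayesParams = (Array[Double], Array[Array[Double]])

  // Serialize any java.io.Serializable value into a byte array.
  def toBytes(value: java.io.Serializable): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.toByteArray
  }

  // Deserialize bytes produced by toBytes; the caller states the expected type.
  def fromBytes[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

In practice the bytes would be written to a local file or HDFS after training and read back when new documents arrive.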

1 vote, 1 answer

Difference between spark Vectors and scala immutable Vector?

I am writing a project for Spark 1.4 in Scala and am currently deciding between converting my initial input data into spark.mllib.linalg.Vectors and scala.immutable.Vector that I later want to work with in my algorithm. Could someone briefly explain the…

scala hadoop apache-spark apache-spark-mllib

asked Jul 06 '15 at 21:15

Sasha


1 vote, 0 answers

How to find probabilities for the predicted classes in Spark MLlib Classifiers?

Spark MLlib provides several algorithms for classification, such as Random Forests and Logistic Regression. Examples of classifier training and class prediction are straightforward. Yet it is not clear what classifier API to use to get probability…

apache-spark classification probability apache-spark-mllib

asked Jul 06 '15 at 18:23

zork

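For binary logistic regression, the per-class probability is the logistic (sigmoid) function of the model's margin: P(label = 1 | x) = 1 / (1 + exp(-(w · x + b))). In the 1.x API, as far as I know, calling `clearThreshold()` on a binary `LogisticRegressionModel` makes `predict` return this score instead of a 0/1 label. The sketch below computes it by hand from a weight vector and intercept; the `LogisticProbability` object is hypothetical.

```scala
object LogisticProbability {
  // Margin of a linear model: w . x + intercept.
  def margin(weights: Array[Double], intercept: Double, features: Array[Double]): Double = {
    require(weights.length == features.length, "dimension mismatch")
    weights.indices.map(i => weights(i) * features(i)).sum + intercept
  }

  // Logistic (sigmoid) link: turns the margin into P(label = 1 | x).
  def probability(weights: Array[Double], intercept: Double, features: Array[Double]): Double =
    1.0 / (1.0 + math.exp(-margin(weights, intercept, features)))
}
```

A margin of 0 maps to exactly 0.5, which is why 0.5 is the natural default decision threshold.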

1 vote, 2 answers

Writing output of the Principal Components Analysis to text file

I have performed a Principal Component Analysis on a matrix I previously loaded with sc.textFile. The output being an org.apache.spark.mllib.linalg.Matrix, I then converted it to an RDD[Vector[Double]] with: import java.io.PrintWriter I did: val…

scala apache-spark apache-spark-mllib

asked Jul 06 '15 at 15:11

fricadelle

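One detail that often trips this up: a local `org.apache.spark.mllib.linalg.DenseMatrix` stores its values in column-major order, so `toArray` does not return the rows one after another. The hypothetical helper below turns such an array into one comma-separated line per row, which could then be written out with a `PrintWriter`.

```scala
object MatrixText {
  // Convert a column-major value array (element (i, j) at index i + j * numRows)
  // into one delimited text line per row.
  def rowsAsLines(numRows: Int, numCols: Int, columnMajor: Array[Double], sep: String = ","): Seq[String] = {
    require(columnMajor.length == numRows * numCols, "size mismatch")
    (0 until numRows).map { i =>
      (0 until numCols).map(j => columnMajor(i + j * numRows)).mkString(sep)
    }
  }
}
```

For an `RDD` of MLlib vectors, the distributed equivalent is typically `rdd.map(_.toArray.mkString(",")).saveAsTextFile(path)`, which writes one part file per partition.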

1 vote, 1 answer

How to get the probability per instance in classifications models in spark.mllib

I'm using spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD} and spark.mllib.tree.RandomForest for classification. Using these packages I produce classification models. However, these models only predict a specific class per…

apache-spark random-forest logistic-regression apache-spark-mllib

asked Jul 05 '15 at 14:42

Tal

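For the random forest, one rough per-instance score is the fraction of trees that vote for each class. In Spark 1.x, `RandomForestModel` exposes its trees as `trees`, so per-tree predictions can be collected with something like `model.trees.map(_.predict(features))`; the hypothetical `ForestVotes` helper below only does the counting. A vote fraction is not a calibrated probability.

```scala
object ForestVotes {
  // Given each tree's predicted label for one instance, return the
  // fraction of trees voting for each label.
  def voteFractions(treePredictions: Seq[Double]): Map[Double, Double] = {
    require(treePredictions.nonEmpty, "no trees")
    val n = treePredictions.size.toDouble
    treePredictions.groupBy(identity).map { case (label, votes) => label -> votes.size / n }
  }
}
```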

1 vote, 1 answer

Registering kmeans model as UDF

Hi, I am trying to use a Spark KMeans model to predict the cluster number. But when I register it and use it in SQL, it gives me a java.lang.reflect.InvocationTargetException: def findCluster(s:String):Int={ model.predict(feautarize(s)) } I am…

apache-spark apache-spark-sql spark-streaming apache-spark-mllib

asked Jun 29 '15 at 09:23

Vishnu Subramanian

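KMeans assigns a point to the cluster whose center is nearest in Euclidean distance. One way to keep a SQL UDF self-contained is to copy `model.clusterCenters` into plain arrays and register a function that depends only on them (for example via `sqlContext.udf.register`); whether this avoids the `InvocationTargetException` above is an assumption. The `NearestCenter` helper is hypothetical.

```scala
object NearestCenter {
  // Squared Euclidean distance; the square root is unnecessary for comparisons.
  private def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    a.indices.map { i => val d = a(i) - b(i); d * d }.sum
  }

  // Index of the closest center: the cluster id a k-means model would assign.
  def predict(centers: Array[Array[Double]], point: Array[Double]): Int = {
    require(centers.nonEmpty, "no centers")
    centers.indices.minBy(i => squaredDistance(centers(i), point))
  }
}
```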

1 vote, 1 answer

Sequentially updating columns of a Matrix RDD

I'm having philosophical issues with RDDs used in mllib.linalg. In numerical linear algebra one wants to use mutable data structure but since in Spark everything (RDDs) is immutable, I'd like to know if there's a way around this, specifically for…

scala matrix apache-spark apache-spark-mllib scala-breeze

asked Jun 27 '15 at 17:58

Ehsan M. Kermani


1 vote, 1 answer

How to use MLlib in Spark SQL

Lately, I've been learning about Spark SQL, and I want to know: is there any way to use MLlib in Spark SQL, like: select mllib_methodname(some column) from tablename; here, the "mllib_methodname" method is an MLlib method. Is there some…

apache-spark apache-spark-sql apache-spark-mllib

asked Jun 25 '15 at 14:47

ldl


1 vote, 1 answer

SPARK ERROR:executor.CoarseGrainedExecutorBackend: Driver while executing KMeans clustering on Spark on an EC2 cluster

I am trying to submit a job (KMeans clustering in Python) to my Spark standalone cluster on EC2. It has 18 nodes. I am using the latest version of Spark (1.4.0). I submit the job from the master using: SPARK_WORKER_INSTANCES=30 SPARK_WORKER_CORES=4…

amazon-ec2 apache-spark pyspark rdd apache-spark-mllib

asked Jun 17 '15 at 23:48

Rogers Jefrey L


1 vote, 0 answers

TypeError: Incorrect padding while running Kmeans on Spark Mllib (spark 1.4.0)

I am trying to run k-means clustering on a large dataset using Spark. I get the following error after k-means converges. Following are the logs: 15/06/17 14:47:44 INFO KMeans: Run 0 finished in 10 iterations 15/06/17 14:47:44 INFO KMeans:…

apache-spark pyspark apache-spark-mllib

asked Jun 17 '15 at 15:07

Rogers Jefrey L


1 vote, 2 answers

How to get Spark MLlib RandomForestModel.predict response as text value YES/NO?

I am trying to implement RandomForest algorithm using Apache Spark MLLib. I have the dataset in the CSV format with the following…

java apache-spark machine-learning apache-spark-mllib

asked Jun 03 '15 at 14:38

Umesh K

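`RandomForestModel.predict` returns a `Double`, so text labels have to be mapped back by hand. Assuming the YES/NO column was encoded as 1.0/0.0 before training (a hypothetical encoding; use whatever mapping was actually applied), a small lookup is enough. The `LabelNames` object is hypothetical.

```scala
object LabelNames {
  // Hypothetical encoding applied before training: YES -> 1.0, NO -> 0.0.
  private val names: Map[Double, String] = Map(1.0 -> "YES", 0.0 -> "NO")

  // Map a Double prediction back to its text label; unknown values are an error.
  def decode(prediction: Double): String =
    names.getOrElse(prediction, throw new IllegalArgumentException(s"unexpected label $prediction"))
}
```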

1 vote, 1 answer

Collaborative filtering with implicit feedback: how to set preferences?

I have a dataset with only two fields, itemId and productid. I would like to try Mahout ALS or MLlib for implicit feedback. Is the best approach to create the preference column in the dataset with all 1's? Reading the Koren paper (Collaborative Filtering…

recommendation-engine apache-spark-mllib mahout-recommender collaborative-filtering

asked Jun 01 '15 at 14:25

user3468556

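In the implicit-feedback formulation by Hu, Koren and Volinsky, the raw observation r (for example, a count of interactions) is turned into a binary preference p = 1 if r > 0, plus a confidence c = 1 + αr, so filling the column with all 1s discards the counts that drive the confidence. A minimal sketch, assuming the two fields are (user id, item id) and repeated rows mean repeated interactions; the `ImplicitFeedback` helper is hypothetical. The counts would become the rating field passed to `ALS.trainImplicit`, whose `alpha` parameter plays the role of α.

```scala
object ImplicitFeedback {
  // Collapse raw (user, item) events into one implicit observation per pair:
  // the number of times that pair occurs in the data.
  def interactionCounts(events: Seq[(Int, Int)]): Map[(Int, Int), Double] =
    events.groupBy(identity).map { case (pair, occurrences) => pair -> occurrences.size.toDouble }

  // Binary preference: any observed interaction counts as a positive signal.
  def preference(r: Double): Double = if (r > 0) 1.0 else 0.0

  // Confidence weight c = 1 + alpha * r.
  def confidence(r: Double, alpha: Double): Double = 1.0 + alpha * r
}
```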

1 vote, 2 answers

How to multiply an IndexedRowMatrix by another IndexedRowMatrix in spark mllib

I am learning how to use Spark MLlib to calculate the product of two matrices. Now my code is like this: val…

apache-spark rdd apache-spark-mllib

asked May 20 '15 at 06:01

赵祥宇

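In Spark 1.x, `IndexedRowMatrix.multiply` accepts only a local `Matrix`; for two distributed matrices a common route is calling `toBlockMatrix()` on both and then `BlockMatrix.multiply` (available since 1.3, as far as I know). The arithmetic being distributed is shown below on plain maps: row i of C = AB is the sum over k of A(i, k) times row k of B. `IndexedRowProduct` is a hypothetical helper, and rows missing from B are treated as zero.

```scala
object IndexedRowProduct {
  // Rows keyed by row index, mirroring IndexedRowMatrix's (index, vector) pairs.
  type Rows = Map[Long, Array[Double]]

  // C = A * B with A of size m x n and B of size n x p:
  // row i of C is the sum over k of A(i, k) * (row k of B).
  def multiply(a: Rows, b: Rows, p: Int): Rows =
    a.map { case (i, aRow) =>
      val cRow = new Array[Double](p)
      for (k <- aRow.indices; bRow <- b.get(k.toLong); j <- 0 until p)
        cRow(j) += aRow(k) * bRow(j)
      i -> cRow
    }
}
```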

1 vote, 2 answers

Spark MLlib libsvm issues with data

I'm trying the demo at http://spark.apache.org/docs/1.2.1/mllib-linear-methods.html using the Scala version of the example. When I ran the demo it worked fine, but when I changed the data and the training step, it just errors with 15/05/05 16:32:02 INFO…

scala libsvm apache-spark-mllib

asked May 05 '15 at 08:54

user4311101
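One frequent cause of errors after swapping in new data (not necessarily the one here) is the file format: `MLUtils.loadLibSVMFile` expects lines of the form `label index:value ...` with one-based indices in ascending order. The hypothetical `LibSvmLine` checker below parses a single line and rejects zero-based or out-of-order indices.

```scala
object LibSvmLine {
  // Parse "label idx:value idx:value ..." where indices are one-based.
  // Returns the label and zero-based (index, value) pairs.
  def parse(line: String): (Double, Seq[(Int, Double)]) = {
    val tokens = line.trim.split("\\s+").toSeq
    val label = tokens.head.toDouble
    val features = tokens.tail.map { token =>
      token.split(":") match {
        case Array(idx, value) => (idx.toInt - 1, value.toDouble)
        case _ => throw new IllegalArgumentException(s"malformed feature '$token'")
      }
    }
    val indices = features.map(_._1)
    require(indices.forall(_ >= 0), "indices must be one-based (start at 1)")
    require(indices.zip(indices.drop(1)).forall { case (x, y) => x < y }, "indices must be strictly ascending")
    (label, features)
  }
}
```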
