Questions tagged [apache-spark-mllib]

MLlib is a low-level, RDD-based machine learning library for Apache Spark.

2241 questions
1
vote
0 answers

java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream with Spark on local mode

I have used Spark before in yarn-cluster mode and it's been good so far. However, I wanted to run it in "local" mode, so I created a simple Scala app, added Spark as a dependency via Maven, and then tried to run the app like a normal application. However,…
MV23
  • 285
  • 5
  • 17
1
vote
2 answers

Can a trained classification model be stored in Apache Spark?

I'm going to train a naive Bayes classifier on a bunch of training documents using Apache Spark (or Mahout in Hadoop). I'd like to use this model when I receive new documents to classify. I would like to know whether there is any possibility to store the…
HHH
  • 6,085
  • 20
  • 92
  • 164
1
vote
1 answer

Difference between Spark Vectors and Scala immutable Vector?

I am writing a project for Spark 1.4 in Scala and am currently torn between converting my initial input data into spark.mllib.linalg.Vectors or scala.immutable.Vector, which I later want to work with in my algorithm. Could someone briefly explain the…
Sasha
  • 109
  • 2
  • 12
1
vote
0 answers

How to find probabilities for the predicted classes in Spark MLlib Classifiers?

Spark MLlib provides several algorithms for classification, such as Random Forests and Logistic Regression. Examples of classifier training and class prediction are straightforward. Yet it is not clear which classifier API to use to get probability…
zork
  • 2,085
  • 6
  • 32
  • 48
1
vote
2 answers

Writing output of the Principal Components Analysis to text file

I have performed a Principal Component Analysis on a matrix I previously loaded with sc.textFile. The output being an org.apache.spark.mllib.linalg.Matrix, I then converted it to an RDD[Vector[Double]]. With import java.io.PrintWriter I did: val…
fricadelle
  • 511
  • 1
  • 8
  • 26
1
vote
1 answer

How to get the probability per instance in classification models in spark.mllib

I'm using spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD} and spark.mllib.tree.RandomForest for classification. Using these packages I produce classification models. However, these models only predict a specific class per…
1
vote
1 answer

Registering kmeans model as UDF

Hi, I am trying to use a Spark KMeans model to predict the cluster number. But when I register it and use it in SQL, it gives me a java.lang.reflect.InvocationTargetException: def findCluster(s:String):Int={ model.predict(feautarize(s)) } I am…
1
vote
1 answer

Sequentially updating columns of a Matrix RDD

I'm having philosophical issues with the RDDs used in mllib.linalg. In numerical linear algebra one wants to use mutable data structures, but since everything in Spark (RDDs) is immutable, I'd like to know if there's a way around this, specifically for…
1
vote
1 answer

How to use MLlib in Spark SQL

Lately, I've been learning about Spark SQL, and I want to know whether there is any way to use MLlib in Spark SQL, like: select mllib_methodname(some column) from tablename; here, "mllib_methodname" is an MLlib method. Is there some…
ldl
  • 156
  • 3
  • 12
1
vote
1 answer

SPARK ERROR:executor.CoarseGrainedExecutorBackend: Driver while executing KMeans Clustering on Spark on EC2 cluster

I am trying to submit a job (KMeans clustering in Python) to my Spark standalone cluster on EC2. It has 18 nodes. I am using the latest version of Spark (1.4.0). I submit the job from the master using: SPARK_WORKER_INSTANCES=30 SPARK_WORKER_CORES=4…
1
vote
0 answers

TypeError: Incorrect padding while running KMeans on Spark MLlib (Spark 1.4.0)

I am trying to run k-means clustering on a large dataset using Spark. I get the following error after k-means converges. Following are the logs: 15/06/17 14:47:44 INFO KMeans: Run 0 finished in 10 iterations 15/06/17 14:47:44 INFO KMeans:…
Rogers Jefrey L
  • 256
  • 2
  • 5
  • 15
1
vote
2 answers

How to get Spark MLlib RandomForestModel.predict response as text value YES/NO?

I am trying to implement the RandomForest algorithm using Apache Spark MLlib. I have the dataset in CSV format with the following…
Umesh K
  • 13,436
  • 25
  • 87
  • 129
1
vote
1 answer

Collaborative filtering with implicit feedback: how to set preferences?

I have a dataset with only two fields, itemId and productid. I would like to try Mahout ALS or MLlib for implicit feedback. Is the best approach to create the preference column in the dataset with all 1's? Reading the Koren paper (Collaborative Filtering…
1
vote
2 answers

How to multiply an IndexedRowMatrix by another IndexedRowMatrix in Spark MLlib

I am learning how to use Spark MLlib to calculate the product of two matrices. Now my code is like this: val…
赵祥宇
  • 497
  • 3
  • 9
  • 19
1
vote
2 answers

Spark MLlib libsvm issues with data

I'm trying the demo at http://spark.apache.org/docs/1.2.1/mllib-linear-methods.html, using the Scala version of the example. The demo ran fine, but when I changed the data and the training step, it failed with 15/05/05 16:32:02 INFO…
user4311101