Questions tagged [apache-spark-mllib]

MLlib is a low-level, RDD-based machine learning library for Apache Spark.

2241 questions
1 vote, 1 answer

Why do I get a type error in model.predictOnValues when I try the official example of Streaming Kmeans Clustering of Apache Spark?

I'm trying the Streaming Clustering example code at the end of the official guide, but I get a type error. Here is my code: import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.SparkConf import…
Emre Sevinç
  • 8,211
  • 14
  • 64
  • 105
1 vote, 1 answer

Spark MLlib predict error with map

I have a linear regression model, model, and a set of LabeledPoint values, regPoints. I am able to predict the first sample: scala> model.predict(regPoints.first.features) 15/02/12 16:17:56 INFO SparkContext: Starting job: first at :61 15/02/12…
Donbeo
  • 17,067
  • 37
  • 114
  • 188
1 vote, 1 answer

How to use Apache Spark ALS (alternating-least-squares) algorithm with limited Rating values

I am trying to use ALS, but currently my data is limited to information about what each user bought. So I tried feeding Apache Spark's ALS with Ratings equal to 1 (one) when user X bought item Y (and that is the only information I provided to that…
1 vote, 1 answer

Train Spark k-means with Mahout vectors

I have some Mahout vectors in HDFS in sequence file format. Is it possible to use the same vectors in some way to train a KMeans model in Spark? I could just convert the existing Mahout vectors into Spark vectors (mllib) but I'd like to avoid…
1 vote, 1 answer

Apache Spark MLlib naive Bayes LabeledPoint usage

I want to use Spark MLlib naive Bayes to process (train and test) data like this: Male,Suspicion of Alcohol,Weekday,12am-4am,75,30-39 so that I can test for the labels Male / Female / Unknown. I want to create a LabeledPoint so that this data can be run…
Mike Frampton
  • 53
  • 1
  • 4
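
For readers wondering what the encoding step in the question above might look like: a LabeledPoint pairs a numeric label with a numeric feature vector, so categorical fields have to be turned into numbers first. The following is a minimal plain-Python sketch of one-hot encoding, not Spark code. The helper names `fit_vocab` and `encode` are hypothetical, the second sample row is invented for illustration, and every non-label column is assumed to be categorical.

```python
def fit_vocab(rows, label_col=0):
    """Assign an index to each label and a one-hot slot to each (column, value)."""
    labels, slots = {}, {}
    for row in rows:
        labels.setdefault(row[label_col], float(len(labels)))
        for col, value in enumerate(row):
            if col != label_col:
                slots.setdefault((col, value), len(slots))
    return labels, slots


def encode(row, labels, slots, label_col=0):
    """Return (label, features): a float label and a 0/1 feature list."""
    features = [0.0] * len(slots)
    for col, value in enumerate(row):
        if col != label_col:
            features[slots[(col, value)]] = 1.0
    return labels[row[label_col]], features


rows = [line.split(",") for line in [
    "Male,Suspicion of Alcohol,Weekday,12am-4am,75,30-39",       # from the question
    "Female,Moving Traffic Violation,Weekend,4am-8am,20,20-29",  # invented sample
]]
labels, slots = fit_vocab(rows)
label, features = encode(rows[0], labels, slots)
```

Each resulting pair could then be wrapped in a LabeledPoint. One-hot encoding is chosen here because a naive Bayes model that treats feature values as counts would misread arbitrary category indices as magnitudes.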
1 vote, 2 answers

How to configure kernel selection and loss function for Support Vector Machines in Spark MLlib

I have installed Spark on AWS Elastic MapReduce (EMR) and have been running SVM using the packages in MLlib, but there are no options for choosing model-building parameters such as kernel selection and cost of misclassification (like in e1071…
1 vote, 3 answers

MLlib collaborative filtering to generate Top N recommendations

I was looking for a way to generate top-N recommendations for all users using MLlib's ALS matrix factorization, but was unsuccessful. Can anybody tell me whether such a method exists?
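
As a rough illustration of what top-N recommendation for all users amounts to once a matrix factorization has been trained: each user and each item has a factor vector, the predicted score is their dot product, and the N highest-scoring items per user are kept. The sketch below is plain Python with made-up factor values. The function name `top_n` and the dictionaries are hypothetical, and it does not call any Spark API.

```python
import heapq


def top_n(user_factors, item_factors, n):
    """For every user, rank items by dot(user vector, item vector), keep the best n."""
    recs = {}
    for user, u in user_factors.items():
        scored = (
            (sum(a * b for a, b in zip(u, v)), item)
            for item, v in item_factors.items()
        )
        recs[user] = [item for _, item in heapq.nlargest(n, scored)]
    return recs


# Made-up factors for two users and three items (illustration only).
user_factors = {1: [1.0, 0.0], 2: [0.0, 1.0]}
item_factors = {10: [0.9, 0.1], 20: [0.2, 0.8], 30: [0.5, 0.5]}

recommendations = top_n(user_factors, item_factors, n=2)
```

This brute-force version scores every user against every item, so it is only practical for small factor tables. On a cluster the same idea would need a distributed join or blocked matrix product.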
1 vote, 1 answer

How to fit Spark's classifier in parallel?

I have a strange problem. I'm trying to train a multiclass SVM classifier like this: JavaPairRDD, SVMModel> jp = scmap.mapToPair(new PairFunction, RDD>,Tuple2,…
dimson
  • 783
  • 2
  • 10
  • 21
1 vote, 2 answers

Run KMeans with fixed seed

I want to run MLlib's KMeans algorithm (Apache Spark) with reproducible results. Is it possible to run it with a fixed seed, and if so, how?
learning_spark
  • 669
  • 1
  • 8
  • 19
1 vote, 1 answer

MLlib: Saving and loading a model

I'm using LinearRegressionWithSGD and then I save the model weights and intercept. The weights file has this format: 1.20455 0.1356 0.000456. The intercept is 0 since I am using train without setting the intercept, so it can be ignored for the…
user3803714
  • 5,269
  • 10
  • 42
  • 61
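
A linear regression model is fully described by its weight vector and intercept, so saved weights can be reloaded and applied with a dot product. The following plain-Python sketch assumes the weights are stored as whitespace-separated numbers, as in the excerpt above. The function names and the feature values are hypothetical, and no Spark API is involved.

```python
def parse_weights(text):
    """Parse whitespace-separated numbers into a list of floats."""
    return [float(token) for token in text.split()]


def predict(weights, features, intercept=0.0):
    """Linear prediction: dot(weights, features) + intercept."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + intercept


weights = parse_weights("1.20455 0.1356 0.000456")  # values from the excerpt
features = [1.0, 2.0, 1000.0]                        # made-up input
prediction = predict(weights, features)              # intercept left at 0
```

Reading the text from a file instead of a string only changes how `text` is obtained. The parsing and prediction steps stay the same.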
1 vote, 1 answer

Spark MLlib Collaborative Filtering Implicit Feedback: TypeError: reduce() of empty sequence with no initial value

I'm trying to use Spark MLlib to build an implicit-feedback recommender system. I started by running the code from the tutorial on the MovieLens dataset at https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html. The…
1 vote, 1 answer

How can I use private[mllib] functions in my code?

I started working with Spark, specifically with the MLlib library. Several of its functions are restricted in scope by private modifiers. How can I use these functions in my code? Example: KMeans.scala private[mllib] def pointCost( centers:…
1 vote, 1 answer

SVMWithSGD in Spark documentation example not working

I am running Spark 1.1.0 with PySpark. When I run the example taken straight from the documentation: from pyspark.mllib.regression import LabeledPoint from pyspark.mllib.classification import SVMWithSGD import array data = [ LabeledPoint(0.0,…
poiuytrez
  • 21,330
  • 35
  • 113
  • 172
1 vote, 1 answer

MLlib and pyspark features

I would like to use areaUnderROC from MLlib in Apache Spark. I am currently running Spark 1.1.0, and this function is not available in PySpark but is available in Scala. Is there a feature tracker that tracks the progress of porting Scala APIs to…
poiuytrez
  • 21,330
  • 35
  • 113
  • 172
1 vote, 1 answer

How to implement the equivalent of scikit-learn's predict_proba(X) in MLlib

In Python, I prefer .predict_proba(X) to .decision_function(X) since its results are easier for me to interpret. As far as I can see, the latter functionality is already implemented in Spark (well, in version 0.9.2 for example I…
user706838
  • 5,132
  • 14
  • 54
  • 78
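
For a binary logistic regression model, the relationship between the two is simple: the decision function is the raw margin w·x + b, and the class probability is the logistic sigmoid of that margin. The sketch below shows this in plain Python with made-up weights. The function names are hypothetical and it does not use Spark. For an SVM the same transformation does not yield a calibrated probability, which is an important caveat.

```python
import math


def sigmoid(margin):
    """Numerically stable logistic function."""
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def decision_function(weights, intercept, features):
    """Raw margin w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + intercept


def predict_proba(weights, intercept, features):
    """Return [P(class 0), P(class 1)] for a binary logistic model."""
    p1 = sigmoid(decision_function(weights, intercept, features))
    return [1.0 - p1, p1]


# Made-up model parameters and input (illustration only).
weights, intercept = [2.0, -1.0], 0.5
proba = predict_proba(weights, intercept, [1.0, 2.5])
```

In this example the margin is exactly 0, so both classes get probability 0.5. Larger positive margins push the second entry toward 1.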