Questions tagged [apache-spark-mllib]

MLlib is a low-level, RDD-based machine learning library for Apache Spark.

2241 questions
1 vote, 0 answers

How to overcome SVMWithSGD throwing ArrayIndexOutOfBoundsException for an index bigger than 5000?

In order to detect visitor demographics based on their behavior I used the SVM algorithm from Spark MLlib: JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc.sc(), "labels.txt").toJavaRDD(); JavaRDD<LabeledPoint> training = data.sample(false,…
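A common cause of this error is a mismatch between the feature dimension that MLUtils.loadLibSVMFile infers from one file and a larger feature index appearing in another vector (for example, a separately loaded test set). Pinning the number of features explicitly is the usual first thing to try. A minimal Scala sketch, assuming sc is a SparkContext and 10000 is only a placeholder for the true maximum feature index:

    import org.apache.spark.mllib.classification.SVMWithSGD
    import org.apache.spark.mllib.util.MLUtils

    // Pin the vector dimension instead of letting it be inferred per file
    val numFeatures = 10000  // placeholder: must cover the largest feature index in the data
    val data = MLUtils.loadLibSVMFile(sc, "labels.txt", numFeatures)

    val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 11L)
    val model = SVMWithSGD.train(training.cache(), 100)  // 100 = number of SGD iterations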
1 vote, 0 answers

Spark error on ALS trainImplicit: assertion failed: lapack.dppsv returned 1

I am getting the error below when training ALS (implicit) using Hadoop 2.6.1 and Spark 1.5.2 on Ubuntu 14: 16/06/16 06:26:41 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS 16/06/16 06:26:41 WARN BLAS: Failed to…
Smrutiranjan Sahu
  • 6,911
  • 2
  • 15
  • 12
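The lapack.dppsv assertion usually means the least-squares solve inside ALS hit a matrix that is not positive definite; it is often reported to go away with a larger regularization parameter (and after removing NaN or duplicate ratings). A hedged Scala sketch of the knobs involved, assuming ratings is an existing RDD[Rating]:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // ratings: RDD[Rating] is assumed to exist and to be free of NaNs and duplicates
    val rank = 10
    val iterations = 10
    val lambda = 0.1   // regularization: raising this often avoids the dppsv failure
    val alpha = 1.0    // confidence scaling for implicit feedback

    val model = ALS.trainImplicit(ratings, rank, iterations, lambda, alpha)

The BLAS "Failed to load implementation" warnings only mean the pure-JVM netlib fallback is being used; they are not themselves the cause of the assertion.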
1 vote, 1 answer

How to randomly shuffle rows of an RDD in Spark?

I have an RDD[String] and I want to shuffle all of the rows of this RDD. How do I achieve this? For example, for an RDD object named rdd you can run rdd.collect.foreach(t => println(t)), which has output: 1 2 3 4 I want to shuffle the rows of rdd so that…
user3494047
  • 1,643
  • 4
  • 31
  • 61
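One common way to do this (not the only one) is to attach a random sort key to every row and sort by it. A minimal Scala sketch, assuming rdd is the RDD[String] from the question:

    import scala.util.Random

    val shuffled = rdd
      .mapPartitionsWithIndex { (i, iter) =>
        val rng = new Random(i)                   // per-partition RNG, seeded for reproducibility
        iter.map(row => (rng.nextDouble(), row))  // attach a random sort key to each row
      }
      .sortByKey()
      .values

    shuffled.collect().foreach(println)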
1 vote, 0 answers

Spark MLLib ALS: Efficient mapping of misc user and product IDs to integer

I am attempting to build an online recommender system using the Spark recommendation ALS algorithm. My data resides in MongoDB, where I keep collections of users, items and ratings. The identifiers for these documents are of the default type…
Fulco
  • 284
  • 1
  • 3
  • 16
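MLlib's ALS only accepts Int user and product IDs, so MongoDB ObjectId-style identifiers have to be translated first. One common pattern is to build lookup RDDs with zipWithUniqueId and join them onto the ratings; a sketch, where rawRatings is a hypothetical RDD[(String, String, Double)] of (userId, itemId, rating):

    import org.apache.spark.mllib.recommendation.Rating

    // Build (stringId -> intId) lookup tables; assumes fewer than Int.MaxValue distinct IDs
    val userIdToInt = rawRatings.map(_._1).distinct().zipWithUniqueId().mapValues(_.toInt)
    val itemIdToInt = rawRatings.map(_._2).distinct().zipWithUniqueId().mapValues(_.toInt)

    val ratings = rawRatings
      .map { case (u, i, r) => (u, (i, r)) }
      .join(userIdToInt)                                   // attach the integer user ID
      .map { case (_, ((i, r), uInt)) => (i, (uInt, r)) }
      .join(itemIdToInt)                                   // attach the integer item ID
      .map { case (_, ((uInt, r), iInt)) => Rating(uInt, iInt, r) }

The same lookup tables need to be kept (or persisted) so recommendations can be translated back to the original MongoDB identifiers.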
1 vote, 1 answer

Spark MLlib recommender engine's methods

I'm using PySpark MLlib and the out-of-the-box ALS method for collaborative filtering. Just wondering, does Spark provide some other methods of doing filtering (for calculating distance), for example Pearson's or cosine? Can they be done in Spark…
Keithx
  • 2,994
  • 15
  • 42
  • 71
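The recommendation package itself only ships ALS, but a cosine item-item similarity can be computed with the linear algebra utilities, for example RowMatrix.columnSimilarities. A minimal sketch, assuming rows is an RDD of mllib Vectors with one row per user and one column per item:

    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val mat = new RowMatrix(rows)

    // Exact cosine similarity between every pair of columns (items)
    val similarities = mat.columnSimilarities()

    // Approximate (DIMSUM-sampled) version for large, sparse matrices
    val approxSimilarities = mat.columnSimilarities(0.1)

For Pearson correlation, org.apache.spark.mllib.stat.Statistics.corr can be applied to the same RDD of vectors.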
1 vote, 1 answer

How to choose the combining strategy for MLlib's random forests

Is it possible to choose the combining strategy for MLlib's random forests? I can't find any clue on the official API docs. Here's my code: val numClasses = 10 val categoricalFeaturesInfo = Map[Int, Int]() val numTrees = 10 val…
Franjrg
  • 100
  • 1
  • 11
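For reference, the old mllib RandomForest API exposes the tree and sampling parameters shown below but, as far as the public API goes, not the combining rule itself: classification ensembles use majority vote and regression ensembles average the trees. A sketch of the knobs that are configurable, with placeholder values matching the question:

    import org.apache.spark.mllib.tree.RandomForest

    // trainingData: RDD[LabeledPoint] is assumed to exist
    val numClasses = 10
    val categoricalFeaturesInfo = Map[Int, Int]()  // empty map: all features are continuous
    val numTrees = 10
    val featureSubsetStrategy = "auto"             // let MLlib choose based on numTrees
    val impurity = "gini"
    val maxDepth = 5
    val maxBins = 32

    val model = RandomForest.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,
      numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)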
1 vote, 0 answers

PredictionIO pio train fails with exception

I am setting up PredictionIO on my Unix machine. I was able to set up everything required and am now using the Lead Scoring template. I can successfully build the template using the pio build --verbose command; it says the engine is ready to train.…
gaurav
  • 317
  • 1
  • 3
  • 10
1 vote, 0 answers

Logistic regression scoring: java.lang.NumberFormatException

I am using Spark 1.5 and I would like to use a logistic regression model that I saved from my training phase for scoring a new dataset. Here is my sample data in libsvm file format: 1132106-2011-05-10 52:1 64:1 207:1 232:1 353:1 597:1 The first…
user3803714
  • 5,269
  • 10
  • 42
  • 61
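The libsvm format requires a numeric label, so a line that starts with an identifier such as 1132106-2011-05-10 makes loadLibSVMFile fail with NumberFormatException. A hedged sketch of parsing such lines by hand instead, keeping the identifier next to the sparse feature vector (the file name and feature count are assumptions):

    import org.apache.spark.mllib.linalg.Vectors

    val numFeatures = 1000  // placeholder: the feature dimension used during training

    // Each line looks like: "<recordId> <index>:<value> <index>:<value> ..."
    val scoring = sc.textFile("scoring.txt").map { line =>
      val tokens = line.split(" ")
      val recordId = tokens.head                 // keep the ID to join results back later
      val (indices, values) = tokens.tail.map { t =>
        val Array(i, v) = t.split(":")
        (i.toInt - 1, v.toDouble)                // libsvm feature indices are 1-based
      }.unzip
      (recordId, Vectors.sparse(numFeatures, indices, values))
    }

    // model: the saved LogisticRegressionModel, assumed to be loaded already
    val scores = scoring.mapValues(v => model.predict(v))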
1 vote, 1 answer

Spark Streaming Model Overwrite

This is a straightforward question: how can I save my updated model with the same name to the same directory? org.apache.spark.sql.AnalysisException: path file:/home/mali/model/UpdatedmyRandomForestClassificationModel/data already exists There is SaveMode…
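Neither the mllib model savers nor the ml writers take a DataFrame-style SaveMode, but there are two common ways around the "path already exists" error; which one applies depends on whether this is a spark.ml model (MLWritable, Spark 2.x) or an old mllib model. A sketch of both, with the path taken from the question:

    // Option 1: spark.ml models (MLWritable) have an explicit overwrite switch
    model.write.overwrite().save("/home/mali/model/UpdatedmyRandomForestClassificationModel")

    // Option 2: for old mllib models (model.save(sc, path)), delete the directory first
    import org.apache.hadoop.fs.{FileSystem, Path}
    val fs = FileSystem.get(sc.hadoopConfiguration)
    fs.delete(new Path("/home/mali/model/UpdatedmyRandomForestClassificationModel"), true)
    model.save(sc, "/home/mali/model/UpdatedmyRandomForestClassificationModel")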
1 vote, 1 answer

What about files smaller than the Hadoop block size: Spark + machine learning

My Hadoop block size is 128 MB and my file is 30 MB. The cluster on which Spark is running is a 4-node cluster with a total of 64 cores. Now my task is to run a random forest or gradient boosting algorithm with a parameter grid and 3-fold cross…
Abhishek
  • 3,337
  • 4
  • 32
  • 51
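Since the 30 MB file fits inside a single 128 MB block, it will typically arrive as one (or very few) partitions, leaving most of the 64 cores idle during the grid search. A small sketch of the usual mitigation, repartitioning and caching right after load; the target of 64 partitions is just a rule of thumb of roughly one per core:

    // data: the RDD or DataFrame loaded from the 30 MB file (assumed)
    val spread = data.repartition(64)  // a single small file usually loads as 1 partition
    spread.cache()                     // the parameter grid / 3-fold CV will reuse it repeatedly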
1 vote, 0 answers

Are the Spark ML libraries suitable for classifying instances one by one?

The Spark ML library proudly presents its capability for model selection. I thought it fit my use case: in the big data world, train on many, many labeled data points, do clever model selection by tuning parameters etc., and save the best model to…
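A saved spark.ml model can score single records, but each call has to go through a one-row DataFrame, which carries real per-record overhead; for low-latency, one-at-a-time prediction this is worth benchmarking first. A rough Spark 2.x-style sketch under those assumptions (the path, schema, and column names are made up, and the saved pipeline is assumed to end in a classifier that emits a prediction column):

    import org.apache.spark.ml.PipelineModel

    val model = PipelineModel.load("/models/best")   // hypothetical path of the saved best model

    // Wrap the single incoming instance in a one-row DataFrame
    val single = spark.createDataFrame(Seq(
      (0L, "raw feature text of the new instance")   // hypothetical id + feature column
    )).toDF("id", "text")

    val prediction = model.transform(single).select("prediction").head().getDouble(0)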
1 vote, 0 answers

spark-mllib gbdt algorithm questions

Has anyone read the MLlib GBDT code? I have some questions about this algorithm: I don't know how the program calculates the current node impurity. I only see the overridden calculate function in the subclasses of Impurity; in this function, the parameter is…
lee li
  • 11
  • 1
1 vote, 1 answer

Why are the StreamingKMeans cluster centers different from regular KMeans?

I have two models trained using the same data; the KMeans model is set up like below: int numIterations = 20; int numClusters = 5; int runs = 10; double epsilon = 1.0e-6; KMeans kmeans = new KMeans(); kmeans.setEpsilon(epsilon); …
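Part of the difference is expected: with the settings above, batch KMeans initializes with k-means|| and iterates over the full dataset, while StreamingKMeans typically starts from random centers and updates them incrementally per mini-batch, so the two need not land on identical centers. For comparison, a Scala sketch of a StreamingKMeans setup (the dimension, initial weight, and seed are placeholders):

    import org.apache.spark.mllib.clustering.StreamingKMeans

    val streamingKMeans = new StreamingKMeans()
      .setK(5)
      .setDecayFactor(1.0)              // 1.0 = all past data keeps full weight
      .setRandomCenters(3, 0.0, 42L)    // dim, initial weight, seed: random starting centers

    // streamingKMeans.trainOn(trainingStream)  // trainingStream: DStream[Vector], assumed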
1 vote, 1 answer

MLlib LogisticRegressionWithLBFGS error when using model.predict

I'm using MLlib's LogisticRegressionWithLBFGS to train a model with 4 classes. This is the code for preparing my data, val labeledTraining = trainingSetVectors.map{case(target,features) => LabeledPoint(target,features) }.cache() val…
other15
  • 839
  • 2
  • 11
  • 23
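With four classes, a frequent cause of predict-time errors is leaving LogisticRegressionWithLBFGS at its binary default; the class count has to be set explicitly before training. A minimal sketch built on the labeledTraining RDD from the question:

    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

    // labeledTraining: RDD[LabeledPoint] with labels 0.0, 1.0, 2.0, 3.0
    val model = new LogisticRegressionWithLBFGS()
      .setNumClasses(4)   // multinomial: must match the number of distinct label values
      .run(labeledTraining)

    val predictionsAndLabels = labeledTraining.map(p => (model.predict(p.features), p.label))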
1 vote, 3 answers

How to skip a line in a Spark RDD map action based on an if condition

I have a file and I want to give it to an mllib algorithm. So I am following the example and doing something like: val data = sc.textFile(my_file). map {line => val parts = line.split(","); Vectors.dense(parts.slice(1,…
user3494047
  • 1,643
  • 4
  • 31
  • 61
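Rather than returning a dummy vector from map, a line can be dropped altogether by switching to flatMap and emitting an Option (or by calling filter first). A sketch built on the parsing from the question; the skip condition itself is a placeholder:

    import org.apache.spark.mllib.linalg.Vectors

    val data = sc.textFile(my_file).flatMap { line =>
      val parts = line.split(",")
      if (parts.length < 2 || parts.drop(1).exists(_.trim.isEmpty))  // placeholder condition
        None                                            // flatMap simply drops this line
      else
        Some(Vectors.dense(parts.slice(1, parts.length).map(_.toDouble)))
    }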