Questions tagged [apache-spark-ml]

Spark ML is a high-level API for building machine learning pipelines in Apache Spark.

925 questions
0
votes
1 answer

Finding items which are similar

I have a large database of items from a retail company. If I want to find the items that are similar to a particular item, can I use Pearson correlation in Spark ML to do that? Is there a better algorithm for this? How do I make sure…
passionate
  • 503
  • 2
  • 7
  • 25
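A minimal sketch of the Pearson route the question asks about, using mllib's Statistics.corr on an RDD of observation vectors (the layout and sample values below are assumptions, not from the question):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.stat.Statistics

    // Each row is one observation (e.g. one customer), each column one item.
    val observations = sc.parallelize(Seq(
      Vectors.dense(1.0, 0.0, 3.0),
      Vectors.dense(2.0, 1.0, 0.0),
      Vectors.dense(4.0, 2.0, 1.0)
    ))

    // Entry (i, j) of the result is the Pearson correlation between item i and item j.
    val corrMatrix = Statistics.corr(observations, "pearson")
    println(corrMatrix)

Cosine similarity via RowMatrix.columnSimilarities is another common choice for item-to-item similarity at retail scale.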
0
votes
1 answer

Using CategoricalFeaturesInfo with DecisionTreeClassifier method in Spark

I have to use this code: val dt = new DecisionTreeClassifier().setLabelCol("indexedLabel").setFeaturesCol("indexedFeatures").setImpurity(impurity).setMaxBins(maxBins).setMaxDepth(maxDepth); I need to add categorical feature information so that the…
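In spark.ml the categoricalFeaturesInfo map from mllib is replaced by column metadata, usually produced by VectorIndexer. A hedged sketch, reusing the names and variables from the question and assuming a "features" vector column:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.DecisionTreeClassifier
    import org.apache.spark.ml.feature.VectorIndexer

    // Columns with at most maxCategories distinct values are flagged as
    // categorical in the metadata that DecisionTreeClassifier reads.
    val featureIndexer = new VectorIndexer()
      .setInputCol("features")
      .setOutputCol("indexedFeatures")
      .setMaxCategories(4)

    val dt = new DecisionTreeClassifier()
      .setLabelCol("indexedLabel")
      .setFeaturesCol("indexedFeatures")
      .setImpurity(impurity)
      .setMaxBins(maxBins)
      .setMaxDepth(maxDepth)

    val pipeline = new Pipeline().setStages(Array(featureIndexer, dt))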
0
votes
1 answer

Using MatrixUDT as column in SparkSQL Dataframe

I'm trying to load a set of medical images into a Spark SQL DataFrame, where each image is loaded into a matrix column. I see Spark recently added MatrixUDT to support this kind of case, but I can't find a sample of using it in a DataFrame.…
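A sketch of one way to get a matrix column, assuming the Matrix type in your Spark version carries the MatrixUDT annotation so createDataFrame maps it automatically (sqlContext, the ids, and the column names are placeholders):

    import org.apache.spark.mllib.linalg.{Matrices, Matrix}

    // Each tuple becomes a row: an id plus the image's pixel matrix.
    val images: Seq[(String, Matrix)] = Seq(
      ("img-001", Matrices.dense(2, 2, Array(0.0, 1.0, 2.0, 3.0)))
    )
    val df = sqlContext.createDataFrame(images).toDF("imageId", "pixels")
    df.printSchema()  // the pixels column should show up with the matrix UDT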
0
votes
1 answer

Got OutOfMemory when running Spark MLlib KMeans

I always get an OutOfMemory error when I run Spark KMeans on a big data set. The training set is about 250 GB, and I have a 10-node Spark cluster, each machine with 16 CPUs and 150 GB of memory. I give the job 100 GB of memory on each node and 50 CPUs in total. I set the…
Jack
  • 5,540
  • 13
  • 65
  • 113
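Without the full configuration it is hard to say exactly what runs out of memory, but one hedged direction is smaller executors plus more input partitions so each task holds a smaller slice of the 250 GB set (all values below are placeholders illustrating the shape, not recommendations):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("kmeans-training")
      .set("spark.executor.memory", "10g")   // several smaller executors per node
      .set("spark.executor.cores", "2")
    val sc = new SparkContext(conf)

    // More partitions => smaller per-task working set during the KMeans iterations.
    val data = sc.textFile("hdfs:///path/to/training")  // hypothetical path
      .repartition(2000)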
0
votes
1 answer

spark-ml naive bayes save to hdfs

I know that with spark-mllib we can save a Naive Bayes model to HDFS with the save() method. But when I try to save a spark-ml Naive Bayes model to HDFS, it gives the error: Wrong FS: hdfs://localhost:8020/pa/model/nb, expected: file:/// I am using…
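The "Wrong FS ... expected: file:///" message usually means the Hadoop configuration still points at the local filesystem. A hedged sketch, assuming a training DataFrame named training with "label"/"features" columns and a namenode at localhost:8020 as in the error:

    import org.apache.spark.ml.classification.NaiveBayes

    // Make sure the default filesystem matches the hdfs:// URI being written to.
    sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://localhost:8020")

    val model = new NaiveBayes().fit(training)

    // spark.ml models persist through the MLWriter API rather than mllib's save(sc, path).
    model.write.overwrite().save("hdfs://localhost:8020/pa/model/nb")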
0
votes
1 answer

How does Spark MLlib deal with a Java program?

I was wondering how Spark deals with a Java program calling machine learning algorithms provided by MLlib. Do I need to download the Spark Project ML Library? Also, where is the source code of MLlib for the Java API? I can't find it in its…
Hereme
  • 193
  • 1
  • 1
  • 5
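The Java API ships in the same spark-mllib artifact as the Scala one, so nothing separate needs downloading beyond the dependency itself. A hedged build.sbt sketch (the 1.6.1 version is only an example):

    // build.sbt
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"  % "1.6.1" % "provided",
      "org.apache.spark" %% "spark-mllib" % "1.6.1" % "provided"
    )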
0
votes
1 answer

Spark ML Word2Vec Serialization Issues

Spark Version: 1.6.1. I have recently refactored our Word2Vec code to move to DataFrame-based ml models, but I am having problems serializing and loading the model locally. I am able to successfully fit the DataFrame and create the…
skgemini
  • 600
  • 4
  • 7
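A minimal save/load sketch, assuming a docs DataFrame with a "text" column of Seq[String] and a Spark build in which Word2VecModel implements the MLWritable/MLReadable persistence API:

    import org.apache.spark.ml.feature.{Word2Vec, Word2VecModel}

    val word2Vec = new Word2Vec()
      .setInputCol("text")
      .setOutputCol("vectors")
      .setVectorSize(100)
      .setMinCount(5)

    val model = word2Vec.fit(docs)

    // Persist and reload; the path is a hypothetical local directory.
    model.write.overwrite().save("/tmp/w2v-model")
    val reloaded = Word2VecModel.load("/tmp/w2v-model")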
0
votes
1 answer

Is it inefficient to manually iterate Spark SQL data frames and create column values?

In order to run a few ML algorithms, I need to create extra columns of data. Each of these columns involves some fairly intensive calculations that involve keeping moving averages and recording information as you go through each row (and updating it…
Eric Staner
  • 969
  • 2
  • 9
  • 14
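A hedged alternative to hand-rolled row iteration: express the moving average as a window function so Spark plans it instead of the driver walking rows (the ts and value column names are assumptions):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.avg

    // Average of the current row and the nine preceding rows, ordered by time.
    val movingAvg = Window.orderBy("ts").rowsBetween(-9, 0)
    val withAvg = df.withColumn("value_ma10", avg("value").over(movingAvg))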
0
votes
1 answer

Training a spark ml linear regression model fails after migrating to 1.6.1

I use spark-ml to train a linear regression model. It worked perfectly with spark version 1.5.2, but now with 1.6.1 I get the following error: java.lang.AssertionError: assertion failed: lapack.dppsv returned 228. It seems to be related to some…
philippe
  • 121
  • 1
  • 6
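lapack.dppsv is only invoked by the normal-equation solver that 1.6 can pick automatically, so one hedged workaround is to force L-BFGS if the solver param is available in your build (training is an assumed DataFrame):

    import org.apache.spark.ml.regression.LinearRegression

    val lr = new LinearRegression()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setSolver("l-bfgs")   // avoid the normal-equation path that calls dppsv

    val model = lr.fit(training)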
0
votes
1 answer

sc.parallelize not working in the ML pipeline with the training algorithm

With org.apache.spark.mllib learning algorithms, we used to set up the pipeline without the training algorithm: var stages: Array[org.apache.spark.ml.PipelineStage] = index_transformers :+ assembler; val pipeline = new Pipeline().setStages(stages) and…
Abhishek
  • 3,337
  • 4
  • 32
  • 51
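A minimal sketch of appending the estimator itself as the final pipeline stage, reusing the index_transformers and assembler values from the question and an assumed trainingDf:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression

    val lr = new LogisticRegression()
      .setLabelCol("label")
      .setFeaturesCol("features")

    // The estimator is just another PipelineStage, so fit() runs the
    // transformers first and then trains the model.
    val stages: Array[org.apache.spark.ml.PipelineStage] =
      index_transformers :+ assembler :+ lr
    val pipeline = new Pipeline().setStages(stages)
    val model = pipeline.fit(trainingDf)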
0
votes
1 answer

How to avoid hardcoding in column selection in data frame in apache spark | Scala

I have the following data frame and I need to run logistic regression using spark ml on it:
uid  a  b  c  label  d
1    0  1  3  0      2
2    3  0  0  1      0
While using the ml package, I came to know that I need to create the data in the…
hbabbar
  • 947
  • 4
  • 15
  • 33
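A hedged sketch of deriving the feature columns from the schema instead of hardcoding them, excluding the id and label columns by name:

    import org.apache.spark.ml.feature.VectorAssembler

    // Everything except uid and label becomes a feature column.
    val featureCols = df.columns.filterNot(Set("uid", "label"))

    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")
    val assembled = assembler.transform(df)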
0
votes
1 answer

Error with RDD[Vector] in function parameter

I am trying to define a function in Scala to iterate on it with Spark. Here is my code: import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.sql.SQLContext import org.apache.spark.ml.{Pipeline, PipelineModel} import…
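The usual trap here is mixing Vector types: the parameter must use the same Vector class the RDD actually holds (mllib.linalg.Vector in 1.x, not scala.collection.immutable.Vector). A minimal sketch with a hypothetical summarizing function:

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // Explicitly typed against Spark's Vector so the call site resolves cleanly.
    def summarize(data: RDD[Vector]): Long = {
      data.cache()
      data.count()
    }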
0
votes
0 answers

Spark ML, parameter for "rawPredictionCol" for Binary Classification

I want to use the binary classification evaluator in spark.ml to evaluate my model after my Pipeline. I use this code: val gbt = new GBTClassifier() .setLabelCol("Label_Index") .setFeaturesCol("features") .setMaxIter(10) .setMaxDepth(7) …
pierre_comalada
  • 300
  • 3
  • 11
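A hedged sketch of wiring the evaluator to the column the classifier actually emits; GBTClassifier in older releases produces only "prediction", so that column is pointed at here (the predictions DataFrame is assumed to come from the fitted pipeline):

    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

    val evaluator = new BinaryClassificationEvaluator()
      .setLabelCol("Label_Index")
      .setRawPredictionCol("prediction")   // the evaluator accepts a double or vector column
      .setMetricName("areaUnderROC")

    val auc = evaluator.evaluate(predictions)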
0
votes
1 answer

Why does LogisticRegressionModel fail at scoring of libsvm data?

Load the data that you want to score. The data is stored in libsvm format in the following manner: label index1:value1 index2:value2 ... (the indices are one-based and in ascending order). Here is the sample data: 100 10:1 11:1 208:1 400:1…
user3803714
  • 5,269
  • 10
  • 42
  • 61
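One common cause is a feature-dimension mismatch between the training and scoring sets; a hedged sketch that pins the dimension when loading the libsvm file (the path and the 500 are placeholders):

    // Load the scoring set with the same vector size the model was trained on.
    val scoring = sqlContext.read
      .format("libsvm")
      .option("numFeatures", "500")
      .load("data/score.libsvm")

    val scored = lrModel.transform(scoring)   // lrModel: the trained LogisticRegressionModel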
0
votes
1 answer

Overwriting ML model in S3 bucket

I am saving an ML model to an S3 bucket. After a long search this thread helped me find a solution. My code looks as follows: sc.parallelize(Seq(model), 1).saveAsObjectFile("s3a://bucket/nameModel.model") The first time I run this job everything…
RudyVerboven
  • 1,204
  • 1
  • 14
  • 31
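saveAsObjectFile refuses to write over an existing path, so the second run needs the old output removed first. A hedged sketch using the Hadoop FileSystem API with the bucket path from the question:

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    val output = "s3a://bucket/nameModel.model"

    // Recursive delete; returns false (and does nothing) if the path is absent.
    val fs = FileSystem.get(new URI(output), sc.hadoopConfiguration)
    fs.delete(new Path(output), true)

    sc.parallelize(Seq(model), 1).saveAsObjectFile(output)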