Questions tagged [apache-spark-mllib]

MLlib is a machine learning library for Apache Spark

MLlib is a low-level, RDD-based machine learning library for Apache Spark.

2241 questions
10 votes • 1 answer

How to convert org.apache.spark.rdd.RDD[Array[Double]] to Array[Double] which is required by Spark MLlib

I am trying to implement KMeans using Apache Spark. val data = sc.textFile(irisDatasetString) val parsedData = data.map(_.split(',').map(_.toDouble)).cache() val clusters = KMeans.train(parsedData,3,numIterations = 20) on which I get the following…
sand • 137
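The error here is typically a type mismatch: `KMeans.train` expects an RDD whose elements are MLlib `Vector`s (in Scala, wrap each parsed array with `Vectors.dense`), not a raw `Array[Double]` or a collected result. A minimal pure-Python stand-in for the per-line parsing step (`parse_line` is a hypothetical helper, not a Spark API):

```python
def parse_line(line):
    """Split one CSV line of the iris data into a list of floats --
    the numeric vector that goes into one RDD element for KMeans."""
    return [float(x) for x in line.strip().split(",")]

rows = ["5.1,3.5,1.4,0.2", "4.9,3.0,1.4,0.2"]
parsed = [parse_line(r) for r in rows]  # one vector per input line
```

In Scala the equivalent map is `data.map(_.split(',').map(_.toDouble)).map(Vectors.dense(_))`, which gives the `RDD[Vector]` that `KMeans.train` accepts.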
10 votes • 2 answers

Mllib dependency error

I'm trying to build a very simple Scala standalone app using MLlib, but I get the following error when trying to build the program: Object Mllib is not a member of package org.apache.spark Then, I realized that I have to add Mllib as dependency…
user3789843 • 1,009
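For reference, the usual fix is declaring spark-mllib in the build definition alongside spark-core; a sketch of the relevant `build.sbt` lines (version numbers are placeholders — match them to the installed Spark):

```scala
// build.sbt — spark-core alone does not pull in MLlib
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "3.5.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "3.5.0" % "provided"
)
```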
10 votes • 3 answers

How do I run the Spark decision tree with a categorical feature set using Scala?

I have a feature set with a corresponding categoricalFeaturesInfo: Map[Int,Int]. However, for the life of me I cannot figure out how I am supposed to get the DecisionTree class to work. It will not accept anything but a LabeledPoint as data.…
9 votes • 3 answers

Spark v3.0.0 - WARN DAGScheduler: broadcasting large task binary with size xx

I'm new to Spark. I'm coding a machine learning algorithm in Spark standalone (v3.0.0) with these configuration settings: SparkConf conf = new SparkConf(); conf.setMaster("local[*]"); conf.set("spark.driver.memory",…
vittoema96 • 121
9 votes • 1 answer

What Type should the dense vector be, when using UDF function in Pyspark?

I want to change List to Vector in PySpark, and then feed this column to a machine learning model for training. But my Spark version is 1.6.0, which does not have VectorUDT(). So what type should I return in my udf function? from pyspark.sql import…
9 votes • 1 answer

Vector assembler in Pyspark is creating tuple of multiple vectors instead of a single vector, how to solve the issue?

My python version is 3.6.3 and spark version is 2.2.1. Here is my code: from pyspark.ml.linalg import Vectors from pyspark.ml.feature import VectorAssembler from pyspark import SparkContext, SparkConf from pyspark.sql import SparkSession sc =…
Mir Md Faysal • 477
9 votes • 2 answers

Comparing two arrays and getting the difference in PySpark

I have two array fields in a data frame. I have a requirement to compare these two arrays and get the difference as an array (new column) in the same data frame. Expected output is: Column B is a subset of column A. Also the words are going to be in…
jiks-hue • 139
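On Spark 2.4+ this is exactly what `pyspark.sql.functions.array_except(colA, colB)` computes (elements of A not in B; note it also de-duplicates the result). On older versions, a UDF with roughly this body works — a minimal pure-Python sketch of the logic, where `array_diff` is a hypothetical helper:

```python
def array_diff(a, b):
    """Elements of a that do not appear in b, keeping a's order.
    (Unlike Spark's array_except, duplicates in a are preserved.)"""
    removed = set(b)
    return [x for x in a if x not in removed]

result = array_diff(["one", "two", "three"], ["two"])  # ["one", "three"]
```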
9 votes • 1 answer

How to extract vocabulary from Pipeline

I can extract the vocabulary from a CountVectorizerModel in the following way: fl = StopWordsRemover(inputCol="words", outputCol="filtered") df = fl.transform(df) cv = CountVectorizer(inputCol="filtered", outputCol="rawFeatures") model =…
user2377528
9 votes • 2 answers

How can I evaluate the implicit feedback ALS algorithm for recommendations in Apache Spark?

How can you evaluate the implicit feedback collaborative filtering algorithm of Apache Spark, given that the implicit "ratings" can vary from zero to anything, so a simple MSE or RMSE does not have much meaning?
Dimitris Poulopoulos • 1,139
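One common answer from the implicit-feedback ALS literature is to use a rank-based metric instead of MSE/RMSE: check where the items a user actually consumed land in the model's ranked recommendation list. A minimal sketch of mean percentile ranking (the data shapes and helper name are assumptions for illustration):

```python
def mean_percentile_rank(scores_by_user, held_out):
    """Mean percentile ranking for implicit feedback: for each user,
    where does the held-out item land in the model's ranked item list?
    0.0 = always ranked first (best), ~0.5 = no better than random."""
    ranks = []
    for user, item in held_out.items():
        scores = scores_by_user[user]
        ranked = sorted(scores, key=scores.get, reverse=True)
        ranks.append(ranked.index(item) / (len(ranked) - 1))
    return sum(ranks) / len(ranks)

# toy check: one perfect ranking + one worst-case ranking -> 0.5 overall
mpr = mean_percentile_rank(
    {"u": {"a": 3, "b": 2, "c": 1}, "v": {"a": 1, "b": 2, "c": 3}},
    {"u": "a", "v": "a"},
)
```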
9 votes • 0 answers

Non-linear SVM is not available in Apache Spark

Does anyone know the reason why the non-linear SVM has not been implemented in Apache Spark? I was reading this page: https://issues.apache.org/jira/browse/SPARK-4638 Look at the last comment. It says: "Commenting here b/c of the recent dev list…
Vitrion • 405
9 votes • 1 answer

How to do prediction with Sklearn Model inside Spark?

I have trained a model in Python using sklearn. How can we load the same model in Spark and generate predictions on a Spark RDD?
Tanveer • 890
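The usual pattern is to broadcast the fitted sklearn model to the executors and call `predict` inside `rdd.mapPartitions`, so the model is deserialized once per partition rather than once per row. A pure-Python sketch of that per-partition step (`DummyModel` and `predict_partition` are stand-ins for illustration; real code would use `sc.broadcast(model)` and `rdd.mapPartitions`):

```python
class DummyModel:
    # stand-in for a fitted sklearn estimator with a .predict method
    def predict(self, rows):
        return [sum(r) for r in rows]

def predict_partition(model, partition):
    """What runs inside rdd.mapPartitions: batch the rows of one
    partition into a single predict() call, yield the results."""
    rows = list(partition)
    return iter(model.predict(rows))

preds = list(predict_partition(DummyModel(), iter([[1, 2], [3, 4]])))
```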
9 votes • 2 answers

Online learning of LDA model in Spark

Is there a way to train an LDA model in an online-learning fashion, i.e. loading a previously trained model and updating it with new documents?
9 votes • 2 answers

(Spark) object {name} is not a member of package org.apache.spark.ml

I'm trying to run self-contained application using scala on apache spark based on example here: http://spark.apache.org/docs/latest/ml-pipeline.html Here's my complete code: import org.apache.spark.ml.classification.LogisticRegression import…
Yusata • 199
9 votes • 4 answers

How to create a Row from a List or Array in Spark using java

In Java, I use RowFactory.create() to create a Row: Row row = RowFactory.create(record.getLong(1), record.getInt(2), record.getString(3)); where "record" is a record from a database, but I cannot know the length of "record" in advance, so I want to…
user2736706 • 103
9 votes • 1 answer

Speed up collaborative filtering for large dataset in Spark MLLib

I'm using MLlib's matrix factorization to recommend items to users. I have a big implicit interaction matrix of about M=20 million users and N=50k items. After training the model I want to get a short list (e.g. 200) of recommendations for each user.…
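A common way to make this tractable is to score users in blocks against the item-factor matrix and keep only each user's top-k via a partial sort, instead of materializing and fully sorting all M×N scores (MLlib's `MatrixFactorizationModel.recommendProductsForUsers` batches this for you). A hedged pure-Python sketch of the per-block step (`topk_per_user` is a hypothetical helper; in practice this would be a BLAS matrix product):

```python
import heapq

def topk_per_user(user_factors, item_factors, k):
    """For each user factor row, dot-product against every item factor
    and keep only the indices of the k best items (partial sort via
    heapq.nlargest rather than sorting the full score list)."""
    recs = []
    for u in user_factors:
        scores = [sum(a * b for a, b in zip(u, v)) for v in item_factors]
        recs.append(heapq.nlargest(k, range(len(scores)), key=scores.__getitem__))
    return recs

# one user, three items: item 1 scores 2.0, item 2 scores 1.0, item 0 scores 0.0
recs = topk_per_user([[1, 0]], [[0, 1], [2, 0], [1, 0]], 2)
```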