Highest Voted 'apache-spark-mllib' Questions

18 votes, 2 answers

Incremental training of ALS model

I'm trying to find out if it is possible to have "incremental training" on data using MLlib in Apache Spark. My platform is PredictionIO, and it's basically a wrapper for Spark (MLlib), HBase, ElasticSearch and some other RESTful parts. In my app…

apache-spark machine-learning prediction apache-spark-mllib predictionio

asked Jan 01 '15 at 20:21

Wouter

1,678
3
20
32

17 votes, 2 answers

KMeans clustering in PySpark

I have a Spark DataFrame 'mydataframe' with many columns. I am trying to run kmeans on only two columns: lat and long (latitude & longitude), using them as simple values. I want to extract 7 clusters based on just those 2 columns and then I want to…

machine-learning pyspark k-means apache-spark-mllib apache-spark-ml

asked Dec 01 '17 at 02:22

user3245256

1,842
4
24
51

17 votes, 3 answers

Spark Word2vec vector mathematics

I was looking at the example on the Spark site for Word2Vec: val input = sc.textFile("text8").map(line => line.split(" ").toSeq) val word2vec = new Word2Vec() val model = word2vec.fit(input) val synonyms = model.findSynonyms("country name here",…

apache-spark machine-learning apache-spark-mllib word2vec

asked Dec 09 '15 at 06:38

user3803714

5,269
10
42
61

17 votes, 1 answer

How to get word details from TF Vector RDD in Spark ML Lib?

I have computed term frequencies using HashingTF in Spark, calling tf.transform for each word. But the results are shown in this format: [,…

apache-spark apache-spark-mllib tf-idf apache-spark-ml

asked Aug 29 '15 at 11:46

Srini

3,334
6
29
64

17 votes, 2 answers

Difference between org.apache.spark.ml.classification and org.apache.spark.mllib.classification

I'm writing a Spark application and would like to use algorithms in MLlib. In the API doc I found two different classes for the same algorithm. For example, there is one LogisticRegression in org.apache.spark.ml.classification and also a…

scala apache-spark apache-spark-mllib

asked May 14 '15 at 07:35

ailzhang

173
1
4

17 votes, 5 answers

PySpark & MLLib: Random Forest Feature Importances

I'm trying to extract the feature importances of a random forest object I have trained using PySpark. However, I do not see an example of doing this anywhere in the documentation, nor is it a method of RandomForestModel. How can I extract feature…

apache-spark pyspark random-forest apache-spark-mllib

asked Mar 10 '15 at 19:01

Bryan

5,999
9
29
50

17 votes, 1 answer

apache spark MLLib: how to build labeled points for string features?

I am trying to build a NaiveBayes classifier with Spark's MLLib which takes as input a set of documents. I'd like to put some things as features (i.e. authors, explicit tags, implicit keywords, category), but looking at the documentation it seems…

java apache-spark machine-learning apache-spark-mllib feature-selection

asked Dec 06 '14 at 18:01

riffraff

2,429
1
23
32

16 votes, 3 answers

How to overwrite entire existing column in Spark dataframe with new column?

I want to overwrite a Spark column with a new column that is a binary flag. I tried directly overwriting the column id2, but why does it not work like an in-place operation in Pandas? How can I do it without using withColumn() to create a new column and…

apache-spark dataframe pyspark apache-spark-sql apache-spark-mllib

asked Jun 19 '17 at 06:21

GeorgeOfTheRF

8,244
23
57
80

16 votes, 1 answer

How to convert ArrayType to DenseVector in PySpark DataFrame?

I'm getting the following error trying to build a ML Pipeline: pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually…

python apache-spark pyspark apache-spark-mllib apache-spark-ml

asked Aug 18 '16 at 19:02

Evan Zamir

8,059
14
56
83

16 votes, 4 answers

PySpark computing correlation

I want to use the pyspark.mllib.stat.Statistics.corr function to compute the correlation between two columns of a pyspark.sql.dataframe.DataFrame object. The corr function expects an RDD of Vectors objects. How do I translate a column of df['some_name']…

python apache-spark pyspark apache-spark-sql apache-spark-mllib

asked Jun 03 '16 at 16:06

VJune

1,195
5
16
26

16 votes, 1 answer

Spark ML indexer cannot resolve DataFrame column name with dots?

I have a DataFrame with a column named a.b. When I specify a.b as the input column name to a StringIndexer, I get an AnalysisException with the message "cannot resolve 'a.b' given input columns a.b". I'm using Spark 1.6.0. I'm aware that older versions of…

java apache-spark apache-spark-mllib apache-spark-ml

asked Jan 22 '16 at 18:22

Joshua Taylor

84,998
9
154
353

16 votes, 1 answer

Why spark.ml don't implement any of spark.mllib algorithms?

According to the Spark MLlib Guide, Spark has two machine learning libraries: spark.mllib, built on top of RDDs, and spark.ml, built on top of DataFrames. According to this and this question on StackOverflow, DataFrames are better (and…

machine-learning apache-spark pyspark apache-spark-mllib apache-spark-ml

asked Oct 20 '15 at 12:47

Paladini

4,522
15
53
96

16 votes, 1 answer

What is rank in ALS machine Learning Algorithm in Apache Spark Mllib

I wanted to try an example of the ALS machine learning algorithm. My code works fine; however, I do not understand the rank parameter used in the algorithm. I have the following code in Java: // Build the recommendation model using ALS int rank = 10; …

algorithm apache-spark machine-learning apache-spark-mllib

asked Jun 09 '15 at 10:37

hard coder

5,449
6
36
61

16 votes, 4 answers

What is the right way to save\load models in Spark\PySpark

I'm working with Spark 1.3.0 using PySpark and MLlib and I need to save and load my models. I use code like this (taken from the official documentation): from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating data =…

python apache-spark pyspark apache-spark-mllib

asked Mar 25 '15 at 12:03

artemdevel

641
1
9
21

15 votes, 1 answer

Spark ML VectorAssembler returns strange output

I am seeing very strange behaviour from VectorAssembler and I was wondering if anyone else has seen this. My scenario is pretty straightforward. I parse data from a CSV file where I have some standard Int and Double fields and I also…

scala apache-spark apache-spark-mllib apache-spark-ml

asked Nov 09 '16 at 11:22

Dimitris

2,030
3
27
45
