Questions tagged [apache-spark-ml]

Spark ML is a high-level API for building machine learning pipelines in Apache Spark.

925 questions
9 votes · 1 answer

How to map variable names to features after pipeline

I have modified the OneHotEncoder example to actually train a LogisticRegression. My question is: how do I map the generated weights back to the categorical variables? def oneHotEncoderExample(sqlContext: SQLContext): Unit = { val df =…
lapolonio · 1,107
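The usual approach here can be illustrated without Spark: the assembled features column carries ML attribute metadata that maps each vector slot back to its original (one-hot-expanded) variable name. A minimal sketch with a hand-built metadata dict follows; the names (age, country_US, country_DE) and coefficient values are made up for illustration. In PySpark the real dict is typically reachable via df.schema["features"].metadata["ml_attr"]["attrs"], and the Scala API exposes the same structure through AttributeGroup.

```python
# Hand-built stand-in for the ML attribute metadata Spark attaches to an
# assembled features column (layout mirrors "ml_attr" -> "attrs" groups).
metadata = {
    "ml_attr": {
        "attrs": {
            "numeric": [{"idx": 0, "name": "age"}],
            "binary": [
                {"idx": 1, "name": "country_US"},
                {"idx": 2, "name": "country_DE"},
            ],
        }
    }
}

# Stand-in for LogisticRegressionModel coefficients, one per vector slot.
coefficients = [0.31, -1.20, 0.85]

# Flatten all attribute groups into an index -> name mapping ...
idx_to_name = {
    attr["idx"]: attr["name"]
    for group in metadata["ml_attr"]["attrs"].values()
    for attr in group
}

# ... then pair each weight with its original (possibly one-hot) variable.
named_weights = {idx_to_name[i]: w for i, w in enumerate(coefficients)}
print(named_weights)
# {'age': 0.31, 'country_US': -1.2, 'country_DE': 0.85}
```

The same flattening works for any stage that preserves metadata; it breaks down only if an intermediate transformer drops the attributes.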
8 votes · 1 answer

Slowdown with repeated calls to spark dataframe in memory

Say I have 40 continuous (DoubleType) variables that I've bucketed into quartiles using ft_quantile_discretizer. Identifying the quartiles on all of the variables is super fast, as the function supports execution on multiple variables at once.…
hgb1234 · 83
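The slowdown in cases like this usually comes from re-evaluating the un-cached lineage on every action, so caching or checkpointing the bucketized table is the common remedy. The bucketing step itself is cheap; conceptually it reduces to computing split points and binary-searching each value into a bucket, which can be sketched without Spark (the helper names below are made up, and the naive index-based percentiles only approximate what QuantileDiscretizer's approxQuantile derives):

```python
from bisect import bisect_right

def quartile_splits(values):
    """Rough stand-in for QuantileDiscretizer(numBuckets=4): three interior
    split points near the 25th/50th/75th percentiles (naive, not the
    approxQuantile sketch Spark actually uses)."""
    s = sorted(values)
    n = len(s)
    return [s[n // 4], s[n // 2], s[(3 * n) // 4]]

def bucketize(x, splits):
    # Bucket index = number of split points at or below x.
    return bisect_right(splits, x)

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
splits = quartile_splits(values)            # [3.0, 5.0, 7.0]
print([bucketize(v, splits) for v in values])
# [0, 0, 1, 1, 2, 2, 3, 3]
```

Because the per-value work is a single binary search, repeated slow calls point at lineage recomputation rather than the discretization itself.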
8 votes · 2 answers

How to print the decision path / rules used to predict sample of a specific row in PySpark?

How to print the decision path of a specific sample in a Spark DataFrame? Spark Version: '2.3.1'. The code below prints the decision path of the whole model; how can I make it print the decision path of a specific sample? For example, the decision path…
PolarBear10 · 2,065
8 votes · 3 answers

Spark Java IllegalArgumentException at org.apache.xbean.asm5.ClassReader

I'm trying to use Spark 2.3.1 with Java. I followed the examples in the documentation but keep getting a poorly described exception when calling .fit(trainingData). Exception in thread "main" java.lang.IllegalArgumentException at…
8 votes · 2 answers

ALS model - how to generate full_u * v^t * v?

I'm trying to figure out how an ALS model can predict values for new users in between updates by a batch process. In my search, I came across this stackoverflow answer. I've copied the answer below for the reader's convenience: You can…
Chris Snow · 23,813
8 votes · 1 answer

Spark schema from case class with correct nullability

For a custom Estimator's transformSchema method I need to be able to compare the schema of an input data frame to the schema defined in a case class. Usually this could be done as in "Generate a Spark StructType / Schema from a case class" as…
8 votes · 1 answer

Issue with VectorUDT when using Spark ML

I am writing a UDAF to be applied to a Spark data frame column of type Vector (spark.ml.linalg.Vector). I rely on the spark.ml.linalg package so that I do not have to go back and forth between DataFrame and RDD. Inside the UDAF, I have to specify a…
8 votes · 2 answers

Any way to access methods from individual stages in PySpark PipelineModel?

I've created a PipelineModel for doing LDA in Spark 2.0 (via PySpark API): def create_lda_pipeline(minTokenLength=1, minDF=1, minTF=1, numTopics=10, seed=42, pattern='[\W]+'): """ Create a pipeline for running an LDA model on a corpus. This…
Evan Zamir · 8,059
8 votes · 5 answers

Spark Scala: How to convert DataFrame[Vector] to DataFrame[f1: Double, ..., fn: Double]

I just used StandardScaler to normalize my features for a ML application. After selecting the scaled features, I want to convert this back to a dataframe of Doubles, though the length of my vectors is arbitrary. I know how to do it for a specific…
mt88 · 2,855
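The expansion itself is easy to sketch without Spark: discover the vector length from the data, then emit one Double column per slot. The rows and column names below are invented for illustration; in Spark 3+ the equivalent DataFrame-side move is pyspark.ml.functions.vector_to_array followed by a select over the array indices.

```python
# Plain dicts stand in for DataFrame rows; "features" holds the scaled vector.
rows = [
    {"id": 1, "features": [0.1, 0.5, 0.9]},
    {"id": 2, "features": [0.2, 0.6, 1.0]},
]

# Length is discovered from the first row, so arbitrary vector sizes work.
n = len(rows[0]["features"])

expanded = [
    {"id": r["id"], **{f"f{i+1}": float(v) for i, v in enumerate(r["features"])}}
    for r in rows
]
print(expanded[0])
# {'id': 1, 'f1': 0.1, 'f2': 0.5, 'f3': 0.9}
```

The same "read the width once, then generate column expressions" pattern is what the Spark solutions do under the hood.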
8 votes · 1 answer

How to serialize a pyspark Pipeline object?

I'm trying to serialize a PySpark Pipeline object so that it can be saved and retrieved later. I tried the Python pickle library as well as PySpark's PickleSerializer, but the dumps() call itself fails. Providing the code snippet while…
8 votes · 1 answer

Feature normalization algorithm in Spark

Trying to understand Spark's normalization algorithm. My small test set contains 5 vectors: {0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0}, {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 70000.0}, {-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, 70000.0}, {-0.95, 0.018,…
Alex B · 347
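A frequent point of confusion behind questions like this: Spark's Normalizer rescales each sample (row) to unit p-norm, p=2 by default, whereas StandardScaler uses per-column statistics. The per-row computation can be sketched in plain Python (the helper name is made up):

```python
def p_norm_normalize(vec, p=2.0):
    """Scale one sample to unit p-norm, as pyspark.ml.feature.Normalizer
    does per row (default p=2). Note this is per-sample, not per-column
    like StandardScaler."""
    norm = sum(abs(x) ** p for x in vec) ** (1.0 / p)
    # Spark leaves all-zero vectors unchanged rather than dividing by zero.
    return [x / norm for x in vec] if norm != 0 else list(vec)

print(p_norm_normalize([3.0, 4.0]))   # [0.6, 0.8]
```

Running this over a small test set like the one in the question makes it easy to check whether Normalizer (row-wise) or a scaler (column-wise) is the transformation actually being observed.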
7 votes · 1 answer

Getting the leaf probabilities of a tree model in spark

I'm trying to refactor a trained Spark tree-based model (RandomForest or GBT classifier) in such a way that it can be exported to environments without Spark. The toDebugString method is a good starting point. However, in the case of…
nicola · 24,005
7 votes · 0 answers

Is there an ARIMA model in Spark Scala?

How can we do ARIMA modeling in Spark with Scala? Can we directly import an ARIMA package the way we do for regression or classification? Spark's ML library has nothing like an ARIMA model.
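Spark ML indeed ships no ARIMA; historically the third-party spark-ts package filled that gap. The autoregressive building block is small enough to sketch directly and run per series inside a grouped map. Below is a least-squares AR(1) estimate in plain Python; the function name and the toy series are invented for illustration:

```python
def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} (no intercept),
    the AR(1) building block of ARIMA. Small enough to run per time series
    inside e.g. a Spark grouped-map over keyed data."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

# A series that doubles each step has phi exactly 2.
print(fit_ar1([1.0, 2.0, 4.0, 8.0, 16.0]))   # 2.0
```

Full ARIMA adds differencing and a moving-average term, but this per-series shape is why time-series fitting usually lives in user code (or a dedicated library) layered on top of Spark rather than in spark.ml itself.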
7 votes · 1 answer

Reading a custom pyspark transformer

After messing with this for quite a while, in Spark 2.3 I am finally able to get a pure Python custom transformer saved. But I get an error while loading the transformer back. I checked the content of what was saved and found all the relevant…
7 votes · 4 answers

pyspark - Convert sparse vector obtained after one hot encoding into columns

I am using the Apache Spark ML library to handle categorical features using one-hot encoding. After writing the code below I get a vector c_idx_vec as output of the one-hot encoding. I understand how to interpret this output vector but I am unable to…
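OneHotEncoder emits a SparseVector, which is just a (size, indices, values) triple; turning it into per-category columns is mechanical once that is seen. A Spark-free sketch (the helper name is made up; in PySpark the densification step is typically a UDF or, in Spark 3+, vector_to_array):

```python
def sparse_to_dense(size, indices, values):
    """Expand a SparseVector-style (size, indices, values) triple, the
    representation OneHotEncoder emits, into a dense list of doubles."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

# e.g. category index 2 out of 4 one-hot slots
print(sparse_to_dense(4, [2], [1.0]))   # [0.0, 0.0, 1.0, 0.0]
```

Each position of the dense list then becomes one output column, named after the category label recovered from the encoder's metadata.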