Questions tagged [apache-spark-mllib]

MLlib is a low-level, RDD-based machine learning library for Apache Spark

2241 questions
9 votes, 2 answers

Polynomial regression in spark/ or external packages for spark

After a good amount of searching on the net for this topic, I am ending up here hoping for some pointers. Please read further. After analyzing Spark 2.0 I concluded polynomial regression is not possible with Spark alone, so is there…
asked by sourabh
9 votes, 1 answer

Non linear (DAG) ML pipelines in Apache Spark

I've set up a simple Spark-ML app, where I have a pipeline of independent transformers that add columns to a dataframe of raw data. Since the transformers don't look at the output of one another, I was hoping I could run them in parallel in a…
asked by hillel
9 votes, 2 answers

How to fix "MetadataFetchFailedException: Missing an output location for shuffle"?

If I increase the model size of my word2vec model I start to get this kind of exception in my log: org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 6 at…
asked by Stefan Falk
9 votes, 2 answers

How to convert type Row into Vector to feed to the KMeans

When I try to feed df2 to KMeans I get the following error: clusters = KMeans.train(df2, 10, maxIterations=30, runs=10, initializationMode="random") The error I get: Cannot convert type into…
9 votes, 1 answer

How to map variable names to features after pipeline

I have modified the OneHotEncoder example to actually train a LogisticRegression. My question is how to map the generated weights back to the categorical variables? def oneHotEncoderExample(sqlContext: SQLContext): Unit = { val df =…
asked by lapolonio
9 votes, 1 answer

How do I use Spark's Feature Importance on Random Forest?

The documentation for Random Forests does not include feature importances. However, it is listed on the Jira as resolved and is in the source code. HERE also says "The main differences between this API and the original MLlib ensembles API…
asked by Climbs_lika_Spyder
9 votes, 4 answers

PySpark & MLLib: Class Probabilities of Random Forest Predictions

I'm trying to extract the class probabilities of a random forest object I have trained using PySpark. However, I do not see an example of it anywhere in the documentation, nor is it a method of RandomForestModel. How can I extract class…
asked by Bryan
9 votes, 1 answer

Spark MLLib TFIDF implementation for LogisticRegression

I'm trying to use the new TF-IDF algorithm that Spark 1.1.0 offers. I'm writing my job for MLlib in Java, but I can't figure out how to get the TF-IDF implementation working. For some reason IDFModel only accepts a JavaRDD as input for the method transform…
asked by Johnny000
8 votes, 1 answer

Using Jackson 2.9.9 in java Spark

I am trying to use the MLLIB library (java) but one of my dependencies uses Jackson 2.9.9. I noticed that a pull request was made such that the master branch's dependency is upgraded to this particular version. Now I wanted to use this master branch…
asked by Jasper
8 votes, 1 answer

Failed to execute user defined function($anonfun$9: (string) => double) on using String Indexer for multiple columns

I am trying to apply StringIndexer on multiple columns. Here is my code: val stringIndexers = Categorical_Model.map { colName => new StringIndexer().setInputCol(colName).setOutputCol(colName + "_indexed") } var dfStringIndexed =…
asked by Leothorn
8 votes, 2 answers

Are random seeds compatible between systems?

I made a random forest model using Python's sklearn package, where I set the seed to, for example, 1234. To productionise models, we use PySpark. If I were to pass the same hyperparameters and the same seed value, i.e. 1234, would it get the same…
8 votes, 3 answers

Spark Java IllegalArgumentException at org.apache.xbean.asm5.ClassReader

I'm trying to use Spark 2.3.1 with Java. I followed the examples in the documentation but keep getting a poorly described exception when calling .fit(trainingData). Exception in thread "main" java.lang.IllegalArgumentException at…
8 votes, 1 answer

How to set a custom loss function in Spark MLlib

I would like to use my own loss function instead of the squared loss for the linear regression model in Spark MLlib. So far I can't find any part of the documentation that mentions whether it is even possible.
asked by user4658980
8 votes, 3 answers

convert dataframe to libsvm format

I have a dataframe resulting from a SQL query: df1 = sqlContext.sql("select * from table_test") I need to convert this dataframe to libsvm format so that it can be provided as an input for pyspark.ml.classification.LogisticRegression. I tried to do…
asked by sah.stc
8 votes, 2 answers

How to use QuantileDiscretizer across groups in a DataFrame?

I have a DataFrame with the following columns:
scala> show_times.printSchema
root
 |-- account: string (nullable = true)
 |-- channel: string (nullable = true)
 |-- show_name: string (nullable = true)
 |-- total_time_watched: integer (nullable =…