Highest Voted 'apache-spark-ml' Questions

-1

votes

1 answer

Pyspark NLP - CountVectorizer Max DF or TF. How to filter common occurrences from dataset

I am using CountVectorizer to prepare a dataset for ML. To filter out rare words, I use the CountVectorizer parameters minDF or minTF. I would also like to remove items that appear 'often' in my dataset. I do not see a maxTF…

python apache-spark pyspark nlp apache-spark-ml

asked Jul 02 '18 at 21:41

JB5

97
2
8

-1

votes

1 answer

convert Seq[(String, Any)] to Seq[(String, org.apache.spark.ml.PredictionModel[_, _])] in spark

I have trained several models on my dataset, such as nbModel, dtModel, rfModel, and GbmModel. All of these are machine learning models. Now, when I save them into a variable as val models = Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel),…

apache-spark apache-spark-sql apache-spark-mllib apache-spark-ml

asked Apr 02 '18 at 06:17

shane

3
1

-1

votes

1 answer

type mismatch error while running ml.PredictionModel in spark

After training all the models, I am trying to rename each model's prediction column to uniquely identify that model's prediction inside the dataset. I am getting a type mismatch error, as specified below: import org.apache.spark.ml.PredictionModel import…

apache-spark apache-spark-sql apache-spark-ml

asked Apr 01 '18 at 09:52

Parv bali

147
1
11

-1

votes

1 answer

Spark ML- prediction in KMeans

I have created a KMeans model using Spark ML methods: val kmeans = new KMeans() val model = kmeans.fit(df). My model is ready, but how do I predict which cluster new data points will fall into? In MLlib, model.predict(Vector) predicts the cluster…

apache-spark k-means apache-spark-mllib apache-spark-ml

asked Jan 03 '18 at 09:34

Ishan Kumar

1,941
3
20
29

-1

votes

3 answers

PCA() got an unexpected keyword argument 'k'

I am trying to perform PCA from a Spark application using the PySpark API in a Python script. I am doing it this way: pca = PCA(k=3, inputCol="features", outputCol="pcaFeatures") PCAmodel = pca.fit(data). When I run those two lines of code in the pyspark shell it…

pyspark pca apache-spark-ml

asked Nov 15 '17 at 15:44

user5492457

-1

votes

1 answer

Regression in PySpark. Which library to Use

What are the differences between "pyspark.mllib.regression" and "pyspark.ml.regression"? Which one should be used?

apache-spark pyspark apache-spark-mllib apache-spark-ml

asked Sep 06 '17 at 16:45

Shiv

369
2
13

-1

votes

1 answer

How can we compare the decision trees algorithm performance in terms of accuracy from scikit-learn and from Spark ML?

I am comparing the text classification accuracy obtained using the sklearn DT and the Spark ML DT with the same features and dataset. Is it even appropriate to compare them? The reason I ask is that the parameter lists differ between the two, so I think…

machine-learning scikit-learn classification decision-tree apache-spark-ml

asked May 05 '17 at 02:48

Aishwarya Soni

105
1
9

-1

votes

2 answers

can't define a udf inside pyspark project

I have a Python project that uses pyspark, and I am trying to define a udf function inside the Spark project (not in my Python project), specifically in spark\python\pyspark\ml\tuning.py, but I get pickling problems. It can't load the udf. The…

python pyspark udf apache-spark-ml

asked Sep 22 '16 at 14:25

ofer-a

521
5
21

-1

votes

1 answer

Exception on using VectorAssembler in apache spark ml

I'm trying to create a VectorAssembler to build the input for logistic regression and am using the following code: //imports import org.apache.spark.ml.feature.VectorAssembler import org.apache.spark.mllib.linalg.{Vector, Vectors, VectorUDT} 1…

apache-spark apache-spark-mllib apache-spark-ml

asked May 23 '16 at 10:27

hbabbar

947
4
15
33

-3

votes

1 answer

How to process dataframe for ML using Pyspark

I am doing GBT modelling using pyspark. I have a dataframe in which the input features (X) are multiple columns (A, B, C) and the output (Y) is one column with binary values 0 and 1. I am confused about the VectorAssembler and transform in processing the…

python apache-spark machine-learning pyspark apache-spark-ml

asked Feb 06 '18 at 07:06

Xin Chang

87
1
5
