Questions tagged [apache-spark-mllib]

MLlib is a machine learning library for Apache Spark

MLlib is a low-level, RDD-based machine learning library for Apache Spark.

2241 questions
0
votes
1 answer

matrix factorization model returning much smaller dataframe after predicting ratings in pyspark

I'm trying to create a product recommender with the code below. I'm using matrix factorization from spark ml. I have data that has a customer_id, product_id, and a numeric rating value that has been normalized. So all rating values are between 0…
0
votes
1 answer

How do I extract feature_importances from my model in SparklyR?

I would like to extract feature_importances from my model in SparklyR. So far I have the following reproducible code that is working: library(sparklyr) library(dplyr) sc <- spark_connect(method = "databricks") dtrain <- data_frame(text =…
0
votes
1 answer

Generate sparse vector for all the column values in spark dataframe

column1 column2
1       1
1       0
1       0
0       0

Now I want to calculate the hash or sparse vector of all the values in column1 and column2.
0
votes
1 answer

How to groupBy and perform data scaling over each group using MLlib and PySpark?

I have a dataset just like in the example below and I am trying to group all rows from a given symbol and perform standard scaling of each group, so that at the end all my data is scaled by groups. How can I do that with MLlib and PySpark? I could…
0
votes
1 answer

Pyspark Pipeline Performance

Is there any performance difference between using 2 separate pipelines vs 1 combined pipeline? For example, 2 separate pipelines: from pyspark.ml import Pipeline from pyspark.ml.feature import VectorAssembler df = spark.createDataFrame([ (1.0,…
Tim • 3,178
0
votes
1 answer

How to convert a DataFrame to an Array of dense vectors?

How would I convert the following DataFrame val df = Seq( (5.0, 1.0, 1.0, 3.0, 7.0), (2.0, 0.0, 3.0, 4.0, 5.0), (4.0, 0.0, 0.0, 6.0, 7.0)).toDF("m1", "m2", "m3", "m4", "m5") //df: res166: org.apache.spark.sql.DataFrame = [m1: int, m2: int ...…
Amazonian • 391
0
votes
1 answer

Adding custom metadata to DataFrame schema using iceberg table format

I'm adding custom metadata into the DataFrame's schema in my PySpark application using StructField's metadata field. It worked fine when I wrote parquet files directly to s3. The custom metadata was available when reading these parquet files as…
0
votes
1 answer

Training/Test data with SparkML in Scala

I've been facing an issue for the past couple of hours. In theory, when we split data for training and testing, we should standardize the training data independently, so as not to introduce bias, and then after having trained the model do…
Aron Latis • 38
0
votes
1 answer

ML Tuning - Cross Validation in Spark

I am looking at the cross-validation code example found in https://spark.apache.org/docs/latest/ml-tuning.html#cross-validation It says: CrossValidator begins by splitting the dataset into a set of folds which are used as separate training and test…
rayqz • 249
0
votes
1 answer

Mix Spark MLlib and Spark NLP in a pipeline

In an MLlib pipeline, how can I chain a CountVectorizer (from Spark ML) after a Stemmer (from Spark NLP)? When I try to use both in a pipeline I get: myColName must be of type equal to one of the following types: [array&lt;string&gt;, array&lt;string&gt;] but…
Benjamin • 3,350
0
votes
1 answer

Vertex AI custom model training for a PySpark ML model

Is it possible to train a Spark/PySpark MLlib model using Vertex AI custom container model building? I couldn't find any reference in the Vertex AI documentation regarding Spark model training. For distributed processing model building the only options…
0
votes
0 answers

alternative to pivoting column to create vector for kmeans in pyspark

I am trying to cluster with kmeans in pyspark. I have data like the id_predictions_df example below. I'm first pivoting the data to create a dataframe where the columns are the id_y indices and the rows would be the id_x. The values are then the…
user3476463 • 3,967
0
votes
0 answers

How to decode one-hot encoded values in Spark ML

Is it possible to reverse a OneHotEncoder in Spark ML, i.e. recover the original values? Is there any way to achieve this? StringIndexer dateIndexer = new StringIndexer(); csvData =…
0
votes
1 answer

Java Spark ML - java.lang.IllegalArgumentException: label does not exist. Available:

Small question regarding a Spark exception I am getting please. I have a very straightforward dataset:

myCoolDataset.show();
+----------+-----+
|      time|value|
+----------+-----+
|1621900800|   43|
…
PatPanda • 3,644
0
votes
1 answer

How to specify "positive class" in sparkml classification?

How to specify the "positive class" in sparkml (binary) classification? (Or perhaps: How does a MulticlassClassificationEvaluator determine which class is the "positive" one?) Suppose we were training a model to target Precision in a binary…
lampShadesDrifter • 3,925