Questions tagged [apache-spark-mllib]

MLlib is a machine learning library for Apache Spark

MLlib is a low-level, RDD-based machine learning library for Apache Spark

2241 questions
1
vote
0 answers

how to reduce time of spark application run by "java -jar"

I execute a Spark application in two ways. The application is Naive Bayes training using MLlib. Using "spark-submit", it executes successfully on a set of data. Using "java -jar", it takes more time than in case 1. In both cases I have the same data set and same…
Tinku
  • 751
  • 5
  • 19
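
A common reason for the gap: spark-submit injects the master URL, memory settings, and spark-defaults.conf, while a bare java -jar run gets only what the application sets itself, so Spark falls back to defaults such as 1g executor memory. A minimal Scala sketch that mirrors the submit-time configuration in code (values are illustrative assumptions, not tuned recommendations):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("NaiveBayesTraining")
      .setMaster("local[*]")              // assumption: single-machine run
      .set("spark.executor.memory", "4g") // mirror whatever spark-submit was given
    val sc = new SparkContext(conf)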
1
vote
1 answer

Twitter sentiment analysis using Naive Bayes in apache spark

I am trying to do a basic Twitter sentiment analysis using Apache Spark. The page below explains the Naive Bayes function used in Apache Spark, which would be a candidate for the above…
Siva
  • 1,839
  • 5
  • 21
  • 31
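
A minimal Scala sketch of the usual MLlib recipe for this, assuming a hypothetical tweets RDD of (label, tokens) pairs:

    import org.apache.spark.mllib.classification.NaiveBayes
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint

    // tweets is an assumed RDD[(Double, Seq[String])] of (label, tokens),
    // with 0.0 = negative and 1.0 = positive.
    val tf = new HashingTF(numFeatures = 10000)
    val training = tweets.map { case (label, words) =>
      LabeledPoint(label, tf.transform(words))
    }
    val model = NaiveBayes.train(training, lambda = 1.0)
    val sentiment = model.predict(tf.transform(Seq("loving", "this")))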
1
vote
1 answer

Classification with Spark MLlib in Java

I am trying to build a classification system with Apache Spark's MLlib. I have shortlisted the Naive Bayes algorithm for this, and will be using Java 8 for its support of lambda expressions. I am a newbie in terms of lambda expressions and hence am…
jatinpreet
  • 589
  • 1
  • 4
  • 11
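
The MLlib calls themselves are language-agnostic; a Scala sketch of the same train/evaluate flow (a Java 8 version would replace these closures with lambdas on a JavaRDD), with data as an assumed featurized input:

    import org.apache.spark.mllib.classification.NaiveBayes
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    val data: RDD[LabeledPoint] = ???            // your featurized input
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)
    val model = NaiveBayes.train(train, lambda = 1.0)
    val accuracy = test
      .map(p => (model.predict(p.features), p.label))
      .filter { case (pred, label) => pred == label }
      .count().toDouble / test.count()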
1
vote
1 answer

Apache Spark - MLlib - K-Means Input format

I want to perform a K-Means task, but I fail while training the model and get kicked out of Spark's Scala shell before I get my result metrics. I am not sure if the input format is the problem or something else. I use Spark 1.0.0 and my input text file (400MB)…
user3400996
  • 73
  • 1
  • 9
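
For reference, KMeans.train expects an RDD[Vector]. A minimal Scala sketch that parses a whitespace-separated text file, assuming an active SparkContext sc and an illustrative path:

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // Each line is assumed to hold space-separated numbers; caching avoids
    // re-reading the file on every iteration.
    val data = sc.textFile("data.txt")
      .map(line => Vectors.dense(line.trim.split(' ').map(_.toDouble)))
      .cache()
    val model = KMeans.train(data, k = 10, maxIterations = 20)
    println(model.computeCost(data))   // within-cluster sum of squares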
1
vote
1 answer

Passing long values into MLlib's Rating() method

I am trying to build a recommender system using Spark's MLlib library (using Scala). In order to be able to use the ALS train method, I need to build a rating matrix using the Rating() method (which is part of the package…
shahharsh2603
  • 73
  • 2
  • 9
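
Rating takes Int ids, so 64-bit ids need remapping first. A hedged Scala sketch, with raw as an assumed RDD[(Long, Long, Double)] of (userId, productId, rating); the collected maps are fine for a sketch, but large id spaces would need a join instead:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    val userIndex = raw.map(_._1).distinct().zipWithIndex()   // Long -> dense Int
      .mapValues(_.toInt).collectAsMap()
    val prodIndex = raw.map(_._2).distinct().zipWithIndex()
      .mapValues(_.toInt).collectAsMap()
    val ratings = raw.map { case (u, p, r) => Rating(userIndex(u), prodIndex(p), r) }
    val model = ALS.train(ratings, rank = 10, iterations = 10)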
0
votes
0 answers

spark-job on spark kubernetes cluster took long time to complete

I have set up a 3-node Spark Kubernetes cluster with the spark-kubernetes-operator Helm chart. The Kubernetes cluster is deployed on AWS t2.2xlarge instances with 8 vCPUs and 32 GB memory. I have built a RandomForest price prediction Spark pipeline with Scala…
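
Two things worth checking on this setup: t2 instances are burstable, so exhausted CPU credits throttle long-running jobs, and executors sized far below node capacity leave it idle. A sketch of explicit executor sizing (illustrative assumptions, not tuned recommendations):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .config("spark.executor.instances", "3")
      .config("spark.executor.cores", "6")       // leave headroom for k8s daemons
      .config("spark.executor.memory", "20g")
      .getOrCreate()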
0
votes
0 answers

Is weightcol of spark decision tree classifier used directly in impurity calculation?

My dataset is > 300 million rows, so I think spark-ml would be a better choice than sklearn. Since there are many rows with the same set of features (they are different data points), I further aggregate the dataset by feature and target and produce a weight…
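
Since Spark 3.0 the spark.ml tree classifiers accept instance weights via setWeightCol, and the weights enter the per-node label statistics from which impurity is computed. A minimal sketch, assuming df already carries the aggregated columns:

    import org.apache.spark.ml.classification.DecisionTreeClassifier

    // df is assumed to carry "features" (vector), "label", and a "weight"
    // column holding the count of collapsed duplicate rows.
    val dt = new DecisionTreeClassifier()
      .setFeaturesCol("features")
      .setLabelCol("label")
      .setWeightCol("weight")
    val model = dt.fit(df)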
0
votes
0 answers

How to update required memory for single node Apache Spark Scala Job?

I have been running a Spark Scala job using 32 cores on a single-node Apache Spark 3.2.x cluster. The actual host machine has 256 GB RAM and 128 cores. How can I provide the memory required for the entire job? At present this job processes 10M…
user648330
  • 25
  • 3
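
In local mode the driver JVM hosts the executors, so its heap must be sized at launch (for example spark-submit --driver-memory 64g); setting spark.driver.memory later in code has no effect because the JVM is already running. A sketch with illustrative figures:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[32]")                           // the 32 cores in use
      .config("spark.sql.shuffle.partitions", "64")  // scale partitions to cores
      .getOrCreate()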
0
votes
0 answers

How to change a sparse vector column in a dataframe into a dense one with PySpark? (or how to translate my Scala function to a pySpark one?)

I'm trying to tune an LLM (BERT, or embeddings such as GloVe) on a text column for text classification. I'm using SparkNLP for preprocessing and creating the embeddings, and PySpark (Spark ML) for the machine learning part. I'm at a point where I have…
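
A sketch of the Scala-side pattern, assuming a vector column named "features"; the PySpark analogue wraps the same conversion in a Python UDF (noted in the comment):

    import org.apache.spark.ml.linalg.Vector
    import org.apache.spark.sql.functions.{col, udf}

    // PySpark analogue: udf(lambda v: DenseVector(v.toArray()), VectorUDT())
    val toDense = udf((v: Vector) => v.toDense)
    val densified = df.withColumn("features", toDense(col("features")))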
0
votes
1 answer

Using PipelineModel.load() in custom MLFlow PyFunc class results in error

I am creating a custom PyFunc class to use with the Databricks Feature Store, as their Model Serving UI and the feature store's log_model() methods only work with the PythonModel class. The underlying model is a PipelineModel() which performs various binning…
0
votes
0 answers

How to subtract one DenseVector from another in Spark MLlib

a and b are two spark.mllib.linalg.DenseVectors: import org.apache.spark.mllib.linalg.DenseVector ... val a = new DenseVector(Array(1.0, 2.0, 3.0)) val b = new DenseVector(Array(2.0, 3.0, 1.0)) ... I want to get c as the subtraction of b from a. How to do it?
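
MLlib vectors expose no public arithmetic operators, so one portable approach is to operate on the backing arrays. A minimal sketch:

    import org.apache.spark.mllib.linalg.DenseVector

    val a = new DenseVector(Array(1.0, 2.0, 3.0))
    val b = new DenseVector(Array(2.0, 3.0, 1.0))
    // Element-wise a - b over the public values arrays.
    val c = new DenseVector(a.values.zip(b.values).map { case (x, y) => x - y })
    // c: [-1.0, -1.0, 2.0]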
0
votes
1 answer

Spark KMeans produces deterministic results and not random

I am running Spark KMeans and I would like to have a random seed in every run, for different results each time; however, this is not the case. This is the code that I am using: KMeans kmeans = new KMeans().setK(4).setInitMode("random"); KMeansModel…
Des0lat0r
  • 482
  • 3
  • 18
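
spark.ml estimators default to a fixed seed derived from the estimator class name, so even "random" initialization is reproducible run to run. A sketch that varies the seed explicitly (shown in Scala; the Java call is identical):

    import org.apache.spark.ml.clustering.KMeans
    import scala.util.Random

    val kmeans = new KMeans()
      .setK(4)
      .setInitMode("random")
      .setSeed(Random.nextLong())   // different centroids on each run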
0
votes
1 answer

How to load spark saved pipeline and retrain with new data

I hope to load a saved pipeline with Spark and then re-fit it with new data collected day by day. Here is my current code: new_data_df = data in current day if target path exists: model = PipelineModel.load("path/to/pipeline") …
G_cy
  • 994
  • 3
  • 13
  • 28
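
A fitted PipelineModel holds only transformers and cannot be re-fit; what can be re-fit is the unfitted estimator Pipeline, saved separately. A Scala sketch under that assumption, with illustrative paths and an active SparkSession assumed (note this trains afresh rather than warm-starting):

    import org.apache.spark.ml.{Pipeline, PipelineModel}

    val newData = spark.read.parquet("path/to/today")
    val pipeline = Pipeline.load("path/to/pipeline")   // unfitted estimator
    val model: PipelineModel = pipeline.fit(newData)
    model.write.overwrite().save("path/to/model")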
0
votes
1 answer

PySpark ArrayIndexOutOfBoundsException error during model fit: How can I diagnose and fix the issue?

I am working on a PySpark project where I'm trying to fit a MultilayerPerceptronClassifier model to my text data using the fit method. I am using the Word2Vec model provided by MLlib to extract features. However, I keep running into an…
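
The usual cause of this exception is a layers array inconsistent with the data: the first entry must equal the feature vector length and the last the number of label classes. A sketch, assuming a hypothetical Word2Vec vector size of 100 and binary labels:

    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

    // 100 matches an assumed Word2Vec().setVectorSize(100); 2 = label classes.
    val mlp = new MultilayerPerceptronClassifier()
      .setLayers(Array(100, 64, 2))
      .setMaxIter(50)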
0
votes
0 answers

Apache Spark MLlib StandardScaler vs z-score

So, I am wondering if there is any difference between the StandardScaler of Spark and a simple z-score calculation. The formula for the z-score calculation is: z = (x - mean) / std. However, for the StandardScaler of Spark it is not clear to me how…
Des0lat0r
  • 482
  • 3
  • 18
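
The difference is the default: MLlib's StandardScaler ships with withMean = false, so out of the box it computes x / std rather than (x - mean) / std, where std is the sample standard deviation (n - 1 denominator). A sketch that matches the textbook z-score, assuming data is an RDD[Vector]:

    import org.apache.spark.mllib.feature.StandardScaler

    // Enable both flags to get (x - mean) / std per feature.
    val scaler = new StandardScaler(withMean = true, withStd = true)
    val model = scaler.fit(data)
    val zScored = model.transform(data)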