Questions tagged [spark3]

To be used for Apache Spark 3.x

This tag is for everything related to Apache Spark 3.0.0 and higher.

It is kept separate from the apache-spark tag because Spark 3.x introduces breaking changes.

Apache Spark is a unified analytics engine for large-scale data processing.

80 questions
1
vote
0 answers

Can Spark 3.1 push metrics to Prometheus? Is there a handler?

I am investigating whether Spark 3.1 and Prometheus have push mechanisms between them. I know it's possible to pull, but I'd like to send the metrics from Spark to Prometheus.
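Out of the box, Spark 3 only exposes metrics for Prometheus to scrape (pull): the PrometheusServlet sink added in 3.0, plus spark.ui.prometheus.enabled=true for executor metrics. Pushing would require a custom sink implementation or an intermediary such as the Prometheus Pushgateway, which Spark does not ship. A minimal sketch of the pull-side setup, assuming the stock conf/metrics.properties mechanism:

```properties
# conf/metrics.properties — expose metrics in Prometheus format for scraping
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
```

The driver then serves metrics under its UI port (e.g. 4040) at /metrics/prometheus for Prometheus to pull.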
1
vote
0 answers

Beam spark3-runner conflict with scala version

When trying to use Beam with Spark 3.1.2 we are running into this issue: InvalidClassException: scala.collection.mutable.WrappedArray. As explained here: https://www.mail-archive.com/issues@spark.apache.org/msg297820.html it's an incompatibility…
syronanm
  • 11
  • 2
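The incompatibility described in that thread is typically resolved by forcing a single scala-library version across Spark and the Beam spark3 runner, so that serialized classes like WrappedArray agree. A hedged Maven sketch (the 2.12.x patch version is illustrative; it should match the one your Spark distribution was built with):

```xml
<!-- Pin scala-library so Spark 3.1.x and the Beam spark3 runner agree on the
     Scala runtime; pick the patch release your Spark build actually uses. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.12.10</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```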
1
vote
1 answer

Spark 3 - Populate value with value from previous rows (lookup)

I am new to Spark. I have two dataframes, events and players. The events dataframe consists of columns event_id | player_id | match_id | impact_score, and the players dataframe consists of columns player_id | player_name | nationality. I am merging the two datasets by…
Salva
  • 312
  • 2
  • 11
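Questions like this are usually answered with a window function: F.last(col, ignorenulls=True) over a window partitioned by player and ordered by event carries the last known value forward. A plain-Python sketch of that fill logic (the names here are illustrative, not taken from the question; this only shows the semantics):

```python
# Plain-Python sketch of the "carry the last known value forward" logic that
# Spark expresses as F.last(col, ignorenulls=True).over(window).
def forward_fill(values):
    """Replace each None with the most recent non-None value before it."""
    filled, last_seen = [], None
    for v in values:
        if v is not None:
            last_seen = v
        filled.append(last_seen)
    return filled

print(forward_fill([None, 10, None, None, 7, None]))  # [None, 10, 10, 10, 7, 7]
```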
1
vote
2 answers

Start of the week on Monday in Spark

This is my dataset: from pyspark.sql import SparkSession, functions as F spark = SparkSession.builder.getOrCreate() df = spark.createDataFrame([('2021-02-07',),('2021-02-08',)], ['date']) \ .select( F.col('date').cast('date'), …
ZygD
  • 22,092
  • 39
  • 79
  • 102
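In Spark 3 the usual answer is F.date_trunc('week', F.col('date')), which truncates a date to the Monday of its week. A plain-Python sketch of the same Monday-based truncation, handy for checking expected values:

```python
# Monday-based week truncation, mirroring Spark's date_trunc('week', ...).
from datetime import date, timedelta

def week_start_monday(d: date) -> date:
    """Return the Monday on or before d (date.weekday(): Monday == 0)."""
    return d - timedelta(days=d.weekday())

print(week_start_monday(date(2021, 2, 7)))  # a Sunday -> 2021-02-01
print(week_start_monday(date(2021, 2, 8)))  # a Monday -> 2021-02-08
```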
1
vote
0 answers

Can we set up both Spark 2.4 and Spark 3.0 on a single system?

I have a Spark 2.4 installation on my Windows machine. This is required as my production environment uses Spark 2.4. Now I want to test Spark 3.0 features as well. Can I install the Spark 3.0 binaries on the same Windows machine without disturbing the Spark 2.4 installation…
HimanshuSPaul
  • 278
  • 1
  • 4
  • 19
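Side-by-side installs generally work because nothing is shared beyond environment variables: unpack both distributions and point SPARK_HOME (and PATH) at the one you want per shell. A POSIX-shell sketch with illustrative paths (on plain Windows cmd the same idea uses set / setx):

```shell
# Two unpacked Spark distributions living side by side (paths illustrative).
SPARK24_HOME="/c/spark/spark-2.4.8-bin-hadoop2.7"
SPARK3_HOME="/c/spark/spark-3.0.3-bin-hadoop2.7"

# Select Spark 3 for this shell only; other shells keep using Spark 2.4.
use_spark3() {
  export SPARK_HOME="$SPARK3_HOME"
  export PATH="$SPARK_HOME/bin:$PATH"
}

use_spark3
echo "$SPARK_HOME"
```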
1
vote
0 answers

Why is AQE not shown?

My code is like sql = ''' SELECT ... FROM a LEFT JOIN b ON ... LEFT JOIN c ON ... LEFT JOIN d ON ... ''' df = spark.sql(sql) (df .repartition('col') .write .format('parquet') .mode('overwrite') .partitionBy('col') .option(...) …
Brad
  • 11
  • 1
  • 2
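A common reason AQE does not show up in plans on Spark 3.0/3.1 is simply that it is off by default there (it became the default only in Spark 3.2). A minimal configuration sketch; note also that an explicit repartition before the write fixes the partitioning AQE might otherwise have tuned:

```properties
# Adaptive Query Execution is disabled by default before Spark 3.2
spark.sql.adaptive.enabled=true
spark.sql.adaptive.coalescePartitions.enabled=true
spark.sql.adaptive.skewJoin.enabled=true
```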
1
vote
0 answers

Phoenix Driver ClassNotFound in Spark3 Streaming

I am migrating an existing Spark Streaming application from Spark 2.3 to Spark 3.1.1. I have updated the below-mentioned Spark dependencies: org.apache.spark
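A ClassNotFound for the Phoenix driver after a Spark major-version migration usually means the Phoenix client jar is no longer on the driver/executor classpath, or an old phoenix-spark artifact built against Spark 2 is still being pulled in. A spark-submit sketch with illustrative paths (use the Phoenix client/connector jar built for Spark 3 that matches your cluster):

```shell
spark-submit \
  --jars /path/to/phoenix-client.jar \
  --conf spark.driver.extraClassPath=/path/to/phoenix-client.jar \
  --conf spark.executor.extraClassPath=/path/to/phoenix-client.jar \
  your-streaming-app.jar
```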
1
vote
1 answer

How to access Spark DataFrame data in GPU from ML Libraries such as PyTorch or Tensorflow

Currently I am studying the usage of Apache Spark 3.0 with Rapids GPU Acceleration. In the official spark-rapids docs I came across this page which states: There are cases where you may want to get access to the raw data on the GPU, preferably…
deepNdope
  • 179
  • 3
  • 14
1
vote
1 answer

AnalysisException when loading a PipelineModel with Spark 3

I am upgrading my Spark version from 2.4.5 to 3.0.1 and I can no longer load the PipelineModel objects that use a "DecisionTreeClassifier" stage. In my code I load several PipelineModels, all with stages ["CountVectorizer_[uid]",…
Be Chiller Too
  • 2,502
  • 2
  • 16
  • 42
1
vote
2 answers

Spark 3.0 and Cassandra Spark / Python Connectors: Table is not being created prior to write

I'm currently trying to upgrade my application to Spark 3.0.1. For table creation, I drop and create a table using cassandra-driver, the Python-Cassandra connector. Then I write a dataframe into the table using the spark-cassandra connector. There…
L. Chu
  • 123
  • 3
  • 14
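With the Spark Cassandra Connector 3.x, one way to avoid the separate cassandra-driver step is to register Cassandra as a Spark SQL catalog and create the table from Spark itself, so creation and write go through the same connector and session. A sketch (the catalog name and connection host are illustrative):

```properties
# Spark configuration: expose Cassandra as catalog "cass"
spark.sql.catalog.cass=com.datastax.spark.connector.datasource.CassandraCatalog
spark.cassandra.connection.host=127.0.0.1
```

Tables can then be created with Spark SQL before the dataframe write, e.g. CREATE TABLE cass.my_ks.my_table (...) USING cassandra PARTITIONED BY (...).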
1
vote
2 answers

How to save spark dataset in encrypted format?

I am saving my Spark dataset as a parquet file on my local machine. I would like to know if there are any ways I could encrypt the data using some encryption algorithm. The code I am using to save my data as a parquet file looks something like…
Somesh Dhal
  • 336
  • 2
  • 15
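Plain Spark 3.0 has no built-in dataset encryption, so the usual options are filesystem-level encryption (e.g. HDFS encryption zones) or Parquet modular encryption, which arrived with parquet-mr 1.12 and is wired up in Spark 3.2+. A configuration sketch from that newer setup (the key name and the demo InMemoryKMS are illustrative; production needs a real KMS client):

```properties
spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory
spark.hadoop.parquet.encryption.kms.client.class=org.apache.parquet.crypto.keytools.mocks.InMemoryKMS
spark.hadoop.parquet.encryption.key.list=footerKey:AAECAwQFBgcICQoLDA0ODw==
```

Per write, the footer and column keys are then selected with .option("parquet.encryption.footer.key", "footerKey") and .option("parquet.encryption.column.keys", "footerKey:colA,colB").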
1
vote
1 answer

org.apache.spark.shuffle.FetchFailedException: Connection from server1/xxx.xxx.x.xxx:7337 closed

I have upgraded Spark and am trying to run an already-present Spark Streaming application (it accepts file names via a stream, which are then read from HDFS, transformed using RDD and DataFrame operations, and finally the analysed data set is persisted in…
1
vote
1 answer

Spark binary data source vs sc.binaryFiles

Spark 3.0 enables reading binary data using a new data source: val df = spark.read.format("binaryFile").load("/path/to/data") Using previous Spark versions you could load data using: val rdd = sc.binaryFiles("/path/to/data") Beyond having the…
Yosi Dahari
  • 6,794
  • 5
  • 24
  • 44
1
vote
1 answer

Bootstrapping Spark 3.0.0 on EMR cluster

A few days back Spark 3.0.0 was launched. I would like to use some of these functionalities. The default version for Spark on an EMR cluster now is Spark 2.4.5. I specifically make use of PySpark. My question is: how can I install/bootstrap Spark…
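By now the simpler route is to pick an EMR release that already ships Spark 3 rather than bootstrapping it yourself: emr-6.1.0 was the first release with Spark 3.0.0 (check the EMR release notes for the current version mapping). An AWS CLI sketch with illustrative instance settings:

```shell
aws emr create-cluster \
  --release-label emr-6.1.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles
```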
0
votes
0 answers

Why are Spark 3 dynamic partition writes to Hive slow?

Question 1: I have a table with a small amount of data, but the daily writes create a lot of dynamic partitions. The original Spark 2 write completed in only 2 minutes, but after upgrading to Spark 3 it takes 10 minutes to write completely.…
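Without the full job it is hard to pin down, but after a Spark 2 to 3 upgrade the settings usually checked first for slow dynamic-partition writes are the overwrite mode and the Hive dynamic-partition knobs (the values below are a sketch, not a diagnosis):

```properties
spark.sql.sources.partitionOverwriteMode=dynamic
spark.sql.hive.convertMetastoreParquet=true
hive.exec.dynamic.partition=true
hive.exec.dynamic.partition.mode=nonstrict
```

Comparing the physical plans and commit times between the two versions for the same insert is usually the quickest way to localize the regression.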