Questions tagged [spark3]

To be used for Apache Spark 3.x

Use this tag for everything related to Apache Spark 3.0.0 and higher.

It is kept separate from the apache-spark tag because Spark 3.x introduces breaking changes.

Apache Spark is a unified analytics engine for large-scale data processing.

80 questions
2 votes · 1 answer

Spark 3 KryoSerializer issue - Unable to find class: org.apache.spark.util.collection.OpenHashMap

I am upgrading a Spark 2.4 project to Spark 3.x. We are hitting a snag with some existing Spark-ml code: var stringIndexers = Array[StringIndexer]() for (featureColumn <- FEATURE_COLS) { stringIndexers = stringIndexers :+ new…
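The truncated code builds one StringIndexer per feature column, Spark 2 style. As a point of reference, here is a minimal sketch of the multi-column form StringIndexer gained in Spark 3.0, which avoids accumulating an array of indexers (FEATURE_COLS here is a hypothetical list of column names):

    // Spark 3.0+: one StringIndexer can index several columns at once.
    import org.apache.spark.ml.feature.StringIndexer

    val FEATURE_COLS = Seq("make", "model") // hypothetical feature columns
    val indexer = new StringIndexer()
      .setInputCols(FEATURE_COLS.toArray)
      .setOutputCols(FEATURE_COLS.map(_ + "_idx").toArray)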
2 votes · 1 answer

Apache Livy 0.7.0 fails to create an interactive session

While creating a new session using Apache Livy 0.7.0, I am getting the error below. I am also using a Zeppelin notebook (Livy interpreter) to create the session. Using Scala version 2.12.10, Java HotSpot(TM) 64-Bit Server VM 11.0.11, Spark 3.0.2, Zeppelin…
Sushil Behera
2 votes · 0 answers

Spark SQL throws AssertionError: assertion failed: Found duplicate rewrite attributes (Spark 3.0.2)

Executing the above in Spark 3.0.2 produces Exception in thread "main" java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes. It was working in Spark 2.4.3. SELECT COALESCE(view_1_alias.name, view_2.name) AS name, …
KilyenOrs
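The asker's full query is not shown, but one common mitigation for "duplicate rewrite attributes" errors is to give each side of the join distinct aliases so the optimizer no longer sees two attributes with the same name. A minimal sketch of that pattern (views and column names are stand-ins, not the asker's):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.coalesce

    val spark = SparkSession.builder().appName("dedup-attrs").master("local[*]").getOrCreate()
    import spark.implicits._

    // Two sources that both expose a `name` column, aliased before joining.
    val view1 = Seq((1, "a")).toDF("id", "name").alias("v1")
    val view2 = Seq((1, "b")).toDF("id", "name").alias("v2")

    view1.join(view2, $"v1.id" === $"v2.id")
      .select(coalesce($"v1.name", $"v2.name").as("name"))
      .show()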
2 votes · 0 answers

In Spark 3.0.1, using DataFrame.foreachPartition: value foreach is not a member of Object

In IDEA, with Spark 3.0.1, Scala 2.12.12, Java 1.8.0_212. My code: val df = spark.range(10); df.foreachPartition(rows => { rows.foreach(.......) }). Error: value foreach is not a member of Object rows.foreach(row => {. If I use Spark 2.4.7 and…
ZhiYing
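A likely explanation: in Spark 3, Dataset.foreachPartition has both a Scala overload (Iterator[T] => Unit) and a Java overload (ForeachPartitionFunction[T]), and under Scala 2.12 the lambda parameter can be inferred as Object, producing exactly this error. A minimal sketch of the usual fix, annotating the parameter type so the Scala overload is chosen:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("fp").master("local[*]").getOrCreate()
    val df = spark.range(10) // Dataset[java.lang.Long]

    // Explicitly typing `rows` resolves the overload ambiguity.
    df.foreachPartition((rows: Iterator[java.lang.Long]) => {
      rows.foreach(row => println(row))
    })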
2 votes · 0 answers

Running on external Spark 3.0.1 cluster from IntelliJ

I have recently upgraded to Spark 3.0.1 from 2.4.6 (and Scala 2.11.12 to Scala 2.12.10). I write and execute applications from IntelliJ IDEA and in the past was able to run with the master set either to local[*] or remotely using spark://xx:7077. My…
TJVR
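The question is cut off, but here is a sketch of the usual checklist when driving a standalone spark:// cluster from the IDE: the driver's Spark and Scala versions must match the cluster exactly, and the compiled application jar has to be shipped to the executors, e.g. via spark.jars (the jar path below is hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("from-intellij")
      .master("spark://xx:7077") // master URL from the question
      .config("spark.jars", "target/scala-2.12/myapp.jar") // hypothetical path to the built jar
      .getOrCreate()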
2 votes · 1 answer

Monitoring Spark 3 applications with Prometheus

I have some very basic questions around the pull mechanism for metrics and how Spark 3 applications can be monitored using Prometheus: does the PrometheusServlet sink supported in Spark 3 contain all the metrics since application start time?…
soontobeared
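For context, a minimal sketch of turning on the Spark 3 Prometheus endpoints from code instead of metrics.properties. Because PrometheusServlet is a pull-based scrape target, Prometheus only sees the values current at each scrape; it does not receive history from before the first scrape:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("prom-metrics")
      .master("local[*]")
      // Executor metrics at /metrics/executors/prometheus on the driver UI.
      .config("spark.ui.prometheus.enabled", "true")
      // Driver metrics via the PrometheusServlet sink.
      .config("spark.metrics.conf.*.sink.prometheusServlet.class",
              "org.apache.spark.metrics.sink.PrometheusServlet")
      .config("spark.metrics.conf.*.sink.prometheusServlet.path",
              "/metrics/prometheus")
      .getOrCreate()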
2 votes · 4 answers

PySpark structured Streaming + Kafka Error (Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.v2.StreamWriteSupport )

I am trying to run PySpark Structured Streaming + Kafka. When I run the command Master@MacBook-Pro spark-3.0.0-preview2-bin-hadoop2.7 % bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:2.4.5…
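The ClassNotFoundException here usually means the Kafka connector version does not match the Spark build: the command pairs a 3.0.0-preview2 distribution with spark-sql-kafka-0-10_2.12:2.4.5, which still references the org.apache.spark.sql.sources.v2 classes removed in Spark 3. With the artifact version matched to the Spark version, the stream itself is straightforward; a minimal sketch (broker and topic are hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-stream").getOrCreate()

    // Requires --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<your Spark version>
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
      .option("subscribe", "events")                       // hypothetical topic
      .load()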
1 vote · 0 answers

How to provide hive metastore information via spark-submit?

Using Spark 3.1, I need to provide the Hive configuration via the spark-submit command (not inside the code). Inside the code (which is not the solution I need), I can do the following, which works fine (able to list databases and select from tables).…
Itération 122442
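A minimal sketch of the in-code variant the asker already has working; the same keys can normally be moved onto the command line unchanged, e.g. spark-submit --conf spark.hadoop.hive.metastore.uris=thrift://host:9083 (host and port here are hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hive-metastore")
      // spark.hadoop.* entries are forwarded into the Hadoop/Hive configuration.
      .config("spark.hadoop.hive.metastore.uris", "thrift://host:9083") // hypothetical URI
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW DATABASES").show()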
1 vote · 0 answers

Spark AQE drastically reduces number of partitions

I am using Spark 3.2.1 to summarise high-volume data using joins. The query plan shows that one executor was tasked with processing 90 GB of data after the AQEShuffleRead step, as shown below. The shuffle partition count of 900 was also drastically brought…
Gladiator
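A sketch of the AQE settings that govern post-shuffle coalescing; bounding or disabling the coalesce step keeps Spark from collapsing the 900 shuffle partitions down to a handful (the values shown are illustrative, not recommendations):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("aqe-tuning").master("local[*]").getOrCreate()

    spark.conf.set("spark.sql.adaptive.enabled", "true")
    // Option 1: turn off post-shuffle partition coalescing entirely.
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "false")
    // Option 2 (Spark 3.2+): keep coalescing but bound how far partitions shrink.
    spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "128m")
    spark.conf.set("spark.sql.adaptive.coalescePartitions.minPartitionSize", "16m")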
1 vote · 1 answer

Why would finding an aggregate of a partition column in Spark 3 take a very long time?

I'm trying to query MIN(dt) in a table partitioned by the dt column, using the following query in both Spark 2 and Spark 3: SELECT MIN(dt) FROM table_name. The table is stored in Parquet format in S3, where each dt is a separate folder, so this seems…
RyanCheu
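One workaround, sketched below: read the partition values from the catalog instead of scanning data. Spark 2's metadata-only optimization (spark.sql.optimizer.metadataOnly) could answer MIN(dt) from partition metadata, but it was deprecated in Spark 3, so the aggregate may fall back to a full scan. Table and column names follow the question:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("min-partition").enableHiveSupport().getOrCreate()

    // SHOW PARTITIONS returns strings like "dt=2020-01-01".
    val minDt = spark.sql("SHOW PARTITIONS table_name")
      .collect()
      .map(_.getString(0).stripPrefix("dt="))
      .min

    println(s"earliest partition: $minDt")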
1 vote · 0 answers

Spark custom Aggregator with multiple columns

I have written a Spark UDAF that takes two columns as input (timestamp and value) and calculates a rate of change via least squares over all data points in a given window. It works perfectly fine; the code is below (shortened to the relevant…
Tim Zimmermann
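For the Spark 3 take on this, a minimal sketch of a two-column aggregate using the Aggregator API: the input type is a case class covering both columns, and functions.udaf registers it for DataFrame use. The least-squares math is deliberately elided; only the multi-column wiring is shown, and all names are hypothetical:

    import org.apache.spark.sql.{Encoder, Encoders}
    import org.apache.spark.sql.expressions.Aggregator
    import org.apache.spark.sql.functions.udaf

    case class Point(ts: Long, value: Double)             // two input columns
    case class Sums(n: Long, sumTs: Double, sumV: Double) // aggregation buffer

    object RateAgg extends Aggregator[Point, Sums, Double] {
      def zero: Sums = Sums(0L, 0.0, 0.0)
      def reduce(b: Sums, p: Point): Sums = Sums(b.n + 1, b.sumTs + p.ts, b.sumV + p.value)
      def merge(a: Sums, b: Sums): Sums = Sums(a.n + b.n, a.sumTs + b.sumTs, a.sumV + b.sumV)
      def finish(b: Sums): Double = if (b.n == 0) 0.0 else b.sumV / b.n // placeholder, not least squares
      def bufferEncoder: Encoder[Sums] = Encoders.product[Sums]
      def outputEncoder: Encoder[Double] = Encoders.scalaDouble
    }

    val rate = udaf(RateAgg) // usage: df.select(rate($"ts", $"value"))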
1 vote · 1 answer

to_date conversion failing in PySpark on Spark 3.0

Knowing about the calendar change in Spark 3.0, I am trying to understand why the cast fails in this particular instance. Spark 3.0 has issues with dates before the year 1582; however, in this example the year is greater than 1582. rdd =…
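Alongside the calendar switch, Spark 3 also replaced SimpleDateFormat with a stricter DateTimeFormatter-based parser, so patterns Spark 2 accepted can fail even for modern dates. A minimal sketch of the usual escape hatch, restoring legacy parsing (the sample date and pattern are assumptions):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.to_date

    val spark = SparkSession.builder().appName("to-date").master("local[*]").getOrCreate()
    import spark.implicits._

    // Fall back to Spark 2.x parsing behaviour.
    spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

    Seq("1986-02-13").toDF("s")
      .select(to_date($"s", "yyyy-MM-dd").as("d"))
      .show()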
1 vote · 0 answers

UDF function fails in Spark 3.3.0

I have an application developed with Scala 2.11 and Spark 2.4 where a UDF is applied to a streaming DataFrame to add a new column. Due to other library requirements, I have moved the application to Scala 2.12 and Spark 3.3, but now the code fails…
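The failing code is not shown, but one frequent cause after a 2.4-to-3.x move is the untyped udf(f: AnyRef, dataType: DataType) overload, which Spark 3 rejects by default. A minimal sketch of the typed form that replaces it (function and column names are hypothetical):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    val spark = SparkSession.builder().appName("typed-udf").master("local[*]").getOrCreate()
    import spark.implicits._

    // Typed UDF: input and output types are carried by the Scala function.
    val normalize = udf((s: String) => if (s == null) null else s.trim.toLowerCase)

    Seq("  MiXeD ").toDF("raw")
      .withColumn("clean", normalize($"raw"))
      .show()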
1 vote · 1 answer

Spark: DF.as[Type] fails to compile

I'm trying to run an example from the Spark book Spark: The Definitive Guide. build.sbt: ThisBuild / scalaVersion := "3.2.1" libraryDependencies ++= Seq( ("org.apache.spark" %% "spark-sql" % "3.2.0" %…
Yashwanth
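The likely root cause: Spark's implicit Encoder derivation is built on Scala 2 TypeTags, so under Scala 3 (even with CrossVersion.for3Use2_13) DF.as[SomeCaseClass] finds no Encoder and fails to compile. A sketch of the distinction; primitive encoders can be passed explicitly, while case classes need a Scala 3 encoder-derivation library (for example the community spark-scala3 project):

    import org.apache.spark.sql.{Encoders, SparkSession}

    val spark = SparkSession.builder().appName("scala3-as").master("local[*]").getOrCreate()

    // Works under Scala 3: explicit primitive encoder, no TypeTag derivation needed.
    val ds = spark.range(3).as[Long](Encoders.scalaLong)
    ds.show()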
1 vote · 1 answer

No TypeTag available for a case class using Scala 3 with Spark 3

I have code that runs a Spark job with Scala 3: @main def startDatasetJob(): Unit = val spark = SparkSession.builder() .appName("Datasets") .master("local[*]") .getOrCreate() case class CarRow(Name: String, …
Liusha He
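Two things are likely at play here, sketched below: in Scala 2 this error classically means the case class is declared inside the method, and moving it to the top level fixes it; under Scala 3 no TypeTag can be materialized at all, so an explicit encoder is needed (Kryo is a blunt but workable fallback; a Scala 3 encoder-derivation library is the nicer route):

    import org.apache.spark.sql.{Encoders, SparkSession}

    case class CarRow(Name: String) // top level, not inside the @main method

    @main def startDatasetJob(): Unit =
      val spark = SparkSession.builder()
        .appName("Datasets")
        .master("local[*]")
        .getOrCreate()

      // Explicit Kryo encoder avoids TypeTag-based derivation entirely.
      val cars = spark.createDataset(Seq(CarRow("Ford Torino")))(Encoders.kryo[CarRow])
      println(cars.count())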