Highest Voted 'spark-shuffle' Questions

0

votes

1 answer

Repartition on non-deterministic expression

I want to write code like this: df.repartition(42, monotonically_increasing_id() / lit(10000)). Is this code going to break something due to the non-deterministic expression in repartition? I understand that this code will turn into HashPartitioning…

apache-spark apache-spark-sql spark-shuffle

asked Oct 28 '22 at 21:45

evalgor

3
2

0

votes

1 answer

How wide transformations are influenced by shuffle partition config

How do wide transformations actually work based on the shuffle partitions configuration? If I have the following program: spark.conf.set("spark.sql.shuffle.partitions", "5") val df = spark.read.option("inferSchema", "true").option("header",…

apache-spark apache-spark-dataset spark-shuffle

asked Sep 24 '22 at 09:50

Mandroid

6,200
12
64
134

0

votes

2 answers

Spark NullPointerException: Cannot invoke invalidateSerializedMapOutputStatusCache() because "shuffleStatus" is null

I'm running a simple little Spark 3.3.0 pipeline on Windows 10 using Java 17 and UDFs. I hardly do anything interesting, and now when I run the pipeline on only 30,000 records I'm getting this: [ERROR] Error in removing shuffle…

java apache-spark spark-shuffle

asked Sep 15 '22 at 14:29

Garret Wilson

18,219
30
144
272

0

votes

1 answer

How to decide the number of executors for 1 billion rows in Spark

We have a table which has one billion three hundred and fifty-five million rows. The table has 20 columns. We want to join this table with another table which has more or less the same number of rows. How to decide number of…

apache-spark pyspark spark-shuffle

asked Jul 26 '22 at 04:10

Surendiran Balasubramanian

25
2
7

0

votes

0 answers

How to clear Spark temporary shuffle files between stages to avoid "no space left on device" error?

I am running a Spark job on AWS EMR 6.6 (Spark 3.2.0); however, it seems that Spark is writing a lot of data to disk. I always thought that Spark was all in memory, but it appears that Spark writes temporary files to disk each time there is a wide…

amazon-web-services apache-spark pyspark amazon-emr spark-shuffle

asked Jul 13 '22 at 01:00

Mattreex

189
2
17
