I have recently been introduced to Spark SQL. We use Spark 2.4. I recently found out that Spark SQL queries support the following hints for join strategies:

  • BROADCAST hint
  • MERGE hint
  • SHUFFLE_HASH hint

Unfortunately, I have not found any online material that discusses these hints and their application scenarios in detail. I would like to learn when to use each of these hints in a join to improve query performance.

Can anyone explain with some examples? Any help is appreciated. Thanks.

Matthew

1 Answer

  1. Broadcast join is a very fast join: Spark sends the data of the small table to every executor so that the join runs map-side, without shuffling the large table. The relevant configuration is spark.sql.autoBroadcastJoinThreshold; tables smaller than this threshold are broadcast automatically.
  2. Sort-merge join is the default join choice since Spark 2.3.
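
To make the hint syntax concrete, here is a minimal sketch in Python that only builds the SQL strings, so it runs without a cluster. The helper `hinted_join`, the tables `orders` and `customers`, and their columns are all hypothetical names made up for illustration. Also note that, as far as I know, the MERGE and SHUFFLE_HASH hints were only added in Spark 3.0; on Spark 2.4 only the BROADCAST hint is recognized.

```python
# Hypothetical helper: builds a Spark SQL query string carrying a join hint.
# Table and column names (orders, customers, customer_id) are invented.

JOIN_HINTS = {"BROADCAST", "MERGE", "SHUFFLE_HASH"}


def hinted_join(hint, hinted_alias):
    """Return a SELECT joining orders to customers with the given join hint."""
    hint = hint.upper()
    if hint not in JOIN_HINTS:
        raise ValueError(f"unsupported join hint: {hint}")
    return (
        f"SELECT /*+ {hint}({hinted_alias}) */ o.order_id, c.name "
        "FROM orders o JOIN customers c ON o.customer_id = c.customer_id"
    )


# BROADCAST: customers (alias c) is assumed small enough to copy to every executor.
broadcast_sql = hinted_join("BROADCAST", "c")

# MERGE: request a sort-merge join, the usual choice when both sides are large.
merge_sql = hinted_join("MERGE", "c")

# SHUFFLE_HASH: request a shuffled hash join instead of sorting both sides.
shuffle_hash_sql = hinted_join("SHUFFLE_HASH", "c")

# With a live SparkSession named `spark` (not created here), one would run e.g.
#   spark.sql(broadcast_sql).explain()
# and check the physical plan for the join operator that was actually chosen.
```

A hint is a request rather than a guarantee, so checking the plan with explain() is the reliable way to see whether Spark honored it.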

Here are some posts that I hope will help you: Spark SQL Joins, Sort-Merge Join

Jax Ma