We have a table with 1.355 billion rows and 20 columns.
We want to join this table with another table that has roughly the same number of rows.
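For context, a minimal sketch of the join we're planning (table names and the join key are placeholders, not our real schema):

```scala
// Hypothetical setup: both tables are ~1.355B rows, joined on a shared key.
val left  = spark.table("warehouse.big_table_a")   // placeholder name
val right = spark.table("warehouse.big_table_b")   // placeholder name

val joined = left.join(right, Seq("id"))           // "id" is a placeholder join key
```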
How do we decide what value to use in spark.conf.set("spark.sql.shuffle.partitions", ?)?
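Our thinking so far is to size partitions from the shuffle data volume rather than picking a number blindly, targeting roughly 128–200 MB per partition. A sketch of that calculation is below; the data sizes in it are assumptions, not measurements. Is this the right approach?

```scala
// Rough sizing sketch (assumed numbers, not measured):
// if each shuffled side is ~300 GB, total shuffle volume is ~600 GB.
val totalShuffleBytes    = 600L * 1024 * 1024 * 1024  // assumed 600 GB
val targetPartitionBytes = 200L * 1024 * 1024         // target ~200 MB per partition
val numPartitions = (totalShuffleBytes / targetPartitionBytes).toInt  // = 3072

spark.conf.set("spark.sql.shuffle.partitions", numPartitions.toString)
```

We're also aware that spark.sql.adaptive.enabled lets AQE coalesce shuffle partitions at runtime; does that change how we should set the base value?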
How do we decide the number of executors and their resource allocation (cores and memory per executor)?
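For reference, this is where we'd plug the numbers in once we know them; every value below is a placeholder, and the sizing rationale behind them is exactly what we're asking about:

```scala
import org.apache.spark.sql.SparkSession

// Placeholder values -- the sizing logic behind them is the question.
val spark = SparkSession.builder()
  .appName("billion-row-join")
  .config("spark.executor.instances", "50")       // assumed
  .config("spark.executor.cores", "5")            // common guideline: ~5 cores/executor
  .config("spark.executor.memory", "20g")         // assumed
  .config("spark.executor.memoryOverhead", "2g")  // assumed
  .getOrCreate()
```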
How do we estimate how much memory those 1.355 billion rows will occupy?
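One approach we've considered is measuring a small sample with Spark's SizeEstimator and extrapolating, as sketched below (the table name and sample size are placeholders). Is this reliable at this scale?

```scala
import org.apache.spark.util.SizeEstimator

// Estimate in-memory size by measuring a collected sample and extrapolating.
// Caveat: SizeEstimator measures raw JVM object size, which can overstate
// the footprint of Spark's compressed columnar cache.
val totalRows  = 1355000000L
val sampleSize = 10000
val sample = spark.table("warehouse.big_table_a").limit(sampleSize).collect()
val bytesPerRow = SizeEstimator.estimate(sample).toDouble / sampleSize
val estimatedTotalGB = bytesPerRow * totalRows / (1024 * 1024 * 1024)
println(f"Estimated in-memory size: $estimatedTotalGB%.1f GB")
```

Alternatively, would it be more accurate to cache() a known fraction of the table and read its size off the Storage tab of the Spark UI?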