
We use broadcast joins as one of the join optimizations in Spark. Could you please help me understand the points below?


1) Should the broadcast table size always be less than the driver memory?

In this case, suppose my broadcast table is 4 GB but the driver memory is 3 GB. Can I increase the driver memory to 6 GB and broadcast the 4 GB table?
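For reference, a sketch of the configuration in question (the job file name and values are hypothetical, not recommendations). Note that `spark.driver.memory` must be set before the driver JVM starts, so it should be passed to `spark-submit` rather than set in code, and the driver needs headroom beyond the raw table size for the deserialized copy and its normal work:

```shell
# Hypothetical submit command: raise driver memory so a ~4 GB broadcast
# table fits, with headroom for deserialization and other driver work.
spark-submit \
  --driver-memory 6g \
  --conf spark.sql.autoBroadcastJoinThreshold=4g \
  my_job.py
```

`spark.sql.autoBroadcastJoinThreshold` only controls *automatic* broadcast decisions; an explicit `broadcast()` hint on the DataFrame forces a broadcast join regardless of the threshold.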


2) What is the maximum driver memory I can provide? Is there any limit?

I think it totally depends on what we are bringing back to the driver (broadcast, collect, etc.).
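As far as I know, Spark itself imposes no cap on `spark.driver.memory`; the practical ceiling is the RAM of the machine or container the driver runs on, minus the OS and other processes. A sketch of the related driver-side limits (values are illustrative only):

```shell
# Illustrative driver-side settings (example values, not recommendations)
spark-submit \
  --driver-memory 16g \
  --conf spark.driver.maxResultSize=4g \
  --conf spark.driver.memoryOverhead=2g \
  my_job.py
```

`spark.driver.maxResultSize` caps the total serialized size of results brought back by actions like `collect()`, while `spark.driver.memoryOverhead` adds off-heap headroom when running on YARN or Kubernetes.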


3) I heard we can broadcast only up to 2 GB of data, because Java serialization supports only up to 2 GB. Is that true?
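A commonly cited explanation for the 2 GB figure is not Java serialization itself but the JVM: arrays (and `ByteBuffer`s) are indexed by a signed 32-bit `int`, so a single `byte[]` tops out at `Integer.MAX_VALUE` bytes. Newer Spark versions chunk large blocks, but a quick check of where the number comes from:

```python
# JVM arrays are indexed by a signed 32-bit int, so one byte[] holds at
# most Integer.MAX_VALUE elements -- just under 2 GiB.
INT_MAX = 2**31 - 1

limit_gib = INT_MAX / 1024**3
print(f"max single JVM byte[] = {INT_MAX} bytes ~ {limit_gib:.3f} GiB")
```

That said, even where larger broadcasts are technically possible, broadcasting multiple GB of data is usually a sign the table is too big for a broadcast join in the first place.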

  • Welcome! Could you please focus your question on one specific question? What is the question you really need an answer to? – Matt Andruff May 25 '22 at 13:53
  • 2) Yes, the limit is the RAM of the server your code runs on. 3) Spark can (and generally should) use Kryo serialization rather than Java's built-in one, but I don't think broadcasting should be used for GBs worth of data. – OneCricketeer May 25 '22 at 16:51
  • https://stackoverflow.com/questions/41045917/what-is-the-maximum-size-for-a-broadcast-object-in-spark – I got my answer from this link. – Anubeig Mogal May 26 '22 at 04:25

0 Answers