
I'm new to Spark, and I'm trying to understand the benefits of using a broadcast variable over a singleton wrapper. I am aware that Spark also attempts to distribute broadcast variables using efficient broadcast algorithms to reduce communication cost, but let's assume this happens only once in a long-lived application and is therefore not an overhead.

Does each task hold its own copy of the singleton, or is there only one copy per executor?

I'm trying to understand how this works with a singleton and how it compares to a broadcast variable.

If this question is a duplicate, please let me know; I couldn't find one that had been answered.

Community
apolak

1 Answer


Does each task hold its own copy of the singleton, or is there only one copy per executor?

Each executor keeps a single cached copy of a broadcast variable. All the tasks on that executor that access the broadcast variable read the same copy. So yes, it is a single copy per executor, but it is not a JVM-level singleton, because its lifecycle is managed by BroadcastManager and ContextCleaner.
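For comparison, here is a minimal sketch of what a JVM-level singleton looks like. It does not use Spark at all: it assumes that one executor is one JVM whose task slots are threads, and simulates that with a fixed thread pool in which each submitted lambda stands in for a task. The names `ExpensiveResource` and `SingletonPerExecutorDemo` are hypothetical. The singleton uses the initialization-on-demand holder idiom, so the JVM builds it lazily and exactly once, no matter how many tasks race on first access.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical expensive resource (e.g. a large lookup table) that should be
// built at most once per JVM, i.e. once per executor in this simulation.
final class ExpensiveResource {
    private static final AtomicInteger INIT_COUNT = new AtomicInteger();

    private ExpensiveResource() {
        INIT_COUNT.incrementAndGet(); // count how many times construction really happens
    }

    // Holder idiom: the JVM initializes Holder lazily and exactly once,
    // even when many threads call get() concurrently.
    private static final class Holder {
        static final ExpensiveResource INSTANCE = new ExpensiveResource();
    }

    static ExpensiveResource get() {
        return Holder.INSTANCE;
    }

    static int initCount() {
        return INIT_COUNT.get();
    }
}

public class SingletonPerExecutorDemo {
    // Simulates one executor JVM running numTasks tasks on taskSlots threads and
    // returns how many distinct ExpensiveResource objects the tasks observed.
    static int runTasks(int numTasks, int taskSlots) {
        Set<ExpensiveResource> seen = Collections.synchronizedSet(
            Collections.newSetFromMap(new IdentityHashMap<ExpensiveResource, Boolean>()));
        ExecutorService pool = Executors.newFixedThreadPool(taskSlots);
        for (int i = 0; i < numTasks; i++) {
            // Each submitted lambda plays the role of one task.
            pool.submit(() -> { seen.add(ExpensiveResource.get()); });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct instances: " + runTasks(100, 4));
        System.out.println("constructions: " + ExpensiveResource.initCount());
    }
}
```

All tasks in the JVM see the same instance, so a singleton is also one copy per executor rather than one per task. The difference is who owns it: nothing here ships the value from the driver or cleans it up later. Each executor builds its own instance, and it lives as long as the JVM, whereas a broadcast variable is created on the driver and tracked by Spark.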

This book describes it very well: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-broadcast.html. The diagram there illustrates it clearly.

Shubham Chaurasia