
A simple Spark Streaming app with no heavy in-memory computation is consuming 17 GB of memory as soon as its state changes to RUNNING.

Cluster setup:

  • 1x master (2 vCPU, 13.0 GB memory)
  • 2x workers (2 vCPU, 13.0 GB memory)

YARN resource manager displays: Mem Total - 18GB, vCore Total - 4

The Spark Streaming app source code can be found here; as you can see, it doesn't do much.

Spark submit command (run over SSH, not the gcloud SDK):

spark-submit --master yarn \
             --deploy-mode cluster \
             --num-executors 1 \
             --driver-cores 1 \
             --executor-memory 1g  \
             --driver-memory 512m \
             --executor-cores 1 \
             --class JavaCustomReceiver my_project.jar

Why would such a simple app allocate that much memory?

I'm using the default GCP Dataproc configuration; is there any YARN config that should be amended?
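
For reference, this is roughly how I'm reading the allocated memory, via the YARN CLI (the application ID below is just a placeholder):

yarn top
# or, for a single application:
yarn application -status application_1565000000000_0001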


1 Answer


How many tasks does your application require? Note that Dataproc has dynamic allocation turned on by default, which will request more executors from YARN as necessary.
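
If the goal is to keep this app to a single small executor, a minimal sketch is to make --num-executors binding by turning dynamic allocation off in the submit command, or to cap it instead (spark.dynamicAllocation.enabled and spark.dynamicAllocation.maxExecutors are standard Spark properties):

spark-submit --master yarn \
             --deploy-mode cluster \
             --conf spark.dynamicAllocation.enabled=false \
             --num-executors 1 \
             --driver-cores 1 \
             --executor-memory 1g \
             --driver-memory 512m \
             --executor-cores 1 \
             --class JavaCustomReceiver my_project.jar

# alternatively, keep dynamic allocation but cap it:
# --conf spark.dynamicAllocation.maxExecutors=2

Also keep in mind that YARN rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb, and Spark requests an overhead (spark.executor.memoryOverhead) on top of --executor-memory, so every "1g" executor occupies noticeably more than 1 GB of the cluster total.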

  • thanks for that... good to know that dynamic allocation is enabled by default. We haven't figured out yet how to limit the amount of resources per app. The app is very simple; it shouldn't consume 17 GB of RAM. – Devester Aug 12 '19 at 21:49