I am using Spark Streaming in my application. Data arrives in the form of streaming files every 15 minutes. I have allocated 10 GB of RAM to the Spark executors, and with this setting my Spark application is running fine.
But looking at the Spark UI, under the Storage tab -> "Size in Memory", the RAM usage keeps increasing over time.
When I started the streaming job, the "Size in Memory" usage was in the KB range. It has now been 2 weeks, 2 days and 22 hours since I started the job, and the usage has grown to 858.4 MB.
I have also noticed one more thing, under the Streaming heading:
When I started the job, the Processing Time and Total Delay (shown in the image) were around 5 seconds; after 16 days they have increased to 19-23 seconds, even though the streaming file size is almost the same. Before I increased the executor memory to 10 GB, Spark jobs kept failing roughly every 5 days (with the default executor memory of 1 GB). Since increasing the executor memory to 10 GB, the job has been running continuously for more than 16 days.
I am worried about memory issues. If the "Size in Memory" value keeps increasing like this, then sooner or later I will run out of RAM and the Spark job will fail again, even with 10 GB of executor memory. What can I do to avoid this? Do I need to change some configuration?
Just to give some context about my Spark application, I have enabled the following properties on the Spark context:
SparkConf sparkConf = new SparkConf().setMaster(sparkMaster)
        .set("spark.streaming.receiver.writeAheadLog.enable", "true")
        .set("spark.streaming.minRememberDuration", "1440");
I have also enabled checkpointing as follows:
sc.checkpoint(hadoop_directory);
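For completeness, here is roughly how these pieces fit together in my driver. This is a simplified sketch rather than my exact code: names such as StreamingJob, jssc, hadoopDirectory and the app name are placeholders, and the 15-minute batch interval matches the file arrival rate mentioned above.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingJob {
    public static void main(String[] args) {
        String sparkMaster = args[0];       // placeholder: actual Spark master URL
        String hadoopDirectory = args[1];   // placeholder: HDFS checkpoint directory

        SparkConf sparkConf = new SparkConf()
                .setMaster(sparkMaster)
                .setAppName("streaming-job") // placeholder app name
                .set("spark.streaming.receiver.writeAheadLog.enable", "true")
                .set("spark.streaming.minRememberDuration", "1440");

        // Batch interval chosen to match the 15-minute file arrival rate
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.minutes(15));

        // Checkpointing enabled on the streaming context (HDFS directory)
        jssc.checkpoint(hadoopDirectory);

        // ... DStream definitions and output operations for the incoming files go here ...

        jssc.start();
        jssc.awaitTermination();
    }
}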
One more thing I want to highlight: I had an issue while enabling checkpointing, and I have already posted a question about it here: Spark checkpointing error when joining static dataset with DStream
I was not able to set up checkpointing the way I originally wanted, so I did it differently (as shown above), and it is working fine now. I am not asking the checkpointing question again; I only mention it in case the current memory issue is somehow related to that earlier checkpointing problem.
Environment details: Spark 1.4.1 on a two-node cluster of CentOS 7 machines, with Hadoop 2.7.1.