
In Spark Streaming I have set these parameters as below:

    spark.worker.cleanup.enabled true
    spark.worker.cleanup.interval 60
    spark.worker.cleanup.appDataTtl 90
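For context, these are worker-side properties; a minimal sketch of how they are commonly passed to a standalone worker via SPARK_WORKER_OPTS in conf/spark-env.sh (the values are copied from above, and the assumption that both interval and appDataTtl are in seconds should be checked against your Spark version's docs):

    # conf/spark-env.sh on each standalone worker (sketch)
    # interval and appDataTtl are interpreted in seconds
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=60 \
      -Dspark.worker.cleanup.appDataTtl=90"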

This clears out the data of already-killed Spark batch/streaming jobs in the work/app-2016*/(1,2,3,4,5,6,...) folders. But while a Spark Streaming job is running, the history data in its current app-* directory is not deleted. Since we are using the Kafka-Spark connector jar, for every micro batch it copies this jar, along with the app jar and the stderr/stdout results, into each of those folders (work/app-2016*/(1,2,3,4,5,6,...)). This by itself eats up a lot of disk space, since the Kafka-Spark connector is an uber jar of around 15 MB, and in a day it adds up to about 100 GB.

Is there a way to delete data from the currently running Spark Streaming job, or should we do some scripting for that?
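In case scripting turns out to be the way to go, here is a rough sketch of an external cron-style cleanup in Python. The work-directory path, the TTL, the size cap, and the work/app-*/<executor-id>/ layout are all assumptions for illustration, not something the question or Spark guarantees; test it on a non-critical worker first.

    #!/usr/bin/env python
    # Rough cleanup sketch for a standalone worker's work/ directory.
    # Everything here (paths, thresholds, layout work/app-*/<executor-id>/)
    # is an assumption for illustration -- adapt and test before using.
    import glob
    import os
    import time

    WORK_DIR = "/opt/spark/work"     # assumed worker work directory
    JAR_TTL_SECONDS = 6 * 3600       # assumed: drop jar copies older than 6h
    LOG_MAX_BYTES = 256 * 1024 ** 2  # assumed: truncate logs beyond 256 MB

    now = time.time()
    for executor_dir in glob.glob(os.path.join(WORK_DIR, "app-*", "*")):
        if not os.path.isdir(executor_dir):
            continue
        # Reclaim duplicated jar copies (app jar + Kafka-Spark uber jar).
        for jar in glob.glob(os.path.join(executor_dir, "*.jar")):
            try:
                if now - os.path.getmtime(jar) > JAR_TTL_SECONDS:
                    os.remove(jar)
            except OSError:
                pass  # gone already or not accessible; skip
        # Truncate oversized stderr/stdout in place; truncation (unlike
        # deletion) frees space even while an executor keeps the file open.
        for name in ("stderr", "stdout"):
            log = os.path.join(executor_dir, name)
            try:
                if os.path.getsize(log) > LOG_MAX_BYTES:
                    open(log, "w").close()
            except OSError:
                pass  # log missing for this executor; skip

For the log part specifically, Spark's built-in executor log rolling (the spark.executor.logs.rolling.* properties) may already cap stderr/stdout growth without any external script; removing jar copies from under a live executor is riskier, so that piece deserves extra caution.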

Santosh B
  • *Since we are using Kafka-Spark connector jar, for every micro batch it copies this jar with app jar and stderr,stdout results on each folders* That makes no sense. The JAR should only be copied once at job submission, not for every micro-batch. Perhaps you're seeing log files blow up? – Yuval Itzchakov Mar 23 '16 at 19:43
  • Nope, it's adding the Kafka-Spark jar in each directory for each micro-batch, and we are using PySpark. – Santosh B Mar 24 '16 at 10:09

0 Answers