
We're using Spark Thrift Server as a long-running service for ad-hoc SQL queries, instead of Hive/Tez. This is working out fairly well, except that every few days it starts filling up the disk on worker nodes. The files are all under /hadoop/yarn/nm-local-dir/usercache/root/appcache/application_*/blockmgr-{GUID} and do not seem to be cleared.

I set yarn.nodemanager.localizer.cache.cleanup.interval-ms and yarn.nodemanager.localizer.cache.target-size-mb, but I think those only apply to applications that are no longer running. None of our individual queries run for very long; only the Thrift Server application itself stays up.

Is there any way to automatically clean up these files from Spark (short of some script in cron)?
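For reference, this is roughly how I set those two NodeManager properties in yarn-site.xml (the values here are just the ones I experimented with, not recommendations):

```xml
<!-- yarn-site.xml: example values I tried; they had no effect on the
     blockmgr-* directories of the still-running Thrift Server app -->
<property>
  <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
  <value>300000</value> <!-- run the cache cleanup every 5 minutes -->
</property>
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>10240</value> <!-- try to keep the localized cache under ~10 GB -->
</property>
```

As far as I can tell these only govern the localized resource cache for finished applications, which is why they don't help here.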

user271667

0 Answers