
I am experiencing a data deletion issue since we migrated from CDH to HDP (Spark 2.2 to 2.3). The tables are read from an HDFS location, and after the Spark job that reads and processes them has been running for a while, it throws a table-not-found exception; when we check that location, all the records have vanished. In my Spark (Java) code I see that clearCache() is called before the table is read. Can it delete those files? If yes, how do I fix it?

mthmulders
  • `clearCache`: _Removes all cached tables **from the in-memory cache**._ (https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/catalog/Catalog.html#clearCache--) – mazaneicha Aug 18 '20 at 22:09
  • Yes, I have checked this, but would that mean it removes the files/table records from HDFS? I don't understand what it means to remove something from the in-memory cache. @mazaneicha – Pratyasha Sharma Aug 19 '20 at 00:24

1 Answer


I think you should look at the source code: Spark has its own implementation for caching user data, and it never deletes the underlying files; it only manages in-memory copies of them via CacheManager. Have a look at the CacheManager implementation.
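To make the Javadoc quoted in the comments concrete, here is a minimal Spark (Java) sketch showing the behavior; the HDFS path and table name are hypothetical, assuming the data is stored as Parquet:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ClearCacheDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("clearCache-demo")
                .getOrCreate();

        // Hypothetical HDFS location -- replace with your own.
        String path = "hdfs:///data/events";

        Dataset<Row> df = spark.read().parquet(path);
        df.createOrReplaceTempView("events");

        // Pull the table into the in-memory cache managed by CacheManager.
        spark.catalog().cacheTable("events");
        long cached = spark.table("events").count();

        // Drops ALL cached tables from executor memory.
        // The files on HDFS are not touched.
        spark.catalog().clearCache();

        // Re-reads the same files straight from HDFS -- they are still there.
        long reread = spark.read().parquet(path).count();

        System.out.println("cached=" + cached + ", reread=" + reread);
        spark.stop();
    }
}
```

Since clearCache() cannot remove files, I would look for something else that writes to that path, e.g. an INSERT OVERWRITE or a job that saves with mode("overwrite") to the same directory, which does delete the existing records first.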

Som