I have a Spark job that needs to store the time it last ran in a text file. This has to work both on HDFS and on the local file system (for testing).
However, this turns out to be much less straightforward than it looks.
I have tried deleting the directory first, but I keep getting "can't delete" error messages. I have also tried putting a simple string value into a DataFrame, writing it to Parquet, and reading it back.
All of this is so convoluted that it made me take a step back.
What is the best way to simply store a string (in my case, the timestamp of the last execution) in a file, overwriting it each time?
EDIT:
The nasty way I do it now is as follows:
sqlc.read.parquet(lastExecution).map(t => "" + t(0)).collect()(0)
and
import org.apache.spark.sql.SaveMode
import sqlc.implicits._ // needed for .toDF() on an RDD
sc.parallelize(List(lastExecution)).repartition(1).toDF().write.mode(SaveMode.Overwrite).save(tsDir)
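For comparison, here is a minimal sketch that skips DataFrames entirely and writes the string through the Hadoop `FileSystem` API, which is already on the classpath of a Spark job. The filesystem implementation is chosen from the path's scheme (`hdfs://...` or `file:///...`), so the same code should cover both HDFS and local testing. The object name `LastRunStore` and its `write`/`read` helpers are hypothetical, and the path is assumed to point at a single file rather than at a directory like `tsDir` above.

```scala
import java.nio.charset.StandardCharsets

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import scala.io.Source

// Hypothetical helper: stores one string in one file on any Hadoop-supported FS.
object LastRunStore {
  // Writes `value` to the file at `pathStr`, replacing any existing file.
  // The scheme in the path (e.g. "hdfs://..." or "file:///...") selects the
  // FileSystem implementation; a path without a scheme resolves against
  // fs.defaultFS in `conf`.
  def write(pathStr: String, value: String, conf: Configuration): Unit = {
    val path = new Path(pathStr)
    val fs = path.getFileSystem(conf)
    // create(path, overwrite = true): no separate delete step is needed
    val out = fs.create(path, true)
    try out.write(value.getBytes(StandardCharsets.UTF_8))
    finally out.close()
    // fs itself is not closed: Hadoop caches and shares FileSystem instances
  }

  // Returns the stored string, or None if the file does not exist yet
  // (e.g. on the very first run).
  def read(pathStr: String, conf: Configuration): Option[String] = {
    val path = new Path(pathStr)
    val fs = path.getFileSystem(conf)
    if (!fs.exists(path)) None
    else {
      val in = fs.open(path)
      try Some(Source.fromInputStream(in, "UTF-8").mkString)
      finally in.close()
    }
  }
}
```

Inside the job this would be called as `LastRunStore.write(path, lastExecution, sc.hadoopConfiguration)` and `LastRunStore.read(path, sc.hadoopConfiguration)`, where `path` is a hypothetical file path such as a `file:///` URI in tests and an `hdfs://` URI in production.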