I have a long-running iterative computation in my program, and I want to cache and checkpoint the graph every few iterations (this technique is suggested on the web to cut long lineage chains) so that I won't get a StackOverflowError. I do it like this:
for (i <- 2 to 100) {
  // cache and checkpoint every 30 iterations
  if (i % 30 == 0) {
    graph.cache
    graph.checkpoint
    // I call numEdges to force an action and trigger the evaluation I need
    graph.numEdges
  }
  // the graphs are stored in a list;
  // here I take the graph from the previous iteration
  // and apply a transformation to produce this iteration's graph
}
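To make the intent of the loop concrete, here is a plain-Scala analogy (no Spark; all names are illustrative, not from my actual code): the "lineage" is modeled as a chain of composed functions, and every 30 iterations the chain is forced and restarted from the computed value, which is the role that cache + checkpoint + an action play above.

```scala
object CheckpointPattern {
  // Runs the loop and returns the final materialized value.
  def run(): Long = {
    // The "lineage": a lazily composed chain of transformations over a seed.
    var lineage: Long => Long = identity
    for (i <- 2 to 100) {
      // Each iteration appends one transformation to the chain.
      lineage = lineage.andThen(_ + 1)
      if (i % 30 == 0) {
        // Force evaluation (the analogue of calling an action like numEdges)...
        val materialized = lineage(0L)
        // ...and restart the chain from the result (the analogue of checkpointing),
        // so the chain never grows without bound.
        lineage = _ => materialized
      }
    }
    lineage(0L)
  }

  def main(args: Array[String]): Unit =
    println(run()) // 99 iterations, each adding 1 to the seed 0
}
```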
and I have set the checkpoint directory like this:
val sc = new SparkContext(conf)
sc.setCheckpointDir("checkpoints/")
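Since the job runs on three machines, I understand the checkpoint directory has to be reachable from every node. A sketch of what a fully qualified, shared-filesystem setup could look like (the HDFS host, port, and path below are placeholders I made up, not from my configuration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("graph-iterations")
val sc = new SparkContext(conf)
// An absolute URI on storage that every executor can reach, e.g. HDFS
// (namenode host/port and path are placeholders):
sc.setCheckpointDir("hdfs://namenode:8020/user/me/checkpoints")
```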
However, when I finally run my program I get an exception:
Exception in thread "main" org.apache.spark.SparkException: Invalid checkpoint directory
I use 3 computers, each running Ubuntu 14.04, and on each of them I use the pre-built Spark 1.4.1 package for Hadoop 2.4 or later.