So, I have around 35 GB of zip files, each containing 15 CSV files, and I have written a Scala script that processes each zip file and each of the CSV files inside it.
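For context, this is roughly what the script does (a simplified sketch, not my exact code; the paths, table name, and CSV options are placeholders):

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}
import java.util.zip.ZipFile
import scala.collection.JavaConverters._

import org.apache.spark.sql.{SaveMode, SparkSession}

object ZipCsvImport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("zip-csv-import").getOrCreate()

    val zipDir    = new File("/data/zips")   // placeholder location
    val tableName = "MY_TABLE"               // placeholder target table

    for (zip <- zipDir.listFiles.filter(_.getName.endsWith(".zip"))) {
      val zf = new ZipFile(zip)
      try {
        for (entry <- zf.entries().asScala if entry.getName.endsWith(".csv")) {
          // Extract the CSV entry to a temp file so Spark can read it.
          val tmp = Files.createTempFile("import-", ".csv")
          Files.copy(zf.getInputStream(entry), tmp, StandardCopyOption.REPLACE_EXISTING)

          spark.read
            .option("header", "true")
            .csv(tmp.toString)
            .write
            .mode(SaveMode.Append)
            .insertInto(tableName)   // append each CSV into the existing table

          Files.delete(tmp)
        }
      } finally zf.close()
    }
    spark.stop()
  }
}
```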
The problem is that after some number of files the script throws this error:
    ERROR Executor: Exception in task 0.0 in stage 114.0 (TID 3145) java.io.IOException: java.sql.BatchUpdateException: (Server=localhost/127.0.0.1[1528] Thread=pool-3-thread-63) XCL54.T : [0] insert of keys [7243901, 7243902,
The message continues with all the keys (records) that were not inserted.
What I have found is that apparently (I say apparently because of my lack of knowledge of Scala, SnappyData, and Spark) the memory being used fills up. My questions: how do I increase the amount of memory available? Or how do I evict the data that is in memory and spill it to disk?
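On the spill-to-disk part: from the SnappyData docs I get the impression that a table can be created with eviction and overflow options so that, when the heap fills, rows are moved to disk instead of the insert failing. Is something like this what I need? (A sketch based on the documented `EVICTION_BY` / `OVERFLOW` table options; I have not verified it.)

```scala
import org.apache.spark.sql.{SnappySession, SparkSession}

val spark  = SparkSession.builder().appName("ddl").getOrCreate()
val snappy = new SnappySession(spark.sparkContext)

// Assumption: EVICTION_BY + OVERFLOW make the table evict rows from the
// heap to disk once heap usage crosses the configured threshold.
snappy.sql(
  """CREATE TABLE MY_TABLE (
    |  id BIGINT NOT NULL PRIMARY KEY,
    |  payload VARCHAR(200)
    |) USING row
    |OPTIONS (
    |  PARTITION_BY 'id',
    |  EVICTION_BY  'LRUHEAPPERCENT',
    |  OVERFLOW     'true'
    |)""".stripMargin)
```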
Can I close the open session and free the memory that way? So far I have had to restart the server and remove the files already processed; then I can continue with the import, but after a few more files the same exception appears again.
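To make that part of the question concrete, by "closing the session" I mean something like the following between zip files. These are plain Spark calls, and I don't know whether they actually release the memory that the store itself is holding:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical cleanup between batches (my naming). Does any of this
// return memory to the store, or does it only affect Spark's own cache?
def releaseBatch(spark: SparkSession, batch: DataFrame): Unit = {
  batch.unpersist()            // drop this batch's cached blocks, if it was cached
  spark.catalog.clearCache()   // drop everything cached in the session
}
```

Or is a full `spark.stop()` and a fresh session required each time?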
My CSV files are big (the largest is around 1 GB), but the exception does not happen only with the big files; it happens as multiple files accumulate, once some total size is reached. So where do I change that memory limit?
I have 12 GB of RAM.
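For reference, given the 12 GB: I am guessing the place to raise the limit is the server's line in `conf/servers`, with something like the options below (the property names are my assumption from the SnappyData docs; please correct me if the knob is elsewhere):

```
localhost -heap-size=8g -critical-heap-percentage=95
```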