
I am working on a cluster where I submit jobs through qsub. I am granted a maximum of 72 h of computational time at once. The output of my simulation is a folder that typically contains about 1000 files (about 10 GB). I copy my output back after 71 h 30 m of simulation, which means that everything produced after 71 h 30 m (plus the time needed to copy?) is lost. Is there a way to make the process more efficient, i.e. not having to manually estimate the time needed to copy the output back?
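For context, my current job script looks roughly like the sketch below (the PBS directives, paths and binary name are simplified placeholders, not my actual setup):

```bash
#!/bin/bash
#PBS -l walltime=72:00:00
#PBS -l nodes=1:ppn=16

cd "$PBS_O_WORKDIR"

# stop the simulation 30 minutes before the walltime limit so there is
# (hopefully) enough time left to compress and copy the output back
timeout 71.5h ./my_simulation > sim.log 2>&1

# compress the output folder and copy the archive back to my home space
tar -cjf output.tar.bz2 output/
cp output.tar.bz2 "$HOME/results/"
```

The 30-minute margin is the part I estimate by hand, and that is exactly what I would like to get rid of.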

Also, before copying the output back I compress the files with bzip2. What resources are used to do that? Should I request one extra node beyond what I need for the simulation, just to compress the files?
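To make the second question concrete, what I have in mind is something like the following rough sketch (it assumes `pbzip2` is available on the compute node and that the node has several cores):

```bash
# compress in parallel on the cores the simulation job already has,
# instead of requesting an extra node just for compression
NCORES=$(nproc)                       # cores available on this node
tar -cf - output/ | pbzip2 -p"$NCORES" > output.tar.bz2
```

So the question is whether the compression should share the node(s) of the simulation like this, or be given resources of its own.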

Manfredo
  • Can you try to measure the compression ratio and compression time of your output (on any computer), to estimate whether the output files are compressible at all and how long compression takes? bzip2 is slow and can't compress some types of files well (e.g. binary files of floating-point data). You should also try a parallel compressor (you can run several compression tasks on different files of your output) and modern compressors like xz, lrzip, zstd, lz4 (find a compression level that is fast and still compresses well); see the sketch after these comments. Does your cluster manual have any info about the file system and output files? – osgx Dec 15 '16 at 16:50
  • Well, yes, I could try to measure these times, but my question is somewhat more general. Also, regarding parallel compression, I sometimes use `pbzip2`, but that's not the point. Should I maybe launch a second job 71 h 30 m after my job starts that only does the compression and copies back the data? – Manfredo Dec 15 '16 at 17:18
  • The answer is specific to your cluster site. Some sites have a shared file system, some don't. Some allow access to the file system from login nodes. Some tasks output 10 GB files as decimal text rather than binary; those compress quickly to files several times smaller. Other tasks output 10 GB of incompressible data. – osgx Dec 15 '16 at 23:21
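A minimal sketch of the kind of measurement suggested in the comments, run on one representative output file (the file name is a placeholder and the tools are assumed to be installed):

```bash
# compare compression ratio and time of a few compressors on one sample file
F=sample_output.dat                 # placeholder: pick a typical output file
ls -l "$F"                          # original size
for tool in "bzip2 -9" "xz -6" "zstd -19"; do
    echo "== $tool =="
    time $tool -c "$F" > "$F.cmp"   # compress to a scratch file, keep original
    ls -l "$F.cmp"                  # compressed size
    rm -f "$F.cmp"
done
```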

0 Answers