
I am new to HPC and I am struggling to set up scratch space. On the cluster I am working with, I need to set up scratch space using the SLURM workload manager, and I am struggling with the following questions:

  • How does the scratch space differ from the normal disk space on the home node?

  • Does the procedure for setting up scratch space differ from cluster to cluster?

  • Is it possible to copy files from the scratch space to the home node while the simulation is still in progress? Is it possible to transfer files from the scratch space to my external hard disk without first copying them to my home node disk space? Or do these things also differ from cluster to cluster? I ask because I tried a simulation with scratch: using SLURM, I first copied my input files to the scratch folder, directed the timestep output files to the scratch folder, and, once the simulation was complete, copied the timestep output files back to the home node disk space (my job script looked roughly like the sketch below). While the simulation was in progress, I tried to access the timestep output files in the scratch folder, but I could not see them anywhere in the scratch space. Once the simulation was over, however, I could see the files on the home node. I am really confused about this.
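
For reference, a minimal sketch of the kind of job script I mean; the /scratch/$USER location, the input file names, and the my_solver command are placeholders, since the actual layout differs per cluster:

```bash
#!/bin/bash
#SBATCH --job-name=scratch-demo
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Placeholder scratch location; many clusters provide /scratch/$USER
# or a site-specific environment variable instead.
SCRATCH_DIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$SCRATCH_DIR"

# Copy the input files from the submission directory to scratch.
cp "$SLURM_SUBMIT_DIR"/input.* "$SCRATCH_DIR"/

# Run the simulation inside scratch so the timestep files are written there.
cd "$SCRATCH_DIR"
"$SLURM_SUBMIT_DIR"/my_solver input.dat   # placeholder for the actual simulation command

# After the run, copy the results back to the submission directory.
cp -r "$SCRATCH_DIR"/output* "$SLURM_SUBMIT_DIR"/
```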

Sorry if these questions sound silly; I am completely new to HPC. Please feel free to ask for any clarification.

Thanks

Ram

mathquest
  • Step number one ought to be to ask your HPC infrastructure's technical support department. No professionally maintained HPC infrastructure lives in a vacuum; technical support is the best place to get advice on the spot, matched to your infrastructure's specific conditions and well informed about all the terms and conditions applicable to your use case. Typically the best HPC engineers work there, with tons of hands-on experience, so do not hesitate to meet them and ask about all the details and best practices you need for your HPC workloads. G/L & Happy Computing! – user3666197 Jan 07 '20 at 12:07

1 Answer


When maintaining a large shared cluster, an often-occurring problem is that people tend to store lots of data and do not take the effort to clean up after themselves. One way to solve this is to limit the amount of data people can store in their home folder (e.g. 500 GB). This has an obvious downside: when you are dealing with larger amounts of data, you cannot use the cluster. Generally this is solved with a so-called scratch space. On the scratch space users can typically store large amounts of data (e.g. 8 TB), but the maintainers of the cluster might have some rules set up there (for instance, files automatically get deleted after two weeks).
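
You can usually check the limits and free space yourself. A small sketch, assuming quotas are enabled and the scratch file system is mounted at /scratch (both of which are cluster-specific assumptions):

```bash
# Show your disk quota in human-readable sizes (if quotas are enabled).
quota -s

# Show the size and free space of the scratch file system
# (assuming it is mounted at /scratch; the path varies per cluster).
df -h /scratch
```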

  • The scratch space differs in that files might be removed by the admins after some time. Sometimes the scratch space also has better hardware, making I/O somewhat faster there.
  • The scratch space is usually already set up and can be found, for instance, at /scratch.
  • (Usually) the recommended way is to write all your output to the scratch space (also because I/O can be faster there) and, when everything is done, copy the final results from scratch to your home folder. To copy from one place to another, take a look at the scp or rsync documentation (a short sketch follows below); yes, it should be possible. I don't know why you couldn't see your files.
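
For example, copying results back to the home folder, or pulling them from the cluster straight onto an external disk mounted on your own machine, might look roughly like this; the host name, username, and paths are placeholders, and whether such transfers are allowed depends on the cluster:

```bash
# On the cluster: copy results from scratch to your home folder.
rsync -av /scratch/$USER/myjob/output/ ~/results/myjob/

# On your local machine (with the external disk mounted there): pull the
# results directly from the cluster's scratch space onto the external disk,
# without staging them in the cluster home folder first.
rsync -av your_username@cluster.example.org:/scratch/your_username/myjob/output/ /mnt/external_disk/myjob/
```
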
Maarten-vd-Sande
  • Thanks for your answer. I found out why I wasn't able to see my files while the job was running: the scratch space I was using is node-local temporary scratch. With that type of scratch, I can see the files only after the job is over. Once the job finishes, the files from that scratch space are moved to the job submission directory and the data in the scratch space are automatically deleted. Also, unlike the home folder or a network scratch space, the node-local scratch is only shared between processes running on the same node (roughly as in the sketch after these comments). – mathquest Jan 28 '20 at 00:46
  • That's very different from what I expected :). Good that you figured it out. – Maarten-vd-Sande Jan 28 '20 at 05:44
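
To make the node-local behaviour concrete, here is a minimal sketch of what such a cluster effectively does around the job; the $TMPDIR variable, file names, and my_solver command are assumptions, and on mathquest's cluster the staging and cleanup steps happen automatically:

```bash
# Node-local scratch sits on the compute node's own disk, so anything
# written there is invisible from the login/home node while the job runs.
LOCAL_SCRATCH=${TMPDIR:-/tmp/$SLURM_JOB_ID}
mkdir -p "$LOCAL_SCRATCH"
cp "$SLURM_SUBMIT_DIR"/input.* "$LOCAL_SCRATCH"/

# Run the simulation on the node-local disk.
cd "$LOCAL_SCRATCH"
"$SLURM_SUBMIT_DIR"/my_solver input.dat   # placeholder for the real command

# Only this copy-back step makes the results appear in the submission
# directory; afterwards the node-local scratch is wiped automatically.
cp -r "$LOCAL_SCRATCH"/output* "$SLURM_SUBMIT_DIR"/
```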