
Cray recommends using loopback devices for running Spark on HPC clusters with Lustre file systems [1]. The problem is that most HPC clusters do not give their users access to loopback devices. So I wonder if there is a library that opens just one huge file on Lustre and lets us treat that huge file as a file system, so that we can still use parallel file access to that single file.

This way we could have parallel I/O while keeping proper partitions, with one file per partition. Searching has not turned up anything.

[1] http://wiki.lustre.org/images/f/fb/LUG2016D2_Scaling-Apache-Spark-On-Lustre_Chaimov.pdf

M.Rez

1 Answer


Whether this is possible depends heavily on your application. You could, for example, create an ext4 filesystem image in a regular file using mke2fs as a regular user, and then access it either with libext2fs linked into your application (probably single-threaded) or via fuse2fs in userspace. fuse2fs may still need root permission to set up (I'm not positive), but after that the image behaves like a normal filesystem and does not need a block device.
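As a rough sketch of that workflow, the commands below create and mount such an image; the size, paths, and mount point are placeholders, and whether fuse2fs works unprivileged depends on how FUSE is set up on your cluster.

    # Create a sparse file on Lustre to hold the filesystem image
    truncate -s 100G /lustre/project/spark-scratch.img

    # Format it as ext4; -F is needed because the target is a regular
    # file rather than a block device (no root required for this step)
    mke2fs -F -t ext4 /lustre/project/spark-scratch.img

    # Mount the image in userspace via FUSE (fuse2fs ships with e2fsprogs);
    # this avoids loopback devices, but FUSE must be available on the nodes
    mkdir -p "$HOME/spark-scratch"
    fuse2fs /lustre/project/spark-scratch.img "$HOME/spark-scratch"

    # ...point Spark's local/scratch directories at $HOME/spark-scratch...

    # Unmount when the job is done
    fusermount -u "$HOME/spark-scratch"

All local I/O then goes through a single large Lustre file, which is roughly the layout the Cray paper achieves with loopback devices.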

LustreOne
  • This is a good answer, thank you, and I looked into it. In most cases I do not have control over those jobs, but I will keep it in mind as an option I had not considered before. Thank you. – M.Rez Apr 08 '19 at 07:44