0

Does anyone know a good way to read and write files on HDFS from within an MPI program? I've done a fair amount of digging trying to figure this out, and just need a general direction to pursue.

Rob Latham
Kyle.

2 Answers

1

There is a full chapter of the MPI Standard about MPI I/O. I'd start by reading there.

Most MPI implementations provide MPI I/O, usually through ROMIO, so that is worth a look as well.
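As a rough picture of the access pattern MPI I/O is built for, here is a minimal sketch in which several writers place fixed-size blocks at rank-based offsets in one shared file. It is not MPI code: it uses only the Python standard library, `os.pwrite` stands in for a call such as `MPI_File_write_at`, a plain loop stands in for the ranks, and the names `write_block`, `BLOCK`, and `shared.dat` are made up for illustration.

```python
import os
import tempfile

BLOCK = 8  # bytes per rank (hypothetical fixed block size)

def write_block(path, rank, payload):
    """Write one rank's block at offset rank * BLOCK in the shared file."""
    assert len(payload) == BLOCK
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Positional write: no shared file pointer is involved
        os.pwrite(fd, payload, rank * BLOCK)
    finally:
        os.close(fd)

def run_demo(nranks=4):
    path = os.path.join(tempfile.mkdtemp(), "shared.dat")
    # Write in reverse order to show that the order of writers does not matter
    for rank in reversed(range(nranks)):
        write_block(path, rank, f"rank{rank:04d}".encode())
    with open(path, "rb") as f:
        return f.read()

shared_bytes = run_demo()
```

With a real MPI library each rank would issue its own write, and the collective variants (for example `MPI_File_write_at_all`) let the implementation coordinate them.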

Wesley Bland
1

There are some oddities with HDFS that make it an interesting target for MPI-IO. Foremost is its restriction on writes: a file can have only one writer at a time, so several processes cannot modify the same file concurrently.

It looks like the PLFS project (which takes MPI-IO style "all write to one file" workloads and changes them to "one file per process" workloads) has made HDFS one of its targets. This paper (with a whopping two citations) appears to be the reference: http://www.pdl.cmu.edu/PDL-FTP/HECStorage/CMU-PDL-12-115.pdf

So you'd have the MPI-IO interface, implemented by ROMIO. ROMIO has a device abstraction layer called ADIO, and PLFS can be one of those underlying devices (if you patch it). Then PLFS speaks HDFS and you finally perform I/O.

I have no idea how performant this stack is!
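To make the "one file per process" rewrite concrete, here is a toy sketch of the idea, assuming a log-plus-index layout: each rank only ever appends to its own data file and records where each piece belongs in the logical file, and a reader replays the indexes to rebuild it. This is not PLFS's actual on-disk format or API. The names `write_per_rank`, `read_logical`, `data.N`, and `index.N` are hypothetical, and it runs on a local directory with no MPI or HDFS involved.

```python
import os
import tempfile

def write_per_rank(container, rank, logical_offset, payload):
    """Append payload to this rank's own log and record where it belongs."""
    data_path = os.path.join(container, f"data.{rank}")
    index_path = os.path.join(container, f"index.{rank}")
    with open(data_path, "ab") as data:
        data.seek(0, os.SEEK_END)
        physical_offset = data.tell()
        data.write(payload)
    with open(index_path, "a") as index:
        index.write(f"{logical_offset} {physical_offset} {len(payload)}\n")

def read_logical(container):
    """Rebuild the logical file by replaying every rank's index."""
    extents = []
    for name in os.listdir(container):
        if not name.startswith("index."):
            continue
        rank = name.split(".", 1)[1]
        with open(os.path.join(container, name)) as index:
            for line in index:
                logical, physical, length = map(int, line.split())
                extents.append((logical, rank, physical, length))
    size = max((lo + ln for lo, _, _, ln in extents), default=0)
    out = bytearray(size)
    for logical, rank, physical, length in extents:
        with open(os.path.join(container, f"data.{rank}"), "rb") as data:
            data.seek(physical)
            out[logical:logical + length] = data.read(length)
    return bytes(out)

container = tempfile.mkdtemp()
# Two ranks interleave 4-byte records in the logical file
write_per_rank(container, 0, 0, b"AAAA")
write_per_rank(container, 1, 4, b"BBBB")
write_per_rank(container, 0, 8, b"CCCC")
write_per_rank(container, 1, 12, b"DDDD")
logical_bytes = read_logical(container)
```

Because every rank appends only to files it alone owns, the pattern fits a store that allows one writer per file. The cost moves to the read side, which has to merge the indexes.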

Rob Latham