
I am working through this code and have the mmap working now, but I am wondering if I can use mmap in parallel and if so, how to accomplish it. Suppose I have my data on a parallel file system (GPFS, RAID0, whatever) and I want to read it using n processes.

How could I, for example, have each processor read 1/nth contiguous block of the data into memory? Or, alternatively, read every nth memory block (1 B, 1 MB, 100 MB, 1 GB, whatever I choose for optimization) into memory?

I am assuming a POSIX file system here.

  • Using `mmap()` leaves you at the mercy of the kernel's virtual memory manager. And since creating physical-to-virtual mappings needs to be thread-safe, it tends to get single-threaded under load. Look into `lio_listio()` to do multiple asynchronous IO operations (a sketch follows these comments). http://man7.org/linux/man-pages/man3/lio_listio.3.html And if you're streaming a lot of data (read once, don't seek), use direct IO http://www-01.ibm.com/support/knowledgecenter/SSFKCN_3.5.0/com.ibm.cluster.gpfs.v3r5.gpfs100.doc/bl1adm_direct.htm. – Andrew Henle Aug 14 '15 at 01:50
  • What if I want to implement this in a heavyweight-process paradigm? Something like MPI over a distributed-memory environment, where each rank gets 1/n of the data and does something with it? Would the same problems arise? If not, how do I mmap the ith 1/n of data into memory? – drjrm3 Aug 14 '15 at 01:58
  • Multiple clustered physical servers doing the reads? Then each read would only have to be single-threaded. `mmap()` might work, but I've seen really fast file systems deliver data faster than virtual-to-physical mappings can be created. If your disks are that fast, `mmap()` wouldn't work well. And if you don't have any locality, you wind up having to pass the file data around the cluster, which can be a lot slower than a fast file system. It all depends on your processing needs - to go really fast, you have to tune **everything** to work together and can't abstract away physical designs. – Andrew Henle Aug 14 '15 at 02:06
  • `mmap()` can map an arbitrary number of bytes from an arbitrary offset into a file. `void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);` `length` is the number of bytes to map, `offset` is the offset into the file to begin mapping from. http://linux.die.net/man/2/mmap – Andrew Henle Aug 14 '15 at 02:11
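A minimal sketch of the `lio_listio()` approach suggested in the comments above: one call queues several reads at different offsets and, with `LIO_WAIT`, blocks until all of them complete. The file name, chunk count, and chunk size are illustrative assumptions, not part of the original discussion; on Linux, link with -lrt.

/* Sketch: submit NCHUNKS reads at different offsets in one lio_listio()
 * call. All names and sizes here are illustrative. */
#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NCHUNKS 4
#define CHUNKSZ (1 << 20)   /* 1 MB per read; tune for your storage */

int main(void)
{
    int fd = open("bigfile.dat", O_RDONLY);   /* hypothetical input file */
    if (fd < 0) { perror("open"); return 1; }

    struct aiocb cbs[NCHUNKS];
    struct aiocb *list[NCHUNKS];
    char *bufs[NCHUNKS];

    for (int i = 0; i < NCHUNKS; i++) {
        bufs[i] = malloc(CHUNKSZ);
        memset(&cbs[i], 0, sizeof cbs[i]);
        cbs[i].aio_fildes     = fd;
        cbs[i].aio_buf        = bufs[i];
        cbs[i].aio_nbytes     = CHUNKSZ;
        cbs[i].aio_offset     = (off_t)i * CHUNKSZ;  /* the ith block */
        cbs[i].aio_lio_opcode = LIO_READ;
        list[i] = &cbs[i];
    }

    /* Submit all reads at once; LIO_WAIT blocks until every one completes. */
    if (lio_listio(LIO_WAIT, list, NCHUNKS, NULL) < 0)
        perror("lio_listio");

    for (int i = 0; i < NCHUNKS; i++) {
        ssize_t got = aio_return(&cbs[i]);   /* bytes actually read */
        printf("chunk %d: %zd bytes\n", i, got);
        free(bufs[i]);
    }
    close(fd);
    return 0;
}

With `LIO_NOWAIT` instead of `LIO_WAIT`, the call returns immediately and completion can be polled with `aio_error()`/`aio_return()`, which is how the reads can overlap with computation.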

1 Answer


Here is my MPI function for parallel reading. It chops the file into n contiguous pieces based on the page size and has each process read a separate piece via mmap. Some extra work is needed at the end, since process i will (likely) get the first half of a line as its last line and process i+1 will get the second half of the same line as its first line (see the sketch after the code).

// Assumes an MPI program: iproc (this rank) and nprocs come from
// MPI_Comm_rank()/MPI_Comm_size(), and ikind is a wide integer typedef
// (e.g. long long) for byte counts.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

ikind nchars_orig; // how many characters were in the original file
int pagesize = getpagesize();
off_t offset;
struct stat file_stat;
int finp = open(inpfile, O_RDONLY);
int status = fstat(finp, &file_stat);
nchars_orig = file_stat.st_size;

// find out which pieces of the file each process should read
ikind nchars_per_proc[nprocs];
for(int ii = 0; ii < nprocs; ii++) {
    nchars_per_proc[ii] = 0;
}
// deal the file out one page at a time, round-robin, starting so that the
// last proc is hit first; the last proc's share is trimmed by the overshoot
// at the end, which keeps the distribution even
int jproc = nprocs-2;
ikind nchars_tot = 0;
ikind nchardiff = 0;
for(ikind ic = 0; ic < nchars_orig; ic += pagesize) {
    jproc += 1;
    nchars_tot += pagesize;
    if(jproc == nprocs) jproc = 0;
    if(nchars_tot > nchars_orig) nchardiff = nchars_tot - nchars_orig;
    nchars_per_proc[jproc] += pagesize;
}
ikind nchars = nchars_per_proc[iproc];
// the last proc's chunk runs past the end of the file by nchardiff; trim it
if( iproc == nprocs-1 ) nchars = nchars - nchardiff;
// my offset is the total size of all chunks owned by lower-ranked procs;
// it is page-aligned by construction, as mmap() requires
offset = 0;
for(int ii = 0; ii < nprocs; ii++) {
    if( ii < iproc ) offset += nchars_per_proc[ii];
}
char *cs = (char*)mmap(0, nchars, PROT_READ, MAP_PRIVATE, finp, offset);
if( cs == MAP_FAILED ) { /* handle error */ }
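
The "extra work" for the split lines is not shown above. One way to stitch them back together, sketched here under the assumption that every chunk contains at least one newline (the message tag and any names beyond cs, nchars, iproc, and nprocs are made up for illustration): each rank ships the fragment before its first newline back to the previous rank, which appends it to complete its final line.

/* Sketch: exchange the line fragments at the chunk boundaries. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define LINE_TAG 42   /* arbitrary message tag */

/* find the first newline in my chunk; assumes every chunk contains one */
char *nl = memchr(cs, '\n', (size_t)nchars);
int headlen = (iproc > 0 && nl != NULL) ? (int)(nl - cs) + 1 : 0;

/* every rank but the first ships its leading fragment to the rank before it */
if (iproc > 0)
    MPI_Send(cs, headlen, MPI_CHAR, iproc - 1, LINE_TAG, MPI_COMM_WORLD);

/* every rank but the last receives the fragment that completes its final line */
char *tail = NULL;
int taillen = 0;
if (iproc < nprocs - 1) {
    MPI_Status st;
    MPI_Probe(iproc + 1, LINE_TAG, MPI_COMM_WORLD, &st);
    MPI_Get_count(&st, MPI_CHAR, &taillen);
    tail = malloc(taillen);
    MPI_Recv(tail, taillen, MPI_CHAR, iproc + 1, LINE_TAG, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
}

/* my complete lines now run from cs + headlen to cs + nchars,
 * followed by the taillen bytes in tail */
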
  • Please reread @Andrew Henle 's comment on the original question. mmap() is not faster than explicit disk I/O (it uses the same disk/memory bus/channel). You only trade "blocked on I/O" for "blocked by page faults". – wildplasser Aug 26 '15 at 15:39
  • I'm more concerned here with efficient I/O throughput. These files will be TBs in size on network-attached storage, and reading <1 KB of data at a time is less efficient than reading a large chunk (10+ GB) at a time ... I think ... In any case, this is exactly what I wanted to achieve (using `mmap` to read a file in parallel with `mpi`), so now I can test my theories. – drjrm3 Aug 26 '15 at 15:44
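
For reference, a hedged sketch of the alternative raised in these comments: streaming each rank's chunk with large pread() calls instead of mmap(), reusing the nchars and offset computed in the answer. The 64 MB block size is an assumption to tune, not a recommendation from the thread.

/* Sketch: read my chunk in large blocks rather than mapping it. */
#include <stdlib.h>
#include <unistd.h>

#define BLOCKSZ (64L << 20)   /* 64 MB per read; tune for your storage */

char *buf = malloc(nchars);
ikind done = 0;
while (done < nchars) {
    size_t want = (nchars - done < BLOCKSZ) ? (size_t)(nchars - done)
                                            : (size_t)BLOCKSZ;
    ssize_t got = pread(finp, buf + done, want, offset + done);
    if (got <= 0) break;      /* error or unexpected EOF */
    done += got;
}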