
I have this test code which does the following:

Write a test message to a file > Barrier > Read the test message > Assert equal > Repeat.

from __future__ import print_function
import os
from mpi4py import MPI


comm = MPI.COMM_WORLD
rank = comm.Get_rank()
loop = True


def main():
    global loop
    txt_write = 'buhahaha'

    with open('test', 'w') as f1:
        if rank == 0:
            f1.write(txt_write)

        f1.flush()
        os.fsync(f1.fileno())

    comm.barrier()

    with open('test') as f2:
        txt_read = f2.read()

    try:
        assert txt_read == txt_write
    except:
        print("Assertion error", txt_read, "!=", txt_write, 'rank=', rank)
        loop = False
    finally:
        comm.barrier()
        if rank == 0:
            os.remove('test')


if __name__ == '__main__':
    i = 0
    while loop:
        main()
        if i % 1000 == 0 and rank == 0:
            print("Iterations:", i)

        i += 1

It works for a few hundred or thousand iterations, but then at some point it reads an empty file and the assertion fails. Other answers recommended using flush and os.fsync, but that does not seem to help - it just makes the execution slower. Any idea how to fix this?

jadelord
  • What file system do you use? Is this a single node or a cluster? – Zulan Jul 12 '17 at 17:52
  • Doesn't opening a file as writeable usually truncate it to be empty? So, aren't your threads racing between most of them truncating it to be empty and one is truncating it then writing a string to it? – jschultz410 Jul 12 '17 at 18:42
  • @Zulan ext4 filesystem on Linux. I ran this code with 2 processes on a workstation. – jadelord Jul 12 '17 at 18:44
  • @jschultz410 MPI barrier waits until the `write`, `flush`, `os.fsync` and finally the `__exit__` function calls, which closes the file. The issue is text remains in the I/O buffer waiting to be written. Most of the time this code works. When it does not, **all** threads read an empty file, not just `rank > 1` threads. – jadelord Jul 12 '17 at 18:51
  • Depending on how much data you have there, I'd reconsider the architecture. If there is not that much data, I'd read it inside rank 0 and broadcast it. – Oo.oO Jul 12 '17 at 18:51
  • @jadelord Maybe I'm missing something pretty fundamental here. My understanding is that you have N processes (or threads) that are executing main() inside a loop where they synchronize on the barrier between the write and read portions of each iteration on a shared disk (and between each iteration too). My comment was simply that opening a file like this `open(fname, 'w')` typically truncates the file to be empty (i.e. - writing to it) and there is no guarantee on the inter-ordering of writes between the competing processes all modifying the same file. Am I way off base here? – jschultz410 Jul 12 '17 at 18:56
  • OK that makes sense. So the `open` call by different processes simply creates multiple instances in the memory. So when `rank > 1` writes to it in the end of an iteration, it is an empty file :). Explains also why @mko's modification worked. – jadelord Jul 12 '17 at 19:20
  • Oh good! I felt like I might be taking crazy pills there for a second. – jschultz410 Jul 12 '17 at 19:22
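The truncation behavior described in the comments is easy to confirm in a single process, without MPI at all: opening an existing file in `'w'` mode empties it immediately, even if nothing is ever written through the new handle. A minimal check (the file name is arbitrary):

```python
import os

path = "truncate_demo.txt"

# Write some content first.
with open(path, "w") as f:
    f.write("buhahaha")

# Merely opening in 'w' mode truncates the file to zero bytes,
# even though this handle never writes anything.
with open(path, "w") as f:
    pass

with open(path) as f:
    content = f.read()

print(repr(content))  # -> ''
os.remove(path)
```

This is exactly what every non-writing rank was doing to the file in the question's code.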

2 Answers


Maybe you can try something like this, instead:

if rank == 0:
  with open('test', 'w') as f1:
    f1.write(txt_write)
    # as @jschultz410 correctly pointed out, 
    # we remove f1.flush() and f1.close()

comm.barrier()

with open('test') as f2:
  txt_read = f2.read()
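For completeness, a self-contained sketch of the full corrected loop might look like the following. The `ImportError` fallback is only there so the sketch also runs as a plain single process when mpi4py is not installed; under `mpirun` it behaves as a normal MPI program:

```python
import os

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
except ImportError:
    # Stand-in so the sketch runs single-process without mpi4py.
    class _FakeComm:
        def barrier(self):
            pass
    comm, rank = _FakeComm(), 0

txt_write = 'buhahaha'

for i in range(10):
    # Only rank 0 opens the file for writing. Opening with 'w'
    # truncates, so having every rank do it caused the race.
    if rank == 0:
        with open('test', 'w') as f1:
            f1.write(txt_write)

    comm.barrier()  # the writer has closed the file before anyone reads

    with open('test') as f2:
        txt_read = f2.read()

    assert txt_read == txt_write, (txt_read, rank)

    comm.barrier()  # every rank has read before rank 0 deletes
    if rank == 0:
        os.remove('test')
```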
Oo.oO
  • You had indention issue. Part of code was just for rank 0 and part was for every rank. And, you were opening the same file with all ranks for writing. – Oo.oO Jul 12 '17 at 19:04
  • I don't think it was indentation issue. It was a logical issue. It looks like you expected `open(fname, 'w')` to have no effect on the file. Also, why do you have the readers open the file for writing at all? Why not just have the rank 0 thread be the only one to open it for writing? While the non-rank 0 threads just skip straight to waiting on the barrier and then read? – jschultz410 Jul 12 '17 at 19:08
  • Which is exactly what @mko did in his code, now that I look more closely at it. (Originally, I thought he had all of them opening and writing the same string to it, which also would have worked, but been pointlessly redundant / inefficient). – jschultz410 Jul 12 '17 at 19:12
  • You can drop the explicit calls to `flush()` and `close()` since the `with` clause will invoke those implicitly too. Good job @mko noting that the explicit `fsync()` is not needed because you have ordered all the opens+reads to be after the single write+close and the OS should respect and enforce that. – jschultz410 Jul 12 '17 at 19:20
  • I agree with @jschultz410 there. To explain further why I did so there, I tried to emulate a [unittest case which was causing problems](https://bitbucket.org/fluiddyn/fluiddyn/src/e3d88e82e2eece3887a4847e25790e8dc6464f0c/fluiddyn/util/test/test_mpi.py?fileviewer=file-view-default#test_mpi.py-43). The test case is a for a function which by default prints by rank 0, and the stdout is redirected to a file to test this feature. I may need to think of a better test. – jadelord Jul 12 '17 at 19:32

The code resulted in a race condition where all processes were opening the same file simultaneously. Thanks to @jschultz410 and @mko for identifying this logical error.

My solution for the code was to use a memory stream instead of a real file. Now, the open, write and read parts of the code become:

from io import StringIO

f1 = StringIO()
if rank == 0:
    f1.write(txt_write)

f1.flush()
comm.barrier()

txt_read = f1.getvalue()
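One caveat worth noting: with `StringIO` each rank holds its own private in-memory buffer, so only rank 0 actually sees the written text. If the other ranks need the text too, broadcasting it (as Oo.oO suggested in the comments) avoids the filesystem entirely. A sketch, where the `ImportError` fallback is only so it also runs single-process without mpi4py:

```python
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
except ImportError:
    # Stand-in so the sketch runs single-process without mpi4py.
    class _FakeComm:
        def bcast(self, obj, root=0):
            return obj
    comm, rank = _FakeComm(), 0

txt_write = 'buhahaha'

# Rank 0 supplies the text; bcast hands an identical copy to every
# rank, so no shared file (and hence no race) is involved at all.
txt_read = comm.bcast(txt_write if rank == 0 else None, root=0)

assert txt_read == txt_write
```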
jadelord