
I know we can guarantee correctness either by locking, or by using a specialized thread whose sole job is to read/write the file while the other threads communicate with it through a queue.

But my approach already seems logically sound, so I'd like to avoid implementing either of those, especially since both carry a performance penalty.
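
For reference, the locking alternative I'm trying to avoid would look roughly like this (a minimal sketch; the handle and helper names are just illustrative):

import threading

# One shared handle plus a lock serialising every seek + read/write pair.
# Assumes a file named 'test' already exists.
file_lock = threading.Lock()
shared_fh = open('test', 'r+b')

def locked_read(offset, size):
    with file_lock:               # only one thread touches the file at a time
        shared_fh.seek(offset)
        return shared_fh.read(size)

def locked_write(offset, data):
    with file_lock:
        shared_fh.seek(offset)
        shared_fh.write(data)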

Shihab Shahriar Khan
    I think it would help if you give a short code sample of how you plan to accomplish it. Also run a test of it yourself and see what happens. – Matt S Mar 04 '17 at 14:54
  • It gives the right answer in tests. But aren't multi-threaded programs famous for pretending to be correct when they actually aren't? – Shihab Shahriar Khan Mar 04 '17 at 14:59

2 Answers


In general, no.

Concurrent reading and writing behavior is heavily dependent on both the underlying operating system and filesystem.

You may be able to get something working by reading and writing chunks that are both a multiple of the underlying block size and are block-aligned. But you are likely in the world of "undefined behavior".
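
For instance, on a POSIX system, block-aligned access could look roughly like this (a sketch only; BLOCK_SIZE is an assumed value, and whether this is actually safe still depends on your OS and filesystem):

import os

BLOCK_SIZE = 4096                      # assumed filesystem block size

fd = os.open('test', os.O_RDWR)        # assumes the file already exists

def read_block(block_index):
    # os.pread takes an explicit offset, so callers never share a seek position
    return os.pread(fd, BLOCK_SIZE, block_index * BLOCK_SIZE)

def write_block(block_index, data):
    assert len(data) == BLOCK_SIZE     # keep writes block-sized and block-aligned
    os.pwrite(fd, data, block_index * BLOCK_SIZE)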

See also, related question: How do filesystems handle concurrent read/write?

payne

The OP wants multithreaded access to a single file, not access across multiple programs or over a network. Therefore I say YES, you can do that.
For instance:

import multiprocessing as mp
import queue
import sys
import time

def job_handler(worker_id, job_queue):
    print('Start Job handler %s' % worker_id)
    fh = open('test', 'r+b')        # one file handle per process
    while True:
        time.sleep(0.1)
        try:
            offset, size = job_queue.get_nowait()
            # Do the job: read the region, work with the data, write it back
            print('%s: read offset=%s' % (worker_id, offset))
            fh.seek(offset)
            data = fh.read(size)
            # ... work with data ...
            print('%s: write offset=%s' % (worker_id, offset))
            fh.seek(offset)
            fh.write(data)
        except queue.Empty:
            fh.close()
            print('exit(0) job_handler %s' % worker_id)
            sys.exit(0)

if __name__ == '__main__':
    # Assumes a file named 'test' of at least 700 bytes already exists.
    job_queue = mp.Queue()
    # (offset, size) jobs; some offsets are duplicated on purpose to provoke a clash
    for job in [(0, 100), (200, 100), (200, 100), (100, 100), (300, 100),
                (300, 100), (400, 100), (500, 100), (400, 100), (600, 100)]:
        job_queue.put(job)

    processes = []
    for p in range(1, 4):
        processes.append(mp.Process(target=job_handler, args=(p, job_queue)))

    for p in processes:
        p.start()
        time.sleep(0.1)

    for p in processes:
        p.join()

To demonstrate what I mean by risk, I have duplicated some jobs in the job_queue. Watch the line marked [CLASH] in the output: without any coordination, a read/write by process 3 happens inside a read/write by process 2 on the same offset.

Output:

Start Job handler 1
Start Job handler 2
1: read offset=0
    2: read offset=200
Start Job handler 3
        3: read offset=200
[CLASH] offset:200 read by process:{2}
1: write offset=0
1: read offset=100
        3: write offset=200
    2: write offset=200
  ...
exit(0) job_handler 3
exit(0) job_handler 2
exit(0) job_handler 1  

Conclusion: if you don't have such overlapping regions, you can do it without locking.
I would suggest using a separate file handle per process/thread.

stovfl