
Are there any packages in Python that support concurrent writes on NFS using a serverless architecture?

I work in an environment where I have a supercomputer, and multiple jobs save their data in parallel. While I can save the results of these computations in separate files and combine them later, this requires me to write a reader that is aware of the specific way in which I split my computation across jobs, so that it knows how to stitch everything into a final data structure correctly.

Last time I checked, SQLite did not support concurrent writes on NFS. Are there any alternatives to SQLite?

Note: By serverless I mean avoiding explicitly starting another server (on top of NFS) that handles the IO requests. I understand that NFS uses a client-server architecture, but this filesystem is already part of the supercomputer that I use, and I do not need to maintain it myself. What I am looking for is a package or file format that supports concurrent IO without requiring me to set up any (additional) servers.

Example:

Here is an example of two jobs that I would run in parallel:

  • Job 1 populates my_dict from scratch with the following data, and saves it to file:

    my_dict['a']['foo'] = [0.2, 0.3, 0.4]

  • Job 2 also populates my_dict from scratch with the following data, and saves it to file:

    my_dict['a']['bar'] = [0.1, 0.2]

I want to later load the file, and see the following in my_dict:

> my_dict['a'].items()
[('foo', [0.2, 0.3, 0.4]), ('bar', [0.1, 0.2])]

Note that the stitching operation is automatic. In this particular case, I chose to split the keys in my_dict['a'] across the computations, but other splits are possible. The fundamental idea is that there are no clashes between jobs. It implicitly assumes that jobs add/aggregate data, so the fusion of dictionaries (dataframes if using Pandas) always results in aggregating the data, i.e. computing an "outer join" of the data.
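If you do end up with one file per job, the stitching itself does not have to know how the computation was split: a generic recursive merge of nested dictionaries gives the "outer join" described above. A minimal sketch (the `merge_dicts` helper is hypothetical, not from any existing package):

```python
def merge_dicts(a, b):
    """Recursively merge b into a copy of a (an "outer join" of nested dicts).

    Assumes jobs never clash: a non-dict leaf present in both inputs
    raises an error instead of silently overwriting data.
    """
    merged = dict(a)
    for key, value in b.items():
        if key in merged:
            if isinstance(merged[key], dict) and isinstance(value, dict):
                merged[key] = merge_dicts(merged[key], value)
            else:
                raise ValueError(f"clashing key: {key!r}")
        else:
            merged[key] = value
    return merged

# The two jobs from the example above:
job1 = {'a': {'foo': [0.2, 0.3, 0.4]}}
job2 = {'a': {'bar': [0.1, 0.2]}}
my_dict = merge_dicts(job1, job2)
# my_dict['a'] now holds both 'foo' and 'bar'
```

The reader then only needs to glob the per-job files and fold them together with this merge, regardless of how the keys were partitioned.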

Josh
  • Your question is very confusing because you talk about NFS but then use the term 'serverless'. Since NFS always has a server, it doesn't make sense. Can you rephrase it? – Gabe Nov 25 '13 at 21:37
  • Thank you @Gabe - I have updated my OP to address your question. – Josh Nov 25 '13 at 21:42
  • So the Python script you want to write will write to a mounted NFS volume? In that arrangement your python script is the client (no extra servers needed :) – Jason Sperske Nov 25 '13 at 21:46
  • @JasonSperske correct. – Josh Nov 25 '13 at 21:48
  • 1
    This is an *extremely* difficult problem. You're not likely to find anything already written. – Gabe Nov 25 '13 at 23:46
  • @Josh I agree with @Gabe, your requirements are too hard. If you would relax your "serverless" requirement, I would recommend trying e.g. Redis key-value store - it is fast, available for Linux as well Windows and is very easy to use with `redis` package. But it is definitely a server. – Jan Vlcinsky Apr 18 '14 at 01:36

1 Answer


Simple DIY, potentially flaky

Hierarchical locking -- i.e. you lock / first, then lock /foo and unlock /, then lock /foo/bar and unlock /foo. Make changes to /foo/bar and unlock it.

This allows other processes access to other paths. Lock contention on / is relatively small.
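One way to sketch this is with lock files created via `O_CREAT | O_EXCL`, which is atomic on NFSv3 and later (older NFS clients would need the `link(2)` trick instead). This is an illustrative sketch under those assumptions, not a hardened implementation (no crash recovery, no stale-lock cleanup):

```python
import errno
import os
import time

def acquire(lock_path, timeout=30.0, poll=0.1):
    """Take a lock by atomically creating a lock file with O_EXCL."""
    deadline = time.time() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except OSError as e:
            if e.errno != errno.EEXIST or time.time() > deadline:
                raise
        time.sleep(poll)

def release(lock_path):
    os.remove(lock_path)

def lock_hierarchy(root, *parts):
    """Descend the tree, holding each ancestor's lock only long enough
    to take the child's lock, then releasing the ancestor.
    Returns the deepest lock path; the caller must release() it."""
    current = os.path.join(root, '.lock')
    acquire(current)
    path = root
    for part in parts:
        path = os.path.join(path, part)
        child = os.path.join(path, '.lock')
        acquire(child)
        release(current)
        current = child
    return current
```

With this, two jobs writing under `/foo/bar` and `/foo/baz` only contend briefly on the `/` and `/foo` locks, never on each other's leaf.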

Complicated DIY

Adapt a lock-free or wait-free algorithm, e.g. RCU (read-copy-update). Pointers become symlinks or files containing lists of other paths.

http://www.rdrop.com/users/paulmck/rclock/intro/rclock_intro.html
https://dank.qemfd.net/dankwiki/index.php/Lock-free_algorithms
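The "pointers as symlinks" idea can be sketched as an RCU-style publish: write a whole new version of the data to a fresh path, then atomically swap a symlink to it with `rename(2)` (atomic on POSIX filesystems; NFS close-to-open cache semantics add caveats for readers on other clients). The `publish` helper below is a hypothetical illustration:

```python
import os

def publish(data_path, pointer):
    """Point 'pointer' (a symlink) at a new data version, atomically.

    Readers that already resolved the old symlink keep a consistent
    snapshot; readers that resolve it afterwards see the new version.
    """
    tmp = pointer + '.tmp'
    if os.path.lexists(tmp):      # clear a stale temp link if present
        os.remove(tmp)
    os.symlink(data_path, tmp)    # create the new pointer off to the side
    os.rename(tmp, pointer)       # atomically replace the live pointer
```

Writers never modify a published file in place; they write `data.v2`, call `publish('data.v2', 'current')`, and garbage-collect old versions once no reader can still hold them.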

Dima Tisnek