Linux, share a buffer with another program in fork()

Question

I have a client/server model where each client can send a Task to the server. This is called Task Requesting.

This is the basis for a simple distributed-computing library I am after.

"In other words, if some ordinary application processes an array of independent elements, then in a data-parallel model, each processor is assigned to process some part of array. To support data-parallel computing, the core library should divide tasks into parts, transfer task data to the local memory of a particular CPU, run the task on that CPU, transfer results back to the caller, and provide the ability to request some global data from the caller."

[Figure: Data Parallel Diagram]

A Task is a Binary (std::vector<uint8_t>) and a Payload (std::vector<uint8_t>).
A Binary is just a compiled task / application.
A Payload is optional data that is serialized into a uint8_t buffer.

So simply:

class CGridTask
{

    public:
        ...

        bool        Run             ();    

    private:
        std::vector<uint8_t>        m_vBinary;
        std::vector<uint8_t>        m_vPayload;
        uint32_t                    m_uiUniqueId;

        ...
};

The pseudo-diagram shows how it 'works':

[CLIENT1]---[SEND TASK with PAYLOAD: integer value = 10]-->[SERVER]

[SERVER]-->[RUN TASK with PAYLOAD]

[TASK, start]

[TASK, calculate...]
[TASK, calculate...]
[TASK, calculate...]
[TASK, integer value = 10 + some new value]

[TASK, return]

[SERVER]-->[SEND TASK to CLIENT1]

OK, so when the server calls:

pGridTask->Run();

Here is what should happen:

bool CGridTask::Run()
{
    // Dump the binary to a temporary file
    Dump(m_vBinary);

    // Chmod +x
    system(("chmod +x " + strTemporaryBinaryName).c_str());

    // Run the binary and pass m_vPayload
    ...how can i do this?...

    // Return true if binary executed
    return true;
}

The only problem here is how to share m_vPayload with the executed binary. How can I do this?

Thank you very much for any input into this!

I suggest reading http://advancedlinuxprogramming.com/ because you could design your application without a *shared memory* buffer (and if you really want to share some memory for the buffer, you also need some other synchronization mechanism, so shared memory alone is not enough). – Basile Starynkevitch, May 04 '13 at 15:25

score 3 · Answer 1 · answered May 04 '13 at 15:19

Assuming you want to MODIFY the memory so that the "main" process can see it, you will need to set up a region of shared memory or memory mapped file. Any memory allocated as part of the process will be copied when the new process writes to it, so it will not be "shared".
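To illustrate the difference, here is a minimal sketch, assuming the child is created with plain fork() and no exec. The helper name `add_in_child` is made up for this example. A region mapped with MAP_SHARED | MAP_ANONYMOUS is not copied on write, so the parent sees what the child wrote.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: a forked child adds `delta` to an int that lives in a
// shared mapping; the function returns the value the parent reads afterwards.
// Returns -1 if mmap() or fork() fails.
int add_in_child(int initial, int delta)
{
    // MAP_SHARED | MAP_ANONYMOUS: pages stay shared with children made by fork().
    void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;

    int* shared = static_cast<int*>(mem);
    *shared = initial;

    pid_t pid = fork();
    if (pid < 0) { munmap(mem, sizeof(int)); return -1; }
    if (pid == 0) {
        *shared += delta;   // child writes into the shared page
        _exit(0);
    }

    // Waiting for the child to exit is the only synchronization in this sketch.
    int status = 0;
    waitpid(pid, &status, 0);
    int result = *shared;   // parent observes the child's write
    munmap(mem, sizeof(int));
    return result;
}
```

Calling `add_in_child(10, 5)` should give 15 in the parent. Note that an anonymous mapping does not survive exec(), so this only covers a child that keeps running the same program image.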

Mats Petersson

@PeeS be careful with placing objects that contain pointers (such as vectors) in there though: the addresses would only make sense in the memory space of the process that created them. You might want to have a look at [boost.interprocess](http://www.boost.org/doc/libs/1_53_0/doc/html/interprocess.html). – juanchopanza May 04 '13 at 15:24
Yes, mmap should work, but you need to create a named shared memory region with `shm_open()`, so that you have something to pass in as a file to `mmap`. – Mats Petersson May 04 '13 at 15:29
You still need some *synchronization* (between the various processes) so just sharing some buffer memory is not enough .... – Basile Starynkevitch May 04 '13 at 15:34
That is true whichever way you solve the sharing of data between multiple threads or processes. – Mats Petersson May 04 '13 at 15:41

score 1 · Answer 2 · answered May 04 '13 at 15:28

I wouldn't recommend it, but this way you can "share everything".

Don't use fork(); use clone() with the following flag: CLONE_VM

CLONE_VM (since Linux 2.0)

If CLONE_VM is set, the calling process and the child process run in the same memory space. In particular, memory writes performed by the calling process or by the child process are also visible in the other process. Moreover, any memory mapping or unmapping performed with mmap(2) or munmap(2) by the child or calling process also affects the other process.

If CLONE_VM is not set, the child process runs in a separate copy of the memory space of the calling process at the time of clone(). Memory writes or file mappings/unmappings performed by one of the processes do not affect the other, as with fork(2).

More info at http://linux.die.net/man/2/clone

But I am sure you will run into problems with this: dynamically allocated memory will leak, and so on.

The real solution would be to set up an mmap for the payload.

Don't use `clone`; it is a very low-level syscall essentially reserved for thread library implementors (those able to master `futex`). – Basile Starynkevitch, May 04 '13 at 15:35
@BasileStarynkevitch You can do pretty fun stuff with `clone`, but as I said I wouldn't recommend it. Having a single memory space for multiple processes just creates more problems than it solves. – KoKuToru, May 04 '13 at 15:37

score 1 · Accepted Answer · edited May 23 '17 at 12:12

As an alternative to the other solutions, and depending on how your child process is structured, you could also communicate with your child process through pipes (the usual pipe/fork/dup2/exec pattern).

Sure, the performance is worse than with shared memory, but the whole architecture is more flexible, and your various programs would be much less coupled: from the child's point of view, it takes its data from stdin and writes the results to stdout, which makes it easily reusable in other contexts (and it also makes it very easy to reuse "ordinary" interactive programs in the context of your task server without having to adapt them first).
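A rough sketch of that pipe/fork/dup2/exec pattern follows. The helper name `run_with_payload` and its signature are invented for this example, and it assumes the payload and the output are small enough to fit in the pipe buffers: it writes everything first and reads afterwards, so a large exchange could block and would need poll(2) or similar.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Hypothetical helper: runs `path`, feeds `payload` to its stdin and appends
// its stdout to `output`. Returns true if the child exited with status 0.
bool run_with_payload(const char* path,
                      const std::vector<uint8_t>& payload,
                      std::vector<uint8_t>& output)
{
    int to_child[2], from_child[2];
    if (pipe(to_child) != 0) return false;
    if (pipe(from_child) != 0) { close(to_child[0]); close(to_child[1]); return false; }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return false;
    }
    if (pid == 0) {
        // Child: wire the pipe ends to stdin/stdout, then replace the process image.
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execl(path, path, static_cast<char*>(nullptr));
        _exit(127);  // only reached if exec failed
    }

    // Parent: keep the write end of the child's stdin and the read end of its stdout.
    close(to_child[0]);
    close(from_child[1]);

    // Send the payload, then close the pipe so the child sees EOF.
    size_t sent = 0;
    while (sent < payload.size()) {
        ssize_t n = write(to_child[1], payload.data() + sent, payload.size() - sent);
        if (n <= 0) break;
        sent += static_cast<size_t>(n);
    }
    close(to_child[1]);

    // Collect everything the child prints until it closes stdout.
    uint8_t buf[4096];
    ssize_t n;
    while ((n = read(from_child[0], buf, sizeof buf)) > 0)
        output.insert(output.end(), buf, buf + n);
    close(from_child[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return false;
    return sent == payload.size() && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

With `/bin/cat` as the binary, the output should simply echo the payload. In the question's `Run()`, the path would be the dumped temporary file and the payload would be `m_vPayload`.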

answered May 04 '13 at 15:33

syam

I definitely agree with that proposal. – Basile Starynkevitch May 04 '13 at 16:09
Thanks, i used that approach in this model. – PeeS May 06 '13 at 09:01

score 1 · Answer 4 · edited May 23 '17 at 12:12

In addition to the other answers, you could also consider using MPI (Message Passing Interface standard, which has several implementations, including Open MPI)

(Of course MPI is not a shared memory model, but it seems close to your high-level software architecture with "sending data parts".)

MPI is so common that some high-end iron vendors (i.e. million € supercomputers) provide their own hardware assisted MPI implementation.

You could also use Posix shared memory, i.e. shm_open(3) and friends. See shm_overview(7). Then you probably want to synchronize with Posix semaphores. See first sem_overview(7).

And mmap(2) can also be used to share memory (with MAP_SHARED flag).

Sharing memory is not enough. You need some synchronization facility (to tell that the shared data is "ready for consumption" ....).
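As a hedged sketch of how shared memory and synchronization fit together: the struct `SharedTask`, the helper `run_shared`, and the region name are all invented for this example, and the worker is a forked child standing in for the real task. A semaphore placed inside the shared region signals that the result is "ready for consumption".

```cpp
#include <fcntl.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdint>

// Layout of the shared region. Plain data only: no pointers, no std::vector.
struct SharedTask {
    sem_t    ready;   // posted by the worker once `value` is final
    uint32_t value;   // payload in, result out
};

// Hypothetical helper; `name` must start with '/', e.g. "/grid_task_demo".
// On success returns true and stores the worker's result in `result`.
bool run_shared(const char* name, uint32_t input, uint32_t& result)
{
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return false;
    if (ftruncate(fd, sizeof(SharedTask)) != 0) { close(fd); shm_unlink(name); return false; }

    void* mem = mmap(nullptr, sizeof(SharedTask), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mem == MAP_FAILED) { shm_unlink(name); return false; }

    SharedTask* task = static_cast<SharedTask*>(mem);
    sem_init(&task->ready, /*pshared=*/1, 0);
    task->value = input;

    pid_t pid = fork();
    if (pid == 0) {
        // Worker. A separately exec'd binary would shm_open(name) and mmap instead.
        task->value += 1;          // stand-in for the real computation
        sem_post(&task->ready);    // tell the parent the data is ready
        _exit(0);
    }

    bool ok = false;
    if (pid > 0) {
        ok = (sem_wait(&task->ready) == 0);  // blocks until the worker posts
        if (ok) result = task->value;
        waitpid(pid, nullptr, 0);
    }
    sem_destroy(&task->ready);
    munmap(mem, sizeof(SharedTask));
    shm_unlink(name);
    return ok;
}
```

The sketch has no timeout, so a worker that dies before posting would leave the parent blocked; sem_timedwait could cover that case. On older glibc versions this may need `-lrt -pthread` at link time.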

Maybe you could consider Pthreads. Read a good Pthreads tutorial (the recent C++11 standard also provides threads).

And read Advanced Linux Programming to get an overview of the many IPC possibilities under Linux. As syam suggested, the usual pipe, fork, dup2, exec pattern (and poll(2) for input and output multiplexing) is worth considering.