20

I have a large number of bytes per second coming from a sensor device (e.g., video) that are being read and processed by a process in a Docker container.

I have a second Docker container that would like to read the processed byte stream (still a large number of bytes per second).

What is an efficient way to read this stream? Ideally I'd like to have the first container write to some sort of shared memory buffer that the second container can read from, but I don't think separate Docker containers can share memory. Perhaps there is some solution with a shared file pointer, with the file saved to an in-memory file system?

My goal is to maximize performance and avoid useless copies of data from one buffer to another as much as possible.

Edit: Would love to have solutions for both Linux and Windows. Similarly, I'm interested in finding solutions for doing this in C++ as well as python.

eraoul

2 Answers

8

Create a fifo with mkfifo /tmp/myfifo. Share it with both containers: --volume /tmp/myfifo:/tmp/myfifo:rw

You can directly use it:

  • From container 1: echo foo >>/tmp/myfifo

  • In Container 2: read var </tmp/myfifo

Drawback: Container 1 is blocked until Container 2 reads the data and empties the buffer.

Avoid the blocking: in both containers, run exec 3<>/tmp/myfifo in bash.

  • From container 1: echo foo >&3

  • In Container 2: read var <&3 (or e.g. cat <&3)

This solution uses exec file descriptor handling from bash. I don't know the exact details, but it is certainly possible with other languages, too.
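A minimal sketch of the same non-blocking fifo pattern in Python (the question also asks about Python), assuming the fifo is shared at /tmp/myfifo as above; the 64 KiB chunk size is an arbitrary choice:

```python
import os

FIFO = "/tmp/myfifo"   # the fifo mounted into both containers

def writer():
    # Container 1: open read-write so open() does not block waiting for a
    # reader, mirroring `exec 3<>/tmp/myfifo` in bash.
    fd = os.open(FIFO, os.O_RDWR)
    os.write(fd, b"foo\n")
    os.close(fd)

def reader():
    # Container 2: same trick; os.read() returns as soon as data arrives.
    fd = os.open(FIFO, os.O_RDWR)
    chunk = os.read(fd, 65536)   # up to 64 KiB per call; loop for a stream
    print(chunk)
    os.close(fd)
```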

mviereck
  • This sounds promising, thanks! I'd like to figure out how to do this from python or C++, and also would love a Windows solution (updated my question with the additional details). This is a great place to start though. – eraoul Jul 12 '18 at 08:15
  • 1
    @eraoul At least with `MSYS2` on Windows I had issues combining `mkfifo` with `exec`: https://github.com/Alexpux/MSYS2-packages/issues/1333. Using only either `mkfifo` or `exec` works in `MSYS2`. – mviereck Jul 12 '18 at 11:43
  • @eraoul See also links about named pipes in Windows in this comment: https://github.com/mviereck/x11docker/issues/55#issuecomment-403859010 – mviereck Jul 12 '18 at 11:59
  • @mviereck What if there are multiple streams? Do I have to redefine all the volumes? I have a use case for dynamic streams as well. Is there any other hack? – Sleeba Paul Feb 22 '19 at 10:57
  • 1
    @SleebaPaul For multiple (and dynamic) streams I'd recommend sharing a folder instead of a single fifo. Create/delete fifos in that folder as needed; they will be available in all containers sharing the folder. – mviereck Feb 23 '19 at 09:49
  • @mviereck My use-case includes multiple listeners as well. I set up the above solution, and pipes interact well between only two processes. Can we scale the approach to multiple listeners? – Sleeba Paul Feb 25 '19 at 10:08
  • 1
    @SleebaPaul Unix pipes in general work with one listener only. Either store the data in files to allow access for other processes, or ask/search in general on how to share pipes with other processes. – mviereck Feb 25 '19 at 17:44
4

Using a simple TCP socket would be my first choice. Only if measurements showed that we absolutely need to squeeze the last bit of performance out of the system would I fall back to pipes or shared memory.

Going by the problem statement, the process seems to be bound by local CPU/memory resources rather than by external services. In that case, running both producer and consumer on the same machine (as Docker containers) may exhaust the CPU before anything else becomes a bottleneck - but I would measure first before acting.

Most of the effort in developing code is spent maintaining it, so I favor mainstream practices. The TCP stack has rock-solid foundations and is as optimized for performance as humanly possible. It is also far more (completely?) portable across platforms and frameworks. Docker containers on the same host communicating over TCP never hit the wire. If some day the processes do hit a resource limit, you can scale horizontally by splitting the producer and consumer across physical hosts - manually, or say with Kubernetes - and TCP will keep working seamlessly. If you never need that level of throughput, you also won't need system-level sophistication in inter-process communication.

Go by TCP.
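A minimal sketch of that in Python, under a few stated assumptions: port 5000 is arbitrary, "consumer" is a hypothetical container name that resolves on a shared Docker network, and the callbacks stand in for your own sensor/processing code:

```python
import socket

PORT = 5000  # arbitrary port, assumed reachable between the two containers

def consumer(handle):
    # Container 2: accept one connection and read the processed byte stream.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    while True:
        chunk = conn.recv(65536)   # up to 64 KiB per call
        if not chunk:              # empty result: producer closed the socket
            break
        handle(chunk)              # your own processing of the bytes

def producer(get_bytes):
    # Container 1: connect to the consumer container by name and stream out.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(("consumer", PORT))
    while True:
        data = get_bytes()         # your sensor/processing output (bytes)
        if not data:
            break
        sock.sendall(data)
    sock.close()
```

On the same host the traffic stays on the loopback/bridge interface, so the copies happen in the kernel rather than on the wire, and the same code runs unchanged on Linux and Windows.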

inquisitive