
One of the main applications of Docker containers is load balancing. For example, in the case of a web application, instead of having only one instance handling all requests, we have many containers doing exactly the same thing, and the requests are split across all of these instances.

But can Docker be used to run the same service with different "parameters"?

For instance, let's suppose I want to create a platform storing cryptocurrency data from different exchange platforms (Bitfinex, Bittrex, etc.).

A lot of these platforms expose WebSockets. So in order to create one socket per platform, I would do something at the "code layer" like this (language-agnostic):

foreach (platform in platforms)
    client = createClient(platform)
    socket = client.createSocket()
    socket.GetData()

Now of course, this loop would be stuck on the first iteration, because the WebSocket blocks while waiting for data (although I could use asynchronous I/O instead). To circumvent that, I could use multiprocessing, something like:

foreach (platform in platforms)
    client = createClient(platform)
    socket = client.createSocket()
    process = new ProcessWhichGetData(socket)
    process.Launch()

Is there any way to do that at the "Docker layer", i.e. to use Docker so that different containers handle different platforms? I would have one Docker container for Bittrex, one Docker container for Bitfinex, etc.

I know this would imply that either the different containers communicate with each other (who takes care of Bitfinex? who takes care of Bittrex?), or the container orchestrator (Docker Swarm / Kubernetes) handles this distribution itself.

Is it something we could do, and, on top of that, is it something we want?

Edouard Berthe

3 Answers


Docker containerization simply adds various layers of isolation around regular user-land processes. It does not by itself introduce coordination among several processes, though it can certainly be exploited to build a multi-process system in which each process performs some job, whether those jobs are redundant or complementary.

If you can design your solution so that one process is launched for each "platform" (for example, passing the specific platform an instance should handle as a command line parameter), then indeed, this can technically be done in Docker.

I should however point out that it is not clear why you would want to run each process in a distinct container. Is isolation pertinent for security reasons? For resource accounting? To have each process dispatched to a distinct host in order to have access to more processing power? Also, is there coordination required among these processes, beyond initially determining which process handles which platform? If so, do they need access to shared storage, or to be able to send signals to each other? These questions will help you decide how to approach the dockerization of your solution.

In the simplest case, assuming that all you want is to have the whole set of processes isolated from the rest of the system, with no requirement that these processes be isolated from each other, the simplest strategy would be to have a single container whose entrypoint shell script launches one process per platform.

entrypoint.sh (inside your docker image):

#!/bin/bash
# Launch one data-collection process per platform, all inside the same container.
platforms="Bitfinex Bittrex"
for platform in ${platforms} ; do
    ./myprogram "${platform}" &
done
# Keep the container alive as long as the background processes are running.
wait
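
As a rough usage sketch (the image and container names below are assumptions), this script would be baked into the image as its entrypoint, and the whole group of processes started with a single docker run:

# Build the image (assuming a Dockerfile that copies entrypoint.sh and declares it as ENTRYPOINT),
# then start one container running all the platform processes.
docker build -t my_program_docker .
docker run -d --name all_platforms my_program_docker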

If you really need a distinct container for each platform, then you would use a similar script, but this time, it would be run directly on the host machine (that is, outside of a container), and would encapsulate each process inside a docker container.

launch.sh (directly on the host):

#!/bin/bash
# Start one detached container per platform, passing the platform name as an argument.
platforms="Bitfinex Bittrex"
for platform in ${platforms} ; do
    docker run -d --name "program_${platform}" my_program_docker \
        /usr/local/bin/myprogram "${platform}"
done

Alternatively, you could use docker-compose to define the list of containers to be launched, but I will not discuss that option further at present (just ask if it seems pertinent to your use case).

If you need containers to be distributed among several host machines, then that same loop could be used, but this time, processes would be launched using docker-machine. Alternatively, if using docker-compose, the processes could be distributed using Swarm.
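
As a hedged sketch of that last option, assuming a Swarm has already been initialized (docker swarm init) and the my_program_docker image is available on every node, the same loop could create one Swarm service per platform:

#!/bin/bash
# Create one single-replica Swarm service per platform; Swarm schedules each
# service on some node and restarts its task if the container exits.
for platform in Bitfinex Bittrex ; do
    docker service create --name "program_${platform}" --replicas 1 \
        my_program_docker /usr/local/bin/myprogram "${platform}"
done

A side benefit is that Swarm will reschedule a service whose container dies, which gives you automatic relaunching for free.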

James
  • Hi, thanks for your answer, and sorry for the delay. I would say I'd prefer my processes to be isolated for 2 reasons: 1) Probably a "bad" reason, but I cannot figure out how to handle several WebSockets at the same time (in the same thread), and 2) "thread-safety" reasons: I would like to be able to relaunch automatically one of my "data getters" if anything goes wrong. I know Docker Swarm & Co. are very good for this kind of purpose and I'd like to take advantage of that. But I can't manage to quantify how much of a "bad use" of these technologies it is. – Edouard Berthe Oct 12 '18 at 19:01

Say you restructured this as a long-running program that handled only one platform at a time, and controlled which platform it was via a command-line option or an environment variable. Instead of having your "launch all the platforms" loop in code, you might write a shell script like

#!/bin/sh
# Launch one background process per platform listed in platforms.txt.
for platform in $(cat platforms.txt); do
  ./run_platform "$platform" &
done

This setup is easy to transplant into Docker.
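
For instance, a minimal sketch of that transplant, using the environment-variable option (the image name and the PLATFORM variable are assumptions):

#!/bin/sh
# Start one container per platform instead of one background process per platform.
for platform in $(cat platforms.txt); do
  docker run -d --name "platform_${platform}" -e PLATFORM="$platform" my-platform-image
done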

You should not plan on processes launching Docker containers dynamically. This is hard to set up and has significant security implications (by which I mean "a bug in your container launcher could easily root your host").

If the individual processing tasks can all run totally independently (maybe they use a shared database to store data) then you're basically done. You could replace that shell script with something like a Docker Compose YAML file that lists out all of the containers; if you want to run this on multiple hosts you can use tools like Ansible, or Docker Swarm, or Kubernetes to spread the containers out (with varying levels of infrastructure complexity).
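
A minimal sketch of that Compose approach (service and image names are assumptions), written here as a script that generates the file and starts everything:

#!/bin/sh
# Generate a docker-compose.yml with one service per platform, then bring them all up.
cat > docker-compose.yml <<'EOF'
version: "3"
services:
  bitfinex:
    image: my-platform-image
    environment:
      PLATFORM: Bitfinex
    restart: always
  bittrex:
    image: my-platform-image
    environment:
      PLATFORM: Bittrex
    restart: always
EOF
docker-compose up -d

The restart: always policy also takes care of relaunching a collector if it crashes.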

David Maze
  • Thanks for your answer. I indeed believe that using a database is the right solution (it's what I am currently doing: I have a table containing the names and URLs of all the exchange platforms). But how could I make different containers connect to different platforms, given that all containers run the exact same code? By adding a boolean column "IsRequesting" in my DB saying that one of the containers is currently taking care of this platform? What if the container suddenly stops (for any reason) without updating the DB? – Edouard Berthe Sep 29 '18 at 13:05
  • @Edouardb How much difference is there between the handling of each platform? Is this a case where it would be sufficient to pass a different config file to each process, or would you need literally distinct algorithms? – James Sep 29 '18 at 13:15
  • I have created an interface IExchangeClient, which is implemented for each platform (BitfinexClient, BittrexClient, etc.), and my function requesting data from each platform only takes an IExchangeClient as argument. So basically, there isn't any difference between the handling of each platform. Each Docker container should only need one single piece of information to know which platform to target (it could be a string like "bittrex", "bitfinance", etc.). – Edouard Berthe Oct 14 '18 at 14:58

You can group the different Docker containers in a stack and also configure networking so that the containers remain isolated from the outside world but can communicate with each other.

More info here: Docker Stack
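
As a rough sketch, assuming a version-3 docker-compose.yml defining one service per platform (the stack name crypto is an assumption), the whole group can be deployed as a stack:

# Initialize Swarm once on the manager node, then deploy the compose file as a stack.
docker swarm init
docker stack deploy -c docker-compose.yml crypto
docker stack services crypto    # check that every per-platform service is running

Services in the same stack share an overlay network by default, so they can reach each other by service name while staying unreachable from outside unless you publish ports.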

Soumen Mukherjee