
I'm looking for a way to run a large number of tasks on a cluster and monitor their status.

In detail: each task consists of 3-4 processes, each of which runs in a Docker container (every process is a docker run command). All of the processes of a task have to run on the same server.

The tasks arrive in bursts of several hundred at a time.

I've looked into several solutions, all of them based on Mesos:

  • Chronos - Seems like it would falter under high load, and in any case it is geared towards recurring (cron) jobs, while I need one-time (heavy) jobs.
  • Custom Mesos framework - Seems too low-level for my needs; it would require me to write the scheduling and retry mechanisms myself. I'd save this as a last resort.
  • Aurora - This seems promising, as each task runs on a single node and is composed of several processes. I am missing a couple of things here, though: Aurora does not seem to be able to run several tasks as part of a single job. Since my tasks are all similar and differ only in their input, I could use a single job with many (say 400) instances, and the first process of each task (whose role is to download the input from S3) could download a different set based on the instance ID. Which brings me to another problem: I can't find a working example of using {{ mesos.instance }} in .aurora files. Can anyone give me an example?
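To make the sharding idea concrete, this is roughly the logic I have in mind for the first process, as a Python sketch. The function name and the key layout are hypothetical, and I am assuming the instance ID reaches the process as a plain integer (for example as a command-line argument); how to get it there from the .aurora file is exactly the part I am missing.

```python
def keys_for_instance(all_keys, instance_id, num_instances):
    """Return the subset of input keys that one instance should download.

    Keys are sorted first so that every instance sees the same ordering,
    then dealt out round-robin: instance 0 takes keys 0, n, 2n, ...,
    instance 1 takes keys 1, n+1, 2n+1, ... and so on.
    """
    if num_instances <= 0:
        raise ValueError("num_instances must be positive")
    if not 0 <= instance_id < num_instances:
        raise ValueError("instance_id must be in [0, num_instances)")
    ordered = sorted(all_keys)
    return ordered[instance_id::num_instances]


if __name__ == "__main__":
    # Hypothetical key names; in practice these would come from listing the bucket.
    example_keys = ["input/%04d.dat" % i for i in range(10)]
    for instance in range(4):
        print(instance, keys_for_instance(example_keys, instance, 4))
```

With this scheme every key is assigned to exactly one instance, as long as all instances agree on the key list and on the total number of instances.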

Thanks for all the fish, people.

Jonas
Lior Regev

2 Answers


You could also have a look at Kubernetes (which can also be run as a framework on Mesos). Kubernetes has the concept of Pods, which are basically sets of co-located containers. So in your case a pod would consist of your 3-4 processes/containers, and these pods can then be scaled up and down.
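As a rough illustration of the pod idea, here is a minimal Python sketch that builds one Pod manifest per task and prints it as JSON, which kubectl accepts alongside YAML. The helper name, the task name, and the image names are hypothetical placeholders; only the manifest fields (apiVersion, kind, metadata, spec.containers, restartPolicy) are standard Kubernetes Pod fields. Treat it as a starting point rather than a complete setup.

```python
import json


def build_pod_manifest(task_name, images):
    """Build a Pod manifest with one container per process of a task.

    All containers listed in a single Pod are scheduled onto the same node,
    which is what gives the co-location guarantee.
    """
    containers = [
        {"name": "step-%d" % index, "image": image}
        for index, image in enumerate(images)
    ]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": task_name},
        "spec": {
            # One-off work: do not restart containers after they exit.
            "restartPolicy": "Never",
            "containers": containers,
        },
    }


if __name__ == "__main__":
    # Hypothetical images standing in for the 3-4 docker run commands.
    manifest = build_pod_manifest(
        "task-42",
        [
            "example/fetch-input:latest",
            "example/process:latest",
            "example/upload:latest",
        ],
    )
    print(json.dumps(manifest, indent=2))
```

Note that the containers of a pod start together rather than one after another, so if your processes must run in sequence you would need to handle that ordering yourself.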

Short comments regarding the other solutions you mentioned:

  • Chronos: Not really targeting your use case.
  • Custom framework: Actually not so difficult, but good call to save this as a last resort.
  • Aurora: A very powerful but also complex framework.
  • Marathon (which you didn't mention): Targeted at long-running applications that can easily be scaled up and down.
js84
  • Firstly, thank you. As for Marathon, I understand it's a kind of init.d for Mesos. I am not looking to run a service but rather an app, so I didn't think it would fit. – Lior Regev Oct 14 '15 at 14:34
  • Agreed. Just keep in mind that if it is something of which you always want a small number of instances running and then burst at certain events (i.e. increase the number of instances), it could be interesting as well. – js84 Oct 15 '15 at 00:05

In addition to the excellent other answer, you could check out Two Sigma's Cook, which they have only recently open-sourced but have been using in production at scale for a while.

Michael Hausenblas
  • Good advice, though I personally have not tested it so far :-). The co-location constraint seems to map nicely to Kubernetes pods; I couldn't find out whether Cook has similar primitives. – js84 Oct 15 '15 at 00:06