What are the main differences between TORQUE, HTCondor and Apache Mesos?

Question

I am looking for a program to perform distributed computing (no parallel computing needed though) which has:

a scheduler
queue management (FIFO, or preferably something more advanced)
a good statistics report
ability to run on a heterogeneous cluster (a set of machines with different characteristics such as cpu and memory)
and, very importantly, good responsiveness: a few seconds at most between triggering the task and the actual start of execution. I have heard that this may be tricky to achieve with HTCondor and TORQUE. What about Apache Mesos?

@DmitriChubarov This would add another level of abstraction. Would that not slow down the response time? — RockScience, Jan 15 '16 at 08:19
Apache Mesos provides [a list of frameworks](http://mesos.apache.org/documentation/latest/frameworks/) that might well suit your needs with an added advantage of sharing resources between multiple frameworks. Containers are not a requirement. — Dima Chubarov, Jan 15 '16 at 08:53
Can you expand on what you mean by "a good statistics report"? I know that Torque can handle these other requirements easily. I'm not very familiar with HTCondor, but I suspect it can as well, although Torque has a much larger user community. — dbeer, Jan 18 '16 at 18:25
In case you find it useful, have you checked the Spring XD project? http://projects.spring.io/spring-xd/ — Sergey Shcherbakov, Feb 09 '16 at 19:20
Mesos can handle all of the above and will scale with you if you need to scale up or have different workloads later on. If your use case is just one single workload, Torque, Slurm or Nomad might be the simpler answer. — Walid, Mar 26 '16 at 17:22

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

There is a fairly large Wikipedia page with comparisons, but you will hardly find major differences. My guess would be that most things could theoretically be done in any of these frameworks. The things you list all depend on perspective (for example, people commonly write their own sophisticated statistics from HTCondor logs).

Regarding responsiveness: HTCondor works fine for scheduling interactive notebooks if there are enough resources for the workers to pick up the job. A few seconds is often no problem, but there are hardly any guarantees. These are high-throughput systems, not low-latency systems. If you care about latency, you should preallocate workers and scale them up and down (here, support for running other frameworks on top helps much more than native latency).
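To illustrate the kind of home-grown statistics mentioned above, here is a minimal sketch that checks the "few seconds" requirement from submit and start timestamps. The record layout, the sample values and the 3-second threshold are all assumptions made for illustration; they are not the log format of HTCondor, TORQUE or Mesos, so you would first have to extract these timestamps from whatever logs your scheduler writes.

```python
from statistics import median

# Hypothetical records: (job_id, submit_time_s, start_time_s).
# These values are made up; real ones would come from your scheduler's logs.
jobs = [
    ("job-1", 0.0, 1.2),
    ("job-2", 0.5, 2.1),
    ("job-3", 1.0, 9.0),
    ("job-4", 1.5, 2.0),
]


def start_latencies(records):
    """Return the seconds between submission and start for each job."""
    return [start - submit for _, submit, start in records]


def summarize(latencies, threshold_s=3.0):
    """Median, worst case, and share of jobs that started within threshold_s."""
    within = sum(1 for x in latencies if x <= threshold_s)
    return {
        "median_s": median(latencies),
        "max_s": max(latencies),
        "share_within_threshold": within / len(latencies),
    }


summary = summarize(start_latencies(jobs))
print(summary)
```

Looking at the worst case and the share of jobs within the threshold, rather than only the average, matches the caveat above: a typical start time of a few seconds says nothing about guarantees.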

Below I try to highlight the main focus of each project, from my perspective, as far as it matters for a practical decision:

Target audience

Mesos:

PaaS/IaaS targeted to run other schedulers (you can run Torque on top of Mesos)
in particular, interoperability with big data frameworks such as Spark & Kafka

vs.

Both HTCondor & Torque:

fair-share batch processing particularly in scientific clusters (High Throughput Computing)

Eco-system

Mesos:

Apache open source project with community

vs.

HTCondor:

Open source, maintained by UW-Madison, with a classic user mailing list

vs.

TORQUE:

Proprietary, Commercial support

Ease of use

(this is partly about statistics, but more about dashboard-style tooling)

Mesos & TORQUE:

Web UI
integrations with other frameworks are commonly available (for TORQUE, look for PBS)

HTCondor:

new, still-developing REST and Python interfaces, but no common GUI
lagging behind a tiny bit in framework support (R batchtools; lately it has gained Dask support)
