Questions tagged [condor]

Condor is a freely available workload management system designed to enable high-throughput computing processes across local and distributed computer networks. The program's name was changed to HTCondor in 2012.

HTCondor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, HTCondor provides:

  • a job queueing mechanism
  • scheduling policy and priority scheme
  • resource monitoring and management.

Users submit their serial or parallel jobs to HTCondor. HTCondor places them into a queue, chooses when and where to run them based on a policy, carefully monitors their progress, and ultimately informs the user upon completion.
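
The lifecycle described above (submit, queue, match to a machine by policy, monitor, notify) can be sketched with a toy model. This is not HTCondor's API or its actual matchmaking algorithm; the names `ToyBatchSystem`, `Job`, `Machine`, `needs` and `features` are hypothetical, and the policy (highest priority first, first free machine that has every required feature) is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    name: str
    features: frozenset = frozenset()  # hypothetical: software available on the machine
    busy: bool = False


@dataclass
class Job:
    name: str
    priority: int = 0                  # hypothetical scheme: higher runs first
    needs: frozenset = frozenset()     # features the job requires
    status: str = "idle"
    machine: Optional[Machine] = None


class ToyBatchSystem:
    """Toy stand-in for a batch system; not how HTCondor is implemented."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.queue = []
        self.notifications = []

    def submit(self, job):
        # Queueing: every submitted job waits here until it is matched.
        self.queue.append(job)

    def schedule(self):
        # Policy: consider idle jobs in descending priority and place each
        # on the first free machine that offers every required feature.
        idle = [j for j in self.queue if j.status == "idle"]
        for job in sorted(idle, key=lambda j: -j.priority):
            for machine in self.machines:
                if not machine.busy and job.needs <= machine.features:
                    machine.busy = True
                    job.machine = machine
                    job.status = "running"
                    break

    def complete(self, job):
        # Completion: free the machine and record a message for the user.
        job.machine.busy = False
        job.status = "completed"
        self.queue.remove(job)
        self.notifications.append(f"{job.name} finished on {job.machine.name}")


def demo():
    system = ToyBatchSystem([
        Machine("node1", frozenset({"numpy"})),
        Machine("node2"),
    ])
    low = Job("low", priority=1)
    high = Job("high", priority=5, needs=frozenset({"numpy"}))
    extra = Job("extra", priority=3, needs=frozenset({"numpy"}))
    for job in (low, high, extra):
        system.submit(job)
    system.schedule()      # "high" and "low" start; "extra" waits for node1
    system.complete(high)  # frees node1 and records a notification
    system.schedule()      # "extra" can now be placed on node1
    return system, low, high, extra
```

In this sketch a job that finds no suitable free machine simply stays idle until a later scheduling pass, which loosely mirrors how a queued job waits for a matching resource.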

146 questions
12 votes, 2 answers

How to tell Condor to dispatch jobs only to machines on the cluster that have "numpy" installed?

I just figured out how to send jobs to be processed on machines on the cluster by using Condor. Since we have a lot of machines and not all of them are configured the same way, I was wondering: Is it possible to tell Condor only to dispatch…
Aufwind
7 votes, 6 answers

Python library for job scheduling, ssh

I'd like to find a user-space tool (preferably in Python - barring that, in anything I could easily modify if it doesn't already do what I need it to) to replace a short script I've been using that does the two things below: polls less than 100…
Thomas
6 votes, 1 answer

Should I prefer Hadoop or Condor when working with R?

I am looking for ways to send work to multiple computers on my university's computer grid. Currently it is running Condor and also offers Hadoop. My question is thus: should I try to interface R with Hadoop or with Condor for my projects? For…
Tal Galili
6 votes, 1 answer

Restrict scheduling of Condor jobs: one per physical machine

I need to launch a Condor job on a cluster with multiple slots per machine. I have an additional requirement that two jobs cannot be placed on the same physical machine at the same time. This is due to some binary that I cannot control which…
igon
5 votes, 1 answer

What are the main differences between TORQUE, HTCondor and Apache Mesos?

http://www.adaptivecomputing.com/products/open-source/torque/ https://research.cs.wisc.edu/htcondor/ I am looking for a program to perform distributed computing (no parallel computing needed though) which has: a scheduler a queue management…
RockScience
5 votes, 7 answers

Condor, Sun Grid Engine, or something else?

I'm trying to work out whether we should try out Condor or Sun Grid Engine at work (or possibly something else). We often have lots of unused WinXp workstations. The hope is that we could use wake-on-LAN, run all our jobs, and then shut down…
Pengin
5 votes, 3 answers

Tools for setting up and running a grid job on Google Compute Engine?

I have the need to set up and run "embarrassingly" parallel jobs on Google Compute Engine. I am looking for tools to facilitate this. On EC2, I was using MIT's Starcluster to set up the cluster, and then just submitting the job to SGE. Are there…
4 votes, 1 answer

[HTCONDOR][kubernetes / k8s] : Unable to start minicondor image within k8s - condor_master not working

POST EDIT: The issue is due to PSP (Pod Security Policy). By default, escalation is not permitted for my condor user. That is why it is not working: supervisord is running as the root user and tries to write logs and start the condor collector as root…
blackbird
4 votes, 1 answer

How does one send an email after the submission job is done in condor?

I was trying to use the email option after running a Condor job. I tried this:

    Executable = executable.sh
    Log = file.log
    Output = file.stdout
    Error = file.stderr
    # Use this to make sure 1 gpu is available. The key words are…
Charlie Parker
4 votes, 3 answers

How to run a Python program on Condor?

I am new to Condor and am trying to run my Python program on it, but I am having difficulty doing so. All the tutorials I found assume a single-file Python program, but my Python program consists of multiple packages and files and also uses other…
DSKim
4 votes, 5 answers

Methods/Tools for solving a Mystery Segfault while running on condor

I'm writing a C application which is run across a compute cluster (using Condor). I've tried many methods to reveal the offending code but to no avail. Clues: On average, when I run the code on 15 machines for 2 days, I get two or three segfaults…
Ethan Heilman
4 votes, 2 answers

Sandboxing R for Condor (on Linux)

My university runs a condor computing grid (compute nodes are running Linux), and I'd like to use it for running simulations in R. The problem is that only some of the machines on the grid have R installed. So far I see two options, but I don't know…
Wesley
4 votes, 3 answers

Condor output file updating

I'm running several simulations using Condor and have coded the program so that it outputs a progress status in the console. This is done at the end of a loop where it simply prints the current time (this can also be percentage or elapsed time). The…
Max Z.
4 votes, 1 answer

Condor Timeout for idle jobs

I'm running jobs on a condor cluster, but some get hung in an idle state and never seem to start, let alone finish. Short of manually doing condor_wait -wait n logfile, then condor_rm, is there a more graceful (and automatic, built in) way of…
3 votes, 3 answers

Condor central manager could not see the other computing nodes

I connected three servers to form an HPC cluster using Condor as middleware. When I run the command condor_status from the central manager, it does not show the other nodes. I can run jobs on the central manager and connect to the other nodes via SSH…
user1011891