Questions tagged [sungridengine]

Oracle Grid Engine, previously known as Sun Grid Engine (SGE), CODINE (Computing in Distributed Networked Environments) or GRD (Global Resource Director), is an open source batch-queuing system originally developed and supported by Sun Microsystems. Sun once also sold a commercial product based on SGE, known as N1 Grid Engine (N1GE).

After Oracle purchased Sun, Grid Engine was forked; there are currently three actively maintained forks: Univa Grid Engine, Son of Grid Engine and Scalable Grid Engine/Open Grid Scheduler.

Until recently Oracle offered its own version, Oracle Grid Engine, but support has been transferred to Univa along with the copyrights, and the Oracle version is expected to be folded into Univa Grid Engine.

The Scalable Grid Engine and Son of Grid Engine versions are open source and free to use under the Sun Industry Standards Source License.

The Univa Grid Engine and Oracle Grid Engine forks are proprietary and, apart from time-limited demo versions, are only available with a support contract.

Scalable Logic offers an optional support contract for the Scalable Grid Engine version.

SGE is typically used on a computer farm or high-performance computing (HPC) cluster and is responsible for accepting, scheduling, dispatching, and managing the remote and distributed execution of large numbers of standalone, parallel or interactive user jobs. It also manages and schedules the allocation of distributed resources such as processors, memory, disk space, and software licenses.
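To make the resource side concrete, the Python sketch below only assembles a `qsub` command line that asks for memory and processor slots; it does not submit anything. `-N`, `-cwd`, `-l` and `-pe` are standard `qsub` options, but whether an `h_vmem` resource exists and what the parallel environment is called (`smp` in the example) depend on how a particular cluster is configured, and the helper name `qsub_argv` is hypothetical.

```python
import shlex
from typing import List, Optional


def qsub_argv(script: str, job_name: str,
              mem: Optional[str] = None,
              pe: Optional[str] = None,
              slots: int = 1) -> List[str]:
    """Build (but do not run) a qsub command line for a job script.

    Hypothetical helper: the h_vmem resource and the parallel environment
    name are site-specific assumptions.
    """
    argv = ["qsub", "-N", job_name, "-cwd"]  # name the job, run it in the current directory
    if mem is not None:
        argv += ["-l", f"h_vmem={mem}"]      # request a hard virtual-memory limit
    if pe is not None:
        argv += ["-pe", pe, str(slots)]      # request `slots` slots in parallel environment `pe`
    argv.append(script)
    return argv


if __name__ == "__main__":
    # Show a shell-quoted version; on a submit host the list itself could be
    # passed to subprocess.run(...) to actually submit the job.
    print(shlex.join(qsub_argv("job.sh", "demo", mem="4G", pe="smp", slots=4)))
```

Returning an argument list rather than a single string avoids quoting problems when script paths contain spaces.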

SGE is the foundation of the Sun Grid utility computing system, made available over the Internet in the United States in 2006, later becoming available in many other countries.

332 questions
3 votes, 0 answers

How much memory does java need to start?

I am having some issues running Java in an environment with memory controls. My use case is Sun Grid Engine (SGE), but I can reproduce it with ulimit. When I try to run Java with a limit on memory (-Xmx), I find that I still need to allow a much…
Evan Benn
3 votes, 1 answer

Using Conda environment in SnakeMake on SGE cluster problem

Related: SnakeMake rule with Python script, conda and cluster. I have been trying to set up my SnakeMake pipelines to run on SGE clusters (qsub). Using simple commands or tools that are installed directly to computational nodes, there is no…
user44697
3 votes, 1 answer

Gridengine: error: commlib error: got select error (connection refused)

I just installed gridengine and I'm getting an error when running qstat: error: commlib error: got select error (Connection refused) error: unable to send message to qmaster using port 6444 on host "MyHost-VirtualBox": got send error cat…
grunt
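The error says `qstat` could not reach the qmaster on port 6444. A first diagnostic step, sketched here in Python as an assumption rather than as the thread's answer, is to check whether anything accepts TCP connections on that host and port at all. The host name and port come from the question; `port_accepts_connections` is a hypothetical helper.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    False (e.g. connection refused) suggests nothing is listening on that
    port, or that a firewall is blocking it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and name-resolution failures
        return False
```

For example, `port_accepts_connections("MyHost-VirtualBox", 6444)` returning False would point at the qmaster daemon not running (or not listening on that port) rather than at `qstat` itself.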
3 votes, 1 answer

Slurm: How to restart failed worker job

If one is running an array job on a Slurm cluster, how can one restart a failed worker job? In a Sun Grid Engine queue, one can add #$ -r y to the job file to indicate the job should be restarted if it fails; what is the Slurm equivalent of this…
duhaime
3 votes, 1 answer

How can I measure the time it takes to complete a batch of jobs in Sun Grid Engine?

I'm using Sun Grid Engine to run a batch of jobs on Amazon Web Services EC2 nodes and I'd like to measure the wall time it takes to complete the whole batch. I'm fine with either from the time of submission to the time the queue is empty, or from…
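Both definitions of batch wall time in the question reduce to simple arithmetic over per-job timestamps. A Python sketch, assuming the submission, start and end times have already been collected for every job in the batch (for example from the `qsub_time`, `start_time` and `end_time` fields that `qacct -j` reports); the `JobTimes` type and `batch_wall_times` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Tuple


@dataclass
class JobTimes:
    submitted: datetime
    started: datetime
    ended: datetime


def batch_wall_times(jobs: Iterable[JobTimes]) -> Tuple[timedelta, timedelta]:
    """Return (first submission -> last end, first start -> last end)."""
    jobs = list(jobs)
    if not jobs:
        raise ValueError("no jobs")
    last_end = max(j.ended for j in jobs)        # when the queue drained
    from_submit = last_end - min(j.submitted for j in jobs)
    from_start = last_end - min(j.started for j in jobs)
    return from_submit, from_start
```

The first value includes time jobs spent waiting in the queue; the second measures only from the moment the first job began running.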
3 votes, 1 answer

How to use Python parallel modules like multiprocessing.Pool on Sun SGE grid

I have a piece of Python code that runs on a single machine with multiprocessing.Pool for lots of independent jobs. I wonder if it's possible to make it even more parallel on an SGE grid, e.g., each node of the grid runs multiple threads for these…
galactica
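One common pattern, sketched here as an assumption rather than taken from the thread, is to submit each chunk of work as its own SGE job (or array task) and, inside each job, size the local `multiprocessing.Pool` to the number of slots SGE granted. SGE exposes that number in the `NSLOTS` environment variable; `work` stands in for the real per-item function, and `slots_from_env` and `run_batch` are hypothetical names.

```python
import os
from multiprocessing import Pool
from typing import Iterable, List


def slots_from_env(default: int = 1) -> int:
    """Slots granted by SGE (NSLOTS), or `default` when run outside SGE."""
    value = os.environ.get("NSLOTS")
    try:
        n = int(value) if value is not None else default
    except ValueError:          # malformed value: fall back rather than crash
        n = default
    return max(n, 1)            # a Pool needs at least one worker


def work(x: int) -> int:
    """Stand-in for the real per-item computation."""
    return x * x


def run_batch(items: Iterable[int]) -> List[int]:
    # One worker process per granted slot, so the job does not use more
    # cores than the scheduler reserved for it.
    with Pool(processes=slots_from_env()) as pool:
        return pool.map(work, items)


if __name__ == "__main__":
    print(run_batch(range(10)))
```

Each job would be submitted with a slot request such as `qsub -pe <pe_name> 8 job.sh`, where the parallel environment name is site-specific.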
3 votes, 3 answers

qsub is executing my bash script in csh despite shebang

I want to submit a bash script to my university's Sungrid computing cluster to run an executable in a loop. When I log in to the server, I'm in bash: $ echo $SHELL /bin/bash And I include a bash shebang at the top of the script that I pass to…
ApproachingDarknessFish
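A likely cause, offered as an assumption rather than as the thread's answer: in a common default configuration, SGE starts job scripts with the shell configured for the queue (often `/bin/csh`) and ignores the `#!` line. The `-S` option of `qsub`, given on the command line or embedded as a `#$ -S /bin/bash` directive, names the interpreter explicitly. The Python helper below only writes such a script; its name and the example loop are hypothetical.

```python
from typing import Iterable


def render_job_script(commands: Iterable[str], job_name: str,
                      shell: str = "/bin/bash") -> str:
    """Return the text of an SGE job script that requests `shell` explicitly."""
    header = [
        f"#!{shell}",           # used when the script is run directly, outside SGE
        f"#$ -S {shell}",       # SGE directive: interpreter to start the job with
        f"#$ -N {job_name}",    # job name shown by qstat
        "#$ -cwd",              # run in the submission directory
    ]
    return "\n".join([*header, *commands]) + "\n"


if __name__ == "__main__":
    print(render_job_script(['for i in 1 2 3; do ./my_program "$i"; done'], "loop"), end="")
```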
3 votes, 1 answer

Enabling Univa Grid Engine Resource Reservation without a time limit on jobs

My organization has a server cluster running Univa Grid Engine 8.4.1, with users submitting various kinds of jobs, some using a single CPU core, and some using OpenMPI to utilize multiple cores, all with varying and unpredictable run-times. We've…
Xirin
3 votes, 3 answers

How to get an SGE job state

This may be a very simple question, but if I have the job ID, how would I get the state of the job submitted through SGE? I basically want to check on a job ID and see if it's in an error state, it's still running, or it's completed. I was thinking…
Greg B
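A Python sketch of the parsing side, assuming the default plain-text `qstat` layout in which the job ID is the first column and the state code the fifth. Finished jobs drop out of `qstat` entirely (their records can then be looked up with `qacct -j`), so a missing ID is treated as finished. The sample output, the helper names and the coarse state buckets are illustrative assumptions; array jobs, whose tasks share one ID, would also need the task column.

```python
from typing import Dict, Optional

# Hypothetical qstat output, for illustration only.
SAMPLE = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   1001 0.55500 hello      alice        r     05/01/2024 10:00:00 all.q@node01                       1
   1002 0.00000 hello      alice        qw    05/01/2024 10:01:00                                    1
   1003 0.00000 broken     alice        Eqw   05/01/2024 10:02:00                                    1
"""


def parse_qstat(text: str) -> Dict[str, str]:
    """Map job ID -> raw state code from default qstat output."""
    states: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric job ID; this skips the header and the dashes.
        if len(fields) >= 5 and fields[0].isdigit():
            states[fields[0]] = fields[4]
    return states


def classify(state: Optional[str]) -> str:
    """Collapse a state code such as 'qw', 'r' or 'Eqw' into a coarse bucket."""
    if state is None:
        return "finished"        # no longer listed by qstat
    if "E" in state:
        return "error"
    if "d" in state:
        return "deleting"
    if any(c in state for c in "sST"):
        return "suspended"
    if any(c in state for c in "rt"):
        return "running"         # t = being transferred to an execution host
    if any(c in state for c in "qwh"):
        return "pending"         # queued, waiting and/or on hold
    return "unknown"
```

On a real cluster the text would come from running `qstat`, for example via `subprocess.run(["qstat"], capture_output=True, text=True).stdout`.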
3 votes, 1 answer

Running a job on multiple nodes of a GridEngine cluster

I have access to a 128-core cluster on which I would like to run a parallelised job. The cluster uses Sun GridEngine and my program is written to run using Parallel Python, numpy, scipy on Python 2.5.8. Running the job on a single node (4-cores)…
Chinmay Kanchi
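When a job is submitted to a parallel environment (`qsub -pe <name> <slots>`), SGE writes the hosts and slot counts it allocated to a file whose path is in the `PE_HOSTFILE` environment variable; each line starts with a host name followed by the number of slots on that host. A minimal Python sketch for reading it, so that worker processes can then be started on those hosts by whatever mechanism the program uses. The helper names are hypothetical and the sample contents are invented.

```python
import os
from typing import List, Tuple


def parse_pe_hostfile(text: str) -> List[Tuple[str, int]]:
    """Return (host, slots) pairs from the contents of a PE_HOSTFILE."""
    hosts: List[Tuple[str, int]] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:           # later columns (queue, processor range) are ignored
            hosts.append((fields[0], int(fields[1])))
    return hosts


def hosts_for_this_job() -> List[Tuple[str, int]]:
    """Read the current job's allocation; empty when not running under a PE."""
    path = os.environ.get("PE_HOSTFILE")
    if not path:
        return []
    with open(path) as fh:
        return parse_pe_hostfile(fh.read())


# Invented example of PE_HOSTFILE contents.
SAMPLE = "node01 4 all.q@node01 UNDEFINED\nnode02 4 all.q@node02 UNDEFINED\n"
```

Summing the slot counts gives the total number of workers the job may start, which keeps a multi-node run within what the scheduler actually reserved.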
3 votes, 1 answer

SGE submitted job state doesn't change from "qw"

I'm using Sun Grid Engine on Ubuntu 14.04 to queue my jobs to be run on a multicore CPU. I've installed and set up SGE on my system. I created a "hello_world" dir which contains two shell scripts, namely "hello_world.sh" & "hello_world_qsub.sh",…
mhr
3 votes, 0 answers

SGE - can't get password entry for user "jenkins"

I'm running the following SGE command through Jenkins without any problem: qsub -N my_job_name -q my_queue -l hostname=my_hostname -w e -notify -m n -cwd -b y -o /dev/null -e my_error_path -v my_env_var ... Your job 1082782 ("my_job_name") has been…
Bruno
3 votes, 0 answers

Submitting celery jobs to an SGE queue

I'm working on a cluster that uses SGE to manage jobs across the worker nodes. Is there a way to use the SGE queue as the broker in a way that will cooperate with other people submitting jobs through non-celery means? I currently use python-gridmap…
JudoWill
3 votes, 0 answers

Hooking subprocess logs into main log output in Docker

I'm running SGE (Sun Grid Engine) in a Docker container in order to replicate our live SGE cluster. If you haven't run across it, SGE is basically a program that runs other programs (while managing resources across a cluster - i.e. a grid…
Dunk
3 votes, 1 answer

Using Docker on Grid Engine / Sun Grid Engine / Son of Grid Engine

Does anyone have experience running Docker on Grid Engine / Sun Grid Engine / Son of Grid Engine and being able to monitor the resource used by the daemon? The issue is that when I qsub docker run ..., the actual process in the container is run by…
Alex Rothberg