Getting started

General information

Running all jobs at the same time is not a good way to use the SGI servers, even when there is enough physical memory to support it, because of their distributed memory system. It also lengthens the average running time, and running times are not additive: executing two jobs "simultaneously" on the same CPU takes longer than executing the same jobs one after another.
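The effect can be illustrated with a small model. The sketch below is not a measurement of the SGI servers: it assumes jobs share one CPU equally when run together, and the `overhead` parameter is a made-up fractional slowdown standing in for the cost of sharing. All function names are hypothetical.

```python
def sequential_completion_times(run_times):
    """Completion time of each job when jobs run one after another on one CPU."""
    finished = []
    elapsed = 0.0
    for t in run_times:
        elapsed += t
        finished.append(elapsed)
    return finished


def simultaneous_completion_times(run_times, overhead=0.1):
    """Completion times when all jobs share one CPU equally.

    overhead is an ASSUMED fractional slowdown that applies while more
    than one job is active; it is illustrative, not a measured value.
    Times are returned in order of completion (shortest job first).
    """
    remaining = sorted(run_times)
    finished = []
    elapsed = 0.0
    work_done = 0.0  # CPU time every still-active job has received so far
    n = len(remaining)
    for i, t in enumerate(remaining):
        active = n - i
        # With several active jobs each one advances 1/active as fast,
        # and the assumed sharing overhead slows it down further.
        slowdown = active * (1.0 + overhead) if active > 1 else 1.0
        elapsed += (t - work_done) * slowdown
        work_done = t
        finished.append(elapsed)
    return finished


def average(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    jobs = [10.0, 10.0]  # two jobs that each need 10 time units alone
    print("sequential:  ", sequential_completion_times(jobs))
    print("simultaneous:", simultaneous_completion_times(jobs, overhead=0.1))
```

Under these assumptions, two 10-unit jobs run one after another finish at 10 and 20 (average 15), while run together they both finish at about 22. Even with `overhead=0.0` both jobs finish at 20, so the average is already worse; any sharing cost pushes the total beyond the sum of the individual run times.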

Information about the scheduling system

The Physics Computer undertook to introduce a scheduling system on the SGI system. The goal was to schedule both large-scale parallel jobs and serial jobs efficiently without overloading the machine. The scheduler should also allocate the available resources fairly among different users and groups. We considered several alternatives: NQE (provided by SGI), LSF (another commercial queueing system), QUE (the queueing system used on the SUN stack) and a home-grown solution. It became obvious that none of the existing products fulfilled our needs without extensive modifications, so Urban Engberg (who wrote QUE) and Lennart Bengtsson designed the scheduling system 'fair que'.

More information about 'fair que'

Manual pages:

qsub
The command for submitting jobs to the queue system.
qdel
The command for deleting jobs from the queue.
qshow
The command for displaying information about jobs and users in the queue system.

Disk usage

UNICC/HPCC traditionally does not provide local home directories for its users. This means that the scratch disk /stor on unicorn may only be used to store data that belongs to currently running jobs. When a job has finished, the disk space it used should be freed at once, so that the next job does not fail because the disk is full. The scheduling system supports disk space reservation, but there is no quota system running yet that can enforce disk usage limits. Even with disk quotas turned on, files do not disappear by themselves, and only the user can decide which files are safe to remove.
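One way to make sure scratch space is freed when a job ends, even if the job fails, is to wrap the job in a small cleanup routine. The following is a minimal sketch, not part of the scheduling system: `run_with_scratch` and `example_job` are hypothetical names, and the sketch defaults to the system temporary directory so that it runs anywhere. On unicorn one would presumably pass a directory under /stor as `scratch_root`.

```python
import os
import shutil
import tempfile


def run_with_scratch(job, scratch_root=None):
    """Run job(workdir) in a private scratch directory and always remove it.

    scratch_root is the directory to create the scratch area in (assumed
    to be somewhere under the scratch disk); None means the system
    temporary directory.
    """
    workdir = tempfile.mkdtemp(prefix="job-", dir=scratch_root)
    try:
        return job(workdir)
    finally:
        # Free the disk space whether the job succeeded or raised an error.
        shutil.rmtree(workdir, ignore_errors=True)


def example_job(workdir):
    """Hypothetical job: writes an intermediate file and returns its size."""
    path = os.path.join(workdir, "intermediate.dat")
    with open(path, "wb") as f:
        f.write(b"\0" * 1024)
    return os.path.getsize(path)


if __name__ == "__main__":
    print("bytes written:", run_with_scratch(example_job))
```

Results that must be kept should be copied elsewhere inside the job function before it returns, since everything left in the scratch directory is deleted.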

Lennart Bengtsson/Andy Polyakov
Last updated 1998-03-05