Göteborg 960416

INTRODUCTION TO THE QUEUEING SYSTEM 'QUE'

PREREQUISITES
=============

This document gives an overview of the commands available to users of
the queue system 'que'. It assumes that you are familiar with basic
UNIX concepts such as commands and files. As usual when getting started
with a new program or tool, it is essential that you have access to a
computer so that you can try the examples below yourself as you read.

GETTING STARTED
===============

To access the 'que' system, issue the command 'que' as a regular UNIX
command. After some more or less informative messages, this gives you
the 'que' prompt

    que>

'que' is now ready to accept your commands. To leave the 'que' system
use the command

    que> quit

To get help use

    que> help

or anything else which is not a valid command.

To get started using the system you should first get an overview of how
heavily loaded the system is at the moment. This is most easily done
with the command

    que> show m

Try this and compare the result with the output of

    que> show machines

The latter gives you more information, perhaps more than you normally
want.

To find out more about the running jobs, if any, use the command

    que> show r

or

    que> show running

if you want more information. Analogously, you can check how many jobs,
if any, are waiting to be started with

    que> show w

or

    que> show waiting

Typical reasons for jobs to be put on hold are that the maximum number
of simultaneous jobs has been reached on all available machines, or
that the available machines all have less free memory and/or disk space
than the user has specified as necessary (see ADVANCED USAGE below).

You should now try to submit a job to 'que'. Jobs are normally
submitted in the form of scripts. A script is a file containing
ordinary UNIX commands. Remember that from the UNIX point of view your
own programmes are also commands, just like 'ls' and 'pwd'.
The reason why it is preferable to submit scripts instead of commands
or programmes directly has to do with more advanced aspects of UNIX
which we will not go into here.

To create a script, exit the 'que' system or change to another window
and, with your favourite editor, put the following THREE lines in a
file called 'myjob'.

    #!/bin/csh
    /bin/ls >& myjob.out
    /bin/ls -l

Then give the command

    chmod +x myjob

You now have an executable script called 'myjob' which runs the command
/bin/ls and saves its output in the file 'myjob.out'. Submit this job
to the 'que' system with

    que> submit myjob

Now quickly enter

    que> show w

to watch your job in the queue for waiting jobs. Repeat

    que> show w

until your job no longer shows, then type

    que> show r

Notice that even if free machines are available it takes up to 1 minute
to get a job started. Also note that even though 'ls' executes almost
instantaneously, it takes half a minute or so for the job to be removed
from the list of running jobs. These delays are quite normal.

After your job has been removed from the list of running jobs you
should check your mail. There should be one mail which was sent when
your job started and one sent upon its completion.

As for the output from your job, there should now be two new files in
the directory from which you submitted your job. One is the usual 'que'
output file, called in this case 'myjob.output'. In this file you will
find output which has not been dealt with otherwise. Normally you
redirect the output from your different commands to specific files. In
the example above the output from the command '/bin/ls' is sent to the
file 'myjob.out', which is the other new file. The output from
'/bin/ls -l' is, however, not taken care of and will end up in
'myjob.output'.

If 'que' responds with something like

    submit: Can't find the file myjob
    submit: Exiting.

then you have probably not created the file 'myjob' in the same
directory as you started 'que' in.
To make this easy to check, the usual UNIX commands 'ls' and 'pwd' are
available from within 'que'

    que> pwd

and

    que> ls

should work as normal.

If you fail to create the script 'myjob' as described above, or if you
don't get a 'myjob.output' and a 'myjob.out' file with the result of
the 'ls' commands, you should carefully review the above instructions
to see if you have followed them accurately. You might also try the
command

    que> show log

and look at the last line. Sometimes error messages, for simple but
irrelevant reasons, end up in the 'que' system log instead of in the
'myjob.output' file.

Note that it is very important that the above introductory example
works for you, otherwise 'que' will probably be of little or no use to
you. And DO remember the first rule of computing: "When all else fails,
ask a human."

ADVANCED USAGE
==============

Killing running jobs
--------------------

To kill a job prematurely use

    que> kill job#

where job# is the number shown by 'show r'.

Customizing file
----------------

If your job (program, calculation) requires substantial amounts of
memory and/or local disk space you should tell 'que' how much of these
resources you think will be required. These are not hard limits. They
are only used to distribute large jobs in a sensible way between the
available machines.

The way to tell the 'que' system how large your job is, is to create a
file with the extension '.que'. To provide such a file for the example
in the previous section you would thus create a file called
'myjob.que'. In this you would put the lines

    MEMORY 12
    DISK 60

to request 12Mb of memory and 60Mb of local disk space. Note that
MEMORY and DISK are in upper case and that 12 and 60 are in Mb. The
values you provide are a bookkeeping device only; they do not free you
of the responsibility not to overload the available computers.

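As a minimal sketch, the 'myjob.que' file from the example above can
also be written from the shell instead of with an editor. This assumes
a Bourne-style shell (/bin/sh) and that you are in the directory that
holds 'myjob'; the keywords and values are the ones used in the text.

```shell
#!/bin/sh
# Write the resource requests for the job script 'myjob' into 'myjob.que'.
# Both numbers are in Mb, as in the example in the text.
cat > myjob.que <<'EOF'
MEMORY 12
DISK 60
EOF

# Print the file so it can be checked before the job is submitted.
cat myjob.que
```

Printing the file back is only a convenience; it makes it easy to see
that the keywords are in upper case before you submit.
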
The '.que' file can also be used to request a specific machine with a
line of the form

    MACHINE favourite

or a predefined group of machines with

    MACHINEGROUP mygroup

To check which machine groups your system administrator has defined use
the 'que' command

    que> show g

or

    que> show groups

If there is a group called 'default' then it specifies which machines
will be used when you don't have any MACHINE or MACHINEGROUP lines in
your '.que' file.

To have mail notification sent somewhere other than to yourself on the
machine where 'que' is running use the NOTIFY option

    NOTIFY user@foo.bar

To discard all mail from 'que' automatically use

    NOTIFY nobody

One shot mode
-------------

If you wish to incorporate the submission of a job into a regular UNIX
script you can do so by issuing

    que submit myjob

as a UNIX command. Try this out with

    que show m

Checking if 'que' is up
-----------------------

To check if the system processes which start jobs (run_que) and remove
processes from the list of running jobs (wipe_que) are running use the
command

    que> show p

or

    que> show processes

If some process is missing, contact your system administrator and tell
them which process isn't running (run_que, wipe_que, que_accounting),
though you might survive without the accounting.
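
Returning to one shot mode: the sketch below shows one way a regular
UNIX script might guard a submission. It assumes a Bourne-style shell
(/bin/sh). The function name 'submit_job' and its return codes are
hypothetical and not part of 'que'; the only 'que' invocation used is
'que submit', as described under "One shot mode".

```shell
#!/bin/sh
# Hypothetical helper: check a job script, then submit it in one shot mode.
# Returns 1 if the script is missing, 2 if it is not executable,
# and 3 if the 'que' command cannot be found on the PATH.
submit_job() {
    job=$1
    if [ ! -f "$job" ]; then
        echo "submit_job: no file '$job' (current directory: $(pwd))" >&2
        return 1
    fi
    if [ ! -x "$job" ]; then
        echo "submit_job: '$job' is not executable, try: chmod +x $job" >&2
        return 2
    fi
    if ! command -v que >/dev/null 2>&1; then
        echo "submit_job: the 'que' command was not found" >&2
        return 3
    fi
    # One shot mode: 'que' is run as an ordinary UNIX command.
    que submit "$job"
}
```

A script would then call

    submit_job myjob

The two file checks catch the most common cause of the "Can't find the
file" message before 'que' is started at all.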