Notes on my adventures with another beastly overblown job dispatching system.
Miscellaneous Ominous Quotes
"Because of the flexibility offered by the Sun Grid Engine software, administrators sometimes find themselves in need of some help getting started."
"Job scheduling with the Sun Grid Engine software is a very large topic. The Sun Grid Engine software provides a wide variety of scheduling policies with a great degree of flexibility. What is presented above only scratches the surface of what can be done with the Sun Grid Engine 6.2 software."
"The Sun Grid Engine software provides administrators with a tremendous amount of flexibility in configuring the cluster to meet their needs. Unfortunately, with so much flexibility, it sometimes can be challenging to know just what configuration options are best for any given set of needs."
Potentially Valuable Links
- gridengine.org - a collection of resources and links to what’s going on with this insane software these days (uh, circa 2011). A good source of documentation resources. Also seems to have the best catch-all mailing list, which seems pretty active.
- Son of Grid Engine - open source fork of SGE. Looks the best, with a new version in late 2014.
- Open Grid Scheduler - open source parts of Oracle Grid Engine. Latest activity late 2013.
- Univa - the company that’s trying to make this free software not so free. Here’s their open source core, which hasn’t been touched since 2012 (2015 as I write).
- ICM and scheduling. ER uses 6.1u6.
- Sun’s last word on the topic. However, it doesn’t actually help you do the specific things an admin needs to do to get started.
Installing
Look for a file like this:
sge62u5_linux24-x64_rpm.zip
Not like this (this is ancient Itanium crap):
sge62u5_linux24-ia64_rpm.zip
Which makes a directory like this:
sge6_2u5
Containing these rpms:
sun-sge-bin-linux24-x64-6.2-5.x86_64.rpm
sun-sge-common-6.2-5.noarch.rpm
Unfortunately, there are a lot of dependencies.
$ rpm -qpR sun-sge-bin-linux24-x64-6.2-5.x86_64.rpm
error: Failed dependencies:
libXm.so.3()(64bit) is needed by sun-sge-bin-linux24-x64-6.2-5.x86_64
sun-sge-common = 6.2 is needed by sun-sge-bin-linux24-x64-6.2-5.x86_64
The second one is easy. Install sun-sge-common first:
$ sudo rpm -ivh sun-sge-common-6.2-5.noarch.rpm
To fix the first dependency, libXm, you need to install this:
$ sudo yum install openmotif
Which installs these (among other things):
/usr/lib64/libXm.so.4
/usr/lib64/libXm.so.4.0.1
/usr/lib/libXm.so.4
/usr/lib/libXm.so.4.0.1
I went ahead and put links in for the dependency like this:
$ sudo ln -s /usr/lib64/libXm.so.4.0.1 /usr/lib64/libXm.so.3
$ sudo ln -s /usr/lib64/libXm.so.4.0.1 /lib64/libXm.so.3
Turns out that actually didn’t work. It still complained about missing libXm.so.3 even with it present. But since the library really is there, I just installed it and ignored the problem:
$ sudo rpm -ivh --nodeps sun-sge-bin-linux24-x64-6.2-5.x86_64.rpm
And the binaries seem to be there, and some of them even execute.
The binaries, libraries, and man pages are all neatly not cluttering up your system in the SGE_ROOT directory. That’s nice, but not super useful. You could have everyone change their paths, but here’s a script that links everything to a better place and unlinks things if you decide you don’t want to do that:
#!/bin/bash
# Chris X Edwards
# A script to make symlinks for all the important things in the
# SGE_ROOT directory to where it's easily usable on the system.
# This elaborate script was made so that it's easy to *uninstall*
# everything in the event of something changing.
SGE_ROOT=/gridware/sge
MAN_ROOT=/usr/local/share/man
BIN_ROOT=/usr/local/bin
LIB_ROOT=/usr/local/lib
# LINK THE MAN PAGES
MEN="1 3 5 8"
for M in $MEN
do
    for S in ${SGE_ROOT}/man/man${M}/*
    do
        TDIR="${MAN_ROOT}/man${M}"
        # TO INSTALL
        ln -sf ${S} ${TDIR}/
        # TO UNINSTALL
        #echo "rm -f ${TDIR}/$(basename ${S})"
    done
done
# LINK THE BINARIES
for S in ${SGE_ROOT}/bin/lx24-amd64/*
do
    # TO INSTALL
    ln -sf ${S} ${BIN_ROOT}/
    # TO UNINSTALL
    #echo "rm -f ${BIN_ROOT}/$(basename ${S})"
done
# LINK THE LIBRARIES (Maybe not necessary, but why not?)
for S in ${SGE_ROOT}/lib/lx24-amd64/*
do
    # TO INSTALL
    ln -sf ${S} ${LIB_ROOT}/
    # TO UNINSTALL
    #echo "rm -f ${LIB_ROOT}/$(basename ${S})"
done
Configuring
First thing is to add a non-privileged user for SGE to fall back to while running (it starts as root):
$ sudo useradd -d $SGE_ROOT -c "Sun Grid Engine User" -u 222 -s /sbin/nologin sgeadmin
I used uid 222, which may or may not be a good idea. Next, do this on all compute nodes. This entails making a /gridware/sge directory first. I had a ton of trouble running this over ssh and eventually had to make a little one-line script in a shared directory and run it that way:
:-> [blue.xed.ch][~]$ cat sgeuseraddscript
#!/bin/bash
/usr/sbin/useradd --home-dir /gridware/sge --comment "Sun Grid Engine User" --uid 222 --shell /sbin/nologin sgeadmin
[root@blue ~]# for X in `seq -w 1 48`; do echo $X; ssh blue$X ~xed/sgeuseraddscript ; done
I have no idea what I was doing wrong, but when I just put the command itself as the last arguments to ssh, it gave a useradd error (even with explicit directories and variables, good quoting, etc.). So I just kept a little script in a common place and kept modifying it to do each command.
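If I had to guess at a cleaner approach, feeding the script to a remote shell over stdin sidesteps the quoting problem entirely. An untested sketch:
for X in `seq -w 1 48`
do
    echo $X
    # bash on the remote end reads the script from stdin, so nothing
    # has to survive ssh's argument re-parsing.
    ssh blue$X bash -s < ~xed/sgeuseraddscript
done
Anyway, now run this one on each node to set up the NFS mount: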
echo "192.168.1.25:/gridware/sge /gridware/sge nfs async 0 0" >> /etc/fstab
mount /gridware/sge
Looks like you should edit /etc/services to put in the standard SGE ports:
[root@blue sge]# grep sge /etc/services
sge_qmaster 536/tcp # for Sun Grid Engine (SGE) qmaster daemon
sge_execd 537/tcp # for Sun Grid Engine (SGE) exec daemon
Interesting. Now it looks like this is there by default. Make sure there’s no conflict between the old ports and the new ones.
# grep sge_ /etc/services
sge_qmaster 6444/tcp # Grid Engine Qmaster Service
sge_qmaster 6444/udp # Grid Engine Qmaster Service
sge_execd 6445/tcp # Grid Engine Execution Service
sge_execd 6445/udp # Grid Engine Execution Service
I just edited it by hand and then copied it to the nodes.
[root@blue sge]# for X in `seq -w 1 48`; do echo $X; scp /etc/services blue$X:/etc/ ; done
Here’s probably a better way:
sed -i 's~^opalis-rdv.*536/tcp.*$~sge_qmaster 536/tcp # Sun Grid Engine~' /etc/services
sed -i 's~^nmsp.*537/tcp.*$~sge_execd 537/tcp # Sun Grid Engine~' /etc/services
I made a copy of /gridware/sge/util/install_modules/inst_template.conf to my own dir and edited its contents. It looks a bit like:
SGE_ROOT="/gridware/sge/"
SGE_QMASTER_PORT="536"
SGE_EXECD_PORT="537"
SGE_ENABLE_SMF="false"
SGE_CLUSTER_NAME="blue"
SGE_JVM_LIB_PATH="Please enter absolute path of libjvm.so"
SGE_ADDITIONAL_JVM_ARGS="-Xmx256m"
CELL_NAME="default"
ADMIN_USER="sgeadmin"
QMASTER_SPOOL_DIR="/gridware/sge/default/spool/qmaster"
EXECD_SPOOL_DIR="/gridware/sge/default/spool/execd"
GID_RANGE="20000-21000"
SPOOLING_METHOD="classic"
DB_SPOOLING_SERVER="none"
DB_SPOOLING_DIR="/gridware/sge/default/spooldb"
PAR_EXECD_INST_COUNT="20"
ADMIN_HOST_LIST="blue25"
SUBMIT_HOST_LIST="blue25"
EXEC_HOST_LIST="blue01 blue02 blue03 blue04 blue05 blue06 blue07 blue08 blue09 blue10 blue11 blue12 blue13 blue14 blue15 blue16 blue17 blue18 blue19 blue20 blue21 blue22 blue23 blue24 blue25 blue26 blue27 blue28 blue29 blue30 blue31 blue32 blue33 blue34 blue35 blue36 blue37 blue38 blue39 blue40 blue41 blue42 blue43 blue44 blue45 blue46 blue47 blue48"
HOSTNAME_RESOLVING="true"
SHELL_NAME="ssh"
COPY_COMMAND="scp"
DEFAULT_DOMAIN="none"
ADMIN_MAIL="admin-ablab@ucsd.edu"
ADD_TO_RC="false"
SET_FILE_PERMS="true"
RESCHEDULE_JOBS="wait"
SCHEDD_CONF="1"
WINDOWS_SUPPORT="false"
Set the SGE_ROOT:
[root@blue ~]# export SGE_ROOT=/gridware/sge/
Fix the ownership:
# chown -R sgeadmin:sgeadmin /gridware
Now run the big install script:
./inst_sge -m -x -auto /home/xed/headnode_sge.conf
Hopefully that went well. If so, the good news should be reported in a new directory called:
/gridware/sge/default/common/install_logs
And you can check that the daemons are running with:
# ssh blue13 ps -ef | grep sge
sgeadmin 17623 1 0 21:27 ? 00:00:00 /gridware/sge//bin/lx24-amd64/sge_execd
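To sweep all the compute nodes at once, a loop in the same style as the ones above should do it (a sketch):
for X in `seq -w 1 48`
do
    echo -n "blue$X: "
    # -f matches the full command line, -l shows what matched
    ssh blue$X pgrep -fl sge_execd
done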
Looks like it’s smart to do this:
# /gridware/sge/bin/lx24-amd64/qconf -mconf global
And edit this line (to include bash, of course):
login_shells bash,sh,ksh,csh,tcsh
Reconfiguring
What’s really weird about this is that the configuration file only gets read when the install script runs. Other than that, I have no idea how to modify the configuration information of a functioning system. Let’s say you add some nodes or change the host name of the head node. How do you let the running system know this? So far, all I can figure out to do is to edit the config file and run the install again. It will complain that there is already a …./default directory. It seems safe to delete this directory if nothing is queued, then run the install script to create another one. It also might be a good idea to kill -9 any sge processes on the node you’re reinstalling on. Hard to believe there isn’t a better way, but I haven’t found it yet, and god forbid that it just use normal Linux/Unix conventions.
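For the record, the whole dance boils down to something like this (a sketch assuming the paths from the install above and, critically, an idle cluster with nothing queued):
cd /gridware/sge
# Heavy-handedly kill any lingering daemons on this node.
sudo pkill -9 sge_qmaster
sudo pkill -9 sge_execd
# Blow away the old cell directory; only safe when nothing is queued.
sudo rm -rf /gridware/sge/default
# Re-run the automated install with the edited config file.
sudo SGE_ROOT=/gridware/sge ./inst_sge -m -x -auto /home/xed/headnode_sge.conf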
Adding New Nodes
I’m not sure this works, but I’m trying this technique that I found somewhere:
:-> [headnode.xed.ch][~]$ sudo SGE_ROOT=/gridware/sge qconf -ae
<this is like visudo; you have to replace s/template/c51/ where c51 is the new host>
added host c51 to exec host list
:-> [headnode.xed.ch][~]$ sudo SGE_ROOT=/gridware/sge qconf -ah c51
c51 added to administrative host list
:-> [headnode.xed.ch][~]$ sudo SGE_ROOT=/gridware/sge qconf -mq all.q
<Here you're supposed to add the hosts. Quoting my source: "You will
add the hostnames to the list in hostlist (I added it like:
"@allhosts, c50") and the number of CPUs that each node has for SGE
to the list in slots (using the format [HOSTNAME=NUM_OF_CPUs]).">
Does this work? Don’t know. OK, here’s a better way (taken from http://ait.web.psi.ch/services/linux/hpc/merlin3/sge/admin/sge_hosts.html):
# Make a config dir if it doesn't exist:
:-> [headnode.xed.ch][~]$ sudo mkdir $SGE_ROOT/config
# Add to trusted host list
:-> [headnode.xed.ch][~]$ sudo -i qconf -ah c50
c50 added to administrative host list
# Add to list of hosts allowed to submit jobs:
:-> [headnode.xed.ch][~]$ sudo -i qconf -as c50
c50 added to submit host list
# Add execution hosts to the SGE cluster (first generate a config file for each):
:-< [headnode.xed.ch][~/testy]$ for H in `seq 50 59`; do H=c$H; echo $H; sudo cat <<XXX > $H.conf
> hostname $H
> load_scaling NONE
> complex_values slots=4
> user_lists NONE
> xuser_lists NONE
> projects NONE
> xprojects NONE
> usage_scaling NONE
> report_variables NONE
> XXX
> done
c50
c51
c52
c53
c54
c55
c56
c57
c58
c59
:-> [headnode.xed.ch][~/testy]$ sudo cp *conf /gridware/sge/config/
:-> [headnode.xed.ch][~]$ sudo -i qconf -Ae $SGE_ROOT/config/c50.conf
root@c25 added "c50" to exechost list
# A rough check
:-> [headnode.xed.ch][~]$ sudo -i qconf -sh | grep c50
# Make sure these directories are available:
:-> [headnode.xed.ch][~]$ for N in `seq 50 59`; do sudo -i mkdir $SGE_ROOT/default/spool/execd/c$N; done
:-> [headnode.xed.ch][~]$ for N in `seq 50 59`; do sudo -i chown sgeadmin:sgeadmin $SGE_ROOT/default/spool/execd/c$N; done
# Make sure that this command runs from the exec node:
:-> [blue50][~]$ qconf -sh | grep c50
c50
Tip
Any weird comm/hostname resolution stuff might be helped if the order is carefully set in the /etc/hosts file. Basically, make the hostname that SGE knows about first in the list. Very annoying.
# Prep the install configuration file:
:-> [headnode.xed.ch][/gridware/sge/util/install_modules]$ sudo cp inst_template.conf inst_ab-lab.conf
IMPORTANT PART THAT ACTUALLY WORKED!
I don’t even know what or how much of the above stuff was necessary. I think this is still needed:
:-> [headnode.xed.ch][~]# for C in 64 65 67 68 69; do echo $C; qconf -ah c$C; qconf -as c$C; done
And then finally I was able to get it working by manually answering questions:
[headnode.xed.ch][/gridware/sge/util]$ ssh root@c51
[root@blue51 sge]# cd $SGE_ROOT; yes '' | ./inst_sge -x
Stupid but it works.
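In loop form, that manual step might collapse to something like this (an untested sketch; the quoting makes the cd, the pipe, and the install all run on the node):
for C in 64 65 67 68 69
do
    echo $C
    ssh root@c$C "cd /gridware/sge && yes '' | ./inst_sge -x"
done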
Administering
RESTARTING AFTER A POWER CYCLE
God forbid that Sun^H^H^HOracle actually make a decent init script that can be used sensibly on a Linux cluster.
Master Node
Double check that /gridware is being exported.
I have no idea how to get the service running so that it survives a reboot. Here’s what I did when it stopped working after the last unexpected power cycle:
$ sudo SGE_ROOT=/gridware/sge /gridware/sge/bin/lx24-amd64/sge_qmaster
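The installer normally leaves sgemaster/sgeexecd wrapper scripts under $SGE_ROOT/default/common/ (with ADD_TO_RC="false" above, it was told not to touch the rc system), so one low-tech way to survive reboots might be an rc.local entry. An untested sketch:
# /etc/rc.d/rc.local on the master node; untested sketch.
# Assumes /gridware is available at boot time.
/gridware/sge/default/common/sgemaster start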
Compute Nodes
Double check that /gridware is mounted.
sudo /root/nodessh /etc/init.d/sgeexecd.c start
Or log in and start them individually. When you’re done, check with qhost and make sure there aren’t dashes in columns 4, 6, and 8 ($4, $6, and $8 in awk terms). Dashes mean the node is not participating.
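Since those column numbers are already awk-speak, here is a one-liner sketch to flag the deadbeats (the first three lines of qhost output are the header, a separator, and the global pseudo-host, hence NR>3):
qhost | awk 'NR>3 && ($4=="-" || $6=="-" || $8=="-") {print $1}'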
If you want to disable a compute node so that no jobs are sent to it do something like this:
[c][~]$ sudo SGE_ROOT=/gridware/sge/ qconf -de c25
root@c25 removed "c25" from execution host list
In this case c25 was my master node, and when jobs were submitted to it, bad things happened.
Commands
The following commands are central to Sun Grid Engine administration:
qconf
- Add, delete, and modify the current Grid Engine configuration. For more information, see Using qconf.
qhost
- View current status of the available Grid Engine hosts, the queues, and the jobs associated with the queues. For more information, see the qhost(1) man page.
qalter and qsub
- Submit jobs. For more information, see the submit(1) man page.
qstat
- Show the status of Grid Engine jobs and queues. For more information, see the qstat(1) man page.
Note that qstat doesn’t necessarily show all jobs. There was a case where some hung jobs were only visible if you specified the user:
qstat -u xed
To see all users:
qstat -u '*'
But not:
qstat -u'*'
Some other good things to know:
qstat -t -u '*'
This shows which execution nodes the job is running on.
qstat -s prsh -u $USER
Shows jobs that are pending, running, suspended, or holding. Can use any combination of these.
qstat -ext
Shows "extended" information. Shows the cpu time which could be interesting to find heavy cpu users.
How do you know if some nodes are not working? This is how ("a" is the load alarm state and "u" means the qmaster can’t reach the execd):
qstat -f | grep au
Log in to them and do a sudo /etc/init.d/sgeexecd.c stop followed by a sudo /etc/init.d/sgeexecd.c start. A restart option would make too much sense! Go figure.
Another thing is queue error states which basically shut the node down. This can be investigated with something like:
qstat -f -explain E
qstat -j $JOBID -explain E
qacct -j $JOBID
These error states can be cleared with:
sudo SGE_ROOT=/gridware/sge qmod -c '*'
qdel
- Note that for stuck jobs, this needs more firepower. Use the force option to kill them:
qdel -f JOBIDNUMBER
Or in the event of a complete clusterf… try this:
sudo SGE_ROOT=/gridware/sge/ qdel -f -u "*"
This really should kill all jobs.
qquota
- List each resource quota that is being used at least once or that defines a static limit. For more information, see the qquota(1) man page.
Disable Nodes
If nodes are acting badly they can cause jobs to fail. While sorting out any problems it can be best to disable the nodes. This is done with:
sudo SGE_ROOT=/gridware/sge qmod -d all.q@c22
In this command "all.q" is the name of the queue and c22 is the host; together they name the queue instance.
To re-enable the node, use the same command but with the option -e.
Use qstat -f to see which nodes have the disabled ("d") state.
Running
The SGE_ROOT variable seems important. In our case it should be:
# bash
export SGE_ROOT=/gridware/sge
# tcsh
setenv SGE_ROOT /gridware/sge
Basically I did this as a test:
$ for J in `seq 1 200`; do qsub testjob ; done
Where testjob is:
#!/bin/bash
#$ -S /bin/bash
#$ -j y -o /tmp/xed
FILE="/cfs/xed/queuetestdir/results"
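# Pick a random sleep time between 5 and 20 seconds using two bytes
# read from /dev/urandom.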
T=$(( 5+(`od -An -N2 -i /dev/urandom` )%(20-5+1) ))
#T=1
sleep $T
D=`date`
H=`hostname`
I="Job input: $1"
S="Job ran for $T seconds."
#echo ${OUT} >> $FILE
printf "=========================\n%s\n%s\n%s\n%s\n" "$D" "$H" "$I" "$S" >> $FILE
Here’s a job that MN uses:
cd /home/mn/tmp/DUD_decoys/T1_2A_E_Rot/cox2
/pro/icm/icm/icmng /pro/icm/icm/_dockScan \
    from=1 to=100 -E confs=3 thorough=1 vlsDUD \
    >& d3_2001_LOG &
wait
MPI
This is another can of worms. Consider these nice man page passages:
pe_name
- The name of the parallel environment as defined for pe_name in sge_types(1). To be used in the qsub(1) -pe switch.
pe_name
- A "pe_name" is the name of a Sun Grid Engine parallel environment described in sge_pe(5).
What a nightmare. The best information about this seems to be this much more reasonable website.
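For what it’s worth, the basic moves seem to be: define a parallel environment, attach it to a queue, and request it at submit time. A sketch (the PE name "smp" is made up here):
# 1. Define a parallel environment; this opens an editor on a
#    template with fields like pe_name, slots, allocation_rule.
qconf -ap smp
# 2. Attach it to a queue by adding "smp" to the queue's pe_list.
qconf -mq all.q
# 3. Request it at submission time, e.g. 4 slots (all on one host if
#    allocation_rule was $pe_slots).
qsub -pe smp 4 testjob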
Avoiding SGE
The Network Queueing System (NQS) seems to be an ancient GPL queueing system written for/with NASA. No idea how useful/useless it is today.
GNUBatch looks interesting.
So does this: http://gridscheduler.sourceforge.net/