[Edited 2019-08-01: changed Debian release to ‘stretch’]
This blog post has been a long time coming. For many years I have used the Torque/PBS job scheduler on machines that I administer or am otherwise responsible for. My blog post on setting up Torque on Ubuntu topped this site’s most-visited pages by a wide margin. I’m also a creature of habit, avoiding changes when there is no real problem to solve. So I was rather disappointed to see that Ubuntu 18.04 LTS would drop the Torque packages. I knew what that would mean, and put off facing the problem for as long as I could. We are now migrating some of our computational resources to 18.04 LTS, so the issue finally had to be faced.
For those of us using Ubuntu 18.04 LTS, there are fundamentally two options:
- Stick with Torque/PBS. This requires another mechanism for installing the necessary packages. I am told (check the comments on the blog post I mentioned earlier) that it is possible to add the xenial (16.04 LTS) repositories on an 18.04 LTS system and install the packages that way. Configuration should be the same as for 16.04 LTS (see the blog post). Please note that I have not verified this.
- Choose another scheduler. After some research, I decided to go with Son of Grid Engine, a fork of the Sun Grid Engine project (which passed to Oracle when Oracle acquired Sun Microsystems). Unfortunately, I found that the packages in the official Ubuntu repositories have a problem (cf. here and here), which means using packages from another source (e.g. Debian) in any case. This makes the choice between the two options less clear-cut. In the end we chose to go with SGE anyway, mostly because it’s newer code and more likely to remain supported in future Ubuntu versions.
In the rest of this blog post I’ll document what one needs to do to install the packages and start setting up the scheduler. This is a work in progress at my end, so I’ll update this blog post in due course, with complete instructions on setting up a simple queue (similar to what I had done for Torque).
Adding the Debian repositories
The objective here is to add the Debian repositories so that we can install the Grid Engine packages from that source instead of the official Ubuntu ones. Specifically, we want the repositories for the ‘stretch’ release, which thankfully uses library versions compatible with those in Ubuntu 18.04. We also want to configure APT so that only the Grid Engine packages are taken from Debian. This prevents other packages in your Ubuntu installation from ‘upgrading’ to their Debian equivalents. Fortunately, APT’s pinning mechanism makes this easy with the right settings.
To start with, we add the Debian repositories with:
# Use 'sudo tee': with 'sudo cat > file' the redirection runs without root privileges.
sudo tee /etc/apt/sources.list.d/debian.list > /dev/null <<EOL
deb http://ftp.debian.org/debian/ stretch main contrib non-free
deb http://security.debian.org/debian-security/ stretch/updates main contrib non-free
EOL
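Next, we set the APT preferences so that a) the Grid Engine packages from Debian take priority over those in the official Ubuntu repositories (a priority of 1000 makes APT choose them even if the Ubuntu version number is higher), and b) every other Debian package gets a low priority, so APT never prefers it over an Ubuntu version and only considers it if a package is not available from Ubuntu at all. This can be done with: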
sudo tee /etc/apt/preferences.d/debian > /dev/null <<EOL
Package: gridengine-*
Pin: release o=Debian
Pin-Priority: 1000

Package: *
Pin: release o=Debian
Pin-Priority: 10
EOL
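If you would rather preview both files before writing anything to `/etc/apt`, the two heredocs above can be wrapped in a small function that takes a target root directory. This is only a sketch: the function name `write_debian_apt_files` and the idea of a scratch root are my own, not part of any package. The file contents are the same as above, with the blank line that APT expects between the two preference stanzas.

```shell
#!/usr/bin/env bash
# Sketch: write the Debian sources list and the APT pin file under a given root.
# write_debian_apt_files is a hypothetical helper name, not part of any package.
write_debian_apt_files() {
    local root="${1:?usage: write_debian_apt_files ROOT}"
    mkdir -p "$root/etc/apt/sources.list.d" "$root/etc/apt/preferences.d"

    # Debian 'stretch' main and security repositories.
    cat > "$root/etc/apt/sources.list.d/debian.list" <<'EOL'
deb http://ftp.debian.org/debian/ stretch main contrib non-free
deb http://security.debian.org/debian-security/ stretch/updates main contrib non-free
EOL

    # Grid Engine packages from Debian get priority 1000; every other Debian
    # package gets priority 10. Stanzas must be separated by a blank line.
    cat > "$root/etc/apt/preferences.d/debian" <<'EOL'
Package: gridengine-*
Pin: release o=Debian
Pin-Priority: 1000

Package: *
Pin: release o=Debian
Pin-Priority: 10
EOL
}
```

For example, `write_debian_apt_files /tmp/apt-preview` writes both files under `/tmp/apt-preview/etc/apt/` so they can be inspected; calling the same function from a root shell with `/` as the argument writes the real ones.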
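Once these files are in place, update the package cache, install the Debian signing keys and register them with APT, then update again so that the Debian package lists can be verified. (The first update will complain that the Debian repositories cannot be verified; that is expected at this point.)
sudo apt-get update
sudo apt-get install debian-archive-keyring
sudo apt-key add /usr/share/keyrings/debian-archive-keyring.gpg
sudo apt-get update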
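To confirm that APT is now fetching the Debian indices, and that their origin matches the `o=Debian` selector used in the pin file, you can look for a `release` line in the output of `apt-cache policy` that contains both `o=Debian` and `n=stretch`. A minimal sketch, assuming that output format; the function name `debian_stretch_indexed` is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: succeed if 'apt-cache policy' lists a Debian 'stretch' release.
# debian_stretch_indexed is a hypothetical helper name.
debian_stretch_indexed() {
    # Each configured source is followed by a "release ..." line with
    # comma-separated fields such as o=Debian and n=stretch.
    apt-cache policy | grep 'release ' | grep 'o=Debian' | grep -q 'n=stretch'
}
```

Running `debian_stretch_indexed && echo found` should print `found` once the keys are registered and the package lists have been refreshed; if it does not, re-run `sudo apt-get update` and look for signature errors.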
Installing the Grid Engine packages
Assuming a single-node cluster (one machine acting as both master and execution host), we can install the packages for the execution node, the master node, and the queue management GUI (qmon) using:
sudo apt-get install gridengine-exec gridengine-master gridengine-qmon
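Since the whole point of the pinning is to get the Debian builds, it is worth checking where the installed versions came from. In `apt-cache policy PACKAGE` output, the installed version is marked with `***` and followed by the sources it is available from. The sketch below assumes that layout (source lines start with a numeric priority, version lines do not) and succeeds only if a `debian.org` source is listed under the installed version; the helper names `installed_from_debian` and `report_gridengine_origins` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether the installed version of a package came from a
# debian.org repository, by parsing "apt-cache policy PACKAGE" output.
# installed_from_debian and report_gridengine_origins are hypothetical names.
installed_from_debian() {
    apt-cache policy "$1" | awk '
        # The installed version is marked "***" in the version table.
        $1 == "***" { inst = 1; next }
        # Source lines start with a numeric priority; any other line
        # after the installed version starts the next version entry.
        inst && $1 !~ /^-?[0-9]+$/ { exit }
        inst && /debian\.org/ { found = 1 }
        END { exit !found }
    '
}

report_gridengine_origins() {
    local pkg
    for pkg in gridengine-exec gridengine-master gridengine-qmon; do
        if installed_from_debian "$pkg"; then
            echo "$pkg: installed from Debian"
        else
            echo "$pkg: NOT installed from Debian (check the pin)"
        fi
    done
}
```

After installation, `report_gridengine_origins` should report all three packages as installed from Debian; if one is not, check the pin file and the full `apt-cache policy` output for that package.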
Out of the box, however, the queue management program (qmon) won’t work, because the Debian package installs its pixmaps in a different directory from the one where qmon looks for them. This can be fixed with the following commands:
sudo mkdir /var/lib/gridengine/qmon
sudo ln -s /usr/share/gridengine/pixmaps /var/lib/gridengine/qmon/PIXMAPS
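Before moving on, a few quick checks can confirm that the qmon pixmaps link resolves and that the master answers queries. This is a sketch under my own assumptions: it relies on the Grid Engine client commands `qhost` (host status) and `qconf -sel` (list execution hosts) being installed and on the `PATH`, and the helper names `run_check` and `ge_sanity_check` are hypothetical. A failure from `qconf -sel` may simply mean that no execution host has been registered yet.

```shell
#!/usr/bin/env bash
# Sketch: quick post-install checks for a single-node Grid Engine setup.
# run_check and ge_sanity_check are hypothetical helper names; set
# PIXMAPS_LINK to check a different path.

# Run a command quietly and report whether it succeeded.
run_check() {
    if "$@" >/dev/null 2>&1; then
        echo "ok: $*"
    else
        echo "FAIL: $*" >&2
        return 1
    fi
}

ge_sanity_check() {
    local link="${PIXMAPS_LINK:-/var/lib/gridengine/qmon/PIXMAPS}"
    local rc=0

    # The qmon pixmaps link must exist and resolve to a directory.
    if [ -L "$link" ] && [ -d "$link" ]; then
        echo "ok: $link -> $(readlink "$link")"
    else
        echo "FAIL: $link is missing or does not resolve" >&2
        rc=1
    fi

    # The master should answer a host-status query and list execution hosts.
    run_check qhost || rc=1
    run_check qconf -sel || rc=1
    return "$rc"
}
```

Running `ge_sanity_check` prints one `ok:` or `FAIL:` line per check and returns non-zero if any check failed.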
At this point, the software should be in working order. The next step is to configure Grid Engine by adding the necessary queue. (To be continued.)