A few weeks ago I wrote a blog post on installing a single-node Torque/PBS job scheduler on an Ubuntu 14.04 LTS system. The node served as scheduler, queue manager, compute node, and submission node. For my application, this node was installed in a machine room, and was primarily meant to act as a shared compute node; users would need to submit their jobs remotely from their workstations. So an additional part of the job was setting up a number of those workstations as job submission nodes. It proved to be rather simpler than I expected, and I’m finally writing about how I did that.
As in my earlier post, the following commands need to be issued as root, either on the machine to be added as a job submission node (I’ll call this the client from now on) or on the machine that’s already installed as scheduler, queue manager, compute node, and submission node (I’ll call this the server from now on).
First, of course, the client machine needs to have the necessary packages installed. This is easily done.
apt-get install torque-client torque-mom
We’ll be installing the client as both a job submission node and a compute node; we won’t necessarily want it to act as a compute node but this makes it easier if we do. So, we first need to stop the compute node process.
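The stop command itself isn't shown above. As a sketch, assuming the Ubuntu 14.04 torque-mom package installs an init script of the same name, stopping the process would look something like this. The helper name `mom_service` is made up for illustration.

```shell
# Hypothetical helper: pass an action (stop, start, restart) to the
# compute node process's init script.
# Assumption: the init script is named torque-mom, like the package.
mom_service() {
    service torque-mom "$1"
}

# Usage (as root, on the client):
#   mom_service stop
```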
Next, we configure the client to point to the already configured server.
echo SERVER.DOMAIN > /etc/torque/server_name
This does two things: it lets the client know where any submitted jobs need to go for scheduling, and it also lets the compute node process know where to get work from. All we need to do after this is to start the compute node process again.
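As with stopping it, the start command is not shown, so the following is only a sketch under the same assumption (an init script named torque-mom). The helper `start_mom` is hypothetical; it checks that the server name file is non-empty before starting the process.

```shell
# Hypothetical helper: confirm a server name is configured, then start
# the compute node process.
# Assumption: the init script is named torque-mom, like the package.
start_mom() {
    server_file=${1:-/etc/torque/server_name}
    if [ ! -s "$server_file" ]; then
        echo "empty or missing $server_file" >&2
        return 1
    fi
    echo "compute node process will report to: $(cat "$server_file")"
    service torque-mom start
}

# Usage (as root, on the client):
#   start_mom
```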
Now, we simply need to let the server know that it should accept any jobs coming from this client. So, on the server, we tell the queue manager that our new client is a valid job submission node.
qmgr -c 'set server submit_hosts += CLIENT'
Note that, as when we added the server machine as a job submission node, this client name cannot be a fully qualified domain name (see the previous post for an explanation). Also, if the server cannot resolve the client name from its IP (i.e. if you don’t have reverse DNS lookup on the client’s domain), you’ll need to add the client IP and name (qualified, if you want) to /etc/hosts on the server. This allows the server to do the necessary name lookup from the client IP.
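If you do need the /etc/hosts entry, a small helper along these lines avoids adding the same IP twice. This is only a sketch: `add_host_entry` is a made-up name, and the address in the usage line is a documentation placeholder, not a real client.

```shell
# Hypothetical helper: append "IP<TAB>NAME" to a hosts file unless a line
# for that IP already exists. The file defaults to /etc/hosts.
add_host_entry() {
    ip=$1
    name=$2
    hosts_file=${3:-/etc/hosts}
    # Compare the first field exactly, so 192.0.2.1 does not match 192.0.2.10.
    if awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$hosts_file"; then
        echo "$ip is already listed in $hosts_file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Usage (as root, on the server), with a placeholder address:
#   add_host_entry 192.0.2.10 CLIENT
```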
That’s basically it. To test, just submit a job from the client machine, and it should work.
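A minimal test job might look like the sketch below. It relies on qsub reading the job script from standard input when no script file is given; the helper name and the job name `submit-test` are arbitrary. Torque will usually refuse jobs submitted as root, so unlike the commands above, try this as an ordinary user.

```shell
# Hypothetical helper: submit a one-line job that prints the name of the
# host it runs on. qsub prints the new job's ID on success.
submit_test_job() {
    echo 'hostname' | qsub -N submit-test
}

# Usage (as an ordinary user, on the client):
#   submit_test_job && qstat
```

If the submission is accepted, qstat should list the job while it is queued or running.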