PREREQUISITES

You need to know how to perform/setup/provide the following:

Cid

We build cid on the server and a node first so that when we install the images later on, everything is ready to go. Being a fairly standard daemon, cid is fairly easy to get going. At this stage there are no extra configure options, so the standard configure, make, make install sequence should work fine.

Node Install

  1. Extract the cid tarball and run configure, make, make install. Do NOT change the install prefix from /usr/local/sbin, as this path is probably still hardcoded into the scripts in a number of places.
  2. Example /etc/cid/* files live in notp/etc. The running node image only needs curr_img and cid.conf. Remember the value you give the key option in cid.conf, as every machine must agree on it. A debug value of 9 is suggested.

Server Install

  1. Same as (1) above.
  2. Same as (2) above, except that the necessary files are cid.conf, node_db and image_db.
  3. image_db is the only slightly involved configuration file. Currently only NFS is supported as a means of transferring the files used to provision nodes. You must at this point have decided on a machine <server> that will export <dir> to all nodes. <dir> must contain a directory for each image, named exactly as the image is.
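
The layout rule in step 3 can be checked before any node is provisioned. The following is a minimal sketch, not part of notP: check_image_dirs is a hypothetical helper, and the export path and image names passed to it are placeholders for your own.

```shell
# Check that an export directory contains a directory named exactly after
# each image. check_image_dirs is a hypothetical helper, not a notP tool.
check_image_dirs() {
    dir=$1; shift
    status=0
    for image in "$@"; do
        if [ ! -d "$dir/$image" ]; then
            echo "missing image directory: $dir/$image" >&2
            status=1
        fi
    done
    return $status
}
```

For example, check_image_dirs <dir> linux darwin reports every image that has no matching directory under <dir> and returns non-zero if any is missing.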

Testing

  1. On the server, start cidd with the argument "-f".
  2. On a node, start cidkid with the argument "-f".
  3. You should get output similar to the following:

Server:
 3/9/07 16:15:12:  cidd.c:73 main()  ---  Cid Daemon up and running on
  port 38008
 3/9/07 16:15:35:  cidd.c:143 server_loop()  ---  server_loop (handler):
  Dropping connection to node1.acrl.clusters.umaine.edu

Node:
 3/9/07 16:11:59:  cid_kid.c:79 main()  ---  Cidkid up and running!
 3/9/07 16:11:59:  cid_kid.c:80 main()  ---  Host (node1), Image
  (<image_name>-<image_version>-<image_subversion>)

So long as this works correctly, cid is ready to go. Example rc scripts to start both cidkid and cidd live in notp/etc.
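
If you save the daemon output to a file, the startup check can be scripted. This is only a sketch: cid_started is a hypothetical helper that looks for the two startup messages shown in the sample output above, and it assumes those messages are worded the same in your build.

```shell
# Read daemon output on stdin and succeed if it contains one of the
# startup messages from the sample output. cid_started is a hypothetical
# helper; the message text is taken from the example logs.
cid_started() {
    grep -q -e 'Cid Daemon up and running' -e 'Cidkid up and running'
}
```

For example, cid_started < cidd.log (where cidd.log is a placeholder file name) returns zero once the server has announced itself.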

Making Images

We provide two scripts, mkimg.linux and mkimg.darwin, that should work fine for building Linux and Darwin images respectively. Both live in notp/bin. You will want to at least check the excluded directory variables to see if there is anything that you need to add or remove.

NOTE: A careful reader will notice that mkimg.darwin needs to be executed from a machine running Linux. We have experimented a great deal with various methods of building images under Darwin, and they all failed to run successfully after being extracted from Blancmange. This does mean that you will probably LOSE all resource forks.

The images must be named <image name>-<version>-<subversion>.gz. As mentioned earlier, each image must live in the correct place as suggested by image_db.
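
A quick way to catch a misnamed image is to split the file name into its three fields. The sketch below assumes that the version and subversion never contain hyphens, while the image name may. parse_image_name is a hypothetical helper, not something shipped with notP.

```shell
# Split an image file name of the form <image name>-<version>-<subversion>.gz
# into its three fields. parse_image_name is a hypothetical helper.
parse_image_name() {
    file=$(basename "$1")
    case "$file" in
        *-*-*.gz) ;;                        # needs two hyphens and a .gz suffix
        *) echo "bad image name: $file" >&2; return 1 ;;
    esac
    base=${file%.gz}                        # strip the .gz suffix
    subversion=${base##*-}                  # text after the last hyphen
    rest=${base%-*}                         # everything before the last hyphen
    version=${rest##*-}                     # text after the remaining last hyphen
    name=${rest%-*}                         # the name itself may contain hyphens
    if [ -z "$name" ] || [ -z "$version" ] || [ -z "$subversion" ]; then
        echo "bad image name: $file" >&2
        return 1
    fi
    echo "$name $version $subversion"
}
```

Called on a file named linux-1-2.gz, this prints the three fields separated by spaces. A name such as linux.gz is rejected with a non-zero status.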

Building Blancmange

Blancmange ends up being the initramfs that is used to netboot a machine. The process of building this image has been abstracted away by a collection of scripts in blancmange/bin. The file blancmange/top_config contains all of the information that Blancmange needs to build the image. This file is highly commented and should be quite easy to modify to suit your needs.

Blancmange needs to be built on a machine that will be provisioned using notP. The commands that build Blancmange are shown below with their output omitted.


node1: # cd notp/blancmange
node1: # bin/get_packages
node1: # bin/build_packages
node1: # bin/build_image

At this stage, blancmange-initramfs.gz is ready to be moved to your tftp directory and sent out to the nodes. Once the machines are running, they start dropbear to provide sshd service. The only user is root and the password is "t00r".
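
Moving the image into place can be wrapped in a small function so that a missing file or directory is caught early. This is a sketch under assumptions: publish_initramfs is a hypothetical helper, the tftp directory is whatever your tftp server exports, and making the file world-readable (mode 644) is our guess at what a typical tftp daemon needs.

```shell
# Copy the initramfs into the tftp directory and make it world-readable.
# publish_initramfs is a hypothetical helper; both arguments are site-specific.
publish_initramfs() {
    src=$1          # path to blancmange-initramfs.gz
    tftp_dir=$2     # directory served by your tftp daemon
    [ -f "$src" ] || { echo "not found: $src" >&2; return 1; }
    [ -d "$tftp_dir" ] || { echo "not a directory: $tftp_dir" >&2; return 1; }
    cp "$src" "$tftp_dir/" && chmod 644 "$tftp_dir/$(basename "$src")"
}
```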

A netbooting machine should reboot twice: once to get the initramfs, which is then installed to the first few partitions of the drive, and a second time to boot into this newly installed image. At this point, partitions 1-4 are in use (for grub machines: grub, blancmange, swap, extended partition; for yaboot machines: partition map, yaboot, blancmange, swap).
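
The partition layout described above can be written down as a small lookup, which is handy when scripting checks against a freshly provisioned node. The helpers partition_roles and partition_role are hypothetical, and the role labels are simply our shorthand for the names in the list above.

```shell
# Partition roles by bootloader, in partition order 1-4, as listed above.
# partition_roles and partition_role are hypothetical helpers, not notP tools.
partition_roles() {
    case "$1" in
        grub)   echo "grub blancmange swap extended" ;;
        yaboot) echo "partition-map yaboot blancmange swap" ;;
        *)      echo "unknown bootloader: $1" >&2; return 1 ;;
    esac
}

# Print the role of partition number $2 on a machine using bootloader $1.
partition_role() {
    n=$2
    roles=$(partition_roles "$1") || return 1
    i=1
    for role in $roles; do
        if [ "$i" -eq "$n" ]; then
            echo "$role"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no such partition: $n" >&2
    return 1
}
```

With this table, Blancmange sits on partition 2 of a grub machine and partition 3 of a yaboot machine.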

The progress of a node will be traced in cidd as well.

Scheduler Integration

Currently, notP has only been tested with the Moab scheduler from Cluster Resources, although it should work with Maui as well. To add support for provisioning to Moab, the simplest route is to add the following to your Moab config file:


CLASSCFG[linux] JOBPROLOG='/usr/local/sbin/cidcapt -j $JOBID -n \
  $HOSTLIST -i <image_name>'
CLASSCFG[darwin] JOBPROLOG='/usr/local/sbin/cidcapt -j $JOBID -n \
  $HOSTLIST -i <darwin>'

When a job is submitted to the linux queue, cidcapt contacts the cidd server and requests that all of the nodes be put into the linux image. Cidd contacts the cidkid process on each node and begins the provisioning process. When all of the nodes are alive and running the correct image, cidcapt will exit, allowing Moab to continue with running the job.
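
If you maintain several queues, the CLASSCFG lines can be generated rather than typed by hand. This is a minimal sketch: classcfg_line is a hypothetical helper, it emits each entry on a single line rather than using the backslash continuation shown above, and the class and image names are placeholders.

```shell
# Print one Moab CLASSCFG line that runs cidcapt as the job prolog.
# classcfg_line is a hypothetical helper; $JOBID and $HOSTLIST are left
# unexpanded so that Moab substitutes them at job start.
classcfg_line() {
    class=$1
    image=$2
    printf "CLASSCFG[%s] JOBPROLOG='/usr/local/sbin/cidcapt -j \$JOBID -n \$HOSTLIST -i %s'\n" "$class" "$image"
}
```

For example, classcfg_line linux <image_name> prints the entry for the linux queue, ready to be appended to the Moab config file.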