
This page describes how to use NFSROOT to deploy the Kerrighed system on a cluster.

Presentation

The main idea is to use an NFS server that exports the Kerrighed environment. Client nodes do not need any hard disk to take part in a cluster session. All data needed at runtime (modules, special files, configuration files, etc.) is provided over NFS. The NFS server exports a local directory that the remote (NFS) clients mount as their '/' directory.

In this setup, the NFS server must NOT be part of the cluster. Thus it does not need to run a Kerrighed kernel.


Pre-requisites

  • On the nodes: the NIC BIOS must support PXE booting.
  • On the server:
    • DHCP server
    • TFTP server
    • NFS server

You may use a dedicated tool to create a basic Linux system. On Debian this tool is called debootstrap.

Configure the server

  • Configure the TFTP server. If you installed tftpd-hpa on a Debian system, edit /etc/default/tftpd-hpa:
# Defaults for tftpd-hpa
RUN_DAEMON="yes"
OPTIONS="-l -s /srv/tftp"

where /srv/tftp is the directory in which you store the kernel, ramdisk and GRUB configuration files for the clients.

  • Get a GRUB version with network support enabled and with the correct NIC driver. Pre-compiled versions of GRUB rarely contain network support because it conflicts with other features. Hence, you have to enable this support at compile time. Support for some NICs is only available as patches posted on the GRUB mailing lists.

Fortunately, we provide some precompiled versions of GRUB with network support enabled on our download web space. Each pxegrub file is suffixed with the name of the driver it was compiled with.

  • Configure the DHCP server:
### PART 1

# GRUB magic
option grub-menu code 150 = string;

# General options
option dhcp-max-message-size 2048;
use-host-decl-names on;
deny unknown-clients;
deny bootp;

### PART 2
option domain-name "mycluster.home";
option domain-name-servers 123.123.123.123, 124.124.124.124;
option ntp-servers ntp.network.net;

### PART 3
subnet 192.168.0.0 netmask 255.255.255.0 {
 option routers 192.168.0.1;
 option broadcast-address 192.168.0.255;
}

### PART 4 
group {
 filename "/pxegrub";
 option grub-menu = concat("(nd)/grub/", host-decl-name);
 option root-path "/NFSROOT/kerrighed";
 host ssi1 { fixed-address 192.168.0.101; hardware ethernet xx:xx:xx:xx:xx:xx; }
 host ssi2 { fixed-address 192.168.0.102; hardware ethernet xx:xx:xx:xx:xx:xx; }
 host ssi3 { fixed-address 192.168.0.103; hardware ethernet xx:xx:xx:xx:xx:xx; }
 host ssi4 { fixed-address 192.168.0.104; hardware ethernet xx:xx:xx:xx:xx:xx; }
}

In this example:

  • Part 1 is generic and does not need any change;
  • In part 2, adapt the values to your network configuration: domain name, DNS servers, NTP server, etc.;
  • Part 3 prevents your DHCP server from answering requests from networks other than your local network;
  • Part 4 defines which files are given to clients when booting.
    • filename points to the PXE-enabled GRUB with the correct NIC driver. The path is relative to the TFTP daemon root (as set in /etc/default/tftpd-hpa).
    • Lines beginning with host associate an IP address with a MAC address. IP addresses can be replaced by host names if these are listed in /etc/hosts.
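With more than a few nodes, the host lines of part 4 become tedious to type. The following is a minimal sketch, not part of the Kerrighed tools: the function name gen_dhcp_hosts and the "name ip mac" input format are assumptions made for this example.

```shell
# Print one dhcpd.conf host declaration per "name ip mac" line read on stdin.
# Blank lines and lines starting with '#' are skipped.
# (Hypothetical helper; the input format is an assumption of this sketch.)
gen_dhcp_hosts() {
    while read -r name ip mac; do
        case "$name" in
            ''|'#'*) continue ;;
        esac
        printf ' host %s { fixed-address %s; hardware ethernet %s; }\n' \
            "$name" "$ip" "$mac"
    done
}
```

For instance, gen_dhcp_hosts < nodes.txt (nodes.txt being a hypothetical node list) prints lines that can be pasted into the group block.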
  • Configure the NFS server. Edit /etc/exports:
/NFSROOT/kerrighed *(ro,async,no_root_squash,no_subtree_check)
/NFSROOT/kerrighed/tmp *(rw,sync,no_root_squash,no_subtree_check)
/NFSROOT/kerrighed/var *(rw,sync,no_root_squash,no_subtree_check)
/NFSROOT/kerrighed/dev *(rw,sync,no_root_squash,no_subtree_check)
/NFSROOT/kerrighed/root *(rw,sync,no_root_squash,no_subtree_check)

Note: the base directory (/NFSROOT/kerrighed) is exported read-only.
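The example above exports to every host (*). If you prefer to restrict the exports to your cluster subnet, the lines can be generated as in the sketch below. The function name gen_exports is an assumption for this example, and 192.168.0.0/24 is simply the subnet used in the DHCP configuration above.

```shell
# Print /etc/exports lines for an NFSROOT tree: the base directory read-only,
# the listed subdirectories read-write.
# $1 = base directory, $2 = client specification, remaining args = rw subdirs.
# (Hypothetical helper; it only prints text and does not touch /etc/exports.)
gen_exports() {
    base=$1
    clients=$2
    shift 2
    printf '%s %s(ro,async,no_root_squash,no_subtree_check)\n' "$base" "$clients"
    for sub in "$@"; do
        printf '%s/%s %s(rw,sync,no_root_squash,no_subtree_check)\n' \
            "$base" "$sub" "$clients"
    done
}
```

Example: gen_exports /NFSROOT/kerrighed 192.168.0.0/24 tmp var dev root. Note that there is no space between the client specification and the option list.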

  • Restart these services. On Debian:
/etc/init.d/dhcp3-server restart
/etc/init.d/tftpd-hpa    restart
/etc/init.d/nfs-kernel-server restart

Create the base system

In this section, we describe the creation of an NFSROOT base system, without Kerrighed.

  • Create a dedicated directory where you are going to put the Kerrighed system. Let's say /NFSROOT.
mkdir /NFSROOT
  • Create a base Linux system inside this directory.
    • With Debian, you may use:
debootstrap sid /NFSROOT/kerrighed http://ftp.debian.org/debian
  • Chroot into the new system:
chroot /NFSROOT/kerrighed
  • Set the root password:
passwd
  • Mount the proc filesystem
mount -t proc none /proc
  • Install all the packages needed on the clients: gcc, autotools, vi (or emacs!), ssh, libncurses, etc.
  • Install the DHCP and NFS client packages.
    • With Debian:
apt-get install dhcp3-common nfs-common nfsbooted

nfsbooted can assign a different hostname to each node of your cluster by resolving the IP address assigned to the node.

  • If you plan to build a ramdisk, install a ramdisk builder.
    • With Debian:
apt-get install initramfs-tools
  • Edit fstab to define partitions (we assume 192.168.0.1 is your NFS server):
# A swap partition
/dev/hda    none   swap   sw            0 0
none        /proc  proc   defaults      0 0
none        /sys sysfs    defaults      0 0
#
# NFSROOT
# The following shares are mounted read-write. For the moment I have
# some trouble with the lockd daemon, hence the 'nolock' option.
192.168.0.1:/NFSROOT/kerrighed/dev  /dev  nfs rw,hard,nolock 0 0
192.168.0.1:/NFSROOT/kerrighed/var  /var  nfs rw,hard,nolock 0 0
192.168.0.1:/NFSROOT/kerrighed/tmp  /tmp  nfs rw,hard,nolock 0 0
192.168.0.1:/NFSROOT/kerrighed/root /root nfs rw,hard,nolock 0 0
#
# TMPFS
none                             /var/run tmpfs defaults     0 0
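Every NFS source listed in this fstab must also be exported by the server, otherwise the mount fails at boot. The sketch below cross-checks the two files on the server. The function name check_fstab_exports is an assumption for this example, and the check only compares paths, not client lists or options.

```shell
# Report every NFS source path of an fstab that has no matching entry in an
# exports file.  $1 = fstab file, $2 = exports file.
# (Hypothetical helper; it compares the path column only.)
check_fstab_exports() {
    fstab=$1
    exports=$2
    # Keep non-comment lines whose type is "nfs" and strip the "server:" prefix.
    awk '/^[ \t]*#/ { next }
         $3 == "nfs" { src = $1; sub(/^[^:]*:/, "", src); print src }' "$fstab" |
    while read -r path; do
        # Succeeds only if some exports line has this path in its first column.
        if ! awk -v p="$path" '$1 == p { found = 1 } END { exit !found }' "$exports"; then
            echo "not exported: $path"
        fi
    done
}
```

Example, run on the server: check_fstab_exports /NFSROOT/kerrighed/etc/fstab /etc/exports. No output means every NFS mount has a matching export.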
  • Symlink /etc/network/if-up.d/mountnfs to /etc/rcS.d/S35mountnfs so that your NFS shares are mounted automatically. This is needed because the interface is already up when the system boots, so the scripts in /etc/network/if-up.d are not executed.
ln -sf ../network/if-up.d/mountnfs /etc/rcS.d/S35mountnfs
  • Edit /etc/hosts to add all cluster nodes:
127.0.0.1 localhost.localdomain localhost

192.168.0.1    server.mycluster.home server
192.168.0.101  ssi1.mycluster.home ssi1
192.168.0.102  ssi2.mycluster.home ssi2
192.168.0.103  ssi3.mycluster.home ssi3
192.168.0.104  ssi4.mycluster.home ssi4
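For larger clusters, the node entries of /etc/hosts can be generated from a node list. This is only a sketch: the function name gen_hosts and the "name ip mac" list format (one node per line) are assumptions made for this example.

```shell
# Print one /etc/hosts entry per "name ip [mac]" line read on stdin.
# $1 = domain name appended to each node name.
# Blank lines and lines starting with '#' are skipped.
# (Hypothetical helper; the input format is an assumption of this sketch.)
gen_hosts() {
    domain=$1
    while read -r name ip _rest; do
        case "$name" in
            ''|'#'*) continue ;;
        esac
        printf '%s  %s.%s %s\n' "$ip" "$name" "$domain" "$name"
    done
}
```

Example: gen_hosts mycluster.home < nodes.txt, where nodes.txt is a hypothetical node list.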
  • Create a ramdisk image with network support.
    • Debian's initramfs-tools supports this configuration. Have a look at /etc/initramfs-tools/initramfs.conf.
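As an illustration, the settings of initramfs.conf that matter for an NFS root might look like the fragment below. The values are assumptions to adapt to your setup (in particular the interface name eth0). Check initramfs.conf(5) for the options supported by your version.

```shell
# /etc/initramfs-tools/initramfs.conf (fragment, values are assumptions)

# Include the network drivers needed for booting over the network.
MODULES=netboot

# Mount the root filesystem over NFS instead of from a local disk.
BOOT=nfs

# Network interface used to reach the NFS server.
DEVICE=eth0

# Take the NFS root path from the DHCP answer or the kernel command line.
NFSROOT=auto

# The image is then rebuilt inside the chroot, for example with:
#   mkinitramfs -o /boot/initrd.img-xxx xxx
# where xxx stands for your kernel version, as elsewhere on this page.
```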

Configure nodes boot

Once your server is configured and your base system is ready, you can set up your nodes to boot with the base system.

  • Copy the kernel and ramdisk (if any) from your base system to /srv/tftp:
cp /NFSROOT/kerrighed/boot/vmlinuz-xxx /NFSROOT/kerrighed/boot/initrd.img-xxx \
   /srv/tftp
  • Create a grub directory in your TFTP root to hold the GRUB menus:
mkdir /srv/tftp/grub
  • The line option grub-menu ... in your DHCP server configuration file specifies that the GRUB menu for a node is the file <tftp>/grub/<node name>. Create the file /srv/tftp/grub/ssi1 with the following content:
timeout 1

title Debian on my server
root (nd)
kernel /vmlinuz-xxx root=/dev/nfs ip=dhcp nfsroot=192.168.0.1:/NFSROOT/kerrighed
initrd /initrd.img-xxx
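Since the DHCP configuration looks up one menu file per node name, the same menu has to exist for ssi2, ssi3, and so on. The sketch below writes identical menus for a list of nodes. The function name write_grub_menus and its argument order are assumptions for this example.

```shell
# Write one GRUB menu file per node.
# $1 = menu directory, $2 = kernel file name, $3 = initrd file name,
# $4 = NFS root as server:/path, remaining args = node names.
# (Hypothetical helper; all nodes get the same menu.)
write_grub_menus() {
    dir=$1
    kernel=$2
    initrd=$3
    nfsroot=$4
    shift 4
    mkdir -p "$dir" || return 1
    for node in "$@"; do
        cat > "$dir/$node" <<EOF
timeout 1

title Debian on my server
root (nd)
kernel /$kernel root=/dev/nfs ip=dhcp nfsroot=$nfsroot
initrd /$initrd
EOF
    done
}
```

Example: write_grub_menus /srv/tftp/grub vmlinuz-xxx initrd.img-xxx 192.168.0.1:/NFSROOT/kerrighed ssi1 ssi2 ssi3 ssi4.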
  • Validate your environment without Kerrighed by booting your nodes with it. If this works, move on to the next step and install Kerrighed on your system.


(Not so) Common problems and solutions

  • Problem: Nodes do not come up after booting. Instead they print messages complaining about "/dev/null not writable" and hang forever.
  • Solution: Mounting the "/" directory readable and writable(!) works here. For this, change /etc/exports on the server to look like:
/NFSROOT/kerrighed *(rw,async,no_root_squash,no_subtree_check)

Note: the base directory (/NFSROOT/kerrighed) is now the only export for Kerrighed and is no longer read-only.

  • Problem: Nodes hang during the startup process when the ethernet card is reconfigured by DHCP (remember that it was already configured at boot through the kernel parameter "ip=dhcp" given in the GRUB menu).
  • Solution: Comment out (#) the lines about eth0 in /etc/network/interfaces. Excerpt of that file:
# Turned off, as eth0 is already configured when using NFSROOT
#auto eth0 
#iface eth0 inet dhcp
  • Problem: Nodes have network problems because their ethernet cards are numbered eth0 on one node, eth1 on another, then eth2, eth3, and so on for the following ones.
  • Solution: [This can only happen when /etc is mounted with write permissions!] Udev is configured (at least in Debian) to remember each device plugged into the computer by its ID. As there is now only one /etc directory shared by all nodes, the first network card detected becomes eth0, the second eth1, and so on. To disable this mechanism, back up the file /etc/udev/persistent-net-generator.rules to some other directory such as /root and delete the original. Another solution (not tested) is described here: Systemimager.org->Troubleshooting(Debian4.0)
  • Problem: After some time, clients get an "error -13 access denied" when trying to contact the NFS server, and the server logs "mountd: getfh failed: Operation not permitted".
  • Solution: There is apparently a problem either with the NFS server (nfs-kernel-server 1.0.10-6+etch.1 with kernel 2.4.27) or with the NFS utils (same version); somehow the fstab-like state files under /var/lib/nfs get out of sync. A fix for this seems to be:
/etc/init.d/nfs-kernel-server stop
rm -f /var/lib/nfs/etab /var/lib/nfs/rmtab \
      /var/lib/nfs/state /var/lib/nfs/xtab
touch /var/lib/nfs/etab /var/lib/nfs/rmtab \
      /var/lib/nfs/state /var/lib/nfs/xtab
chmod 644 /var/lib/nfs/etab /var/lib/nfs/rmtab \
          /var/lib/nfs/state /var/lib/nfs/xtab
/etc/init.d/nfs-kernel-server start
  • Problem: Clients get an "error -13 access denied" when trying to contact the NFS server, and the server logs "mountd: refused mount request from <host> for /path/to/NFSROOT (/): not exported".
  • Solution: There is probably a problem with your /etc/exports file. Check the exports(5) man page to make sure that you are using valid syntax. For instance, wildcards are supported in hostnames but not in IP addresses.

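Thus, a line like:

/NFSROOT/kerrighed 192.168.1.*(ro,async,no_root_squash,no_subtree_check)

is not valid and should be replaced by something like:

/NFSROOT/kerrighed 192.168.1.0/24(ro,async,no_root_squash,no_subtree_check)

Also note that there must be no space between the client specification and the option list: after a space, the options apply to all hosts instead of the client just named.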

As the Red Hat Reference Manual says about NFS: "Wildcards should not be used with IP addresses; however, it is possible for them to work accidentally if reverse DNS lookups fail."
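Such mistakes are easy to overlook, so a small check can help. The sketch below flags client specifications that use a wildcard inside an IP address, and option lists separated from their client by a space. The function name lint_exports is an assumption for this example, and the check covers only these two cases, not the full exports(5) syntax.

```shell
# Flag two common /etc/exports mistakes.  $1 = exports file.
# (Hypothetical helper; it is not a full exports(5) parser.)
lint_exports() {
    awk '
        # Skip comments and blank lines.
        /^[ \t]*#/ || /^[ \t]*$/ { next }
        {
            # Field 1 is the exported path; the rest are client specifications.
            for (i = 2; i <= NF; i++) {
                client = $i
                sub(/\(.*/, "", client)   # drop the option list, if any
                if ($i ~ /^\(/)
                    print "line " NR ": space before option list"
                else if (client ~ /^[0-9.*]+$/ && client ~ /[0-9]/ && client ~ /\*/)
                    print "line " NR ": wildcard in IP address " client
            }
        }
    ' "$1"
}
```

Example: lint_exports /etc/exports. A lone * (any host) and hostname wildcards are not reported.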