DHCP/PXE Notes

Here are some notes about how to deploy self-configuring systems using PXE (Preboot Execution Environment) and a DHCP server. This allows a group of unconfigured machines, possibly new, to power up and become properly set up over the network with no other physical interaction. This is highly desirable for cluster environments where many machines need to be consistently (and possibly simultaneously) configured.

SystemRescueCD is an excellent Linux rescue and utility CD that need not be deployed on an actual CD. Here are some generally helpful notes for getting SystemRescueCD working over PXE. These notes use SystemRescueCD extensively.

DHCP Server

On Debian the package to get a DHCP server is isc-dhcp-server. Is it running?

systemctl status isc-dhcp-server

Probably not after an initial install. It will fail because the /etc/dhcp/dhcpd.conf file will be in its default unconfigured state. Fix that (see below for hints) and try starting again.

systemctl start isc-dhcp-server
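Before starting the service again, it can save some time to let dhcpd check the file itself. Here is a minimal sketch, assuming the Debian defaults (the dhcpd binary on root's PATH and the configuration at /etc/dhcp/dhcpd.conf). The function name check_dhcpd_conf is made up for illustration.

```shell
# Hypothetical helper: syntax-check a dhcpd configuration without
# touching the network. "dhcpd -t" only parses the file named by -cf.
check_dhcpd_conf() {
    conf="${1:-/etc/dhcp/dhcpd.conf}"
    if dhcpd -t -cf "$conf"; then
        echo "OK: $conf parses cleanly"
    else
        echo "FAILED: fix $conf before starting the service" >&2
        return 1
    fi
}

# Typical use (as root):
#   check_dhcpd_conf && systemctl start isc-dhcp-server
#   journalctl -u isc-dhcp-server -n 20    # if it still fails to start
```

A clean parse only means the syntax is valid. It does not prove the server is listening on the right network.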

Since PXE is running from firmware, the host has pretty limited information about anything. PXE works by first asking the network if there is any DHCP server out there that can help the requesting host. This means, of course, that you need a kindly DHCP server listening on that particular network ready to supply any unconfigured machines with the information they need to proceed.

The tricky part of getting a DHCP server working properly is setting up its configuration file correctly. Here’s an example of one I have used.

Excerpt from /etc/dhcp/dhcpd.conf

# Sometimes this irritating option must be present:
ddns-update-style none;

# Apply DNS settings globally.
option domain-name-servers 132.239.0.252;

# Define parameters common to all machines on a network.
subnet 192.168.1.0 netmask 255.255.255.0 {
    interface "eth1";
    option broadcast-address 192.168.1.255;
    # Default gateway route for nodes defined here.
    option routers 192.168.1.25;
}

# Define host specific settings starting with hostname.
host mynode19 {
   # The host's MAC address (the one used by PXE).
   hardware ethernet 00:11:43:b0:0b:50;
   # The IP address to assign the host with this MAC.
   fixed-address 192.168.1.19;
   # If active defines what to do when DHCP is from PXE.
   #next-server 192.168.1.222;
   #filename "pxelinux.0";
}

Warning

The "filename" and "next-server" options are used for PXE booting. Be careful not to leave these active if you have a kickstart image that will automatically install an OS over the network at boot. The particular kickstart I was using, for example, ERASES ALL OF YOUR DISKS on that computer.

Ideally, before embarking on any kind of very fancy PXE configuring, it would be wise to test that your DHCP server is running and correctly listening and responding to requests on the correct network. Do this by taking an already installed and working machine (maybe a laptop) and configuring it to use DHCP to get its IP address. Set this up on the server side using the test machine’s MAC and see if you can get an address assigned correctly. If that test works, move on to the next step.

As mentioned, unless you’re happy with the DHCP server randomly assigning addresses, you’ll need to tell it which requesting machine gets which configuration. Having the addresses randomly assigned is ok if you have some way to later find them, but if you want a certain machine in a certain place to have a specific IP, you need to tell the DHCP server about each of those machines. Finding these MAC addresses can be tedious. You can do it with tcpdump.
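Here is a rough sketch of the tcpdump approach, plus a way to turn each MAC into a host entry like the one in the excerpt above. The function names extract_macs and make_host_stanza are made up for illustration. The parsing assumes tcpdump prints DHCP requests with the text "Request from <MAC>", which is what the versions I have seen do, so check the output of yours first.

```shell
# Print each distinct client MAC seen in tcpdump's text output (stdin).
# Assumes lines containing "Request from <MAC>" as tcpdump prints for
# BOOTP/DHCP requests; adjust the pattern if your version differs.
extract_macs() {
    awk '{
        for (i = 2; i < NF; i++)
            if ($(i-1) == "Request" && $i == "from") {
                mac = $(i+1)
                sub(/,$/, "", mac)          # drop a trailing comma
                if (!seen[mac]++) print mac
            }
    }'
}

# Emit a dhcpd.conf host stanza for one machine.
# Arguments: hostname, MAC address, fixed IP address.
make_host_stanza() {
    printf 'host %s {\n' "$1"
    printf '    hardware ethernet %s;\n' "$2"
    printf '    fixed-address %s;\n' "$3"
    printf '}\n'
}

# Typical use (as root), powering on one new machine at a time:
#   tcpdump -l -e -n -i eth1 port 67 or port 68 | extract_macs
#   make_host_stanza mynode19 00:11:43:b0:0b:50 192.168.1.19 >> /etc/dhcp/dhcpd.conf
```

Powering the machines on one at a time makes it obvious which MAC belongs to which physical box.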

TFTP Server

TFTP stands for Trivial File Transfer Protocol and is supposed to be a very simple protocol for doing what its name implies. This is why PXE uses it. As with getting any server to work, it is usually necessary to configure it. This is pretty easy with TFTP. Here is the TFTP configuration file for my system:

Sample /etc/conf.d/in.tftpd

# Path to serve files from
INTFTPD_PATH="/tftpboot/"

# -4=IPv4 only, -s=secure (chroot)
INTFTPD_OPTS="-4 -s ${INTFTPD_PATH}"

As you can see, this is (by design) a very simple server.

To learn way more about this file and what one can do with it, type man 8 in.tftpd.

On my Gentoo system, once installed, the TFTP server is started like this:

$ sudo /etc/init.d/in.tftpd start
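Once the server is started, it is worth confirming from another machine that it really hands out files. This is a minimal sketch, assuming a curl build with TFTP support is available on the test machine. The function name tftp_check is made up, and the server address in the usage line is just the one that appears in my boot menu.

```shell
# Hypothetical helper: fetch one file over TFTP and discard it, just to
# prove the server answers. Arguments: server address, file path
# relative to the TFTP root (default: pxelinux.0).
tftp_check() {
    server="$1"
    file="${2:-pxelinux.0}"
    if curl --silent --max-time 10 --output /dev/null "tftp://${server}/${file}"; then
        echo "OK: ${server} served ${file}"
    else
        echo "FAILED: could not fetch ${file} from ${server}" >&2
        return 1
    fi
}

# Typical use, from another machine on the same network:
#   tftp_check 192.168.1.200 pxelinux.0
```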

Note

Since PXE booting is often used to install new images onto computers, if you like your computers the way they are, it might be good to disable the TFTP server so no accidents happen.

To turn off the TFTP server you need to remember to do something like this:

$ sudo /etc/init.d/in.tftpd stop

PXELinux

PXELinux is a boot loader for booting Linux from a network server using a network ROM supporting the PXE specification. It is a SYSLINUX derivative and has many things in common with it. In the DHCP configuration above, the filename directive specifies pxelinux.0; this is the actual boot loader code. I found it was best to get a recent version of this from a recent SystemRescueCD. The boot loader is configured through a plain text file. Here is my configuration:

Example /tftpboot/pxelinux.cfg/default

SAY .
SAY .
SAY .==============================
SAY .== Xed's PXELINUX Boot Menu ==
SAY .==============================
SAY .
SAY . This system is configured in /tftpboot/pxelinux.cfg/default.
SAY . NORMAL BOOT - PRESS ENTER.
SAY .
SAY . Enter one of the following options within 15 seconds:
SAY . ->  lin: Normal installed Linux boot from hard drive [default].
SAY . ->  new: Install a new system. *COMPLETELY ERASES CURRENT SYSTEM*
SAY . ->  man: Manual install. *COMPLETELY ERASES CURRENT SYSTEM*
SAY . ->  mem: Memtest86 memory diagnostic.
SAY . ->  res: SystemRescueCD rescue disc for high quality problem solving.
SAY . ->  r32: SystemRescueCD rescue (32 bit)
SAY . ->  fre: FreeDOS
SAY . ->  mhd: mhdd (Maysoft Hard Drive Diagnostics)
SAY . ->  dba: dban (Darik's Boot And Nuke) *COMPLETELY ERASES ENTIRE DISKS*
SAY . ->  aid: aida (Mysterious Hardware Diagnostics)

# 0=Display "boot:" only if Shift or Alt is pressed, 1=always.
PROMPT 1

# If no interaction, pretend this PXE stuff didn't happen.
DEFAULT lin
ONTIMEOUT lin

# When installing a new image on the cluster, THIS WILL WIPE OUT EVERYTHING!
#DEFAULT new
#ONTIMEOUT new

# In 1/10 seconds. Here is 15 seconds with no key...
TIMEOUT 150
# ...or 5 min after unconsummated keyboard action:
TOTALTIMEOUT 3000

#If using a console server something like this could be handy:
#SERIAL 0x3f8 9600 0x303

LABEL lin
  LOCALBOOT 0
  #LOCALBOOT 0x80
  #LOCALBOOT 0x81

# Red Hat Style Anaconda Image - with Kick Start.
LABEL new
  LINUX vmlinuz
  APPEND initrd=initrd.img ramdisk_size=9216 noapic acpi=off ksdevice=eth0 ks=nfs:192.168.1.110:/files/admin/tftpboot/anaconda-ks.cfg

# Red Hat Style Anaconda Image - manual.
LABEL man
  LINUX vmlinuz
  APPEND initrd=initrd.img ramdisk_size=9216 noapic acpi=off

# + System Rescue CD 64 bit. Takes a while to get full .dat image, but
# ignore the progress bar which is not accurate. Be patient.
LABEL res
  LINUX util/rescue64
  INITRD util/initram.igz
  APPEND dodhcp netboot=tftp://192.168.1.200:69/util/sysrcd.dat

# + Works even on Dells.
LABEL r32
  KERNEL util/rescuecd
  INITRD util/initram.igz
  APPEND dodhcp netboot=tftp://192.168.1.200:69/util/sysrcd.dat

# + Nice hardware identification and diagnosis utility.
LABEL aid
  KERNEL memdisk
  APPEND initrd=/util/aida.img floppy

# + OK though exotic hard drive hardware may be beyond its ability.
LABEL mhd
  KERNEL memdisk
  APPEND initrd=/util/mhdd.img floppy

# + Comes with old school games which actually work on my servers!
LABEL fre
  KERNEL memdisk
  APPEND initrd=/util/freedos.img floppy

# + Compiled myself as memtest.bin which is a "linux kernel".
# If there is an extension (.bin), it will just output "8000" repeatedly.
# So simply rename it.
LABEL mem
  KERNEL util/memtest

# ? Boots, loads kernel fine, but seems to fail to detect HD on Dells.
LABEL dba
  LINUX util/dban.bzi

Note that all the fancy diagnostics and utilities mentioned here are really quite useful. These can all be acquired from a SystemRescueCD.

Since PXELINUX is one of the SYSLINUX family of special-purpose boot loaders, most of its documentation lives with SYSLINUX. See the SYSLINUX documentation for details on how to configure the PXELINUX default file.
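A menu like this one names a lot of files, and a typo in any of them only shows up when a machine fails to boot. Here is a rough sketch of a sanity check, assuming that relative paths in the menu are resolved against the TFTP root where pxelinux.0 lives. The function name check_pxe_files is made up for illustration.

```shell
# Hypothetical helper: list files named by KERNEL, LINUX and INITRD
# lines in the menu that are missing under the TFTP root.
# Argument: TFTP root directory (default: /tftpboot).
# Paths given only inside APPEND (initrd=...) are not checked.
check_pxe_files() {
    root="${1:-/tftpboot}"
    missing=0
    for f in $(awk 'toupper($1) ~ /^(KERNEL|LINUX|INITRD)$/ { print $2 }' \
                   "$root/pxelinux.cfg/default" | sort -u); do
        if [ ! -e "$root/$f" ]; then
            echo "missing: $root/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use on the TFTP server:
#   check_pxe_files /tftpboot && echo "all referenced files present"
```

Commented-out lines are skipped because their first word starts with "#".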

Installing CentOS Over PXE

(And presumably Red Hat.)

Images

The first thing you must get organized is a proper kernel image with a corresponding init ram disk. Since PXE booting isn’t (at this time) the most normal way to do things, the general good strategy is to reach into the CD installation system and pull out an appropriate kernel set. I downloaded something like:

CentOS-6.3-x86_64-netinstall.iso

Then mount it like this:

sudo mount -t iso9660 -o ro,loop CentOS-6.3-x86_64-netinstall.iso /mnt/iso/

Now you can pick and choose what you need off of what was supposed to go on the CD (which you need not use at all). First the kernel which can be found in two different places on the CD image:

ee7b809b72945291749c6e38709605c4  /mnt/iso/isolinux/vmlinuz
ee7b809b72945291749c6e38709605c4  /mnt/iso/images/pxeboot/vmlinuz

Copy one of these into the TFTP served directory. Also with these files you’ll find initrd.img which is an initial ram disk image.

Note that CentOS has a pretty aggressive program of updating stuff and if they go from CentOS X.Y to CentOS X.Y+1, you will need a new kernel and ram disk file. The official way Red Hat recommends to organize this mess is with a complex directory structure. For each release and architecture (centos/x86_64/6.3 for example), they say to copy vmlinuz and initrd.img from the /images/pxeboot/ directory on "disc 1" of that release and architecture to /tftpboot/images/centos/$ARCH/$RELEASE. I personally find that more trouble than it’s worth, but in other circumstances, maybe not.
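If you do want the per-release layout, the copying is easy to script. This is a minimal sketch, assuming the ISO is already mounted and that /tftpboot is the TFTP root. The function name stage_pxe_images is made up for illustration.

```shell
# Hypothetical helper: copy the PXE kernel and initial ram disk from a
# mounted install image into a per-architecture, per-release directory.
# Arguments: mount point of the ISO, arch, release, TFTP root.
stage_pxe_images() {
    iso="$1"; arch="$2"; release="$3"
    root="${4:-/tftpboot}"
    dest="$root/images/centos/$arch/$release"
    mkdir -p "$dest" || return 1
    cp "$iso/images/pxeboot/vmlinuz" "$iso/images/pxeboot/initrd.img" "$dest/" || return 1
    echo "$dest"
}

# Typical use, with the ISO mounted on /mnt/iso:
#   stage_pxe_images /mnt/iso x86_64 6.3
# A menu entry would then point at the staged copies, e.g.
#   LINUX images/centos/x86_64/6.3/vmlinuz
#   APPEND initrd=images/centos/x86_64/6.3/initrd.img ...
```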

The kernel is the first real code to run and manages everything after it; the initial ram disk is the preliminary file system holding the software needed to get going. With both found and made available over TFTP, the next step is to configure the installation process.

Kickstart

Notice that on the PXELINUX menu entry new which installs a new system, there are options passed to the kernel which look like this:

ksdevice=eth0 ks=nfs:192.168.1.110:/files/admin/tftpboot/anaconda-ks.cfg

The ks stands for kick start, a system which allows the installer to proceed without human interaction. The anaconda in the file name refers to Anaconda, the Red Hat installer.

The ks options tell the kernel to not wait around for any kind of interactive fun and games but rather get going with Anaconda using the setup defined in the specified file. The syntax and options for a kick start file are documented in the Kickstart chapters of the Red Hat installation guide.
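When an automated install stalls right at the start, the first thing I would check is whether the kick start file is really reachable where the ks option says it is. Here is a small sketch that splits the nfs: form of the option into server and path. The function name ks_nfs_parts is made up, and it only handles the ks=nfs:server:/path form used in my menu.

```shell
# Hypothetical helper: pull the ks= value out of a kernel command line
# and, for the nfs: form, print the server and the path on separate lines.
ks_nfs_parts() {
    for word in $1; do
        case "$word" in
            ks=nfs:*)
                spec="${word#ks=nfs:}"     # server:/path
                echo "${spec%%:*}"         # text before the first colon
                echo "${spec#*:}"          # text after the first colon
                return 0
                ;;
        esac
    done
    return 1
}

# Typical use:
#   ks_nfs_parts 'ksdevice=eth0 ks=nfs:192.168.1.110:/files/admin/tftpboot/anaconda-ks.cfg'
# Then, from a machine on the install network, something like
#   showmount -e 192.168.1.110
# should list an export that covers the printed path.
```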

It may be possible to create such a file from scratch but that might involve too much trial and error. A generally easier way is to manually install the OS using Anaconda which will, upon successful completion, write a /root/anaconda-ks.cfg file containing the steps you just performed.

You can use this file as a starting point for making your own automated version. Usually on the first attempt at an automated install, something doesn’t happen quite right. This is especially true when upgrading as previous versions may not have the exact same paths or conventions as current versions. If the install completes but doesn’t quite set everything up the way you want, look in the /root/ks-post.log for clues as to what the problem is. Follow along with your kick start file’s %post section entering the commands manually into the system to see if anything is inoperable.
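To step through the %post commands by hand, it helps to have them in a file of their own. This is a minimal sketch that prints the body of the first %post section. The function name ks_post_body and the output file name are made up, and it assumes the section is closed with %end.

```shell
# Print the body of the first %post section of a kickstart file,
# without the %post and %end marker lines themselves.
ks_post_body() {
    awk '
        /^%post/            { inpost = 1; next }
        inpost && /^%end/   { exit }
        inpost              { print }
    ' "$1"
}

# Typical use, given a copy of your kick start file on the new machine:
#   ks_post_body anaconda-ks.cfg > post-steps.sh
#   less post-steps.sh    # then paste commands in one at a time
```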

Here’s a little way to keep an eye on progress when a machine is (supposed to be) installing an OS on its own:

while true; do if ping -c1 c16 >/dev/null; then echo -n 'UP ';\
else echo -n 'DOWN ';fi; date; sleep 10 ;done

Here is a full working kick start file including all the messy junk I do to set machines up.

anaconda-ks.cfg

# Kickstart file automatically generated by anaconda.
# Manually fixed by Chris X Edwards.

install
url --url=ftp://mirror.example.edu/centos/6.3/os/x86_64
lang en_US.UTF-8
keyboard us

network --device eth0 --bootproto dhcp
network --device eth1 --onboot no --bootproto dhcp

rootpw  --iscrypted $3$gV9NY7AXLONGXHASHXGOESXHEREXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXz3t8tf01b0

firewall --service=ssh
authconfig --enableshadow --passalgo=sha512
selinux --permissive
timezone --utc America/Los_Angeles
bootloader --location=mbr --driveorder=sda --append="nomodeset acpi=off "

clearpart --all --drives=sda

part /boot --fstype=ext4 --size=500 --ondisk=sda
part pv.008002 --grow --size=1 --ondisk=sda
volgroup vg_c-thelab --pesize=4096 pv.008002
logvol / --fstype=ext4 --name=lv_root --vgname=vg_c-thelab --grow --size=10024
logvol swap --name=lv_swap --vgname=vg_c-thelab --grow --size=1024 --maxsize=11056

repo --name="CentOS"  --baseurl=ftp://mirror.example.edu/centos/6.3/os/x86_64 --cost=100

%packages
@base
@client-mgmt-tools
@console-internet
@core
@debugging
@directory-client
@hardware-monitoring
@java-platform
@large-systems
@network-file-system-client
@performance
@perl-runtime
@server-platform
@server-policy
pax
oddjob
sgpio
certmonger
pam_krb5
krb5-workstation
perl-DBD-SQLite
%end

# Could be "shutdown" or "halt" (and "wait", the very annoying default).
reboot

# User's post install script goes here:
%post --log=/root/ks-post.log
printf "search example.edu\nnameserver 132.239.0.252\n" > /etc/resolv.conf

# Update system and never ending lib compatibility pain.
yum update -y
yum install -y compat-libstdc++-33 bzip2-devel libXmu gtkglext-libs libpng-static openssl-static mysql-devel bzip2-libs
yum install -y mesa-libGL libmng
yum install -y gcc-c++ gcc-gfortran compat-gcc-34-g77
yum install -y cvs ntp nmap screen mysql asciidoc lftp ImageMagick

# Get time server established.
/etc/init.d/ntpd start
chkconfig ntpd on

#Prepare For LDAP environment
# For LDAP to work:
yum install -y openldap-clients pam_ldap nss-pam-ldapd nscd
# SSSD is not helpful:
yum erase -y sssd


cat <<XXXXXXXXXX1 > /etc/openldap/certs/ldapscert.pem
-----BEGIN CERTIFICATE-----
ZVVRwQPPN3FtNjVONtVWNYFcf0KH5f3RZN0TPFdTFVo3QDROODHNZVTXZDfjPDLQ
XXXXXXXXXXAXLONGXCERTIFICATEXGOESXHEREXXXXXXXXXXXXXXXXXkZYDJWuM3
VmNuOtxduxvT9j0OPDRJSTSxoJyhDTSvoTSvYaIwp2DhMJE1ZO4KQGN5ZGVjBQN4
-----END CERTIFICATE-----
XXXXXXXXXX1

# xed- This sets important and mysterious things up but does so
# incorrectly. That's why this must be done first and then the wrong
# things must be explicitly fixed.
authconfig --enableldap --enableldapauth --ldapserver=ldaps://ldap.thelab.example.edu --ldapbasedn="dc=thelab,dc=example,dc=edu" --enablelocauthorize --enablesysnetauth --updateall

# xed- Took out `--enableldaptls`  since that causes double login prompts!

# xed- This fixes things:
cat <<XXXXXXXXXX > /etc/openldap/ldap.conf
base dc=thelab,dc=example,dc=edu
uri ldaps://ldap.thelab.example.edu/
ldap_version 3
rootbinddn cn=ldapadmin,dc=thelab,dc=example,dc=edu
timelimit 30
bind_timelimit 30
idle_timelimit 3600
nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus,radvd,tomcat,radiusd,news,mailman,nscd,gdm
TLS_REQCERT never
TLS_CACERT /etc/openldap/certs/ldapscert.pem
pam_password md5
XXXXXXXXXX

cat <<XXXXXXXXXX2 > /etc/nslcd.conf
ldap_version 3
rootpwmoddn cn=ldapadmin,dc=thelab,dc=example,dc=edu
bind_timelimit 30
timelimit 30
idle_timelimit 30
tls_reqcert never
tls_cacertdir /etc/openldap/certs/
tls_cacertfile /etc/openldap/certs/ldapscert.pem
uid nslcd
gid ldap
uri ldaps://ldap.thelab.example.edu
base dc=thelab,dc=example,dc=edu
tls_cacertdir /etc/openldap/certs
XXXXXXXXXX2

# xed- Seems like authconfig sets some things up and they are wrong.
# So new configs that are right need to be put in place and the nslcd
# service needs to be reset, but without the authconfig command,
# something doesn't get set up.
service nslcd restart

# Add nfs server to fstab and mount.
mkdir /cfs
echo "thefileserver:/files/users /home nfs nfsvers=3,async,ro,auto 0 0" >> /etc/fstab
echo "theclustersfs:/cfs/users /cfs nfs nfsvers=3,async,ro,auto 0 0" >> /etc/fstab
mount -a

# Get sudo unlocked.
gpasswd -a cooluser1 wheel
gpasswd -a cooluser2 wheel
gpasswd -a xed wheel
echo '%wheel    ALL=(ALL)       ALL' >> /etc/sudoers

# Get Sun Grid Engine ready
sed -i 's~^opalis-rdv.*536/tcp.*$~sge_qmaster     536/tcp  # Sun Grid Engine~' /etc/services
sed -i 's~^nmsp.*537/tcp.*$~sge_execd       537/tcp  # Sun Grid Engine~' /etc/services
SGE_ROOT="/gridware"
mkdir $SGE_ROOT
/usr/sbin/useradd -d $SGE_ROOT -c "Sun Grid Engine User" -u 222 -s /sbin/nologin sgeadmin
chown  sgeadmin:sgeadmin $SGE_ROOT
echo "192.168.1.25:$SGE_ROOT $SGE_ROOT nfs async 0 0" >> /etc/fstab

%end

Reboots And Timing

Note

The reboot directive here can cause some tricky problems. If the PXE default is still the entry that reformats and reinstalls, the freshly installed machine will PXE boot straight back into the installer after its reboot, and will keep doing so in an infinite loop. After the install has started, you must change the default back (to lin in the menu above) while the install is still running, so that the automatic reboot does the normal thing and boots the newly prepared machine for use.
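Flipping the default by hand in an editor while a cluster is mid-install is easy to forget or fumble. Here is a minimal sketch of doing it with one command, assuming GNU sed and the menu at /tftpboot/pxelinux.cfg/default. The function name set_pxe_default is made up for illustration.

```shell
# Hypothetical helper: point the active DEFAULT and ONTIMEOUT lines of
# a PXELINUX menu at the given label. Commented-out lines are left alone.
# Arguments: label, menu file (default: /tftpboot/pxelinux.cfg/default).
# Relies on GNU sed for -i and -E.
set_pxe_default() {
    label="$1"
    menu="${2:-/tftpboot/pxelinux.cfg/default}"
    sed -i -E "s/^(DEFAULT|ONTIMEOUT)[[:space:]].*/\1 ${label}/" "$menu"
}

# Typical use on the TFTP server:
#   set_pxe_default new     # arm the install, then power-cycle the nodes
#   set_pxe_default lin     # once installs are running, before they reboot
```

Going by the rough times below, there are about twenty minutes between the install starting and the reboot in which to switch back.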

Here are some rough time points for reference:

00h00 Reboot with install option set as default.
00h02 Install starts.
(I think that the real install takes 5 minutes and the updates take 15!)
00h22 Install finishes and reboots with pass through as default.
00h25 System comes up.
00h26 SSH ready for use.