Tuesday, December 28, 2010

NFS Root

It's handy to be able to boot a system from an NFS root drive. I use it mostly for getting 'underneath' an operating system for things like installs or repairs, but you can also run indefinitely if that suits your environment.

If you look back to a post I did in September you will get the basics for setting up your PXE server. You will also need to create a custom kernel, or find one that allows for PXE booting with an NFS root partition. Generally I build a kernel with all of the required drivers compiled in, including the network driver and NFS root support.
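On the PXELINUX side, an NFS root boot mostly comes down to a few kernel parameters. The following is a minimal sketch rather than a definitive configuration: write_nfsroot_entry is a hypothetical helper, and the kernel file name, server address and config path shown in the usage comment are assumptions to replace with your own values. The parameters root=/dev/nfs, nfsroot= and ip=dhcp are the standard kernel options for mounting the root filesystem over NFS.

```shell
# write_nfsroot_entry <config file> <nfs server> <exported root dir> <kernel file>
# Hypothetical helper: writes a one-entry PXELINUX config that boots the given
# kernel with its root filesystem on NFS.
write_nfsroot_entry() {
    local cfg_file="$1" nfs_server="$2" root_dir="$3" kernel="$4"
    cat > "$cfg_file" <<EOF
DEFAULT nfsroot
LABEL nfsroot
    KERNEL $kernel
    APPEND root=/dev/nfs nfsroot=$nfs_server:$root_dir ip=dhcp rw
EOF
}

# Example (all four values are assumptions):
# write_nfsroot_entry /tftpboot/linux-boot/pxelinux.cfg/default \
#     192.168.0.1 /tftpboot/boot-image vmlinuz-nfsroot
```

PXELINUX reads its configuration from a pxelinux.cfg directory next to pxelinux.0, falling back to the file named default when nothing more specific matches the client.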

The part that generally gives me trouble is getting a root image set up to run from. I am a fan of building my own, as I have more control over the operation and customization of the system, but this can lead to problems: the binaries shipped with every distribution I know are dynamically linked, so each one needs the libraries it is linked against in order to run. So, I wrote a handy script to do all this work for me, working from the currently running system.

The following works for a RedHat based system. Unfortunately, distributions don't keep binaries in the same place, so if you want to use this you may have to change the file locations. There are four variables to do this job: SBIN_FILES, BIN_FILES, DEV_FILES and USER_BIN_FILES. Each corresponds to a directory, for example SBIN_FILES maps to /sbin. Just find the files you want to include and put them in the corresponding variable. The script will take care of all the dependent libraries for you.
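The library handling relies on ldd, which prints one line per shared library a binary needs. As a rough sketch of the idea (ldd_paths is a hypothetical helper, not a function from the script), the absolute paths can be pulled out of that output like this:

```shell
# ldd_paths: hypothetical helper that reads ldd-style output on stdin and
# prints every absolute path it contains, once each.
# A typical ldd line looks like:
#     libc.so.6 => /lib64/libc.so.6 (0x00007f7a03b64000)
ldd_paths() {
    # put every whitespace-separated token on its own line,
    # keep only the tokens that start with "/", and drop duplicates
    tr -s ' \t' '\n\n' | grep '^/' | sort -u
}

# Example: ldd /bin/ls | ldd_paths
```

The script itself checks each token with a file test instead of looking for a leading slash, which has the same effect on a live system.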

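You can also specify some command line options: -delete will clear out the destination directory before it does a build, and -bootdir will change the target build location. Note that, as written, the script prints its usage and exits when it is called with no arguments at all, so pass at least one of them.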

The script uses udev so there isn't much need for /dev entries, otherwise have at it.

#!/bin/bash

BOOT_DIR=/tftpboot/boot-image
MOUNT_POINTS="proc sys mnt/src mnt/dest tmp"
LDD=/usr/bin/ldd

PRE_DELETE=0
SBIN_FILES="init ifconfig reboot poweroff mke2fs mkfs.ext3 mkfs.ext4 mkswap fdisk udevd udevadm mkinitrd ethtool fsck.ext3 fsck.ext4 hdparm shutdown"
BIN_FILES="bash mount cp umount dd rm ls cat more vim tar gzip ps"
DEV_FILES="console null"
USER_BIN_FILES="chroot tclsh vi"
LOG_FILE=/home/mike/nfs_build_log

######
### function setup_binaries
### prepends required directories to overall file list to keep things looking clean
######
function setup_binaries {
        for file in $SBIN_FILES
        do
                FINAL_SBIN="$FINAL_SBIN /sbin/$file"
        done
        for file in $BIN_FILES
        do
                FINAL_BIN="$FINAL_BIN /bin/$file"
        done
        for file in $DEV_FILES
        do
                FINAL_DEV="$FINAL_DEV /dev/$file"
        done
        for file in $USER_BIN_FILES
        do
                FINAL_USER_BIN="$FINAL_USER_BIN /usr/bin/$file"
        done
        # delete everything in the boot directory if asked to
        if [ $PRE_DELETE -eq 1 ]
        then
                rm -rf "$BOOT_DIR"
        fi
}

function log {
        echo -e "$(date +%H:%M:%S) -- $1" >> "$LOG_FILE"
}

######
### function create_etc
### populates the new /etc directory with a minimal set of startup scripts
### inittab, rc.sysinit, and boot.udev
######
function create_etc {
        mkdir -p $BOOT_DIR/etc

        # create inittab
        echo "id:3:initdefault:" > $BOOT_DIR/etc/inittab
        echo "si::sysinit:/etc/rc.sysinit" >> $BOOT_DIR/etc/inittab
        echo "id2:3:wait:/bin/bash" >> $BOOT_DIR/etc/inittab

        # create rc.sysinit
        echo "#!/bin/bash" > $BOOT_DIR/etc/rc.sysinit
        echo "echo \"--Running sysinit--\"" >> $BOOT_DIR/etc/rc.sysinit
        echo "echo -n \"Mounting proc filesystem: \"; mount -n -t proc /proc /proc; echo \"Done\"" >> $BOOT_DIR/etc/rc.sysinit
        echo "echo -n \"Mounting sys filesystem: \"; mount -n -t sysfs /sys /sys; echo \"Done\"" >> $BOOT_DIR/etc/rc.sysinit
        echo "echo -n \"Starting udev: \"; /etc/boot.udev; echo \"Done\"" >> $BOOT_DIR/etc/rc.sysinit
        echo "echo -n \"Configuring loopback network adapter: \"; /sbin/ifconfig lo 127.0.0.1; echo \"Done\"" >> $BOOT_DIR/etc/rc.sysinit
        echo "echo -n \"Clearing /etc/mtab\"; > /etc/mtab" >> $BOOT_DIR/etc/rc.sysinit
        chmod ug+x $BOOT_DIR/etc/rc.sysinit

        # create boot.udev assuming udevd and udevadm are in /sbin
        if [[ "$SBIN_FILES" == *udevd* ]] && [[ "$SBIN_FILES" == *udevadm* ]]
        then
                echo "#!/bin/bash" > $BOOT_DIR/etc/boot.udev
                echo "echo \"--Running udev--\"" >> $BOOT_DIR/etc/boot.udev
                echo "echo \"\" > /sys/kernel/uevent_helper" >> $BOOT_DIR/etc/boot.udev
                echo "echo -n \"Starting udevd \"" >> $BOOT_DIR/etc/boot.udev
                echo "rm -rf /dev/.udev" >> $BOOT_DIR/etc/boot.udev
                echo "/sbin/udevd --daemon" >> $BOOT_DIR/etc/boot.udev
                echo "/sbin/udevadm trigger --type=subsystems" >> $BOOT_DIR/etc/boot.udev
                echo "/sbin/udevadm trigger --type=devices" >> $BOOT_DIR/etc/boot.udev
                echo "/sbin/udevadm settle --timeout=180" >> $BOOT_DIR/etc/boot.udev
                chmod ug+x $BOOT_DIR/etc/boot.udev
        fi
}

######
### function get_libraries
### helper function for copy_files
### executes ldd against a binary to retrieve all dependent libraries
### executes copy_files with the "library_file" flag to prevent it from running get_libraries again
######
function get_libraries {
        local FILE_NAME="$1"
        log "Checking library for $FILE_NAME"
        LDD_OUTPUT=$($LDD "${FILE_NAME}" 2> /dev/null)
        # e.g. libc.so.6 => /lib64/libc.so.6 (0x00007f7a03b64000)
        # look at each index in the output checking if it's a file, then pass it to copy_files e.g. /lib64/libc.so.6
        for index in $LDD_OUTPUT
        do
                if [ -f $index ]
                then
                        LIBRARY_FILES="$LIBRARY_FILES $index"
                fi
        done

}

######
### function copy_files
### copies a source file to BOOT_DIR with the proper destination directory
### e.g. /bin/ls becomes $BOOT_DIR/bin/ls
### if the file is a symbolic link, follow it through depending if it is a relative or absolute path
######
function copy_files {
        local FILE_NAME="$1"
        log "Copying $FILE_NAME"
        # find where this file should go based on BOOT_DIR + its original path
        # e.g. /tftpboot/boot-image + /sbin
        DEST_DIR="$BOOT_DIR/`dirname "$FILE_NAME"`"
        if [ ! -d "${DEST_DIR}" ]
        then
                mkdir -p "${DEST_DIR}"
        fi
        # the -a will preserve any symbolic links
        cp -a "${FILE_NAME}" "${DEST_DIR}"

        # if this is a link find what the real file is and copy that
        if [ -L "$FILE_NAME" ]
        then
                # ideally we could use canonicalize mode (-f) as that always returns an absolute path but that won't handle links pointing to links
                LINK_FILE=`readlink "$FILE_NAME"`
                # check to see if this is an absolute path
                # e.g. /lib64/libpthread.so.0 -> libpthread-2.10.1.so
                LINK_DIR=`dirname "$LINK_FILE"`         #e.g. ./
                if [ ! ${LINK_DIR:0:1} == "/" ]
                then
                        # find the path of the original file
                        FILE_DIR=`dirname "$FILE_NAME"` #e.g. /lib64
                        # change to the original path + relative path and find where we are with pwd
                        cd "$FILE_DIR"/"$LINK_DIR"              #e.g. cd /lib64/.
                        LINK_PATH=`pwd`                         #e.g. /lib64
                        # recreate the file with the proper path of the link
                        LINK_FILE="$LINK_PATH"/`basename "$LINK_FILE"`  # e.g. /lib64/libpthread-2.10.1.so
                fi
                copy_files "${LINK_FILE}" $2
                # need to copy the file, if it returns no path it is in the same dir, otherwise it will be absolute path
        fi
        if [ ! "$2" == "library_file" ]
        then
                get_libraries "$FILE_NAME"
        fi
}

######
### function create_mount_points
######
function create_mount_points {
        for mount in $MOUNT_POINTS
        do
                mkdir -p $BOOT_DIR/$mount
        done
}

######
### function usage
### outputs usage information for this script and then exits
######
function usage {
        echo ""
        echo -e "Usage:\n`basename $0` -delete -bootdir "
        echo "-delete"
        echo "  remove old data before creating new content, default is to keep"
        echo "-bootdir"
        echo "  directory to copy binaries to, default is /tftpboot/boot-image"
        echo ""
        exit 0
}

if [ $# -eq 0 ]
then
        usage
fi

until [ -z "$1" ]
do
        case "$1" in
        -delete)
                PRE_DELETE=1
                ;;
        -bootdir)
                shift
                BOOT_DIR="$1"
                ;;
        *)
                usage
                ;;
        esac
        shift
done


setup_binaries

for file in $FINAL_SBIN $FINAL_BIN $FINAL_DEV $FINAL_USER_BIN
do
        copy_files $file
done

if [ -n "$LIBRARY_FILES" ]
then
        LIBRARY_FILES=`echo $LIBRARY_FILES | tr " " "\n" | sort -u`
        for file in $LIBRARY_FILES
        do
                copy_files $file "library_file"
        done
fi

create_etc
# create the empty mount points (proc, sys, ...) that rc.sysinit mounts onto
create_mount_points
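The symbolic link handling in copy_files is the fiddly part of the script, so here is the same idea on its own. This is a minimal sketch, not a drop-in replacement: resolve_link_target is a hypothetical name, and it resolves only one level of link, leaving any further links to be followed by calling it again, much as copy_files recurses.

```shell
# resolve_link_target <symlink>
# Hypothetical helper: prints the absolute path a symbolic link points to,
# following a single level of indirection.
resolve_link_target() {
    local link="$1" target link_dir target_dir
    target=$(readlink "$link") || return 1
    case "$target" in
        /*)
            # already an absolute path
            echo "$target"
            ;;
        *)
            # relative target: interpret it against the directory holding the link
            link_dir=$(dirname "$link")
            # the cd happens inside a subshell, so the caller's directory is untouched
            target_dir=$(cd "$link_dir/$(dirname "$target")" && pwd) || return 1
            echo "$target_dir/$(basename "$target")"
            ;;
    esac
}

# Example: resolve_link_target /lib64/libpthread.so.0
```

Doing the cd inside a command substitution avoids changing the working directory of the script as a whole.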

Tuesday, October 19, 2010

KickStart Network Customization

One of the biggest problems I have found with the KickStart process is fine tuning the network values of a server. I couldn't find anything useful through the standard process so I decided to write my own. If you have seen my last post I reference some customized configuration scripts in the %post section of my KickStart file. In this post I will outline my first customization which I simply call general.cfg.

Basically this sets up the NTP daemon by rewriting the /etc/ntp.conf file and turns off services I don't need. But first it calls a special script called network_config.sh.
# cat general.cfg
# Clean up the network either based on existing DHCP or on configuration file

# network_config.sh requires an argument to tell it where the csv file is and where to output logs
/post_scripts/KickStart/net_config/network_config.sh /post_scripts/KickStart/net_config

# setup NTP
echo 'restrict default kod nomodify notrap nopeer noquery' > /etc/ntp.conf
echo 'restrict -6 default kod nomodify notrap nopeer noquery' >> /etc/ntp.conf
echo 'restrict 127.0.0.1' >> /etc/ntp.conf
echo 'restrict -6 ::1' >> /etc/ntp.conf
echo 'server 192.168.0.1' >> /etc/ntp.conf
echo 'server 127.127.1.0' >> /etc/ntp.conf
echo 'fudge 127.127.1.0 stratum 10' >> /etc/ntp.conf
echo 'driftfile /var/lib/ntp/drift' >> /etc/ntp.conf
chmod 644 /etc/ntp.conf

echo '192.168.0.1' >> /etc/ntp/ntpservers
echo '192.168.0.1' >> /etc/ntp/step-tickers

# modify the /etc/sysconfig/ntpd file to add the -x startup option
# required for Oracle 11gR2
echo 'OPTIONS="-u ntp:ntp -x -p /var/run/ntpd.pid"' > /etc/sysconfig/ntpd
echo 'SYNC_HWCLOCK=no' >> /etc/sysconfig/ntpd
echo 'NTPDATE_OPTIONS=""' >> /etc/sysconfig/ntpd

/usr/sbin/ntpdate 192.168.0.1

chkconfig ntpd on

# remove unnecessary services
chkconfig sendmail off

# printer
chkconfig cups off
chkconfig hplip off

network_config.sh is a bit long but I will post it in its entirety here. Its primary job is to take input from a file called hostfile.csv and configure DNS, the host name, and all of the interfaces. Network interfaces can be specified by adapter name (e.g. eth0, eth1, etc.) or by MAC address, just in case the enumeration isn't quite what you expect. It can also configure bonded interfaces, which I am particularly happy with as this can be a significant annoyance when getting production servers ready. Host names are defined as part of the DHCP options which I showed in this post. If no match is found, for example if there is no matching entry in hostfile.csv, the script will try to grab whatever IP has been assigned for the install and hard code that to the server. Logs are written to a logs directory under the location specified on the command line, which is also where hostfile.csv lives.
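When an interface is given by MAC address without the colon separators, it has to be turned into the colon-separated form that ifconfig prints before it can be matched. A minimal sketch of that step, using a hypothetical normalize_mac helper rather than the loop the script uses:

```shell
# normalize_mac <mac>
# Hypothetical helper: prints the MAC address in colon-separated form.
normalize_mac() {
    case "$1" in
        *:*)
            # already contains colons, pass it through unchanged
            echo "$1"
            ;;
        *)
            # add a colon after every pair of characters, then drop the trailing one
            echo "$1" | sed 's/../&:/g; s/:$//'
            ;;
    esac
}

# Example: normalize_mac 0050569c25e5   prints 00:50:56:9c:25:e5
```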

Here is an example of a hostfile.csv entry for your reference when looking through the script.
# cat hostfile.csv
DOMAINSEARCH=example.com example2.com example3.com
# Format -- server_name,[bond|nic] eth# [eth#] IP MASK Primary,gw={gateway},dns={dns1 dns2 etc}
# as long as the server name comes first, the order of the rest doesn't really matter
# Primary is used to determine which interface should be placed in the host file
# an example with multiple bonded interfaces.
server1,bond=bond0 eth0 eth3 192.168.0.10 255.255.255.0 1,gw=192.168.0.1,bond=bond1 eth1 eth2 10.1.1.1 255.255.255.0,dns=192.168.0.254 192.168.1.254
# an example with one bond using MAC addresses
server2,bond=bond0 0050569c25e5 0050569c6cbd 192.168.0.11 255.255.255.0 1,gw=192.168.0.1,dns=192.168.0.254 192.168.1.254
# an example with a single nic
server3,nic=eth0 192.168.0.12 255.255.255.0 1,gw=192.168.0.1,dns=192.168.0.254
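
To make the format concrete, here is a small sketch of how one line breaks down into a host name followed by key/value entries. parse_host_line is a hypothetical helper written for illustration; the real script does the equivalent with IFS and cut.

```shell
# parse_host_line <line from hostfile.csv>
# Hypothetical helper: prints the host name, then each key and its value.
parse_host_line() {
    local line="$1" old_ifs="$IFS" entry
    IFS=,
    set -f              # no filename expansion while the line is being split
    set -- $line        # split on commas into the positional parameters
    set +f
    IFS="$old_ifs"
    echo "host: $1"
    shift
    for entry in "$@"; do
        # everything before the first "=" is the key, the rest is the value
        echo "${entry%%=*} -> ${entry#*=}"
    done
}

# Example:
# parse_host_line "server3,nic=eth0 192.168.0.12 255.255.255.0 1,gw=192.168.0.1,dns=192.168.0.254"
```

For the server3 line this prints the host name followed by one line each for the nic, gw and dns entries.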

And here is the network_config.sh script itself
# cat network_config.sh
#!/bin/bash
DEBUG=off
IFCONFIG=/sbin/ifconfig

NIC_FILE_DIR=/etc/sysconfig/network-scripts/

GW_FILE=/etc/sysconfig/network

HOST_FILE=/etc/hosts

DNS_FILE=/etc/resolv.conf

DOMAIN_LIST="domain.com domain2.com"

####
## function readHostFile
## reads $HOST_MAP_FILE for specific network information about this host
## return 1 on error, 0 on success
## options can be in any order (nic, gw, bond, or dns), broadcast and network address are calculated based on ip and mask
## calls functions to generate Gateway ($GW_FILE), Hosts ($HOST_FILE), and ifcfg ($NIC_FILE_DIR/ifcfg-{nic})
##
## host_map_file format
## {host},nic={eth#} {ip} {mask} [primary],gw={gw_ip},bond={bond#} {nic1} {nic2} {ip} {mask} [primary],dns={dns_server} {dns_server}
## e.g. server1,nic=eth0 192.168.1.1 255.255.255.0,gw=192.168.0.1,bond=bond0 eth1 eth2 192.168.0.10 255.255.255.0 1,dns=192.168.0.254 192.168.1.254
####
readHostFile() {
        if [ -e $HOST_MAP_FILE ]
        then
                # override default DOMAIN_LIST if it exists
                DOMAIN_TMP=$(cat $HOST_MAP_FILE | grep -wi "DOMAINSEARCH" | cut -f2 -d =)
                if [ ! -z "$DOMAIN_TMP" ]
                then
                        log info "Domain search list found -- $DOMAIN_TMP"
                        DOMAIN_LIST="$DOMAIN_TMP"
                else
                        log info "Domain search not found, using defaults -- $DOMAIN_LIST"
                fi

                # parse the file for this host exactly (-w) and case insensitive
                HOST_INFO=$(cat $HOST_MAP_FILE | grep -wi `hostname`)
                # check to see there was an entry for this host
                if [ -z "$HOST_INFO" ]
                then
                        log warning "Host information for `hostname` was not found in $HOST_MAP_FILE"
                        return 1
                fi
                log notify "Host information found for `hostname` in $HOST_MAP_FILE"
                log notify "Host info is $HOST_INFO"
                # parse HOST_INFO
                IFS=$','
                for entry in $HOST_INFO
                do
                        log debug "Working on entry $entry"
                        KEY=`echo $entry | cut -f1 -d =`
                        VALUE=`echo $entry | cut -f2 -d =`
                        case "$KEY" in
                        nic)
                                log debug "nic is specified -- $VALUE"
                                NIC=`echo $VALUE | cut -f1 -d " "`
                                if [ ${#NIC} -ge 12 ]
                                then
                                        # we are working with a MAC address
                                        NIC=$(getNIC $NIC)
                                fi
                                IPADDR=`echo $VALUE | cut -f2 -d " "`
                                MASK=`echo $VALUE | cut -f3 -d " "`
                                PRIMARY=`echo $VALUE | cut -f4 -d " "`
                                BROADCAST=$(getBroadcastAddress $IPADDR $MASK)
                                NETWORK=$(getNetworkAddress $IPADDR $MASK)
                                # MAC address for this card
                                MAC=$(getMAC $NIC)

                                if [ -z $NIC ]
                                then
                                        log error "Missing NIC information aborting file creation"
                                else
                                        log info "Values for NIC $NIC - MAC $MAC - IP $IPADDR - NetMask $MASK - Broadcast $BROADCAST - Network $NETWORK"
                                        genIPFile $NIC $MAC $IPADDR $MASK $BROADCAST $NETWORK
                                fi

                                if [ "$PRIMARY" == 1 ]
                                then
                                        genHostFile $IPADDR
                                fi
                                ;;
                        bond)
                        #bond=bond0 eth1 eth2 192.168.0.10 255.255.255.0 1
                                log debug "bond is specified -- $VALUE"
                                BOND=`echo $VALUE | cut -f1 -d " "`
                                NIC1=`echo $VALUE | cut -f2 -d " "`
                                # a MAC address is 12 characters without colons, 17 with
                                if [ ${#NIC1} -ge 12 ]
                                then
                                        # we are working with a MAC address
                                        NIC1=$(getNIC $NIC1)
                                fi
                                NIC2=`echo $VALUE | cut -f3 -d " "`
                                if [ ${#NIC2} -ge 12 ]
                                then
                                        # we are working with a MAC address
                                        NIC2=$(getNIC $NIC2)
                                fi
                                IPADDR=`echo $VALUE | cut -f4 -d " "`
                                MASK=`echo $VALUE | cut -f5 -d " "`
                                # the optional primary flag follows the mask
                                PRIMARY=`echo $VALUE | cut -f6 -d " "`
                                BROADCAST=$(getBroadcastAddress $IPADDR $MASK)
                                NETWORK=$(getNetworkAddress $IPADDR $MASK)

                                log info "Values for BOND $BOND - NIC1 $NIC1 - NIC2 $NIC2 - IP $IPADDR - NetMask $MASK - Broadcast $BROADCAST - Network $NETWORK"
                                genBondFile $BOND $NIC1 $NIC2 $IPADDR $MASK $BROADCAST $NETWORK

                                if [ "$PRIMARY" == 1 ]
                                then
                                        genHostFile $IPADDR
                                fi
                                ;;
                        gw)
                                log debug "Gateway value - $VALUE"
                                genGWFile $VALUE
                                ;;
                        dns)
                                log debug "DNS is specified -- $VALUE"
                                genDNSFile "$VALUE"
                        esac
                done
        else
                log warning "Hostfile $HOST_MAP_FILE does not exist"
                return 1
                # configure eth0 as static based on the current DHCP address
        fi
}

####
## function getNIC {mac_addr}
## returns eth# based on MAC address
####
getNIC() {
        local RAW_MAC=$1
        # a properly formatted MAC address is 00:10:20:30:40:50 (17 characters)
        if [ ${#RAW_MAC} -ne 17 ]
        then
                # assume the user didn't put in : marks
                COUNT=0
                # in case this is IPv6 loop for the entire raw mac length
                while [ $COUNT -lt ${#RAW_MAC} ]
                do
                        if [ $COUNT -eq 0 ]
                        then
                                SEARCH_MAC=${RAW_MAC:$COUNT:2}
                        else
                                SEARCH_MAC="$SEARCH_MAC:${RAW_MAC:$COUNT:2}"
                        fi
                        COUNT=$(($COUNT + 2))
                done
        else
                SEARCH_MAC=$RAW_MAC
        fi

        # return eth# for a specific MAC
        local NIC=`$IFCONFIG -a | grep -i $SEARCH_MAC | awk '{print $1}'`
        if [ -z $NIC ]
        then
                log error "Network interface was not found for nic $SEARCH_MAC, this interface will not be configured correctly"
                log error "ifconfig output is \n`$IFCONFIG -a`"
        else
                log info "NIC $SEARCH_MAC found as $NIC"
        fi
        echo $NIC
}

####
## function genBondFile {bond#} {nic1} {nic2} {ip} {mask} {broadcast} {network}
## e.g. bond0 eth0 eth1 192.168.0.10 255.255.255.0 192.168.0.255 192.168.0.0
####
genBondFile() {
        local BOND=$1
        local NIC1=$2
        local NIC2=$3
        local IP=$4
        local MASK=$5
        local BROADCAST=$6
        local NETWORK=$7
        local BOND_FILE=${NIC_FILE_DIR}ifcfg-$BOND
        local NIC1_FILE=${NIC_FILE_DIR}ifcfg-$NIC1
        local NIC2_FILE=${NIC_FILE_DIR}ifcfg-$NIC2

        log info "Creating Bond file $BOND_FILE"
        echo "DEVICE=$BOND" > $BOND_FILE
        echo "BOOTPROTO=none" >> $BOND_FILE
        echo "ONBOOT=yes" >> $BOND_FILE
        echo "NETWORK=$NETWORK" >> $BOND_FILE
        echo "NETMASK=$MASK" >> $BOND_FILE
        echo "IPADDR=$IP" >> $BOND_FILE
        echo "BROADCAST=$BROADCAST" >> $BOND_FILE
        echo "USERCTL=no" >> $BOND_FILE
        echo "BONDING_OPTS=\"mode=active-backup miimon=100 primary=$NIC1\"" >> $BOND_FILE

        log info "Creating network file $NIC1_FILE"
        echo "DEVICE=$NIC1" > $NIC1_FILE
        echo "BOOTPROTO=none" >> $NIC1_FILE
        echo "HWADDR=$(getMAC $NIC1)" >> $NIC1_FILE
        echo "ONBOOT=yes" >> $NIC1_FILE
        echo "MASTER=$BOND" >> $NIC1_FILE
        echo "SLAVE=yes" >> $NIC1_FILE
        echo "USERCTL=no" >> $NIC1_FILE

        log info "Creating network file $NIC2_FILE"
        echo "DEVICE=$NIC2" > $NIC2_FILE
        echo "BOOTPROTO=none" >> $NIC2_FILE
        echo "HWADDR=$(getMAC $NIC2)" >> $NIC2_FILE
        echo "ONBOOT=yes" >> $NIC2_FILE
        echo "MASTER=$BOND" >> $NIC2_FILE
        echo "SLAVE=yes" >> $NIC2_FILE
        echo "USERCTL=no" >> $NIC2_FILE

        log info "Modifying modprobe.conf file /etc/modprobe.conf"
        echo "alias $BOND bonding" >> /etc/modprobe.conf
}

####
## function getMAC {nic}
## gets the MAC address for a given interface using ifconfig
####
getMAC() {
        HWINFO=`$IFCONFIG $1 | grep HWaddr` # eth0      Link encap:Ethernet     HWaddr 00:50:56:9C:1B:00
        if [ $? -ne 0 ]
        then
                log error "Cannot find MAC address for interface $1"
                # return nothing to the calling process
                echo " "
        else
                # return the MAC address 
                echo $HWINFO | awk '{print $5}'
        fi
}

####
## function genDNSFile {nameserver} {nameserver} {etc}
## creates a basic DNS file for nameserver entries
####
genDNSFile() {
        log info "Creating DNS file $DNS_FILE"
        OldIFS=$IFS
        IFS=" "
        > $DNS_FILE
        # create search entries
        echo "search $DOMAIN_LIST" >> $DNS_FILE
        # create server entries
        for dnsEntry in $1
        do
                echo "nameserver $dnsEntry" >> $DNS_FILE
        done
        IFS=$OldIFS
}
####
## function genHostFile {local_ip}
## creates a basic hosts file with loopback and this host
####
genHostFile() {
        local IP=$1
        log info "Creating host file $HOST_FILE"
        echo "127.0.0.1         localhost.localdomain localhost" > $HOST_FILE
        echo "$IP               `hostname`" >> $HOST_FILE

}

####
## function genGWFile {gateway_ip}
## create the default route file including default RedHat values
####
genGWFile() {
        local GW=$1
        log info "Creating gateway file $GW_FILE"
        echo "NETWORKING=yes" > $GW_FILE
        echo "NETWORKING_IPV6=no" >> $GW_FILE
        echo "HOSTNAME=`hostname`" >> $GW_FILE
        echo "GATEWAY=$GW" >> $GW_FILE
}

####
## function genIPFile {nic} {mac} {ip} {mask} {broadcast} {network}
## create the IP Address file (ifcfg-eth{x})
## e.g. eth0 00:50:56:9C:1B:00 192.168.0.10 255.255.255.0 192.168.0.255 192.168.0.0
####
genIPFile() {
        local NIC=$1
        local MAC=$2
        local IP=$3
        local MASK=$4
        local BROADCAST=$5
        local NETWORK=$6
        local IP_FILE=${NIC_FILE_DIR}ifcfg-${NIC}

        log info "Creating network file $IP_FILE"
        echo "DEVICE=$NIC" > $IP_FILE
        echo "BOOTPROTO=static" >> $IP_FILE
        echo "BROADCAST=$BROADCAST" >> $IP_FILE
        echo "HWADDR=$MAC" >> $IP_FILE
        echo "IPADDR=$IP" >> $IP_FILE
        echo "NETMASK=$MASK" >> $IP_FILE
        echo "NETWORK=$NETWORK" >> $IP_FILE
        log debug "----------- ifcfg-$NIC file -----------"
        log debug "\n`cat $IP_FILE`"
        log debug "----------------------"
}

####
## function getNetworkAddress
## calculates the network address given an ip and subnet mask
## converts the ip and mask into an array and does a bitwise and for each element
####
getNetworkAddress() {
        OldIFS=$IFS
        IFS=.
        typeset -a IP_Array=($1)
        typeset -a MASK_Array=($2)
        IFS=$OldIFS
        echo $((${IP_Array[0]} & ${MASK_Array[0]})).$((${IP_Array[1]} & ${MASK_Array[1]})).$((${IP_Array[2]} & ${MASK_Array[2]})).$((${IP_Array[3]} & ${MASK_Array[3]}))
}

####
## function getBroadcastAddress
## calculates the broadcast address given an ip and subnet mask
## converts the ip and mask into an array and does a bitwise or (|) against an XOR (^)
####
getBroadcastAddress() {
        OldIFS=$IFS
        IFS=.
        typeset -a IP_Array=($1)
        typeset -a MASK_Array=($2)
        IFS=$OldIFS
        echo $((${IP_Array[0]} | (255 ^ ${MASK_Array[0]}))).$((${IP_Array[1]} | (255 ^ ${MASK_Array[1]}))).$((${IP_Array[2]} | (255 ^ ${MASK_Array[2]}))).$((${IP_Array[3]} | (255 ^ ${MASK_Array[3]})))
}

####
## function readDHCPAddress
## reads information currently running and writes it out as a static IP entry
####
readDHCPAddress() {
        log info "Host information was not found for this server, copying information from running configuration (DHCP)"
        # the grep will grab two lines of output and merge them together
        # eth0      Link encap:Ethernet  HWaddr 00:50:56:9C:1B:00
        # inet addr:192.168.0.10  Bcast:192.168.0.255  Mask:255.255.255.0
        HWINFO=`$IFCONFIG | grep -A 1 -i hwaddr`
        NIC=`echo $HWINFO | cut -f1 -d " "`
        MAC=`echo $HWINFO | cut -f5 -d " "`
        for i in $HWINFO
        do
                case "$i" in
                addr:*)
                        IP=`echo $i | cut -f2 -d :`
                        ;;
                Bcast:*)
                        BROADCAST=`echo $i | cut -f2 -d :`
                        ;;
                Mask:*)
                        MASK=`echo $i | cut -f2 -d :`
                        ;;
                esac
        done
        NETWORK=$(getNetworkAddress $IP $MASK)
        log debug "DHCP information is NIC $NIC - MAC $MAC - IP $IP - MASK $MASK - BROADCAST $BROADCAST - NETWORK $NETWORK"
        genIPFile $NIC $MAC $IP $MASK $BROADCAST $NETWORK
        genHostFile $IP
        GATEWAY=`netstat -rn | grep -w UG | awk '{print $2}'`
        genGWFile $GATEWAY
}

####
## function log
## logs activities to the screen, a file, or both
####
log() {
        LOG_TYPE="$1"
        LOG_MSG="$2"
        TIME=`date +'%H:%M:%S %Z'`
        # specify the log file only once
        if [ ! -d $SOURCE_DIR/logs ]
        then
                mkdir ${SOURCE_DIR}/logs
        fi
        if [ -z $LOG_FILE ]
        then
                LOG_FILE="$SOURCE_DIR/logs/network_config-`hostname`-`date +%Y%m%d-%H%M%S`"
        fi
        if [ $LOG_TYPE == "error" ]
        then
                echo -e "$TIME - **ERROR** - $LOG_MSG" >> $LOG_FILE
        elif [ $LOG_TYPE == "debug" ]
        then
                if [ $DEBUG == "on" ]
                then
                        echo -e "DEBUG - $LOG_MSG" >> "$LOG_FILE"
                fi
        elif [ $LOG_TYPE == "warning" ]
        then
                echo -e "$TIME - **WARNING** - $LOG_MSG" >> $LOG_FILE
        else
                echo -e "$TIME - $LOG_MSG" >> "$LOG_FILE"
        fi
}

# read source directory from command line.  This is where we will read the hostfile.csv and output logs to
SOURCE_DIR=$1
HOST_MAP_FILE=$SOURCE_DIR/hostfile.csv

readHostFile
if [ $? -ne 0 ]
then
        readDHCPAddress
fi
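
The network and broadcast calculations are easier to follow with real numbers. The sketch below repeats the same bitwise arithmetic as getNetworkAddress and getBroadcastAddress in one place; net_and_bcast is a hypothetical name used only for this illustration.

```shell
# net_and_bcast <ip> <mask>
# Hypothetical helper: prints the network and broadcast addresses.
net_and_bcast() {
    local old_ifs="$IFS"
    IFS=.
    set -f
    set -- $1 $2        # $1-$4 are the IP octets, $5-$8 the mask octets
    set +f
    IFS="$old_ifs"
    # network: each IP octet ANDed with the matching mask octet
    echo "network=$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
    # broadcast: each IP octet ORed with the inverted mask octet
    echo "broadcast=$(( $1 | (255 ^ $5) )).$(( $2 | (255 ^ $6) )).$(( $3 | (255 ^ $7) )).$(( $4 | (255 ^ $8) ))"
}

# Example: net_and_bcast 192.168.0.10 255.255.255.0
# prints:
#   network=192.168.0.0
#   broadcast=192.168.0.255
```

For the last octet, 10 & 0 gives 0 for the network, and 10 | (255 ^ 0) gives 255 for the broadcast.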

Sunday, October 3, 2010

Modular KickStart

I am a big fan of the modular kickstart setup. It makes administering multiple configurations easier and tends to keep things clean and consistent. KickStart allows this through the %include directive, but referencing an external file requires a small trick. You see, the kickstart configuration file is in fact read twice: once looking for any %pre directives, and again to actually parse the file. So, it's in this %pre section that you can set up your config file location.

I like to use NFS for my configuration and OS files. It is relatively easy to set up and can allow multiple kickstart servers to reference it if required. Here is a sample pre section:
%pre
mkdir /post_scripts
mount -t nfs -o nolock NFS_SERVER:/tftpboot/kickstart /post_scripts
Basically it says to mount a share to /post_scripts so I can reference the files. On the server I place my configuration files in /tftpboot/kickstart/ks_cfg/ and call them like this:
%include /post_scripts/ks_cfg/CONFIG_FILE
Here are the basic sections of my kickstart file:
# cat /tftpboot/kickstart/ks_cfg/oracle_5.5.ks
install
nfs --server=NFS_SERVER --dir=/tftpboot/kickstart/OS/RHEL5.5/
key --skip
lang en_US.UTF-8
keyboard us
xconfig --startxonboot
network --bootproto dhcp 
rootpw --iscrypted YOUR_PASSWORD
firewall --disabled
firstboot --disable
authconfig --enableshadow --enablemd5
selinux --disabled
timezone --utc America/Vancouver
bootloader --location=mbr --append="rhgb quiet"
clearpart --all --initlabel
part /boot --fstype="ext3" --size=100
part pv.2 --size=0 --grow
volgroup rootvg --pesize=32768 pv.2
logvol swap --fstype="swap" --name=swap --vgname=rootvg --size=2048
logvol /var --fstype="ext3" --name=var --vgname=rootvg --size=2048
logvol / --fstype="ext3" --name=root --vgname=rootvg --size=1 --grow
reboot

%packages
%include /post_scripts/ks_cfg/packages.cfg

%post
chvt 3
%include /post_scripts/ks_cfg/generic.cfg
%include /post_scripts/ks_cfg/oracle.cfg
%include /post_scripts/ks_cfg/multipath.cfg

%pre
mkdir /post_scripts
mount -t nfs -o nolock NFS_SERVER:/tftpboot/kickstart /post_scripts

# cat /tftpboot/kickstart/ks_cfg/packages.cfg
@editors
@gnome-desktop
@core
@base
@ftp-server
@network-server
@java
@legacy-software-support
@base-x
@server-cfg
@admin-tools
@graphical-internet
emacs
kexec-tools
fipscheck
device-mapper-multipath
dnsmasq
xorg-x11-utils
system-config-boot
# Oracle Required Packages
elfutils-libelf-devel
gcc
gcc-c++
glibc-devel
libaio-devel
libstdc++-devel
sysstat
# these are for Oracle 10G
libXp
openmotif
# Oracle 11GR2
unixODBC
unixODBC-devel
I'll cover some of the more 'advanced' features in a later post.
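
With the configuration split across several files, a missing include tends to show up only once an install is under way, so a quick check on the server beforehand can save a reboot. The following is a rough sketch under stated assumptions: check_includes is a hypothetical helper, and it assumes the client-side prefix (/post_scripts) maps directly onto the exported directory on the server.

```shell
# check_includes <kickstart file> <client mount point> <server export dir>
# Hypothetical helper: reports every %include target that does not exist
# on the server. Returns 1 if anything is missing, 0 otherwise.
check_includes() {
    local ks_file="$1" client_root="$2" server_root="$3"
    local path server_path missing=0
    for path in $(sed -n 's/^%include[[:space:]]*//p' "$ks_file"); do
        # swap the client-side prefix for the server-side directory
        server_path=$server_root${path#"$client_root"}
        if [ ! -f "$server_path" ]; then
            echo "missing: $server_path"
            missing=1
        fi
    done
    return "$missing"
}

# Example (paths follow the layout described above):
# check_includes /tftpboot/kickstart/ks_cfg/oracle_5.5.ks /post_scripts /tftpboot/kickstart
```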

Saturday, September 18, 2010

Linux PXE Boot

Over the next few posts I thought I would outline my version of diskless booting and imaging a Linux system (Kickstart specifically). To start with we'll be setting up PXELINUX and the required services, TFTP, NFS and DHCP.

TFTP Server
When PXE booting a client, the Trivial File Transfer Protocol (TFTP) daemon is responsible for passing the kernel binaries. Because its directory structure forms the basis for the rest of our files, it's a good idea to have it installed first. Once installed, enable the daemon by editing /etc/xinetd.d/tftp and changing disable = yes to disable = no, or you can run a sed script as shown below, then restart xinetd. This is also the file to edit if you want to place files in a different folder. The default is /tftpboot, which I will be using through the rest of the example.
SuSE
# zypper install tftp
RedHat
# yum install tftp

# sed -e 's/disable.*=\ yes/disable\t\t\t= no/' /etc/xinetd.d/tftp > /etc/xinetd.d/tftp.temp
# mv /etc/xinetd.d/tftp.temp /etc/xinetd.d/tftp
# /etc/init.d/xinetd restart
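
If you want to confirm the edit took before relying on it, a quick check along these lines can help. This is only a sketch: the function name tftp_is_enabled is my own invention, and it assumes the usual one-option-per-line layout of an xinetd service file.

```bash
#!/bin/bash
# Hypothetical helper (not part of any package): succeed if the given
# xinetd service file contains "disable = no".
tftp_is_enabled() {
    local conf="${1:-/etc/xinetd.d/tftp}"
    # Accept any mix of tabs and spaces around the "=".
    grep -Eq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*no[[:space:]]*$' "$conf"
}
```

Something like tftp_is_enabled && echo "tftp is enabled" reports the state of the default file.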

PXELINUX
PXELINUX is a derivative project of SYSLINUX which allows you to boot Linux from a network server. I used to use Etherboot for this purpose but it requires a device specific image to be pushed to the card. This can create all kinds of headaches whereas PXELINUX is quite simple; one binary for all cards as long as they conform to the Intel PXE specification (Preboot Execution Environment). I don't know of a server today that doesn't have a PXE compliant NIC and in fact many desktop cards do as well.

Most distributions have a package for SYSLINUX which includes PXELINUX, or if you want to do things manually you can grab the binaries from here. This is perhaps already installed with a default directory structure which I also follow.
SuSE
# zypper install syslinux
RedHat
# yum install syslinux
There should be a directory under /tftpboot called linux-install with a single file, pxelinux.0. If not you can create them as shown below.
# mkdir -p /tftpboot/linux-install
# cp /usr/share/syslinux/pxelinux.0 /tftpboot/linux-install

NFS Server
The NFS server requirements are pretty basic and largely depend on what you want to do. For now we will set up two exports, one for diskless booting and another for Kickstart. They don't have to be on the same physical box; if you have a fancy NAS server you can host them there, but for the purposes of this exercise we will host them under /tftpboot/boot-image and /tftpboot/kickstart. Edit /etc/exports as shown and start up your NFS server.
# mkdir -p /tftpboot/boot-image
# mkdir -p /tftpboot/kickstart
# vi /etc/exports
/tftpboot/boot-image *(rw,no_root_squash)
/tftpboot/kickstart *(rw,no_root_squash)
# /etc/init.d/nfsserver start

DHCP Server
Aside from the details of setting up PXELINUX itself, the DHCP server contains most of the configuration information. Listed below is a generic configuration that should give you a good starting point. I generally have a listing for each one of my servers as it provides me with some additional flexibility such as providing general DHCP services to other hosts on the same network.

The first option, ddns-update-style, turns off dynamic DNS. The second, next-server, tells the client which server it should retrieve its initial boot file from, as specified by the filename option. Otherwise things should be pretty straightforward.
ddns-update-style       none;
next-server             192.168.0.1;

subnet 192.168.0.0 netmask 255.255.255.0 {

}

group {
        option domain-name              "pxeboot.net";
        filename                        "linux-install/pxelinux.0";
        option routers                  ;
        option domain-name-servers      ;
        use-host-decl-names             on;
        host node01 {
                hardware ethernet       00:AE:56:9C:5D:81;
                fixed-address           192.168.0.10;
        }
        host node02 {
                hardware ethernet       00:50:56:99:4E:41;
                fixed-address           192.168.0.11;
        }
        host node03 {
                hardware ethernet       00:50:56:99:E0:68;
                fixed-address           192.168.0.12;
        }
}

Once this is completed you will probably have to edit the dhcpd configuration file, /etc/sysconfig/dhcpd, in order to tell it which interface to listen to. It's generally the first option (after the comments) and is called DHCPD_INTERFACE. Simply append on the appropriate interface and restart dhcpd with /etc/init.d/dhcpd restart. There is generally decent error information on the command line if you have any typos or missed anything.
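
With more than a handful of machines, typing the host entries by hand gets tedious. Here is a minimal sketch of a generator for them; dhcp_host_entry is a name I made up, and the indentation simply mirrors the group block above.

```bash
#!/bin/bash
# Hypothetical helper: print one dhcpd host stanza.
# Arguments: host name, MAC address, fixed IP address.
dhcp_host_entry() {
    local name="$1" mac="$2" ip="$3"
    printf '        host %s {\n' "$name"
    printf '                hardware ethernet       %s;\n' "$mac"
    printf '                fixed-address           %s;\n' "$ip"
    printf '        }\n'
}
```

For example, dhcp_host_entry node04 00:50:56:99:AA:01 192.168.0.13 prints a stanza that can be pasted inside the group (the MAC and address here are made up for illustration).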

Configuring PXELINUX
When PXELINUX is booted it will look in a sub-directory called pxelinux.cfg to find a configuration file for a specific host. Because this is a relative directory, ours will be called /tftpboot/linux-install/pxelinux.cfg. There is an order to the files it's looking for which you can read all about here but given that this is a generic boot server, I am going to use the catch-all default file.
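
To make the search order concrete, here is a rough sketch that prints the file names PXELINUX tries for a given client, as I understand the documented behaviour: the MAC address prefixed with the ARP type 01, then the IP address in upper-case hex with one digit dropped each round, then default. Newer versions also try the client UUID first, which I have left out. The function name is my own.

```bash
#!/bin/bash
# Hypothetical helper: list the config file names PXELINUX looks for,
# in order, given a MAC address and an IPv4 address (UUID step omitted).
pxe_config_candidates() {
    local mac="$1" ip="$2" hex
    # 1. ARP type 01 (Ethernet) plus the MAC, lower case, dash separated.
    echo "01-$(echo "$mac" | tr 'A-F:' 'a-f-')"
    # 2. The IP address as upper-case hex, dropping one digit at a time.
    hex=$(printf '%02X%02X%02X%02X' $(echo "$ip" | tr '.' ' '))
    while [ -n "$hex" ]; do
        echo "$hex"
        hex="${hex%?}"
    done
    # 3. The catch-all file.
    echo "default"
}
```

Running pxe_config_candidates 00:AE:56:9C:5D:81 192.168.0.10 starts with 01-00-ae-56-9c-5d-81 and C0A8000A and ends with default, which is the file used in this post.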

In the example below I use numbers to control what to boot from rather than typing names. There is a display option display msgs/logicalshift.msg that shows the available boot choices (see below) and a corresponding label which does the actual work. My default is a generic installation, there is a prompt for the user to input a value and a 10 second timeout. You can find more about these options from the syslinux man pages.

Under each kernel I have two important options. The append option tells the kernel which initrd to load, where to find its kickstart file, and which device to use for the kickstart; in this case the interface that was booted from. The ipappend option tells PXELINUX to pass the IP information given by the DHCP server along to the kernel.

In my configuration I specify a local boot option, a setup to run completely from an NFS server and two kickstart configurations; a generic install and one pre-configured to run Oracle. More on these to follow.

/tftpboot/linux-install/pxelinux.cfg/default
default 4
prompt 1
timeout 100
display msgs/logicalshift.msg

label 1
        localboot 0

label 2
        kernel CUSTOM/bzImage_x64-2.6.34
        append rw nfsroot=:/tftpboot/boot-image ip=dhcp

label 3
        kernel RHEL_5.5/vmlinuz
        append ksdevice=bootif initrd=RHEL_5.5/initrd.img ks=nfs::/tftpboot/KickStart/cfg/generic_5.5.ks
        ipappend 2

label 4
        kernel RHEL_5.5/vmlinuz
        append ksdevice=bootif initrd=RHEL_5.5/initrd.img ks=nfs::/tftpboot/KickStart/cfg/oracle_5.5.ks
        ipappend 2


/tftpboot/linux-install/pxelinux.cfg/msgs/logicalshift.msg
This file doesn't provide any logic; it is just for display, to give the user the available options. Escape characters are there to provide some colour, just for fun. I use an online ASCII art generator with a rounded font from here if you are interested in doing your own version.
^L

           ^O09Welcome to the ^O0cLogical Shift Installer^O07
^O0a
 _             _             _          _     _    ___       
| |           (_)           | |        | |   (_)  / __)  _   
| | ___   ____ _  ____ _____| |     ___| |__  _ _| |__ _| |_ 
| |/ _ \ / _  | |/ ___|____ | |    /___)  _ \| (_   __|_   _)
| | |_| ( (_| | ( (___/ ___ | |   |___ | | | | | | |    | |_ 
 \_)___/ \___ |_|\____)_____|\_)  (___/|_| |_|_| |_|     \__)
        (_____|                                              

                           
^O07

Enter number of the Operating System you wish to install:

1. Local (HDD) boot
2. Custom NFS root
3. Generic RHEL 5.5
4. RHEL 5.5 with Oracle Prerequisites (default)

Boot Kernel and KickStart File
One last step remains to give us some basic functionality: a kernel and initial RAM disk to boot from and a kickstart file to install from. I like to keep multiple kernel versions around so I have a directory relative to pxelinux.0 called RHEL_5.5.

In my PXE config file above I reference a directory for RedHat 5.5. This is again a directory relative to pxelinux.0 so it will be located under /tftpboot/linux-install as RHEL_5.5. You can get the kernel and initial RAM disk from the RedHat DVD; the steps are outlined below. The easiest way to create a kickstart file is to perform a manual install, save the results and copy the file to the NFS location specified on the append line of your pxelinux.cfg/default file. From this baseline you can customize your installation but I will cover that in a later post.
# mkdir -p /tftpboot/linux-install/RHEL_5.5
# mount -o loop rhel_5.5.iso /mnt
# cp /mnt/images/pxeboot/vmlinuz /tftpboot/linux-install/RHEL_5.5/
# cp /mnt/images/pxeboot/initrd.img /tftpboot/linux-install/RHEL_5.5/
# umount /mnt

Saturday, August 7, 2010

Cisco VPN chkconfig errors

There is (at least) one more error I have seen when running the Cisco VPN client under Linux. For those systems that use LSB or Linux Standard Base (you can see my blog entry on startup scripts here) you will get an error whenever running chkconfig like this:
insserv: warning: script 'K01vpnclient_init' missing LSB tags and overrides
insserv: warning: script 'vpnclient_init' missing LSB tags and overrides
To fix this message, edit the /etc/init.d/vpnclient_init script to bring it up to LSB standards. I put the following just above the Source function library line and just after the chkconfig information included with the comments.
### BEGIN INIT INFO
# Provides: ciscovpn
# Required-Start: $network
# Required-Stop:
# Default-Start: 3 4 5
# Default-Stop: 0 1 2 6
# Short-Description: vpnclient
# Description: Cisco VPN Client
### END INIT INFO
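
To see which other scripts are likely to trigger the same warning, a rough check like the following can be pointed at anything under /etc/init.d. It is only a sketch: has_lsb_tags is a made-up name, and the three keywords it insists on are my own choice rather than the exact rule insserv applies.

```bash
#!/bin/bash
# Hypothetical helper: succeed if an init script carries an LSB header
# block with Provides, Default-Start and Default-Stop keywords.
has_lsb_tags() {
    local script="$1" tag
    grep -q '^### BEGIN INIT INFO' "$script" || return 1
    grep -q '^### END INIT INFO' "$script" || return 1
    for tag in Provides Default-Start Default-Stop; do
        grep -q "^# ${tag}:" "$script" || return 1
    done
}
```

A loop such as for f in /etc/init.d/*; do has_lsb_tags "$f" || echo "$f"; done lists the candidates.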

Friday, July 30, 2010

Cisco VPN Scripts

I have found that the Cisco VPN client occasionally hangs up on its connection. The reason is that the client removes the local network route, which works fine until the MAC address cache expires and needs to be refreshed. Linux can't find its local route and basically drops the vpn, at which time the Cisco client conveniently puts the route back, making it extra hard to track.

The really nasty part is that you can't add the route ahead of time, as the client will just remove it no matter how many there are, and you can't add it later from the same shell, as any attempt to background the client will end badly. My solution is to capture the local route and background a sub-shell which will add the route 60 seconds after the vpn client starts. I thought this would give enough time for the user to establish the connection without being so long that the MAC address cache expires. Here it is:
#!/bin/bash
# look for the interface that is up with a gateway assigned (UG) and grab the last field
DEV=$(netstat -rn | awk '$4 ~ /UG/ {print $NF; exit}')
# This should just grab the one local route for the default interface
NETSTAT=$(netstat -rn | grep "$DEV" | grep -v "^127\|^0.0\|169.254" | head -n 1)
NETWORK=$(echo "$NETSTAT" | awk '{print $1}')
MASK=$(echo "$NETSTAT" | awk '{print $3}')

# This says after 60 seconds add the local route back in
# you can't do this after vpnclient as it can't be backgrounded without a username / password on the command line
(sleep 60 && sudo /sbin/route add -net $NETWORK netmask $MASK dev $DEV)&

# Finally run the vpnclient
vpnclient connect mypcf_file
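
If you want to check what the script will pick up before trusting it with your routes, the parsing step can be pulled out into a function that reads netstat -rn output on standard input. This is a sketch only: local_route is my own name, and it assumes the usual Linux column layout (Destination, Gateway, Genmask, Flags, ..., Iface).

```bash
#!/bin/bash
# Hypothetical helper: read `netstat -rn` output on stdin and print
# "network netmask device" for the local route on the default interface.
local_route() {
    awk '
        # Keep every line; the default route is often listed last.
        { line[NR] = $0 }
        # The interface of the first gateway route (flags contain UG).
        $4 ~ /UG/ && dev == "" { dev = $NF }
        END {
            for (i = 1; i <= NR; i++) {
                n = split(line[i], f, " ")
                if (f[n] != dev) continue
                # Skip loopback, default and link-local entries.
                if (f[1] ~ /^(127\.|0\.0\.0\.0|169\.254\.)/) continue
                print f[1], f[3], f[n]
                exit
            }
        }'
}
```

Running netstat -rn | local_route before connecting shows the three values the sub-shell would hand to route add.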

In order to make this work, your user account has to be able to execute sudo for /sbin/route without a password. For me I added my group to /etc/sudoers with the following entry:
# visudo
%users  ALL=(ALL) NOPASSWD:  /sbin/route

Tuesday, July 27, 2010

Cisco VPN Installation

The Cisco VPN module has been a bit of a sore point to get compiled and running. Here are some instructions I used under OpenSuSE 11.3 with kernel 2.6.34-12-desktop but it should work on other distributions too.

You are going to need three pieces of code: the VPN client, a 64 bit patch, and a patch to work with a 2.6.31+ kernel.
So let's see what happens with just the base VPN client:
# tar -zxvf vpnclient-linux-x86_64-4.8.02.0030-k9.tar.gz
# cd vpnclient
# ./vpn_install
Making module
make -C /lib/modules/2.6.34-12-desktop/build SUBDIRS=/home/mike/cisco/vpnclient modules
make[1]: Entering directory `/usr/src/linux-2.6.34-12-obj/x86_64/desktop'
make -C ../../../linux-2.6.34-12 O=/usr/src/linux-2.6.34-12-obj/x86_64/desktop/. modules
/usr/src/linux-2.6.34-12/scripts/Makefile.build:49: *** CFLAGS was changed in "/home/mike/cisco/vpnclient/Makefile". Fix it to use EXTRA_CFLAGS.  Stop.
make[3]: *** [_module_/home/mike/cisco/vpnclient] Error 2
make[2]: *** [sub-make] Error 2
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/src/linux-2.6.34-12-obj/x86_64/desktop'
make: *** [default] Error 2
Failed to make module "cisco_ipsec.ko".
Not so good, let's install the 64 bit patch and see what happens:
# patch < ../vpnclient-linux-4.8.02-64bit.patch
patching file Makefile
patching file frag.c
patching file interceptor.c
patching file linuxcniapi.c
patching file linuxkernelapi.c
# ./vpn_install
Making module
make -C /lib/modules/2.6.34-12-desktop/build SUBDIRS=/home/mike/cisco/vpnclient modules
make[1]: Entering directory `/usr/src/linux-2.6.34-12-obj/x86_64/desktop'
make -C ../../../linux-2.6.34-12 O=/usr/src/linux-2.6.34-12-obj/x86_64/desktop/. modules
  CC [M]  /home/mike/cisco/vpnclient/linuxcniapi.o
/home/mike/cisco/vpnclient/linuxcniapi.c:14:28: fatal error: linux/autoconf.h: No such file or directory
compilation terminated.
make[4]: *** [/home/mike/cisco/vpnclient/linuxcniapi.o] Error 1
make[3]: *** [_module_/home/mike/cisco/vpnclient] Error 2
make[2]: *** [sub-make] Error 2
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/src/linux-2.6.34-12-obj/x86_64/desktop'
make: *** [default] Error 2
Failed to make module "cisco_ipsec.ko".
Now we have a strange error message about a missing autoconf.h file. To fix this we need to know what kernel we are running by using uname. In my case it is 2.6.34-12-desktop. It is the desktop portion that is important as under /usr/src/linux-2.6.34-12-obj/x86_64 there are a few directories, default, desktop, and xen. You need to make sure you are working with the correct one. To get around the error just touch an empty file:
# touch /usr/src/linux-2.6.34-12-obj/x86_64/desktop/include/linux/autoconf.h
# ./vpn_install
Making module
make -C /lib/modules/2.6.34-12-desktop/build SUBDIRS=/home/mike/cisco/vpnclient modules
make[1]: Entering directory `/usr/src/linux-2.6.34-12-obj/x86_64/desktop'
make -C ../../../linux-2.6.34-12 O=/usr/src/linux-2.6.34-12-obj/x86_64/desktop/. modules
  CC [M]  /home/mike/cisco/vpnclient/linuxcniapi.o
  CC [M]  /home/mike/cisco/vpnclient/frag.o
  CC [M]  /home/mike/cisco/vpnclient/IPSecDrvOS_linux.o
  CC [M]  /home/mike/cisco/vpnclient/interceptor.o
/home/mike/cisco/vpnclient/interceptor.c: In function ‘interceptor_init’:
/home/mike/cisco/vpnclient/interceptor.c:132:8: error: ‘struct net_device’ has no member named ‘hard_start_xmit’
/home/mike/cisco/vpnclient/interceptor.c:133:8: error: ‘struct net_device’ has no member named ‘get_stats’
/home/mike/cisco/vpnclient/interceptor.c:134:8: error: ‘struct net_device’ has no member named ‘do_ioctl’
/home/mike/cisco/vpnclient/interceptor.c: In function ‘add_netdev’:
/home/mike/cisco/vpnclient/interceptor.c:271:33: error: ‘struct net_device’ has no member named ‘hard_start_xmit’
/home/mike/cisco/vpnclient/interceptor.c:272:8: error: ‘struct net_device’ has no member named ‘hard_start_xmit’
/home/mike/cisco/vpnclient/interceptor.c: In function ‘remove_netdev’:
/home/mike/cisco/vpnclient/interceptor.c:294:12: error: ‘struct net_device’ has no member named ‘hard_start_xmit’
make[4]: *** [/home/mike/cisco/vpnclient/interceptor.o] Error 1
make[3]: *** [_module_/home/mike/cisco/vpnclient] Error 2
make[2]: *** [sub-make] Error 2
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/src/linux-2.6.34-12-obj/x86_64/desktop'
make: *** [default] Error 2
Failed to make module "cisco_ipsec.ko".
Got rid of that autoconf.h message but now we have an interceptor problem. The 2.6.31 patch will take care of that for us.
# patch < ../vpnclient-linux-2.6.31-final.diff
# ./vpn_install
Making module
make -C /lib/modules/2.6.34-12-desktop/build SUBDIRS=/home/mike/cisco/vpnclient modules
make[1]: Entering directory `/usr/src/linux-2.6.34-12-obj/x86_64/desktop'
make -C ../../../linux-2.6.34-12 O=/usr/src/linux-2.6.34-12-obj/x86_64/desktop/. modules
  CC [M]  /home/mike/cisco/vpnclient/interceptor.o
/home/mike/cisco/vpnclient/interceptor.c: In function ‘add_netdev’:
/home/mike/cisco/vpnclient/interceptor.c:284:5: error: assignment of read-only location ‘*dev->netdev_ops’
/home/mike/cisco/vpnclient/interceptor.c: In function ‘remove_netdev’:
/home/mike/cisco/vpnclient/interceptor.c:311:9: error: assignment of read-only location ‘*dev->netdev_ops’
make[4]: *** [/home/mike/cisco/vpnclient/interceptor.o] Error 1
make[3]: *** [_module_/home/mike/cisco/vpnclient] Error 2
make[2]: *** [sub-make] Error 2
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/src/linux-2.6.34-12-obj/x86_64/desktop'
make: *** [default] Error 2
Failed to make module "cisco_ipsec.ko".
One more error to fix. This one involves changing netdevice.h in the kernel source tree from const struct net_device_ops *netdev_ops to just struct net_device_ops *netdev_ops. We can do that with one line as shown below:
# sed -i 's/const\ struct\ net_device_ops\ \*netdev_ops;/struct\ net_device_ops\ \*netdev_ops;/' `find /usr/src -name netdevice.h`
# ./vpn_install
Success, the module compiles and installs. Now we just need to run it. To do this you will need a pcf file from your VPN administrator. For me, I took the file from a Windows client and modified it slightly by removing the value for the ISPPhonebook entry. Place this in /etc/opt/cisco-vpnclient/Profiles and then connect with vpnclient connect PCF_FILE.

Sunday, July 25, 2010

Left Over .nfs (dot nfs) Files

I recently had a situation where an NFS file system was constantly filling with these strange .nfs files. If a file is removed while a running process still has it open, that file is renamed as .nfs and a long hex string. The symptoms are fairly easy to reproduce:
# touch testfile
# tail -f testfile
From another session:
# rm testfile
# ls -la
-rw-r--r-- 1 root    root    0 Jul 23 13:31 .nfs000000000033468f00000003
To find the offending process, run lsof .nfsxxxxx and kill it. Because NFS mounts can be spread across several clients, it may take a bit of searching to find the right one. Once the process is terminated the client should automatically clean up the file.

I should also point out that in my testing, both the process and the delete operation have to come from the same client. NFS doesn't enforce any file locking which means if a file is deleted from another machine the system doesn't know to rename it first. It is left up to the application to sort this out. Most do nothing about it which means you will get a stale NFS handle message on the source process.
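
When the export is large, the first job is simply finding the leftovers. A minimal sketch, with a function name of my own choosing:

```bash
#!/bin/bash
# Hypothetical helper: list leftover .nfs files under a directory
# (defaults to the current directory).
find_dot_nfs() {
    find "${1:-.}" -type f -name '.nfs*' 2>/dev/null
}
```

Each path it prints can then be handed to lsof on the client that has the file open.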

Tuesday, June 29, 2010

Dynamic CPU Cores

A neat trick I learned to disable and re-enable a CPU core dynamically in Linux. Handy for testing.
Disable a core
# echo 0 > /sys/devices/system/cpu/cpu1/online
Enable a core
# echo 1 > /sys/devices/system/cpu/cpu1/online
You can't disable CPU0 but all others are fair game.
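
To see the current state of every core at a glance, something like this works. It is a sketch: cpu_online_status is a made-up name, and the sysfs root is a parameter only so the function can be tried against a scratch directory. On systems where cpu0 has no online file it simply doesn't show up in the list.

```bash
#!/bin/bash
# Hypothetical helper: print "cpuN state" for every core that exposes
# an "online" control file under the given sysfs directory.
cpu_online_status() {
    local root="${1:-/sys/devices/system/cpu}" f cpu
    for f in "$root"/cpu[0-9]*/online; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        cpu=$(basename "$(dirname "$f")")
        echo "$cpu $(cat "$f")"
    done
}
```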

Sunday, June 20, 2010

Linux Virtual File System (VFS)

Every file system under Linux is presented to a user process not directly, but through a virtual file system layer. This allows the underlying structure to change, for example from reiserfs to xfs to ext4, without having to change any application code. For each file system available there is either a loadable or an integrated kernel module. This module is responsible for the low level operations and also for providing standard information back to the VFS layer. You can see which modules have registered by looking at /proc/filesystems.
# cat /proc/filesystems
nodev   sysfs
nodev   rootfs
nodev   bdev
nodev   proc
nodev   tmpfs
nodev   devtmpfs
nodev   debugfs
nodev   securityfs
nodev   sockfs
nodev   usbfs
nodev   pipefs
nodev   anon_inodefs
nodev   inotifyfs
nodev   devpts
        ext3
        ext2
nodev   ramfs
nodev   hugetlbfs
        iso9660
nodev   mqueue
        ext4
nodev   fuse
        fuseblk
nodev   fusectl
nodev   vmblock
The first column indicates if the file system requires a block device or not. The second is the file system name as it is registered to the kernel.
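
If you only care about the file systems that need a real block device, the nodev entries can be filtered out. A small sketch, with a function name of my own:

```bash
#!/bin/bash
# Hypothetical helper: print the registered file systems that need a
# block device, i.e. the lines without the "nodev" marker.
block_filesystems() {
    awk '$1 != "nodev" && NF > 0 { print $1 }' "${1:-/proc/filesystems}"
}
```

Against the listing above this would leave ext3, ext2, iso9660, ext4 and fuseblk.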

When a filesystem is mounted, the mount command always passes three pieces of information to the kernel; the physical block device, the mount point, and the file system type. However, we generally don't specify the file system type at least on the command line and looking at man mount(8), it shows that this information is optional. So how does the kernel know which module to load? As it turns out, mount makes a library call to libblkid which is capable of determining quite a range of file system types. There is a user space program which will also use libblkid, aptly named blkid. Feel free to have a look at the source for blkid to see the full file system list. You can also run it against your local system to see the results it produces.
# blkid /dev/sdb1
/dev/sdb1: UUID="06749374749364E9" TYPE="ntfs"
# blkid /dev/sda1
/dev/sda1: UUID="207abd21-25b1-43bb-81d3-1c8dd17a0600" TYPE="swap"
# blkid /dev/sda2
/dev/sda2: UUID="67ea3939-e60b-4056-9465-6102df51c532" TYPE="ext4"
Of course, if blkid isn't able to determine the type, mount fails with the error mount: you must specify the filesystem type, and the type has to be specified by hand with the -t option. Now if we look at an strace from a mount command we can see the system call in action. The first example is a standard file system requiring a block device, the second is from sysfs. Notice how mount still passes the three options.
# strace mount -o loop /test.img /mnt
...
stat("/sbin/mount.vfat", 0x7fff1bd75b80) = -1 ENOENT (No such file or directory)
mount("/dev/loop0", "/mnt", "vfat", MS_MGC_VAL, NULL) = 0
...

# strace mount -t sysfs sys /sys
...
stat("/sbin/mount.sysfs", 0x7fff21628c30) = -1 ENOENT (No such file or directory)
mount("/sys", "/sys", "sysfs", MS_MGC_VAL, NULL) = 0
...
Looking at the system call mount(2), we can see there are actually five required arguments; source, target, file system type, mount flags, and data. The mount flag in this case is MS_MGC_VAL which is ignored as of the 2.4 kernel but there are several other options that will look familiar. Have a look at the man page for a full list.

The kernel can now request the proper driver (loaded on demand by the kernel's module loader if it isn't already present), which is able to query the superblock from the physical device and initialize its internal variables. There are several fundamental data types held within VFS as well as multiple caches to speed data access.

Superblock
Every mounted file system has a VFS superblock which contains key records to enable retrieval of full file system information. It identifies the device the file system lives on, its block size, file system type, a pointer to the first inode of this file system (a dentry pointer), and a pointer to file system specific methods. These methods allow a mapping between a generic function and a file system specific one. For example a read inode call can be referenced generically under VFS but issue a file system specific command. Applications are able to make common system calls regardless of the underlying structure. It also means VFS is able to cache certain lookup data for performance and provide generic features like chroot for all file systems.

Inodes
An index node (inode) contains the metadata for a file and in Linux, everything is a file. Each VFS inode is kept only in the kernel's memory and its contents are built from the underlying file system. It contains the following attributes: device, inode number, access mode (permissions), usage count, user id (owner), group id (group), rdev (if it's a special file), access time, modify time, change time, size, blocks, block size, a lock, and a dirty flag.

A combination of the inode number and the mounted device is used to create a hash table for quick lookups. When a command like ls makes a request for an inode that is already in the cache, its usage counter is increased and operations continue. If it's not found, a free VFS inode must be found so that the file system can read it into memory. To do this there are two options: new memory space can be provisioned or, if all the available inode cache is used, an existing inode can be reused, selecting from those with a usage count of zero. Once an inode is found, a file system specific method is called to read it from disk and the data is populated as required.

Dentries
A directory entry (dentry) is responsible for managing the file system tree structure. The contents of a dentry are a list of inodes and corresponding file names as well as the parent (containing) directory, the superblock for the file system, and a list of subdirectories. With both the parent and a list of subdirectories kept in each dentry, a chain in either direction can be referenced to allow commands to quickly traverse the full tree. As with inodes, directory entries are cached for quick lookups although instead of a usage count the cache uses a Least Recently Used model. There is also an in-depth article on locking and scalability of the directory entry cache found here.

Data Cache
Another vital service VFS provides is an ability to cache file level data as a series of memory pages. A page is a fixed size of memory and is the smallest unit for performing both memory allocation and transfer between main memory and a data store such as a hard drive. Generally this is 4KB for an x64 based system; however, huge pages are supported in the 2.6 kernel, providing sizes as large as 1GB. You can find the page size for your system by typing getconf PAGESIZE; the result is in bytes.
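
Since caching is done in whole pages, it can be handy to know how many pages a file of a given size will occupy. A minimal sketch, with a function name I made up; the second argument is optional and defaults to whatever getconf reports.

```bash
#!/bin/bash
# Hypothetical helper: whole pages needed to hold a number of bytes.
# The page size defaults to the value getconf reports for this system.
pages_for_bytes() {
    local bytes="$1" page="${2:-$(getconf PAGESIZE)}"
    # Round up, since a partly filled page still takes a full page.
    echo $(( (bytes + page - 1) / page ))
}
```

With 4KB pages, a 1MB file works out to 256 pages and a 4097 byte file needs 2.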

When the overall memory of a system becomes strained, the kernel may decide to swap out portions to available disk. This of course can have a serious impact on application performance; however, there is a way to control this: swappiness. Having a look at /proc/sys/vm/swappiness will show the current value. A lower number means the system will swap less, a higher number means it will swap more. To tell the kernel to avoid swapping wherever it can, type:
# echo 0 > /proc/sys/vm/swappiness
To make this change persistent across a reboot edit /etc/sysctl.conf with the following line
vm.swappiness=0
Of course you may not want to turn swap off entirely so some testing to find the right balance may be in order.

A second layer of caching available to VFS is the buffer cache. Its job is to store copies of physical blocks from a disk. With the 2.4 kernel, a cache entry (referenced by a buffer_head) would contain a copy of one physical block, however, since version 2.6 a new structure has been introduced called a BIO. While the fundamentals remain the same, the BIO is also able to point to other buffers as a chain. This means blocks are able to be logically grouped as a larger entity such as an entire file. This improves performance for common application functions and allows the underlying systems to make better allocation choices.

The Big Picture
Here are the components described above put together.

Controlling VFS
vmstat
vmstat gives us lots of little gems about how the overall memory, CPU, and file system cache are behaving.
# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
1  0      0 6987224  30792 313956    0    0    85    53  358  651  3  1 95  1  0
Of particular interest to VFS is the amount of free memory. From the discussion above, buff refers to the amount of block data cached and cache refers to the amount of file data kept in pages; by default vmstat reports both in kilobytes. The amount of swap used and active swap operations can have a significant performance impact and are also available here, shown as memory swapped in from disk (si) and swapped out to disk (so).

Other items shown are r for the number of processes waiting to be executed (run queue) and b for the number of processes blocking on I/O. Under System, in shows the number of interrupts per second, and cs shows the number of context switches per second. IO shows us the number of blocks in and out from physical disk. Block size for a given file system can be shown using stat -f or tune2fs -l against a physical device.
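
When watching for swap activity over time, it helps to strip vmstat down to the two columns that matter. This is a sketch that assumes the default column layout shown above, where si and so are the seventh and eighth fields; swap_activity is my own name.

```bash
#!/bin/bash
# Hypothetical helper: read vmstat output on stdin and print the
# swap-in and swap-out columns (fields 7 and 8) of each data line.
swap_activity() {
    # Header lines do not start with a number, so they are skipped.
    awk '$1 ~ /^[0-9]+$/ { print "si=" $7, "so=" $8 }'
}
```

For example, vmstat 5 | swap_activity prints one si/so pair every five seconds.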

Flushing VFS
It is possible to manually request a flush of clean blocks from the VFS cache through the /proc file system.
Free page cache
# echo 1 > /proc/sys/vm/drop_caches
Free dentries and inodes
# echo 2 > /proc/sys/vm/drop_caches
Free page cache, dentries, and inodes
# echo 3 > /proc/sys/vm/drop_caches
While not required, it is a good idea to first run sync to force any dirty blocks to disk. An unmount and remount will also flush out all cache entries but can be disruptive depending on other system functions. This can be a useful tool when performing disk based benchmark exercises.

slabtop
slabtop shows overall kernel memory allocation, using the data in /proc/slabinfo, and within that includes some specific statistics pertaining to VFS. Items such as the number of inodes, dentries, and buffer_head entries (a wrapper to BIO) are available.

Saturday, June 5, 2010

Linux Logical Volume Management

There are a few reasons for using Logical Volume Management: extending the capacity of a file system beyond the available physical spindles by spanning disks, having more dynamic control over disk capacity (for example by adding or removing a drive), or creating backups in the form of snapshots. LVM can be applied against any block device such as a physical drive, software raid, or external hardware raid device. The file system is still separate; however, it must be managed in conjunction with LVM to make use of the available blocks appropriately.

In general there are three basic components:

Physical Disk
  • Initially, each drive is simply marked as available for use in a volume group. This writes a Universally Unique Identifier (UUID) to the initial sectors of the disk and prepares it to receive a volume group

Volume Group
  • A collection of physical disks (or partitions if desired). When created this will designate physical extents to all of its member disks, the default being 4MB. It will also record information about all other physical disks in the group and any logical volumes present.

Logical Volume
  • Most of the work happens at this layer. A logical volume is a mapping between a set of physical extents (PE) from the disk to a set of logical extents (LE). The size of these is always the same and generally the quantity matches one to one. However, it is possible to have two PEs mapping to one LE if mirroring is used.

In the example shown, there is one volume group with two physical drives and two logical volumes mapped. Physical blocks that are not assigned to a logical drive are free and can be used to expand either logical drive at a later time.



Creating a Logical Drive
I am not going to bother with mirrored or striped volumes. You could make a case for a stripe to increase performance; however, in general I believe it is better to use either the hardware or software raid functions available as they are better suited for that purpose. The steps are fairly simple: mark the device with pvcreate, create a volume group and then assign a logical volume. Depending on how big your volume group is, you may want to consider altering the default physical extent size. The man page for vgcreate states, if the volume group metadata uses lvm2 format those restrictions [65534 extents in each logical volume] do not apply, but having a large number of extents will slow down the tools but have no impact on I/O performance to the logical volume. So if I was creating a terabyte or larger volume, it's probably a good idea to increase this to 64MB or even 128MB.

# pvcreate /dev/sdb
No physical volume label read from /dev/sdb
Physical volume "/dev/sdb" successfully created
# pvcreate /dev/sdc
No physical volume label read from /dev/sdc
Physical volume "/dev/sdc" successfully created
# vgcreate -s 16M datavg /dev/sdb /dev/sdc
Volume group "datavg" successfully created
# pvdisplay /dev/sdb
--- Physical volume ---
PV Name /dev/sdb
VG Name datavg
PV Size 10.00GB / not Usable 16.00MB
Allocatable Yes
PE Size (KByte) 16384
Total PE 639
Free PE 639
Allocated PE 0
PV UUID apk7wQ-V9B2-vHVo-L5Yz-81U0-orx7-F8J0MI
# vgdisplay datavg
--- Volume group ---
VG Name datavg
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 1
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 0
Open LV 0
Max PV 0
Cur PV 2
Act PV 2
VG Size 19.97GB
PE Size 16.00MB
Total PE 1278
Alloc PE / Size 0 / 0
Free PE / Size 1278 / 19.97GB
VG UUID Glyv9C-qRog-YZVk-08nR-csMe-quMp-A3Ksby

As you can see in this example, the volume group named datavg has two member disks, each 10GB in size. I selected a different physical extent size not because I had to, just to show how it is done. You will also notice that each disk offers 639 extents, one less than the 640 that 10GB would hold at 16MB each. This is to accommodate the volume group metadata mentioned earlier. You can actually read this data yourself if you like.
# dd if=/dev/sdb of=vg_metadata bs=16M count=1
# strings vg_metadata

The last step is to create the Logical Volume itself. There are a myriad of options available depending on what you want to accomplish, the important ones are:

-L size[KMGTPE]
  • Specifies a size in kilobytes, megabytes, gigabytes, terabytes, petabytes, or exabytes. Let me know if you actually use the last two.

-l size
  • Specifies the size in extents. In this case 16MB each. You can also specify as a percentage of either the Volume Group, free space in the volume group, or free space for the physical volumes with %VG, %FREE, or %PVS respectively.

-n string
  • Gives a name to your logical volume

-i Stripes
  • Number of stripes to use. As I mentioned earlier, you should probably use RAID to perform this function, but if you must, this should be equal to the number of spindles present in the volume group

-I stripeSize
  • The stripe depth in KB to use for each disk

Here is an example for a simple volume, and then a striped volume

# lvcreate -L 5G -n datalv datavg
Logical volume "datalv" created
# lvdisplay
--- Logical volume ---
LV Name /dev/datavg/datalv
VG Name datavg
LV UUID fCoaFl-7aQY-CX5U-zDwO-at52-udkI-ke6CZn
LV Write Access read/write
# open 0
LV size 5.00GB
Current LE 320
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 1024
Block device 253:0
# lvcreate -L 10G -i 2 -I 64 -n stripedlv datavg
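
The listing above only uses -L, so here is a hedged sketch of the -l forms described in the options list. The run function is a hypothetical dry-run wrapper that prints each command instead of executing it unless DRY_RUN=0 is set, and the volume names scratchlv and fixedlv are made up for illustration.

```shell
#!/bin/bash
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Half of the free space remaining in the volume group.
run lvcreate -l 50%FREE -n scratchlv datavg

# An explicit extent count: 320 extents x 16MB = 5GB.
run lvcreate -l 320 -n fixedlv datavg
```

Review the printed commands first, then rerun with DRY_RUN=0 as root once they look right.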

If you are going to use striped volumes, you should probably use only striped volumes in that volume group, as striping requires the proper number of blocks to be free on each physical volume. Once we have a volume we need a file system. For this exercise I am going to use ext4, but you can use what you like.

# mkfs.ext4 /dev/datavg/datalv
# mkdir /data
# mount /dev/datavg/datalv /data

Expanding a Logical Volume
# pvcreate /dev/sdd
# vgextend datavg /dev/sdd
Volume group "datavg" successfully extended
# lvresize -L 15G /dev/datavg/datalv
Extending logical volume datalv to 15.00 GB
Logical volume datalv successfully resized
# resize2fs /dev/datavg/datalv
resize2fs 1.41.9 (22-Aug-2009)
Resizing the filesystem on /dev/datavg/datalv to 3932160 (4k) blocks.
The filesystem on /dev/datavg/datalv is now 3932160 blocks long.

Depending on the state of your file system, you may not be able to expand online. You can check the output of tune2fs to ensure GDT blocks have been set aside, without those you will for sure have to be offline. For example, tune2fs -l /dev/datavg/datalv. You may also get a warning to run e2fsck first. The man page warns of running this on-line, so again you are probably best served by unmounting the file system first. If this was a system disk that generally means dropping back down to single user mode.
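
The GDT check can be scripted. This is a minimal sketch that reads tune2fs -l output on standard input; it assumes the field is printed as "Reserved GDT blocks:", which is how the e2fsprogs versions I have seen label it, and has_reserved_gdt is just an illustrative name.

```shell
#!/bin/bash
# Reads `tune2fs -l` output on stdin and succeeds only when a non-zero
# number of GDT blocks has been reserved.
has_reserved_gdt() {
    awk -F: '/^Reserved GDT blocks/ { n = $2 + 0 }
             END { exit (n > 0 ? 0 : 1) }'
}

# Sample text standing in for real tune2fs output.
sample='Filesystem features:      has_journal resize_inode
Reserved GDT blocks:      256'

if printf '%s\n' "$sample" | has_reserved_gdt; then
    echo "GDT blocks reserved, online resize should be possible"
else
    echo "no reserved GDT blocks, plan to resize offline"
fi
```

On a real system you would feed it the live output, for example tune2fs -l /dev/datavg/datalv | has_reserved_gdt.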

Reducing a Logical Volume
Before embarking on this journey, ensure you manage the file system first, which for the ext series anyway, means you have to have it unmounted. Once that is done you can go ahead and shrink the logical volume as shown here.

# umount /data
# resize2fs /dev/datavg/datalv 10g
resize2fs 1.41.9 (22-Aug-2009)
Resizing the filesystem on /dev/datavg/datalv to 2621440 (4k) blocks.
The filesystem on /dev/datavg/datalv is now 2621440 blocks long.
# lvreduce -L 10g /dev/datavg/datalv
WARNING: Reducing active and open logical volume to 10.00 GB
THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce datalv? [y/n]: y
Reducing logical volume datalv to 10.00 GB
Logical volume datalv successfully resized

Again you may be prompted to check your file system but it's unmounted anyway, so it shouldn't be a problem. If the file system is highly fragmented the resize process can take quite a while so be prepared.
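
Since shrinking the logical volume below the file system size will destroy data, it is worth doing the arithmetic before answering y. A small sketch, assuming you take the block count and block size from the resize2fs output; fs_bytes and safe_to_reduce are hypothetical helper names.

```shell
#!/bin/bash
# File system size in bytes from the block count and block size.
fs_bytes() {
    echo $(( $1 * $2 ))
}

# Succeeds only when the file system fits inside the target LV size.
# Arguments: block count, block size in bytes, target LV size in GB.
safe_to_reduce() {
    local fs lv
    fs=$(fs_bytes "$1" "$2")
    lv=$(( $3 * 1024 * 1024 * 1024 ))
    [ "$fs" -le "$lv" ]
}

# Values from the resize2fs output above: 2621440 blocks of 4k.
if safe_to_reduce 2621440 4096 10; then
    echo "file system fits in 10GB, lvreduce can proceed"
else
    echo "file system is larger than the target, do not reduce"
fi
```

With the numbers from the example, 2621440 blocks of 4k is exactly 10GB, so the reduce is safe; the original 3932160 blocks would not have been.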

Snapshots
Another benefit of LVM is the ability to take point-in-time images of your file system. Snaps use copy-on-write technology, where a block that is about to be overwritten or changed is first copied to a new location and then allowed to be altered. This can cause a performance problem on writes, which can compound as more snaps are added, so bear that in mind. You will also have to set aside some space within the volume group for this purpose. The amount really depends on how many changes you are making, but 10-20% is probably a good starting point. For this example I am going to use 1G as I don't expect many changes.

# lvcreate -L 1g -s -n datasnap1 /dev/datavg/datalv
Logical volume "datasnap1" created

Notice the -s entry for snapshot and that the target isn't the volume group but rather the logical volume desired. It appears there is a bug in OpenSuSE that may be present in other distributions. It prevents the snap from being registered with the event monitor, to alert when full or reaching capacity. If you get this message you will have to upgrade both lvm2 and device-mapper packages as it was compiled against the wrong library versions.

OpenSuSE error:
datavg-datasnap: event registration failed: 10529:3 libdevmapper-event-lvm2snapshot.so.2.02 dlopen failed: /lib64/libdevmapper-event-lvm2snapshot.so.2.02: undefined symbol: lvm2_run
datavg/snapshot0: snapshot segment monitoring function failed.


To use your new snap, simply mount it like you would any other file system with mount /dev/datavg/datasnap1 /datasnap. You can view the snap usage through lvdisplay in the "Allocated to snapshot" field.

# lvdisplay /dev/datavg/datasnap1
--- Logical volume ---
LV Name /dev/datavg/datasnap1
VG Name datavg
LV UUID 82IA4M-Md6s-MEI6-iNPW-6wFb-8pzD-eCQqmS
LV Write Access read/write
LV snapshot status active destination for /dev/datavg/datalv
LV Status available
# open 0
LV Size 20.00 GB
Current LE 5120
COW-table size 1.00 GB
COW-table LE 256
Allocated to snapshot 68.68%
Snapshot chunk size 4.00 KB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0

If the snap reserve space fills completely it will not be deleted but marked invalid and cannot be read from, even if it is currently mounted. Snaps aren't good forever but as a point in time image they can be invaluable for providing specific backup scenarios like quick reference points for database backups. Instead of moving the active file system to tape you can quiesce the database, snap it, and return it to normal operations and then perform a backup from the snapshot.
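
Because a full snapshot becomes invalid, it can be worth watching the usage figure. The following is a rough sketch that pulls the percentage out of lvdisplay output on standard input and warns past a threshold; snap_used and snap_check are hypothetical helpers, and the 60% limit is an arbitrary choice for the example.

```shell
#!/bin/bash
# Extract the "Allocated to snapshot" percentage from lvdisplay output on stdin.
snap_used() {
    awk '/Allocated to snapshot/ { sub(/%/, "", $NF); print $NF }'
}

# Print a warning and return 0 when usage is at or above the limit.
# Arguments: used percentage, limit percentage.
snap_check() {
    local used=$1 limit=$2
    if awk -v u="$used" -v l="$limit" 'BEGIN { exit !(u >= l) }'; then
        echo "WARNING: snapshot is ${used}% full"
        return 0
    fi
    return 1
}

# Sample line taken from the lvdisplay output above.
sample='Allocated to snapshot 68.68%'
snap_check "$(printf '%s\n' "$sample" | snap_used)" 60
```

In practice you would pipe the real command into it, for example lvdisplay /dev/datavg/datasnap1 | snap_used, and run the check from cron.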

Moving Volume Groups
A handy utility that I have used many times under AIX is also available under Linux; the ability to move a volume group from one system to the next.

# umount /data
# vgchange -an datavg
0 logical volume(s) in volume group "datavg" now active
# vgexport datavg
Volume group "datavg" successfully exported

Shut down the machine before removing the disks and assigning them to another machine. Once they are attached to the new machine:
# pvscan
PV /dev/sdb is in exported VG datavg [10.00 GB / 0 free]
PV /dev/sdc is in exported VG datavg [10.00 GB / 1.99 GB free]
Total: 2 [19.99 GB] / in use: 2 [19.99 GB] / in no VG: 0 [0 ]
# vgscan
Reading all physical volumes. This may take a while...
Found exported volume group "datavg" using metadata type lvm2
# vgimport datavg
Volume group "datavg" successfully imported

Activate the volume group with vgchange -ay datavg and you should now be able to mount your file system on the new machine.

Other Commands
Some other important commands for volume management:
# lvremove logical_volume_path
removes a logical volume or snapshot
e.g. lvremove /dev/datavg/datasnap1
# pvmove device
moves data from an existing drive to free extents on other disks in the volume group
e.g. pvmove /dev/sdc
# vgreduce volume_group device
removes a device from a volume group
e.g. vgreduce datavg /dev/sdc
# pvremove device
removes a physical device from lvm
e.g. pvremove /dev/sdc

Sunday, May 23, 2010

Custom Startup Scripts for Linux

There are a few options for having a process or command execute on boot. The easiest is to add it to /etc/rc.local. This works well for small quick and dirty jobs; however, for more complex jobs, such as those requiring a specific start order or daemon control, a full start-up script is a great way to go.

For this example I am going to draw on a past project of mine, Linux Cluster Manager as it has a daemon that needs to stay running all of the time. Here is the script:

#!/bin/bash
#
# lcm This shell script takes care of starting and stopping
# lcm server daemons
#
# chkconfig: 345 85 25
# description: Client side daemon for LCM
# processname: lcmclient

### BEGIN INIT INFO
# Provides: lcmclient
# Required-Start: $network $syslog
# Required-Stop:
# Default-Start: 3 4 5
# Default-Stop: 0 1 2 6
# Short-Description: LCMClient
# Description: Client side daemon for LCM
### END INIT INFO

STATUS=0
# Source function library.
test -s /etc/rc.d/init.d/functions && . /etc/rc.d/init.d/functions
test -s /etc/rc.status && . /etc/rc.status && STATUS=1

start() {
    echo -n $"Starting LCM Client Daemons: "
    if [ -x /usr/local/lcm/lcmclient ] ; then
        if [ $STATUS -eq 1 ]
        then
            startproc /usr/local/lcm/lcmclient &> /dev/null
            rc_status -v
        else
            /usr/local/lcm/lcmclient &> /dev/null &
            PID=`/sbin/pidof -s -x lcmclient`
            if [ $PID ]
            then
                echo_success
            else
                echo_failure
            fi
            echo
        fi
    fi
}

stop () {
    echo -n $"Stopping LCM Client Daemons: "
    test -s /sbin/pidof && PID=`/sbin/pidof -s -x lcmclient`
    test -s /bin/pidof && PID=`/bin/pidof -s -x lcmclient`
    if [ $PID ]
    then
        /bin/kill $PID
    fi
    if [ $STATUS -eq 1 ]
    then
        rc_status -v
    else
        echo_success
        echo
    fi
}

restart() {
    stop
    start
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart}"
        exit 1
esac

Registration
At least for SuSE and RedHat based distributions, start-up scripts live in /etc/init.d. They can be called whatever you like as long as they are executable and ideally owned by root as that is who will run them anyway. We used to have to link this script to the different run levels, which is easy enough to do, it's just tedious and error prone. So today we register scripts with chkconfig and let it do all the work for us.

The opening lines enable this feature for both RedHat and SuSE, which of course have to do things differently. I generally like to have both as it doesn't do any harm and allows for more portable code.

1  #!/bin/bash
2 #
3 # lcm This shell script takes care of starting and stopping
4 # lcm server daemons
5 #
6 # chkconfig: 345 85 25
7 # description: Client side daemon for LCM
8 # processname: lcmclient
9
10 ### BEGIN INIT INFO
11 # Provides: lcmclient
12 # Required-Start: $network $syslog
13 # Required-Stop:
14 # Default-Start: 3 4 5
15 # Default-Stop: 0 1 2 6
16 # Short-Description: LCMClient
17 # Description: Client side daemon for LCM
18 ### END INIT INFO

RedHat
The first line is of course the desired shell which all scripts should have. Lines 2-5 are really just information lines for the user. Lines 6-7 are required for chkconfig under RedHat and tell it what run levels we want to start, the start order and the shutdown order. In this case it will start under run levels 3, 4, and 5 with a start order of 85 and a shutdown order of 25.

To register the script and check the results we can run the following:
# chkconfig --add lcm
# chkconfig --list lcm
lcm 0:off 1:off 2:off 3:on 4:on 5:on 6:off
# ls -l /etc/rc*/*lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc0.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc1.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc2.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc3.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc4.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc5.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc6.d/K25lcm -> ../init.d/lcm
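
To make the link between the chkconfig line and those symbolic links explicit, here is a small sketch that derives the link names from the three fields. It only prints names and touches nothing on disk; chkconfig_links is a made-up helper, not something chkconfig provides.

```shell
#!/bin/bash
# Given a service name and the fields of a "# chkconfig:" line, print the
# link expected in each run level directory: S<start> where the run level
# is listed, K<stop> everywhere else.
chkconfig_links() {
    local name=$1 levels=$2 start=$3 stop=$4 rl
    for rl in 0 1 2 3 4 5 6; do
        case "$levels" in
            *"$rl"*) echo "/etc/rc${rl}.d/S${start}${name}" ;;
            *)       echo "/etc/rc${rl}.d/K${stop}${name}" ;;
        esac
    done
}

# "# chkconfig: 345 85 25" from the lcm script
chkconfig_links lcm 345 85 25
```

The output lists K25lcm for run levels 0, 1, 2 and 6 and S85lcm for 3, 4 and 5, matching the listing above.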

SuSE
SuSE takes its setup process from the Linux Standard Base core specifications. This is shown in lines 10-18 blocked by BEGIN and END INIT INFO. Basically what it does is specify the run levels we would like and what other services are needed to be able to start and stop. Chkconfig figures things out from there and numbers the start and shutdown order for us.

Line 11, beginning with Provides, establishes this script as a facility called lcmclient. We can reference other facilities through Required-Start and Required-Stop on lines 12 and 13. Common facility names are $network, $syslog and $local_fs, but a larger list and some additional explanation can be found here.

The main benefit of this approach is parallel boot operations. If the system understands the relationships of all the start-up elements, many can be run at the same time. If I had another script that depended on this one, I could list lcmclient as a Required-Start entry for that script. Note there is no $ in front as by naming convention, those are reserved for system facility names.
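
As a sketch of what such a dependent script might look like, here is the header for a made-up service called lcmreport. The service name, its run levels, and the placeholder body are all assumptions for illustration; only the lcmclient facility comes from the script above.

```shell
#!/bin/bash
### BEGIN INIT INFO
# Provides:          lcmreport
# Required-Start:    $network lcmclient
# Required-Stop:     lcmclient
# Default-Start:     3 5
# Default-Stop:      0 1 2 6
# Short-Description: Hypothetical service that needs lcmclient running
# Description:       Hypothetical service that needs lcmclient running
### END INIT INFO

# Placeholder body; a real script would start and stop something here.
main() {
    case "$1" in
        start|stop) echo "lcmreport: $1" ;;
        *)          echo "Usage: $0 {start|stop}" ;;
    esac
}

main "$@"
```

Note that $network keeps its $ as a system facility while lcmclient is listed bare.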

Again, we run the same chkconfig commands; however, this time the start order is determined for us. If we take a closer look at our dependencies we see that network starts at order 2 and syslog at order 3, so our script is placed right after them at order 4.

# chkconfig --add lcm
lcm 0:off 1:off 2:off 3:on 4:on 5:on 6:off
# ls -l /etc/rc.d/rc*/*lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc3.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc3.d/S04lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc4.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc4.d/S04lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc5.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc5.d/S04lcm -> ../lcm

User Feedback
The next section involves loading other helper functions. They aren't specifically required but make formatting, user feedback, and process management a lot easier.
1  STATUS=0
2 # Source function library.
3 test -s /etc/rc.d/init.d/functions && . /etc/rc.d/init.d/functions
4 test -s /etc/rc.status && . /etc/rc.status && STATUS=1

The only reason I have a STATUS variable is to identify which set of libraries, and therefore which OS, is doing the executing. Line 3 is for RedHat, line 4 is for SuSE. As with registration they differ enough from each other to be annoying.

My primary use for these extra functions is to put the nice little [ OK ] or [ FAILED ] messages on the screen that can be so helpful. The exact function called to do this can depend on what the script is doing or how the program it calls operates.
Starting
1  start() {
2 echo -n $"Starting LCM Client Daemons: "
3 if [ -x /usr/local/lcm/lcmclient ] ; then
4 if [ $STATUS -eq 1 ]
5 then
6 startproc /usr/local/lcm/lcmclient &> /dev/null
7 rc_status -v
8 else
9 /usr/local/lcm/lcmclient &> /dev/null &
10 PID=`/sbin/pidof -s -x lcmclient`
11 if [ $PID ]
12 then
13 echo_success
14 else
15 echo_failure
16 fi
17 echo
18 fi
19 fi
20 }

In this case I have chosen to start the application with startproc on line 6 for SuSE and just by hand on line 9 for RedHat. The reason is that the program blocks and it's possible for it to spit out errors to stderr. Startproc handles this fairly well and gives a proper return code which rc_status -v on line 7 can report on. However, the tools under RedHat either expect the process to fork as with a daemon or to return when completed. So, I have resorted to starting by hand and then checking for a process on lines 10-11. You can't just rely on the return code because if you redirect stdout and stderr to /dev/null and put it in the background it will always return 0. Go ahead, try it, I'll wait.

If a pid exists, echo_success is run on line 13, otherwise echo_failure on line 15. Either one of these requires a subsequent echo command on line 17 to provide a newline.
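
Here is a minimal version of that return code experiment. It uses a path that is assumed not to exist so the command is guaranteed to fail; the variable names are only for illustration.

```shell
#!/bin/bash
# Hypothetical path that does not exist, so the command is sure to fail.
missing=/nonexistent/lcmclient-demo

# Start it in the background with output discarded, as the init script does.
"$missing" > /dev/null 2>&1 &
launch_rc=$?            # status of launching the background job

# Collect the status the command really exited with.
real_rc=0
wait $! || real_rc=$?

echo "launch status: $launch_rc"
echo "real status:   $real_rc"
```

The launch status is 0 even though the command failed, which is why the script checks for a running process instead of trusting the return code.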

Other methods of starting scripts, programs, or just commands:

OS      Function      Example                                       Result
RedHat  action        action "Starting example: " /usr/bin/example  [ OK ] or [ FAILED ]
RedHat  echo_success  echo_success; echo                            [ OK ]
RedHat  echo_failure  echo_failure; echo                            [ FAILED ]
RedHat  echo_warning  echo_warning; echo                            [ WARNING ]
SuSE    startproc     startproc /usr/bin/example                    none
SuSE    rc_status     rc_status -v                                  done, failed, or skipped

I invite you to wade into the functions provided by each OS and see if you can find any gems in there. Bring your choice of caffeine, you'll need it.

Shutdown
1  stop () {
2 echo -n $"Stopping LCM Client Daemons: "
3 test -s /sbin/pidof && PID=`/sbin/pidof -s -x lcmclient`
4 test -s /bin/pidof && PID=`/bin/pidof -s -x lcmclient`
5 if [ $PID ]
6 then
7 /bin/kill $PID
8 fi
9 if [ $STATUS -eq 1 ]
10 then
11 rc_status -v
12 else
13 echo_success
14 echo
15 fi
16 }

Fairly simple here: grab the pid of the program and issue a kill command. Of course RedHat and SuSE have to disagree on the location for pidof but that isn't too hard to overcome. Again the STATUS variable is used to determine which helper function to run. You'll notice that there isn't a failure result here. I could have done some extra work against the kill command but felt it complicated things more than it really mattered.
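
If you do want a failure result, a sketch along these lines would work. It is not what the lcm script does; stop_pid is a hypothetical helper that prints plain text, where the RedHat branch of a real script would call echo_success or echo_failure instead.

```shell
#!/bin/bash
# Send TERM to a pid and report whether the signal could be delivered.
stop_pid() {
    local pid=$1
    if [ -n "$pid" ] && kill "$pid" 2> /dev/null; then
        echo "[ OK ]"
        return 0
    fi
    echo "[ FAILED ]"
    return 1
}

# Demonstration against a throwaway background process.
sleep 30 &
stop_pid $!
```

This only confirms the signal was sent, not that the daemon has exited, so a stubborn process would still need a follow-up check.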

Command Line Arguments
Every start-up script is required to accept both the start and stop command line arguments. I have handled that with a case statement but you can use whatever makes you happy. It is also customary to include a restart option, usage information, and possibly status if it makes sense.

If your needs are simple enough, you could include all of the code inside the case statement. I find this harder to read for pretty much everything but the simplest of jobs, most of which will fit into rc.local anyway.
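
The lcm script has no status option, so here is a hedged sketch of how one could be added. The function is named lcm_status, a made-up name chosen to avoid clashing with the status function in RedHat's function library, and the return value of 3 follows the LSB convention for "program is not running".

```shell
#!/bin/bash
# Report whether lcmclient is running.
# Returns 0 when running, 3 when not (LSB convention).
lcm_status() {
    if pidof -s -x lcmclient > /dev/null 2>&1; then
        echo "lcmclient is running"
        return 0
    fi
    echo "lcmclient is stopped"
    return 3
}

# The extra branch for the existing case statement would look like:
#   status)
#       lcm_status
#       ;;
```

With that in place, service lcm status gives a quick answer without having to go looking through ps output.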

Running Your Script
Some useful commands to control and execute your new script
# chkconfig --list lcm
# chkconfig lcm on
will add all symbolic links so the script runs at boot

# chkconfig lcm off
will remove all symbolic links to prevent the script from executing

# service lcm {start | stop | restart}
# /etc/init.d/lcm {start | stop | restart}
both of these will execute your script, the first just has a little less typing