Showing posts with label shell.

Sunday, November 13, 2011

Misaligned I/O reporting

In the last post I showed a script for logging misaligned I/Os on a NetApp storage array. It is also nice to produce some graphs of the data, so here is my quick parse script to crank out a CSV file for a given filer. If you recall, the output from the collection script was as follows:
22:10:01 UTC - Zeroing stats from filer np00003
22:10:02 UTC - sleeping for 300 seconds
22:15:02 UTC - Capturing data from filer np00003
22:15:03 UTC - Collected Values:
22:15:03 UTC - interval = 300.399491 seconds, pw.over_limit = 249 and WAFL_WRITE = 540822
22:15:03 UTC - Percentage Misaligned Writes = .0460%
22:15:03 UTC - Successfully Completed!
My logs are kept under ./NetApp_Logs/, one file per run, named NetApp-align-{filer}-{yyyymmdd}-{hhmmss}. This little script walks through all of the logs for a given filer and outputs a CSV like this:
2011-10-10,22:15:03,.0460%
2011-10-11,10:15:03,.0670%
2011-10-11,22:15:03,.1500%
2011-10-12,20:15:03,.4500%
I have found this works well with spreadsheet programs, which can parse both the date and the time correctly, saving me a lot of time. And here is the script:
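$ cat na_parse_alignment
#!/bin/bash
if [ "$#" -eq 0 ]
then
        echo "Usage: $(basename "$0") {filer}"
        echo "e.g."
        echo "$(basename "$0") np00003"
        exit 1
fi
FILER=$1
# grep -H prefixes each match with its log file name, e.g.
#   NetApp_Logs/NetApp-align-np00003-20111010-221001:22:15:03 UTC - Percentage Misaligned Writes = .0460%
# sed turns "yyyymmdd-hhmmss:" (any year) into " yyyy-mm-dd " so awk can print date, time and percentage
grep -H Percent NetApp_Logs/NetApp-align-"${FILER}"-* | sed -e 's/\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)-[0-9]\{6\}:/ \1-\2-\3 /' | awk '{print $2","$3","$NF}'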
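If you'd rather not depend on the sed/awk pipeline, the same conversion can be sketched in bash alone with a regular expression and BASH_REMATCH. This is an alternative, not the script above; it assumes the NetApp-align-{filer}-{yyyymmdd}-{hhmmss} naming and the grep -H output format, and log_line_to_csv is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: turn one "grep -H Percent" match into "yyyy-mm-dd,hh:mm:ss,percent".
# Assumes log files are named NetApp-align-{filer}-{yyyymmdd}-{hhmmss}.
log_line_to_csv() {
    # groups: year, month, day (from the file name), time and percentage (from the log line)
    local re='NetApp-align-.*-([0-9]{4})([0-9]{2})([0-9]{2})-[0-9]{6}:([0-9]{2}:[0-9]{2}:[0-9]{2}) .* = ([0-9.]+%)$'
    if [[ $1 =~ $re ]]; then
        printf '%s-%s-%s,%s,%s\n' "${BASH_REMATCH[@]:1:3}" "${BASH_REMATCH[4]}" "${BASH_REMATCH[5]}"
    else
        return 1    # line doesn't look like a percentage entry
    fi
}

# e.g. grep -H Percent NetApp_Logs/NetApp-align-np00003-* | while IFS= read -r l; do log_line_to_csv "$l"; done
log_line_to_csv 'NetApp_Logs/NetApp-align-np00003-20111010-221001:22:15:03 UTC - Percentage Misaligned Writes = .0460%'
```

Lines that don't fit the expected layout make the function return non-zero, so they are easy to spot.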

Sunday, November 6, 2011

Tracking Misaligned I/Os

For those who don't know about misaligned I/Os, I gave a brief introduction to them in my last post. In this post I'll show you how to track them and quantify how many of your I/Os exhibit the problem. We currently run our VMware infrastructure from a NetApp array, so this script is tailored to that environment.

The only method I've found to track them is the pw.over_limit counter. Unfortunately it is only available from the command line with advanced privileges, as it isn't exported via SNMP. You can obtain the data manually with the following:
# priv set advanced
# wafl_susp -w
# priv set
This will produce lots of data, most of which you can ignore. You'll quickly notice a problem: pw.over_limit is an absolute number, like 29300. Is that good or bad? That depends on how many writes you are generating in the first place, so the script reports it as a percentage of WAFL_WRITE. Since I love writing little programs, here is the output of my script:
$ cat NetApp-align-filer1-20111010-221001
22:10:01 UTC - Zeroing stats from filer filer1
22:10:02 UTC - sleeping for 300 seconds
22:15:02 UTC - Capturing data from filer filer1
22:15:03 UTC - Collected Values:
22:15:03 UTC - interval = 300.399491 seconds, pw.over_limit = 249 and WAFL_WRITE = 540822
22:15:03 UTC - Percentage Misaligned Writes = .0460%
22:15:03 UTC - Successfully Completed! 
As you can see, there are a few misaligned writes going on, but overall the array is in pretty good health, with just under 0.05% of the total. When should you panic? That depends on far too many variables to list here, but write latency is a key indicator to pay attention to. Your application mix (and users) will scream when you've crossed the line.

The code I use is listed below. Read the comments in the parse_wafl function to figure out what it's doing.
#!/bin/bash
# Script Information
# na_alignment_check Version: 1.0
# Last Modified: Sept 18/2010
# Created By: Michael England

# to run this script you will need an account on the filer with administrative privileges as it has to be run with 'priv set advanced'
# ssh prep work
# ssh-keygen -t dsa -b 1024
# cat id_dsa.pub >> {filer}/etc/sshd/{user}/.ssh/authorized_keys

# default to local user
FILER_USER=$(whoami)

######
### function log
### logs activities to the screen, a file, or both
######
function log {
 LOG_TYPE="$1"
 LOG_MSG="$2"
 TIME=`date +'%H:%M:%S %Z'`
 # specify the log file only once
 if [ -z "$LOG_FILE" ]
 then
  LOG_FILE="/work/Scripts/NetApp_Logs/NetApp-align-$FILER-`date +%Y%m%d-%H%M%S`"
 fi
 if [ "$LOG_TYPE" == "error" ]
 then
  echo -e "$TIME - **ERROR** - $LOG_MSG"
  echo -e "$TIME - **ERROR** - $LOG_MSG" >> "$LOG_FILE"
 elif [ "$LOG_TYPE" == "debug" ]
 then
  # DEBUG is not set by default, so quote it to avoid a "unary operator expected" error
  if [ "$DEBUG" == "on" ]
  then
   echo -e "DEBUG - $LOG_MSG"
  fi
 else
  echo -e "$TIME - $LOG_MSG"
  echo -e "$TIME - $LOG_MSG" >> "$LOG_FILE"
 fi
}

######
### check_ssh
### check the return code of an ssh command
######
function check_ssh {
 CHECK=$1
 ERROR_DATA="$2"
 if [ $CHECK != 0 ]
 then
  log error "ssh failed to filer"
  log error "return is $ERROR_DATA"
  exit 1
 fi
}

######
### capture_wafl_susp
### ssh to the filer specified and collect the output from wafl_susp
######

function capture_wafl_susp {
 log notify "Capturing data from filer $FILER"
 SSH_RET=`ssh $FILER -l $FILER_USER "priv set advanced;wafl_susp -w;priv set" 2>&1`
 RETVAL=$?
 check_ssh $RETVAL "$SSH_RET"

 # parse_wafl reads the global SSH_RET, so no arguments are needed
 parse_wafl
}

######
### parse_wafl
### capture values for pw.over_limit and WAFL_WRITE and the overall scan interval
### e.g.
### WAFL statistics over 577455.189585 second(s) ...
### pw.over_limit = 29300
### New messages, restarts, suspends, and waffinity completions (by message-type):
### WAFL_WRITE           = 10568010   122597   122597        0
### 
### There are many other WAFL_WRITE lines so we need to find the New messages line first then the WAFL_WRITE in that section
######
function parse_wafl {
 NEW_FLAG=0
 # read SSH_RET one line at a time (avoids word splitting and glob expansion of each line)
 while IFS= read -r line
 do
  # e.g. WAFL statistics over 577455.189585 second(s) ...
  if [[ $line =~ second\(s\) ]]
  then
   STATS_INTERVAL=`echo $line | awk '{print $4}'`
  fi

  if [[ $line =~ pw\.over_limit ]]
  then
   OVER_LIMIT=`echo $line | awk '{print $3}'`
  fi

  if [[ $line =~ New\ messages ]]
  then
   NEW_FLAG=1
  fi
  if [[ $NEW_FLAG == 1 ]] && [[ $line =~ WAFL_WRITE ]]
  then
   WAFL_WRITE=`echo $line | awk '{print $3}'`
   NEW_FLAG=0
  fi
 done <<< "$SSH_RET"
 if [[ -n $OVER_LIMIT ]] && [[ $WAFL_WRITE -gt 0 ]]
 then
  # by multiplying by 100 first we don't lose precision (bc truncates each division to scale digits)
  MISALIGNED_PERCENT=`echo "scale=4;100*$OVER_LIMIT/$WAFL_WRITE" | bc -l`
 else
  log error "Error collecting values, pw.over_limit = $OVER_LIMIT and WAFL_WRITE = $WAFL_WRITE"
  exit 1
 fi

 log notify "Collected Values:"
 log notify "interval = $STATS_INTERVAL seconds, pw.over_limit = $OVER_LIMIT and WAFL_WRITE = $WAFL_WRITE"
 log notify "Percentage Misaligned Writes = ${MISALIGNED_PERCENT}%"
}

######
### function zero_values
### zeroes out existing wafl stats on the filer
######
function zero_values {
 log notify "Zeroing stats from filer $FILER"
 SSH_RET=`ssh $FILER -l $FILER_USER "priv set advanced;wafl_susp -z;priv set" 2>&1`
 RETVAL=$?
 check_ssh $RETVAL "$SSH_RET"
}

######
### function usage
### simple user information for running the script
######
function usage {
 echo -e ""
 echo "Usage:"
 echo "`basename $0` -filer {filer_name} [-username {user_name}] [-poll_interval {x}]"
 echo -e "\t-filer {filer_name} is a fully qualified domain name or IP of a filer to poll"
 echo -e "\t-username {user_name} is a user to attach to the filer, if omitted will use current user"
 echo -e "\t-poll_interval {x} will zero out the filer stats and sleep for {x} seconds then return.  If omitted will read stats since last zeroed"
 echo -e ""
 exit 0
}

# parse command line options
if [ $# == 0 ]
then
 usage
fi
until [ -z "$1" ]
do
 case "$1" in
 -filer)
  shift
  FILER="$1"
  ;;
 -username)
  shift
  FILER_USER="$1"
  ;;
 -poll_interval)
  shift
  POLL_INTERVAL="$1"
  ;;
 *)
  usage
  ;;
 esac
 shift
done
# a filer is required
if [ -z "$FILER" ]
then
 usage
fi
# do the work

if [[ -n $POLL_INTERVAL ]]
then
 zero_values
 log notify "sleeping for $POLL_INTERVAL seconds"
 sleep $POLL_INTERVAL
fi
capture_wafl_susp

log notify "Successfully Completed!"
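To experiment with the parsing logic offline, for example against output saved from a filer, the "find the New messages header, then take the next WAFL_WRITE" step can be pulled into a self-contained function. This sketch uses bash glob matching and here-strings rather than the script's regular expressions; the input layout is assumed from the sample lines in the parse_wafl comments, extract_wafl_values is a hypothetical name, and the sample text is illustrative input, not real filer output.

```shell
#!/bin/bash
# Sketch: pull the interval, pw.over_limit and the WAFL_WRITE count that follows the
# "New messages" header out of saved wafl_susp -w text passed as $1.
extract_wafl_values() {
    local interval="" over_limit="" wafl_write="" in_new_messages=0 line
    while IFS= read -r line; do
        if [[ $line == *"second(s)"* ]]; then
            interval=$(awk '{print $4}' <<< "$line")
        elif [[ $line == *pw.over_limit* ]]; then
            over_limit=$(awk '{print $3}' <<< "$line")
        elif [[ $line == *"New messages"* ]]; then
            in_new_messages=1
        elif (( in_new_messages )) && [[ $line == *WAFL_WRITE* ]]; then
            # take the first WAFL_WRITE after the header, then stop looking
            wafl_write=$(awk '{print $3}' <<< "$line")
            in_new_messages=0
        fi
    done <<< "$1"
    echo "$interval $over_limit $wafl_write"
}

# illustrative input: the WAFL_WRITE line before the header must be ignored
sample='WAFL statistics over 577455.189585 second(s) ...
pw.over_limit = 29300
WAFL_WRITE           = 1   2   3        0
New messages, restarts, suspends, and waffinity completions (by message-type):
WAFL_WRITE           = 10568010   122597   122597        0'

extract_wafl_values "$sample"
```

The header flag is what keeps the other WAFL_WRITE lines in the report from being picked up by mistake.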

Sunday, January 9, 2011

Cross Server Multipath

I was asked to come up with a method to compare multipath devices across an Oracle RAC cluster to ensure they are consistent. There was one catch: it had to run as a regular user. I would normally have used the multipath command, but unfortunately that isn't available without elevated privileges. So my solution reads /etc/multipath.conf, which a regular user can read, compares the entries across all nodes, and outputs a table at the end. You can see an example of the output in the initial comments.
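The core of the approach is reducing each multipath { } stanza to a wwid/alias pair. The script does this with a bash loop; as a compact alternative, here is an awk sketch of just that step. It assumes one keyword per line, as in the example stanza in the script's comments, and uses "--" for a missing value like the script does; mpath_pairs is a hypothetical name.

```shell
#!/bin/bash
# Sketch: print "wwid alias" for each multipath { } stanza in multipath.conf text ($1).
# Assumes one keyword per line; "--" stands in for a missing wwid or alias.
mpath_pairs() {
    awk '
        /^[ \t]*#/ { next }                                          # skip comment lines
        /^[ \t]*multipath[ \t]*[{]/ { in_mp = 1; wwid = "--"; name = "--"; next }
        in_mp && $1 == "wwid"  { wwid = $2 }
        in_mp && $1 == "alias" { name = $2 }
        in_mp && /^[ \t]*[}]/ { print wwid, name; in_mp = 0 }        # end of stanza
    ' <<< "$1"
}

sample='multipaths {
    multipath {
        wwid    36006016015a019004e9820d8b56cde11
        alias   vote_disk1
    }
    multipath {
        wwid    360060e8005be08000000be08000011ab
    }
}'
mpath_pairs "$sample"
```

The `multipath[ \t]*[{]` pattern skips the enclosing `multipaths {` line, the same distinction the script makes with `multipath\ .*`.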

#!/bin/bash
# Script Information
# Version: 1.0
# Created By: Michael England

# Parses /etc/multipath.conf on any number of nodes (with ssh) and produces a report comparing all of the entries
# e.g.
# WWID                                    node1                         node2                         node3                         Status
# 360060e8005be08000000be08000011ab       apps_001_40g                  --                            --                            INVALID
# 360060e8005be08000000be08000011ac       apps_002_40g                  --                            --                            INVALID
# 360060e8005be08000000be0800004000       ora_shared_data_001_407g      ora_shared_data_001_407g      ora_shared_data_001_407g      OK
# 360060e8005be08000000be0800004001       ora_shared_data_002_407g      ora_shared_data_002_407g      ora_shared_data_002_407g      OK

######
### function parse_multipath
### reads a file looking for wwid and alias pairs
### e.g.
### multipath {
###     wwid    36006016015a019004e9820d8b56cde11
###     alias   vote_disk1
### }
### becomes
###     wwid 36006016015a019004e9820d8b56cde11 alias vote_disk1
### if it is missing a wwid or alias that part will be blank
### e.g.
###     wwid 36006016015a019004e9820d8b56cde11
### The first entry will be blank as we increment COUNT on the first multipath
######
function parse_multipath {
        COUNT=0
        unset MPATH_ENTRIES
        # read line by line; an unquoted "for line in $NODE_MPATH_RESULT" would also glob-expand patterns such as devnode regexes
        while IFS= read -r line
        do
                # skip anything beginning with a comment
                # after bash 3.2 we can't use "" anymore on right hand side
                if [[ $line == \#* ]]
                then
                        continue
                fi
                # look for multipath keyword and increment array counter
                # \ .* excludes multipaths
                if [[ $line =~ multipath\ .* ]]
                then
                        (( COUNT++ ))
                # anything that starts with wwid or alias gets added to the current array element
                elif [[ $line =~ ^[[:space:]]*(wwid|alias)[[:space:]] ]]
                then
                        MPATH_ENTRIES[$COUNT]="${MPATH_ENTRIES[$COUNT]} $line"
                fi
        done <<< "$NODE_MPATH_RESULT"
}

######
### function search_wwid
### Args: 
###     $1 search key
### Searches the WWID_MAP array for a given key and returns the array position of that element
######
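function search_wwid {
        key="$1"
        for index in ${!WWID_MAP[@]}
        do
                # compare against the first word (the WWID) only, so the key can't match part of another entry
                if [[ ${WWID_MAP[$index]%% *} == "$key" ]]
                then
                        echo $index
                        # stop searching and leave the function
                        return
                fi
        done
        echo -1
}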

######
### function search_node
### Args:
###     $1 search key
### Searches the NODE_ARRAY array for a given key and returns the array position of that element
######
function search_node {
        key="$1"
        for index in ${!NODE_ARRAY[@]}
        do
                # exact match on the node name (\< \> word boundaries are a GNU regex extension)
                if [[ ${NODE_ARRAY[index]} == "$key" ]]
                then
                        echo $index
                        return
                fi
        done
        echo -1
}

######
### function assign_wwid
### populates WWID_MAP array with a common view to all nodes multipath entries
### reads from the current MPATH_ENTRIES which should be the output from parse_multipath for one node only
### array will look like the following
### WWID        ALIAS_node1     ALIAS_node2     ALIAS_node3     etc
### it does not include any status information
######
function assign_wwid {
        for index in ${!MPATH_ENTRIES[@]}
        do
                # count the number of items starting from 1 (wc -w is word)
                ITEM_COUNT=`echo ${MPATH_ENTRIES[index]} | wc -w`
                # make sure our entry is 4 words long (wwid {wwid} alias {alias}), this is really an error condition as every wwid should have an alias
                # if it doesn't pad it with "--"
                for (( i=$ITEM_COUNT; $i < 4; i++ ))
                do
                        MPATH_ENTRIES[$index]="${MPATH_ENTRIES[index]} --"
                done

                unset FILLER
                # set the key to the WWID and search the WWID_MAP to see if one already exists
                key=`echo ${MPATH_ENTRIES[$index]} | awk '{print $2}'`
                RETVAL=`search_wwid $key`
                if [[ $RETVAL -lt 0 ]]
                then
                        # this is a new WWID
                        # check to see what node ordinal this is (first, second, third, etc...)
                        # if this isn't the first node we will have to fill in -- for the ones before to indicate this WWID doesn't exist on previous nodes
                        NODE_POSITION=`search_node $1`
                        for (( i=0; $i < $NODE_POSITION; i++ ))
                        do
                                FILLER="$FILLER --"
                        done
                        # tack on a new element with the wwid, and filler required, and then the node alias
                        WWID_MAP[${#WWID_MAP[@]}]=`echo ${MPATH_ENTRIES[index]} | awk '{ print $2, filler, $4 }' filler="$FILLER"`
                else
                        # a WWID already exists, just add on
                        (( NODE_POSITION=`search_node $1` + 1 ))
                        MAP_LENGTH=`echo ${WWID_MAP[RETVAL]} | wc -w`
                        # the node position will be greater than the length if we have a hole... plug it with filler
                        for (( i=$MAP_LENGTH; $i < $NODE_POSITION; i++ ))
                        do
                                FILLER="$FILLER --"
                        done
                        # append any required filler and then the alias for this node
                        WWID_MAP[$RETVAL]=${WWID_MAP[$RETVAL]}\ $FILLER\ `echo ${MPATH_ENTRIES[$index]} | awk '{print $4}'`
                fi
        done
}

######
### function check_status
### -ensures each WWID_MAP is the correct length by first finding the longest entry
###  and then padding all others to that length with --
### -creates a WWID_STATUS array, each element aligns with a WWID_MAP element 
###  it checks if all aliases in WWID_MAP are the same, if so marks a green OK, if not a red INVALID
### WWID_STATUS array will have one entry for each WWID_MAP
###   OK
###   OK
###   INVALID
###   etc
######
function check_status {
        # find the longest element in WWID_MAP
        LONGEST=0
        for index in ${!WWID_MAP[@]}
        do
                LENGTH=`echo ${WWID_MAP[index]} | wc -w`
                if [[ $LENGTH -gt $LONGEST ]]
                then
                        LONGEST=$LENGTH
                fi
        done
        for index in ${!WWID_MAP[@]}
        do
                # count the number of items in this element
                COUNT=`echo ${WWID_MAP[index]} | wc -w`
                # another way to do this is convert to an array and then count the array elements
                # ARRAY=( $(echo ${WWID_MAP[index]}) )
                # for i in `seq ${#ARRAY[@]} $(( $LONGEST - 1 ))` or for i in `seq 2 $(( ${#ARRAY[@]} - 1))` when math is required
                # if the entry is shorter than the longest, pad it with --
                # seq is inclusive, so stopping at LONGEST - 1 adds exactly LONGEST - COUNT entries
                for i in `seq $COUNT $(( $LONGEST - 1 ))`
                do
                        WWID_MAP[$index]="${WWID_MAP[$index]} --"

                done

                # recount the element as its size may have just changed
                COUNT=`echo ${WWID_MAP[index]} | wc -w`
                if [ $COUNT -eq 2 ]
                then
                        WWID_STATUS[$index]="\e[0;32mOK\e[0m"
                fi
                # split the entry into words once, outside the comparison loop
                ARRAY=( ${WWID_MAP[index]} )
                # for all items starting at 2 (the third item) to the end - 1 (zero based)
                # compare with the item previous (i.e. 3:2, 4:3, 5:4, etc)
                for i in `seq 2 $(( $COUNT - 1))`
                do
                        # if they don't match, or this item is "--" mark it as invalid
                        # \e[0m resets the colour (\e[0;30m would force black text on dark terminals)
                        if [[ "${ARRAY[i]}" != "${ARRAY[i-1]}" ]] || [[ "${ARRAY[i]}" = "--" ]]
                        then
                                WWID_STATUS[$index]="\e[0;31mINVALID\e[0m"
                                break
                        else
                                WWID_STATUS[$index]="\e[0;32mOK\e[0m"
                        fi
                done
        done
}

######
### function exchange
### helper for bubble sort to swap both WWID_MAP and WWID_STATUS entries
######
function exchange {
        local temp=${WWID_MAP[$1]}
        local status_temp=${WWID_STATUS[$1]}
        WWID_MAP[$1]=${WWID_MAP[$2]}
        WWID_MAP[$2]=$temp

        WWID_STATUS[$1]=${WWID_STATUS[$2]}
        WWID_STATUS[$2]=$status_temp
}

######
### function sort_results
### a bubble sort of both WWID_MAP and WWID_STATUS based on a user specified sort field (SORT_FIELD)
### if SORT_FIELD = a digit, uses awk to compare values for that column
### if SORT_FIELD = valid | invalid simply compares the WWID_STATUS array elements to be either > or <
######
function sort_results {
        number_of_elements=${#WWID_MAP[@]}
        (( comparisons = $number_of_elements - 1 ))
        count=1
        while [ "$comparisons" -gt 0 ]
        do
                index=0
                while [ "$index" -lt "$comparisons" ]
                do
                        if [[ $SORT_FIELD = [[:digit:]]* ]]
                        then
                                if [[ `echo ${WWID_MAP[index]} | awk '{print $i}' i=$SORT_FIELD` > `echo ${WWID_MAP[ (( $index + 1 ))]} | awk '{print $i}' i=$SORT_FIELD` ]]
                                then
                                        exchange $index `expr $index + 1`
                                fi
                        elif [[ $SORT_FIELD = "invalid" ]]
                        then
                                if [[ ${WWID_STATUS[index]} > ${WWID_STATUS[`expr $index + 1`]} ]]
                                then
                                        exchange $index `expr $index + 1`
                                fi
                        elif [[ $SORT_FIELD = "valid" ]]
                        then
                                if [[ ${WWID_STATUS[index]} < ${WWID_STATUS[`expr $index + 1`]} ]]
                                then
                                        exchange $index `expr $index + 1`
                                fi
                        fi
                        (( index += 1 ))
                done
                (( comparisons -= 1 ))
                (( count += 1 ))
        done
}

######
### function echo_results
### outputs the collected results in WWID_MAP and WWID_STATUS to the screen
######
function echo_results {
        # put together a header row
        # WWID      Status
        RESULT_SET="WWID"
        for node in $NODE_LIST
        do
                RESULT_SET="$RESULT_SET $node"
        done
        RESULT_SET="$RESULT_SET Status"
        # print the header row, wwid (1st column) is 40 characters wide, everything else is 30
        echo -e $RESULT_SET | awk '{ printf "%-40s", $1 }'
        echo -e $RESULT_SET | awk '{ for (i=2; i<=NF; i++) printf "%-30s", $i }'
        awk 'BEGIN {printf "\n"}'
        echo "---------------------"
        # for each element print the first at 40 characters, all others (column 2 - NF) at 30, then the status (using echo so the colours work)
        # NF is Number of Fields
        for index in ${!WWID_MAP[@]}
        do
                echo ${WWID_MAP[index]} | awk '{ printf  "%-40s", $1 }'
                echo ${WWID_MAP[index]} | awk '{ for (i=2; i<=NF; i++) printf "%-30s", $i }'
                echo -e "${WWID_STATUS[index]}"
        done
}

######
### function usage
### outputs help message
######
function usage {
        echo -e "Usage: `basename $0` -n {node1,node2,node3,etc} [-sort {field} | -sort_invalid | -sort_valid] [-user {username}]"
        echo -e "\t-n is a node list separated by commas, the script will ssh and read /etc/multipath.conf on each"
        echo -e "\t-sort {field} sorts the output. 0 sorts the WWID number, 1..x sorts for a specific node"
        echo -e "\t-sort_invalid places invalid alias entries at the top"
        echo -e "\t-sort_valid places valid alias entries at the top"
        echo -e "\t-user {username} allows a different username to be used than the one currently logged in"
        exit
}

######
### parse the command line for options
######
if [ -z "$1" ]
then
        usage
fi
until [ -z "$1" ]
do
        case "$1" in
        -n)
                shift
                NODE_LIST=`echo $1 | tr "," " "`
                ;;
        -sort)
                shift
                SORT_FIELD=`expr $1 + 1`
                ;;
        -sort_invalid)
                SORT_FIELD="invalid"
                ;;
        -sort_valid)
                SORT_FIELD="valid"
                ;;
        -user)
                shift
                USERNAME="-l $1"
                ;;
        *)
                usage
                ;;
        esac
        shift
done

# convert NODE_LIST to an array so we can search the position
NODE_ARRAY=( $(echo "$NODE_LIST") )

# for each node in the list
#  - grab its multipath.conf (ssh)
#  - parse out wwid and alias pairs (parse_multipath)
#  - add it to the master WWID list (assign_wwid)
# then fill in any blank holes and the overall status (check_status)
# sort the results as requested (sort_results)
# then output to display (echo_results)
for node in $NODE_LIST
do
        NODE_MPATH_RESULT=`ssh $node $USERNAME -C "cat /etc/multipath.conf" 2> /dev/null`
        RETVAL=$?
        if [ $RETVAL != 0 ]
        then
                echo "--- Error retrieving /etc/multipath.conf from node $node ---"
        fi
        parse_multipath $node
        assign_wwid $node
done
check_status
sort_results
echo_results
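The script keeps WWID_MAP as an indexed array and finds entries with a linear search (search_wwid). On bash 4 or later, associative arrays keyed on the WWID make the comparison shorter. This is a sketch of that alternative, not the script's method; compare_nodes and its input format (a node name followed by a string of "wwid alias" lines) are assumptions for illustration. Like check_status, it marks a WWID INVALID when any node lacks it or the aliases differ.

```shell
#!/bin/bash
# Sketch (bash 4+): compare "wwid alias" lists from several nodes with associative arrays.
# Args: node1 "pairs1" node2 "pairs2" ... where each pairs string holds "wwid alias" lines.
compare_nodes() {
    local -A alias_of=()   # "node|wwid" -> alias
    local -A seen=()       # every WWID found on any node
    local -a nodes=()
    local node pairs wwid name row first status a
    while (( $# >= 2 )); do
        node=$1 pairs=$2
        shift 2
        nodes+=("$node")
        while read -r wwid name; do
            [[ -n $wwid ]] || continue
            alias_of["$node|$wwid"]=$name
            seen["$wwid"]=1
        done <<< "$pairs"
    done
    # one row per WWID: the alias on each node ("--" if missing), then OK or INVALID
    for wwid in $(printf '%s\n' "${!seen[@]}" | sort); do
        row=$wwid first="" status=OK
        for node in "${nodes[@]}"; do
            a=${alias_of["$node|$wwid"]:---}
            row+=" $a"
            if [[ $a == "--" ]]; then
                status=INVALID
            elif [[ -z $first ]]; then
                first=$a
            elif [[ $a != "$first" ]]; then
                status=INVALID
            fi
        done
        echo "$row $status"
    done
}

compare_nodes node1 $'w1 a1\nw2 a2' node2 $'w1 a1\nw2 b2' node3 $'w1 a1'
```

Because lookups are by key, there is no need for the filler bookkeeping in assign_wwid; the trade-off is the bash 4 requirement, which older RHEL 5 hosts (bash 3.2) don't meet.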

Tuesday, October 19, 2010

KickStart Network Customization

One of the biggest problems I have found with the KickStart process is fine tuning the network values of a server. I couldn't find anything useful in the standard process, so I decided to write my own. As you may have seen in my last post, I reference some customized configuration scripts in the %post section of my KickStart file. In this post I will outline the first of these customizations, which I simply call general.cfg.

Basically this sets up the NTP daemon by rewriting the /etc/ntp.conf file and turns off services I don't need. But first it calls a special script called network_config.sh.
# cat general.cfg
# Clean up the network either based on existing DHCP or on configuration file

# network_config.sh requires an argument to tell it where the csv file is and where to output logs
/post_scripts/KickStart/net_config/network_config.sh /post_scripts/KickStart/net_config

# setup NTP
cat > /etc/ntp.conf <<'EOF'
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1
server 192.168.0.1
server 127.127.1.0
fudge 127.127.1.0 stratum 10
driftfile /var/lib/ntp/drift
EOF
chmod 644 /etc/ntp.conf

echo '192.168.0.1' >> /etc/ntp/ntpservers
echo '192.168.0.1' >> /etc/ntp/step-tickers

# rewrite the /etc/sysconfig/ntpd file to add the -x startup option
# required for Oracle 11gR2
cat > /etc/sysconfig/ntpd <<'EOF'
OPTIONS="-u ntp:ntp -x -p /var/run/ntpd.pid"
SYNC_HWCLOCK=no
NTPDATE_OPTIONS=""
EOF

/usr/sbin/ntpdate 192.168.0.1

chkconfig ntpd on

# remove unnecessary services
chkconfig sendmail off

# printer
chkconfig cups off
chkconfig hplip off

network_config.sh is a bit long, but I will post it in its entirety here. Its primary job is to take input from a file called hostfile.csv and configure DNS, the host name, and all of the network interfaces. Interfaces can be specified by adapter name (e.g. eth0, eth1) or by MAC address, in case the enumeration isn't quite what you expect. It can also configure bonded interfaces, which I am particularly happy with, as bonding can be a significant annoyance when getting production servers ready. Host names are defined as part of the DHCP options, which I showed in an earlier post. If there is no matching entry in hostfile.csv, the script will try to grab whatever IP has been assigned for the install and hard code that to the server. Logs are kept at a location specified on the command line, which is also the location of hostfile.csv.
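getNIC in the script below turns a bare 12-digit MAC into the colon-separated form before searching ifconfig output. A sketch of the same normalization with bash parameter expansion, plus a lookup through /sys/class/net/*/address instead of ifconfig (an alternative, not what the script does). It needs bash 4 for `${var,,}`, and normalize_mac and nic_for_mac are hypothetical names.

```shell
#!/bin/bash
# Sketch (bash 4+): normalize a MAC address and look up the interface that owns it.
# normalize_mac: "0050569C25E5" or "00:50:56:9C:25:E5" -> "00:50:56:9c:25:e5"
normalize_mac() {
    local raw=${1//:/} out="" i
    raw=${raw,,}    # lower-case, as /sys/class/net/*/address reports it
    for (( i = 0; i < ${#raw}; i += 2 )); do
        out+="${out:+:}${raw:i:2}"    # add a colon before every pair except the first
    done
    echo "$out"
}

# nic_for_mac: print the interface name (e.g. eth0) whose sysfs address matches, else return 1
nic_for_mac() {
    local want addr_file
    want=$(normalize_mac "$1")
    for addr_file in /sys/class/net/*/address; do
        [[ -r $addr_file ]] || continue
        if [[ $(< "$addr_file") == "$want" ]]; then
            basename "$(dirname "$addr_file")"
            return 0
        fi
    done
    return 1
}

normalize_mac 0050569C25E5    # 00:50:56:9c:25:e5
```

Like getNIC, the loop walks the whole string two characters at a time, so it doesn't assume a 6-byte address.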

Here is an example of a hostfile.csv entry for your reference when looking through the script.
# cat hostfile.csv
DOMAINSEARCH=example.com example2.com example3.com
# Format -- server_name,[bond|nic] eth# [eth#] IP MASK Primary,gw={gateway},dns={dns1 dns2 etc}
# as long as the server name comes first, the order of the rest doesn't really matter
# Primary is used to determine which interface should be placed in the host file
# an example with multiple bonded interfaces.
server1,bond=bond0 eth0 eth3 192.168.0.10 255.255.255.0 1,gw=192.168.0.1,bond=bond1 eth1 eth2 10.1.1.1 255.255.255.0,dns=192.168.0.254 192.168.1.254
# an example with one bond using MAC addresses
server2,bond=bond0 0050569c25e5 0050569c6cbd 192.168.0.11 255.255.255.0 1,gw=192.168.0.1,dns=192.168.0.254 192.168.1.254
# an example with a single nic
server3,nic=eth0 192.168.0.12 255.255.255.0 1,gw=192.168.0.1,dns=192.168.0.254
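Each nic= and bond= entry carries only an IP address and a mask; the script derives the network and broadcast addresses from them through getNetworkAddress and getBroadcastAddress. As a rough sketch of what such helpers have to compute (not their actual code, and with hypothetical function names): convert both dotted quads to 32-bit integers, AND them for the network address, and OR the IP with the inverted mask for the broadcast address.

```shell
#!/bin/bash
# Sketch: network and broadcast addresses from a dotted-quad IP and netmask.
# Helper names are hypothetical, not the script's getNetworkAddress/getBroadcastAddress.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    # 10# forces base 10 so an octet written as "08" isn't read as octal
    echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

int_to_ip() {
    local n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

network_address() {     # $1 = ip, $2 = mask: keep only the network bits
    int_to_ip $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

broadcast_address() {   # $1 = ip, $2 = mask: set every host bit
    int_to_ip $(( $(ip_to_int "$1") | (~$(ip_to_int "$2") & 0xFFFFFFFF) ))
}

network_address 192.168.0.10 255.255.255.0     # 192.168.0.0
broadcast_address 192.168.0.10 255.255.255.0   # 192.168.0.255
```

The `& 0xFFFFFFFF` matters because bash arithmetic is 64-bit, so `~mask` would otherwise set bits above the 32 an IPv4 address uses.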

And here is the network_config.sh script itself:
# cat network_config.sh
#!/bin/bash
DEBUG=off
IFCONFIG=/sbin/ifconfig

NIC_FILE_DIR=/etc/sysconfig/network-scripts/

GW_FILE=/etc/sysconfig/network

HOST_FILE=/etc/hosts

DNS_FILE=/etc/resolv.conf

DOMAIN_LIST="domain.com domain2.com"

####
## function readHostFile
## reads $HOST_MAP_FILE for specific network information about this host
## return 1 on error, 0 on success
## options can be in any order (nic, gw, or bond), broadcast and network address are calculated based on ip and mask
## calls functions to generate Gateway ($GW_FILE), Hosts ($HOST_FILE), and ifcfg ($NIC_FILE_DIR/ifcfg-{nic})
##
## host_map_file format
## {host},nic={eth#} ip mask [primary],gw={gw_ip},bond={bond#} {nic1} {nic2} {ip} {mask} [primary],dns={dns_server} [{dns_server} ...]
## e.g. server1,nic=eth0 192.168.1.1 255.255.255.0,gw=192.168.0.1,bond=bond0 eth1 eth2 192.168.0.10 255.255.255.0 1,dns=192.168.0.254 192.168.1.254
####
readHostFile() {
        # quoted so an unset HOST_MAP_FILE doesn't turn this into "[ -e ]", which is always true
        if [ -e "$HOST_MAP_FILE" ]
        then
                # override default DOMAIN_LIST if it exists
                DOMAIN_TMP=$(cat $HOST_MAP_FILE | grep -wi "DOMAINSEARCH" | cut -f2 -d =)
                if [ ! -z "$DOMAIN_TMP" ]
                then
                        log info "Domain search list found -- $DOMAIN_TMP"
                        DOMAIN_LIST="$DOMAIN_TMP"
                else
                        log info "Domain search not found, using defaults -- $DOMAIN_LIST"
                fi

                # parse the file for this host exactly (-w) and case insensitive
                HOST_INFO=$(cat $HOST_MAP_FILE | grep -wi `hostname`)
                # check to see there was an entry for this host
                if [ -z "$HOST_INFO" ]
                then
                        log warning "Host information for `hostname` was not found in HOST_MAP_FILE"
                        return 1
                fi
                log notify "Host information found for `hostname` in $HOST_MAP_FILE"
                log notify "Host info is $HOST_INFO"
                # parse HOST_INFO
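                # split HOST_INFO on commas into an array instead of changing IFS for the rest of the script
                IFS=',' read -ra HOST_ENTRIES <<< "$HOST_INFO"
                for entry in "${HOST_ENTRIES[@]}"
                do
                        log debug "Working on entry $entry"
                        KEY=`echo $entry | cut -f1 -d =`
                        VALUE=`echo $entry | cut -f2 -d =`
                        case "$KEY" in
                        nic)
                                log debug "nic is specified -- $VALUE"
                                NIC=`echo $VALUE | cut -f1 -d " "`
                                # 12 or more characters means a MAC address (with or without colons) rather than eth#
                                if [ ${#NIC} -ge 12 ]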
                                then
                                        # we are working with a MAC address
                                        NIC=$(getNIC $NIC)
                                fi
                                IPADDR=`echo $VALUE | cut -f2 -d " "`
                                MASK=`echo $VALUE | cut -f3 -d " "`
                                PRIMARY=`echo $VALUE | cut -f4 -d " "`
                                BROADCAST=$(getBroadcastAddress $IPADDR $MASK)
                                NETWORK=$(getNetworkAddress $IPADDR $MASK)
                                # MAC address for this card
                                MAC=$(getMAC $NIC)

                                if [ -z "$NIC" ]
                                then
                                        log error "Missing NIC information aborting file creation"
                                else
                                        log info "Values for NIC $NIC - MAC $MAC - IP $IPADDR - NetMask $MASK - Broadcast $BROADCAST - Network $NETWORK"
                                        genIPFile $NIC $MAC $IPADDR $MASK $BROADCAST $NETWORK
                                fi

                                if [ "$PRIMARY" == 1 ]
                                then
                                        genHostFile $IPADDR
                                fi
                                ;;
                        bond)
                        #bond=bond0 eth1 eth2 192.168.0.10 255.255.255.0 1
                                log debug "bond is specified -- $VALUE"
                                BOND=`echo $VALUE | cut -f1 -d " "`
                                NIC1=`echo $VALUE | cut -f2 -d " "`
                                # a bare MAC is exactly 12 characters, so -gt 12 would miss it
                                if [ ${#NIC1} -ge 12 ]
                                then
                                        # we are working with a MAC address
                                        NIC1=$(getNIC $NIC1)
                                fi
                                NIC2=`echo $VALUE | cut -f3 -d " "`
                                if [ ${#NIC2} -ge 12 ]
                                then
                                        # we are working with a MAC address
                                        NIC2=$(getNIC $NIC2)
                                fi
                                IPADDR=`echo $VALUE | cut -f4 -d " "`
                                MASK=`echo $VALUE | cut -f5 -d " "`
                                # the optional sixth field marks the primary interface, as in the nic branch
                                PRIMARY=`echo $VALUE | cut -f6 -d " "`
                                BROADCAST=$(getBroadcastAddress $IPADDR $MASK)
                                NETWORK=$(getNetworkAddress $IPADDR $MASK)

                                log info "Values for BOND $BOND - NIC1 $NIC1 - NIC2 $NIC2 - IP $IPADDR - NetMask $MASK - Broadcast $BROADCAST - Network $NETWORK"
                                genBondFile $BOND $NIC1 $NIC2 $IPADDR $MASK $BROADCAST $NETWORK

                                if [ "$PRIMARY" == 1 ]
                                then
                                        genHostFile $IPADDR
                                fi
                                ;;
                        gw)
                                log debug "Gateway value - $VALUE"
                                genGWFile $VALUE
                                ;;
                        dns)
                                log debug "DNS is specified -- $VALUE"
                                genDNSFile "$VALUE"
                        esac
                done
        else
                log warning "Hostfile $HOST_MAP_FILE does not exist"
                # a non-zero return tells the caller to configure eth0 as static based on the current DHCP address
                return 1
        fi
}

####
## function getNIC {mac_addr}
## returns eth# based on MAC address
####
getNIC() {
        local RAW_MAC=$1
        # a properly formatted MAC address is 00:10:20:30:40:50 (17 characters)
        if [ ${#RAW_MAC} -ne 17 ]
        then
                # assume the user didn't put in : marks
                COUNT=0
                # in case this is IPv6, loop over the entire raw MAC length
                while [ $COUNT -lt ${#RAW_MAC} ]
                do
                        if [ $COUNT -eq 0 ]
                        then
                                SEARCH_MAC=${RAW_MAC:$COUNT:2}
                        else
                                SEARCH_MAC="$SEARCH_MAC:${RAW_MAC:$COUNT:2}"
                        fi
                        COUNT=$(($COUNT + 2))
                done
        else
                SEARCH_MAC=$RAW_MAC
        fi

        # return eth# for a specific MAC
        local NIC=`$IFCONFIG -a | grep -i $SEARCH_MAC | awk '{print $1}'`
        if [ -z "$NIC" ]
        then
                log error "Network interface was not found for nic $SEARCH_MAC, this interface will not be configured correctly"
                log error "ifconfig output is \n`$IFCONFIG -a`"
        else
                log info "NIC $SEARCH_MAC found as $NIC"
        fi
        echo $NIC
}

####
## function genBondFile {bond#} {nic1} {nic2} {ip} {mask} {broadcast} {network}
## nic=bond0 eth0 eth1 192.168.0.10 255.255.255.0 
## nic=eth0 192.168.0.10 255.255.255.0 192.168.0.255 192.168.0.0
####
genBondFile() {
        local BOND=$1
        local NIC1=$2
        local NIC2=$3
        local IP=$4
        local MASK=$5
        local BROADCAST=$6
        local NETWORK=$7
        local BOND_FILE=${NIC_FILE_DIR}ifcfg-$BOND
        local NIC1_FILE=${NIC_FILE_DIR}ifcfg-$NIC1
        local NIC2_FILE=${NIC_FILE_DIR}ifcfg-$NIC2

        log info "Creating Bond file $BOND_FILE"
        echo "DEVICE=$BOND" > $BOND_FILE
        echo "BOOTPROTO=none" >> $BOND_FILE
        echo "ONBOOT=yes" >> $BOND_FILE
        echo "NETWORK=$NETWORK" >> $BOND_FILE
        echo "NETMASK=$MASK" >> $BOND_FILE
        echo "IPADDR=$IP" >> $BOND_FILE
        echo "BROADCAST=$BROADCAST" >> $BOND_FILE
        echo "USERCTL=no" >> $BOND_FILE
        echo "BONDING_OPTS=\"mode=active-backup miimon=100 primary=$NIC1\"" >> $BOND_FILE

        log info "Creating network file $NIC1_FILE"
        echo "DEVICE=$NIC1" > $NIC1_FILE
        echo "BOOTPROTO=none" >> $NIC1_FILE
        echo "HWADDR=$(getMAC $NIC1)" >> $NIC1_FILE
        echo "ONBOOT=yes" >> $NIC1_FILE
        echo "MASTER=$BOND" >> $NIC1_FILE
        echo "SLAVE=yes" >> $NIC1_FILE
        echo "USERCTL=no" >> $NIC1_FILE

        log info "Creating network file $NIC2_FILE"
        echo "DEVICE=$NIC2" > $NIC2_FILE
        echo "BOOTPROTO=none" >> $NIC2_FILE
        echo "HWADDR=$(getMAC $NIC2)" >> $NIC2_FILE
        echo "ONBOOT=yes" >> $NIC2_FILE
        echo "MASTER=$BOND" >> $NIC2_FILE
        echo "SLAVE=yes" >> $NIC2_FILE
        echo "USERCTL=no" >> $NIC2_FILE

        log info "Modifying modprobe.conf file /etc/modprobe.conf"
        echo "alias $BOND bonding" >> /etc/modprobe.conf
}

####
## function getMAC {nic}
## gets the MAC address for a given interface using ifconfig
####
getMAC() {
        HWINFO=`$IFCONFIG $1 | grep HWaddr` # eth0      Link encap:Ethernet     HWaddr 00:50:56:9C:1B:00
        if [ $? -ne 0 ]
        then
                log error "Cannot find MAC address for interface $1"
                # return nothing to the calling process
                echo " "
        else
                # return the MAC address 
                echo $HWINFO | awk '{print $5}'
        fi
}

####
## function genDNSFile {nameserver} {nameserver} {etc}
## creates a basic DNS file with a search entry and one nameserver entry per server
####
genDNSFile() {
        log info "Creating DNS file $DNS_FILE"
        OldIFS=$IFS
        IFS=" "
        > $DNS_FILE
        # create search entries
        echo "search $DOMAIN_LIST" >> $DNS_FILE
        # create server entries
        for dnsEntry in $1
        do
                echo "nameserver $dnsEntry" >> $DNS_FILE
        done
        IFS=$OldIFS
}
####
## function genHostFile {local_ip}
## creates a basic hosts file with loopback and this host
####
genHostFile() {
        local IP=$1
        log info "Creating host file $HOST_FILE"
        echo "127.0.0.1         localhost.localdomain localhost" > $HOST_FILE
        echo "$IP               `hostname`" >> $HOST_FILE

}

####
## function genGWFile {gateway_ip}
## create the default route file including default RedHat values
####
genGWFile() {
        local GW=$1
        log info "Creating gateway file $GW_FILE"
        echo "NETWORKING=yes" > $GW_FILE
        echo "NETWORKING_IPV6=no" >> $GW_FILE
        echo "HOSTNAME=`hostname`" >> $GW_FILE
        echo "GATEWAY=$GW" >> $GW_FILE
}

####
## function genIPFile {nic} {mac} {ip} {mask} {broadcast} {network}
## create the IP Address file (ifcfg-eth{x})
## e.g. nic=eth0 00:50:56:9C:1B:00 192.168.0.10 255.255.255.0 192.168.0.255 192.168.0.0
####
genIPFile() {
        local NIC=$1
        local MAC=$2
        local IP=$3
        local MASK=$4
        local BROADCAST=$5
        local NETWORK=$6
        local IP_FILE=${NIC_FILE_DIR}ifcfg-${NIC}

        log info "Creating network file $IP_FILE"
        echo "DEVICE=$NIC" > $IP_FILE
        echo "BOOTPROTO=static" >> $IP_FILE
        echo "BROADCAST=$BROADCAST" >> $IP_FILE
        echo "HWADDR=$MAC" >> $IP_FILE
        echo "IPADDR=$IP" >> $IP_FILE
        echo "NETMASK=$MASK" >> $IP_FILE
        echo "NETWORK=$NETWORK" >> $IP_FILE
        log debug "----------- ifcfg-$NIC file -----------"
        log debug "\n`cat $IP_FILE`"
        log debug "----------------------"
}

####
## function getNetworkAddress
## calculates the network address given an ip and subnet mask
## converts the ip and mask into an array and does a bitwise and for each element
####
getNetworkAddress() {
        OldIFS=$IFS
        IFS=.
        typeset -a IP_Array=($1)
        typeset -a MASK_Array=($2)
        IFS=$OldIFS
        echo $((${IP_Array[0]} & ${MASK_Array[0]})).$((${IP_Array[1]} & ${MASK_Array[1]})).$((${IP_Array[2]} & ${MASK_Array[2]})).$((${IP_Array[3]} & ${MASK_Array[3]}))
}

####
## function getBroadcastAddress
## calculates the broadcast address given an ip and subnet mask
## converts the ip and mask into arrays and ORs (|) each ip octet with the inverted mask octet (255 XOR mask)
####
getBroadcastAddress() {
        OldIFS=$IFS
        IFS=.
        typeset -a IP_Array=($1)
        typeset -a MASK_Array=($2)
        IFS=$OldIFS
        echo $((${IP_Array[0]} | (255 ^ ${MASK_Array[0]}))).$((${IP_Array[1]} | (255 ^ ${MASK_Array[1]}))).$((${IP_Array[2]} | (255 ^ ${MASK_Array[2]}))).$((${IP_Array[3]} | (255 ^ ${MASK_Array[3]})))
}

####
## function readDHCPAddress
## reads information currently running and writes it out as a static IP entry
####
readDHCPAddress() {
        log info "Host information was not found for this server, copying information from running configuration (DHCP)"
        # the grep will grab two lines of output and merge them together
        # eth0      Link encap:Ethernet  HWaddr 00:50:56:9C:1B:00
        # inet addr:192.168.0.10  Bcast:192.168.0.255  Mask:255.255.255.0
        HWINFO=`$IFCONFIG | grep -A 1 -i hwaddr`
        NIC=`echo $HWINFO | cut -f1 -d " "`
        MAC=`echo $HWINFO | cut -f5 -d " "`
        for i in $HWINFO
        do
                case "$i" in
                addr:*)
                        IP=`echo $i | cut -f2 -d :`
                        ;;
                Bcast:*)
                        BROADCAST=`echo $i | cut -f2 -d :`
                        ;;
                Mask:*)
                        MASK=`echo $i | cut -f2 -d :`
                        ;;
                esac
        done
        NETWORK=$(getNetworkAddress $IP $MASK)
        log debug "DHCP information is NIC $NIC - MAC $MAC - IP $IP - MASK $MASK - BROADCAST $BROADCAST - NETWORK $NETWORK"
        genIPFile $NIC $MAC $IP $MASK $BROADCAST $NETWORK
        genHostFile $IP
        GATEWAY=`netstat -rn | grep -w UG | awk '{print $2}'`
        genGWFile $GATEWAY
}

####
## function log
## appends timestamped messages to a log file under $SOURCE_DIR/logs
####
log() {
        LOG_TYPE="$1"
        LOG_MSG="$2"
        TIME=`date +'%H:%M:%S %Z'`
        # create the log directory on first use
        if [ ! -d "$SOURCE_DIR/logs" ]
        then
                mkdir "${SOURCE_DIR}/logs"
        fi
        # choose the log file name only once per run
        if [ -z "$LOG_FILE" ]
        then
                LOG_FILE="$SOURCE_DIR/logs/network_config-`hostname`-`date +%Y%m%d-%H%M%S`"
        fi
        if [ "$LOG_TYPE" == "error" ]
        then
                echo -e "$TIME - **ERROR** - $LOG_MSG" >> "$LOG_FILE"
        elif [ "$LOG_TYPE" == "debug" ]
        then
                # quoted so an unset DEBUG does not break the test
                if [ "$DEBUG" == "on" ]
                then
                        echo -e "DEBUG - $LOG_MSG" >> "$LOG_FILE"
                fi
        elif [ "$LOG_TYPE" == "warning" ]
        then
                echo -e "$TIME - **WARNING** - $LOG_MSG" >> "$LOG_FILE"
        else
                echo -e "$TIME - $LOG_MSG" >> "$LOG_FILE"
        fi
}

# read the source directory from the command line; hostfile.csv is read from here and logs are written here
SOURCE_DIR=$1
HOST_MAP_FILE=$SOURCE_DIR/hostfile.csv

readHostFile
if [ $? -ne 0 ]
then
        readDHCPAddress
fi
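To check the arithmetic in getNetworkAddress and getBroadcastAddress, here is a standalone sketch that applies the same per-octet operations to a mask that is not /24. The names net_addr and bcast_addr are hypothetical, and instead of saving and restoring IFS they split each dotted quad with IFS=. scoped to a single read command.

```bash
#!/bin/bash
# Hypothetical standalone versions of getNetworkAddress/getBroadcastAddress.
# IFS=. only applies to each read command, so the caller's IFS is untouched.

net_addr() {
        local -a ip mask
        IFS=. read -r -a ip <<< "$1"
        IFS=. read -r -a mask <<< "$2"
        # network = ip AND mask, octet by octet
        echo "$((ip[0] & mask[0])).$((ip[1] & mask[1])).$((ip[2] & mask[2])).$((ip[3] & mask[3]))"
}

bcast_addr() {
        local -a ip mask
        IFS=. read -r -a ip <<< "$1"
        IFS=. read -r -a mask <<< "$2"
        # broadcast = ip OR inverted mask; 255 ^ octet flips the 8 mask bits
        echo "$((ip[0] | (255 ^ mask[0]))).$((ip[1] | (255 ^ mask[1]))).$((ip[2] | (255 ^ mask[2]))).$((ip[3] | (255 ^ mask[3])))"
}

net_addr 10.1.2.77 255.255.255.192      # prints 10.1.2.64
bcast_addr 10.1.2.77 255.255.255.192    # prints 10.1.2.127
```

For the last octet this works out as 77 & 192 = 64 for the network and 77 | (255 ^ 192) = 77 | 63 = 127 for the broadcast. With 192.168.0.10 and 255.255.255.0 the same functions give 192.168.0.0 and 192.168.0.255.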

Sunday, May 23, 2010

Custom Startup Scripts for Linux

There are a few ways to have a process or command execute at boot. The easiest is to add it to /etc/rc.local. This works well for small, quick and dirty jobs; however, for more complex jobs, such as those requiring a specific start order or daemon control, a full start-up script is a great way to go.

For this example I am going to draw on a past project of mine, Linux Cluster Manager, as it has a daemon that needs to stay running all of the time. Here is the script:

#!/bin/bash
#
# lcm This shell script takes care of starting and stopping
# lcm server daemons
#
# chkconfig: 345 85 25
# description: Client side daemon for LCM
# processname: lcmclient

### BEGIN INIT INFO
# Provides: lcmclient
# Required-Start: $network $syslog
# Required-Stop:
# Default-Start: 3 4 5
# Default-Stop: 0 1 2 6
# Short-Description: LCMClient
# Description: Client side daemon for LCM
### END INIT INFO

STATUS=0
# Source function library.
test -s /etc/rc.d/init.d/functions && . /etc/rc.d/init.d/functions
test -s /etc/rc.status && . /etc/rc.status && STATUS=1

start() {
        echo -n $"Starting LCM Client Daemons: "
        if [ -x /usr/local/lcm/lcmclient ] ; then
                if [ $STATUS -eq 1 ]
                then
                        startproc /usr/local/lcm/lcmclient &> /dev/null
                        rc_status -v
                else
                        /usr/local/lcm/lcmclient &> /dev/null &
                        PID=`/sbin/pidof -s -x lcmclient`
                        if [ $PID ]
                        then
                                echo_success
                        else
                                echo_failure
                        fi
                        echo
                fi
        fi
}

stop () {
        echo -n $"Stopping LCM Client Daemons: "
        test -s /sbin/pidof && PID=`/sbin/pidof -s -x lcmclient`
        test -s /bin/pidof && PID=`/bin/pidof -s -x lcmclient`
        if [ $PID ]
        then
                /bin/kill $PID
        fi
        if [ $STATUS -eq 1 ]
        then
                rc_status -v
        else
                echo_success
                echo
        fi
}

restart() {
        stop
        start
}

case "$1" in
        start)
                start
                ;;
        stop)
                stop
                ;;
        restart)
                restart
                ;;
        *)
                echo $"Usage: $0 {start|stop|restart}"
                exit 1
esac

Registration
At least for SuSE and RedHat based distributions, start-up scripts live in /etc/init.d. They can be called whatever you like as long as they are executable, and ideally owned by root, since root is who will run them anyway. We used to have to link each script into the different run levels by hand, which is easy enough to do, just tedious and error prone. Today we register scripts with chkconfig and let it do all the work for us.

The opening lines enable this feature for both RedHat and SuSE, which of course have to do things differently. I generally include both, as it does no harm and makes the script more portable.

1  #!/bin/bash
2 #
3 # lcm This shell script takes care of starting and stopping
4 # lcm server daemons
5 #
6 # chkconfig: 345 85 25
7 # description: Client side daemon for LCM
8 # processname: lcmclient
9
10 ### BEGIN INIT INFO
11 # Provides: lcmclient
12 # Required-Start: $network $syslog
13 # Required-Stop:
14 # Default-Start: 3 4 5
15 # Default-Stop: 0 1 2 6
16 # Short-Description: LCMClient
17 # Description: Client side daemon for LCM
18 ### END INIT INFO

RedHat
The first line is of course the shebang naming the desired shell, which all scripts should have. Lines 2-5 are purely informational comments for the reader. Lines 6-7 are required by chkconfig under RedHat: line 6 tells it which run levels the service should start in, followed by the start order and the shutdown order, and line 7 gives the description. In this case it will start under run levels 3, 4, and 5 with a start order of 85 and a shutdown order of 25.

To register the script and check the results we can run the following:
# chkconfig --add lcm
# chkconfig --list lcm
lcm 0:off 1:off 2:off 3:on 4:on 5:on 6:off
# ls -l /etc/rc*/*lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc0.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc1.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc2.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc3.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc4.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc5.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc6.d/K25lcm -> ../init.d/lcm
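The mapping from the "# chkconfig: 345 85 25" header to the listing above is mechanical: an S link carrying the start order in each listed run level, and a K link carrying the shutdown order in every other one. As a rough sketch of that mapping only (chkconfig itself does more, and the helper name chkconfig_links is hypothetical), this function reads the header from a script and prints the links it implies without creating anything. It assumes the RedHat layout, where the real directories are /etc/rc.d/rcN.d.

```bash
#!/bin/bash
# Hypothetical dry run: print the rc links implied by a script's
# "# chkconfig: <levels> <start> <stop>" header. Nothing is created.
# Assumes the RedHat layout /etc/rc.d/rc<N>.d/ with links to ../init.d/<name>.
chkconfig_links() {
        local name=$1 script=$2
        local levels start stop n
        # turn "# chkconfig: 345 85 25" into levels=345 start=85 stop=25
        read -r levels start stop < <(sed -n 's/^# chkconfig:[[:space:]]*//p' "$script")
        for n in 0 1 2 3 4 5 6; do
                if [[ $levels == *"$n"* ]]; then
                        # listed run level: start link, e.g. S85lcm
                        printf '/etc/rc.d/rc%d.d/S%02d%s -> ../init.d/%s\n' "$n" "$((10#$start))" "$name" "$name"
                else
                        # any other run level: kill link, e.g. K25lcm
                        printf '/etc/rc.d/rc%d.d/K%02d%s -> ../init.d/%s\n' "$n" "$((10#$stop))" "$name" "$name"
                fi
        done
}
```

Running chkconfig_links lcm /etc/init.d/lcm should print seven lines with the same S85 and K25 names as the listing above. The 10# prefix keeps a zero-padded order such as 08 from being read as octal.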

SuSE
SuSE takes its setup process from the Linux Standard Base (LSB) core specification. This is shown in lines 10-18, delimited by BEGIN INIT INFO and END INIT INFO. The block specifies the run levels we would like and which other services must be available for this one to start and stop. Chkconfig figures things out from there and numbers the start and shutdown order for us.

Line 11, beginning with Provides, establishes this script as a facility called lcmclient. We can reference other facilities through Required-Start and Required-Stop on lines 12 and 13. Common facility names are $network, $syslog and $local_fs, but a larger list and some additional explanation can be found here.

The main benefit of this approach is parallel booting: if the system understands the relationships between all of the start-up elements, many of them can run at the same time. If I had another script that depended on this one, I could list lcmclient as a Required-Start entry for that script. Note that there is no $ in front, since by naming convention names beginning with $ are reserved for system facilities.

Again, we run the same chkconfig commands, but this time the start order is determined for us. If we take a closer look at our dependencies, we see that network starts at order 2 and syslog at order 3, so lcm is placed right after them at order 4 (S04).

# chkconfig --add lcm
lcm 0:off 1:off 2:off 3:on 4:on 5:on 6:off
# ls -l /etc/rc.d/rc*/*lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc3.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc3.d/S04lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc4.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc4.d/S04lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc5.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc5.d/S04lcm -> ../lcm
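Any tool that orders boot scripts has to read these fields out of each script. As a rough illustration only (this is not how chkconfig or insserv actually parse the header), a hypothetical lsb_field helper can pull a single keyword out of the INIT INFO block with sed:

```bash
#!/bin/bash
# Hypothetical helper: print the value of one keyword from a script's
# "### BEGIN INIT INFO" ... "### END INIT INFO" block.
# Usage: lsb_field Required-Start /etc/init.d/lcm
lsb_field() {
        local field=$1 script=$2
        # the first sed keeps only the INIT INFO block; the second strips
        # "# <field>:" plus any following whitespace and prints what is left
        sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$script" |
                sed -n "s/^# $field:[[:space:]]*//p"
}
```

For the script above, lsb_field Provides /etc/init.d/lcm prints lcmclient, lsb_field Required-Start /etc/init.d/lcm prints $network $syslog, and the empty Required-Stop field prints an empty line.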

User Feedback
The next section loads helper function libraries. They aren't strictly required but make formatting, user feedback, and process management a lot easier.
1  STATUS=0
2 # Source function library.
3 test -s /etc/rc.d/init.d/functions && . /etc/rc.d/init.d/functions
4 test -s /etc/rc.status && . /etc/rc.status && STATUS=1

The only reason I have a STATUS variable is to record which set of libraries was loaded, and therefore which OS is running the script. Line 3 is for RedHat, line 4 is for SuSE. As with registration, they differ enough from each other to be annoying.

My primary use for these extra functions is to put the nice little [ OK ] or [ FAILED ] messages on the screen that can be so helpful. The exact function called to do this can depend on what the script is doing or how the program it calls operates.
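On a system that has neither library, STATUS stays 0, so start() takes the RedHat branch and echo_success or echo_failure fail with "command not found". A minimal sketch of one guard, assumed to sit right after the two test -s lines, is to define simple stand-ins only when the real functions are missing. The bracketed output format is my own assumption rather than a copy of either distribution's version, and only the RedHat-style names are covered.

```bash
#!/bin/bash
# Hypothetical fallbacks for systems where neither /etc/rc.d/init.d/functions
# nor /etc/rc.status was sourced. If a library already defined the function,
# type -t finds it and the real version is kept.
if ! type -t echo_success > /dev/null; then
        echo_success() {
                echo -n "[  OK  ]"
                return 0
        }
fi

if ! type -t echo_failure > /dev/null; then
        echo_failure() {
                echo -n "[FAILED]"
                return 1
        }
fi
```

Like the calls in start(), these print no newline, so the existing echo that follows them still ends the line.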
Starting
1  start() {
2 echo -n $"Starting LCM Client Daemons: "
3 if [ -x /usr/local/lcm/lcmclient ] ; then
4 if [ $STATUS -eq 1 ]
5 then
6 startproc /usr/local/lcm/lcmclient &> /dev/null
7 rc_status -v
8 else
9 /usr/local/lcm/lcmclient &> /dev/null &
10 PID=`/sbin/pidof -s -x lcmclient`
11 if [ $PID ]
12 then
13 echo_success
14 else
15 echo_failure
16 fi
17 echo
18 fi
19 fi
20 }

In this case I have chosen to start the application with startproc on line 6 for SuSE and by hand on line 9 for RedHat. The reason is that the program blocks rather than forking into the background, and it may write errors to stderr. Startproc handles this fairly well and gives a proper return code, which rc_status -v on line 7 can report on. The tools under RedHat, however, expect the process either to fork like a daemon or to return when it has completed. So I have resorted to starting it by hand and then checking for a process on lines 10-11. You can't just rely on the return code, because if you redirect stdout and stderr to /dev/null and put the command in the background, it will always return 0. Go ahead, try it, I'll wait.

If a PID is found, echo_success is run on line 13, otherwise echo_failure on line 15. Either one requires a subsequent echo command on line 17 to print a newline.
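For the curious, here is that experiment in script form, followed by a hypothetical start_bg_checked helper showing one alternative to pidof: keep the PID from $!, give the program a moment, and ask the kernel whether it is still alive with kill -0. The one-second grace period is an arbitrary assumption, and a program that dies after that second will still be reported as started.

```bash
#!/bin/bash
# 1) The experiment: $? right after "&" only reports that the job was launched.
sh -c 'exit 3' &> /dev/null &
echo "status after &: $?"           # always 0
rc=0
wait $! || rc=$?
echo "status from wait: $rc"        # the command's real exit status, 3 here

# 2) Hypothetical helper: start a command in the background and check that it
#    is still running after a short grace period instead of trusting $?.
start_bg_checked() {
        "$@" &> /dev/null &
        local pid=$!
        local rc=0
        sleep 1                     # arbitrary grace period
        if kill -0 "$pid" 2> /dev/null; then
                echo "$pid"         # still alive: print its PID
                return 0
        fi
        wait "$pid" || rc=$?        # already gone: collect its exit status
        echo "exited early with status $rc" >&2
        return 1
}
```

In the RedHat branch of start(), a call such as start_bg_checked /usr/local/lcm/lcmclient could then decide between echo_success and echo_failure.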

Other functions for starting programs and reporting their results:

OS      Function      Example                                        Result
RedHat  action        action "Starting example: " /usr/bin/example   [ OK ] or [ FAILED ]
RedHat  echo_success  echo_success; echo                             [ OK ]
RedHat  echo_failure  echo_failure; echo                             [ FAILED ]
RedHat  echo_warning  echo_warning; echo                             [ WARNING ]
SuSE    startproc     startproc /usr/bin/example                     none
SuSE    rc_status     rc_status -v                                   done, failed, or skipped

I invite you to wade into the functions provided by each OS (/etc/rc.d/init.d/functions on RedHat and /etc/rc.status on SuSE) and see if you can find any gems in there. Bring your caffeine of choice; you'll need it.

Shutdown
1  stop () {
2 echo -n $"Stopping LCM Client Daemons: "
3 test -s /sbin/pidof && PID=`/sbin/pidof -s -x lcmclient`
4 test -s /bin/pidof && PID=`/bin/pidof -s -x lcmclient`
5 if [ $PID ]
6 then
7 /bin/kill $PID
8 fi
9 if [ $STATUS -eq 1 ]
10 then
11 rc_status -v
12 else
13 echo_success
14 echo
15 fi
16 }

This part is fairly simple: grab the PID of the program and issue a kill command. Of course RedHat and SuSE have to disagree on the location of pidof, but that isn't too hard to overcome. Again, the STATUS variable determines which helper function to run. You'll notice that there isn't a failure result here. I could have done some extra work checking the result of the kill command, but felt it complicated things more than it really mattered.
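If you did want a failure result, one option is to confirm that the process actually went away after the kill. The sketch below is a hypothetical stop_and_wait helper, not part of the original script: it sends the default TERM signal, then polls with kill -0 once a second until the process is gone or a timeout (10 seconds by default, an arbitrary choice) runs out. Its return code could then choose between echo_success and echo_failure.

```bash
#!/bin/bash
# Hypothetical helper: stop_and_wait <pid> [timeout_seconds]
# Sends SIGTERM, then polls with kill -0 until the process has exited.
# Returns 0 once it is gone, 1 if it is still running after the timeout.
stop_and_wait() {
        local pid=$1 timeout=${2:-10} waited=0
        kill -0 "$pid" 2> /dev/null || return 0    # not running: nothing to stop
        kill "$pid" 2> /dev/null || return 1       # the signal could not be sent
        while kill -0 "$pid" 2> /dev/null; do
                if [ "$waited" -ge "$timeout" ]; then
                        return 1
                fi
                sleep 1
                waited=$((waited + 1))
        done
        return 0
}
```

In stop(), the /bin/kill $PID line could become stop_and_wait $PID, with the result deciding which helper prints the status.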

Command Line Arguments
Every start-up script is required to accept both the start and stop command line arguments. I have handled them with a case statement, but you can use whatever makes you happy. It is also customary to include a restart option, usage information, and possibly status if it makes sense.

If your needs are simple enough, you could include all of the code inside the case statement. I find this harder to read for pretty much everything but the simplest of jobs, most of which will fit into rc.local anyway.
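As an example of those customary extras, here is a minimal sketch of a status action. It assumes the daemon records its PID in a pid file, which lcmclient may not do (the path /var/run/lcmclient.pid is hypothetical), and it follows the LSB status exit codes: 0 for running, 1 for dead with a leftover pid file, and 3 for not running.

```bash
#!/bin/bash
# Hypothetical status action for the lcm script. Assumes the daemon writes
# its PID to PIDFILE; the default path below is an illustration only.
PIDFILE=${PIDFILE:-/var/run/lcmclient.pid}

status() {
        local pid
        if [ ! -s "$PIDFILE" ]; then
                echo "lcmclient is stopped"
                return 3                    # LSB: program is not running
        fi
        pid=$(< "$PIDFILE")
        if kill -0 "$pid" 2> /dev/null; then
                echo "lcmclient (pid $pid) is running"
                return 0                    # LSB: program is running
        fi
        echo "lcmclient is dead but $PIDFILE exists"
        return 1                            # LSB: dead, pid file exists
}

# wired into the existing case statement as one more branch
case "$1" in
        status)
                status
                exit $?
                ;;
esac
```

The usage message would then become {start|stop|restart|status}.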

Running Your Script
Some useful commands to control and execute your new script:

# chkconfig --list lcm
shows the run levels in which the script is enabled

# chkconfig lcm on
will add the start links so the script executes at boot

# chkconfig lcm off
will remove the start links to prevent the script from executing at boot

# service lcm {start | stop | restart}
# /etc/init.d/lcm {start | stop | restart}
both of these will execute your script; the first is just a little less typing