Sunday, November 13, 2011

Misaligned I/O reporting

In the last post I showed a script for logging misaligned I/Os on a NetApp storage array. It's nice to produce some graphs of the data, though, so here is my quick parse script to crank out a CSV file for a given filer. If you recall, the output from the collection script was as follows:
22:10:01 UTC - Zeroing stats from filer np00003
22:10:02 UTC - sleeping for 300 seconds
22:15:02 UTC - Capturing data from filer np00003
22:15:03 UTC - Collected Values:
22:15:03 UTC - interval = 300.399491 seconds, pw.over_limit = 249 and WAFL_WRITE = 540822
22:15:03 UTC - Percentage Misaligned Writes = .0460%
22:15:03 UTC - Successfully Completed!
My logs are kept under ./NetApp_Logs/, one file per run, named NetApp-align-{filer}-{yyyymmdd}-{hhmmss}. This little script walks through all of the logs for a given filer and outputs a CSV as shown here:
2011-10-10,22:15:03,.0460%
2011-10-11,10:15:03,.0670%
2011-10-11,22:15:03,.1500%
2011-10-12,20:15:03,.4500%
I have found this works well for spreadsheet programs, which can parse out both the date and time appropriately, saving me a lot of time. And here is the script:
$ cat na_parse_alignment
#!/bin/bash
if [ "$#" == 0 ]
then
        echo "Usage: `basename $0` {filer}"
        echo "e.g."
        echo "`basename $0` np00003"
        exit 0
fi
FILER=$1
grep -H Percent NetApp_Logs/NetApp-align-${FILER}-*  | sed -e 's/\(2011\)\([0-9].\)\([0-9].\)\(-[0-9].....:\)/ \1-\2-\3 /g' | awk '{print $2" "$3" "$NF}' | tr " " , 
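To see what that pipeline does, here is a single grep -H output line pushed through the same sed, awk, and tr stages (the file name and values are made up for illustration, following the naming convention above):

```shell
echo 'NetApp_Logs/NetApp-align-np00003-20111010-221001:22:15:03 UTC - Percentage Misaligned Writes = .0460%' \
  | sed -e 's/\(2011\)\([0-9].\)\([0-9].\)\(-[0-9].....:\)/ \1-\2-\3 /g' \
  | awk '{print $2" "$3" "$NF}' | tr " " ,
# -> 2011-10-10,22:15:03,.0460%
```

The sed stage splits the yyyymmdd in the file name into a spaced-out date, which makes the date and time become awk fields $2 and $3; $NF is always the percentage.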

Sunday, November 6, 2011

Tracking Misaligned I/Os

For those that don't know about misaligned I/Os, I provided a brief introduction to them in my last post. In this post I'll show you how to track and quantify how many of your I/Os exhibit the problem. We currently run our VMware infrastructure from a NetApp array, so this script is tailored to that environment.

The only method I've found to track them is the pw.over_limit counter. Unfortunately it is only available at the advanced privilege level on the command line, as it isn't exported via SNMP. You can obtain the data manually with the following:
# priv set advanced
# wafl_susp -w
# priv set
This will produce lots of data, most of which you can ignore. You'll quickly notice a problem: pw.over_limit is an absolute number, like 29300. So what? Is that good or bad? That depends on how many I/Os you are generating in the first place. Since I love writing little programs, here is the output of my script:
$ cat NetApp-align-filer1-20111010-221001
22:10:01 UTC - Zeroing stats from filer filer1
22:10:02 UTC - sleeping for 300 seconds
22:15:02 UTC - Capturing data from filer filer1
22:15:03 UTC - Collected Values:
22:15:03 UTC - interval = 300.399491 seconds, pw.over_limit = 249 and WAFL_WRITE = 540822
22:15:03 UTC - Percentage Misaligned Writes = .0460%
22:15:03 UTC - Successfully Completed! 
As you can see, there are a few misaligned writes going on, but overall the system is in pretty good health, with just under 0.05% of the total. When should you panic? That depends on far too many variables to list here, but write latency is a key indicator to pay attention to. Your application mix (and your users) will scream when you've crossed the line.

The code I use is listed below. Read the comments in the parse_wafl function to figure out what it's doing.
#!/bin/bash
# Script Information
# na_alignment_check Version: 1.0
# Last Modified: Sept 18/2010
# Created By: Michael England

# to run this script you will need an account on the filer with administrative privileges as it has to be run with 'priv set advanced'
# ssh prep work
# ssh-keygen -t dsa -b 1024
# cat id_dsa.pub >> {filer}/etc/sshd/{user}/.ssh/authorized_keys

# default to local user
FILER_USER=$(whoami)

######
### function log
### logs activities to the screen, a file, or both
######
function log {
 LOG_TYPE="$1"
 LOG_MSG="$2"
 TIME=`date +'%H:%M:%S %Z'`
 # specify the log file only once
 if [ -z "$LOG_FILE" ]
 then
  LOG_FILE="/work/Scripts/NetApp_Logs/NetApp-align-$FILER-`date +%Y%m%d-%H%M%S`"
 fi
 if [ "$LOG_TYPE" == "error" ]
 then
  echo -e "$TIME - **ERROR** - $LOG_MSG"
  echo -e "$TIME - **ERROR** - $LOG_MSG" >> "$LOG_FILE"
 elif [ "$LOG_TYPE" == "debug" ]
 then
  # DEBUG is quoted so the test doesn't break when it is unset
  if [ "$DEBUG" == "on" ]
  then
   echo -e "DEBUG - $LOG_MSG"
  fi
 else
  echo -e "$TIME - $LOG_MSG"
  echo -e "$TIME - $LOG_MSG" >> "$LOG_FILE"
 fi
}

######
### check_ssh
### check the return code of an ssh command
######
function check_ssh {
 CHECK=$1
 ERROR_DATA="$2"
 if [ $CHECK != 0 ]
 then
  log error "ssh failed to filer"
  log error "return is $ERROR_DATA"
  exit 1
 fi
}

######
### capture_wafl_susp
### ssh to the filer specified and collect the output from wafl_susp
######

function capture_wafl_susp {
 log notify "Capturing data from filer $FILER"
 SSH_RET=`ssh $FILER -l $FILER_USER "priv set advanced;wafl_susp -w;priv set" 2>&1`
 RETVAL=$?
 check_ssh $RETVAL "$SSH_RET"

 parse_wafl $SSH_RET
}

######
### parse_wafl
### capture values for pw.over_limit and WAFL_WRITE and the overall scan interval
### e.g.
### WAFL statistics over 577455.189585 second(s) ...
### pw.over_limit = 29300
### New messages, restarts, suspends, and waffinity completions (by message-type):
### WAFL_WRITE           = 10568010   122597   122597        0
### 
### There are many other WAFL_WRITE lines so we need to find the New messages line first then the WAFL_WRITE in that section
######
function parse_wafl {
 oldIFS=$IFS
 IFS=$'\n'
 NEW_FLAG=0
 for line in $SSH_RET
 do
  if [[ $line =~ second* ]]
  then
   STATS_INTERVAL=`echo $line | awk '{print $4}'`
  fi

  if [[ $line =~ pw.over_limit* ]]
  then
   OVER_LIMIT=`echo $line | awk '{print $3}'`
  fi

  if [[ $line =~ New\ messages.* ]]
  then
   NEW_FLAG=1
  fi
  if [[ $NEW_FLAG == 1 ]] && [[ $line =~ WAFL_WRITE* ]]
  then
   WAFL_WRITE=`echo $line | awk '{print $3}'`
   NEW_FLAG=0
  fi
 done
 IFS=$oldIFS
 if [[ -n $OVER_LIMIT ]] && [[ $WAFL_WRITE -gt 0 ]]
 then 
  # by multiplying by 100 first we don't lose any precision
  MISALIGNED_PERCENT=`echo "scale=4;100*$OVER_LIMIT/$WAFL_WRITE" | bc -l`
 else
  log error "Error collecting values, pw.over_limit = $OVER_LIMIT and WAFL_WRITE = $WAFL_WRITE"
 fi
 
 log notify "Collected Values:"
 log notify "interval = $STATS_INTERVAL seconds, pw.over_limit = $OVER_LIMIT and WAFL_WRITE = $WAFL_WRITE"
 log notify "Percentage Misaligned Writes = ${MISALIGNED_PERCENT}%"
}

######
### function zero_values
### zeroes out existing wafl stats on the filer
######
function zero_values {
 log notify "Zeroing stats from filer $FILER"
 SSH_RET=`ssh $FILER -l $FILER_USER "priv set advanced;wafl_susp -z;priv set" 2>&1`
 RETVAL=$?
 check_ssh $RETVAL "$SSH_RET"
}

######
### function usage
### simple user information for running the script
######
function usage {
 echo -e ""
 echo "Usage:"
 echo "`basename $0` -filer {filer_name} [-username {user_name}] [-poll_interval {x}]"
 echo -e "\t-filer {filer_name} is a fully qualified domain name or IP of a filer to poll"
 echo -e "\t-username {user_name} is a user to attach to the filer, if omitted will use current user"
 echo -e "\t-poll_interval {x} will zero out the filer stats and sleep for {x} seconds then return.  If omitted will read stats since last zeroed"
 echo -e ""
 exit 0
}

# parse command line options
if [ $# == 0 ]
then
 usage
fi
until [ -z "$1" ]
do
 case "$1" in
 -filer)
  shift
  FILER="$1"
  ;;
 -username)
  shift
  FILER_USER="$1"
  ;;
 -poll_interval)
  shift
  POLL_INTERVAL="$1"
  ;;
 *)
  usage
  ;;
 esac
 shift
done
# do the work

if [[ -n $POLL_INTERVAL ]]
then
 zero_values
 log notify "sleeping for $POLL_INTERVAL seconds"
 sleep $POLL_INTERVAL
fi
capture_wafl_susp

log notify "Successfully Completed!"

Sunday, October 30, 2011

Misaligned I/Os

While misaligned I/O isn't unique to virtualization, it generally doesn't cause much of a problem until you consolidate a whole bunch of poorly set up partitions onto one array; that's when things tend to go from all right to a really bad day. The fundamental problem is a mismatch between where the OS places data and where the storage array ultimately keeps it. Both work in logical chunks of data, and both present a virtual view of this to the higher layers. There are two specific cases that I'd like to address: one that you can fix, and one that you can only manage.

Storage Alignment
Let's start with a simple illustration of the problem.
Most legacy operating systems (Linux included) like to place a 63-sector offset at the beginning of a drive. This is a real problem, as now every read and write overlaps the block boundaries of the physical array. It doesn't matter if you are using VMFS or NFS to host a data store; it's the same problem. Yes, an NFS repository will always be aligned, but remember this is a virtual representation to the OS, which happily messes everything up by offsetting its first partition.

Misalignment is bad enough when we read: the storage array has to pull two blocks when only one is requested. But it is of particular importance when we write data. Most arrays use some sort of parity to manage redundancy, and if the array has to deal with two blocks for every single write request, the system overhead can be enormous. It's also important to keep in mind that every storage vendor has this issue. Even a raw, single drive can benefit from aligned partitions, especially when we consider that most new drives ship with a 4KB sector size called Advanced Format.
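The arithmetic behind the problem is easy to check in a shell. With the legacy 63-sector offset, a partition starts 63 x 512 = 32256 bytes in, which is not a multiple of a 4KB block:

```shell
echo $(( 63 * 512 % 4096 ))    # legacy offset: prints 3584, not block-aligned
echo $(( 2048 * 512 % 4096 ))  # modern 1MB offset: prints 0, aligned
```

A non-zero remainder means every 4KB guest I/O straddles two blocks on the array.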

The impact to each vendor will be slightly different. For example, EMC uses a 64KB block size, so not every write will be unaligned. NetApp uses a 4KB block, which means every write will be unaligned but they handle writes quite a bit differently as the block doesn't have to go back to the same place it came from. Pick your poison.
As you can see, when the OS blocks are aligned, everything through the stack can run at its optimal rate, where one OS request translates to one storage request.

Correctable Block Alignment
Fortunately, most modern operating systems have recognized this problem, and there is little to do. For example, Windows 2008 now uses a 2048-sector offset (1MB), as do most current Linux distributions. For Linux, it is easy enough to check.
# fdisk -lu /dev/sdb

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x77cbefef

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048  3907029167  1953513560   83  Linux
As you can see from this example, my starting sector is 2048. After that everything will align, including subsequent partitions. The -u option tells fdisk to display sectors rather than cylinders. I generally recommend this option when creating the partition as well, although it seems to be the default for fdisk 2.19. You can also check your file system block size to better visualize how this relates through the stack. The following shows that my ext4 partition is using a 4KB block size:
# dumpe2fs /dev/sdb1 | grep -i 'block size'
dumpe2fs 1.41.14 (22-Dec-2010)
Block size:               4096
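If you want to script the check rather than eyeball the fdisk output, here is a small sketch (the function name is mine; feed it the start sector and logical sector size from fdisk -lu):

```shell
# Report whether a partition's starting offset is aligned to a 4KB boundary.
# Args: start sector, logical sector size in bytes.
is_aligned() {
  local start=$1 sector_size=$2
  if (( start * sector_size % 4096 == 0 )); then
    echo aligned
  else
    echo misaligned
  fi
}

is_aligned 2048 512   # modern 1MB offset  -> aligned
is_aligned 63 512     # legacy 63-sector offset -> misaligned
```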

Uncorrectable Block Alignment
VMware has come out with a nifty feature I first saw in VDI (now called View) and later placed into their cloud offering, called a linked clone. Basically it allows you to quickly create a copy of a machine using very little disk space, because it reads common data from the original source and writes new data to a new location. Sounds a lot like snapshots, doesn't it?

Well, the problem with this approach is that every block written requires a little header to tell VMware where the new block belongs in the grand scheme of things. This is similar to our 63-sector offset, but now for every block. Nice. It's a good idea to start your master image off with an aligned file system, as it will help with reading data, but it doesn't amount to much when you write. And does a Windows desktop ever like to write. Just to exist (no workload), our testing has shown Windows does 1-2 write I/Os per second. Linux isn't completely off the hook, but generally does 1/3 to 1/2 of that, and it isn't that common with linked clones yet as it isn't supported in VDI, though it will get kicked around in vCloud.

Managing Bad Block Alignment

When using linked clones, there are a few steps you can take to minimize the impact:
  • Refresh your images as often as you can. This will keep the journal file to a minimum and corresponding system overhead. If you can't refresh images, you probably shouldn't be using linked clones.
  • Don't turn on extra features for those volumes like array based snapshots or de-duplication. The resources needed to track changes for both of these features can cause significant overhead. Use linked clones or de-dupe, not both.
  • Monitor your progress on a periodic basis. I do this 3 times a day so we can track changes over time. If you can't measure it, you can't fix it.
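For that periodic monitoring, a crontab sketch (the path and script name come from my layout in the previous post, and filer1 is a placeholder; three runs a day, each sampling a 300-second window):

```
# check alignment at 06:00, 14:00, and 22:00 every day
0 6,14,22 * * * /work/Scripts/na_alignment_check -filer filer1 -poll_interval 300
```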
In the future, VAAI is promised to save us, by both VMware and every storage vendor that can spell. Its intent is to perform the same linked clone API call but let the storage array figure out the best method of managing the problem. I've yet to see it work in practice (it's still "in the next release"), but I have hope.

Wednesday, October 26, 2011

Firefox hang with FilerView

Lately I've been having lots of problems using NetApp's FilerView with current versions of Firefox (I'm on version 7 now). I thought it was something specific to Linux, but when I upgraded Firefox on my Windows machine it started happening there too. The basic symptom is that the browser just hangs. When it hangs is a bit random, but invariably it's at some step in one of the wizards or on pages with lots of check boxes. On Windows you can just switch to IE, but I don't use Windows on a day to day basis. So, the workaround I have in place now is a configuration setting in Firefox.
  1. In the address bar, type 'about:config'
  2. Type 'html5' in the Filter bar
  3. Set 'html5.parser.enable' to false
That's it, FilerView should now work properly again. I don't run into too many HTML5-based sites, so I'm not entirely sure how much this breaks. Hopefully NetApp and Firefox can get along at some point soon.
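If you'd rather not flip the preference by hand on every machine, the same setting can go in a user.js file in your Firefox profile directory (a sketch; the profile path varies by install):

```javascript
// user.js in the Firefox profile directory
user_pref("html5.parser.enable", false);
```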

Wednesday, July 27, 2011

Linux, OSX Lion, AFP, and Time Machine

I wanted to set up native Apple Filing Protocol (AFP) support on my Linux box so it could host general file data and also act as a Time Machine target.  There are quite a few good posts out there, such as the ones found here and here, but I was still having some trouble, so I thought I would post some tips.

In order to do a quick test, I used Finder, selected Go from the menu bar and then Connect to Server...  In the Server address, I entered afp:// which presented me with a simplistic error message.
The problem is Apple decided to drop support for DHCAST128 in favour of DHX2.  Unfortunately I couldn't find a prebuilt package for openSUSE with DHX2 compiled, so I had to do it myself.  The process is relatively simple in itself: grab a copy of netatalk 2.2 beta4 from SourceForge, unpack it, and run configure.  Assuming you have the basic requirements, this should complete and present you with a summary.

Originally I was missing a few options.  DHX2 didn't show up, and ACL support didn't work no matter what the configure options said.  Without ACL support, your AFP server will work fine, but Time Machine will error out when it tries to use the volume.
The problem was pretty simple in the end: missing dependencies.  While compiling you will run into several errors.  Here they are, with the packages required to get past them:
configure: error: Zeroconf installation not found
# zypper in libavahi-devel-0.6.28-7.10.1.x86_64

checking whether to enable the TCP wrappers... configure: error: libwrap not found
# zypper in tcpd-devel-7.6-866.1.x86_64

Make sure you have the required Berkeley DB libraries AND headers installed.
# zypper in berkeleydb-3.3.75-10.1.noarch
# zypper in libdb-4_8-devel-4.8.30-2.4.x86_64

You will also need a few development libraries that won't give errors but without them, you will have missing features, like DHX2 and ACL support:
# zypper in libopenssl-devel-1.0.0c-18.19.2.x86_64
# zypper in libgcrypt-devel-1.4.6-3.1.x86_64
# zypper in libacl-devel-2.2.48-12.1.x86_64

Install them using zypper as shown above or with YaST, along with any dependent packages, re-run configure, and the required features should show up.  Once that is done, run make and then make install. Here's the summary you should have:
# ./configure --enable-suse --enable-zeroconf --enable-tcp-wrappers --enable-acl-support --disable-cups
    UAMS:
         DHX2    ( SHADOW)
         passwd  ( SHADOW)
         guest
    Options:
         DDP (AppleTalk) support: no
         CUPS support:            no
         SLP support:             no
         Zeroconf support:        yes
         tcp wrapper support:     yes
         quota support:           yes
         admin group support:     yes
         valid shell check:       yes
         cracklib support:        no
         dropbox kludge:          no
         force volume uid/gid:    no
         Apple 2 boot support:    no
         ACL support:             yes
You can then run netatalk with the defaults although I made a couple of custom entries in /usr/local/etc/netatalk/AppleVolumes.default as follows:
# tail AppleVolumes.default
~/ "$u" allow:*user cnidscheme:dbd options:usedots,upriv
~/TimeMachine "$u Backup" allow:*user cnidscheme:dbd options:usedots,upriv,tm
This will automatically share out home directories for those users specified after allow: and will give you a Time Machine target to back up to.  The last step is to create a sparse bundle for Time Machine to use.  I took this from Steffen L. Norgren's blog, so all the credit goes to him.
# hdiutil create -size 512g -fs HFS+J -volname "Time Machine" `grep -A1 LocalHostName /Library/Preferences/SystemConfiguration/preferences.plist | tail -n1 | awk 'BEGIN { FS = "|" } ; { print $2 }'`_`ifconfig en0 | grep ether | awk 'BEGIN { FS = ":" } ; {print $1$2$3$4$5$6}' | awk {'print $2'}`.sparsebundle
# defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1
Launch Time Machine, select your Linux server and new backup volume as a target and you should be good to go.

Wednesday, May 18, 2011

Enabling SSH Access In ESX4i

Enabling SSH access to an ESXi server doesn't seem like "remote support" to me, it seems like a necessity. To turn it on you can either log onto the console or follow these steps through Virtual Center.
  • Select the Host you want and click the Configuration tab
  • Select 'security profile' from the box labeled software
  • Click 'properties' on the right hand side
  • Select 'Remote Tech Support (SSH)' and then 'options'
  • From the window that pops up click the 'start' button and then the 'Start automatically' radio button as shown below

However, enabling SSH will raise a 'configuration issue' warning that just won't go away.
Fortunately there is a way to clear it.  With your newly acquired SSH abilities, log into the server and restart the services like this:
# /sbin/services.sh restart
Give it a minute and you'll have easy access to your ESXi servers, able to survive a reboot without a pesky warning.

Tuesday, March 8, 2011

Cisco VPN Without A Cisco Client

A while ago I posted a blog outlining Cisco VPN installation. This entry shows how to get a VPN connection up and running without that software, using vpnc instead.

Step one is to get vpnc. Most distributions seem to have a pre-built package so have a look around. For SuSE or RedHat it looks like this:
SuSE
# zypper install vpnc
RedHat
# yum install vpnc
Step two is to get a copy of your .pcf file. If you are using the Cisco VPN client, it is located under /etc/opt/cisco-vpnclient/Profiles/.pcf. This file has three important pieces of information:
  • The host you are connecting to (Host=)
  • A group name (GroupName=)
  • An encoded group password (enc_GroupPwd=)
Now, vpnc won't take an encoded password, so you will need to decrypt it first. There is a handy utility for this which should have come in your vpnc package, aptly named cisco-decrypt. If it's not included, you can download it from here. To run it, you can either cut and paste the hideously long hex string after enc_GroupPwd= or just run the command below.
# grep enc_GroupPwd name.pcf | awk -F= '{print $2}' | xargs cisco-decrypt
Once you have this, you can create a vpnc.conf file like this one:
# vi /etc/vpnc/vpnc.conf
IPSec gateway host_or_ip_from_Host=
IPSec ID group_from_GroupName=
IPSec secret output_from_cisco_decrypt

e.g.
IPSec gateway 44.24.21.2
IPSec ID IPSec-Grp
IPSec secret mysecret
Xauth username myID
If you don't have a group name, you should be able to use 'General' instead. You can also add Xauth username your_ID and Xauth password your_password as shown in the example. However, this file is stored in clear text, so it is probably best to leave the password option out. vpnc will prompt you for any values not present.

Once that is all done, you can connect and disconnect like this:
# vpnc /etc/vpnc/vpnc.conf
  Enter username for _host_: _id_
  Enter password for _id_@_host_: 
  Connect Banner:
  | 
  | Secure VPN Server
  | Authorized Users Only
  | Successfully Authenticated
  | 

  VPNC started in background (pid: 7726)...

# ifconfig
tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:ip  P-t-P:ip  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1412  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

When you're done
# vpnc-disconnect
Terminating vpnc daemon (pid: 7726)
Nice and simple. Since it uses built-in kernel modules, there's no more messing about with compiling, kernel versions, or outdated code from Cisco.

Sunday, January 9, 2011

Cross Server Multipath

I was asked to come up with a method to compare multipath devices across an Oracle RAC cluster to ensure they are consistent. Beyond this there was one catch: it had to be run as a regular user. I would normally have run a multipath command, but unfortunately that isn't available without elevated privileges. So my solution looks at /etc/multipath.conf, which is readable by a regular user, compares the entries across all nodes, and outputs a table at the end. You can see an example of the output in the initial comments.
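Before the full script, the core parsing idea can be sketched in one awk line: remember each wwid and print it when the matching alias shows up (the sample file below is made up, following the multipath.conf format):

```shell
cat > /tmp/multipath.sample <<'EOF'
multipaths {
    multipath {
        wwid    36006016015a019004e9820d8b56cde11
        alias   vote_disk1
    }
}
EOF
# pair each wwid with the alias that follows it
awk '$1 == "wwid" {w=$2} $1 == "alias" {print w, $2}' /tmp/multipath.sample
# -> 36006016015a019004e9820d8b56cde11 vote_disk1
```

The script below does the same pairing, but also handles missing aliases and merges the results from every node into one table.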

#!/bin/bash                                                                                                                                                                         
# Script Information
# Version: 1.0
# Created By: Michael England
                                                                                    
# Parses /etc/multipath.conf on any number of nodes (with ssh) and produces a report comparing all of the entries
# e.g.
# WWID                                    node1                         node2                         node3                         Status
# 360060e8005be08000000be08000011ab       apps_001_40g                  --                            --                            INVALID
# 360060e8005be08000000be08000011ac       apps_002_40g                  --                            --                            INVALID
# 360060e8005be08000000be0800004000       ora_shared_data_001_407g      ora_shared_data_001_407g      ora_shared_data_001_407g      OK
# 360060e8005be08000000be0800004001       ora_shared_data_002_407g      ora_shared_data_002_407g      ora_shared_data_002_407g      OK

######
### function read_multipath
### reads a file looking for wwid and alias pairs
### e.g.
### multipath {
###     wwid    36006016015a019004e9820d8b56cde11
###     alias   vote_disk1
### }
### becomes
###     wwid 36006016015a019004e9820d8b56cde11 alias vote_disk1
### if it is missing a wwid or alias that part will be blank
### e.g.
###     wwid 36006016015a019004e9820d8b56cde11
### The first entry will be blank as we increment COUNT on the first multipath
######
function parse_multipath {
        COUNT=0
        unset MPATH_ENTRIES
        oldIFS=$IFS
        IFS=$'\n'
        for line in $NODE_MPATH_RESULT
        do
                # skip anything beginning with a comment
                # after bash 3.2 we can't use "" anymore on right hand side
                if [[ $line == \#* ]]
                then
                        continue
                fi
                # look for multipath keyword and increment array counter
                # \ .* excludes multipaths
                if [[ $line =~ multipath\ .* ]]
                then

                        (( COUNT++ ))
                # anything that starts with wwid or alias and add it to the current array element
                elif [[ $line =~ wwid* ]] || [[ $line =~ alias* ]]
                then
                        MPATH_ENTRIES[$COUNT]=`echo "${MPATH_ENTRIES[$COUNT]} $line"`
                fi
        done
        IFS=$oldIFS
}

######
### function search_wwid
### Args: 
###     $1 search key
### Searches the WWID_MAP array for a given key and returns the array position of that element
######
function search_wwid {
        key="$1"
        for index in ${!WWID_MAP[@]}
        do
                if [[ ${WWID_MAP[$index]} =~ $key ]]
                then
                        echo $index
                        # exit the function
                        exit
                fi
        done
        echo -1
}

######
### function search_node
### Args:
###     $1 search key
### Searches the NODE_ARRAY array for a given key and returns the array position of that element
######
function search_node {
        key="\<$1\>"
        for index in ${!NODE_ARRAY[@]}
        do
                if [[ ${NODE_ARRAY[index]} =~ $key ]]
                then
                        echo $index
                        exit
                fi
        done
        echo -1
}

######
### function assign_wwid
### populates WWID_MAP array with a common view to all nodes multipath entries
### reads from the current MPATH_ENTRIES which should be the output from parse_multipath for one node only
### array will look like the following
### WWID        ALIAS_node1     ALIAS_node2     ALIAS_node3     etc
### it does not include any status information
######
function assign_wwid {
        for index in ${!MPATH_ENTRIES[@]}
        do
                # count the number of items starting from 1 (wc -w is word)
                ITEM_COUNT=`echo ${MPATH_ENTRIES[index]} | wc -w`
                # make sure our entry is 4 words long (wwid {wwid} alias {alias}); this is really an error condition as every wwid should have an alias
                # if it doesn't, pad it with "--"
                for (( i=$ITEM_COUNT; $i < 4; i++ ))
                do
                        MPATH_ENTRIES[$index]="${MPATH_ENTRIES[index]} --"
                done

                unset FILLER
                # set the key to the WWID and search the WWID_MAP to see if one already exists
                key=`echo ${MPATH_ENTRIES[$index]} | awk '{print $2}'`
                RETVAL=`search_wwid $key`
                if [[ $RETVAL -lt 0 ]]
                then
                        # this is a new WWID
                        # check to see what node ordinal this is (first, second, third, etc...)
                        # if this isn't the first node we will have to fill in -- for the ones before to indicate this WWID doesn't exist on previous nodes
                        NODE_POSITION=`search_node $1`
                        for (( i=0; $i < $NODE_POSITION; i++ ))
                        do
                                FILLER="$FILLER --"
                        done
                        # tack on a new element with the wwid, and filler required, and then the node alias
                        WWID_MAP[${#WWID_MAP[@]}]=`echo ${MPATH_ENTRIES[index]} | awk '{ print $2, filler, $4 }' filler="$FILLER"`
                else
                        # a WWID already exists, just add on
                        (( NODE_POSITION=`search_node $1` + 1 ))
                        MAP_LENGTH=`echo ${WWID_MAP[RETVAL]} | wc -w`
                        # the node position will be greater than the length if we have a hole... plug it with filler
                        for (( i=$MAP_LENGTH; $i < $NODE_POSITION; i++ ))
                        do
                                FILLER="$FILLER --"
                        done
                        # append any required filler and then the alias for this node
                        WWID_MAP[$RETVAL]=${WWID_MAP[$RETVAL]}\ $FILLER\ `echo ${MPATH_ENTRIES[$index]} | awk '{print $4}'`
                fi
        done
}

######
### function check_status
### -ensures each WWID_MAP is the correct length by first finding the longest entry
###  and then padding all others to that length with --
### -creates a WWID_STATUS array, each element aligns with a WWID_MAP element 
###  it checks if all aliases in WWID_MAP are the same, if so marks a green OK, if not a red INVALID
### WWID_STATUS array will have one entry for each WWID_MAP
###   OK
###   OK
###   INVALID
###   etc
######
function check_status {
        # find the longest element in WWID_MAP
        LONGEST=0
        for index in ${!WWID_MAP[@]}
        do
                LENGTH=`echo ${WWID_MAP[index]} | wc -w`
                if [[ $LENGTH -gt $LONGEST ]]
                then
                        LONGEST=$LENGTH
                fi
        done
        for index in ${!WWID_MAP[@]}
        do
                # count the number of items in this element
                COUNT=`echo ${WWID_MAP[index]} | wc -w`
                # another way to do this is convert to an array and then count the array elements
                # ARRAY=( $(echo ${WWID_MAP[index]}) )
                # for i in `seq ${#ARRAY[@]} $(( $LONGEST - 1 ))` or for i in `seq 2 $(( ${#ARRAY[@]} - 1))` when math required
                # if the array is shorter, pad it
                # longest is reduced by 1 because arrays are 0 based
                for i in `seq $COUNT $(( $LONGEST - 1 ))`
                do
                        WWID_MAP[$index]="${WWID_MAP[$index]} --"

                done

                # recount the element as its size may have just changed
                COUNT=`echo ${WWID_MAP[index]} | wc -w`
                if [ $COUNT -eq 2 ]
                then
                        WWID_STATUS[$index]="\e[0;32mOK\e[0;30m"
                fi
                # for all items starting at 2 (the third item) to the end - 1 (zero based)
                # compare with the item previous (i.e. 3:2, 4:3, 5:4, etc)
                for i in `seq 2 $(( $COUNT - 1))`
                do
                        ARRAY=( $(echo ${WWID_MAP[index]}) )
                        # if they don't match, or this item is "--" mark it as invalid
                        if [[ "${ARRAY[i]}" != "${ARRAY[i-1]}" ]] || [[ "${ARRAY[i]}" = "--" ]]
                        then
                                WWID_STATUS[$index]="\e[0;31mINVALID\e[0;30m"
                                break
                        else
                                WWID_STATUS[$index]="\e[0;32mOK\e[0;30m"
                        fi
                done
        done
}

######
### function exchange
### helper for bubble sort to swap both WWID_MAP and WWID_STATUS entries
######
function exchange {
        local temp=${WWID_MAP[$1]}
        local status_temp=${WWID_STATUS[$1]}
        WWID_MAP[$1]=${WWID_MAP[$2]}
        WWID_MAP[$2]=$temp

        WWID_STATUS[$1]=${WWID_STATUS[$2]}
        WWID_STATUS[$2]=$status_temp
}

######
### function sort_results
### a bubble sort of both WWID_MAP and WWID_STATUS based on a user specified sort field (SORT_FIELD)
### if SORT_FIELD = a digit, uses awk to compare values for that column
### if SORT_FIELD = valid | invalid simply compares the WWID_STATUS array elements to be either > or <
######
function sort_results {
        number_of_elements=${#WWID_MAP[@]}
        (( comparisons = $number_of_elements - 1 ))
        count=1
        while [ "$comparisons" -gt 0 ]
        do
                index=0
                while [ "$index" -lt "$comparisons" ]
                do
                        if [[ $SORT_FIELD = [[:digit:]]* ]]
                        then
                                if [[ `echo ${WWID_MAP[index]} | awk '{print $i}' i=$SORT_FIELD` > `echo ${WWID_MAP[index + 1]} | awk '{print $i}' i=$SORT_FIELD` ]]
                                then
                                        exchange $index `expr $index + 1`
                                fi
                        elif [[ $SORT_FIELD = "invalid" ]]
                        then
                                if [[ ${WWID_STATUS[index]} > ${WWID_STATUS[`expr $index + 1`]} ]]
                                then
                                        exchange $index `expr $index + 1`
                                fi
                        elif [[ $SORT_FIELD = "valid" ]]
                        then
                                if [[ ${WWID_STATUS[index]} < ${WWID_STATUS[`expr $index + 1`]} ]]
                                then
                                        exchange $index `expr $index + 1`
                                fi
                        fi
                        (( index += 1 ))
                done
                (( comparisons -= 1 ))
                (( count += 1 ))
        done
}
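sort_results is a plain bubble sort; stripped of the WWID bookkeeping, the same compare-and-exchange loop looks like this (a standalone sketch with made-up values):

```shell
#!/bin/bash
# Standalone bubble sort over a bash array, mirroring the
# compare/exchange loops in sort_results (values are invented)
arr=(delta alpha charlie bravo)
n=${#arr[@]}
comparisons=$(( n - 1 ))
while [ "$comparisons" -gt 0 ]
do
        index=0
        while [ "$index" -lt "$comparisons" ]
        do
                # [[ > ]] compares lexicographically, same as sort_results
                if [[ ${arr[index]} > ${arr[index+1]} ]]
                then
                        temp=${arr[index]}
                        arr[index]=${arr[index+1]}
                        arr[index+1]=$temp
                fi
                (( index += 1 ))
        done
        (( comparisons -= 1 ))
done
echo "${arr[@]}"   # prints: alpha bravo charlie delta
```

Note the lexicographic comparison: it is fine for WWID strings and status labels, but it would mis-order plain numbers (e.g. "10" sorts before "9").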

######
### function echo_results
### outputs the collected results in WWID_MAP and WWID_STATUS to the screen
######
function echo_results {
        # put together a header row
        # WWID      Status
        RESULT_SET="WWID"
        for node in $NODE_LIST
        do
                RESULT_SET="$RESULT_SET $node"
        done
        RESULT_SET="$RESULT_SET Status"
        # print the header row, wwid (1st column) is 40 characters wide, everything else is 30
        echo -e $RESULT_SET | awk '{ printf "%-40s", $1 }'
        echo -e $RESULT_SET | awk '{ for (i=2; i<=NF; i++) printf "%-30s", $i }'
        echo
        echo "---------------------"
        # for each element print the first at 40 characters, all others (columns 2 - NF) at 30, then the status (using echo so the colours work)
        # NF is Number of Fields
        for index in ${!WWID_MAP[@]}
        do
                echo ${WWID_MAP[index]} | awk '{ printf  "%-40s", $1 }'
                echo ${WWID_MAP[index]} | awk '{ for (i=2; i<=NF; i++) printf "%-30s", $i }'
                echo -e "${WWID_STATUS[index]}"
        done
}
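The column layout in echo_results comes entirely from awk's printf field widths. Here is the same technique on a single row (the WWID and alias values are hypothetical):

```shell
# Hypothetical WWID row formatted with the same fixed-width layout:
# first column padded to 40 characters, every other column to 30
ROW="360a98000486e58567346c4d32362d71 mpath_data01 mpath_data01"
LINE="$(echo $ROW | awk '{ printf "%-40s", $1; for (i=2; i<=NF; i++) printf "%-30s", $i }')"
echo "$LINE"
```

With a 32-character WWID and two 12-character aliases, the result is always exactly 100 characters wide (40 + 30 + 30), which is what keeps the columns lined up across rows.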

######
### function usage
### outputs help message
######
function usage {
        echo -e "Usage: `basename $0` -n {node1,node2,node3,etc} [-sort {field} | -sort_invalid | -sort_valid] [-user {username}]"
        echo -e "\t-n is a node list separated by commas, the script will ssh and read /etc/multipath.conf on each"
        echo -e "\t-sort {field} sorts the output. 0 sorts on the WWID column, 1..x sorts on a specific node's column"
        echo -e "\t-sort_invalid places invalid alias entries at the top"
        echo -e "\t-sort_valid places valid alias entries at the top"
        echo -e "\t-user {username} uses the given username for ssh instead of the current login"
        exit
}

######
### parse the command line for options
######
if [ -z "$1" ]
then
        usage
fi
until [ -z "$1" ]
do
        case "$1" in
        -n)
                shift
                NODE_LIST=`echo $1 | tr "," " "`
                ;;
        -sort)
                shift
                SORT_FIELD=`expr $1 + 1`
                ;;
        -sort_invalid)
                SORT_FIELD="invalid"
                ;;
        -sort_valid)
                SORT_FIELD="valid"
                ;;
        -user)
                shift
                USERNAME="-l $1"
                ;;
        *)
                usage
                ;;
        esac
        shift
done

# convert NODE_LIST to an array so we can search the position
NODE_ARRAY=( $(echo "$NODE_LIST") )
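The -n handler simply converts the comma list with tr before it is turned into an array; a quick sketch with hypothetical node names:

```shell
# Hypothetical node list, converted exactly as the -n option handler does it
NODE_LIST=`echo "node1,node2,node3" | tr "," " "`
# word splitting on the unquoted expansion gives one array element per node
NODE_ARRAY=( $(echo "$NODE_LIST") )
echo "parsed ${#NODE_ARRAY[@]} nodes: $NODE_LIST"
```

The array form is what later lets the script map a node name back to its column position.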

# for each node in the list
#  - grab its multipath.conf (ssh)
#  - parse out wwid and alias pairs (parse_multipath)
#  - add it to the master WWID list (assign_wwid)
# then fill in any blank holes and the overall status (check_status)
# sort the results as requested (sort_results)
# then output to display (echo_results)
for node in $NODE_LIST
do
        # note: ssh stops parsing options at the destination, so $USERNAME must come before $node
        NODE_MPATH_RESULT=`ssh $USERNAME -C $node "cat /etc/multipath.conf" 2> /dev/null`
        RETVAL=$?
        if [ $RETVAL -ne 0 ]
        then
                echo "--- Error retrieving /etc/multipath.conf from node $node ---"
                continue
        fi
        parse_multipath $node
        assign_wwid $node
done
check_status
sort_results
echo_results