Sunday, November 6, 2011

Tracking Misaligned I/Os

For those that don't know about misaligned I/Os, I provided a brief introduction to them in my last post. In this post I'll show you how to track and quantify how many of your I/Os exhibit the problem. We currently run our VMware infrastructure from a NetApp array, so this script is tailored to that environment.

The only method I've found to track is by using the pw.over_limit counter. Unfortunately it is only available with advanced privileges on the command line as it isn't exported via SNMP. You can manually obtain the data with the following:
# priv set advanced
# wafl_susp -w
# priv set
This will produce lots of data, most of which you can ignore. You'll quickly notice a problem; pw.over_limit is an absolute number, like 29300. So what? Is that good, is it bad? That depends on how many I/Os you are generating in the first place. Since I love writing little programs, here is the output of my script:
$ cat NetApp-align-filer1-20111010-221001
22:10:01 UTC - Zeroing stats from filer filer1
22:10:02 UTC - sleeping for 300 seconds
22:15:02 UTC - Capturing data from filer filer1
22:15:03 UTC - Collected Values:
22:15:03 UTC - interval = 300.399491 seconds, pw.over_limit = 249 and WAFL_WRITE = 540822
22:15:03 UTC - Percentage Misaligned Writes = .0460%
22:15:03 UTC - Successfully Completed! 
As you can see, there are a few misaligned writes going on, but overall in pretty good health with just under 0.05% of the total. When should you panic? That depends on way to many variables to list here but your write latency is a key indicator to pay attention to. Your application mix (and users) will scream when you've crossed the line.

The code I use is listed below. Read the comments in the parse_wafl function to figure out what it's doing.
#!/bin/bash
# Script Information
# na_alignment_check Version: 1.0
# Last Modified: Sept 18/2010
# Created By: Michael England

# to run this script you will need an account on the filer with administrative privileges as it has to be run with 'priv set advanced'
# ssh prep work
# ssh-keygen -t dsa -b 1024
# cat id_dsa.pub >> {filer}/etc/sshd/{user}/.ssh/authorized_keys

# default to local user
FILER_USER=$(whoami)

######
### function log
### logs activities to the screen, a file, or both
######
function log {
 LOG_TYPE="$1"
 LOG_MSG="$2"
 TIME=`date +'%H:%M:%S %Z'`
 # specify the log file only once
 if [ -z $LOG_FILE ]
 then
  LOG_FILE="/work/Scripts/NetApp_Logs/NetApp-align-$FILER-`date +%Y%m%d-%H%M%S`"
 fi
 if [ $LOG_TYPE == "error" ]
 then
  echo -e "$TIME - **ERROR** - $LOG_MSG"
  echo -e "$TIME - **ERROR** - $LOG_MSG" >> $LOG_FILE
 elif [ $LOG_TYPE == "debug" ]
 then
  if [ $DEBUG == "on" ]
  then
   echo -e "DEBUG - $LOG_MSG"
  fi
 else
  echo -e "$TIME - $LOG_MSG"
  echo -e "$TIME - $LOG_MSG" >> "$LOG_FILE"
 fi
}

######
### check_ssh
### check the return code of an ssh command
######
function check_ssh {
 CHECK=$1
 ERROR_DATA="$2"
 if [ $CHECK != 0 ]
 then
  log error "ssh failed to filer"
  log error "return is $ERROR_DATA"
  exit 1
 fi
}

######
### capture_wafl_susp
### ssh to the filer specified and collect the output from wafl_susp
######

function capture_wafl_susp {
 log notify "Capturing data from filer $FILER"
 SSH_RET=`ssh $FILER -l $FILER_USER "priv set advanced;wafl_susp -w;priv set" 2>&1`
 RETVAL=$?
 check_ssh $RETVAL "$SSH_RET"

 parse_wafl $SSH_RET
}

######
### parse_wafl
### capture values for pw.over_limit and WAFL_WRITE and the overall scan interval
### e.g.
### WAFL statistics over 577455.189585 second(s) ...
### pw.over_limit = 29300
### New messages, restarts, suspends, and waffinity completions (by message-type):
### WAFL_WRITE           = 10568010   122597   122597        0
### 
### There are many other WAFL_WRITE lines so we need to find the New messages line first then the WAFL_WRITE in that section
######
function parse_wafl {
 oldIFS=$IFS
 IFS=$'\n'
 NEW_FLAG=0
 for line in echo $SSH_RET
 do
  if [[ $line =~ second* ]]
  then
   STATS_INTERVAL=`echo $line | awk '{print $4}'`
  fi

  if [[ $line =~ pw.over_limit* ]]
  then
   OVER_LIMIT=`echo $line | awk '{print $3}'`
  fi

  if [[ $line =~ New\ messages.* ]]
  then
   NEW_FLAG=1
  fi
  if [[ $NEW_FLAG == 1 ]] && [[ $line =~ WAFL_WRITE* ]]
  then
   WAFL_WRITE=`echo $line | awk '{print $3}'`
   NEW_FLAG=0
  fi
 done
 IFS=$oldIFS
 if [[ -n $OVER_LIMIT ]] && [[ $WAFL_WRITE -gt 0 ]]
 then 
  # by multiplying by 100 first we don't loose any precision
  MISALIGNED_PERCENT=`echo "scale=4;100*$OVER_LIMIT/$WAFL_WRITE" | bc -l`
 else
  log error "Error collecting values, pw.over_limit = $OVER_LIMIT and WAFL_WRITE = $WAFL_WRITE"
 fi
 
 log notify "Collected Values:"
 log notify "interval = $STATS_INTERVAL seconds, pw.over_limit = $OVER_LIMIT and WAFL_WRITE = $WAFL_WRITE"
 log notify "Percentage Misaligned Writes = ${MISALIGNED_PERCENT}%"
}

######
### function zero_values
### zeroes out existing wafl stats on the filer
######
function zero_values {
 log notify "Zeroing stats from filer $FILER"
 SSH_RET=`ssh $FILER -l $FILER_USER "priv set advanced;wafl_susp -z;priv set" 2>&1`
 RETVAL=$?
 check_ssh $RETVAL "$SSH_RET"
}

######
### function usage
### simple user information for running the script
######
function usage {
 echo -e ""
 echo "Usage:"
 echo "`basename $0` -filer {filer_name} [-username {user_name}] [-poll_interval ]"
 echo -e "\t-filer {file_name} is a fully qualified domain name or IP of a filer to poll"
 echo -e "\t-username {user_name} is a user to attach to the filer, if omitted will use current user"
 echo -e "\t-poll_interval {x} will zero out the filer stats and sleep for {x} seconds then return.  If omitted will read stats since last zeroed"
 echo -e ""
 exit 0
}

# parse command line options
if [ $# == 0 ]
then
 usage
fi
until [ -z "$1" ]
do
 case "$1" in
 -filer)
  shift
  FILER="$1"
  ;;
 -username)
  shift
  FILER_USER="$1"
  ;;
 -poll_interval)
  shift
  POLL_INTERVAL="$1"
  ;;
 *)
  usage
  ;;
 esac
 shift
done
# do the work

if [[ -n $POLL_INTERVAL ]]
then
 zero_values
 log notify "sleeping for $POLL_INTERVAL seconds"
 sleep $POLL_INTERVAL
fi
capture_wafl_susp

log notify "Successfully Completed!"

5 comments:

  1. FYI - that counter does _not_ indicate how many writes are misaligned.

    that counter increments when the system hits an upper limit of partial write buffers, which triggers the system to wait on an acknowledgment until it has completed the pending writes.

    the percentage you have generated here is measuring the number of degraded writes vs number of writes.

    anything more than 0 just means the amount of misalignment on the system has reached a critical enough point that the system is fighting back.

    ReplyDelete
  2. Wow, that's fantastic feedback, thanks. It took months for our NetApp team to provide the information I have, this fits one more piece to that puzzle. Perhaps you (or anyone else) has another method to track this problem?

    ReplyDelete
  3. Sorry for the long delay in replying here... :)

    The bottom line issue here is that this is an application level problem, and it needs to be tracked at a higher level. It's not practical to track this from this low in the stack, because it's the _random_ synchronization of partial writes on a moment by moment basis that causes protocol level jitter (and the performance problem) This isn't just a netapp problem, it's just that netapp has a very unique and clever way of dealing with it.

    Seeing that counter going up tempts you to look for a context around it, that could provide a predictable indicator for 'how much is too much', and unless your looking at a second by second statistic, you simply don't have the resolution to tell. If the counter incrementes 1000 in 1 second, but does nothing for the next 299 seconds, you have a 1000 over-runs, but the system was only 'stopped' for maybe a second or 2, so it doesn't really track right.

    That being said, pulling back to the VM level, with a tool like OnCommand Balance will map out a per VM and per workload based report on alignment issues, and allow you to track the 'big talkers' and eliminate them from the misaligned bucket.

    Absent Balance, "it depends" on if these VMs are LUN based or NFS based, as to if there is a better way to track the issue.

    ReplyDelete
  4. Hi,

    here the Patch that this tool can work with newer Ontap Version > 8.1

    abgvsnap1x:/usr/local/bin # diff -uN detect-missalignment.sh pa > patch0
    abgvsnap1x:/usr/local/bin # cat patch0
    --- detect-missalignment.sh 2013-09-16 14:27:51.000000000 +0200
    +++ pa 2013-09-16 14:48:28.000000000 +0200
    @@ -1,10 +1,8 @@
    #!/bin/bash
    # Script Information
    -# na_alignment_check Version: 1.1
    -#
    -# Last Modified: Sept 16/2013 Fix different Outputformat of Ontab > 8.0
    +# na_alignment_check Version: 1.0
    +# Last Modified: Sept 18/2010
    # Created By: Michael England
    -# Modified by: Martin Pauke

    # to run this script you will need an account on the filer with administrative privileges as it has to be run with 'priv set advanced'
    # ssh prep work
    @@ -25,7 +23,7 @@
    # specify the log file only once
    if [ -z $LOG_FILE ]
    then
    - LOG_FILE="/tmp/NetApp-align-$FILER-`date +%Y%m%d-%H%M%S`"
    + LOG_FILE="/work/Scripts/NetApp_Logs/NetApp-align-$FILER-`date +%Y%m%d-%H%M%S`"
    fi
    if [ $LOG_TYPE == "error" ]
    then
    @@ -80,7 +78,7 @@
    ### pw.over_limit = 29300
    ### New messages, restarts, suspends, and waffinity completions (by message-type):
    ### WAFL_WRITE = 10568010 122597 122597 0
    -###
    +###
    ### There are many other WAFL_WRITE lines so we need to find the New messages line first then the WAFL_WRITE in that section
    ######
    function parse_wafl {
    @@ -89,8 +87,7 @@
    NEW_FLAG=0
    for line in echo $SSH_RET
    do
    - #if [[ $line =~ second* ]]
    - if [[ $line =~ WAFL\ statistics\ over ]]
    + if [[ $line =~ second* ]]
    then
    STATS_INTERVAL=`echo $line | awk '{print $4}'`
    fi
    @@ -100,8 +97,7 @@
    OVER_LIMIT=`echo $line | awk '{print $3}'`
    fi

    - # to solve different Outputformat of Ontap Version > 8.1
    - if [[ $line =~ New\ messages.* ]] || [[ $line =~ message\ type ]]
    + if [[ $line =~ New\ messages.* ]]
    then
    NEW_FLAG=1
    fi
    @@ -113,13 +109,13 @@
    done
    IFS=$oldIFS
    if [[ -n $OVER_LIMIT ]] && [[ $WAFL_WRITE -gt 0 ]]
    - then
    + then
    # by multiplying by 100 first we don't loose any precision
    MISALIGNED_PERCENT=`echo "scale=4;100*$OVER_LIMIT/$WAFL_WRITE" | bc -l`
    else
    log error "Error collecting values, pw.over_limit = $OVER_LIMIT and WAFL_WRITE = $WAFL_WRITE"
    fi
    -
    +
    log notify "Collected Values:"
    log notify "interval = $STATS_INTERVAL seconds, pw.over_limit = $OVER_LIMIT and WAFL_WRITE = $WAFL_WRITE"
    log notify "Percentage Misaligned Writes = ${MISALIGNED_PERCENT}%"
    @@ -188,4 +184,3 @@
    capture_wafl_susp

    log notify "Successfully Completed!"
    -

    ReplyDelete
  5. Please change the Prefix off each Line + > - and - > +. Ich have used the diff command first as modified and second as old, but it has to be done by chenge this filenames.

    Sorry
    Martin

    ReplyDelete