Sunday, May 23, 2010

Custom Startup Scripts for Linux

There are a few options for having a process or command execute on boot. The easiest is to add it to /etc/rc.local. This works well for small, quick and dirty jobs. For more complex jobs, such as those requiring a specific start order or daemon control, a full start-up script is a great way to go.

For this example I am going to draw on a past project of mine, Linux Cluster Manager, as it has a daemon that needs to stay running all of the time. Here is the script:

#!/bin/bash
#
# lcm This shell script takes care of starting and stopping
# lcm server daemons
#
# chkconfig: 345 85 25
# description: Client side daemon for LCM
# processname: lcmclient

### BEGIN INIT INFO
# Provides: lcmclient
# Required-Start: $network $syslog
# Required-Stop:
# Default-Start: 3 4 5
# Default-Stop: 0 1 2 6
# Short-Description: LCMClient
# Description: Client side daemon for LCM
### END INIT INFO

STATUS=0
# Source function library.
test -s /etc/rc.d/init.d/functions && . /etc/rc.d/init.d/functions
test -s /etc/rc.status && . /etc/rc.status && STATUS=1

start() {
echo -n $"Starting LCM Client Daemons: "
if [ -x /usr/local/lcm/lcmclient ] ; then
if [ $STATUS -eq 1 ]
then
startproc /usr/local/lcm/lcmclient &> /dev/null
rc_status -v
else
/usr/local/lcm/lcmclient &> /dev/null &
PID=`/sbin/pidof -s -x lcmclient`
if [ $PID ]
then
echo_success
else
echo_failure
fi
echo
fi
fi
}

stop () {
echo -n $"Stopping LCM Client Daemons: "
test -s /sbin/pidof && PID=`/sbin/pidof -s -x lcmclient`
test -s /bin/pidof && PID=`/bin/pidof -s -x lcmclient`
if [ $PID ]
then
/bin/kill $PID
fi
if [ $STATUS -eq 1 ]
then
rc_status -v
else
echo_success
echo
fi
}

restart() {
stop
start
}

case "$1" in
start)
start
;;
stop)
stop
;;
restart)
restart
;;
*)
echo $"Usage: $0 {start|stop|restart}"
exit 1
esac

Registration
At least for SuSE and RedHat based distributions, start-up scripts live in /etc/init.d. They can be called whatever you like as long as they are executable and ideally owned by root, as that is who will run them anyway. We used to have to link this script to the different run levels by hand, which is easy enough to do but tedious and error prone. So today we register scripts with chkconfig and let it do all the work for us.

The opening lines enable this feature for both RedHat and SuSE, which of course have to do things differently. I generally like to have both as it doesn't do any harm and allows for more portable code.

1  #!/bin/bash
2 #
3 # lcm This shell script takes care of starting and stopping
4 # lcm server daemons
5 #
6 # chkconfig: 345 85 25
7 # description: Client side daemon for LCM
8 # processname: lcmclient
9
10 ### BEGIN INIT INFO
11 # Provides: lcmclient
12 # Required-Start: $network $syslog
13 # Required-Stop:
14 # Default-Start: 3 4 5
15 # Default-Stop: 0 1 2 6
16 # Short-Description: LCMClient
17 # Description: Client side daemon for LCM
18 ### END INIT INFO

RedHat
The first line is of course the desired shell which all scripts should have. Lines 2-5 are really just information lines for the user. Lines 6-7 are required for chkconfig under RedHat and tell it what run levels we want to start, the start order and the shutdown order. In this case it will start under run levels 3, 4, and 5 with a start order of 85 and a shutdown order of 25.
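
To make the header format concrete, here is a minimal sketch that reads the "# chkconfig:" line back out of a script and labels its three fields. The function name parse_chkconfig_header is my own invention for illustration and is not part of chkconfig.

```shell
#!/bin/bash
# Print the run levels, start order and shutdown order declared in the
# "# chkconfig:" header of an init script.
# parse_chkconfig_header is a hypothetical helper, not a chkconfig feature.
parse_chkconfig_header() {
    local script=$1
    # Expected form: "# chkconfig: <runlevels> <start> <stop>"
    awk '/^#[ \t]*chkconfig:/ {
             # Strip the prefix so the three values become fields 1 to 3.
             sub(/^#[ \t]*chkconfig:[ \t]*/, "")
             printf "runlevels=%s start=%s stop=%s\n", $1, $2, $3
             exit
         }' "$script"
}
```

Run against the script above, it should report run levels 345, a start order of 85 and a shutdown order of 25.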

To register the script and check the results we can run the following:
# chkconfig --add lcm
# chkconfig --list lcm
lcm 0:off 1:off 2:off 3:on 4:on 5:on 6:off
# ls -l /etc/rc*/*lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc0.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc1.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc2.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc3.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc4.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc5.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc6.d/K25lcm -> ../init.d/lcm

SuSE
SuSE takes its setup process from the Linux Standard Base core specifications. This is shown in lines 10-18 blocked by BEGIN and END INIT INFO. Basically what it does is specify the run levels we would like and what other services are needed to be able to start and stop. Chkconfig figures things out from there and numbers the start and shutdown order for us.

Line 11, beginning with Provides, establishes this script as a facility called lcmclient. We can reference other facilities through the Required-Start and Required-Stop entries on lines 12 and 13. Common facility names are $network, $syslog and $local_fs, but a larger list and some additional explanation can be found here.

The main benefit of this approach is parallel boot operations. If the system understands the relationships of all the start-up elements, many can be run at the same time. If I had another script that depended on this one, I could list lcmclient as a Required-Start entry for that script. Note there is no $ in front as by naming convention, those are reserved for system facility names.
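
As a rough illustration of how those relationships can be read back, the sketch below pulls a single field out of the INIT INFO block of a script. The helper name lsb_field is hypothetical; it is not something SuSE or the LSB provides.

```shell
#!/bin/bash
# Print the value of one field from the LSB "INIT INFO" block of a script.
# lsb_field is a hypothetical helper for illustration only.
lsb_field() {
    local script=$1 field=$2
    awk -v field="$field" '
        /^### BEGIN INIT INFO/ { inside = 1; next }
        /^### END INIT INFO/   { inside = 0 }
        # Only look at lines of the form "# <field>: <value>" inside the block.
        inside && index($0, "# " field ":") == 1 {
            sub(/^# [^:]*:[ \t]*/, "")
            print
            exit
        }' "$script"
}
```

For a dependent script whose header contained "# Required-Start: lcmclient $network", calling lsb_field with Required-Start would print "lcmclient $network".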

Again, we run the same chkconfig commands, but this time the start order is determined for us. If we take a closer look at our dependencies we see that network starts at order 2 and syslog at order 3, which is why lcm ends up at S04.

# chkconfig --add lcm
lcm 0:off 1:off 2:off 3:on 4:on 5:on 6:off
# ls -l /etc/rc.d/rc*/*lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc3.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc3.d/S04lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc4.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc4.d/S04lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc5.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc5.d/S04lcm -> ../lcm

User Feedback
The next section involves loading other helper functions. They aren't specifically required but make formatting, user feedback, and process management a lot easier.
1  STATUS=0
2 # Source function library.
3 test -s /etc/rc.d/init.d/functions && . /etc/rc.d/init.d/functions
4 test -s /etc/rc.status && . /etc/rc.status && STATUS=1

The only reason I have a STATUS variable is to identify which set of libraries, and therefore which OS, is doing the executing. Line 3 is for RedHat, line 4 is for SuSE. As with registration they differ enough from each other to be annoying.
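
The same check can be written as a small function that names the flavor instead of setting a flag. This is only a sketch: detect_init_flavor and its optional root prefix argument are assumptions of mine, added so it can be tried against a scratch directory.

```shell
#!/bin/bash
# Report which helper library is available: "suse", "redhat" or "unknown".
# The optional first argument is a root prefix (hypothetical, for testing).
detect_init_flavor() {
    local root=${1:-}
    # rc.status wins if both files exist, matching STATUS=1 in the script.
    if [ -s "$root/etc/rc.status" ]; then
        echo suse
    elif [ -s "$root/etc/rc.d/init.d/functions" ]; then
        echo redhat
    else
        echo unknown
    fi
}
```

The flag in the script is shorter, but a named result reads a little better once a script has more than two branches.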

My primary use for these extra functions is to put the nice little [ OK ] or [ FAILED ] messages on the screen that can be so helpful. The exact function called to do this can depend on what the script is doing or how the program it calls operates.

Starting
1  start() {
2 echo -n $"Starting LCM Client Daemons: "
3 if [ -x /usr/local/lcm/lcmclient ] ; then
4 if [ $STATUS -eq 1 ]
5 then
6 startproc /usr/local/lcm/lcmclient &> /dev/null
7 rc_status -v
8 else
9 /usr/local/lcm/lcmclient &> /dev/null &
10 PID=`/sbin/pidof -s -x lcmclient`
11 if [ $PID ]
12 then
13 echo_success
14 else
15 echo_failure
16 fi
17 echo
18 fi
19 fi
20 }

In this case I have chosen to start the application with startproc on line 6 for SuSE and just by hand on line 9 for RedHat. The reason is that the program blocks and can spit out errors to stderr. Startproc handles this fairly well and gives a proper return code, which rc_status -v on line 7 can report on. However, the tools under RedHat either expect the process to fork, as with a daemon, or to return when completed. So, I have resorted to starting by hand and then checking for a process on lines 10-11. You can't just rely on the return code because if you redirect stdout and stderr to /dev/null and put it in the background it will always return 0. Go ahead, try it, I'll wait.

If a pid exists, echo_success is run on line 13, otherwise echo_failure on line 15. Either one of these requires a subsequent echo command on line 17 to provide a newline.
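
If you would rather not take my word for the return code, here is a minimal sketch to try it with. The function names launch_status and real_status are made up for this example.

```shell
#!/bin/bash
# Launch a command in the background with output discarded and print the
# status the shell reports for the launch itself.
launch_status() {
    "$@" > /dev/null 2>&1 &
    echo $?
}

# Launch the same way, then wait for the command and print its real status.
real_status() {
    local rc=0
    "$@" > /dev/null 2>&1 &
    wait $! || rc=$?
    echo "$rc"
}
```

Running launch_status false prints 0 even though false always fails, while real_status false prints 1. Waiting is no use for a daemon that never returns, which is why the script checks for a pid instead.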

Other methods of starting scripts, programs, or just commands:

OS      Function      Example                                       Result
RedHat  action        action "Starting example: " /usr/bin/example  [ OK ] or [ FAILED ]
RedHat  echo_success  echo_success; echo                            [ OK ]
RedHat  echo_failure  echo_failure; echo                            [ FAILED ]
RedHat  echo_warning  echo_warning; echo                            [ WARNING ]
SuSE    startproc     startproc /usr/bin/example                    none
SuSE    rc_status     rc_status -v                                  done, failed, or skipped

I invite you to wade into the functions provided by each OS and see if you can find any gems in there. Bring your choice of caffeine, you'll need it.

Shutdown
1  stop () {
2 echo -n $"Stopping LCM Client Daemons: "
3 test -s /sbin/pidof && PID=`/sbin/pidof -s -x lcmclient`
4 test -s /bin/pidof && PID=`/bin/pidof -s -x lcmclient`
5 if [ $PID ]
6 then
7 /bin/kill $PID
8 fi
9 if [ $STATUS -eq 1 ]
10 then
11 rc_status -v
12 else
13 echo_success
14 echo
15 fi
16 }

Fairly simple here: grab the pid of the program and issue a kill command. Of course RedHat and SuSE have to disagree on the location for pidof but that isn't too hard to overcome. Again the STATUS variable is used to determine which helper function to run. You'll notice that there isn't a failure result here. I could have done some extra work against the kill command but felt it complicated things more than it really mattered.
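
For anyone who does want a failure result, the extra work could look something like this sketch. stop_and_wait is a hypothetical helper, and the ten second default timeout is an arbitrary choice of mine.

```shell
#!/bin/bash
# Send TERM to a pid and wait up to a timeout (in seconds) for it to exit.
# Returns 0 once the process is gone, 1 if it is still there at the timeout.
# stop_and_wait is a hypothetical helper, not part of either distribution.
stop_and_wait() {
    local pid=$1 timeout=${2:-10} waited=0
    kill -0 "$pid" 2>/dev/null || return 0    # nothing to stop
    kill "$pid" 2>/dev/null || true           # it may exit between the checks
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Inside stop() on RedHat, the result could then pick between echo_success and echo_failure instead of always reporting success.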

Command Line Arguments
Every start-up script is required to accept both the start and stop command line arguments. I have handled that with a case statement but you can use whatever makes you happy. It is also customary to include a restart option, usage information, and possibly status if it makes sense.

If your needs are simple enough, you could include all of the code inside the case statement. I find this harder to read for pretty much everything but the simplest of jobs, most of which will fit into rc.local anyway.
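
My script doesn't have a status option, but a minimal sketch of one could look like the following. The function name status_of is made up here; the return value of 3 for a stopped program follows the LSB convention for status.

```shell
#!/bin/bash
# Report whether a program is running, using pidof as the script does.
# Returns 0 if running and 3 if not (the LSB convention for "not running").
# status_of is a hypothetical helper, not part of the original script.
status_of() {
    local name=$1
    if pidof -s -x "$name" > /dev/null 2>&1; then
        echo "$name is running"
        return 0
    fi
    echo "$name is stopped"
    return 3
}
```

It would be wired in with an extra "status)" branch in the case statement that calls status_of lcmclient.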

Running Your Script
Some useful commands to control and execute your new script:
# chkconfig --list lcm
shows the run levels in which the script is enabled

# chkconfig lcm on
will add all symbolic links

# chkconfig lcm off
will remove all symbolic links to prevent the script from executing

# service lcm {start | stop | restart}
# /etc/init.d/lcm {start | stop | restart}
both of these will execute your script, the first just has a little less typing

Tuesday, May 18, 2010

Dynamic Linux Disk

To start with, I am going to assume udev and multipath are set up as per my last post. Udev isn't required for scanning or device naming but it is responsible for permissions and device location (directory). Device naming is actually controlled by the multipath driver, which in a modern Linux distribution is conveniently included in the kernel.

The second assumption is that multipath has the basic setup for your particular storage frame. Now, let's ensure multipath is running and set to start on every boot:

# service multipathd status
multipathd is stopped

# chkconfig --list multipathd
multipathd 0:off 1:off 2:off 3:off 4:off 5:off 6:off

# chkconfig multipathd on

# chkconfig --list multipathd
multipathd 0:off 1:off 2:off 3:on 4:off 5:on 6:off

# service multipathd start

As in the last post, I am dealing with RedHat 5.4 and an EMC CLARiiON array. Without any LUNs allocated multipath -ll should look something like this:
# multipath -ll
sdb: checker msg is "emc_clariion_checker: Logical Unit is unbound or LUNZ"
sdc: checker msg is "emc_clariion_checker: Logical Unit is unbound or LUNZ"
sdd: checker msg is "emc_clariion_checker: Logical Unit is unbound or LUNZ"
sde: checker msg is "emc_clariion_checker: Logical Unit is unbound or LUNZ"

These entries are the four paths available to CX controllers.

Adding Devices

No Existing Devices
I have so far been unable to use the simple scan method on an HBA without any devices at all, so this process will unload and reload the adapter driver. It's a disruptive process on the fibre channel bus but there aren't any devices anyway, so it shouldn't matter.

First find the driver you are using. It is likely either Emulex (lpfc) or QLogic (qla2xxx). In this example I am using an Emulex card.
# lsmod | grep lpfc
lpfc 352909 0
scsi_transport_fc 73801 1 lpfc
scsi_mod 196569 10 scsi_dh,sr_mod,sg,usb_storage,lpfc,scsi_transport_fc,mptsas,mptscsih,scsi_transport_sas,sd_mod

remove the module
# rmmod lpfc

insert the module
# modprobe lpfc

Instead of modprobe, you can also use insmod. The difference is that insmod will only load the specified driver, while modprobe will load the driver and any drivers it depends on.

This will allow the device(s) to show under /dev/ora_rdsk but won't create any multipath entries. To do that we simply run multipath.
# multipath
reload: 36006016015a01900796464949a36df11 DGC,RAID 5
[size=50G][features=1 queue_if_no_path|features=1
queue_if_no_path][hwhandler=1 emc][n/a]
\_ round-robin 0 [prio=2][undef]
\_ 4:0:1:0 sdc 8:32 [active][ready]
\_ 5:0:1:0 sde 8:64 [undef][ready]
\_ round-robin 0 [prio=0][undef]
\_ 4:0:0:0 sdb 8:16 [undef][ready]
\_ 5:0:0:0 sdd 8:48 [undef][ready]

Existing Devices
If you have at least one Fibre device existing, you can simply rescan the bus. This will not take down the existing devices and is able to operate one path at a time, ensuring I/O can continue to flow. You will need to know which host devices are your fibre HBAs. To find that, we can list the known fibre adapters as follows and then issue a scan for each.
# ls -l /sys/class/fc_host
drwxr-xr-x 3 root root 0 Apr 17 08:50 host4
drwxr-xr-x 3 root root 0 Apr 17 08:50 host5

# echo "- - -" > /sys/class/scsi_host/host4/scan
# multipath -ll
36006016015a01900e2acd0d4a549df11 dm-3 DGC,RAID 5
[size=8.0G][features=1 queue_if_no_path|features=1
queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=1][active]
\_ 1:0:0:1 sdh 8:112 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:1:1 sdi 8:128 [active][ready]

# echo "- - -" > /sys/class/scsi_host/host5/scan
# multipath -ll
36006016015a01900e2acd0d4a549df11 dm-3 DGC,RAID 5
[size=8.0G][features=1 queue_if_no_path|features=1
queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=2][enabled]
\_ 1:0:0:1 sdh 8:112 [active][ready]
\_ 2:0:0:1 sdj 8:144 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:1:1 sdi 8:128 [active][ready]
\_ 2:0:1:1 sdk 8:160 [active][ready]

# ls -lL /dev/ora_rdsk
brw-rw---- 1 root root 253, 3 Apr 17 15:57 36006016015a01900e2acd0d4a549df11
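
The per-host scan above can be wrapped in a loop so you don't have to type the host numbers. This is a sketch only: rescan_fc_hosts is a name I made up, and the optional argument for the sysfs location exists purely so the loop can be tried against a scratch directory.

```shell
#!/bin/bash
# Issue a wildcard scan ("- - -") on every fibre channel host.
# The optional argument is the sysfs mount point (default /sys).
# rescan_fc_hosts is a hypothetical helper for illustration.
rescan_fc_hosts() {
    local sysfs=${1:-/sys} path host
    for path in "$sysfs"/class/fc_host/host*; do
        [ -e "$path" ] || continue            # the glob matched nothing
        host=${path##*/}                      # e.g. host4
        echo "- - -" > "$sysfs/class/scsi_host/$host/scan"
        echo "scanned $host"
    done
}
```

Unlike the step by step commands above, this scans every host in one go, so run multipath -ll afterwards to confirm all paths showed up.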

Renaming Devices
The newly scanned device has a WWID name which isn't terribly useful for something like Oracle as we want udev to apply appropriate permissions. To do this, cut and paste the ID into /etc/multipath.conf so it looks something like this:
multipath {
wwid 36006016015a01900796464949a36df11
alias ora_test
}

And then remove the old device name and re-import it into multipath
# multipath -f 36006016015a01900796464949a36df11

# multipath
create: ora_test (36006016015a01900796464949a36df11) DGC,RAID 5
[size=50G][features=1 queue_if_no_path|features=1
queue_if_no_path][hwhandler=1 emc][n/a]
\_ round-robin 0 [prio=2][undef]
\_ 1:0:1:0 sdc 8:32 [undef][ready]
\_ 2:0:1:0 sde 8:64 [undef][ready]
\_ round-robin 0 [prio=0][undef]
\_ 1:0:0:0 sdb 8:16 [undef][ready]
\_ 2:0:0:0 sdd 8:48 [undef][ready]

# ls -lL /dev/ora_rdsk
brw-rw---- 1 oracle dba 253, 2 Apr 17 09:44 ora_test

If you get an error "must provide a map name to remove" when running multipath -f, make sure you don't have a shell inside the /dev/ora_rdsk directory. Also, be careful not to use multipath -F as that will remove all devices, which is probably not what you want.
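
If you are adding several LUNs, typing the stanzas by hand gets old. Here is a small sketch that prints one for a given WWID and alias. The function name multipath_stanza is hypothetical, and the output still has to be pasted inside the multipaths section of /etc/multipath.conf yourself.

```shell
#!/bin/bash
# Print a multipath { } stanza for one WWID/alias pair.
# multipath_stanza is a hypothetical helper; it does not edit any file.
multipath_stanza() {
    local wwid=$1 alias_name=$2
    printf 'multipath {\n\twwid %s\n\talias %s\n}\n' "$wwid" "$alias_name"
}
```

For example, multipath_stanza 36006016015a01900796464949a36df11 ora_test prints the entry shown earlier in this section.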

Removing Devices

Before doing the actual removal you will need to note several pieces of information: the multipath device name and all block devices assigned to it. All of these can be obtained from multipath -ll.

# multipath -ll
*ora_test2* (36006016015a01900e2acd0d4a549df11) dm-3 DGC,RAID 5
[size=8.0G][features=1 queue_if_no_path|features=1
queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=2][active]
\_ 1:0:0:1 *sdh* 8:112 [active][ready]
\_ 2:0:0:1 *sdj* 8:144 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:1:1 *sdi* 8:128 [active][ready]
\_ 2:0:1:1 *sdk* 8:160 [active][ready]

To remove the multipath device

# multipath -f ora_test2

Then remove the appropriate block devices from the system with 'echo 1 > /sys/block/<dev>/device/delete', where <dev> is each block device noted above:

# echo 1 > /sys/block/sdh/device/delete
# echo 1 > /sys/block/sdj/device/delete
# echo 1 > /sys/block/sdi/device/delete
# echo 1 > /sys/block/sdk/device/delete
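
Noting the block devices by eye is fine for one LUN. For more, a sketch like this can pull the names out of saved multipath -ll output. list_path_devices is a name I made up, and it assumes the output format shown above, where each path line carries a host:channel:id:lun tuple followed by the device name.

```shell
#!/bin/bash
# Read multipath -ll output on stdin and print the block device of each path.
# A path line looks like: "\_ 1:0:0:1 sdh 8:112 [active][ready]".
# list_path_devices is a hypothetical helper for illustration.
list_path_devices() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
                print $(i + 1)      # the field after H:C:I:L is the device
                break
            }
    }'
}
```

Capture the listing before running multipath -f, since the map and its path list are gone afterwards.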

Sunday, May 9, 2010

Linux udev and multipath

While technically /dev is always under the control of udev, as system administrators we rarely need to think about it, let alone control it. However, the best use case I have come up with is when configuring storage for Oracle RAC. Oracle requires specific permissions on the devices, and you can either use ASMlib (which I think a lot of companies do) or you can use udev. Obviously I like the udev approach as it eliminates yet another vendor supplied software package that needs to be maintained. Being built into Linux means compatibility and updates are handled just like any other OS patch.

A similar argument can be made with multipath (device mapper), although use cases for this spread far beyond just Oracle. Many vendors tout the use of their own multipath software, and while they can have some extra features, I am of the opinion that 99% of what is required today is available in this free and integrated solution.

Multipath Setup
First off, make sure your storage array is supported by multipath. From my knowledge, all of the major vendors are, but you may have one that isn't on the list. My setup is RedHat 5.4 with an EMC CX array which is convenient as multipath has integrated support for the CLARiiON line. The key section to note from your vendor is any special device entry under /etc/multipath.conf they may need.

Next is to ensure multipath is installed. For RedHat this is in the package device-mapper-multipath, for SuSE it is multipath-tools. Do a package search for multipath and you should find it.

Configuration File
defaults {
user_friendly_names no
}

multipaths {
multipath {
# replace with the WWID of your own LUN
wwid 36006016015a0190038585d690b53df11
alias ora_data1
}
multipath {
# replace with the WWID of your own LUN
wwid 36006016015a0190038585d690b53df12
alias ora_data2
}
}

If you leave user_friendly_names set to yes, Linux will create a lovely mpath device for you. I hate these. They are shorter than a WWID but they aren't guaranteed to be consistent across reboots and certainly not across multiple servers. So by specifying “no” you will end up with a device named after the World Wide Identifier (WWID) of the SAN target.

The multipath entries take this WWID and turn it into any name specified by the alias entry. Because we want to change the device permissions, a specific name or prefix is used. In this case I have chosen “ora_” but you could use whatever you like, or multiple prefixes for different functions. For example you could have a vote_ or an ocr_ prefix using different permissions, although those disks aren't really required anymore with 11gR2.

You can get WWIDs from the storage array or simply let Linux tell you when it scans them. We'll cover that in a later post.

Optional Configuration
Now if you have a different storage array you may also have to add the vendor specified devices entry, perhaps something like this:

devices {
device {
vendor "NETAPP"
product "LUN"
path_group_policy multibus
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
prio_callout "/sbin/mpath_prio_ontap /dev/%n"
features "1 queue_if_no_path"
path_checker directio
failback immediate
flush_on_last_del yes
}
}

This is only an example, check with your vendor. If they support multipath, they have documentation with the correct entry for the current features and models.

You can also include a blacklist entry for your local devices but I haven't found it to be required. However, if you like, an entry such as this would do:

blacklist
{
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}

Multipath Daemon
The last step is to get the daemon running. Its job is to reconfigure paths when something breaks or when a link is put back in service.

Under RedHat / SuSE this is relatively simple to do:
chkconfig multipathd on

You can validate this with:
chkconfig --list multipathd

And finally start up the daemon immediately with:
service multipathd start
- or -
/etc/init.d/multipathd start

Udev Configuration
Unfortunately distributions can vary greatly in how their udev rules are laid out, so here is the configuration for RedHat; your mileage may vary. I am going to store my new configuration as part of the multipath rules under /etc/udev/rules.d/40-multipath.rules. The changes from the default are the ora_rdsk symlinks and everything that refers to update_oracle_devs:

# multipath wants the devmaps presented as meaninglful device names
# so name them after their devmap name
SUBSYSTEM!="block", GOTO="end_mpath"
KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"
KERNEL!="dm-[0-9]*", GOTO="end_mpath"
PROGRAM!="/sbin/mpath_wait %M %m", GOTO="end_mpath"
ACTION=="add", RUN+="/sbin/dmsetup ls --target multipath --exec '/sbin/kpartx -a -p p' -j %M -m %m"
PROGRAM=="/sbin/dmsetup ls --target multipath --exec /bin/basename -j %M -m %m", RESULT=="?*", NAME="%k", SYMLINK="ora_rdsk/%c", GOTO="update_oracle_devs"
PROGRAM!="/bin/bash -c '/sbin/dmsetup info -c --noheadings -j %M -m %m | /bin/grep -q .*:.*:.*:.*:.*:.*:.*:part[0-9]*-mpath-'", GOTO="end_mpath"
PROGRAM=="/sbin/dmsetup ls --target linear --exec /bin/basename -j %M -m %m", NAME="%k", RESULT=="?*", SYMLINK="ora_rdsk/%c", GOTO="update_oracle_devs"
GOTO="end_mpath"
LABEL="update_oracle_devs"
RESULT=="vote*",OWNER="oracle",GROUP="oinstall",MODE="644"
RESULT=="ocr*",GROUP="oinstall",MODE="640"
RESULT=="ora*",OWNER="oracle",GROUP="dba",MODE="660"
OPTIONS="last_rule"
LABEL="end_mpath"

The ora_rdsk entry is a way of keeping the oracle disk (multipath disk) in a unique location. The default is mpath; it's up to you. This means that all of my multipath devices will show up under /dev/ora_rdsk.

GOTO="end_mpath" is an entry which acts as a catch for devices that aren't specifically sent to update_oracle_devs, which of course follows. This is where the magic happens: it matches on a certain prefix and assigns an owner, a group, and permissions. You can get quite creative here if you like with different match patterns or device entries for different functions.
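
To see what each alias will end up with before touching a real device, the matching in update_oracle_devs can be mirrored in plain shell. This is only a sketch of the three rules above, not something udev runs, and perms_for_alias is a made up name.

```shell
#!/bin/bash
# Mirror the update_oracle_devs rules: print "owner group mode" for an alias.
# "unchanged" means the rules above do not set that attribute.
# perms_for_alias is a hypothetical helper for illustration.
perms_for_alias() {
    case $1 in
        vote*) echo "oracle oinstall 644" ;;
        ocr*)  echo "unchanged oinstall 640" ;;
        ora*)  echo "oracle dba 660" ;;
        *)     echo "unchanged unchanged unchanged" ;;
    esac
}
```

For example, perms_for_alias ora_data1 prints "oracle dba 660". Udev evaluates every rule in turn rather than stopping at the first match, but since the three prefixes can't overlap the result is the same.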

Strictly speaking you don't require partitions to use ASM under Oracle RAC, however, if you would like to, they will show up as your designated alias plus a p. For example ora_data1p1, ora_data1p2, etc. I generally don't use them but if you want to change this it is controlled earlier in the file when kpartx is run.

Putting it All Together
Just allocate your disk, put the entries into /etc/multipath.conf and you are off to the races. Your devices should all magically show up in /dev/ora_rdsk/. Next time I'll cover how to add and remove devices dynamically so you don't have to know your WWIDs up front or do any nasty rebooting.