Sunday, May 23, 2010

Custom Startup Scripts for Linux

There are a few options for running a process or command at boot. The easiest is to add it to /etc/rc.local. This works well for small, quick-and-dirty jobs. For more complex jobs, such as those requiring a specific start order or daemon control, a full start-up script is a great way to go.

For this example I am going to draw on a past project of mine, Linux Cluster Manager, as it has a daemon that needs to stay running all of the time. Here is the script:

#!/bin/bash
#
# lcm This shell script takes care of starting and stopping
# lcm server daemons
#
# chkconfig: 345 85 25
# description: Client side daemon for LCM
# processname: lcmclient

### BEGIN INIT INFO
# Provides: lcmclient
# Required-Start: $network $syslog
# Required-Stop:
# Default-Start: 3 4 5
# Default-Stop: 0 1 2 6
# Short-Description: LCMClient
# Description: Client side daemon for LCM
### END INIT INFO

STATUS=0
# Source function library.
test -s /etc/rc.d/init.d/functions && . /etc/rc.d/init.d/functions
test -s /etc/rc.status && . /etc/rc.status && STATUS=1

start() {
    echo -n $"Starting LCM Client Daemons: "
    if [ -x /usr/local/lcm/lcmclient ] ; then
        if [ $STATUS -eq 1 ]
        then
            startproc /usr/local/lcm/lcmclient &> /dev/null
            rc_status -v
        else
            /usr/local/lcm/lcmclient &> /dev/null &
            PID=`/sbin/pidof -s -x lcmclient`
            if [ $PID ]
            then
                echo_success
            else
                echo_failure
            fi
            echo
        fi
    fi
}

stop () {
    echo -n $"Stopping LCM Client Daemons: "
    test -s /sbin/pidof && PID=`/sbin/pidof -s -x lcmclient`
    test -s /bin/pidof && PID=`/bin/pidof -s -x lcmclient`
    if [ $PID ]
    then
        /bin/kill $PID
    fi
    if [ $STATUS -eq 1 ]
    then
        rc_status -v
    else
        echo_success
        echo
    fi
}

restart() {
    stop
    start
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac

Registration
At least for SuSE and RedHat based distributions, start-up scripts live in /etc/init.d. They can be called whatever you like as long as they are executable and, ideally, owned by root, as that is who will run them anyway. We used to have to link this script to the different run levels by hand, which is easy enough to do but tedious and error prone. So today we register scripts with chkconfig and let it do all the work for us.
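
For comparison, the manual linking that chkconfig replaces might look roughly like the sketch below. It is only an illustration: the function name link_runlevels and its ROOT argument are made up here so the commands can be tried against a scratch directory rather than the real /etc, and the priorities 85 and 25 are simply the ones from the chkconfig header in the script above.

```shell
#!/bin/bash
# Hypothetical helper: create the S (start) and K (kill) symlinks by hand.
# Usage: link_runlevels ROOT NAME START_PRIO STOP_PRIO "START_LEVELS"
link_runlevels() {
    local root="$1" name="$2" start="$3" stop="$4" levels="$5"
    local lvl
    for lvl in 0 1 2 3 4 5 6; do
        mkdir -p "$root/rc$lvl.d"
        case " $levels " in
            *" $lvl "*)  # a run level we want the service started in
                ln -sf "../init.d/$name" "$root/rc$lvl.d/S$start$name" ;;
            *)           # every other run level gets a kill link
                ln -sf "../init.d/$name" "$root/rc$lvl.d/K$stop$name" ;;
        esac
    done
}

# Example against a scratch directory, not the real /etc:
# link_runlevels /tmp/etc-demo lcm 85 25 "3 4 5"
```

Seven links per script, each with the right letter and number, is exactly the kind of bookkeeping that is easy to get wrong by hand.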

The opening lines enable this feature for both RedHat and SuSE, which of course have to do things differently. I generally like to have both as it doesn't do any harm and allows for more portable code.

1  #!/bin/bash
2 #
3 # lcm This shell script takes care of starting and stopping
4 # lcm server daemons
5 #
6 # chkconfig: 345 85 25
7 # description: Client side daemon for LCM
8 # processname: lcmclient
9
10 ### BEGIN INIT INFO
11 # Provides: lcmclient
12 # Required-Start: $network $syslog
13 # Required-Stop:
14 # Default-Start: 3 4 5
15 # Default-Stop: 0 1 2 6
16 # Short-Description: LCMClient
17 # Description: Client side daemon for LCM
18 ### END INIT INFO

RedHat
The first line is of course the desired shell which all scripts should have. Lines 2-5 are really just information lines for the user. Lines 6-7 are required for chkconfig under RedHat and tell it what run levels we want to start, the start order and the shutdown order. In this case it will start under run levels 3, 4, and 5 with a start order of 85 and a shutdown order of 25.
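
To make the three fields of the chkconfig line concrete, here is a small sketch that pulls them back out of a script. The helper name show_chkconfig_header is invented for this example, and it assumes the header is written in the same "# chkconfig: LEVELS START STOP" form used above.

```shell
#!/bin/bash
# Hypothetical helper: print the run levels, start order, and stop order
# declared on a script's "# chkconfig:" header line.
show_chkconfig_header() {
    local script="$1" line
    # Keep only what follows "# chkconfig:" on the first matching line.
    line=$(sed -n 's/^#[[:space:]]*chkconfig:[[:space:]]*//p' "$script" | head -n 1)
    [ -n "$line" ] || { echo "no chkconfig header in $script" >&2; return 1; }
    # Split the remainder into its three whitespace-separated fields.
    set -- $line
    echo "run levels: $1, start order: $2, stop order: $3"
}

# Example: show_chkconfig_header /etc/init.d/lcm
```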

To register the script and check the results we can run the following:
# chkconfig --add lcm
# chkconfig --list lcm
lcm 0:off 1:off 2:off 3:on 4:on 5:on 6:off
# ls -l /etc/rc*/*lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc0.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc1.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc2.d/K25lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc3.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc4.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc5.d/S85lcm -> ../init.d/lcm
lrwxrwxrwx 1 root root 17 May 21 10:24 /etc/rc6.d/K25lcm -> ../init.d/lcm

SuSE
SuSE takes its setup process from the Linux Standard Base core specifications. This is shown in lines 10-18 blocked by BEGIN and END INIT INFO. Basically what it does is specify the run levels we would like and what other services are needed to be able to start and stop. Chkconfig figures things out from there and numbers the start and shutdown order for us.

Line 11, beginning with Provides, establishes this script as a facility called lcmclient. We can reference other facilities through the Required-Start and Required-Stop entries on lines 12 and 13. Common facility names are $network, $syslog and $local_fs, but a larger list and some additional explanation can be found here.

The main benefit of this approach is parallel boot operations. If the system understands the relationships of all the start-up elements, many can be run at the same time. If I had another script that depended on this one, I could list lcmclient as a Required-Start entry for that script. Note there is no $ in front: by naming convention, the $ prefix is reserved for system facility names.
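
As an illustration, the header of such a dependent script might look like the fragment below. The service name lcmreport and its descriptions are invented for this example; only the lcmclient facility name comes from the script above.

```shell
#!/bin/bash
# Hypothetical dependent service. "lcmreport" is a made-up name used only
# to show how another script would reference the lcmclient facility.
### BEGIN INIT INFO
# Provides:          lcmreport
# Required-Start:    $network lcmclient
# Required-Stop:     lcmclient
# Default-Start:     3 4 5
# Default-Stop:      0 1 2 6
# Short-Description: Example service that needs lcmclient
# Description:       Illustrative header only; starts after lcmclient
### END INIT INFO
```

With a header like this, the boot system would be expected to order lcmreport after lcmclient without any start numbers being assigned by hand.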

Again, we run the same chkconfig commands. This time, however, the start order is determined for us. If we take a closer look at our dependencies, we see that network starts at order 2 and syslog at order 3.

# chkconfig --add lcm
lcm 0:off 1:off 2:off 3:on 4:on 5:on 6:off
# ls -l /etc/rc.d/rc*/*lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc3.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc3.d/S04lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc4.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc4.d/S04lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:43 /etc/rc.d/rc5.d/K01lcm -> ../lcm
lrwxrwxrwx 1 root root 10 May 21 10:51 /etc/rc.d/rc5.d/S04lcm -> ../lcm

User Feedback
The next section involves loading other helper functions. They aren't specifically required but make formatting, user feedback, and process management a lot easier.
1  STATUS=0
2 # Source function library.
3 test -s /etc/rc.d/init.d/functions && . /etc/rc.d/init.d/functions
4 test -s /etc/rc.status && . /etc/rc.status && STATUS=1

The only reason I have a STATUS variable is to identify which set of libraries, and therefore which OS, is doing the executing. Line 3 is for RedHat, line 4 is for SuSE. As with registration, they differ enough from each other to be annoying.

My primary use for these extra functions is to put the nice little [ OK ] or [ FAILED ] messages on the screen that can be so helpful. The exact function called to do this can depend on what the script is doing or how the program it calls operates.

Starting
 1  start() {
 2      echo -n $"Starting LCM Client Daemons: "
 3      if [ -x /usr/local/lcm/lcmclient ] ; then
 4          if [ $STATUS -eq 1 ]
 5          then
 6              startproc /usr/local/lcm/lcmclient &> /dev/null
 7              rc_status -v
 8          else
 9              /usr/local/lcm/lcmclient &> /dev/null &
10              PID=`/sbin/pidof -s -x lcmclient`
11              if [ $PID ]
12              then
13                  echo_success
14              else
15                  echo_failure
16              fi
17              echo
18          fi
19      fi
20  }

In this case I have chosen to start the application with startproc on line 6 for SuSE and just by hand on line 9 for RedHat. The reason is that the program blocks and it's possible for it to spit out errors to stderr. Startproc handles this fairly well and gives a proper return code, which rc_status -v on line 7 can report on. However, the tools under RedHat expect the process either to fork, as a daemon does, or to return when completed. So I have resorted to starting it by hand and then checking for a process on lines 10-11. You can't just rely on the return code, because if you redirect stdout and stderr to /dev/null and put the command in the background it will always return 0. Go ahead, try it, I'll wait.

If a pid exists, echo_success is run on line 13, otherwise echo_failure on line 15. Either one of these requires a subsequent echo command on line 17 to provide a newline.
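
The always-zero return code is easy to see in isolation. The sketch below is not part of the lcm script: still_running is a made-up helper name, false stands in for a program that dies at once, and the one second delay is an arbitrary assumption about how long a broken program takes to exit.

```shell
#!/bin/bash
# still_running PID [DELAY]: succeed only if PID is still alive after DELAY
# seconds (default 1). kill -0 sends no signal; it only tests for existence.
still_running() {
    sleep "${2:-1}"
    kill -0 "$1" 2> /dev/null
}

# A command that fails at once still yields $? = 0 when backgrounded,
# because $? only reports that the shell managed to launch the job.
false > /dev/null 2>&1 &
launch_rc=$?
bad_pid=$!
echo "return code after backgrounding: $launch_rc"

# Checking for a live process tells the real story.
if still_running "$bad_pid"; then
    echo "process is up"
else
    echo "process already exited"
fi
```

Run as written, this should report a return code of 0 and then say the process already exited, which is why the script looks for a PID instead of trusting $?.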

Other methods of starting scripts, programs, or just commands:

OS      Function      Example                                       Result
RedHat  action        action "Starting example: " /usr/bin/example  [ OK ] or [ FAILED ]
RedHat  echo_success  echo_success; echo                            [ OK ]
RedHat  echo_failure  echo_failure; echo                            [ FAILED ]
RedHat  echo_warning  echo_warning; echo                            [ WARNING ]
SuSE    startproc     startproc /usr/bin/example                    none
SuSE    rc_status     rc_status -v                                  done, failed, or skipped

I invite you to wade into the functions provided by each OS and see if you can find any gems in there. Bring your choice of caffeine, you'll need it.

Shutdown
stop () {
    echo -n $"Stopping LCM Client Daemons: "
    test -s /sbin/pidof && PID=`/sbin/pidof -s -x lcmclient`
    test -s /bin/pidof && PID=`/bin/pidof -s -x lcmclient`
    if [ $PID ]
    then
        /bin/kill $PID
    fi
    if [ $STATUS -eq 1 ]
    then
        rc_status -v
    else
        echo_success
        echo
    fi
}

This one is fairly simple: grab the pid of the program and issue a kill command. Of course RedHat and SuSE have to disagree on the location of pidof, but that isn't too hard to overcome. Again the STATUS variable is used to determine which helper function to run. You'll notice that there isn't a failure result here. I could have done some extra work against the kill command but felt it complicated things more than it really mattered.
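
For anyone who does want a failure result, the extra work could look something like the sketch below. It is an assumption about one reasonable approach, not what the lcm script does: stop_and_verify is a hypothetical name, and the five second default timeout is arbitrary.

```shell
#!/bin/bash
# stop_and_verify PID [TIMEOUT]: send SIGTERM, then poll until the process
# is gone. Returns 0 on a clean stop, 1 if kill itself failed or the
# process is still alive after TIMEOUT seconds (default 5).
stop_and_verify() {
    local pid="$1" timeout="${2:-5}" waited=0
    kill "$pid" 2> /dev/null || return 1
    # kill -0 sends no signal; it only checks that the PID still exists.
    while kill -0 "$pid" 2> /dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

On the RedHat side, the return value could then pick between echo_success and echo_failure in the same way the start function does.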

Command Line Arguments
Every start-up script is required to accept both the start and stop command line arguments. I have handled that with a case statement but you can use whatever makes you happy. It is also customary to include a restart option, usage information, and possibly status if it makes sense.

If your needs are simple enough, you could include all of the code inside the case statement. I find this harder to read for pretty much everything but the simplest of jobs, most of which will fit into rc.local anyway.
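
If a status option makes sense for your daemon, one way it might look is sketched below. The function name status_of is invented for this example (RedHat's function library already ships its own status helper, so a different name avoids clashing with it), and the return value of 3 for a stopped service follows the LSB convention for the status action.

```shell
#!/bin/bash
# Hypothetical status helper; the name and messages are illustrative.
status_of() {
    local name="$1" pid
    # pidof exits non-zero when nothing matches, so fall back to empty.
    pid=$(pidof -s -x "$name") || pid=""
    if [ -n "$pid" ]; then
        echo "$name (pid $pid) is running"
        return 0
    fi
    echo "$name is stopped"
    return 3    # LSB: 3 means the program is not running
}

# Inside the case statement this would be one more branch:
#   status)
#       status_of lcmclient
#       ;;
```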

Running Your Script
Some useful commands to control and execute your new script:
# chkconfig --list lcm
will show the run levels in which the script is enabled

# chkconfig lcm on
will add all symbolic links

# chkconfig lcm off
will remove all symbolic links to prevent the script from executing

# service lcm {start | stop | restart}
# /etc/init.d/lcm {start | stop | restart}
both of these will execute your script, the first just has a little less typing

1 comment:

  1. Great post.

    I ran into an issue. I am using this to run a python daemon as a service.

    #!/usr/bin/env python and #!/usr/bin/python shebangs affect this greatly.
    The latter version works. The former causes pidof to never see the script running!
