Upstart 0.3

For the last couple of months, both at the Ubuntu Developer Summit in Mountain View and on the #upstart IRC channel, we’ve been discussing the changes we want to make to upstart for the Feisty Fawn release of Ubuntu.

This release will ship with a version of upstart based on the 0.3 series (it may end up getting called 0.5 before release); the primary goal for this series is to have an init system that is suitable for general standalone use in any Linux distribution.

I’ll be giving a talk at linux.conf.au 2007 in Sydney with that aim; I hope to persuade at least one other major Linux distribution that it’s the right solution.

A complete list of the specifications and bugs being targeted for the 0.3 release can be found in Launchpad.

The rest of this post will introduce some of the shiniest new things.

Writing Jobs

Upstart takes care of starting, supervising and stopping daemons itself, unlike the init script system, where you have to write code to do that yourself, often using a helper like start-stop-daemon. All you need to do is give the path to, and arguments for, the binary you wish to be started.

exec /usr/bin/dbus-daemon

Some jobs, especially quick tasks, will usually be written as shell scripts. To save having to write a separate file and invoke it, you can include shell script code directly in the job file instead of using the exec stanza.

script
    echo /usr/share/apport/apport > /proc/sys/kernel/crashdump-helper
end script

Usually it’s not sufficient to just start a binary and wish it well; you frequently need something to be run before it is started to prepare the system, and sometimes something after it terminates to clean up again.

For these purposes, additional snippets of shell code can be given — to be run before the binary is started, and after it has finished. Unlike init scripts, these do not need to start or stop the daemon itself; that’s done automatically based on the exec stanza.

pre-start script
    mkdir -p /var/run/dbus
    chown messagebus:messagebus /var/run/dbus
end script

post-stop script
    rm -f /var/run/dbus/pid
end script

For consistency, executables may be specified with pre-start exec and post-stop exec instead of shell scripts as above.

It’s sometimes useful to be able to run something after the binary has been started; for example, you may wish to attempt to connect to the daemon to determine whether it is ready to serve requests. post-start script or post-start exec can be used for this.

post-start script
    # wait for listen on port 80
    while ! nc -q0 localhost 80 </dev/null >/dev/null 2>&1; do
        sleep 1;
    done
end script
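The loop above waits forever if the daemon never starts listening. As a rough sketch only, a bounded retry helper could be used instead; the wait_for function, its arguments and the retry counts are hypothetical names chosen for illustration, not something upstart provides.

```shell
#!/bin/sh
# wait_for TRIES DELAY COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt failed.
wait_for() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        # Only sleep when another attempt is still to come.
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Possible use inside a post-start script (30 attempts, one second apart):
#   wait_for 30 1 nc -q0 localhost 80 </dev/null >/dev/null 2>&1
```

Whether a failed wait should then fail the job or be ignored is a design choice left to the job author.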

It’s also useful to be able to notify a daemon that it may be about to be stopped, or to delay the stop for a while. pre-stop script or pre-stop exec can be used for this.

pre-stop script
    # disable the queue, wait for it to become empty
    fooctl disable
    while fooq >/dev/null; do
        sleep 1
    done
end script
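Putting the pieces together, a whole job file for the D-Bus daemon might look roughly like the sketch below. It only combines the stanzas already shown in this section; treating them as a single file in this order is an assumption made for illustration.

```
# Prepare the runtime directory before the daemon is started.
pre-start script
    mkdir -p /var/run/dbus
    chown messagebus:messagebus /var/run/dbus
end script

# The daemon itself; upstart starts, supervises and stops it.
exec /usr/bin/dbus-daemon

# Clean up once the daemon has terminated.
post-stop script
    rm -f /var/run/dbus/pid
end script
```

Note that nothing in the file starts or stops the daemon by hand; the scripts only do the preparation and clean-up around it.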

Events

Events are now quite a bit more detailed than in previous versions; they’re still named with simple strings that are up to the system sending the event, but they can now include arguments and environment variables which are passed through to jobs being started or stopped as a result.

initctl emit network-interface-up eth0 -DIFADDR=00:11:D8:98:1B:37

This command will now output all of the effects of this event, and will not terminate until the event has been fully handled inside upstart.

Events such as the above can be used by jobs that examine the event arguments and environment within their script:

start on network-interface-up
script
    [ "$1" = lo ] && exit 0
    grep -q "$IFADDR" /etc/network/blacklist && exit 0
    # etc.
end script

or matched directly in the start on and stop on stanzas:

start on block-device-added sda*
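The in-script check on the event argument and environment shown earlier can be tried outside upstart by moving it into a function. This is only a sketch: should_skip and its three parameters are hypothetical, and it uses a fixed-string match (grep -F) rather than the plain grep in the job above.

```shell
#!/bin/sh
# should_skip IFACE IFADDR BLACKLIST
# Hypothetical helper mirroring the job's checks.
# Returns 0 (skip this interface) for loopback or a blacklisted address,
# and 1 (carry on) otherwise.
should_skip() {
    iface=$1
    addr=$2
    blacklist=$3
    # The loopback interface is always skipped.
    [ "$iface" = lo ] && return 0
    # An empty address would match every line, so require a value first.
    if [ -n "$addr" ] && [ -r "$blacklist" ]; then
        grep -qF -- "$addr" "$blacklist" && return 0
    fi
    return 1
}

# Inside a job this could be called as:
#   should_skip "$1" "$IFADDR" /etc/network/blacklist && exit 0
```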

The events generated by job state changes have also changed. Previously both jobs and events shared the same namespace, which not only caused confusion but actually caused some problems when one accidentally named a job after an event.

The two primary events generated are now simply called started and stopped; they inform you that a job is fully up and running, or fully shut down again. The name of the job is received as an argument to this event.

start on started dbus

The started event is not emitted until the post-start task (described above) has finished; so the post-start task can delay other jobs from starting because they can’t yet connect to the daemon.

Likewise the stopped event is not emitted until after the post-stop task has finished.

The other two events emitted by a job are special; they are the starting and stopping events. The reason they are special is that the job is not permitted to start or stop until the event has been handled.

This means that if you have a task to perform when your database server is stopped, but before it’s actually terminated, it’s as simple as:

start on stopping mysql
exec /usr/bin/backup-db.py

MySQL won’t be terminated until the backup has finished.

This is especially useful for daemons that depend on each other. For example, HAL needs DBUS: it shouldn’t be started until DBUS is running, and DBUS should not be stopped until HAL has been terminated. All the HAL job needs is:

start on started dbus
stop on stopping dbus

Likewise, if Tomcat is installed, Apache should not be started until Tomcat is running, and Tomcat should not be stopped until Apache has been terminated. All the Tomcat job needs is:

start on starting apache
stop on stopped apache
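For the HAL case above, a full job file could be as short as the following sketch. The path to the HAL daemon is an assumption for illustration; the two event stanzas are the ones given earlier.

```
# Wait for D-Bus to be fully up before starting.
start on started dbus
# Stop before D-Bus is allowed to stop; D-Bus waits for this to be handled.
stop on stopping dbus

# Assumed location of the HAL daemon binary.
exec /usr/sbin/hald
```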

Failure

Nothing goes smoothly all of the time; sometimes tasks the job runs will fail, or the daemon itself will die. As well as providing the ability for a crashed daemon to be automatically restarted, upstart ensures that other jobs are notified with a special failed argument to the stopping and stopped events.

start on stopped typo failed
script
    echo "typo failed again :-(" | mail -s "typo failed" root
end script

And if any job started or stopped by an event fails, it’s possible to discover that the event itself failed.

start on network-interface-up/failed

States

While tasks such as configuring a network interface, or checking and mounting a block device, are usually performed as a result of events, services are more complicated.

Services normally need to be running while the system is in a certain state, not just when a particular event occurs. Therefore upstart allows you to describe arbitrarily complex system states by referring to events that define their changes.

For example, many services should be running only while the filesystem is mounted and at least one network device is up. We have events to indicate the changes into and out of these states; we just need to combine them:

from fhs-filesystem-mounted until fhs-filesystem-unmounted
and from network-up until network-down

The until operator defines a period between two events; the and operator ensures we’re within both of these periods.
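To make the semantics concrete, here is a toy model in shell of the two periods and the and operator. It is only an illustration of the behaviour described, not how upstart implements it; the handle_event and should_run functions and the sample event sequence are made up for the example.

```shell
#!/bin/sh
# Toy model: each "from X until Y" period is a flag that X sets and Y clears.
fs=0    # inside "from fhs-filesystem-mounted until fhs-filesystem-unmounted"
net=0   # inside "from network-up until network-down"

handle_event() {
    case $1 in
        fhs-filesystem-mounted)   fs=1 ;;
        fhs-filesystem-unmounted) fs=0 ;;
        network-up)               net=1 ;;
        network-down)             net=0 ;;
    esac
}

# "and" means the job should run only while both periods are open.
should_run() {
    [ "$fs" -eq 1 ] && [ "$net" -eq 1 ]
}

# Replay a sample sequence of events and show the resulting state.
for event in fhs-filesystem-mounted network-up network-down; do
    handle_event "$event"
    if should_run; then state=running; else state=stopped; fi
    echo "$event -> $state"
done
```

In this model the job is running only between network-up and network-down, because the filesystem period was already open by then.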

Perhaps we need to be running while any display manager is:

from started gdm until stopping gdm
or from started kdm until stopping kdm

Or maybe we only want to be run if a network interface comes up before bind9 has been started:

on network-interface-up and from startup until started bind9

These “complex event configurations” can appear in any job file, and any job file can itself serve as a reference for other jobs. A job that refers to another in this way will be started and stopped at the same time as the named job:

with apache

Omitting the exec or script stanza from a job file means that it simply defines a state that can serve as a reference for others. As such, the multiuser state is simply a job file that defines it.

As an added bonus, these states can still have pre-start, post-stop, etc. scripts.

6 Comments

  1. Phil:

    Tiny issue, when you said “start on mysql stopping”, that was backwards, right?

  2. Scott James Remnant:

    Oops, yes; edited

  3. Philipp Sadleder:

    This all sounds very promising! Thanks for your great work.
    My main question is: how could an upgrade-path to upstart jobs (replacing the sysv-rc ones) look like? Is there a way to incrementally switch over to the new jobs and events?

  4. Ken Campbell:

    Just became aware of Upstart. After reading this post, I started wondering how a sysadmin would manage the dependency tree? (or should I say event fan-out?) Are there plans for a graphical display to show that a change would have the intended result?

  5. Jim:

    Tiny issue, when you said “start on mysql stopping”, that was backwards, right?

  6. Timothy:

    Hey,

    on started mysql does not work for me?
    Any ideas?
