Upstart 0.5: Job Lifecycle
Next month I am hoping to release Upstart 0.5.0, the culmination of almost a year’s worth of work. By comparison, the version that shipped in Edgy (0.2.x) was simply an attempt to figure out the basics, and the version in Feisty through Hardy (0.3.x) was a first draft. The new version has been stripped back to the very basics and rebuilt to correct the problems we found with the earlier versions, and to make sure it can handle real-world uses as simply and elegantly as possible.
Over the next few weeks, I’ll be writing about the new version: both how it has improved on previous versions and how it compares to what else is out there.
Introduction
First we’ll look at how Upstart lets you manage the lifecycle of services and tasks (collectively, jobs). We’ll use the D-Bus daemon as an example service, simply because it’s a modern, well-behaved service that we’re all familiar with.
With SystemV RC, we would have had a single /etc/init.d/dbus script accepting both start and stop as arguments. It might have looked something like this:
case "$1" in
  start)
    start-stop-daemon --start --pidfile /var/run/dbus.pid --exec /usr/sbin/dbus-daemon
    ;;
  stop)
    start-stop-daemon --stop --pidfile /var/run/dbus.pid
    ;;
esac
As you’re well aware, even the simple act of starting a daemon and stopping it again is not simple this way. You nearly always end up needing a helper like start-stop-daemon, and relying on accurate PID files and the like.
Upstart, like just about every other modern service manager (but strangely, not SMF), takes care of all of this hard work for you. Instead of defining how to start and stop a service, you just define what to start. Here’s how you’d define the same service in Upstart:
exec /usr/sbin/dbus-daemon
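Part of why no PID file is needed is that a supervisor is the parent of the process it starts, so it learns the PID directly from fork() and can wait for it. The POSIX shell sketch below illustrates only that idea; it is a toy, not how Upstart is implemented, and the names supervise_start, supervise_stop and SUPERVISED_PID are hypothetical.

```shell
#!/bin/sh
# Toy supervisor: the parent learns the child's PID from $! (i.e. from
# fork()), so no PID file is ever written or read.

# supervise_start CMD [ARGS...]: run CMD in the background, remember its PID.
supervise_start() {
    "$@" &
    SUPERVISED_PID=$!
}

# supervise_stop: send TERM to the remembered PID, then reap it with wait.
# The return value is the child's exit status (typically 128 + the signal
# number when the child was killed by a signal).
supervise_stop() {
    kill -TERM "$SUPERVISED_PID" 2>/dev/null
    wait "$SUPERVISED_PID"
}
```

For example, running supervise_start sleep 60 and then supervise_stop stops the sleep without any PID file, and supervise_stop typically returns 143 (128 + 15, the TERM signal number).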
Setup and teardown
Of course, we all know that no service definition is ever that simple. I massively simplified the SystemV example for the purposes of illustration. In reality, we frequently need to set up the system for the daemon and clean up again afterwards. The real start code probably looks more like this (and even now, I’m simplifying for space):
mkdir /var/run/dbus
chown messagebus.messagebus /var/run/dbus
/usr/bin/dbus-uuidgen --ensure
start-stop-daemon --start --pidfile /var/run/dbus.pid --exec /usr/sbin/dbus-daemon
We need a directory for the socket files and the like, and we need to create the machine ID if it is missing. Likewise, to shut the daemon down, we need to clean up:
start-stop-daemon --stop --pidfile /var/run/dbus.pid
rm -rf /var/run/dbus
And this is where most init replacements fall down (especially launchd). Ironically, you’ll often find their developers using these minimal service definitions when they talk about how fast their system can boot. You can boot really fast if you don’t start anything properly.
Obviously I wouldn’t be pointing this out if Upstart didn’t allow you to do this properly, so let’s extend our minimal service definition to include the necessary setup and teardown code:
pre-start script
mkdir /var/run/dbus
chown messagebus.messagebus /var/run/dbus
/usr/bin/dbus-uuidgen --ensure
end script
exec /usr/sbin/dbus-daemon
post-stop script
rm -rf /var/run/dbus
end script
Previously we defined just one process in the job’s lifecycle, known as the main process. Our new definition defines two more: the pre-start and post-stop processes. We’ve chosen to define them as shell scripts embedded in the definition, but we could instead have defined them as binaries to execute (using pre-start exec), and we could likewise have defined the main process as a script (using script...end script).
As their names suggest, these processes are run before the main process is started and after it has been stopped, respectively. In fact, Upstart guarantees more than that:
- For every time that the job is started, the post-stop process will be run.
- For every time that the main process is run, the pre-start process will have been completed successfully first.
It might seem a little strange that the post-stop process will always run while the pre-start process has a weaker guarantee. This is because it’s possible for the job to be stopped immediately after it is started. Should that happen, Upstart has no need to run the main process, and therefore does not run the pre-start process either; however, to ensure the system is clean, it always runs the post-stop process.
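The two guarantees, including the stopped-immediately case, can be modelled in a few lines of shell. This is a sketch under assumptions rather than Upstart’s code: pre_start, main_process and post_stop are hypothetical hook functions supplied by the caller, and the STOP_REQUESTED variable is a made-up stand-in for a stop that arrives straight after the start.

```shell
#!/bin/sh
# Toy model of the lifecycle guarantees. pre_start, main_process and
# post_stop are hypothetical caller-supplied hooks; STOP_REQUESTED=yes
# simulates a stop arriving immediately after the start.

run_job() {
    if [ "${STOP_REQUESTED:-no}" != yes ]; then
        # The main process only ever runs after pre-start has succeeded.
        if pre_start; then
            main_process
        fi
    fi
    # post-stop runs for every start, whether or not anything else ran.
    post_stop
}
```

With hooks that simply echo their names, run_job prints pre-start, main and post-stop in that order; with STOP_REQUESTED=yes, or with a pre_start that fails, only post-stop is printed.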
These guarantees also provide sane restart behaviour. If you restart a job, the main process is killed, the post-stop process is run, and then the pre-start process is run again before the main process. If you cancel a restart (by stopping the job again) after the post-stop process has been run, the post-stop process will always be run again.
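The ordinary restart sequence can be sketched the same way. Again this is an illustration under assumptions, not Upstart’s implementation: main_cmd, pre_start and post_stop are hypothetical hooks, main_cmd is expected to exec its program so that the recorded PID is the daemon’s own, and cancelling a restart is not modelled.

```shell
#!/bin/sh
# Sketch of the restart ordering: kill main, post-stop, pre-start, main.
# main_cmd, pre_start and post_stop are hypothetical caller-supplied hooks;
# main_cmd should exec its program so that MAIN_PID is the daemon's PID.

start_main() {
    main_cmd &
    MAIN_PID=$!
}

restart_job() {
    # 1. Kill the main process and wait until it has actually exited.
    kill -TERM "$MAIN_PID" 2>/dev/null || true
    wait "$MAIN_PID" 2>/dev/null || true
    # 2. Clean up exactly as for an ordinary stop.
    post_stop
    # 3. Set up again; the main process only restarts if pre-start succeeds.
    if pre_start; then
        start_main
    fi
}
```

Because post_stop runs before pre_start, anything the pre-start step creates (such as /var/run/dbus in the example job) is always created afresh rather than left over from the previous run.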
Spawned, Running and Killed
Upstart makes important distinctions in the state of the main process: it does not assume that just because the exec() syscall has succeeded, the process is in a suitable running state. Likewise, it does not assume that just because the kill() syscall has succeeded, the process is no longer running.
The latter is easy to understand: delivering the TERM signal to a running process normally just invokes its own termination handler, which may perform any number of activities before cleanly shutting down. Upstart waits for the actual child signal signifying termination before running the post-stop process; until that point the process is considered merely “killed”. If the process stays in the “killed” state for too long, Upstart delivers the much more hardcore KILL signal, but that timeout is adjustable.
The former is harder to understand, since the new binary is in memory and is probably at least initialising; but that’s the point: it isn’t yet ready for other jobs to use. In the SystemV script this wasn’t an issue, since we could generally rely on daemons (well-behaved ones, anyway) to follow the convention that they should not fork() until initialisation has completed successfully.
Since Upstart forks and supervises its own processes, it generally prefers that daemons do not fork() and instead remain as the PID they were given when started. So how do jobs signify that they are ready? There are a few ways:
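The “killed” state and the escalation to KILL can be approximated in shell. This is a polling sketch, not Upstart’s mechanism (Upstart waits for the child signal rather than polling), and the KILL_TIMEOUT variable name and its 5-second default are assumptions for illustration, not Upstart’s actual setting.

```shell
#!/bin/sh
# Sketch of stopping a process: TERM first, treat it as merely "killed"
# until it has really gone, and escalate to KILL after KILL_TIMEOUT seconds.
# KILL_TIMEOUT and its default of 5 are assumptions for this example.
KILL_TIMEOUT=${KILL_TIMEOUT:-5}

# stop_process PID: PID must be a background child of this shell.
stop_process() {
    sp_pid=$1
    kill -TERM "$sp_pid" 2>/dev/null || return 0   # nothing left to stop
    sp_waited=0
    # kill -0 sends no signal; it only checks that the PID still exists.
    # While it does, the process is "killed" but not yet known to be dead.
    while kill -0 "$sp_pid" 2>/dev/null; do
        if [ "$sp_waited" -ge "$KILL_TIMEOUT" ]; then
            kill -KILL "$sp_pid" 2>/dev/null || true
            break
        fi
        sleep 1
        sp_waited=$((sp_waited + 1))
    done
    # Reap the child so its exit status is collected and no zombie remains.
    wait "$sp_pid" 2>/dev/null || true
}
```

A process that exits promptly on TERM leaves the loop after the first check or two; one that ignores TERM is sent KILL once KILL_TIMEOUT seconds have passed.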
- By forking as before. As I’ve talked about before, Upstart can supervise processes that fork, and it will wait for the fork to happen before assuming the process is ready.
- By raising the STOP signal. Jobs marked with expect stop will wait for this; once it is received, Upstart will send the process the CONT signal and assume that it is now ready.
- By registering a D-Bus name. An early 0.5.x release will wait for a particular D-Bus name to be registered, and will not assume that the job is ready until it has done so.
- By calling listen(). Again planned for an early 0.5.x release, Upstart will use the same mechanism it uses to follow forks to watch for the listen() system call.
- With a post-start script; more on that in a moment.
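The STOP/CONT handshake behind expect stop is simple enough to sketch in shell on Linux. This is an illustration only: it polls the process state in /proc rather than waiting for the kernel’s notification as a supervising parent can, and the function names proc_state and wait_until_ready are hypothetical.

```shell
#!/bin/sh
# Sketch of the "expect stop" handshake (Linux only: reads /proc).
# The daemon raises SIGSTOP on itself once initialised; the supervisor
# notices the stopped state, resumes it with SIGCONT and marks it ready.

# proc_state PID: print the one-letter state field of /proc/PID/stat
# ("T" means stopped). The field follows the ")" that closes the name.
proc_state() {
    sed 's/.*) //' "/proc/$1/stat" 2>/dev/null | cut -d' ' -f1
}

# wait_until_ready PID: return 0 once PID has stopped itself and been
# continued, or 1 if it goes away without ever signalling readiness.
wait_until_ready() {
    while kill -0 "$1" 2>/dev/null; do
        if [ "$(proc_state "$1")" = T ]; then
            kill -CONT "$1"
            return 0
        fi
        sleep 1
    done
    return 1
}
```

On the daemon side, the equivalent of kill -STOP $$ at the end of initialisation is all that is needed; everything else stays with the supervisor.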
The last two processes
I’ve introduced the three processes that most jobs will tend to use, but there are another two which will be somewhat rarer, yet are probably the most powerful of them all. These are the post-start and pre-stop processes, and they’re interesting because they run while the main process is running.
The post-start process, as its name suggests, is run after the main process has been spawned and whatever readiness signal we were expecting (see above) has arrived. The job will not be considered ready until the post-start process completes, so a common use for it is to interrogate the daemon or send it commands that it can only act on once it’s running.
The pre-stop process is run when a request to stop the job occurs (which means it is not run if the main process terminates on its own), and the main process is not killed until the pre-stop process finishes. It receives information about the request, and can cause that request to be ignored (thus leaving the job running). Another common use is to send the daemon commands before it receives the TERM signal.
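A post-start check that waits for the daemon to become usable might look like the following shell function, which could form the body of a post-start script. It is a sketch: the function name, the wait-for-a-file convention and the 10-second default timeout are hypothetical, and a real check on a daemon’s socket would more likely use -S, or ask the daemon itself.

```shell
#!/bin/sh
# Sketch of a post-start readiness check: block until PATH exists, giving
# up after TIMEOUT seconds. Because the job is not considered ready until
# post-start finishes, anything waiting on the job waits on this too.

# wait_for_path PATH [TIMEOUT]: return 0 when PATH appears, 1 on timeout.
wait_for_path() {
    wfp_path=$1
    wfp_left=${2:-10}
    while [ ! -e "$wfp_path" ]; do
        [ "$wfp_left" -gt 0 ] || return 1
        sleep 1
        wfp_left=$((wfp_left - 1))
    done
    return 0
}
```

Returning non-zero on timeout lets the calling script fail, so a daemon that never becomes usable is not silently reported as ready.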
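The pre-stop ordering can be modelled in shell too. The post does not say how a real pre-stop process causes the request to be ignored, so in this sketch a non-zero return from the hook simply stands in for that; pre_stop, request_stop and MAIN_PID are hypothetical names, and a main process that exits on its own never passes through request_stop at all.

```shell
#!/bin/sh
# Toy model of pre-stop: it runs only when a stop is requested, before the
# main process is signalled, and may veto the request. The veto via a
# non-zero return is an assumption of this sketch, not Upstart's interface.
# pre_stop is a hypothetical hook; MAIN_PID is a background child's PID.

# request_stop: returns 1 (job left running) if pre_stop vetoes the stop.
request_stop() {
    if ! pre_stop; then
        echo "stop request ignored; job left running"
        return 1
    fi
    # Only now does the main process receive TERM.
    kill -TERM "$MAIN_PID" 2>/dev/null || true
    wait "$MAIN_PID" 2>/dev/null || true
    echo "stopped"
}
```

A pre_stop hook is also the natural place to send the daemon a final command before the TERM signal arrives.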
Next…
So that’s a look at the ways we can define the lifecycle of an Upstart job. In the next couple of posts we’ll look at the environment and session of jobs, and then at matters such as respawning and singletons.






Jeff Bailey:
You might consider using the term “shell” instead of “script”; that would leave you the flexibility to add, say, Python scripts in there if you have something really fancy that needs doing.
Tks,
12 April 2008, 6:41 pm
Jeff Bailey:
Actually, thinking about it, I’d go a step further than that:
I’d make “script” require an argument of what interpreter the script should be piped to.
Then make “shell” an alias for “script /bin/sh”.
I would also add an alias for “bash” to be “script /bin/bash” because there are some times when the bashisms really are worth the speed tradeoff.
12 April 2008, 6:47 pm
Tomasz:
That is all nice technically, but how is this going to benefit Ubuntu if we still ship with sysvrc-compat? Ubuntu has never taken advantage of Upstart. Will that change with the new version?
12 April 2008, 8:55 pm
Jeff Schroeder:
Slight correction, exec() is not a syscall, it is the manpage for the exec*() family of functions. execve() is probably the syscall used.
12 April 2008, 10:48 pm
ulrik:
Exciting that Upstart 0.5 is really coming along! I am hoping for great times for more distributions than just Ubuntu. Let’s finally get rid of sysvrc for good. Thanks for all your work so far SJR!
13 April 2008, 1:03 am
jef:
Tomasz,
Fedora is actually planning on really taking advantage of upstart unlike Ubuntu in their next release.
http://fedoraproject.org/wiki/Features/30SecondStartup
They have already posted some benchmarks and progressing fast.
13 April 2008, 1:54 am
James:
Wow, {pre,post}-{start,stop}, it’s like Debian package scripts.
13 April 2008, 9:09 am
liquidat:
Speaking about upstart, is there any cooperation going on with the Fedora people? They are about to include upstart with Fedora 9, although in version 0.3.9.
13 April 2008, 12:16 pm
Vlad:
Why don’t you syndicate your blog to PlanetKDE? A lot of KDE devs would be interested in your work, as it will affect them in the future.
13 April 2008, 8:56 pm
Scott James Remnant:
@jbailey: arguments to “script” are in the TODO, I’d planned to just default to shell if no arguments were given; is there a compelling reason to have a “shell” instead?
@tomasz, @jef: the entire point of the new version is to fix the problems that prevented us from taking full advantage of it. Fedora won’t have much luck without it either, and it’s in their roadmap as well.
@liquidat: yes, we’re talking quite a lot
@vlad: up to the PlanetKDE admins I guess
14 April 2008, 5:24 pm
Jeff Bailey:
> @jbailey: arguments to “script” are in the TODO, I’d planned to just default to shell if no arguments were given; is
> there a compelling reason to have a “shell” instead?
The biggest argument is clarity. You’ve already got ‘exec’ to differentiate from ‘script’. But the popular scripting language in people’s minds shifts from year to year. A few years ago it would’ve been Perl. These days it’s Python.
Another weaker argument for having aliases is the possibility of defining them in a config file. That way if someone wants “shell” to mean “/bin/posh” on their system, they can change it in one place instead of sed’ing all of the config files.
14 April 2008, 5:32 pm
Vlad:
>> up to the PlanetKDE admins I guess
If you’ve already contacted Chris Lee, that’s good. He might take a while to update PlanetKDE — I know it took about 2 weeks for my name to appear on the list.
14 April 2008, 10:53 pm
Scott James Remnant » Blog Archive » Upstart 0.5: Job Environment:
[...] Home « Upstart 0.5: Job Lifecycle [...]
16 April 2008, 2:07 pm