Archive for the ‘Upstart’ Category.

Upstart 0.5: Relationships

Even the relatively simple System V rc scripts recognise that there are relationships between services, and that in many cases one or more others must be started before a particular service can itself be started: they allow such relationships to be expressed through a directory of numbered scripts that the sysv rc script runs in series.

Tackling this problem in some way is arguably one of the main reasons that each of the alternate init daemons exists. Even launchd acknowledges the problem, though its solution is to tell service developers that they should spin or sleep while dependencies aren’t available.

The Competition

The way in which the other leading init replacements tackle the relationship problem is through dependencies. This is not that surprising, since the concept is shared (and effectively mirrored) by both the dynamic link loader and the package manager; both things that a service maintainer knows well.

To illustrate how dependencies work, since I use that term precisely to mean only this behaviour, we’ll use one of the chains of the well known Network Manager service.

  • Network Manager depends on HAL
  • HAL depends on D-Bus

When A depends on B, B is required for A to function properly. Any attempt to start A must first start B.

This works well for the link loader, when we load an executable we also need to load and map the shared objects it links to.

It also works well for the package manager, when we install Network Manager it means we also need to install HAL and D-Bus for it to function.

However for an init daemon, it’s not normally ideal: the only reason that D-Bus and HAL will be running is because Network Manager depends on them. If we were to stop Network Manager, we would also stop HAL and D-Bus.

This obviously isn’t what we want, HAL and D-Bus are both essential services in their own right. Thus we end up with a target or goal set of services that must be started anyway, within this group the dependency relationships are only effective for ordering of them. Ironically, it is very rare indeed for a service to not be a target and so all of the complex ability of the dependency-based daemon is lost; the only reason to generate the dependency tree at runtime at all is to allow for parallel starts.

Upside Down Dependencies

Thus one of the first things that service maintainers have to get used to about Upstart is that its service relationships are upside down from the way that they might expect. Upstart assumes that if a service is installed, not disabled, and the required services, tasks or hardware are available then the service should be running.

In the dependency-based model, starting Network Manager would first start HAL which would first start D-Bus.

In the Upstart (event-based) model, D-Bus is started, which fulfils HAL’s requirements, so HAL is started; that in turn fulfils Network Manager’s requirements (once a network card is available?), so Network Manager is then started.

Upstart has no notion of targets or goals, it simply ensures that all services that can and should be running are; and ensures that services are stopped when it is no longer the right time for them to be running.
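To make the contrast concrete, here is a toy model in Python. It is only a sketch of who initiates each start in the two designs, not how either kind of daemon is implemented; the REQUIRES table and both function names are made up for illustration.

```python
# Toy model only: which request causes which services to start.
REQUIRES = {
    "network-manager": ["hal"],
    "hal": ["dbus"],
    "dbus": [],
}


def start_by_dependency(target, started=None):
    """Dependency model: asking for `target` pulls in what it needs first."""
    if started is None:
        started = []
    for dep in REQUIRES[target]:
        if dep not in started:
            start_by_dependency(dep, started)
    if target not in started:
        started.append(target)
    return started


def start_by_events(first, started=None):
    """Event model: once `first` is running, whatever was waiting on it starts."""
    if started is None:
        started = []
    started.append(first)
    for job, needs in REQUIRES.items():
        waiting = job not in started and needs
        if waiting and all(n in started for n in needs):
            start_by_events(job, started)
    return started


print(start_by_dependency("network-manager"))  # pulled in from the top
print(start_by_events("dbus"))                 # cascades up from the bottom
print(start_by_dependency("dbus"))             # nothing else is started
```

Both models end up with the same order when everything is wanted; the difference is that in the dependency model starting D-Bus alone starts nothing else, while in the event model it is enough to bring up everything that was waiting for it.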

Relationships through Events

The way in which relationships between services are defined is by having services react to each other’s events. To continue with our example, HAL would therefore have the following in its job definition:


start on started dbus
stop on stopping dbus

The first line means that when the dbus service is fully up and running (recall from previous posts that this event can be delayed as necessary), HAL will itself be started.

The second line is a little more interesting. Events in Upstart will block until the jobs they affect complete, and the stopping event is emitted before the dbus job is actually stopped and blocks it from doing so. Put more simply, HAL will be fully stopped before D-Bus is stopped.

Thus we have the simplest kind of Upstart relationship. Starting D-Bus will start HAL immediately afterwards, and stopping D-Bus will stop HAL first.

The portmap problem

Most maintainers at this point will be feeling quite smug and about to hit the comments button because they’ve thought of an example service that actually is a dependency, and should not be running if nothing needs it.

Remember that I said they were rare, not non-existent.

One such example is portmap, another is often something like tomcat. There are a few, but they’re certainly not the common case.

Happily one of the elegant things about Upstart’s design is that it does still support this model where it’s needed. In order for portmap to be started when we start an nfs-server, we simply write the following in portmap’s job definition:


start on starting nfs-server
stop on stopped nfs-server

Compare to the example for D-Bus/HAL and you’ll notice that it’s the events that have changed.

Remember that the starting event, like the stopping event we used in the previous example, blocks the job until jobs affected by the event are completed. Thus this first line means that when we start nfs-server, it will not be started until portmap is started.

And the second line is pretty much the mirror of the first in the previous example, once the nfs-server is stopped, we stop portmap as well since it’s no longer needed.
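The ordering that falls out of blocking events can be sketched with a small Python model. This is a toy, not Upstart’s implementation: the ToyInit class and its rule format are invented here, and it only handles the four job events with a single source job each.

```python
class ToyInit:
    """Toy model of blocking job events; not Upstart's implementation."""

    def __init__(self, rules):
        # rules: job -> {"start on": (event, job), "stop on": (event, job)}
        self.rules = rules
        self.running = set()
        self.log = []

    def _emit(self, event, source):
        # The emitter blocks here until every job reacting to the event is done.
        for job, rule in self.rules.items():
            if rule.get("start on") == (event, source):
                self.start(job)
            if rule.get("stop on") == (event, source):
                self.stop(job)

    def start(self, job):
        if job in self.running:
            return
        self._emit("starting", job)    # reactions finish before the job runs
        self.running.add(job)
        self.log.append(f"{job} started")
        self._emit("started", job)

    def stop(self, job):
        if job not in self.running:
            return
        self._emit("stopping", job)    # reactions finish before the job stops
        self.running.discard(job)
        self.log.append(f"{job} stopped")
        self._emit("stopped", job)


init = ToyInit({
    "hal": {"start on": ("started", "dbus"),
            "stop on": ("stopping", "dbus")},
    "portmap": {"start on": ("starting", "nfs-server"),
                "stop on": ("stopped", "nfs-server")},
})
init.start("dbus")
init.stop("dbus")
init.start("nfs-server")
init.stop("nfs-server")
print("\n".join(init.log))
```

HAL comes up after D-Bus and goes down before it, while portmap comes up before nfs-server and goes down after it; the only thing that differs between the two rule sets is which events they name.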

It may seem a little odd that the rules go in portmap, and not nfs-server, but it makes logical sense. It means that for an admin to work out why portmap is getting started, they just need to read the portmap definition and not hunt around the system to see what else might be doing it.

Also in many of the cases, such requirements are actually conditional. Apache doesn’t need to require tomcat, it’s only a requirement if it’s installed. Thus it makes more sense for tomcat to add itself to Apache’s environment rather than Apache to look for tomcat.

Upstart 0.5: Events

In the previous posts, I’ve covered the various features that make Upstart a good service manager, but these are things you’ll find in most others as well. It’s now time to cover what is unique to Upstart: events.

Start and Stop

You’ve already seen the start and stop commands, which do somewhat unsurprising things to jobs. The important thing to remember about these is that they are not events. I just wanted to clear that up before we start, since it’s often been a source of confusion not helped by the design of some earlier versions of Upstart.

start and stop operate directly on jobs, and the command will not normally return until the operation is complete or otherwise interrupted. Services are considered complete when they are running, Tasks are considered complete when they have stopped again; in both cases the stop command is complete when the service or task has actually stopped.

This is important since it provides a common-sense behaviour, ensuring that the following operation is not a race condition:


# start apache
apache running (start), process 3591
# wget http://localhost/

Solving race conditions is one key part of Upstart’s purpose.

Both commands may also set environment variables, those set by the start command form part of the environment of the job itself and those set by the stop command are available to the pre-stop script.


# cat /etc/init/jobs.d/getty
instance $TTY
env SPEED=38400
exec /sbin/getty $SPEED $TTY

# start getty TTY=tty1
getty (tty1) running (start), process 4152

Events

As described above, the start and stop commands are admin instructions that act directly on named jobs. Events have many similar properties: they carry environment variables that end up in the environment of jobs they start, and they are not complete until the jobs that they affected have been started or stopped as appropriate.

The difference is that the start and stop commands are targeted at specific jobs, whereas events have no such targeting and instead it is jobs that specify which events they are interested in.

In the Upstart world events serve three general purposes: they act as signals of state changes that jobs can react to (e.g. hardware going away), as method calls to automatically start or stop jobs (e.g. shutdown) and as a way of passing information between jobs.

Events are identified by their name and have a different namespace to that of jobs. They are emitted by a D-Bus call or by using emit on the command-line, naming the event and providing any associated environment variables you wish:


# emit interface-up IFACE=eth0 ADDRFAM=Ethernet ADDRESS=01:23:45:67:89:0a

Jobs may match them on this name and any number of their environment variables, specifying whether the event would automatically start or stop the Job.


start on interface-up IFACE=eth* ADDRFAM=Ethernet

As a short-hand, where the order of the variables for an event is fixed, the names may be omitted:


start on interface-up wlan*
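A rough Python sketch of this kind of matching follows. The event_matches function and its data layout are hypothetical, and treating the Nth bare glob as matching the Nth variable is my assumption about how the positional short-hand lines up.

```python
from fnmatch import fnmatchcase


def event_matches(event, pattern):
    """Return True if an emitted event satisfies a start on/stop on pattern.

    event:   (name, [(VAR, value), ...]) with variables in emission order
    pattern: (name, [arg, ...]) where each arg is "VAR=glob" or a bare "glob"
    """
    name, env = event
    wanted_name, args = pattern
    if name != wanted_name:
        return False
    by_name = dict(env)
    for position, arg in enumerate(args):
        if "=" in arg:
            var, glob = arg.split("=", 1)
            value = by_name.get(var)
        else:
            # Positional short-hand (assumed): Nth bare glob, Nth variable.
            glob = arg
            value = env[position][1] if position < len(env) else None
        if value is None or not fnmatchcase(value, glob):
            return False
    return True


up = ("interface-up", [("IFACE", "eth0"),
                       ("ADDRFAM", "Ethernet"),
                       ("ADDRESS", "01:23:45:67:89:0a")])

print(event_matches(up, ("interface-up", ["IFACE=eth*", "ADDRFAM=Ethernet"])))
print(event_matches(up, ("interface-up", ["wlan*"])))
```

The first pattern matches the emitted event and the second does not, since eth0 is not a wlan device.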

When a job is started by an event, the environment for that event forms part of the environment for the job and may be used when matching events that can automatically stop the job. Harking back to our getty job from previous posts, we can bind this to the lifetime of the underlying device.


start on tty-added
stop on tty-removed TTY=$TTY

instance $TTY
exec /sbin/getty 38400 $TTY

We can also match multiple events, requiring either that all of them occur or that any one of them occurs, using unsurprising operators:


start on a-up and b-up
stop on a-down or b-down

In these situations, once stopped, both the a-up and b-up events must happen again for the job to be restarted.
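The “must happen again” behaviour can be sketched as a little tracker in Python. This is a toy: the AndCondition class is invented for illustration, and forgetting everything at the moment the condition is satisfied is my simplification of when the reset happens.

```python
class AndCondition:
    """Toy tracker for `start on a-up and b-up`; not Upstart's implementation."""

    def __init__(self, *names):
        self.names = set(names)
        self.seen = set()

    def notify(self, event):
        """Record an event; return True once every named event has occurred."""
        if event in self.names:
            self.seen.add(event)
        if self.seen == self.names:
            self.seen.clear()          # satisfied: start again from nothing
            return True
        return False


def or_condition(event, *names):
    """`stop on a-down or b-down`: any one of the named events is enough."""
    return event in names


cond = AndCondition("a-up", "b-up")
first_run = [cond.notify("a-up"), cond.notify("b-up")]
second_run = [cond.notify("b-up"), cond.notify("a-up")]
print(first_run, second_run, or_condition("b-down", "a-down", "b-down"))
```

In both runs the condition only becomes true on the second event, whichever order they arrive in.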

Upstart Events

Upstart itself only emits a few events, leaving the rest up to application authors to define. The startup event is the most interesting of these, and is ultimately what nearly all jobs get chained from.

Job Events

As jobs are started and stopped, Upstart emits events on their behalf for four key points in their lifecycle.

  • starting is emitted when the job is first starting, and the job will not actually be started until this event completes.
  • started is emitted once the job is fully running.
  • stopping is emitted when the job is stopping (after the pre-stop has completed), the job will not actually be stopped until this event completes.
  • stopped is emitted once the job is fully stopped.

All of the events have the name of the job in the first variable, JOB and the instance of the job (if applicable) in the second variable, INSTANCE. The stopping and stopped events then have a series of variables indicating the reason for the job stopping: RESULT indicates whether it was a normal stop or a failure then if it failed, PROCESS will say what failed and EXIT_SIGNAL or EXIT_STATUS will contain the terminating signal or exit code.

For example, we can take action to backup a database if the server crashes:


start on stopping hersql RESULT=failed EXIT_SIGNAL=SEGV
task
exec hersql-backup

Jobs can also export variables from their own environment to others through these events by using the export stanza:


start on interface-up
stop on interface-down $IFACE

instance $IFACE
export IFACE
exec ...

Another job may then be started along with this one, and know what interface it’s bound to:


start on started JOBNAME
stop on stopping JOBNAME

instance $IFACE

We’ll look at the various powerful forms of dependency that these events allow us to express in the next post.

Upstart 0.5: Job Lifetime

Continuing the series of posts on Upstart 0.5, in this post I’ll be talking about the various ways that Upstart allows you to manage the lifetime of a job. These are guarantees that Upstart provides you so that when you start a job, you know what will happen if that job dies unexpectedly or someone else tries to start the job as well.

Respawning

We’ve all encountered those daemons that mysteriously die: sometimes they’re taken out by the OOM killer, and sometimes they’re just buggy and crash from time to time. And there are also those processes that exit when they’re done, and need to be restarted (e.g. getty).

For all of these, Upstart provides the facility to respawn the job; effectively an automatic restart in the case of failure. Respawning is controlled by three things:

  • Whether or not to respawn
  • Whether or not the job exited “normally”
  • Whether it has been respawned too many times recently

Let’s take the sobby server as an example: here’s a job that tends to crash every now and then, and we’d like to keep it running. However, we’re also aware that every now and then, it crashes hard and needs repairing; so we limit its respawning to 10 times in 5 seconds (which happens to be the default).


  exec /usr/bin/sobby --autosave-file=/var/lib/sobby/autosave /var/lib/sobby/autosave

  respawn
  respawn limit 10 5

The daemon will be continually respawned until either the limit is reached, or the service is explicitly stopped by request. This isn’t ideal though, sobby has an exit command which we wish to honour; the daemon is well written enough that it only returns the zero exit code if this command has been run, and otherwise always returns a failure or signal of some description.

In addition, we know that the ABRT signal is raised on the daemon when the session file is corrupted (I’m making this up, btw), so we want to stop respawning in that case.

To accomplish this, we simply state which exit codes and signals are considered a normal exit condition:


  exec /usr/bin/sobby --autosave-file=/var/lib/sobby/autosave /var/lib/sobby/autosave

  respawn
  respawn limit 10 5

  normal exit 0 ABRT

Tasks can be respawned too; the only difference is that zero is always considered a normal exit condition for a task:


  task
  exec /usr/sbin/some-check $DEVICE

  respawn

This task will be continually run until it ends with a zero (success) exit code. We could add additional normal exit conditions as well, just as we can with a service.
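The three things that control respawning can be put together in a short Python sketch. It is a toy model under stated assumptions: the RespawnPolicy class is hypothetical, signals are represented by name so they cannot be confused with exit codes, and the sliding-window bookkeeping is my guess at one reasonable reading of “10 times in 5 seconds”.

```python
from collections import deque


class RespawnPolicy:
    """Toy model of respawn, respawn limit COUNT INTERVAL and normal exit."""

    def __init__(self, limit=10, interval=5.0, normal_exit=(), task=False):
        self.limit = limit
        self.interval = interval
        self.normal_exit = set(normal_exit)   # exit codes (int), signals (str)
        if task:
            self.normal_exit.add(0)           # zero is always normal for a task
        self.recent = deque()                 # times of recent respawns

    def should_respawn(self, now, exit_status=None, exit_signal=None):
        """Decide what to do after the main process dies at time `now`."""
        reason = exit_signal if exit_signal is not None else exit_status
        if reason in self.normal_exit:
            return False                      # normal exit: leave it stopped
        # Forget respawns that have fallen out of the window (an assumption).
        while self.recent and now - self.recent[0] > self.interval:
            self.recent.popleft()
        if len(self.recent) >= self.limit:
            return False                      # respawning too fast: give up
        self.recent.append(now)
        return True


sobby = RespawnPolicy(limit=10, interval=5.0, normal_exit=(0, "ABRT"))
print(sobby.should_respawn(0.0, exit_status=1))       # ordinary crash
print(sobby.should_respawn(0.1, exit_status=0))       # the exit command
print(sobby.should_respawn(0.2, exit_signal="ABRT"))  # corrupted session
```

A crash loop is cut off once the limit is reached inside the interval, and a long quiet period lets the job be respawned again.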

Singletons

All Upstart jobs are singletons by default, this means that only one instance of that job may be running at any one time. To illustrate, let’s continue using the sobby job we defined above and start it:


  # start sobby
  sobby running (start), process 14977

Ok, we have a single instance of the sobby job running, and we can interrogate the status of that:


  # status sobby
  sobby running (start), process 14977

Now what happens if we (or someone else) tries to start another copy:


  # start sobby
  start: cannot start 'sobby': Already running
  zsh: exit 1   start sobby

This is the most sensible and sane default, it saves you having to worry about locking between services and most importantly means that you can treat failures to obtain resources as true errors.

For example, if you request a D-Bus name and don’t get it, or attempt to bind to a socket and fail, you can treat that as an error since you know the service manager is already ensuring you’re a singleton. This means that you won’t silently pretend everything’s ok, and thus won’t hide problems.

Instance jobs

But what if you do want to be able to run multiple copies of the job? Upstart supports this through instance jobs, which may have multiple copies running. As well as being identified by the shared job name, each instance is also identified by a second-level instance name.

The instance name for each instance of a job must be unique within that job. Attempting to start another instance with an already used name will return an already running error again.

Thus the usual method for defining an instance name is by using variables from the job environment, which you’ll recall come from sources including the start request.

Let’s use the getty job we defined in the last post and turn that into an instance job:


  instance $TTY
  exec /sbin/getty 38400 $TTY

The instance keyword is the new addition, this defines the name for each instance of the job. Setting it to an ordinary string wouldn’t be much help, since there could only be one unique expansion, and you’d be back to a singleton job again; so we define it using variables from the job’s environment which will be expanded.

In this case, we can have an instance of the job for each unique value of the $TTY variable. This makes sense since this is also what we pass to getty. This means that Upstart is still able to provide the guarantee that another getty won’t be running with the same tty.

All that we need do is pass the value of the TTY environment variable when we start or stop the getty job:


  # start getty TTY=tty1
  getty (tty1) running (start), process 15001
  # start getty TTY=tty2
  getty (tty2) running (start), process 15006

And if we try and run another copy with the same TTY variable, we’ll still get already running:


  # start getty TTY=tty1
  start: cannot start 'getty': Already running
  zsh: exit 1   start getty TTY=tty1

There’s no builtin way to allow unlimited instances, since these would tend to eventually consume all available resources. Since any service or task needs to operate on something, or even just write something, then you’ll need some kind of locking and something in the job environment to tell it what to work on or write. If someone manages to come up with a truly unlimited instance job, you could do it trivially by passing a UUID=$(uuidgen) variable and instancing on that.
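The uniqueness rule for singletons and instances can be modelled in a few lines of Python. This is only a sketch: ToyJob and AlreadyRunning are invented names, and using string.Template to expand the instance name is a stand-in for whatever expansion Upstart actually performs.

```python
from string import Template


class AlreadyRunning(Exception):
    pass


class ToyJob:
    """Toy model of singleton and instance jobs; not Upstart's implementation."""

    def __init__(self, name, instance=""):
        self.name = name
        self.instance = Template(instance)    # e.g. "$TTY"; "" means singleton
        self.running = {}                     # instance name -> environment

    def start(self, **env):
        # Expand the instance name from the start request's environment.
        instance_name = self.instance.substitute(env)
        if instance_name in self.running:
            raise AlreadyRunning(f"cannot start '{self.name}': Already running")
        self.running[instance_name] = env
        return instance_name


getty = ToyJob("getty", "$TTY")
print(getty.start(TTY="tty1"), getty.start(TTY="tty2"))
try:
    getty.start(TTY="tty1")
except AlreadyRunning as error:
    print("start:", error)
```

A job with no instance stanza always expands to the same (empty) name, which is exactly why it behaves as a singleton; instancing on a freshly generated UUID would give the effectively unlimited case.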

In the next post, I’ll cover one of the major differences between Upstart and other service managers: events!

Upstart 0.5: Job Environment

In my previous post on Upstart 0.5, I talked about the ways you can define a service for Upstart to manage and introduced the different processes in a job’s lifecycle. In this post, I’ll look into the detail of those processes and their environment.

Upstart ensures that each process it runs has a sane, safe and predictable environment. By default each process is run in a new process group and session, but not as a leader of that process group or session (otherwise the process would have to be careful on all open() calls to make sure it didn’t suddenly own any ttys it opened); the standard input, output and error file descriptors are bound to /dev/null; the PATH environment variable is set to a sensible default, and the TERM variable inherited from the kernel, otherwise no other variables are set; and all resource limits and the like are inherited from init itself.

There are, of course, many ways to customise this environment from the job definition:

  • Jobs may run as a process group and session leader (normally getty likes this).
  • Jobs may have standard file descriptors sent to /dev/console and may be the owner of /dev/console (so they receive Ctrl-C).
  • Jobs may specify custom resource limits, umask, “nice” level, working directory and chroot directory.
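For a feel of what such a default environment looks like from the child’s side, here is a rough approximation in Python using the standard subprocess module. It is not what Upstart does internally: run_isolated is a made-up helper, the PATH value is a placeholder, and unlike Upstart’s default the child here is a session leader because start_new_session simply calls setsid() in it.

```python
import os
import subprocess


def run_isolated(argv):
    """Run argv with a minimal environment, null stdin/stderr and a new session."""
    env = {
        "PATH": "/usr/sbin:/usr/bin:/sbin:/bin",   # assumed default, illustration only
        "TERM": os.environ.get("TERM", "linux"),
    }
    return subprocess.run(
        argv,
        env=env,                       # nothing else is passed down
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,        # captured here only so we can inspect it
        stderr=subprocess.DEVNULL,
        start_new_session=True,        # setsid(): the child leads its session
        cwd="/",
        text=True,
        check=True,
    )


result = run_isolated(["env"])
print(result.stdout)
```

The env command prints exactly the variables it was given, so the output shows just PATH and TERM.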

Environment Variables

To say that jobs only have the PATH and TERM environment variables set is quite a fallacy, these are just the two variables that all jobs always have set. In fact, the additional environment variables for a job are very important to Upstart since they are the primary method of communicating with that job how it should behave.

To illustrate this, take an instance of the getty service; it needs to know which tty it should use. We could invent some kind of common configuration or parameter database (or D-Bus service) for this kind of thing, with the job being able to run commands to interrogate it, etc. but that’s entirely unnecessary. UNIX already gives us the functionality we need in environment variables, which you’ve probably noticed your shell documentation calls parameters anyway.

In our getty example, we would store the tty in the TTY environment variable, and then the job definition is nice and simple to understand:


exec /sbin/getty 38400 $TTY

So environment variables can be set from a number of sources: the built-in PATH and TERM variables will always be set; others can be set from the job definition (which can specify to inherit the value from init’s environment); and finally environment can come from the start request for the job. I’ll explain more on the latter in later posts, but for now, it suffices to demonstrate that we’d start our getty example with:


# start getty TTY=tty1
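The layering of those sources can be sketched as a simple merge in Python. The build_job_environment function and DEFAULT_PATH value are hypothetical, and the precedence (start request over job definition over built-ins) is my assumption rather than something stated here.

```python
import os

# Placeholder value; the real default PATH is whatever init was built with.
DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"


def build_job_environment(job_env=None, inherit=(), request_env=None,
                          init_env=None):
    """Merge the sources of a job's environment, later sources winning."""
    init_env = os.environ if init_env is None else init_env
    # The two variables every job always has.
    env = {"PATH": DEFAULT_PATH, "TERM": init_env.get("TERM", "linux")}
    for name in inherit:               # job definition inherits from init
        if name in init_env:
            env[name] = init_env[name]
    env.update(job_env or {})          # values set in the job definition
    env.update(request_env or {})      # values from the start request
    return env


getty_env = build_job_environment(
    job_env={"SPEED": "38400"},
    request_env={"TTY": "tty1"},
    init_env={"TERM": "linux", "HOME": "/"},
)
print(getty_env)
```

HOME does not leak into the job because nothing asked to inherit it.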

So Upstart allows you to define the job’s true life cycle, including any setup and cleanup it needs to perform before and after the daemon is running; and it allows you to define the environment that daemon runs in, so you don’t have to worry about unexpected situations. In the next post, I’ll talk about how you can manage the lifetime of a job, looking at things such as singletons and respawning.

Upstart 0.5: Job Lifecycle

Next month I am hoping to release Upstart 0.5.0, the culmination of almost a year’s worth of work on it. Comparatively, the version that shipped in edgy (0.2.x) was simply an essay to figure out the basics and the version in feisty thru hardy (0.3.x) a first draft. The new version has been stripped back to the very basics and rebuilt to correct the problems we found with the earlier versions, and to make sure it can handle real world uses as simply and elegantly as possible.

Over the next few weeks, I’ll be writing about the new version; both how it has improved from previous versions and how it compares to what else is out there.

Introduction

First we’ll look at how Upstart allows you to manage the lifecycle of services and tasks (collectively jobs) that you wish to manage. We’ll use the D-Bus daemon as an example service, simply because it’s a modern, well-behaved service that we’re all familiar with.

With SystemV RC, we would have had a single /etc/init.d/dbus file accepting both start and stop as arguments. It may have looked something like this:


case "$1" in
    start)
        start-stop-daemon --start --pidfile /var/run/dbus.pid --exec /usr/sbin/dbus-daemon
        ;;
    stop)
        start-stop-daemon --stop --pidfile /var/run/dbus.pid
        ;;
esac

As you’re well aware, the simple act of starting a daemon and stopping again is not so simple this way. You nearly always end up requiring some kind of helper like start-stop-daemon to help out, and rely on accurate PID files and the like.

Upstart, like just about every other modern service manager (but strangely, not SMF), takes care of all of this hard work for you. Instead of defining how to start and stop a service you just define what to start. Here’s how you’d define the same service in Upstart:


exec /usr/sbin/dbus-daemon

Setup and teardown

Of course, we all know that no service definition is ever that simple. I massively simplified the SystemV example for the purposes of documentation. In reality, we frequently need to do various things to set up the system for the daemon and clean up again afterwards. The original start shell code probably looks more like this (and even now, I’m simplifying for space):


mkdir /var/run/dbus
chown messagebus.messagebus /var/run/dbus

/usr/bin/dbus-uuidgen --ensure

start-stop-daemon --start --pidfile /var/run/dbus.pid --exec /usr/sbin/dbus-daemon

We need a directory for socket files, etc. and to create the machine id if missing. And likewise to shut it down, we need to clean up:


start-stop-daemon --stop --pidfile /var/run/dbus.pid

rm -rf /var/run/dbus

And this is where most init replacements fall down (especially launchd). In fact, ironically, you’ll often find the developers using their minimal service definitions when they talk about how fast their system can boot. You can boot really fast if you don’t start anything properly.

Obviously I wouldn’t be pointing this out if Upstart didn’t allow you to do this properly; we’ll extend our minimal service definition to include the set up and tear down code necessary.


pre-start script
    mkdir /var/run/dbus
    chown messagebus.messagebus /var/run/dbus

    /usr/bin/dbus-uuidgen --ensure
end script

exec /usr/sbin/dbus-daemon

post-stop script
    rm -rf /var/run/dbus
end script

Before we just defined one process in a job’s lifecycle, known as the main process. Our new definition defines two more, the pre-start and post-stop processes. We’ve chosen to define them as shell scripts embedded in the definition, we could have defined them as binaries to execute if we preferred (using pre-start exec), and we could have defined the main process as a script (using script...end script).

As their name suggests, these processes are run before the main process is started and after it has been stopped respectively. In fact, Upstart guarantees more than that:

  • For every time that the job is started, the post-stop process will be run.
  • For every time that the main process is run, the pre-start process will have been completed successfully first.

It might seem a little strange that the post-stop process will always run but the pre-start process doesn’t have as strong a guarantee. This is because it’s possible for the job to be stopped immediately after it is started. Should that happen, Upstart will not run the main process since there’s no need, and therefore will also not run the pre-start process; however to ensure the system is clean, it always runs the post-stop process.

These guarantees also provide sane restart behaviour. If you restart a job, the main process is killed, the post-stop process is run, then the pre-start process is run again before the main process. If you cancel a restart (by stopping the job again) after the post-stop process has been run, it will always be run again.
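The two guarantees can be written down as a tiny Python function. It is only a sketch of the ordering described above: run_job is a made-up name, and the failed pre-start case is my reading of what the guarantees imply rather than something spelled out here.

```python
def run_job(stop_requested_early=False, pre_start_ok=True):
    """Return the processes a toy job would run, in order, for one start."""
    ran = []
    if not stop_requested_early:
        ran.append("pre-start")
        if pre_start_ok:
            ran.append("main")        # only after pre-start has succeeded
    ran.append("post-stop")           # guaranteed for every start
    return ran


print(run_job())                            # the normal case
print(run_job(stop_requested_early=True))   # stopped straight after starting
print(run_job(pre_start_ok=False))          # pre-start failed (assumed case)
```

Whatever happens on the way up, post-stop is always the last thing in the list.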

Spawned, Running and Killed

Upstart makes important distinctions in the state of the main process, it does not necessarily assume that just because the exec() syscall has succeeded that the process is in a suitable running state. Likewise, it does not assume that just because the kill() syscall has succeeded that the process is no longer running.

The latter is easy to understand, delivering the TERM signal to a running process normally just invokes its own termination handler which may perform any number of activities before cleanly shutting down. Upstart waits for the actual child signal signifying termination before running the post-stop script, until that point the process is considered merely “killed”. Obviously too long in the “killed” state means Upstart delivers the much more hardcore KILL signal, but that’s adjustable.
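The same TERM, wait, then KILL pattern can be sketched with Python’s subprocess module. This is an illustration of the idea rather than Upstart’s code: stop_process is a hypothetical helper and the timeout values are arbitrary.

```python
import signal
import subprocess
import sys


def stop_process(proc, kill_timeout=5.0):
    """Send TERM, wait for the child to really exit, escalate to KILL if needed."""
    proc.terminate()                        # SIGTERM: process is only "killed"
    try:
        proc.wait(timeout=kill_timeout)     # reaped: now it has actually stopped
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()                         # SIGKILL cannot be caught or ignored
        proc.wait()
        return "killed"


# A stubborn child that ignores TERM, to show the escalation.
stubborn = subprocess.Popen(
    [sys.executable, "-c",
     "import signal, time\n"
     "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
     "print('ready', flush=True)\n"
     "time.sleep(30)"],
    stdout=subprocess.PIPE,
)
stubborn.stdout.readline()                  # wait until TERM is being ignored
outcome = stop_process(stubborn, kill_timeout=0.5)
stubborn.stdout.close()
print(outcome, stubborn.returncode == -signal.SIGKILL)
```

Only once wait() has returned would it be safe to run anything like a post-stop step.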

The former is harder to understand since the new binary is in memory and is probably at least initialising, but that’s the point: it isn’t yet ready for other jobs to use. In the SystemV script, this wasn’t an issue, since we could generally rely on daemons (well behaved ones anyway) to follow the convention that they should not fork() until initialisation was completed successfully.

Since Upstart forks and supervises its own processes, it generally prefers that daemons do not fork() and remain as the pid they were given when started. So how do jobs signify that they are ready? There are a few ways:

  • By forking as before. As I’ve talked about before, Upstart can supervise processes that fork, and it will wait for that to happen before assuming the process is ready.
  • By raising the STOP signal. For jobs marked with expect stop, Upstart will wait for this, and once it is received will send the process the CONT signal and assume that it is now ready.
  • By registering a D-Bus name. An early 0.5.x release will wait for a particular D-Bus name to be registered, and not assume that the job is ready until it has done so.
  • By calling listen(). Again, planned for an early 0.5.x release, Upstart will use the same mechanism it uses to follow forks to watch for the listen() system call.
  • With a post-start script, more on that in a second.
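The STOP signal handshake from the list above is easy to try out with plain POSIX calls. Here is a minimal Python sketch of both sides; spawn_expect_stop and fake_daemon are invented names, and the exit code of 7 is just a stand-in for the daemon getting on with its work.

```python
import os
import signal


def spawn_expect_stop(child_main):
    """Fork a child and return its pid once it has raised SIGSTOP."""
    pid = os.fork()
    if pid == 0:
        try:
            child_main()
        finally:
            os._exit(0)
    # WUNTRACED makes waitpid report a stopped child, not only a dead one.
    _, status = os.waitpid(pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child exited before signalling readiness")
    os.kill(pid, signal.SIGCONT)            # let it carry on: it is now ready
    return pid


def fake_daemon():
    # ... initialisation would happen here ...
    os.kill(os.getpid(), signal.SIGSTOP)    # "I am ready": stops until CONT
    os._exit(7)                             # stand-in for the daemon's real work


pid = spawn_expect_stop(fake_daemon)
_, status = os.waitpid(pid, 0)
print("ready pid", pid, "exit status", os.WEXITSTATUS(status))
```

The appeal of this approach is that the daemon keeps its pid and needs only one extra line of code, though it does need that line to be added upstream.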

The last two processes

I’ve introduced the three processes that most jobs will tend to use, but there’s also another two which will be somewhat rarer but are probably the most powerful of them all. These are the post-start and pre-stop processes, and they’re interesting because they’re run while the main process is running.

The post-start process, as its name suggests, is run after the main process has been spawned and any event we were expecting (see above) has happened. The job will not be considered ready until the post-start process completes, thus a common use for it is to interrogate the daemon or send it commands it can only act on once it’s running.

The pre-stop process is run when a request to stop the job occurs (this means it is not run if the main process terminates on its own), and the process is not killed until it finishes. It receives information about the request, and can cause that request to be ignored (thus leaving the job running). Another common use is to send the daemon commands before it receives the TERM signal.

Next…

So that’s a look at the ways we can define the lifecycle of an Upstart job. In the next couple of posts we’ll look at the environment and session of jobs, and then at matters such as respawning and singletons.

How to (and why) supervise forking processes

Yesterday’s celebratory blog post demonstrated that Upstart is now able to supervise processes that fork into the background, as most daemons do. Now that the code has undergone a little more testing, and been pushed into the archive, it’s worth explaining a little bit more of the background as to the how, and why, we do this.

The why is easiest to answer first. Daemons are normally written to fork, usually twice; this detaches them from the terminal, process group and session that they were spawned from so that they remain running after the user logs out. The fork isn’t just a mechanism though: over time a convention has emerged that daemons don’t go into the background until their initialisation is complete and they’re ready to receive connections, if that’s their bag.

Simply adding an option to remain in the foreground might appear to eliminate the need to deal with the problem, but this also takes away the notification that the daemon is ready for use. Over time this signal can be replaced with other notifications: registering a known D-Bus name, or simply raising SIGSTOP; but these require code changes that need to be agreed with upstream first. Making code changes also assumes that we have the code. Whether we like it or not, sysadmins will often have the need to run proprietary daemons — or even simply older versions of software where the patch is too invasive.

So that’s why we have to do it, now how do we?

This is one of the reasons that building the service supervisor into init, rather than having it as a separate process, makes sense. Init has a few special kernel-provided buffs, one of which is that orphaned processes are reparented to it. When you run a daemon from the command-line, the process is initially your child; it forks once and the parent dies, the new child is now orphaned, and thus reparented to init. (Most daemons now run setsid and fork a second time. This is to ensure that if they open a tty device, they don’t unexpectedly become its owner.) Init, like any other process, receives notification about its children through wait so will know when daemons terminate; the “must have” of supervision.

So if all daemons are our children we are notified when they terminate and why; we can compare their exit status or signal against a list of known good ones, and choose whether we need to respawn the dead job or mark it as stopped normally.

This isn’t enough though, all we get is the process id of the dead child. We still need to relate that back to a job somehow. One way to do that is to use waitid with the WNOWAIT flag, leaving the process on the table so we can examine /proc to find out more about it. This seems like quite a reasonable approach, we can then match a process to a job by details such as what binary it was actually running. Unfortunately this only works for singleton processes where we’re guaranteed that only one of them exists, both at the job level and at the process-level itself; should the process fork, even to run another child, we could accidentally consider it to have died. Daemons need to be able to run their own children, or even have pools of them to use; and we also need to be able to run multiple copies of daemons where we can support it.
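The waitid approach can be demonstrated from Python, which exposes the call as os.waitid. This is a sketch of the technique only: peek_dead_child is a made-up helper, and reading the command name from /proc/PID/stat is Linux-specific.

```python
import os


def peek_dead_child(pid):
    """Learn that `pid` has died without reaping it, then look it up in /proc."""
    info = os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    try:
        with open(f"/proc/{pid}/stat") as stat:
            # Format is "pid (comm) state ..."; comm may itself contain ")".
            text = stat.read()
            comm = text[text.index("(") + 1:text.rindex(")")]
    except (OSError, ValueError):
        comm = None                    # no /proc here (for example, not Linux)
    return info.si_pid, info.si_status, comm


pid = os.fork()
if pid == 0:
    os._exit(3)                        # the child dies straight away

dead_pid, exit_status, comm = peek_dead_child(pid)
reaped_pid, _ = os.waitpid(pid, 0)     # still on the table to be reaped properly
print(dead_pid, exit_status, comm, reaped_pid)
```

The later waitpid call still succeeds, which shows that the first one left the dead process in place for inspection.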

So we really do need to know the process id of the actual daemon process we should be supervising. Unfortunately any method of passing this back to init, even a relatively common one like writing it to a pid file, isn’t sufficiently standard or reliable to do this kind of work.

Ideally the kernel would just tell init when a process was reparented to it, providing both the child process id and that of its previous parent. Such a notification doesn’t exist today, though getting one into the kernel mainline would be a nice project; difficult if there’s only one implementation using it.

If we can’t have that, a syscall that would allow us to watch a process and find out when it forks would be the second-best thing. We’d have the previous process id since we were watching it, and we’d hopefully be able to obtain the new child process id from this.

Happily that syscall exists, and I suspect you use it all the time if you’re a developer; it’s a bit of a mad leap to using it inside init, but as you can see, it works rather nicely. All we need do is watch the process, and follow it each time it spawns a new child. We stop watching as soon as we have followed twice (once if a different option is used), or if the process runs a different binary by itself. And thus we can know the process id of daemons we spawned, even if they attempt to detach from their parent process which they’ll just be reparented to anyway.

What’s the syscall? Oh, hmm, is that the time? Got to go! Alright, it’s ptrace.
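To make the trick concrete, here is a Linux-only sketch of following a process through its forks with ptrace. It is not Upstart's code: fake_daemon, wait_stopped and spawn_and_follow are illustrative names, the traced process is a stand-in rather than an exec'd binary, and it assumes the original process exits promptly once it has forked.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in daemon for the demo: detaches with two forks, then idles. */
static void fake_daemon(void)
{
        if (fork() > 0)
                _exit(0);
        if (fork() > 0)
                _exit(0);
        pause();
        _exit(0);
}

/* Wait for a tracee to stop; returns its status, or -1 if it went away. */
static int wait_stopped(pid_t pid)
{
        int status;

        if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
                return -1;
        return status;
}

/* Spawn the stand-in daemon under trace and follow it through up to
 * max_forks forks. Returns the pid we ended up at, or -1 on failure. */
pid_t spawn_and_follow(int max_forks)
{
        pid_t first, pid;
        int   status, followed = 0;

        assert(max_forks > 0);

        first = pid = fork();
        if (pid < 0)
                return -1;
        if (pid == 0) {
                if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
                        _exit(1);
                raise(SIGSTOP);         /* let the parent set options first */
                fake_daemon();
        }

        if (wait_stopped(pid) < 0)
                return -1;
        if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
                   (void *)(long)PTRACE_O_TRACEFORK) < 0) {
                kill(pid, SIGKILL);
                waitpid(pid, NULL, 0);
                return -1;
        }
        ptrace(PTRACE_CONT, pid, NULL, NULL);

        while (followed < max_forks) {
                unsigned long child = 0;

                if ((status = wait_stopped(pid)) < 0)
                        return -1;      /* exited without forking */
                if (status >> 8 != (SIGTRAP | (PTRACE_EVENT_FORK << 8))) {
                        /* Ordinary signal stop: pass it on, keep watching. */
                        ptrace(PTRACE_CONT, pid, NULL,
                               (void *)(long)WSTOPSIG(status));
                        continue;
                }

                /* The fork event tells us the new child's pid. */
                ptrace(PTRACE_GETEVENTMSG, pid, NULL, &child);
                ptrace(PTRACE_DETACH, pid, NULL, NULL);
                if (pid == first)
                        waitpid(first, NULL, 0);  /* reap our direct child */

                /* The child is traced automatically and begins stopped. */
                pid = (pid_t)child;
                followed++;
                if (wait_stopped(pid) < 0)
                        return -1;
                if (followed < max_forks)
                        ptrace(PTRACE_CONT, pid, NULL, NULL);
        }

        ptrace(PTRACE_DETACH, pid, NULL, NULL);   /* stop watching */
        return pid;
}
```

Calling spawn_and_follow(2) should hand back the pid of the process left sitting in pause, which is the one a supervisor would want to keep an eye on. A real implementation would also need to cope with exec and with tracees that die at awkward moments.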

Supervising forking processes


quest /tmp# cat test.c
#include <sys/types.h>

#include <stdlib.h>
#include <unistd.h>

int
main (int   argc,
      char *argv[])
{
        pid_t pid;

        pid = fork ();
        if (pid > 0)
                exit (0);

        pid = fork ();
        if (pid > 0)
                exit (0);

        pause ();
        exit (0);
}
quest /tmp# gcc -Wall -g -O0 -o test test.c

quest /tmp# cat /etc/event.d/test
wait for daemon
exec /tmp/test

quest /tmp# start test
test (#0) goal changed from stop to start
test (#0) state changed from waiting to starting
event_new: Pending starting event
Handling starting event
event_finished: Finished starting event
test (#0) state changed from starting to pre-start
test (#0) state changed from pre-start to spawned
process_spawn: Spawned main process 6380 for test (#0)
Active test (#0) main process (6380)
test (#0) main process (6380) forked new child 6381
test (#0) main process (6381) forked new child 6382
test (#0) state changed from spawned to post-start
test (#0) state changed from post-start to running
event_new: Pending started event
Handling started event
event_finished: Finished started event

Something for everybody

According to the current issue (#93) of Linux Format, Ubuntu 7.04 (“Feisty Fawn”) is “…a dull release for Ubuntu, leaving Fedora to storm ahead…” (p. 23) whilst “shaping up to be one of the most innovative Linux distro releases of the year.” (p. 38)

Especially amusing for myself is that, with Upstart, they “seldom notice any difference in boot speed” (p. 42), yet “Ubuntu 7.04 boots up in record time, leaving other Linux distros in the dust.” (p. 22)

(As anyone who’s ever read anything about Upstart will know, Ubuntu still uses the SysV-rc scripts so there should be no difference in speed at this point. Funnily enough, they identified the reason Ubuntu boots fast in the same issue; “Changing the /bin/sh symlink to point to Dash instead of Bash can significantly shorten boot times” (p. 33) — unfortunately they simultaneously claim that Dash is only “almost POSIX compliant”, without explaining why they think it isn’t.)

In this modern world, the lack of any editorial direction or basic research into what’s being printed is quite refreshing.

Upstart can now replace sysvinit

Today I reached another milestone in the development of upstart, the packages in universe can now replace the existing sysvinit package.

Before trying this, make sure your installation is up to date as we’ve had to split out some parts of sysvinit into a new sysvutils package. If you’re up to date, and want to try it out, install the upstart and upstart-compat-sysv packages from universe.

Note that the first reboot after you’ve installed the packages (from sysvinit to upstart) will be a little tricky … use reboot -f.

If your system boots and shuts down normally, everything’s working just fine. Note that both will be somewhat more quiet than you’re used to, unless you have usplash running.

Throughout the rest of this entry, I’ll try to answer some of the questions and comments that I’ve received since the last post.

Events

As I talked about previously, upstart is an event-based init daemon. Events are the primary force for having your services and tasks started and stopped at the appropriate time and in the appropriate order.

So what are events and where do they come from? (Note that this part is under development, so may change in later releases).

Events are just simple strings that may be sent by any process when something it is tracking the state of changes. They have no state or longevity, and if, when queued, they do not cause any job state changes, then they have no effect unless they are sent again.

Jobs can list which events cause them to be started if they are not already running and which events cause them to be stopped if they are running. Multiple start and stop events may be listed, in which case the first to occur changes the job until the next one occurs.
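As a minimal sketch of that matching, assuming nothing about how upstart stores jobs internally (struct job, job_handle_event and the goal enum are invented for this example):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum goal { GOAL_STOP, GOAL_START };

/* A job with NULL-terminated lists of the events it cares about. */
struct job {
        const char *name;
        const char *start_on[4];
        const char *stop_on[4];
        enum goal   goal;
};

static int listed(const char *const *events, const char *event)
{
        for (; *events != NULL; events++)
                if (strcmp(*events, event) == 0)
                        return 1;
        return 0;
}

/* Apply one event to a job. Returns 1 if the job changed, 0 if the
 * event had no effect and is simply forgotten. */
int job_handle_event(struct job *job, const char *event)
{
        assert(job != NULL && event != NULL);

        if (job->goal == GOAL_STOP && listed(job->start_on, event)) {
                job->goal = GOAL_START;
                return 1;
        }
        if (job->goal == GOAL_START && listed(job->stop_on, event)) {
                job->goal = GOAL_STOP;
                return 1;
        }
        return 0;
}
```

A job started on “startup” and stopped on “shutdown” would ignore a second “startup” while it is already running, which matches the point that events have no state of their own.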

upstart itself generates the following system events:

  • “startup”, on system boot.
  • “shutdown”, when the system is about to be shut down.
  • “stalled”, when there are no jobs running and no events in the queue.

The shutdown tool included in the package also causes one of the following events to be sent once the “shutdown” event has been handled:

  • “reboot”,
  • “halt”,
  • “poweroff”,
  • “maintenance” (aka. going into “single user” mode),
  • any user-defined event with shutdown -e event.

Jobs also generate events whenever they change state, this is the primary source of events for ordering:

  • “jobname/start”, when the job is first started.
  • “jobname/started”, once the job is running.
  • “jobname/stop”, when the job is first stopped.
  • “jobname/stopped”, once the job has stopped.
  • “jobname”, for services this is generated once it is running; for tasks this is generated once it has finished.
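Since these are only strings, building them is trivial; a small sketch follows, where the enum and the function name job_event_name are assumptions made for the example rather than anything from the upstart source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The five job events listed above; JOB_DONE is the bare "jobname". */
enum job_event { JOB_START, JOB_STARTED, JOB_STOP, JOB_STOPPED, JOB_DONE };

/* Write the event string for `job` into buf. Returns 0 on success,
 * or -1 if the buffer is too small. */
int job_event_name(char *buf, size_t size, const char *job, enum job_event ev)
{
        static const char *const suffix[] = {
                "/start", "/started", "/stop", "/stopped", ""
        };

        assert(buf != NULL && job != NULL);

        if (strlen(job) + strlen(suffix[ev]) + 1 > size)
                return -1;
        strcpy(buf, job);
        strcat(buf, suffix[ev]);
        return 0;
}
```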

And as mentioned, any other process on the system may send events through the control socket or just by using initctl trigger EVENT. For now this is just the event string, however it’s intended that the event may include other details including environment variables and even file descriptors.

Typical example

To clarify how it all hangs together, here’s an example (using fictional names) of how the tasks and events can be arranged to provide race-free mounting of filesystems.

  • “udev” service started on the “startup” event.
  • udev daemon is configured to send a “new-block-device” event whenever a new block device is added to the system.
  • “checkfs” task is started on the “new-block-device” event to check the filesystem.
  • “mountfs” task is started when the “checkfs” task has finished and mounts the filesystem if listed in /etc/fstab
  • “filesystem-mounted” event is generated whenever a filesystem is mounted.
  • “fstab” task is started on the “filesystem-mounted” event, it checks the list of mounted filesystems against /etc/fstab and if all are mounted, generates the “writable-filesystem” event.
  • other services and tasks would be started on the “writable-filesystem” event.

By breaking this job into these small tasks, we can see how the pieces fit together. Because everything is now done on events, there are no race conditions; we know that any filesystem listed in /etc/fstab will be checked and mounted.
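The chain can be modelled as a toy event queue. This is a simulation only: the rule table reuses the fictional names from the list, run_events is a made-up function, and each job is assumed to emit its follow-on event immediately, whereas the real events would arrive whenever the hardware and tools got round to it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "When this event arrives, run this job, which then emits that event." */
struct rule {
        const char *on;
        const char *job;
        const char *emits;
};

static const struct rule rules[] = {
        { "startup",            "udev",    "new-block-device"    },
        { "new-block-device",   "checkfs", "checkfs"             },
        { "checkfs",            "mountfs", "filesystem-mounted"  },
        { "filesystem-mounted", "fstab",   "writable-filesystem" },
};

#define MAX_EVENTS 16

/* Feed one event into the queue and let it cascade. The jobs started
 * are recorded in order in `started`; returns how many there were. */
size_t run_events(const char *first, const char **started, size_t max)
{
        const char *queue[MAX_EVENTS];
        size_t      head = 0, tail = 0, n = 0;

        assert(first != NULL && started != NULL);

        queue[tail++] = first;
        while (head < tail) {
                const char *event = queue[head++];

                for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
                        if (strcmp(rules[i].on, event) != 0)
                                continue;
                        if (n < max)
                                started[n++] = rules[i].job;
                        if (tail < MAX_EVENTS)
                                queue[tail++] = rules[i].emits;
                }
        }
        return n;
}
```

Feeding in “startup” runs udev, checkfs, mountfs and fstab in that order, without any of them needing to know about the others, only about the event that triggers them.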

The only reason they wouldn’t be is if there’s an error of some kind, and that means you have larger problems anyway and the system administrator would have a shell to fix it. Of course, the moment they finish checking the filesystem and mount it, the boot process would carry on.

There’s no reason that any of these events need to be generated by the upstart daemon itself, it can receive them from any other daemon on the system such as udev, acpid, etc. This keeps the focus of the init daemon narrow.

A large part of the future development will be working out exactly what kinds of events we want init itself to generate, what kinds we want to come from elsewhere, and what the contents of an event can be.

Getting Involved

If you want to get involved with trying to nudge the direction of upstart development, you can join the upstart-devel mailing list at http://lists.netsplit.com/.

Or if you just want to grab the source code, tarballs are published at http://people.ubuntu.com/~scott/software/upstart/ and the bzr archive is at http://bazaar.launchpad.net/~keybuk/upstart/main

Upstart in Universe

Upstart is a replacement for the init daemon, the process spawned by the kernel that is responsible for starting, supervising and stopping all other processes on the system.

The existing daemon is based on the one found in UNIX System V, and is thus known as sysvinit. It separates jobs into different “run levels” and can either run a job when particular run levels are entered (e.g. /etc/init.d/rc 2) or continually during a particular run level (e.g. /sbin/getty).

The /etc/init.d/rc script is also based on the System V one (and is in the sysv-rc package), it simply executes the stop then start scripts found in /etc/rcN.d (where N is the run level) in numerical order.
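The ordering itself is simple enough to sketch. This assumes the usual link naming, K or S followed by a two-digit number and the script name, so that sorting the names as strings also sorts them numerically; rc_order and the example names are illustrative, not lifted from the sysv-rc package.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Order rcN.d entries: stop (K) links first, then start (S) links,
 * each group in the order given by its two-digit number. */
static int rc_compare(const void *a, const void *b)
{
        const char *x = *(const char *const *)a;
        const char *y = *(const char *const *)b;

        if (x[0] != y[0])
                return x[0] == 'K' ? -1 : 1;
        return strcmp(x, y);
}

void rc_order(const char **names, size_t n)
{
        assert(names != NULL || n == 0);
        qsort(names, n, sizeof *names, rc_compare);
}
```

Given S20ssh, K01gdm, S10sysklogd and K20ssh, the result is K01gdm, K20ssh, S10sysklogd, S20ssh: a fixed sequence, decided entirely by the file names.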

Why change it?

Running a fixed set of scripts, one after the other, in a particular order has served us reasonably well until now. However as Linux has got better and better at dealing with modern computing (arguably Linux’s removable device support is better than Windows’ now) this approach has begun to have problems.

The old approach works as long as you can guarantee when in the boot sequence things are available, so you can place your init script after that point and know that it will work. Typical ordering requirements are:

  • Hard drive devices must have been discovered, initialised and partitions detected before we try and mount from /etc/fstab.
  • Network devices must have been discovered and initialised before we try and start networking.

This worked ten years ago, why doesn’t it work now? The simple answer is that our computer has become far more flexible:

  • Drives can be plugged in and removed at any point, e.g. USB drives.
  • Storage buses allow more than a fixed number of drives, so they must be scanned for; this operation frequently does not block.
  • To reduce power consumption, the drive may not actually be spun up until the bus scan so will not appear for an even longer time.
  • Network devices can be plugged in and removed at any point.
  • Firmware may need to be loaded after the device has been detected, but before it is usable by the system.
  • Mounting a partition in /etc/fstab may require tools in /usr which is a network filesystem that cannot be mounted until after networking has been brought up.

We’ve been able to hack the existing system to make much of this possible, however the result is chock-full of race conditions and bugs. It was time to design a new system that can cope with all of these things without any problems.

What we needed was an init system that could dynamically order the start up sequence based on the configuration and hardware found as it went along.

Design of upstart

upstart is an event-based init daemon; events generated by the system cause jobs to be started and running jobs to be stopped. Events can include things such as:

  • the system has started,
  • the root filesystem is now writable,
  • a block device has been added to the system,
  • a filesystem has been mounted,
  • at a certain time or repeated time period,
  • another job has begun running or has finished,
  • a file on the disk has been modified,
  • there are files in a queue directory,
  • a network device has been detected,
  • the default route has been added or removed.

In fact, any process on the system may send events to the init daemon over its control socket (subject to security restrictions, of course) so there is no limit.

Each job has a life-cycle which is shown in the graph below:

upstart_state.png

The two states shown in red (“waiting” and “running”) are rest states, normally we expect the job to remain in these states until an event comes in, at which point we need to take action to get the job into the next state.

The other states are temporary states; these allow a job to run shell script to prepare for the job itself to be run (“starting”) and clean up afterwards (“stopping”). For services that should be respawned if they terminate before an event that stops them is received, they may run shell script before the process is started again (“respawning”).

Jobs leave a state because the process associated with them terminates (or gets killed) and move to the next appropriate state, following the green arrow if the job is to be started or the red arrow if it is to be stopped. When a script returns a non-zero exit status, or is killed, the job will always be stopped. When the main process terminates and the job should not be respawned, the job will also always be stopped.

As already covered, events generated by the init daemon or received from other processes cause jobs to be started or stopped; also manual requests to start or stop a job may be received.

The communication between the init daemon and other processes is bi-directional, so the status of jobs may be queried and even changes of state to all jobs be received.

How does it differ from launchd?

launchd is the replacement init system used in MacOS X developed as an “Open Source” project by Apple. For much of its life so far, the licence has actually been entirely non-free and thus it has only become recently interesting with the licence change.

Much of the goal of both systems appears initially to be the same; they both start jobs based on system events, however the launchd system severely limits the events to only the following:

  • system startup,
  • file modified or placed in queue directory,
  • particular time (cron replacement),
  • connection on a particular port (inetd replacement).

Therefore it does not actually allow us to directly solve the problems we currently have; we couldn’t mount filesystems once the “filesystem checked” event has been received, we couldn’t check filesystems when the block device is added and we certainly couldn’t start daemons once the complete filesystem (as described by /etc/fstab) is available and writable.

The launchd model expects the job to “sit and wait” if it is unable to start, rather than provide a mechanism for the job to only be started when it doesn’t need to wait. Jobs that need /usr to be mounted would need to spin in a loop waiting for /usr to be available before continuing (or use a file in a tmpfs to indicate it’s available, and use that modification as the event).
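In practice, “sit and wait” tends to mean a polling loop in every such job, along the lines of this sketch. The function name wait_for_path and its parameters are invented here, and a real job would have to test for something that only exists once /usr is actually mounted, since the bare mount point is there either way.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Poll for `path` to appear, trying `attempts` times with `delay_ms`
 * milliseconds between tries. Returns 0 once it exists, -1 otherwise. */
int wait_for_path(const char *path, int attempts, long delay_ms)
{
        struct stat     st;
        struct timespec delay;

        assert(path != NULL && attempts > 0 && delay_ms >= 0);

        delay.tv_sec  = delay_ms / 1000;
        delay.tv_nsec = (delay_ms % 1000) * 1000000L;

        while (attempts-- > 0) {
                if (stat(path, &st) == 0)
                        return 0;
                if (attempts > 0)
                        nanosleep(&delay, NULL);
        }
        return -1;
}
```

Every job ends up choosing its own path, timeout and retry interval, which is exactly the guesswork an event saying “/usr is mounted” would remove.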

This is not especially surprising given that Apple have a high degree of control over both their hardware and the actual underlying operating system; they don’t need to deal with the wide array of different configurations that we have in the Linux world.

Had the licence been sufficiently free at the point we began development of our own system, we would probably have extended launchd rather than implement our own. At the point Apple changed the licence, our own system was already more suitable for our purposes.

How does it differ from initng?

Initng by Jimmy Wennlund is another replacement init daemon intended to replace the sysvinit system used by Linux. It is a dependency-based system, where upstart is an event-based system.

The notion of a dependency-based system is interesting to talk about at this point. Jobs declare dependencies on other jobs that need to happen before the job itself can be started. Starting the job causes its dependencies to be started first, and their dependencies, and so on. When jobs are stopped, any running jobs that nothing depends on any longer can themselves be stopped.
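The start-up half of that is a depth-first walk. The sketch below is a generic illustration rather than how initng does it: struct dep_job and dep_start are invented names, dependencies are given as array indices, and the caller is assumed to supply an output array big enough for every job.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 8

struct dep_job {
        const char *name;
        int         deps[MAX_DEPS];     /* indices into the job table, -1 ends */
        int         running;
};

/* Start job `idx`, starting its dependencies (and theirs) first. Names
 * are appended to `order` as each job starts; returns the new count. */
size_t dep_start(struct dep_job *jobs, int idx, const char **order, size_t n)
{
        assert(jobs != NULL && order != NULL);

        if (jobs[idx].running)
                return n;
        /* Mark before recursing so a dependency cycle cannot loop forever. */
        jobs[idx].running = 1;

        for (int i = 0; i < MAX_DEPS && jobs[idx].deps[i] >= 0; i++)
                n = dep_start(jobs, jobs[idx].deps[i], order, n);

        order[n++] = jobs[idx].name;
        return n;
}
```

Asking for a job that depends on HAL, which in turn depends on D-Bus, starts D-Bus, then HAL, then the job itself. Note that nothing starts at all until somebody asks for a goal.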

It’s a neat solution to the problem of ordering a fixed boot sequence and the problem of keeping the number of running processes to the minimum needed.

However this means that you need to have goals in mind when you boot the system, you need to have decided that you want gdm to be started in order for it, and its dependencies, to be started. Initng uses run levels to ensure this happens, where a run level is a list of goal jobs that should be running in that run level.

It’s also not clear how the dependencies interact with the different types of job, a dependency on Apache would need the daemon to be running where a dependency on “checkroot” would need the script to have finished running. Upstart handles this by using different events (“apache running” vs. “checkroot stopping”).

Again while interesting, Initng does not solve the problems that we wanted to solve. It can reorder a fixed set of jobs, but cannot dynamically determine the set of jobs needed for that particular boot.

A typical example would be that if the only dependency on the job that configures networking is the mount network filesystems job, then should that job fail or not be a goal (e.g. because there are no network filesystems to be mounted) the result is that network devices themselves will not be configured. You could make everything a goal, and just use the dependencies to determine the order, however this is less efficient than just ordering the existing sysv-rc scripts (which can be done at install time).

Another example is that often you simply don’t know whether something is a dependency or not without reading other configuration, for example the mount network filesystems job may be a dependency of everything under /usr or may just be a dependency of anything allowing the user to login if it just mounts /home.

The difference in model can be summed up as “initng starts with a list of goals and works out how to get there, upstart starts with nothing and finds out where it gets to.”

How does it differ from Solaris SMF?

SMF is another approach to replacing init developed by Sun for the Solaris operating system. Like initng it’s a dependency-based system, so see above for the differences between those systems and upstart.

SMF’s main focus is service management; making sure that once services are running, they stay running, and allowing the system administrator to query and modify the states of jobs on the system.

Upstart provides the same set of functionality in this regard, services are respawned when they fail and system administrators can at any time query the state of running services and adjust the state to their liking.

Will it replace cron, inetd, etc?

The goal of upstart is to replace those daemons, so that there is only one place (/etc/event.d) where system administrators need to configure when and how jobs should be run.

In fact, the goal is that upstart should also replace the “run event scripts” functionality of any daemon on the system. Daemons such as acpid, apmd and Network Manager would send events to init instead of running scripts themselves with their own peculiar configuration and semantics.

A system administrator who only wanted a particular daemon to be run while the computer was on AC power would simply need to edit /etc/event.d/daemon and change “on startup” to “on ac power”.

What about compatibility?

There are a lot of system administrators out there who have learned how Linux works already and will not want to learn again immediately, there’s also a large number of books that cover the existing software and won’t cover upstart for at least a couple of years.

For this reason, compatibility is very important. upstart will continue to run the existing scripts for the foreseeable future so that packages will not need to be updated until the author wants.

Compatibility command-line tools that behave like their existing equivalents will also be implemented, a system administrator would never need to know that crontab -e is actually changing upstart jobs.

Does it use D-BUS?

“To D-BUS people, every problem seems like a D-BUS problem.”
– Erik Troan

The UNIX philosophy is that something should do just one job, and do it very well. upstart’s one job is starting, supervising and stopping other jobs; D-BUS’s one job is passing messages between other jobs.

D-BUS does provide a mechanism for services to be activated when the first message is sent to them, thereby starting other jobs. Some people have taken this idea and extended it to suggest that all a replacement init system need do is register jobs with D-BUS and turn booting into a simple matter of message parsing.

This seems wrong to me, D-BUS would need to be extended to supervise these services, provide means for them to be restarted and stopped; as well as deal with being process #1 which means cleaning up after children whose parents have died, etc. It seems far simpler to arrange for D-BUS to send an event to init when it needs a service to be started, and focus on being a very good message passing system.

The IPC mechanism used by upstart is not currently D-BUS because of various problems, however it’s always been expected that even if init itself doesn’t communicate with D-BUS directly, there would be a D-BUS proxy that would ensure messages about all init jobs and events would be given to D-BUS and D-BUS clients could send messages to init to query and change the state of jobs.

What is the implementation plan?

Because this is process #1 we are changing, we want to make sure that we get it right. Therefore instead of releasing a fully-featured daemon and configuration to the world, we’re developing it in the following stages:

  1. Principal development; at the end of this stage the daemon has been implemented and can manage jobs as described.
  2. Replacement of /sbin/init while running the existing sysv-rc scripts. This is the shake-down test of the daemon, can it perform the same job as the existing sysvinit daemon without any regressions?
  3. /etc/rcS.d scripts replaced by upstart jobs. These constitute the majority of tasks for booting the system into at least single-user mode, and contain many of the current ordering problems and race conditions. If the daemon solves the problems here, it will be a success.
  4. Other daemon’s scripts replaced by upstart jobs on a package-by-package basis; this will be an ongoing effort during which upstart will continue running the existing sysv-rc scripts as well as its own jobs. During this time the event system may be tweaked to ensure it truly solves the problems we need.
  5. Replacement of cron, atd, anacron and inetd. This will happen alongside the above and result in a single place to configure system jobs.
  6. Modification of other daemons and processes to send events to init instead of trying to run things themselves.

The current plan is that we will be at least part of the way into stage #3 by the time edgy is released, with that release shipping with upstart as the init daemon and the most critical rcS scripts being run by it to correct the major problems.

For edgy+1 we hope to have completed stage #5 and be at least part of the way into the implementation of stage #6. From the start of development of edgy+2, no new packages will be accepted unless they provide upstart jobs instead of init scripts and init scripts will be considered deprecated.

What state is it in now?

The init daemon has been written and is able to manage jobs as described above, receiving events on the control socket to start and stop them. This has now been uploaded to the Ubuntu universe component in the upstart package for testing before it becomes the init daemon.

We welcome any experienced users who want to help test this; install the package and follow the instructions in /usr/share/doc/upstart/README.Debian to add a boot option that will use upstart instead of init. If your system boots and shuts down normally (other than a slightly more verbose boot without usplash running) then it is working correctly.

Other types of events will be added as required during development and testing. Currently only a basic client tool (initctl) has been written, compatibility tools such as shutdown will be written over the next week or two before it replaces our sysvinit package.