Upstart 0.5: Relationships

Even the relatively simple System V rc scripts recognise that there are relationships between services, and that in many cases one or more others must be started before a particular service can itself be started: they allow such relationships to be expressed by using a directory of numbered scripts that are run in series by the sysv rc script.
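As a rough illustration of the numbered-directory approach, here is a minimal sketch of an rc-style runner. The directory and the S20/S24/S26 numbers are made up for the example, and real rc scripts do rather more than this.

```shell
#!/bin/sh
# Sketch of a SysV-style rc runner. The script names and their numbers
# (S20dbus, S24hal, S26network-manager) are illustrative assumptions.

# rc_start DIR: run every S* script in DIR with the "start" argument.
# The shell expands the glob in sorted order, so the two-digit number in
# each name decides the sequence; each script finishes before the next runs.
rc_start() {
    for script in "$1"/S*; do
        if [ -f "$script" ]; then
            sh "$script" start
        fi
    done
}

# Demo: build a throwaway directory of numbered scripts and run it.
demo=$(mktemp -d)
for name in S26network-manager S20dbus S24hal; do
    printf '#!/bin/sh\necho "${0##*/} $1"\n' > "$demo/$name"
done
rc_start "$demo"
rm -rf "$demo"
```

The relationship between the services is only implied by the numbers: nothing in S24hal says that it needs D-Bus, it simply sorts after S20dbus.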

Tackling this problem in some way is arguably one of the main reasons that each of the alternate init daemons exists. Even launchd acknowledges the problem, even if its solution is to tell service developers that they should spin or sleep while dependencies aren’t available.

The Competition

The way in which the other leading init replacements tackle the relationship problem is through dependencies. This is not that surprising, since the concept is shared (and effectively mirrored) by both the dynamic link loader and the package manager; both things that a service maintainer knows well.

To illustrate how dependencies work, since I use that term precisely to mean only this behaviour, we’ll use one of the chains of the well known Network Manager service.

  • Network Manager depends on HAL
  • HAL depends on D-Bus

When A depends on B, B is required for A to function properly. Any attempt to start A must first start B.

This works well for the link loader, when we load an executable we also need to load and map the shared objects it links to.

It also works well for the package manager, when we install Network Manager it means we also need to install HAL and D-Bus for it to function.

However for an init daemon, it’s not normally ideal: the only reason that D-Bus and HAL will be running is because Network Manager depends on them. If we were to stop Network Manager, we would also stop HAL and D-Bus.

This obviously isn’t what we want, HAL and D-Bus are both essential services in their own right. Thus we end up with a target or goal set of services that must be started anyway, within this group the dependency relationships are only effective for ordering of them. Ironically, it is very rare indeed for a service to not be a target and so all of the complex ability of the dependency-based daemon is lost; the only reason to generate the dependency tree at runtime at all is to allow for parallel starts.
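To make the behaviour concrete, here is a toy model of a dependency-based daemon using the chain above. The function names are invented for the sketch, and it assumes (as the text does) that Network Manager is the only reason HAL and D-Bus are running.

```shell
#!/bin/sh
# Toy model of dependency-based starting and stopping. Not taken from any
# real init daemon; deps_of, dep_start and dep_stop are hypothetical names.

# deps_of JOB: print what JOB depends on (the chain from the text).
deps_of() {
    case "$1" in
        network-manager) echo hal ;;
        hal)             echo dbus ;;
    esac
}

# dep_start JOB: any attempt to start a job must first start what it needs.
dep_start() {
    for dep in $(deps_of "$1"); do
        dep_start "$dep"
    done
    echo "start $1"
}

# dep_stop JOB: once JOB is gone nothing requires its dependencies any
# more (assuming JOB was their only user), so they are stopped as well.
dep_stop() {
    echo "stop $1"
    for dep in $(deps_of "$1"); do
        dep_stop "$dep"
    done
}

dep_start network-manager
dep_stop network-manager
```

Stopping Network Manager in this model takes HAL and D-Bus down with it, which is exactly the behaviour that forces real systems to declare almost everything a target.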

Upside Down Dependencies

Thus one of the first things that service maintainers have to get used to about Upstart is that its service relationships are upside down from the way that they might expect. Upstart assumes that if a service is installed, not disabled, and the required services, tasks or hardware are available then the service should be running.

In the dependency-based model, starting Network Manager would first start HAL which would first start D-Bus.

In the Upstart (event-based) model, D-Bus is started fulfilling HAL’s requirements so HAL is started, fulfilling Network Manager’s requirements (once a network card is available?) so Network Manager is then started.

Upstart has no notion of targets or goals, it simply ensures that all services that can and should be running are; and ensures that services are stopped when it is no longer the right time for them to be running.

Relationships through Events

The way in which relationships between services are defined is by having services react to each other’s events. To continue with our example, HAL would therefore have the following in its job definition:


start on started dbus
stop on stopping dbus

The first line means that when the dbus service is fully up and running (recall from previous posts that this event can be delayed as necessary), HAL will itself be started.

The second line is a little more interesting. Events in Upstart will block until the jobs they affect complete, and the stopping event is emitted before the dbus job is actually stopped and blocks it from doing so. Put more simply, HAL will be fully stopped before D-Bus is stopped.

Thus we have the simplest kind of Upstart relationship. Starting D-Bus will start HAL immediately afterwards, and stopping D-Bus will stop HAL first.

The portmap problem

Most maintainers at this point will be feeling quite smug and about to hit the comments button because they’ve thought of an example service that actually is a dependency, and should not be running if nothing needs it.

Remember that I said they were rare, not non-existent.

One such example is portmap, another is often something like tomcat. There are a few, but they’re certainly not the common case.

Happily one of the elegant things about Upstart’s design is that it does still support this model where it’s needed. In order for portmap to be started when we start an nfs-server, we simply write the following in portmap’s job definition:


start on starting nfs-server
stop on stopped nfs-server

Compare to the example for D-Bus/HAL and you’ll notice that it’s the events that have changed.

Remember that the starting event, like the stopping event we used in the previous example, blocks the job until jobs affected by the event are completed. Thus this first line means that when we start nfs-server, it will not be started until portmap is started.

And the second line is pretty much the mirror of the first in the previous example, once the nfs-server is stopped, we stop portmap as well since it’s no longer needed.
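The ordering that falls out of the two pairs of rules can be sketched with a small simulation. This is only a model of the blocking behaviour described here, not how Upstart is implemented; the function names are invented, and the rule tables simply hard-code the HAL and portmap examples.

```shell
#!/bin/sh
# Simulation of the four job events and the rules from the two examples.
# starts_on/stops_on/emit/start_job/stop_job are hypothetical names.

# starts_on EVENT JOB: print the jobs with a matching "start on" rule.
starts_on() {
    case "$1 $2" in
        "started dbus")        echo hal ;;      # hal: start on started dbus
        "starting nfs-server") echo portmap ;;  # portmap: start on starting nfs-server
    esac
}

# stops_on EVENT JOB: print the jobs with a matching "stop on" rule.
stops_on() {
    case "$1 $2" in
        "stopping dbus")      echo hal ;;       # hal: stop on stopping dbus
        "stopped nfs-server") echo portmap ;;   # portmap: stop on stopped nfs-server
    esac
}

# emit EVENT JOB: the event does not complete until every job it affects
# has finished starting or stopping.
emit() {
    for job in $(starts_on "$1" "$2"); do start_job "$job"; done
    for job in $(stops_on "$1" "$2"); do stop_job "$job"; done
}

start_job() {
    emit starting "$1"   # blocks the job until reacting jobs are up
    echo "$1 running"
    emit started "$1"
}

stop_job() {
    emit stopping "$1"   # blocks the job until reacting jobs are down
    echo "$1 stopped"
    emit stopped "$1"
}

start_job nfs-server
stop_job nfs-server
start_job dbus
stop_job dbus
```

Running it shows portmap coming up before nfs-server and going down after it, while HAL comes up after D-Bus and goes down before it: the same two stanzas, with only the events changed.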

It may seem a little odd that the rules go in portmap, and not nfs-server, but it makes logical sense. It means that for an admin to work out why portmap is getting started, they just need to read the portmap definition and not hunt around the system to see what else might be doing it.

Also in many of the cases, such requirements are actually conditional. Apache doesn’t need to require tomcat, it’s only a requirement if it’s installed. Thus it makes more sense for tomcat to add itself to Apache’s environment rather than Apache to look for tomcat.

Upstart 0.5: Events

In the previous posts, I’ve covered the various features that make Upstart a good service manager, but these are things you’ll find in most others as well. It’s now time to cover that which is singularly unique to Upstart, Events.

Start and Stop

You’ve already seen the start and stop commands, which do somewhat unsurprising things to jobs. The important thing to remember about these is that they are not events. I just wanted to clear that up before we start, since it’s often been a source of confusion not helped by the design of some earlier versions of Upstart.

start and stop operate directly on jobs, and the command will not normally return until the operation is complete or otherwise interrupted. Services are considered complete when they are running, Tasks are considered complete when they have stopped again; in both cases the stop command is complete when the service or task has actually stopped.

This is important since it provides a common-sense behaviour, ensuring that the following operation is not a race condition:


# start apache
apache running (start), process 3591
# wget http://localhost/

Solving race conditions is one key part of Upstart’s purpose.

Both commands may also set environment variables, those set by the start command form part of the environment of the job itself and those set by the stop command are available to the pre-stop script.


# cat /etc/init/jobs.d/getty
instance $TTY
env SPEED=38400
exec /sbin/getty $SPEED $TTY

# start getty TTY=tty1
getty (tty1) running (start), process 4152

Events

As described above, the start and stop commands are admin instructions that act directly on named jobs. Events have many similar properties: they carry environment variables that end up in the environment of jobs they start, and they are not complete until the jobs that they affected have been started or stopped as appropriate.

The difference is that the start and stop commands are targeted at specific jobs, whereas events have no such targeting and instead it is jobs that specify which events they are interested in.

In the Upstart world events serve three general purposes: they act as signals of state changes that jobs can react to (e.g. hardware going away), as method calls to automatically start or stop jobs (e.g. shutdown) and as a way of passing information between jobs.

Events are identified by their name and have a different namespace to that of jobs. They are emitted by a D-Bus call or by using emit on the command-line, naming the event and providing any associated environment variables you wish:


# emit interface-up IFACE=eth0 ADDRFAM=Ethernet ADDRESS=01:23:45:67:89:0a

Jobs may match them on this name and any number of their environment variables, specifying whether the event would automatically start or stop the Job.


start on interface-up IFACE=eth* ADDRFAM=Ethernet

As a short-hand, where the order of the variables for an event is fixed, the names may be omitted:


start on interface-up wlan*
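The named form of matching can be approximated in a few lines of shell, since the eth* and wlan* patterns look like ordinary shell wildcards. This is a sketch of the idea rather than Upstart's matcher: event_matches is a made-up helper, and it only handles NAME=PATTERN terms, not the positional short-hand.

```shell
#!/bin/sh
# event_matches EVENT_ENV RULE...
# EVENT_ENV is the event's environment as one space-separated string;
# each RULE is a NAME=PATTERN term from a "start on" or "stop on" line.
# Succeeds only if every rule matches. (Hypothetical helper.)
event_matches() {
    event_env=$1
    shift
    for rule in "$@"; do
        name=${rule%%=*}
        pattern=${rule#*=}
        matched=no
        # Split the environment on whitespace; the values used here
        # contain no spaces or wildcard characters.
        for pair in $event_env; do
            if [ "${pair%%=*}" = "$name" ]; then
                # An unquoted $pattern in a case label is treated as a
                # shell wildcard pattern.
                case "${pair#*=}" in
                    $pattern) matched=yes ;;
                esac
            fi
        done
        if [ "$matched" != yes ]; then
            return 1
        fi
    done
    return 0
}

event="IFACE=eth0 ADDRFAM=Ethernet ADDRESS=01:23:45:67:89:0a"
if event_matches "$event" 'IFACE=eth*' 'ADDRFAM=Ethernet'; then
    echo "interface-up matches: job would be started"
fi
```

Variables the rule does not mention (ADDRESS here) are simply ignored, so a job can match on as few or as many of them as it cares about.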

When a job is started by an event, the environment for that event forms part of the environment for the job and may be used when matching events that can automatically stop the job. Harking back to our getty job from previous posts, we can bind this to the lifetime of the underlying device.


start on tty-added
stop on tty-removed TTY=$TTY

instance $TTY
exec /sbin/getty 38400 $TTY

We can also match multiple events, either requiring that both occur or either using unsurprising operators:


start on a-up and b-up
stop on a-down or b-down

In these situations, once stopped, both the a-up and b-up events must happen again for the job to be restarted.
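A small state machine shows what that means in practice. It is only a sketch of the behaviour described above, with invented variable and function names, and it hard-codes the a/b events from the example.

```shell
#!/bin/sh
# Model of "start on a-up and b-up" / "stop on a-down or b-down".
# seen_a, seen_b, running and handle_event are hypothetical names.
seen_a=no
seen_b=no
running=no

handle_event() {
    case "$1" in
        a-up) seen_a=yes ;;
        b-up) seen_b=yes ;;
        a-down|b-down)
            # Either event stops the job, and both halves of the
            # start condition have to be seen again afterwards.
            running=no
            seen_a=no
            seen_b=no
            ;;
    esac
    if [ "$seen_a" = yes ] && [ "$seen_b" = yes ]; then
        running=yes
    fi
    echo "$1 -> running=$running"
}

for event in a-up b-up a-down a-up b-up; do
    handle_event "$event"
done
```

After a-down the job stays stopped when a-up arrives on its own; it only comes back once b-up has been seen again too.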

Upstart Events

Upstart itself only emits a few events, leaving the rest up to application authors to define. The startup event is the most interesting of these, and is ultimately what nearly all jobs get chained from.

Job Events

As jobs are started and stopped, Upstart emits events on their behalf for four key points in their lifecycle.

  • starting is emitted when the job is first starting, and the job will not actually be started until this event completes.
  • started is emitted once the job is fully running.
  • stopping is emitted when the job is stopping (after the pre-stop has completed), the job will not actually be stopped until this event completes.
  • stopped is emitted once the job is fully stopped.

All of the events have the name of the job in the first variable, JOB and the instance of the job (if applicable) in the second variable, INSTANCE. The stopping and stopped events then have a series of variables indicating the reason for the job stopping: RESULT indicates whether it was a normal stop or a failure then if it failed, PROCESS will say what failed and EXIT_SIGNAL or EXIT_STATUS will contain the terminating signal or exit code.

For example, we can take action to backup a database if the server crashes:


start on stopping hersql RESULT=failed EXIT_SIGNAL=SEGV
task
exec hersql-backup

Jobs can also export variables from their own environment to others through these events by using the export stanza:


start on interface-up
stop on interface-down $IFACE

instance $IFACE
export IFACE
exec ...

Another job may then be started along with this one, and know what interface it’s bound to:


start on started JOBNAME
stop on stopping JOBNAME

instance $IFACE

We’ll look at the various powerful forms of dependency that these events allow us to express in the next post.

Upstart 0.5: Job Lifetime

Continuing the series of posts on Upstart 0.5, in this post I’ll be talking about the various ways that Upstart allows you to manage the lifetime of a job. These are guarantees that Upstart provides you so that when you start a job, you know what will happen if that job dies unexpectedly or someone else tries to start the job as well.

Respawning

We’ve all encountered those daemons that mysteriously die: sometimes they’re taken out by the OOM killer, and sometimes they’re just buggy and crash from time to time. And there’s also those processes that exit when they’re done, and need to be restarted (e.g. getty).

For all of these, Upstart provides the facility to respawn the job; effectively an automatic restart in the case of failure. Respawning is controlled by three things:

  • Whether or not to respawn
  • Whether or not the job exited “normally”
  • Whether it has been respawned too many times recently

Let’s take the sobby server as an example: here’s a job that tends to crash every now and then, and we’d like to keep it running. However, we’re also aware that every now and then, it crashes hard and needs repairing; so we limit its respawning to 10 times in 5 seconds (which happens to be the default).


  exec /usr/bin/sobby --autosave-file=/var/lib/sobby/autosave /var/lib/sobby/autosave

  respawn
  respawn limit 10 5

The daemon will be continually respawned until either the limit is reached, or the service is explicitly stopped by request. This isn’t ideal though, sobby has an exit command which we wish to honour; the daemon is well written enough that it only returns the zero exit code if this command has been run, and otherwise always returns a failure or signal of some description.

In addition, we know that the ABRT signal is raised on the daemon when the session file is corrupted (I’m making this up, btw), so we want to stop respawning in that case.

To accomplish this, we simply state which exit codes and signals are considered a normal exit condition:


  exec /usr/bin/sobby --autosave-file=/var/lib/sobby/autosave /var/lib/sobby/autosave

  respawn
  respawn limit 10 5

  normal exit 0 ABRT

Tasks can be respawned too; the only difference is that zero is always considered a normal exit condition for a task:


  task
  exec /usr/sbin/some-check $DEVICE

  respawn

This task will be continually run until it ends with a zero (success) exit code. We could add additional normal exit conditions as well, just as we can with a service.
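The two checks behind respawning, whether the exit was "normal" and whether the limit has been hit, can be sketched in shell. This is a guess at the logic rather than Upstart's code: the helper names are invented, and reading "10 times in 5 seconds" as "the 11th respawn inside one 5 second window is refused" is an assumption.

```shell
#!/bin/sh
# is_normal_exit RESULT NORMAL...
# RESULT is an exit code ("0", "1") or a signal name ("ABRT"); NORMAL...
# is the list given to a "normal exit" stanza. (Hypothetical helper.)
is_normal_exit() {
    result=$1
    shift
    for normal in "$@"; do
        if [ "$result" = "$normal" ]; then
            return 0
        fi
    done
    return 1
}

# within_limit LIMIT INTERVAL TIME...
# TIME... are the times, in whole seconds, of successive respawns.
# Fails as soon as more than LIMIT of them fall inside one window of
# INTERVAL seconds. (Hypothetical helper, assumed semantics.)
within_limit() {
    limit=$1
    interval=$2
    shift 2
    window_start=
    count=0
    for t in "$@"; do
        if [ -n "$window_start" ] && [ $(( $t - $window_start )) -lt "$interval" ]; then
            count=$(( $count + 1 ))
            if [ "$count" -gt "$limit" ]; then
                return 1
            fi
        else
            # Outside the current window: start counting afresh.
            window_start=$t
            count=1
        fi
    done
    return 0
}

# The sobby job: "normal exit 0 ABRT" and "respawn limit 10 5".
if is_normal_exit ABRT 0 ABRT; then
    echo "ABRT: normal exit, leave stopped"
fi
if ! is_normal_exit SEGV 0 ABRT; then
    echo "SEGV: failure, respawn"
fi
if ! within_limit 10 5 0 0 0 0 1 1 1 2 2 3 4; then
    echo "11 respawns within 5 seconds: give up"
fi
```

For a task the same check applies, with 0 always included in the list of normal results.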

Singletons

All Upstart jobs are singletons by default, this means that only one instance of that job may be running at any one time. To illustrate, let’s continue using the sobby job we defined above and start it:


  # start sobby
  sobby running (start), process 14977

Ok, we have a single instance of the sobby job running, and we can interrogate the status of that:


  # status sobby
  sobby running (start), process 14977

Now what happens if we (or someone else) tries to start another copy:


  # start sobby
  start: cannot start 'sobby': Already running
  zsh: exit 1   start sobby

This is the most sensible and sane default, it saves you having to worry about locking between services and most importantly means that you can treat failures to obtain resources as true errors.

For example, if you request a D-Bus name and don’t get it, or attempt to bind to a socket and fail, you can treat that as an error since you know the service manager is already ensuring you’re a singleton. This means that you won’t silently pretend everything’s ok, and thus won’t hide problems.

Instance jobs

But what if you do want to be able to run multiple copies of the job? Upstart supports this through instance jobs, which may have multiple copies running. As well as being identified by the shared job name, each instance is also identified by a second-level instance name.

The instance name for each instance of a job must be unique within that job. Attempting to start another instance with an already used name will return an already running error again.

Thus the usual method for defining an instance name is by using variables from the job environment, which you’ll recall come from sources including the start request.

Let’s use the getty job we defined in the last post and turn that into an instance job:


  instance $TTY
  exec /sbin/getty 38400 $TTY

The instance keyword is the new addition, this defines the name for each instance of the job. Setting it to an ordinary string wouldn’t be much help, since there could only be one unique expansion, and you’d be back to a singleton job again; so we define it using variables from the job’s environment which will be expanded.

In this case, we can have an instance of the job for each unique value of the $TTY variable. This makes sense since this is also what we pass to getty. This means that Upstart is still able to provide the guarantee that another getty won’t be running with the same tty.

All that we need do is pass the value of the TTY environment variable when we start or stop the getty job:


  # start getty TTY=tty1
  getty (tty1) running (start), process 15001
  # start getty TTY=tty2
  getty (tty2) running (start), process 15006

And if we try and run another copy with the same TTY variable, we’ll still get already running:


  # start getty TTY=tty1
  start: cannot start 'getty': Already running
  zsh: exit 1   start getty TTY=tty1

There’s no builtin way to allow unlimited instances, since these would tend to eventually consume all available resources. Since any service or task needs to operate on something, or even just write something, then you’ll need some kind of locking and something in the job environment to tell it what to work on or write. If someone manages to come up with a truly unlimited instance job, you could do it trivially by passing a UUID=$(uuidgen) variable and instancing on that.
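The uniqueness rule for instance names is easy to model. The sketch below keeps a list of running job/instance pairs in a shell variable; the names are invented and the messages only imitate the output shown above.

```shell
#!/bin/sh
# Registry of running instances, as a space-separated list of
# "job/instance" keys. (running and start_job are hypothetical names.)
running=""

# start_job JOB INSTANCE: refuse to start a second copy with the same
# instance name; a singleton is just a job whose instance name is empty.
start_job() {
    key="$1/$2"
    case " $running " in
        *" $key "*)
            echo "start: cannot start '$1': Already running"
            return 1
            ;;
    esac
    running="$running $key"
    echo "$1 ($2) running"
}

start_job getty tty1
start_job getty tty2
start_job getty tty1 || true
```

Because the check lives in the service manager, the job itself never has to worry about a second getty appearing on the same tty.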

In the next post, I’ll cover one of the major differences between Upstart and other service managers: events!

Upstart 0.5: Job Environment

In my previous post on Upstart 0.5, I talked about the ways you can define a service for Upstart to manage and introduced the different processes in a job’s lifecycle. In this post, I’ll look into the detail of those processes and their environment.

Upstart ensures that each process it runs has a sane, safe and predictable environment. By default each process is run in a new process group and session, but not as a leader of that process group or session (otherwise the process would have to be careful on all open() calls to make sure it didn’t suddenly own any ttys it opened); the standard input, output and error file descriptors are bound to /dev/null; the PATH environment variable is set to a sensible default, and the TERM variable inherited from the kernel, otherwise no other variables are set; and all resource limits and the like are inherited from init itself.

There are, of course, many ways to customise this environment from the job definition:

  • Jobs may run as a process group and session leader (normally getty likes this).
  • Jobs may have standard file descriptors sent to /dev/console and may be the owner of /dev/console (so they receive Ctrl-C).
  • Jobs may specify custom resource limits, umask, “nice” level, working directory and chroot directory.

Environment Variables

To say that jobs only have the PATH and TERM environment variables set is quite a fallacy, these are just the two variables that all jobs always have set. In fact, the additional environment variables for a job are very important to Upstart since they are the primary method of communicating with that job how it should behave.

To illustrate this, take an instance of the getty service; it needs to know which tty it should use. We could invent some kind of common configuration or parameter database (or D-Bus service) for this kind of thing, with the job being able to run commands to interrogate it, etc. but that’s entirely unnecessary. UNIX already gives us the functionality we need in environment variables, which you’ve probably noticed your shell documentation calls parameters anyway.

In our getty example, we would store the tty in the TTY environment variable, and then the job definition is nice and simple to understand:


exec /sbin/getty 38400 $TTY

So environment variables can be set from a number of sources: the built-in PATH and TERM variables will always be set; others can be set from the job definition (which can specify to inherit the value from init’s environment); and finally environment can come from the start request for the job. I’ll explain more on the latter in later posts, but for now, it suffices to demonstrate that we’d start our getty example with:


# start getty TTY=tty1
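You can get a feel for this layering from an ordinary shell with env -i, which starts from an empty environment. This is only an approximation: the PATH and TERM values below are placeholders rather than Upstart's real defaults, run_with_job_env is a made-up name, and the command is echoed instead of actually running getty.

```shell
#!/bin/sh
# run_with_job_env [NAME=VALUE...] COMMAND [ARG...]
# Start from nothing, set the two built-in variables, then add whatever
# the "start request" supplied. Standard input comes from /dev/null, as
# it would for a job. PATH and TERM values are assumed placeholders.
run_with_job_env() {
    env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin TERM=linux "$@" </dev/null
}

# Equivalent in spirit to "start getty TTY=tty1".
run_with_job_env TTY=tty1 sh -c 'echo /sbin/getty 38400 $TTY'

# With no extra variables, only PATH and TERM are visible to the process.
run_with_job_env env
```

Nothing from the calling shell leaks through, which is the point: the job sees the same environment no matter who started it.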

So Upstart allows you to define the job’s true life cycle, including any setup and cleanup it needs to perform before and after the daemon is running; and it allows you to define the environment that daemon runs in, so you don’t have to worry about unexpected situations. In the next post, I’ll talk about how you can manage the lifetime of a job, looking at things such as singletons and respawning.

Upstart 0.5: Job Lifecycle

Next month I am hoping to release Upstart 0.5.0, the culmination of almost a year’s worth of work on it. Comparatively, the version that shipped in edgy (0.2.x) was simply an essay to figure out the basics and the version in feisty thru hardy (0.3.x) a first draft. The new version has been stripped back to the very basics and rebuilt to correct the problems we found with the earlier versions, and to make sure it can handle real world uses as simply and elegantly as possible.

Over the next few weeks, I’ll be writing about the new version; both how it has improved from previous versions and how it compares to what else is out there.

Introduction

First we’ll look at how Upstart allows you to manage the lifecycle of services and tasks (collectively jobs) that you wish to manage. We’ll use the D-Bus daemon as an example service, simply because it’s a modern, well-behaved service that we’re all familiar with.

With SystemV RC, we would have had a single /etc/init.d/dbus file accepting both start and stop as arguments. It may have looked something like this:


case "$1" in
    start)
        start-stop-daemon --start --pidfile /var/run/dbus.pid /usr/sbin/dbus-daemon
        ;;
    stop)
        start-stop-daemon --stop --pidfile /var/run/dbus.pid
        ;;
esac

As you’re well aware, the simple act of starting a daemon and stopping again is not so simple this way. You nearly always end up requiring some kind of helper like start-stop-daemon to help out, and rely on accurate PID files and the like.

Upstart, like just about every other modern service manager (but strangely, not SMF), takes care of all of this hard work for you. Instead of defining how to start and stop a service you just define what to start. Here’s how you’d define the same service in Upstart:


exec /usr/sbin/dbus-daemon

Setup and teardown

Of course, we all know that no service definition is ever that simple. I massively simplified the SystemV example for the purposes of documentation. In reality, we frequently need to do various things to set up the system for the daemon and clean up again afterwards. The original start shell code probably looks more like this (and even now, I’m simplifying for space):


mkdir /var/run/dbus
chown messagebus.messagebus /var/run/dbus

/usr/bin/dbus-uuidgen --ensure

start-stop-daemon --start --pidfile /var/run/dbus.pid /usr/sbin/dbus-daemon

We need a directory for socket files, etc. and to create the machine id if missing. And likewise to shut it down, we need to clean up:


start-stop-daemon --stop --pidfile /var/run/dbus.pid

rm -rf /var/run/dbus

And this is where most init replacements fall down (especially launchd). In fact, ironically, you’ll often find the developers using their minimal service definitions when they talk about how fast their system can boot. You can boot really fast if you don’t start anything properly.

Obviously I wouldn’t be pointing this out if Upstart didn’t allow you to do this properly; we’ll extend our minimal service definition to include the set up and tear down code necessary.


pre-start script
    mkdir /var/run/dbus
    chown messagebus.messagebus /var/run/dbus

    /usr/bin/dbus-uuidgen --ensure
end script

exec /usr/sbin/dbus-daemon

post-stop script
    rm -rf /var/run/dbus
end script

Before we just defined one process in a job’s lifecycle, known as the main process. Our new definition defines two more, the pre-start and post-stop processes. We’ve chosen to define them as shell scripts embedded in the definition, we could have defined them as binaries to execute if we preferred (using pre-start exec), and we could have defined the main process as a script (using script...end script).

As their name suggests, these processes are run before the main process is started and after it has been stopped respectively. In fact, Upstart guarantees more than that:

  • For every time that the job is started, the post-stop process will be run.
  • For every time that the main process is run, the pre-start process will have been completed successfully first.

It might seem a little strange that the post-stop process will always run but the pre-start process doesn’t have as strong a guarantee. This is because it’s possible for the job to be stopped immediately after it is started. Should that happen, Upstart will not run the main process since there’s no need, and therefore will also not run the pre-start process; however to ensure the system is clean, it always runs the post-stop process.

These guarantees also provide sane restart behaviour. If you restart a job, the main process is killed, the post-stop process is run, then the pre-start process is run again before the main process. If you cancel a restart (by stopping the job again) after the post-stop process has been run, it will always be run again.
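The two guarantees can be written down as a few lines of control flow. This is a sketch of the rules as stated, not of Upstart's state machine; the three functions are stand-ins for the processes in the dbus job, and the stopped-early argument is an invented way of modelling a stop request that arrives straight after the start.

```shell
#!/bin/sh
# Stand-ins for the three processes of the job (hypothetical bodies).
pre_start()    { echo "pre-start"; }
main_process() { echo "main"; }
post_stop()    { echo "post-stop"; }

# run_job [stopped-early]
# The main process only runs after pre-start has completed successfully,
# and post-stop runs every time the job was started, even if the job was
# stopped before anything else had a chance to run.
run_job() {
    if [ "${1:-}" != stopped-early ]; then
        if pre_start; then
            main_process
        fi
    fi
    post_stop
}

echo "normal run:"
run_job
echo "stopped immediately after starting:"
run_job stopped-early
```

A restart is then just the tail of one run followed by the head of the next: post-stop, pre-start, main.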

Spawned, Running and Killed

Upstart makes important distinctions in the state of the main process, it does not necessarily assume that just because the exec() syscall has succeeded that the process is in a suitable running state. Likewise, it does not assume that just because the kill() syscall has succeeded that the process is no longer running.

The latter is easy to understand, delivering the TERM signal to a running process normally just invokes its own termination handler which may perform any number of activities before cleanly shutting down. Upstart waits for the actual child signal signifying termination before running the post-stop script, until that point the process is considered merely “killed”. Obviously too long in the “killed” state means Upstart delivers the much more hardcore KILL signal, but that’s adjustable.

The former is harder to understand since the new binary is in memory and is probably at least initialising, but that’s the point: it isn’t yet ready for other jobs to use. In the SystemV script, this wasn’t an issue, since we could generally rely on daemons (well behaved ones anyway) to follow the convention that they should not fork() until initialisation was completed successfully.

Since Upstart forks and supervises its own processes, it generally prefers that daemons do not fork() and remain as the pid they were given when started. So how do jobs signify that they are ready? There are a few ways:

  • By forking as before. As I’ve talked about before, Upstart can supervise processes that fork, and it will wait for that to happen before assuming the process is ready.
  • By raising the STOP signal. For jobs marked with expect stop, Upstart will wait for this, and once it is received will send the process the CONT signal and assume that it is now ready.
  • By registering a D-Bus name. An early 0.5.x release will wait for a particular D-Bus name to be registered, and not assume that the job is ready until it has done so.
  • By calling listen(). Again, planned for an early 0.5.x release, Upstart will use the same mechanism it uses to follow forks to watch for the listen() system call.
  • With a post-start script, more on that in a second.

The last two processes

I’ve introduced the three processes that most jobs will tend to use, but there’s also another two which will be somewhat rarer but are probably the most powerful of them all. These are the post-start and pre-stop processes, and they’re interesting because they’re run while the main process is running.

The post-start process, as its name suggests, is run after the main process has been spawned and any event we were expecting (see above) has happened. The job will not be considered ready until the post-start process completes, thus a common use for it is to interrogate the daemon or send it commands it can only act on once it’s running.

The pre-stop process is run when a request to stop the job occurs (this means it is not run if the main process terminates on its own), and the process is not killed until it finishes. It receives information about the request, and can cause that request to be ignored (thus leaving the job running). Another common use is to send the daemon commands before it receives the TERM signal.

Next…

So that’s a look at the ways we can define the lifecycle of an Upstart job. In the next couple of posts we’ll look at the environment and session of jobs, and then at matters such as respawning and singletons.

Ubuntu Brainstorm Announced!

The Ubuntu QA community have put together an awesome new resource for Ubuntu users and developers - Ubuntu Brainstorm.  This allows you to suggest ideas for improvements, and to vote on the ideas others have suggested.

We have of course been inspired by the IdeaStorm site from our good friends at Dell but modified the concept to fit our needs.

The development team can now take the pulse on the most pressing user issues and propose the ideas as topics at the Ubuntu Development Summits and ultimately as specifications. Ubuntu development is in turn driven by detailed specifications written up in the wiki and tracked as blueprints in Launchpad.

An idea on brainstorm can easily be linked to a Launchpad blueprint as well as to a bug or a forum discussion thread. In this way we expect to bridge the locations where ideas are often submitted now, as forum posts or bug reports, with the blueprint format they should be expressed in to be implemented.

DH-Hell

I hate DHL.  This says it all.

Web 2.0 Service Pack 1

Why do I need to tell each and every web service who my friends are? Why can’t last.fm, flickr and twitter just get this information from Facebook? Likewise, why do I have to tell them all where I live, how old I am, what my website is, etc.?

How to (and why) supervise forking processes

Yesterday’s celebratory blog post demonstrated that Upstart is now able to supervise processes that fork into the background, as most daemons do. Now that the code has undergone a little more testing, and been pushed into the archive, it’s worth explaining a little bit more of the background as to the how, and why, we do this.

The why is easiest to answer first. Daemons are normally written to fork, usually twice; this detaches them from the terminal, process group and session that they were spawned from so that they remain running after the user logs out. The fork isn’t just a mechanism though; over time a convention has emerged that means daemons don’t go into the background until their initialisation is complete and they’re ready to receive connections, if that’s their bag.

Simply adding an option to remain in the foreground might appear to eliminate the need to deal with the problem, but this also takes away the notification that the daemon is ready for use. Over time this signal can be replaced with other notifications: registering a known D-Bus name, or simply raising SIGSTOP; but these require code changes that need to be agreed with upstream first. Making code changes also assumes that we have the code. Whether we like it or not, sysadmins will often have the need to run proprietary daemons — or even simply older versions of software where the patch is too invasive.

So that’s why we have to do it, now how do we?

This is one of the reasons that building the service supervisor into init, rather than having it as a separate process, makes sense. Init has a few special kernel-provided buffs, one of which is that orphaned processes are reparented to it. When you run a daemon from the command-line, the process is initially your child; it forks once and the parent dies, the new child is now orphaned, and thus reparented to init. (Most daemons now run setsid and fork a second time. This is to ensure that if they open a tty device, they don’t unexpectedly become its owner.) Init, like any other process, receives notification about its children through wait so will know when daemons terminate; the “must have” of supervision.

So if all daemons are our children we are notified when they terminate and why; we can compare their exit status or signal against a list of known good ones, and choose whether we need to respawn the dead job or mark it as stopped normally.
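
As a rough illustration of that decision, here is a minimal sketch in C. The lists of "known good" exit codes and signals, and the names should_respawn and spawn_and_reap, are made up for this example; they are not Upstart's own code or policy.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical policy: exits and signals that count as a normal stop. */
static const int normal_exits[]   = { 0 };
static const int normal_signals[] = { SIGTERM };

static bool
in_list (const int *list, size_t len, int value)
{
        for (size_t i = 0; i < len; i++)
                if (list[i] == value)
                        return true;
        return false;
}

/* Decide from a wait status whether the dead job should be respawned
 * (true) or marked as stopped normally (false). */
static bool
should_respawn (int status)
{
        if (WIFEXITED (status))
                return ! in_list (normal_exits,
                                  sizeof normal_exits / sizeof normal_exits[0],
                                  WEXITSTATUS (status));
        if (WIFSIGNALED (status))
                return ! in_list (normal_signals,
                                  sizeof normal_signals / sizeof normal_signals[0],
                                  WTERMSIG (status));

        /* Neither an exit nor a fatal signal: treat as abnormal. */
        return true;
}

/* Helper for trying the above: fork a child that either raises sig
 * (if non-zero) or exits with exit_code, then return its wait status,
 * or -1 if fork or waitpid failed. */
static int
spawn_and_reap (int exit_code, int sig)
{
        int   status = 0;
        pid_t pid = fork ();

        if (pid < 0)
                return -1;
        if (pid == 0) {
                if (sig)
                        raise (sig);
                _exit (exit_code);
        }
        if (waitpid (pid, &status, 0) < 0)
                return -1;

        return status;
}
```

Calling should_respawn (spawn_and_reap (1, 0)) says the job should be respawned, while a child that exits with 0 or is killed by SIGTERM is treated as having stopped normally under this made-up policy.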

This isn’t enough though: all we get is the process id of the dead child. We still need to relate that back to a job somehow. One way to do that is to use waitid with the WNOWAIT flag, leaving the process on the table so we can examine /proc to find out more about it. This seems like quite a reasonable approach; we can then match a process to a job by details such as what binary it was actually running. Unfortunately this only works for singleton processes where we’re guaranteed that only one of them exists, both at the job level and at the process level itself; should the process fork, even to run another child, we could accidentally consider it to have died. Daemons need to be able to run their own children, or even have pools of them to use; and we also need to be able to run multiple copies of daemons where we can support it.
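
For the curious, the peek-without-reaping step might look something like the sketch below. The function name peek_dead_child is invented for this example, and reading the short command name from /proc/PID/comm is an assumption about a reasonably recent Linux /proc; it is not necessarily the detail a real implementation would match on.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Peek at the next dead child without reaping it.  Fills in info and,
 * where /proc is available, the kernel's short command name for the
 * process (name is left empty otherwise).  Returns the child's pid,
 * or -1 on error. */
static pid_t
peek_dead_child (siginfo_t *info, char *name, size_t len)
{
        char  path[64];
        FILE *fp;

        memset (info, 0, sizeof *info);

        /* WNOWAIT leaves the zombie in the process table, so a later
         * wait call will still return it. */
        if (waitid (P_ALL, 0, info, WEXITED | WNOWAIT) < 0)
                return -1;

        /* Assumption: /proc/PID/comm is present and readable for a
         * process that has exited but not yet been reaped. */
        name[0] = '\0';
        snprintf (path, sizeof path, "/proc/%d/comm", (int) info->si_pid);
        fp = fopen (path, "r");
        if (fp) {
                if (fgets (name, (int) len, fp))
                        name[strcspn (name, "\n")] = '\0';
                fclose (fp);
        }

        return info->si_pid;
}
```

After the peek, info->si_code and info->si_status say how the child died, and an ordinary waitpid on the returned pid then reaps it. Note that a child which forked without running a new binary reports the same name as its parent, which is exactly the ambiguity described above.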

So we really do need to know the process id of the actual daemon process we should be supervising. Unfortunately no method of passing this back to init, not even a relatively common one like writing it to a pid file, is sufficiently standard or reliable to do this kind of work.

Ideally the kernel would just tell init when a process was reparented to it, providing both the child process id and that of its previous parent. Such a notification doesn’t exist today, though it would be a nice project to try to get one into the kernel mainline; difficult if there’s only one implementation using it.

If we can’t have that, a syscall that would allow us to watch a process and find out when it forks would be the second-best thing. We’d have the previous process id since we were watching it, and we’d hopefully be able to obtain the new child process id from this.

Happily that syscall exists, and I suspect you use it all the time if you’re a developer; it’s a bit of a mad leap to using it inside init, but as you can see, it works rather nicely. All we need do is watch the process, and follow it each time it spawns a new child. We stop watching as soon as we have followed twice (once if a different option is used), or if the process runs a different binary by itself. And thus we can know the process id of daemons we spawned, even if they attempt to detach from their parent process which they’ll just be reparented to anyway.
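
To make the idea concrete, here is a rough sketch of following a process through its forks with the Linux PTRACE_O_TRACEFORK option. It is not Upstart's actual code: the names fake_daemon, spawn_traced and follow_forks are invented for this example, the "daemon" is just a function run in the child rather than an exec'd binary, and error handling is minimal.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a daemon: fork twice, then sit in the background. */
static void
fake_daemon (void)
{
        if (fork () > 0)
                _exit (0);
        if (fork () > 0)
                _exit (0);
        pause ();
}

/* Fork a child that asks to be traced, stops itself so the tracer can
 * set its options, and then runs fn. */
static pid_t
spawn_traced (void (*fn) (void))
{
        pid_t pid = fork ();

        if (pid == 0) {
                ptrace (PTRACE_TRACEME, 0, NULL, NULL);
                raise (SIGSTOP);
                fn ();
                _exit (0);
        }
        return pid;
}

/* Follow pid through up to max_forks forks, stopping early if it execs.
 * Returns the pid we ended up on (now detached), or -1 on error. */
static pid_t
follow_forks (pid_t pid, int max_forks)
{
        long opts = PTRACE_O_TRACEFORK | PTRACE_O_TRACEEXEC;
        int  status, sig = 0;

        /* Wait for the initial stop caused by raise (SIGSTOP). */
        if (waitpid (pid, &status, 0) < 0 || ! WIFSTOPPED (status))
                return -1;

        while (max_forks > 0) {
                unsigned long child = 0;

                /* Resume the tracee, passing on any pending signal,
                 * and wait for its next stop. */
                if (ptrace (PTRACE_SETOPTIONS, pid, NULL, (void *) opts) < 0
                    || ptrace (PTRACE_CONT, pid, NULL, (void *) (long) sig) < 0
                    || waitpid (pid, &status, 0) < 0
                    || ! WIFSTOPPED (status))
                        return -1;
                sig = 0;

                if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_FORK << 8))) {
                        /* It forked: let the parent go and switch to
                         * the new child, which starts out stopped. */
                        if (ptrace (PTRACE_GETEVENTMSG, pid, NULL, &child) < 0)
                                return -1;
                        ptrace (PTRACE_DETACH, pid, NULL, NULL);
                        pid = (pid_t) child;
                        max_forks--;
                        if (waitpid (pid, &status, 0) < 0
                            || ! WIFSTOPPED (status))
                                return -1;
                } else if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8))) {
                        /* It ran a different binary: stop following. */
                        break;
                } else {
                        /* An ordinary signal: deliver it on resume. */
                        sig = WSTOPSIG (status);
                }
        }

        ptrace (PTRACE_DETACH, pid, NULL, (void *) (long) sig);
        return pid;
}
```

With this sketch, follow_forks (spawn_traced (fake_daemon), 2) returns the process id of the grandchild sitting in pause, even though both of its ancestors have exited. The first process is still the caller's real child and needs to be reaped with waitpid as usual.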

What’s the syscall? Oh, hmm, is that the time? Got to go! Alright, it’s ptrace.

Supervising forking processes


quest /tmp# cat test.c
#include <sys/types.h>

#include <stdlib.h>
#include <unistd.h>

int
main (int   argc,
      char *argv[])
{
        pid_t pid;

        pid = fork ();
        if (pid > 0)
                exit (0);

        pid = fork ();
        if (pid > 0)
                exit (0);

        pause ();
        exit (0);
}
quest /tmp# gcc -Wall -g -O0 -o test test.c

quest /tmp# cat /etc/event.d/test
wait for daemon
exec /tmp/test

quest /tmp# start test
test (#0) goal changed from stop to start
test (#0) state changed from waiting to starting
event_new: Pending starting event
Handling starting event
event_finished: Finished starting event
test (#0) state changed from starting to pre-start
test (#0) state changed from pre-start to spawned
process_spawn: Spawned main process 6380 for test (#0)
Active test (#0) main process (6380)
test (#0) main process (6380) forked new child 6381
test (#0) main process (6381) forked new child 6382
test (#0) state changed from spawned to post-start
test (#0) state changed from post-start to running
event_new: Pending started event
Handling started event
event_finished: Finished started event