Calling things by the same name

In response to my blog post “Whatever you do, don’t fix the kernel!”, David Zeuthen (prominent plumber, the maintainer of HAL and author of DeviceKit) wrote:

Scott, here’s why you’re wrong. It’s very simple and comes down to two points

- you obviously agree we can’t break huge amounts of userspace by changing DEVPATH

- having two names emitted from the kernel (_just_ because lots of user space is
broken) is just wrong and confusing
=> much better to fix up things in user space

Besides, what’s in a freaking name _anyway_? Apps should be using stable symlinks or, gosh, a device enumeration framework like HAL or the upcoming DeviceKit.

He makes, I think, an interesting point.

Why do we have two names for devices?

The kernel maintains its own namespace for devices which is based on its tree of internal objects and exported via the sysfs virtual filesystem.  My mouse’s object path is /devices/pci0000:00/0000:00:02.0/usb2/2-4/2-4:1.0/input/input2/mouse1 (convention is to omit the /sys prefix), and a class device link also exists as /class/input/mouse1 for easy access.

Properties of the device object are exported into user space via the sysfs filesystem, at that path under its mount point; new objects, and significant changes to or removal of existing objects, are announced through the uevent system.  One of those properties describes the device node that needs to be created: the dev file contains the major and minor numbers, and these are also present in the MAJOR and MINOR keys of the uevent.
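To make the dev file and the MAJOR/MINOR keys concrete, here is a minimal sketch in Python. The helper names (parse_dev, numbers_from_uevent) and the sample values are made up for illustration; the only thing assumed about the kernel is that the dev attribute holds text of the form MAJOR:MINOR.

```python
def parse_dev(text):
    """Parse the contents of a sysfs 'dev' attribute such as '13:63'."""
    major, minor = text.strip().split(":")
    return int(major), int(minor)


def numbers_from_uevent(env):
    """Read the same pair from the MAJOR and MINOR keys of a uevent."""
    return int(env["MAJOR"]), int(env["MINOR"])


# Illustrative values only; a real caller would read /sys/<object path>/dev.
sample_dev = "13:63\n"
sample_uevent = {"ACTION": "add", "MAJOR": "13", "MINOR": "63"}

# Both sources describe the same device node.
print(parse_dev(sample_dev) == numbers_from_uevent(sample_uevent))
```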

The udev daemon listens out for these uevents and creates device nodes under the /dev path for userspace to use. These device nodes have a naming scheme that is mostly flat, with some sub-directories used for grouping. It records both the kernel object name and device path in its database, so that lookups can be performed on device removal and so that applications using DeviceKit can query it.

It also passes on the uevent to HAL, which stores the mapping in its own database and performs its own actions.

Applications can then use DeviceKit or HAL to enumerate devices by type, or walk the tree of devices, and look up the actual device node path from that. They may also use them to look up the object information for a given device node path.

That’s quite a lot of work going on behind the scenes already to map between two different names for a single device. All my proposal tried to do was reduce a little bit of the user-space side of that work by harmonising the names.

But maybe David has a point: the real problem is that we have two names in the first place!

Obviously the /dev paths are necessary for the vast number of userspace applications that still require them. But why do applications that use DeviceKit or HAL need them?

If the kernel placed a device node inside the sysfs filesystem, we wouldn’t need to do any path lookup, we’d just append the fixed name of the node to the object name.

udev would only need to make symlinks to those devices in /dev for legacy applications.

Concept Distro

The automotive industry, with its particular emphasis on efficient workflow and practices, has had a lot to teach the software world over the years.  From the process of requirements, specification and design through to LEAN development practices, it is difficult to argue that we haven’t learned anything from them.

I think that there’s another practice from that industry it might be fun to adopt: the Concept Car (sometimes known as a Show or Halo Car).

These are cars where the designers and engineers have been allowed to let their imaginations run wild, and build something that shows off the limits of what’s possible.  Often they’re also used to explore new technologies or ideas without having to commit to standards of production that would be required for a marketable road car.

And that’s pretty much the key point about these cars: they normally build just one or two and take them around the car and motoring shows for everybody to look at and talk about.

Obviously I’m not suggesting that we build strange and outlandish cars, and drape them in fancy lights and scantily clad people on a slowly rotating podium; but I think the idea can translate to our world.

Thus I’d like to humbly introduce my idea of the Concept Distro.

The Concept Distro would be an engineering project to allow developers and maintainers to let their imaginations run wild.  It’d be released, probably to demonstrate at a major event, and would explicitly not be supported.  Not even basic security support, or a bug tracker, or even answering questions about why things don’t work.

On a Concept Car, it’s entirely normal that half of the doors don’t even open; likewise in the Concept Distro, it would be entirely expected that half of the icons were just placeholders and didn’t do anything if you clicked on them.

After release, engineering effort could be focussed either on integrating the successful technologies into Linux distributions proper, or on working on the next Concept Distro for the next big event a year or two down the line.

In the early days of Ubuntu, when we had two different CDs, we had a plan to do this kind of thing with the Live CD.  Since that didn’t have an installer, it could be a little more experimental and a little more risqué.  It was a good place to try out Network Manager before we integrated it with the distribution proper, and the intent was that the naked people would have had even less clothing (I didn’t mind the loss of this, the male model they picked was not the prettiest of the options).

Assuming we don’t resurrect the naked people, what kinds of things would we do with the Concept Distro?

It’s a chance to make some fundamental changes without having to worry about the support or upgrade implications of them.  I’d like to see what we could do by assuming that the filesystem is a single mount of ext4 on LVM on RAID, which we grow onto additional disks as they are made available.

And since we wouldn’t have to worry about partitioning, it might be interesting to look into rearranging the hierarchy.  Maybe having /System, /Software and /Users really is better than /bin, /usr and /home.

If we went down that route, we could throw out the traditional package manager and experiment with some new approaches.  What better way to upgrade the operating system than:

cd /System
bzr update

or switch to a new version with bzr switch?  It works well enough to upgrade my WordPress installation, after all.

From a technology fetishist point of view, there’s plenty to play with and try out.  Would we use ALSA and dmix instead of PulseAudio?  Assuming we didn’t use the Concept Distro to try out fully per-application volume control, of course.  It’d be a great place to see what we can do with Upstart, udev, D-Bus, DeviceKit (replacing HAL) and other plumbing-layer components.

In the desktop library layer, the bling guys could play with Multi-Pointer X with kernel-mode setting support and a resolution independent GTK+.  Rendering could be fully indirect or entirely direct GL based, depending on preference.

And for the desktop itself, the user experience and interface designers have a completely blank canvas to play with.  Since it’s just a Concept Distro, one needn’t worry about the ability of users to transition to new ways of working.  Instead you can see how they react to seeing new ways of working in a demonstration or talk, perform usability testing in the lab and even see how they get on in the field.

It would be a very fun and exciting project.

Unfortunately, unlike the car world, there’s not necessarily the funding for such a thing.  Who would want to finance an ongoing software development project that was explicitly intended to have no users?

In the automotive world, the Concept Car from a development point of view is important since companies cannot, for example, experiment with new engine technologies and expect their customers to be able to drive them on the road.  In the software world, such “lab” projects are much easier to develop in isolation and tend to remain on our own workstations.

The Concept Car can also serve as a marketing tool: it draws potential customers to your show stand, and while looking at the sexy car on the stand they’re ripe for being sold a somewhat more pedestrian road car.  It also helps customer loyalty, since you’re more likely to buy another car from a manufacturer who is showing off the most advanced concepts.

In the Linux world, while we appear to have direct competition between the distributions, the reality is that we co-operate far more than you might expect unless you’re involved with development.  A Concept Distro would need upstream work from just about everybody.

And would such a thing help convert people from Windows or Mac OS?  If it would, maybe it’s a good idea after all.

Whatever you do, don’t fix the kernel!

As you may have read in LWN (subscription required, and strongly recommended anyway), there’s been some argument on the linux-hotplug mailing list, the historically named home of udev development, about device naming.

The key threads are “default udev rules” and “Patches for device names”.

It all started when Kay reminded everybody that distributions should attempt to drop their own udev rules in favour of those supplied by upstream.  For those not familiar with udev, the rules are a language that creates device nodes and performs other actions based on the information about that device from the kernel.  A typical rule to put all devices from the “sound” subsystem into the “audio” group looks like:

SUBSYSTEM=="sound", GROUP="audio"

Sometimes these rules also change the names of the devices.  For example rules such as the following are automatically generated to keep the name of your ethernet devices the same between reboots:

SUBSYSTEM=="net", ATTRS{address}=="00:11:22:33:44:55", ATTR{type}=="1", NAME="eth0"

Ironically, perhaps, none of the argument is about the names of the devices, the permissions assigned or the groups they’re placed in.  We’re all pretty much in agreement about that.

Every major distribution pretty much follows the plan laid out in the devices.txt file found in the kernel’s Documentation sub-directory.  This is maintained by the Linux Assigned Names and Numbers Authority, and up until 2.2, was included by reference in the Filesystem Hierarchy Standard (FHS).  Nobody really knows why the reference was removed; I guess the LSB didn’t like having standards everybody agreed on ;-)

So what is the argument about?  Marco d’Itri, the Debian udev maintainer, is arguing because he’s spent a lot of time and effort making their rules readable and elegant in their operation.  The upstream rules are, in his opinion, somewhat scraggy.  I don’t really see this as a problem; we can fix the upstream rules to be more elegant easily enough.

My argument is different, and is a little more fundamental.

While most of the rules do udev-specific things like permissions, groups, run callouts to gather more information and perhaps run programs after device creation, we have many rules such as this:

KERNEL=="hw_random", NAME="hwrng"

What that says is:

Rename the kernel device “hw_random” to “hwrng”.

This makes the device name correct according to devices.txt.  What irritates me about this is that this rule should be entirely unnecessary!  It would be a one line patch to the kernel to cause it to name the device properly in the first place.  Then we wouldn’t need to spend the resource and CPU time changing the name every single time every Linux machine around the world boots.

There’s another set of rules that annoys me:

KERNEL=="device-mapper", NAME="mapper/control"

The kernel object for the device mapper control node is /sys/class/misc/device-mapper, but the device name according to devices.txt should be /dev/mapper/control – in a sub-directory. The kernel and udev have a mechanism to deal with this: the kernel object could be named /sys/class/misc/mapper!control and the right thing would happen.
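The “!” mechanism amounts to a simple character substitution when turning a kernel object name into a path under /dev. A minimal sketch in Python, where node_name is a hypothetical helper and not udev’s actual code:

```python
def node_name(kernel_name):
    """Map a kernel object name to a path relative to /dev.

    A '!' in the kernel name stands for a directory separator.
    """
    return kernel_name.replace("!", "/")


print(node_name("mapper!control"))  # mapper/control
print(node_name("hwrng"))           # hwrng (no sub-directory, unchanged)
```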

Another similar class of devices needs udev to rename them:

KERNEL=="mice", NAME="input/mice"
KERNEL=="mouse[0-9]*", NAME="input/mouse%n"

The first one seems straightforward, but the kernel object is named /sys/class/input/mice so if we used the ! trick, it would become /sys/class/input/input!mice. I can appreciate that it’s ugly. Similarly for the mouse case.

I’ve suggested a fix for this though, and this fix also alleviates any concerns about backwards compatibility with sysfs names. The uevent from the kernel for the “mice” device looks like this:

ACTION=add
DEVPATH=/devices/virtual/input/mice
SUBSYSTEM=input
MAJOR=13
MINOR=63

I’ve suggested that, where the device ends up in a sub-directory, the kernel adds an extra field to this:

DEVNAME=input/mice

When present, udev would use this instead of the last part of the sysfs path as the kernel name. The extra cost to the kernel is a single %s in an existing sprintf() call; the result is a vast saving in userspace time.
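The proposed lookup is small enough to sketch. The following Python is an illustration only: parse_uevent and device_name are hypothetical names, and the uevent is assumed to arrive as KEY=value text, one pair per line, as printed above.

```python
def parse_uevent(text):
    """Turn 'KEY=value' lines into a dictionary."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            env[key] = value
    return env


def device_name(env):
    """Prefer DEVNAME when the kernel supplies it, else use the
    last component of DEVPATH as the kernel name."""
    if "DEVNAME" in env:
        return env["DEVNAME"]
    return env["DEVPATH"].rsplit("/", 1)[-1]


uevent = """ACTION=add
DEVPATH=/devices/virtual/input/mice
SUBSYSTEM=input
MAJOR=13
MINOR=63
"""

print(device_name(parse_uevent(uevent)))                         # mice
print(device_name(parse_uevent(uevent + "DEVNAME=input/mice")))  # input/mice
```

With this shape, no rename rule is needed for devices whose name the kernel can construct itself.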

This fix would also let us deal with the raw USB devices, and other things like the DVB devices, where we have to construct the device names. For example, the following rule is used to name DVB devices:

SUBSYSTEM=="dvb", PROGRAM="/bin/sh -c 'K=%k; K=$${K#dvb}; printf dvb/adapter%%i/%%s $${K%%%%.*} $${K#*.}'", NAME="%c"

That means that for every DVB device, on every computer, every time Linux boots, we have to fork and exec a shell, do some string pattern matching, fork and exec printf and apply more string pattern matching to the format string to name the device.

This could be avoided by doing that printf in the kernel, and setting DEVNAME for that device.
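For readers who don’t speak shell parameter expansion, the string handling in that rule can be sketched in Python. The function name dvb_device_name is made up for illustration, and the sample kernel name dvb0.frontend0 is an assumed example of the dvbN.name form the rule expects.

```python
def dvb_device_name(kernel_name):
    """Mirror the shell one-liner from the udev rule.

    'dvb0.frontend0' becomes 'dvb/adapter0/frontend0'.
    """
    # K=${K#dvb}: strip the leading "dvb" if present
    rest = kernel_name[3:] if kernel_name.startswith("dvb") else kernel_name
    # ${K%%.*} is everything before the first dot,
    # ${K#*.} is everything after it
    adapter, sep, device = rest.partition(".")
    if not sep:
        # With no dot, the shell leaves both expansions as the whole string
        device = rest
    # printf dvb/adapter%i/%s
    return "dvb/adapter%d/%s" % (int(adapter), device)


print(dvb_device_name("dvb0.frontend0"))  # dvb/adapter0/frontend0
```

It is a handful of string operations, which is why doing the equivalent once in the kernel is so much cheaper than forking a shell per device.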

Device names are set down in a standard. That standard is shipped inside the kernel’s own source tree. Most distributions are already following that standard. The udev default rules follow that standard. Most distributions are likely to adopt the default udev rules. This is, for all intents and purposes, as official a naming policy as you can get.

For those devices where the name is static, or constructed entirely from information from the kernel (ie. not persistent storage, input, network, etc.); why do we waste resource and CPU time every single boot changing the name that the kernel exports to match the standard?

To me this is obvious: fix the kernel to export the right name in the first place.

To kernel developers, such as Greg K-H, this is not so obvious:

“Wait, why do this at all?”

and

“Can’t you live with input devices having a few rules in udev? Is it really that hard to maintain? :)”

While patches were apparently welcome in the first thread, by the second thread when it was clear that patches were going to be done, they didn’t seem quite so welcome after all.

This isn’t the first time that I’ve seen kernel developers claim that it’s better to work around the kernel in userspace than it is to fix it. I could understand this if we didn’t have the source code to our own kernel, but we do.

The kernel isn’t sacred and it isn’t a separate part of the system. It needs to be seen as just one component of a fully integrated system, especially by its developers.

That 12ft-high wall between “kernel space” and “user space” needs to come down.

As LWN notes, we have a lot to talk about at the LPC in September.

Development Platform

I’m afraid I have a confession to make.  A couple of weeks ago, I purchased an iPhone.  And to make matters worse, I’m wonderfully happy with it.

Now, I know that I should have got something more compatible with the community that I’m a member of.  Maybe one of those OpenMoko powered Neo FreeRunner devices or even an Ubuntu Mobile powered prototype device.

But an iPhone it was.  Why?

Well, frankly I needed something that works today.

The iPhone is a fascinating device.  Don’t worry, I’m not going to go on about its features and all of its bling.  What fascinates me is how easily Apple brought it to market, and now that the App Store is up and running, how quickly native applications are being written for it.

The most breath-taking thing is that this device is effectively running a version of Mac OS X ported to the ARM processor, and with any unnecessary bits for the smaller platform removed.  The graphics, audio and other core libraries are basically the same as on the bigger brother computers.

In other words, Apple have done what Linux always promised; turned Darwin into a truly scalable platform.

What’s more, the pace at which new applications have been developed for it shows that this platform is easy to write for.  My phone has rich, native applications for Twitter, Facebook, Flickr and Google; none of which came pre-installed.

I have a theory about how they’ve managed to scale their platform so quickly down to a size that fits in my pocket whilst also running on a machine that barely fits on my desk.  The same theory explains why developers have been so quick to develop applications for it.

It’s not that their platform is better, or more capable, or even necessarily more flexible.

It’s that their platform is better componentised.

The core technologies of their platform are grouped into easy to understand components.  It’s easy to draw boxes that show how these stack up to provide functionality to the developers, and it’s easy to see which boxes you can remove when scaling the platform down.  Documentation is easier to write too: each component has a specific function, and tech writers can turn that into a story and write simple to understand overviews and rich API documentation.

Audio playback is a great example here.

In Linux, you want to play sounds from your application, so you have a quick hunt around for Linux audio APIs.  Your resulting list looks something like this:

  • libcanberra – “a simple abstract interface for playing event sounds”
  • gnome-sound – “Sound playing routines”
  • GStreamer – “Media framework”
  • PulseAudio – “Sound Server”
  • ESD – “Enlightened Sound Daemon”
  • libasound2 – “ALSA library”
  • ALSA – “Advanced Linux Sound Architecture”

And those are just the libraries and daemons installed by default, and I didn’t even include the format libraries such as libogg.  If I were to include those, and the various other sound daemons, mixers and framework libraries (hello, Phonon), we’d be here all night.

Where is an application developer actually supposed to start?

Even I have no real idea where GStreamer, PulseAudio and ALSA begin and end, where they overlap and contradict each other, or which I’m supposed to use.

Apple developers have it much easier.  If you want to do anything with audio, you want Core Audio.

If I were to try and do something more interesting, like putting things on the screen, a somewhat common requirement for GUI applications, I’d have to read up on Clutter, Pigment, GTK+, GDK, Cairo, Pango, FreeType, Xft and X11.  At least.

An analogy can be drawn with Lego.

When I was a young kid, if I wanted to make cars to sit on the roads around my Lego town, I had to build them from scratch.  I didn’t really care about Lego cars, but the town looked silly without them, so it was a chore.

The chassis for each car was the same.  A 1×4 flat at each end for the bumpers, with a 2×4 end on in the middle to make the wheel arches.  These were joined by a 4×4 to make the car floor.  (Sadly I couldn’t find any images on Google).

You had to know how to do it, but when you did there was a certain pride in being able to build a car from memory and knowing how all the pieces fit together.  If you cared about cars, anyway.

Then an amazing thing occurred.  Lego released a new car, and in the box was a single piece that made the chassis.  No more mucking around and searching for lost bits, or realising you’d built it upside down.  Now you could instead spend more time deciding what colour the body and windows would be, or if you really didn’t care, spend more time on the houses and other buildings that were more fun to build.

If the single piece wasn’t right, nothing stopped you building your own custom chassis, but it was a great time-saver.  Nowadays they probably have a box where a complete car rolls out, but that’s ok too.  Those are the boxes for people who really don’t care about cars, but understand that they need them to fill the multi-storey car park.  They do other boxes with a thousand pieces to build a single car for those people who like making cars.  Those are neat, the engines look like they’re working and everything!

Apple’s approach is somewhat like this.  Their APIs are grouped into big components that you can quickly get to grips with, so you can spend your time on the interesting bits of the application.  Linux’s API stack is more like a box of bits: you have to know how to fit them together and build the chassis before you start.

The only people that really delight in the differences between GTK+, GDK, Cairo and X11 are the authors of those particular parts of the platforms.  The rest of us really wish we just had a single piece marked “InterfaceKit” that we could use.

LSB 4

Any views or opinions expressed here are my own, and not those of my employer or any project I am a member of.

InternetNews ran a story last Thursday (picked up via LWN) asking whether LSB 4 will standardize Linux?  In it, they interview Jim Zemlin, the executive director of the Linux Foundation, and the article expresses the feeling that if only the distributions would adopt it, the world would be a better place.

To those that know me, I may sound like a skipping CD, but I just don’t see anything in LSB 4 that will change the current situation because they have not addressed the fundamental problem with the LSB.

The failure of the LSB to actually engage with the distributions it’s attempting to standardise.

This wouldn’t be so much of a problem if the LSB attempted to document existing practice in the form of standards, while acting as a forum for development of new practices which could be trialled before standardisation.  Much as the IETF does, now.

Instead, the LSB sees itself as a development group that decides on future direction itself and dictates that to the distributions.  That’s not necessarily a bad thing, it’s pretty much the way that the W3C works.  But to work successfully, you must represent everybody that you expect to follow the standard.

To this day, the LSB still feels like an RPM-only club.  The core specification specifically requires RPM, and in fact many of the other system-related pieces are based on the layout and design of Red Hat and its derivatives.

That is, except for those bits that the LSB invented all by itself, such as the Init Scripts section.

While much of the LSB can be hacked into a different distribution through compatibility layers and tools, such as alien, what ISV or other vendor wants to provide a support contract against a distribution that has such kludges?

The whole point of the LSB is that ISVs and other vendors feel confident being able to simply target their software or platform to the standard, and safe to honour support contracts on any deployment to an LSB-certified operating system.

If the distributions themselves don’t directly implement the LSB specification, there will never be the confidence to deploy against it directly and we’ll remain in a world where vendors directly target the distributions.

And until the LSB invites all of the distributions to the table to fundamentally redraft the specification to provide a common base that they are all happy to implement directly, they’ll still conform through hacks, kludges and compatibility layers.

Intrepid Sprint (London)

The great thing about the Ubuntu distro team development sprints is that you get to sit around a table and share your knowledge about workarounds for all of the broken things in the current release:

  • To get the machine to resume from suspend, boot an older kernel
  • To get X to start, disable usplash
  • To get a useful desktop, wait for the white screen, press Alt+F2 and type “metacity --replace”

GUADEC Hacking

Upstart 0.5: Relationships

Even the relatively simple System V rc scripts recognise that there are relationships between services, and that in many cases one or more others must be started before a particular service can itself be started: it allows for such relationships to be expressed by using a directory of numbered scripts that are run in series by the sysv rc script.

Tackling this problem in some way is arguably one of the main reasons that each of the alternate init daemons exists. Even launchd acknowledges the problem, even if its solution is to tell service developers that they should spin or sleep while dependencies aren’t available.

The Competition

The way in which the other leading init replacements tackle the relationship problem is through dependencies. This is not that surprising, since the concept is shared (and effectively mirrored) by both the dynamic link loader and the package manager; both things that a service maintainer knows well.

To illustrate how dependencies work, since I use that term precisely to mean only this behaviour, we’ll use one of the chains of the well known Network Manager service.

  • Network Manager depends on HAL
  • HAL depends on D-Bus

When A depends on B, B is required for A to function properly. Any attempt to start A must first start B.
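The rule is easy to express as a recursive start. The following Python sketch is a toy, not the implementation of any real init daemon; the DEPENDS table and start function are made up for illustration and do no cycle detection.

```python
# Hypothetical dependency table for the Network Manager chain.
DEPENDS = {
    "network-manager": ["hal"],
    "hal": ["dbus"],
    "dbus": [],
}


def start(service, running, depends=DEPENDS):
    """Start `service`, first starting everything it depends on."""
    if service in running:
        return
    for dep in depends[service]:
        start(dep, running, depends)
    running.append(service)


running = []
start("network-manager", running)
print(running)  # ['dbus', 'hal', 'network-manager']
```

Nothing here starts D-Bus or HAL unless something that depends on them is asked for first.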

This works well for the link loader: when we load an executable we also need to load and map the shared objects it links to.

It also works well for the package manager: when we install Network Manager, we also need to install HAL and D-Bus for it to function.

However for an init daemon, it’s not normally ideal: the only reason that D-Bus and HAL will be running is because Network Manager depends on them. If we were to stop Network Manager, we would also stop HAL and D-Bus.

This obviously isn’t what we want: HAL and D-Bus are both essential services in their own right. Thus we end up with a target or goal set of services that must be started anyway, and within this group the dependency relationships are only effective for ordering them. Ironically, it is very rare indeed for a service to not be a target and so all of the complex ability of the dependency-based daemon is lost; the only reason to generate the dependency tree at runtime at all is to allow for parallel starts.

Upside Down Dependencies

Thus one of the first things that service maintainers have to get used to about Upstart is that its service relationships are upside down from the way that they might expect. Upstart assumes that if a service is installed, not disabled, and the required services, tasks or hardware are available then the service should be running.

In the dependency-based model, starting Network Manager would first start HAL which would first start D-Bus.

In the Upstart (event-based) model, D-Bus is started fulfilling HAL’s requirements so HAL is started, fulfilling Network Manager’s requirements (once a network card is available?) so Network Manager is then started.

Upstart has no notion of targets or goals. It simply ensures that all services that can and should be running are running, and that services are stopped when it is no longer the right time for them to be running.

Relationships through Events

The way in which relationships between services are defined is by having services react to each other’s events. To continue with our example, HAL would therefore have the following in its job definition:


start on started dbus
stop on stopping dbus

The first line means that when the dbus service is fully up and running (recall from previous posts that this event can be delayed as necessary), HAL will itself be started.

The second line is a little more interesting. Events in Upstart will block until the jobs they affect complete, and the stopping event is emitted before the dbus job is actually stopped and blocks it from doing so. Put more simply, HAL will be fully stopped before D-Bus is stopped.

Thus we have the simplest kind of Upstart relationship. Starting D-Bus will start HAL immediately afterwards, and stopping D-Bus will stop HAL first.
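The ordering that falls out of those two lines can be shown with a toy model in Python. ToyInit is a made-up class that only knows the started and stopping events, and “blocking” is modelled as a plain nested call; it makes no claim about how Upstart is actually implemented.

```python
class ToyInit:
    """Toy event-driven init: jobs react to each other's events."""

    def __init__(self, start_on, stop_on):
        self.start_on = start_on  # job -> event that starts it
        self.stop_on = stop_on    # job -> event that stops it
        self.running = []
        self.log = []

    def emit(self, event):
        # emit() returns only once every affected job has finished
        for job, wanted in self.start_on.items():
            if wanted == event:
                self.start(job)
        for job, wanted in self.stop_on.items():
            if wanted == event:
                self.stop(job)

    def start(self, job):
        if job in self.running:
            return
        self.running.append(job)
        self.log.append("start " + job)
        self.emit("started " + job)

    def stop(self, job):
        if job not in self.running:
            return
        # 'stopping' is emitted, and handled, before the job goes away
        self.emit("stopping " + job)
        self.running.remove(job)
        self.log.append("stop " + job)


init = ToyInit(start_on={"hal": "started dbus"},
               stop_on={"hal": "stopping dbus"})
init.start("dbus")
init.stop("dbus")
print(init.log)  # ['start dbus', 'start hal', 'stop hal', 'stop dbus']
```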

The portmap problem

Most maintainers at this point will be feeling quite smug and about to hit the comments button because they’ve thought of an example service that actually is a dependency, and should not be running if nothing needs it.

Remember that I said they were rare, not non-existent.

One such example is portmap, another is often something like tomcat. There are a few, but they’re certainly not the common case.

Happily one of the elegant things about Upstart’s design is that it does still support this model where it’s needed. In order for portmap to be started when we start an nfs-server, we simply write the following in portmap’s job definition:


start on starting nfs-server
stop on stopped nfs-server

Compare to the example for D-Bus/HAL and you’ll notice that it’s the events that have changed.

Remember that the starting event, like the stopping event we used in the previous example, blocks the job until jobs affected by the event are completed. Thus this first line means that when we start nfs-server, it will not be started until portmap is started.

And the second line is pretty much the mirror of the first in the previous example: once the nfs-server is stopped, we stop portmap as well since it’s no longer needed.
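The same idea, reduced to the two events portmap cares about, can be sketched in a few lines of Python. The function and table names are made up, and the blocking behaviour of the starting event is again modelled as a nested call.

```python
# job -> the job whose 'starting' / 'stopped' event it reacts to
START_ON_STARTING = {"portmap": "nfs-server"}
STOP_ON_STOPPED = {"portmap": "nfs-server"}


def start(job, log):
    """Start `job`, but only after every job that reacts to
    'starting <job>' has itself been started."""
    for other, target in START_ON_STARTING.items():
        if target == job:
            start(other, log)
    log.append("started " + job)


def stop(job, log):
    """Stop `job`, then stop every job that reacts to 'stopped <job>'."""
    log.append("stopped " + job)
    for other, target in STOP_ON_STOPPED.items():
        if target == job:
            stop(other, log)


log = []
start("nfs-server", log)
stop("nfs-server", log)
print(log)
# ['started portmap', 'started nfs-server',
#  'stopped nfs-server', 'stopped portmap']
```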

It may seem a little odd that the rules go in portmap, and not nfs-server, but it makes logical sense. It means that for an admin to work out why portmap is getting started, they just need to read the portmap definition and not hunt around the system to see what else might be doing it.

Also in many of the cases, such requirements are actually conditional. Apache doesn’t need to require tomcat, it’s only a requirement if it’s installed. Thus it makes more sense for tomcat to add itself to Apache’s environment rather than Apache to look for tomcat.

Upstart 0.5: Events

In the previous posts, I’ve covered the various features that make Upstart a good service manager, but these are things you’ll find in most others as well. It’s now time to cover the one thing that is unique to Upstart: Events.

Start and Stop

You’ve already seen the start and stop commands, which do somewhat unsurprising things to jobs. The important thing to remember about these is that they are not events. I just wanted to clear that up before we start, since it’s often been a source of confusion not helped by the design of some earlier versions of Upstart.

start and stop operate directly on jobs, and the command will not normally return until the operation is complete or otherwise interrupted. Services are considered complete when they are running, Tasks are considered complete when they have stopped again; in both cases the stop command is complete when the service or task has actually stopped.

This is important since it provides a common-sense behaviour, ensuring that the following operation is not a race condition:


# start apache
apache running (start), process 3591
# wget http://localhost/

Solving race conditions is one key part of Upstart’s purpose.

Both commands may also set environment variables: those set by the start command form part of the environment of the job itself, and those set by the stop command are available to the pre-stop script.


# cat /etc/init/jobs.d/getty
instance $TTY
env SPEED=38400
exec /sbin/getty $SPEED $TTY

# start getty TTY=tty1
getty (tty1) running (start), process 4152

Events

As described above, the start and stop commands are admin instructions that act directly on named jobs. Events have many similar properties: they carry environment variables that end up in the environment of jobs they start, and they are not complete until the jobs that they affected have been started or stopped as appropriate.

The difference is that the start and stop commands are targeted at specific jobs, whereas events have no such targeting; instead, it is jobs that specify which events they are interested in.

In the Upstart world, events serve three general purposes: they act as signals of state changes that jobs can react to (e.g. hardware going away), as method calls to automatically start or stop jobs (e.g. shutdown), and as a way of passing information between jobs.

Events are identified by their name and have a different namespace to that of jobs. They are emitted by a D-Bus call or by using emit on the command-line, naming the event and providing any associated environment variables you wish:


# emit interface-up IFACE=eth0 ADDRFAM=Ethernet ADDRESS=01:23:45:67:89:0a

Jobs may match events on this name and on any number of their environment variables, specifying whether the event should automatically start or stop the job.


start on interface-up IFACE=eth* ADDRFAM=Ethernet

As a shorthand, where the order of the variables for an event is fixed, the names may be omitted and the values matched by position:


start on interface-up wlan*
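
As a rough illustration of the matching described above, here is a small Python model. It is a sketch rather than Upstart's implementation: the function name event_matches is hypothetical, and it assumes that patterns such as eth* follow shell-style globbing and that positional arguments are compared against the event's variables in the order they were emitted.

```python
from fnmatch import fnmatchcase


def event_matches(pattern_name, pattern_args, event_name, event_env):
    """Return True if an event satisfies a 'start on' / 'stop on' pattern.

    event_env is an ordered list of (key, value) pairs, as given to emit.
    pattern_args holds 'KEY=glob' strings or bare 'glob' strings; a bare
    glob is matched against the variable at the same position.
    (Assumption: shell-style globbing, modelled with fnmatchcase.)
    """
    if pattern_name != event_name:
        return False
    by_name = dict(event_env)
    for position, arg in enumerate(pattern_args):
        if "=" in arg:
            key, glob = arg.split("=", 1)
            value = by_name.get(key)
        else:
            glob = arg
            # Positional form: take the value of the Nth event variable.
            value = event_env[position][1] if position < len(event_env) else None
        if value is None or not fnmatchcase(value, glob):
            return False
    return True


# The event from the emit example above.
eth0_up = [("IFACE", "eth0"), ("ADDRFAM", "Ethernet"),
           ("ADDRESS", "01:23:45:67:89:0a")]

print(event_matches("interface-up", ["IFACE=eth*", "ADDRFAM=Ethernet"],
                    "interface-up", eth0_up))
print(event_matches("interface-up", ["wlan*"], "interface-up", eth0_up))
```

The first call matches, because both named variables fit their patterns; the second does not, because the first variable of the event is eth0 rather than something beginning with wlan.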

When a job is started by an event, the environment for that event forms part of the environment for the job and may be used when matching events that can automatically stop the job. Harking back to our getty job from previous posts, we can bind this to the lifetime of the underlying device.


start on tty-added
stop on tty-removed TTY=$TTY

instance $TTY
exec /sbin/getty 38400 $TTY

We can also match multiple events, requiring either that both occur or that either one does, using unsurprising operators:


start on a-up and b-up
stop on a-down or b-down

In these situations, once stopped, both the a-up and b-up events must happen again for the job to be restarted.
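
The behaviour of these operators can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the class names On, And and Or and the helper feed are hypothetical, and it assumes that an expression forgets the events it has seen as soon as it fires, which is one way of getting the "both must happen again" behaviour described above.

```python
class On:
    """Leaf node: true once the named event has been seen."""

    def __init__(self, name):
        self.name = name
        self.seen = False

    def handle(self, event):
        if event == self.name:
            self.seen = True

    def value(self):
        return self.seen

    def reset(self):
        self.seen = False


class _Binary:
    """Shared plumbing for the two-operand operators."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def handle(self, event):
        self.left.handle(event)
        self.right.handle(event)

    def reset(self):
        self.left.reset()
        self.right.reset()


class And(_Binary):
    def value(self):
        return self.left.value() and self.right.value()


class Or(_Binary):
    def value(self):
        return self.left.value() or self.right.value()


def feed(expression, event):
    """Deliver an event; return True if the expression fires.

    Assumption: firing resets the expression, so every operand
    must be satisfied afresh before it can fire again.
    """
    expression.handle(event)
    if expression.value():
        expression.reset()
        return True
    return False


start_on = And(On("a-up"), On("b-up"))
stop_on = Or(On("a-down"), On("b-down"))

print(feed(start_on, "a-up"))    # only one of the two has happened
print(feed(start_on, "b-up"))    # both have now happened
print(feed(stop_on, "b-down"))   # either one is enough
print(feed(start_on, "b-up"))    # a-up must happen again too
```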

Upstart Events

Upstart itself only emits a few events, leaving the rest up to application authors to define. The startup event is the most interesting of these, and is ultimately what nearly all jobs get chained from.

Job Events

As jobs are started and stopped, Upstart emits events on their behalf for four key points in their lifecycle.

  • starting is emitted when the job is first starting, and the job will not actually be started until this event completes.
  • started is emitted once the job is fully running.
  • stopping is emitted when the job is stopping (after the pre-stop has completed), and the job will not actually be stopped until this event completes.
  • stopped is emitted once the job is fully stopped.

All of the events have the name of the job in the first variable, JOB, and the instance of the job (if applicable) in the second variable, INSTANCE. The stopping and stopped events then have a series of variables indicating the reason for the job stopping: RESULT indicates whether it was a normal stop or a failure; if it failed, PROCESS will say what failed, and EXIT_SIGNAL or EXIT_STATUS will contain the terminating signal or exit code.

For example, we can take action to back up a database if the server crashes:


start on stopping hersql RESULT=failed EXIT_SIGNAL=SEGV
task
exec hersql-backup

Jobs can also export variables from their own environment to others through these events by using the export stanza:


start on interface-up
stop on interface-down $IFACE

instance $IFACE
export IFACE
exec ...

Another job may then be started along with this one, and know what interface it’s bound to:


start on started JOBNAME
stop on stopping JOBNAME

instance $IFACE

We’ll look at the various powerful forms of dependency that these events allow us to express in the next post.

Upstart 0.5: Job Lifetime

Continuing the series of posts on Upstart 0.5, in this post I’ll be talking about the various ways that Upstart allows you to manage the lifetime of a job. These are guarantees that Upstart provides you so that when you start a job, you know what will happen if that job dies unexpectedly or someone else tries to start the job as well.

Respawning

We’ve all encountered those daemons that mysteriously die: sometimes they’re taken out by the OOM killer, and sometimes they’re just buggy and crash from time to time. And there’s also those processes that exit when they’re done, and need to be restarted (e.g. getty).

For all of these, Upstart provides the facility to respawn the job; effectively an automatic restart in the case of failure. Respawning is controlled by three things:

  • Whether or not to respawn
  • Whether or not the job exited “normally”
  • Whether it has been respawned too many times recently

Let’s take the sobby server as an example. Here’s a job that tends to crash every now and then, and we’d like to keep it running. However, we’re also aware that every now and then it crashes hard and needs repairing, so we limit its respawning to 10 times in 5 seconds (which happens to be the default).


  exec /usr/bin/sobby --autosave-file=/var/lib/sobby/autosave /var/lib/sobby/autosave

  respawn
  respawn limit 10 5

The daemon will be continually respawned until either the limit is reached or the service is explicitly stopped by request. This isn’t ideal though: sobby has an exit command which we wish to honour. The daemon is well written enough that it only returns the zero exit code if this command has been run, and otherwise always returns a failure or signal of some description.

In addition, we know that the ABRT signal is raised on the daemon when the session file is corrupted (I’m making this up, btw), so we want to stop respawning in that case.

To accomplish this, we simply state which exit codes and signals are considered a normal exit condition:


  exec /usr/bin/sobby --autosave-file=/var/lib/sobby/autosave /var/lib/sobby/autosave

  respawn
  respawn limit 10 5

  normal exit 0 ABRT

Tasks can be respawned too; the only difference is that zero is always considered a normal exit condition for a task:


  task
  exec /usr/sbin/some-check $DEVICE

  respawn

This task will be continually run until it ends with a zero (success) exit code. We could add additional normal exit conditions as well, just as we can with a service.
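
To make the three-way decision concrete, here is a small Python model of it. Treat it as a sketch and not as Upstart's actual accounting: the class name RespawnPolicy and its parameters are hypothetical, and it assumes that "10 times in 5 seconds" means a sliding window in which the eleventh respawn is refused.

```python
from collections import deque


class RespawnPolicy:
    """Decide whether a job that has just exited should be respawned."""

    def __init__(self, respawn=False, limit=10, interval=5.0,
                 normal_exit=(), task=False):
        self.respawn = respawn
        self.limit = limit
        self.interval = interval
        self.normal_exit = set(normal_exit)
        if task:
            # Zero is always a normal exit condition for a task.
            self.normal_exit.add(0)
        self.recent = deque()  # timestamps of recent respawns

    def should_respawn(self, exit_reason, now):
        """exit_reason is an exit code (int) or a signal name (str)."""
        if not self.respawn:
            return False
        if exit_reason in self.normal_exit:
            return False
        # Forget respawns that have fallen out of the window
        # (assumption: a sliding window of `interval` seconds).
        while self.recent and now - self.recent[0] > self.interval:
            self.recent.popleft()
        if len(self.recent) >= self.limit:
            return False
        self.recent.append(now)
        return True


# The sobby job: respawn, respawn limit 10 5, normal exit 0 ABRT
sobby = RespawnPolicy(respawn=True, limit=10, interval=5.0,
                      normal_exit=[0, "ABRT"])
print(sobby.should_respawn(0, now=0.0))       # exit command was used
print(sobby.should_respawn("ABRT", now=0.0))  # corrupted session file
print(sobby.should_respawn("SEGV", now=0.0))  # an ordinary crash
```

The order of the checks mirrors the list at the top of this section: first whether respawning is enabled at all, then whether the exit was "normal", and only then whether the limit has been hit.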

Singletons

All Upstart jobs are singletons by default; this means that only one instance of that job may be running at any one time. To illustrate, let’s continue using the sobby job we defined above and start it:


  # start sobby
  sobby running (start), process 14977

Ok, we have a single instance of the sobby job running, and we can interrogate the status of that:


  # status sobby
  sobby running (start), process 14977

Now what happens if we (or someone else) try to start another copy?


  # start sobby
  start: cannot start 'sobby': Already running
  zsh: exit 1   start sobby

This is the most sensible and sane default: it saves you having to worry about locking between services and, most importantly, means that you can treat failures to obtain resources as true errors.

For example, if you request a D-Bus name and don’t get it, or attempt to bind to a socket and fail, you can treat that as an error since you know the service manager is already ensuring you’re a singleton. This means that you won’t silently pretend everything’s ok, and thus won’t hide problems.

Instance jobs

But what if you do want to be able to run multiple copies of the job? Upstart supports this through instance jobs, which may have multiple copies running. As well as being identified by the shared job name, each instance is also identified by a second-level instance name.

The instance name for each instance of a job must be unique within that job. Attempting to start another instance with an already used name will return an already running error again.

Thus the usual method for defining an instance name is by using variables from the job environment, which you’ll recall come from sources including the start request.

Let’s use the getty job we defined in the last post and turn that into an instance job:


  instance $TTY
  exec /sbin/getty 38400 $TTY

The instance keyword is the new addition; this defines the name for each instance of the job. Setting it to an ordinary string wouldn’t be much help, since there could only be one unique expansion, and you’d be back to a singleton job again; so we define it using variables from the job’s environment, which will be expanded.

In this case, we can have an instance of the job for each unique value of the $TTY variable. This makes sense since this is also what we pass to getty. This means that Upstart is still able to provide the guarantee that another getty won’t be running with the same tty.

All that we need do is pass the value of the TTY environment variable when we start or stop the getty job:


  # start getty TTY=tty1
  getty (tty1) running (start), process 15001
  # start getty TTY=tty2
  getty (tty2) running (start), process 15006

And if we try and run another copy with the same TTY variable, we’ll still get already running:


  # start getty TTY=tty1
  start: cannot start 'getty': Already running
  zsh: exit 1   start getty TTY=tty1
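
The uniqueness guarantee can be modelled in Python as a table keyed on the expanded instance name. This is a sketch, not how Upstart is implemented: the Job class and the AlreadyRunning exception are hypothetical names, and Python's string.Template merely stands in for Upstart's own variable expansion.

```python
from string import Template


class AlreadyRunning(Exception):
    pass


class Job:
    """A job whose instances are keyed by an expanded instance name."""

    def __init__(self, name, instance=""):
        self.name = name
        # An empty instance string always expands to "", so a job
        # without an instance stanza behaves as a singleton.
        self.instance = Template(instance)
        self.running = {}  # instance name -> environment

    def start(self, **env):
        key = self.instance.substitute(env)
        if key in self.running:
            raise AlreadyRunning(f"cannot start '{self.name}': Already running")
        self.running[key] = env
        return key

    def stop(self, **env):
        key = self.instance.substitute(env)
        del self.running[key]


getty = Job("getty", instance="$TTY")
getty.start(TTY="tty1")
getty.start(TTY="tty2")
try:
    getty.start(TTY="tty1")
except AlreadyRunning as err:
    print(err)
```

A singleton such as sobby is simply the degenerate case: with no variables in the instance name there is only one possible key, so the second start always fails.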

There’s no built-in way to allow unlimited instances, since these would tend to eventually consume all available resources. Since any service or task needs to operate on something, or even just write something, you’ll need some kind of locking and something in the job environment to tell it what to work on or write. If you do manage to come up with a truly unlimited instance job, you could do it trivially by passing a UUID=$(uuidgen) variable and instancing on that.

In the next post, I’ll cover one of the major differences between Upstart and other service managers: events!