
Upstart 0.5: Job Lifecycle

Next month I am hoping to release Upstart 0.5.0, the culmination of almost a year’s worth of work on it.  Comparatively, the version that shipped in edgy (0.2.x) was simply an essay to figure out the basics, and the version in feisty through hardy (0.3.x) a first draft.  The new version has been stripped back to the very basics and rebuilt to correct the problems we found with the earlier versions, and to make sure it can handle real world uses as simply and elegantly as possible.

Over the next few weeks, I’ll be writing about the new version; both how it has improved from previous versions and how it compares to what else is out there.

Introduction

First we’ll look at how Upstart allows you to manage the lifecycle of the services and tasks (collectively, jobs) that you wish to run.  We’ll use the D-Bus daemon as an example service, simply because it’s a modern, well-behaved service that we’re all familiar with.

With SystemV RC, we would have had a single /etc/init.d/dbus file accepting both start and stop as arguments. It may have looked something like this:


case "$1" in
    start)
        start-stop-daemon --start --pidfile /var/run/dbus.pid /usr/sbin/dbus-daemon
        ;;
    stop)
        start-stop-daemon --stop --pidfile /var/run/dbus.pid
        ;;
esac

As you’re well aware, the simple act of starting a daemon and stopping it again is not so simple this way. You nearly always end up requiring some kind of helper like start-stop-daemon, and you rely on accurate PID files and the like.

Upstart, like just about every other modern service manager (but strangely, not SMF), takes care of all of this hard work for you. Instead of defining how to start and stop a service you just define what to start. Here’s how you’d define the same service in Upstart:


exec /usr/sbin/dbus-daemon

Setup and teardown

Of course, we all know that no service definition is ever that simple. I massively simplified the SystemV example for the purposes of documentation. In reality, we frequently need to do various things to set up the system for the daemon and clean up again afterwards. The original start shell code probably looks more like this (and even now, I’m simplifying for space):


mkdir /var/run/dbus
chown messagebus.messagebus /var/run/dbus

/usr/bin/dbus-uuidgen --ensure

start-stop-daemon --start --pidfile /var/run/dbus.pid /usr/sbin/dbus-daemon

We need a directory for socket files, etc. and to create the machine id if missing. And likewise, to shut it down we need to clean up:


start-stop-daemon --stop --pidfile /var/run/dbus.pid

rm -rf /var/run/dbus

And this is where most init replacements fall down (especially launchd). In fact, ironically, you’ll often find the developers using their minimal service definitions when they talk about how fast their system can boot. You can boot really fast if you don’t start anything properly.

Obviously I wouldn’t be pointing this out if Upstart didn’t allow you to do this properly; we’ll extend our minimal service definition to include the set up and tear down code necessary.


pre-start script
    mkdir /var/run/dbus
    chown messagebus.messagebus /var/run/dbus

    /usr/bin/dbus-uuidgen --ensure
end script

exec /usr/sbin/dbus-daemon

post-stop script
    rm -rf /var/run/dbus
end script

Before, we just defined one process in a job’s lifecycle, known as the main process. Our new definition defines two more, the pre-start and post-stop processes. We’ve chosen to define them as shell scripts embedded in the definition; we could have defined them as binaries to execute if we preferred (using pre-start exec), and we could have defined the main process as a script (using script...end script).

As their name suggests, these processes are run before the main process is started and after it has been stopped respectively. In fact, Upstart guarantees more than that:

  • For every time that the job is started, the post-stop process will be run.
  • For every time that the main process is run, the pre-start process will have been completed successfully first.

It might seem a little strange that the post-stop process will always run but the pre-start process doesn’t have as strong a guarantee. This is because it’s possible for the job to be stopped immediately after it is started. Should that happen, Upstart will not run the main process since there’s no need, and therefore will also not run the pre-start process; however to ensure the system is clean, it always runs the post-stop process.

These guarantees also provide sane restart behaviour. If you restart a job, the main process is killed, the post-stop process is run, then the pre-start process is run again before the main process. If you cancel a restart (by stopping the job again) after the post-stop process has been run, it will always be run again.
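
The two guarantees can be modelled with a toy job runner. This is only a sketch in Python, not Upstart’s implementation: run_job and its stopped_immediately flag are hypothetical names, and the behaviour on a failed pre-start is inferred from the guarantees above rather than taken from the Upstart source.

```python
def run_job(pre_start, main, post_stop, stopped_immediately=False):
    """Run one start/stop cycle and return the names of the processes that ran.

    pre_start returns True on success; main and post_stop are plain callables.
    """
    ran = []
    try:
        if not stopped_immediately:
            ran.append("pre-start")
            if pre_start():          # main only runs after a successful pre-start
                ran.append("main")
                main()
    finally:
        ran.append("post-stop")      # runs for every start, whatever happened above
        post_stop()
    return ran


if __name__ == "__main__":
    print(run_job(lambda: True, lambda: None, lambda: None))
    print(run_job(lambda: True, lambda: None, lambda: None, stopped_immediately=True))
```

The try/finally is the whole point: whichever path the start takes, the clean-up step is reached exactly once.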

Spawned, Running and Killed

Upstart makes important distinctions in the state of the main process: it does not assume that just because the exec() syscall has succeeded the process is in a suitable running state. Likewise, it does not assume that just because the kill() syscall has succeeded the process is no longer running.

The latter is easy to understand: delivering the TERM signal to a running process normally just invokes its own termination handler, which may perform any number of activities before cleanly shutting down. Upstart waits for the actual child signal signifying termination before running the post-stop script; until that point the process is considered merely “killed”. Obviously too long in the “killed” state means Upstart delivers the much more hardcore KILL signal, but that timeout is adjustable.
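
The TERM-then-KILL escalation is easy to sketch outside of init. The following Python is an illustration only, not Upstart code; stop_process and its kill_timeout parameter are names made up for this example.

```python
import signal
import subprocess
import sys


def stop_process(proc, kill_timeout=5.0):
    """Send TERM, wait for the child to really exit, escalate to KILL on timeout.

    Returns the name of the last signal that had to be sent.
    """
    proc.send_signal(signal.SIGTERM)       # the process is now merely "killed"
    try:
        proc.wait(timeout=kill_timeout)    # wait for the actual termination
        return "TERM"
    except subprocess.TimeoutExpired:
        proc.kill()                        # SIGKILL cannot be caught or ignored
        proc.wait()
        return "KILL"


if __name__ == "__main__":
    # A child that ignores TERM, so the escalation path is exercised.
    child_code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    stubborn = subprocess.Popen([sys.executable, "-c", child_code],
                                stdout=subprocess.PIPE)
    stubborn.stdout.readline()             # wait until the handler is installed
    print(stop_process(stubborn, kill_timeout=1.0))
```

Note that the function only reports success after wait() returns, which mirrors the distinction between "signal delivered" and "process actually gone".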

The former is harder to understand since the new binary is in memory and is probably at least initialising, but that’s the point: it isn’t yet ready for other jobs to use. In the SystemV script, this wasn’t an issue, since we could generally rely on daemons (well behaved ones anyway) to follow the convention that they should not fork() until initialisation was completed successfully.

Since Upstart forks and supervises its own processes, it generally prefers that daemons do not fork() and remain as the pid they were given when started. So how do jobs signify that they are ready? There are a few ways:

  • By forking as before. As I’ve talked about before, Upstart can supervise processes that fork, and it will wait for that to happen before assuming the process is ready.
  • By raising the STOP signal. For jobs marked with expect stop, Upstart will wait for this, and once it is received will send the process the CONT signal and assume that it is now ready.
  • By registering a D-Bus name. An early 0.5.x release will wait for a particular D-Bus name to be registered, and not assume that the job is ready until it has done so.
  • By calling listen(). Again, planned for an early 0.5.x release, Upstart will use the same mechanism it uses to follow forks to watch for the listen() system call.
  • With a post-start script, more on that in a second.
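
The STOP/CONT handshake from the list above can be sketched from the supervisor’s side. This is a minimal Python illustration under the assumption that the child raises SIGSTOP on itself when ready; spawn_and_wait_for_stop is a hypothetical helper, not part of Upstart.

```python
import os
import signal
import sys


def spawn_and_wait_for_stop(argv):
    """Spawn argv and block until the child raises SIGSTOP on itself.

    Returns the child's pid once it has been resumed with SIGCONT,
    or None if it terminated before signalling readiness.
    """
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    # WUNTRACED makes waitpid() report stopped children, not just dead ones.
    _, status = os.waitpid(pid, os.WUNTRACED)
    if os.WIFSTOPPED(status) and os.WSTOPSIG(status) == signal.SIGSTOP:
        os.kill(pid, signal.SIGCONT)   # let the now-ready daemon carry on
        return pid
    return None                        # child exited or was killed instead


if __name__ == "__main__":
    ready_child = [sys.executable, "-c",
                   "import os, signal; os.kill(os.getpid(), signal.SIGSTOP)"]
    pid = spawn_and_wait_for_stop(ready_child)
    print("ready:", pid is not None)
    if pid is not None:
        os.waitpid(pid, 0)             # reap the child once it finishes
```

The attraction of this scheme is that the daemon needs one line of code and no knowledge of who is supervising it.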

The last two processes

I’ve introduced the three processes that most jobs will tend to use, but there’s also another two which will be somewhat rarer but are probably the most powerful of them all. These are the post-start and pre-stop processes, and they’re interesting because they’re run while the main process is running.

The post-start process, as its name suggests, is run after the main process has been spawned and any event we were expecting (see above) has happened. The job will not be considered ready until the post-start process completes, thus a common use for it is to interrogate the daemon or send it commands it can only act on once it’s running.

The pre-stop process is run when a request to stop the job occurs (this means it is not run if the main process terminates on its own), and the process is not killed until it finishes. It receives information about the request, and can cause that request to be ignored (thus leaving the job running). Another common use is to send the daemon commands before it receives the TERM signal.
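
Putting all five processes together, the ordering described above can be written down as a small model. This is a toy sketch in Python with hypothetical names (lifecycle and its two flags); it only encodes the rules stated in this post, not Upstart’s real state machine.

```python
def lifecycle(pre_stop_allows_stop=True, main_exited_on_its_own=False):
    """Return the order in which a job's processes would run (toy model)."""
    # The job only counts as ready once post-start has completed.
    order = ["pre-start", "main", "post-start"]
    if main_exited_on_its_own:
        # Nobody asked the job to stop, so pre-stop is skipped entirely.
        order.append("post-stop")
        return order
    order.append("pre-stop")           # runs while main is still alive
    if not pre_stop_allows_stop:
        return order                   # request ignored, job keeps running
    order.append("post-stop")          # main is killed, then clean-up runs
    return order


if __name__ == "__main__":
    print(lifecycle())
    print(lifecycle(pre_stop_allows_stop=False))
    print(lifecycle(main_exited_on_its_own=True))
```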

Next…

So that’s a look at the ways we can define the lifecycle of an Upstart job. In the next couple of posts we’ll look at the environment and session of jobs, and then at matters such as respawning and singletons.

Ubuntu Brainstorm Announced!

The Ubuntu QA community have put together an awesome new resource for Ubuntu users and developers - Ubuntu Brainstorm.  This allows you to suggest ideas for improvements, and to vote on the ideas others have suggested.

We have of course been inspired by the IdeaStorm site from our good friends at Dell but modified the concept to fit our needs.

The development team can now take the pulse on the most pressing user issues and propose the ideas as topics at the Ubuntu Development Summits and ultimately as specifications. Ubuntu development is in turn driven by detailed specifications written up in the wiki and tracked as blueprints in Launchpad.

An idea on brainstorm can easily be linked to a Launchpad blueprint as well as to a bug or a forum discussion thread. In this way we expect to bridge the locations where ideas are often submitted now, as forum posts or bug reports, with the blueprint format they should be expressed in to be implemented.

DH-Hell

I hate DHL.  This says it all.

Web 2.0 Service Pack 1

Why do I need to tell each and every web service who my friends are?  Why can’t last.fm, flickr and twitter just get this information from Facebook?  Likewise, why do I have to tell them all where I live, how old I am, what my website is, etc.?

How to (and why) supervise forking processes

Yesterday’s celebratory blog post demonstrated that Upstart is now able to supervise processes that fork into the background, as most daemons do. Now that the code has undergone a little more testing, and been pushed into the archive, it’s worth explaining a little bit more of the background as to the how, and why, we do this.

The why is easiest to answer first. Daemons are normally written to fork, usually twice; this detaches them from the terminal, process group and session that they were spawned from so that they remain running after the user logs out. The fork isn’t just mechanism though; over time a convention has emerged that daemons don’t go into the background until their initialisation is complete and they’re ready to receive connections, if that’s their bag.

Simply adding an option to remain in the foreground might appear to eliminate the need to deal with the problem, but this also takes away the notification that the daemon is ready for use. Over time this signal can be replaced with other notifications: registering a known D-Bus name, or simply raising SIGSTOP; but these require code changes that need to be agreed with upstream first. Making code changes also assumes that we have the code. Whether we like it or not, sysadmins will often have the need to run proprietary daemons — or even simply older versions of software where the patch is too invasive.

So that’s why we have to do it, now how do we?

This is one of the reasons that building the service supervisor into init, rather than having it as a separate process, makes sense. Init has a few special kernel-provided buffs, one of which is that orphaned processes are reparented to it. When you run a daemon from the command-line, the process is initially your child; it forks once and the parent dies, the new child is now orphaned, and thus reparented to init. (Most daemons now run setsid and fork a second time. This is to ensure that if they open a tty device, they don’t unexpectedly become its owner.) Init, like any other process, receives notification about its children through wait so will know when daemons terminate; the “must have” of supervision.

So if all daemons are our children we are notified when they terminate and why; we can compare their exit status or signal against a list of known good ones, and choose whether we need to respawn the dead job or mark it as stopped normally.
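
As a rough illustration of that decision, here is a Python sketch that classifies a wait() status. The two "known good" sets are hypothetical defaults chosen for the example; a real supervisor would make them configurable per job.

```python
import os
import signal

NORMAL_EXIT_CODES = {0}               # hypothetical "known good" exit codes
NORMAL_SIGNALS = {signal.SIGTERM}     # hypothetical "known good" signals


def classify(status):
    """Map a wait() status to 'stopped' (normal) or 'respawn' (unexpected)."""
    if os.WIFEXITED(status):
        ok = os.WEXITSTATUS(status) in NORMAL_EXIT_CODES
    elif os.WIFSIGNALED(status):
        ok = os.WTERMSIG(status) in NORMAL_SIGNALS
    else:
        ok = False                    # stopped/continued is not a termination
    return "stopped" if ok else "respawn"
```

Feed it the status returned by os.waitpid(); a daemon that segfaults comes back as "respawn", one that exits with 0 as "stopped".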

This isn’t enough though, all we get is the process id of the dead child. We still need to relate that back to a job somehow. One way to do that is to use waitid with the WNOWAIT flag, leaving the process on the table so we can examine /proc to find out more about it. This seems like quite a reasonable approach, we can then match a process to a job by details such as what binary it was actually running. Unfortunately this only works for singleton processes where we’re guaranteed that only one of them exists, both at the job level and at the process-level itself; should the process fork, even to run another child, we could accidentally consider it to have died. Daemons need to be able to run their own children, or even have pools of them to use; and we also need to be able to run multiple copies of daemons where we can support it.
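
For the curious, the WNOWAIT trick looks roughly like this in Python on Linux. It is a sketch only: peek_then_reap is a made-up name, and reading /proc/<pid>/comm is my assumption about which /proc entry is still useful for a dead-but-unreaped process.

```python
import os
import sys


def peek_then_reap(pid):
    """Wait for pid to die without reaping it, look it up in /proc, then reap.

    Returns (command_name, wait_status); command_name is None when /proc
    cannot be read.
    """
    # WNOWAIT leaves the child "on the table" as a zombie after it exits.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    try:
        with open(f"/proc/{pid}/comm") as f:
            name = f.read().strip()
    except OSError:
        name = None
    _, status = os.waitpid(pid, 0)    # now collect the child for real
    return name, status


if __name__ == "__main__":
    child = os.posix_spawn(sys.executable, [sys.executable, "-c", "pass"],
                           os.environ)
    print(peek_then_reap(child))
```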

So we really do need to know the process id of the actual daemon process we should be supervising. Unfortunately no method of passing this back to init, even a relatively common one like writing it to a pid file, is sufficiently standard or reliable to do this kind of work.

Ideally the kernel would just tell init when a process was reparented to it, providing both the child process id and that of its previous parent. Such a notification doesn’t exist today, though getting one into the kernel mainline would be a nice project; difficult if there’s only one implementation using it.

If we can’t have that, a syscall that would allow us to watch a process and find out when it forks would be the second-best thing. We’d have the previous process id since we were watching it, and we’d hopefully be able to obtain the new child process id from this.

Happily that syscall exists, and I suspect you use it all the time if you’re a developer; it’s a bit of a mad leap to using it inside init, but as you can see, it works rather nicely. All we need do is watch the process, and follow it each time it spawns a new child. We stop watching as soon as we have followed twice (once if a different option is used), or if the process runs a different binary by itself. And thus we can know the process id of daemons we spawned, even if they attempt to detach from their parent process (they’ll just be reparented to init anyway).

What’s the syscall? Oh, hmm, is that the time? Got to go! Alright, it’s ptrace.

Supervising forking processes


quest /tmp# cat test.c
#include <sys/types.h>

#include <stdlib.h>
#include <unistd.h>

int
main (int   argc,
      char *argv[])
{
        pid_t pid;

        pid = fork ();
        if (pid > 0)
                exit (0);

        pid = fork ();
        if (pid > 0)
                exit (0);

        pause ();
        exit (0);
}
quest /tmp# gcc -Wall -g -O0 -o test test.c

quest /tmp# cat /etc/event.d/test
wait for daemon
exec /tmp/test

quest /tmp# start test
test (#0) goal changed from stop to start
test (#0) state changed from waiting to starting
event_new: Pending starting event
Handling starting event
event_finished: Finished starting event
test (#0) state changed from starting to pre-start
test (#0) state changed from pre-start to spawned
process_spawn: Spawned main process 6380 for test (#0)
Active test (#0) main process (6380)
test (#0) main process (6380) forked new child 6381
test (#0) main process (6381) forked new child 6382
test (#0) state changed from spawned to post-start
test (#0) state changed from post-start to running
event_new: Pending started event
Handling started event
event_finished: Finished started event

On Metadata

The last release (Ubuntu 7.10) was the first in which we shipped Tracker enabled by default; this service runs in the background and indexes all of your files, storing information about them in a metadatabase which can subsequently be searched. The two main ways of searching are through the deskbar-applet (press Alt+F3) and within the nautilus file manager (press Ctrl+F).

That’s all well and good, but since we now have a metadatabase and indexer, what else can we do with it?

The first thing that comes to mind is improve those applications that attempt to maintain their own metadatabase; those that tend to be the primary apps that we use because they manage our all-important content. I’m going to pick on Rhythmbox here since it’s our default music manager, but the same ideas can be applied to our default photo manager, F-Spot, or any other application concerned with content.

A no-brainer is that Rhythmbox no longer needs to worry about walking directory trees, keeping inotify watches on them, identifying media files, etc. Tracker already does all of that. All we need to ensure is that tracker collects all of the metadata that Rhythmbox will need to start with — we expect it to come along and add additional metadata, such as the last time I played the track, where the album cover thumbnail is stored, etc.

Another thing you get to eliminate is the concept of “the Library”. Your entire home directory is already indexed, why care about partitioning it? We can just show the user all of their music, or all of their photos. Immediately. With no need to import from one arbitrary location on disk to another.

Tracker should then grow removable device support, indexing files on removable devices just as it does on the primary filesystem; but keeping mount-relative paths to the files and remembering particulars such as serial number, label, etc. for the device they were found on. This has immediate benefit for Tracker anyway, I can search for a presentation and I’ll be told which USB Key I wrote it to so I can find it again — I’m terrible for losing presentation slides after I’ve given the associated talk.
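
To make the idea concrete, here is one way such an index entry could be shaped. This is purely a sketch in Python; IndexedFile, locate and the field names are hypothetical and say nothing about how Tracker actually stores its data.

```python
from dataclasses import dataclass
import posixpath


@dataclass(frozen=True)
class IndexedFile:
    device_serial: str    # identifies the device across re-plugs
    device_label: str     # human-readable name to show the user
    relative_path: str    # path relative to the device's mount point


def locate(entry, mounted):
    """Return the file's current absolute path, or a hint naming the device.

    `mounted` maps device serial numbers to their current mount points.
    """
    mount_point = mounted.get(entry.device_serial)
    if mount_point is None:
        return f"on device '{entry.device_label}' (not plugged in)"
    return posixpath.join(mount_point, entry.relative_path)


if __name__ == "__main__":
    slides = IndexedFile("SN123", "TALKS", "slides/upstart.odp")
    print(locate(slides, {"SN123": "/media/usb0"}))
    print(locate(slides, {}))
```

Because the path is stored relative to the mount point, the entry stays valid wherever the device happens to be mounted next time.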

All Rhythmbox then needs to do is query Tracker for removable devices containing music, and show them as icons in the panel; the contents are already indexing — or if you’ve already used that device, indexed (no more wait for it to index my 40GB media player every single time I insert it). Since there’s just one metadatabase behind this, you may as well add an “All Your Music” option to the top which amalgamates the collection of music on your filesystem and removable devices, eliminating duplicates; this would be the thing you’d share, getting rid of yet another bug.

We then don’t need import dialogs. If I plug a media player in (or a camera, this applies equally there), the content immediately shows up in my browser. The only question we need ask the user is whether they wish to add the music on the device to their local collection, and that can be done inline in the window rather than with an obtrusive dialog. For F-Spot the experience would be that on plugging in a camera full of photos, the main F-Spot window would open with the photos already in place (or appearing) in the rest of your collection and an “add these to your collection?” bar at the top; since you have the full app, dealing with adjusting images on import, or removing them entirely, is much easier than fiddling inside an option-filled dialog.

The only other backends we’d need would be for remote media such as shared music (why isn’t there a shared photos standard yet?), online content such as last.fm or flickr and devices that don’t act like disks; there are still some media players and cameras out there which are designed around import/export APIs.

Yet another change of blog software

I’ve not really been having much luck with blog software; I’ve found that the harder the software is to use, or the more maintenance it requires, the less I care about blogging.

Ironically, perhaps the easiest time was with Livejournal; it’s easy to set up, maintain and post to simply because it’s a hosted service. Unfortunately you don’t get a great deal of control over the output, and any kind of extras you want to add are out of the window. (Without paying somebody, anyway).

I moved from that to a Pyblosxom blog hosted on my own webserver; this had all the control over the output that I needed, with lots of optional extras. Management was a pain though, relying on me logging in to the web server and editing files directly — since nobody, especially me, likes writing raw HTML I’d chosen Textile as the formatting plugin but could never remember the formatting codes. And dealing with comments and spam? Forget it.

The next change in my experiments came when I moved my blog to being hosted by a friend, and he suggested Typo; the idea here was that it’s a drop-in webapp, so has an admin interface for writing posts, managing existing ones and dealing with comments and spam. Unfortunately it’s one of the slowest and least stable pieces of software I’ve seen, it’s really put me off learning Ruby on Rails as a result! And as a piece of software, it’s pretty inflexible; idiotic limitations made dealing with spam comments an issue and niceties like trackbacks almost impossible.

So I’m back to self-hosting my website and blog again, and at the recommendation of just about everybody, have installed WordPress to do it.

Since this means a change of backend software, this will almost certainly have spammed Planet again; sorry about that. I blame the original Planet author, really, I do.

Also I’ve adjusted my feed URL on Planet Ubuntu; previously this was a technology-only feed, but I’ve received far more complaints about the lack of posts about my flying exploits than when they were present — so the majority appear to want them!

UI Design Mistakes

A common mistake of user interface design is to come up with a clever solution to a common user problem, when you should work out a way to remove that problem in the first place.

My example here is the password entry dialog, in particular the GNOME Screensaver one.

The common user problem is that their password is rejected, even when typed perfectly, because the Caps Lock key is on.

The clever solution was to display in the dialog that the Caps Lock key was on, and thus hinting to the user why it might have been rejected.

The better solution would be to ignore the state of Caps Lock in all password entry dialogs, so it doesn’t matter whether it’s on or off.
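
One way to do that, sketched in Python: undo the effect of Caps Lock on the typed text before checking it. undo_caps_lock is a hypothetical helper, and it assumes Caps Lock simply inverts the case of letters, which is true of common layouts but not all of them.

```python
def undo_caps_lock(typed, caps_lock_on):
    """Return what the user would have typed had Caps Lock been off."""
    # With Caps Lock on, plain letters come out upper case and shifted
    # letters come out lower case, so swapping case reverses the effect.
    return typed.swapcase() if caps_lock_on else typed


if __name__ == "__main__":
    print(undo_caps_lock("sECRET1!", True))
    print(undo_caps_lock("Secret1!", False))
```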

Why I choose Bazaar (a history of revision control)

Like any sensible software developer, I have a close relationship with revision control systems. In my previous job, I was an SCM Engineer (see Software configuration management) which meant I had an even closer relationship than most, since we were running the CVS servers and actively using them to track changes and deployments.

We all know, deep down, that revision control systems shouldn’t exist. This kind of thing should be inherent in the design of the operating system, through standard file and filesystem formats. The OLPC interface is making some headway towards that, but for the rest of us, it means using a revision control tool throughout the development process.

Unfortunately, even though the tool is expected to be the most-used command on your system, very few of them are particularly easy to use. Thus there’s a large learning curve, and people become religious about their choice since they have invested significant time in using it.

Just to spice the mix up, not only will people religiously defend their choice of revision control system, but they’ll do so while actively hating it.

In the beginning there was CVS and we all thought that it was pretty good. It was based on the simpler RCS and shared a file-format with it, but introduced control of directory trees and remote operation.

Actually, in reality, CVS wasn’t that good. Its command set could be a little strange and inconsistent (e.g. it’s not possible to diff between two dates on a branch); the support for branching assumed that all branches would be merged into the mainline, and only once; and nobody ever really knew how to create a new project in a repository (tip: cvs import is wrong).

But we all used it anyway, and we muddled through. It did have some good features; it was simple, fast and pretty reliable–when it did break, you could usually fix the repository yourself. And most importantly of all, we understood how to drive it.

And so it was for many years, until Subversion (SVN) came along. Subversion intended to be “a better CVS”; perhaps this goal should have made us suspicious at the time, since CVS was already being a pretty good CVS by itself. Unfortunately we hated CVS so much we flocked to the new system in hope.

In hindsight, Subversion didn’t really improve on CVS much at all. In fact, arguably, the only real improvement was the addition of atomic commits (in CVS, each commit is per-file, so it’s manual labour to work out which change was made to two files at the same time).

(Its support for branching, tagging, copying, renaming, etc. was no better than CVS’s when done in the repository by hand.)

The cost of this single new feature was a much more complicated interface (with two separate commands), a backend that tended to break down weekly and a lethargic slowness to its operation.

Most people I know now justify their use of Subversion instead of CVS by “Subversion is maintained, CVS isn’t” which is a somewhat self-fulfilling justification.

While the mass conversion to Subversion, and ensuing disappointment and frustration, was going on; something new appeared on the horizon: Arch.

Arch was different, it broke one of the core assumptions of revision control, that of the repository as a cathedral. In CVS, and Subversion like it, if somebody wants to modify your code (even if on a branch) you need to give them access to your own repository. In some cases (especially with CVS), vast access control and permission structures would be in place to ensure proper behaviour.

With Arch, you don’t; all you need to give to anyone is read access. Anybody can make their own branch by copying yours and committing to their own copy.

This model also necessitated fixing a long standing problem that CVS had; Arch has repeatable (smart) merging. If you merge from a branch, you can merge again later, and again, and again.

Arch made this possible through each commit (changeset) having a globally unique identifier; made from the branch’s own globally unique identifier and the changeset number in the branch.
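
Why do unique identifiers make merging repeatable? A toy model in Python shows the idea. This is not how Arch stores anything: the Branch class and the "branch#number" identifier format are invented for the illustration, and a "changeset" here is just its identifier.

```python
class Branch:
    """Toy branch: a history of globally unique changeset identifiers."""

    def __init__(self, branch_id):
        self.branch_id = branch_id     # assumed to be globally unique
        self.own_commits = 0
        self.history = []              # changeset ids in the order applied

    def commit(self):
        """Record a new changeset named after this branch and its number."""
        self.own_commits += 1
        changeset = f"{self.branch_id}#{self.own_commits}"
        self.history.append(changeset)
        return changeset

    def merge(self, other):
        """Apply only those changesets from `other` not already present."""
        seen = set(self.history)
        new = [c for c in other.history if c not in seen]
        self.history.extend(new)
        return new


if __name__ == "__main__":
    mine = Branch("me@example.com/project--main")
    yours = Branch("you@example.com/project--feature")
    mine.commit()
    print(yours.merge(mine))   # picks up my first changeset
    mine.commit()
    print(yours.merge(mine))   # picks up only the second one
    print(yours.merge(mine))   # nothing left to merge
```

Since every changeset can be recognised wherever it ends up, merging again later only ever brings in what is missing.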

Unfortunately while this was a massive step in a new direction, Arch had an absolutely terrible user interface. Its command list was terrifying with over 100 commands, many of which had multiple word names (tla set-tree-version). It exposed too many of its own innards, and expected you to learn them. It also forced baroque file naming semantics on its users and strange policy (thou shalt not commit without first running “make clean”).

Efforts were made to improve Arch’s user interface through projects such as baz, but they were always to be doomed from the start.

We’ve since seen an explosion of new revision control systems; Monotone, Darcs, Git and Bazaar.

What’s especially interesting is the commonality between these systems. They are all “distributed” like Arch, though they also all discard the strange “unique branch identifier” convention and instead simply assign a unique identifier to each file or commit.

This means that they all support personal branches, and by necessity all support repeatable (smart) merging.

So how do they differ, what are their killer features and killer problems?

Monotone is all about repository integrity, ensuring that every commit is both authorised and intact. It pays for this with a severe lack of speed.

Darcs is based around a “theory of patches”: a branch is made up not of its history but of the collection of patches in it. Unfortunately this often breaks down, and darcs frequently gets stuck calculating even trivial and commonplace branch models.

Git is very strange to me; its killer feature appears to be the speed at which it can handle very large trees, but the interface is as insane as Arch’s was. It is heavily optimised for the “I only apply patches” development model, at the expense of ordinary development models (it shares an issue with Arch where calculating annotations on an individual file is an expensive operation).

What about Bazaar? Its killer feature is that it is designed to work the way you do. The command set is relatively small, and each of them works in the most obvious manner. It also supports plugins so that you can always implement your own workflow.

Of all the revision control systems, it’s the only one (that I’m aware of) that supports both distributed and centralised workflows (and lets you go distributed when you need to, e.g. when you’re on a plane).

Here’s a few examples of how Bazaar’s command set works the way you do. To start managing some code in bzr:

$ cd myproject
$ bzr init

To add the files, copy in your usual .bzrignore file and just add everything:

$ cp ~/bzrignore .bzrignore
$ bzr add
added foo.c
added bar.c

Check the output for mistakenly added files, adjust .bzrignore and remove the file with bzr rm.

A common operation is realising that the commit you’re about to make should really go on a new branch for now:

$ cd ..
$ cp -a myproject myproject-foo
$ cd myproject-foo
$ bzr commit

A copy of a Bazaar branch is a different branch; you can commit to it separately. There’s a bzr branch command for it too (which deals with issues such as bound branches, checkouts, etc.) but it’s nice to demonstrate that Bazaar does what you’d expect even when you don’t use its own commands.

Pulling changes from another branch (where you haven’t made any modifications yet) is easy:

$ bzr pull ../myproject

As is merging (when your branches have diverged):

$ bzr merge ../myproject

One particularly nice feature is that after a merge, you see the merge as a single commit and it can be treated as such; but it also has the set of merged commits indented under it–you can examine these as individual commits as well!

What’s the downside of Bazaar? Well, it’s not the fastest system (but by no means the slowest), for small to medium sized projects this is never an issue but may be for extremely large projects–fortunately the developers are improving its performance all the time!

But that doesn’t matter; it is, honestly, the first revision control system that I don’t hate.