On Metadata

The last release (Ubuntu 7.10) was the first in which we shipped Tracker enabled by default; this service runs in the background and indexes all of your files, storing information about them in a metadatabase which can subsequently be searched. The two main ways of searching are through the deskbar-applet (press Alt+F3) and within the nautilus file manager (press Ctrl+F).

That’s all well and good, but since we now have a metadatabase and indexer, what else can we do with it?

The first thing that comes to mind is to improve those applications that attempt to maintain their own metadatabase. These tend to be the primary apps that we use, because they manage our all-important content. I’m going to pick on Rhythmbox here since it’s our default music manager, but the same ideas can be applied to our default photo manager, F-Spot, or any other application concerned with content.

A no-brainer is that Rhythmbox no longer needs to worry about walking directory trees, keeping inotify watches on them, identifying media files, etc. Tracker already does all of that. All we need to ensure is that Tracker collects all of the metadata that Rhythmbox will need to start with; we expect Rhythmbox to come along and add additional metadata of its own, such as the last time I played the track, where the album cover thumbnail is stored, etc.
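As a rough illustration of that division of labour, here is a minimal Python sketch. It does not use Tracker's real API: an in-memory SQLite database stands in for the shared metadatabase, and the `files` and `app_metadata` tables, their columns and all of the function names are made up for the example.

```python
import sqlite3


def make_index():
    """Create a stand-in for the shared metadatabase (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, mime TEXT, artist TEXT, title TEXT)"
    )
    db.execute(
        "CREATE TABLE app_metadata (path TEXT, key TEXT, value TEXT, "
        "PRIMARY KEY (path, key))"
    )
    return db


def indexer_add(db, path, mime, artist=None, title=None):
    """What the indexer would do when it finds (or re-reads) a file."""
    db.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
        (path, mime, artist, title),
    )


def all_music(db):
    """What the music player would do instead of walking directory trees."""
    rows = db.execute(
        "SELECT path, artist, title FROM files "
        "WHERE mime LIKE 'audio/%' ORDER BY path"
    )
    return rows.fetchall()


def set_app_metadata(db, path, key, value):
    """The application adds its own metadata, e.g. the last-played time."""
    db.execute(
        "INSERT OR REPLACE INTO app_metadata VALUES (?, ?, ?)",
        (path, key, value),
    )


def get_app_metadata(db, path, key):
    """Read back application metadata, or None if nothing was stored."""
    row = db.execute(
        "SELECT value FROM app_metadata WHERE path = ? AND key = ?",
        (path, key),
    ).fetchone()
    return row[0] if row else None
```

The point of the sketch is only the split: the indexer owns the facts about files, and the application layers its own keys on top without ever scanning the disk itself.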

Another thing you get to eliminate is the concept of “the Library”. Your entire home directory is already indexed, so why care about partitioning it? We can just show the user all of their music, or all of their photos. Immediately. With no need to import from one arbitrary location on disk to another.

Tracker should then grow removable device support, indexing files on removable devices just as it does on the primary filesystem, but keeping mount-relative paths to the files and remembering particulars such as serial number, label, etc. for the device they were found on. This has an immediate benefit for Tracker anyway: I can search for a presentation and I’ll be told which USB key I wrote it to so I can find it again. (I’m terrible for losing presentation slides after I’ve given the associated talk.)
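To make the mount-relative idea concrete, here is a small Python sketch of what such an index entry might look like. It is not how Tracker stores anything; `Device`, `IndexedFile` and `locate` are hypothetical names, and the mapping from serial number to mount point is assumed to come from whatever notices devices being plugged in.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    serial: str  # identifies the device no matter where it gets mounted
    label: str   # human-readable name to show the user


@dataclass(frozen=True)
class IndexedFile:
    device: Device
    relative_path: str  # relative to wherever the device is mounted


def locate(entry, mounted):
    """Return (absolute path or None, hint for the user).

    `mounted` maps device serial -> current mount point.
    """
    mount_point = mounted.get(entry.device.serial)
    if mount_point is None:
        return None, f"on '{entry.device.label}' (not plugged in)"
    return (
        posixpath.join(mount_point, entry.relative_path),
        f"on '{entry.device.label}'",
    )
```

Because the entry never records an absolute path, the same record stays valid whether the key turns up at /media/disk today or somewhere else tomorrow, and a search can still name the device when it isn't plugged in at all.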

All Rhythmbox then needs to do is query Tracker for removable devices containing music, and show them as icons in the panel; the contents are already being indexed or, if you’ve used that device before, already indexed (no more waiting for it to index my 40GB media player every single time I insert it). Since there’s just one metadatabase behind this, you may as well add an “All Your Music” option to the top which amalgamates the collection of music on your filesystem and removable devices, eliminating duplicates; this would be the thing you’d share, getting rid of yet another bug.
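The “All Your Music” amalgamation could be as simple as the following Python sketch. How duplicates ought to be detected is an open question; here I assume, purely for illustration, that two tracks are the same if their artist, album and title match once case and surrounding whitespace are ignored. The function name and the dict-based track format are likewise invented.

```python
def all_your_music(*sources):
    """Merge several track lists, keeping the first copy of each track.

    Each track is a dict. Duplicates are detected by comparing artist,
    album and title case-insensitively (an assumption for this sketch).
    """
    seen = set()
    merged = []
    for source in sources:
        for track in source:
            key = tuple(
                track.get(field, "").strip().lower()
                for field in ("artist", "album", "title")
            )
            if key not in seen:
                seen.add(key)
                merged.append(track)
    return merged
```

Passing the local collection first means that, where a track exists both on disk and on a device, the local copy is the one that gets listed.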

We then don’t need import dialogs. If I plug a media player in (or a camera, this applies equally there), the content immediately shows up in my browser. The only question we need ask the user is whether they wish to add the music on the device to their local collection, and that can be done inline in the window rather than with an obtrusive dialog. For F-Spot the experience would be that on plugging in a camera full of photos, the main F-Spot window would open with the photos already in place (or appearing) alongside the rest of your collection and an “add these to your collection?” bar at the top. Since you have the full app, dealing with adjusting images on import, or removing them entirely, is much easier than fiddling inside an option-filled dialog.

The only other backends we’d need would be for remote media such as shared music (why isn’t there a shared photos standard yet?), online content such as last.fm or flickr, and devices that don’t act like disks; there are still some media players and cameras out there which are designed around import/export APIs.

34 Comments

  1. Peter Russell:

    And how would you stop things from appearing in your applications?

    Say I’ve downloaded a copy of LUGRadio to my desktop: As far as tracker is concerned this looks like a music file, but I definitely don’t want it to appear in my library in Rhythmbox.

    Equally I might have JPEG images for a web site I’m working on in a ~/work/website directory; I definitely won’t want these to appear in F-Spot.

    Perhaps if I put all of my music files in ~/music, and all of my photos in ~/photos…

    (In general, I think of Beagle, Tracker etc: “That would be useful if it didn’t bring my machines to a crawl at the worst possible times, and if I didn’t know where I’d put things”.)

  2. dré:

    I think you talk about Ubuntu 7.10. ;)

    Problem with removable media is that they can be modified while disconnected. I think a kind of on disk checksum or timestamp is needed to detect if a reindex is necessary.

  3. Chris Cunningham:

    Nit: Your last release was 7.10 (8.04 isn’t out yet, what with it not being 2008 yet).

    - Chris

  4. Tom:

    Spot on. Tracker in its current “User makes search -> results” state really isn’t useful to someone like me who keeps his files organised. Adding that automatic magic to the apps i depend on though suddenly takes it from “i can see why some people would like this” to “this actually makes my life easier”.

  5. sigurdga:

    This sounds great! However, I have one comment about Rhythmbox and the other music management programs. I think the all-in-one library that they have per default (which you are going to extend), limits the usage of the programs a bit too much.

    My music consists of:
    - Albums from internationally known artists
    - Albums e.g. from contemporary musicians (sometimes referred to as “pling plong”)
    - Music played by the orchestras I am playing in
    - Theme songs from TV shows
    - Sound clips of talking, someone telling a story or an adventure

    When I am listening to my music alone, I can live with skipping boring tracks, but when I’m having guests, I would like my music application to not play all kinds of audio tracks on my computer. My guests would usually only like to listen to the first category in my list.

    Sitting for hours sorting out clips into playlists, having to manually “register” new music, is not an option, and will take us some steps backwards. So my solution has been to use one media player for everything and one for music that guests can listen to. Using inotify on the music folders is very convenient.

    If the music players of the future will index all my audio clips, I would like to know of a simple way to do a high level of sorting (like I now do with my top level folders). Reusing information from the file system may work for me, but maybe not for every user…

  6. Jan Schmidt:

    Hrmn, I see a few flaws in this Utopia.
    1) I really hate watching tracker index everything every time I reboot or log in as a new user. About half the time I kill it because I get so sick of watching it churn disk, and every single time it runs it ends up dumping a core file in my home dir that I have to delete.
    2) In no reality do I want Rhythmbox to offer me every music file in the trees under my home directory as ‘My Library’ - I have gigabytes of media test files that I would rather it just ignore. The same thing applies to removable drives that happen to contain a few mp3s. Unless I’ve explicitly asked for that content to be shown in RB, I don’t want to see it.

    Oh, and of course Ubuntu 8.04 won’t ship until April next year….

  7. Chris Lord:

    Note that latest Ubuntu ships with Tracker enabled by default and updatedb happening in a daily cron… This is pretty killer for my laptop, I’ve disabled both - Not so much for the battery life (though it doesn’t help that either), but the speed of the hard-drive. I had to wait a full 10 minutes before my disk stopped churning every morning I turned it on, and during that period response would be sluggish and apps would take an age to load… Rather than integrating Tracker with everything, I think it’s more important to sort out lower-level conflicts like this first. Also, Tracker’s CPU usage isn’t great either.

  8. Lure:

    Scott, last release was 7.10, but I am sure you know. ;-)

  9. Paul Tötterman:

    “The last release (Ubuntu 8.04)”

    8.04 isn’t released yet, maybe you meant 7.04?

    BR,
    Paul

  10. Frej Soya:

    I have audio files in my home dir which should not show up in Rhythmbox.
    So only files which reside in ~/Music? (or whatever the xdg standard says).

    Also if you think about it, tracker might have one layer too much?
    Currently it handles both inotify/file changes and indexing. Maybe split the two up, so the index part can ask “what files have changed” and then get a list.

    Does it make sense that tracker should index everything needed for app B, app C and D in the same database (performance issue)?

    Inspired by leopard FSevents http://arstechnica.com/reviews/os/mac-os-x-10-5.ars/7

  11. Asbjørn Ulsberg:

    Sounds like a fabulous idea, but how will this be achieved? Will F-Spot, Rhythmbox and so on be reprogrammed to require Tracker, or will an intermediary database layer, shared by all Tracker-like applications (like Beagle) surface that all these applications can use instead? Or will the applications be specifically tailored for Ubuntu and other Tracker-enabled distributions to use Tracker in the way you depict, only in Ubuntu’s software repositories?

  12. Matt Mossholder:

    Great idea, but please don’t fall into the “my way or the highway” trap… In particular, don’t forget that people don’t always use applications in the way you would expect. For example, the Library setting still has a use if people have a centralized music directory, shared on a home network.

    F-Spot used to be guilty of this, but has gotten better by optionally storing image metadata in the images now, rather than in its own DB.

    I’d also LOVE to see applications support user-definable patterns for naming files. For example, when I import photos from my camera, I have a perl script that pulls the EXIF data, and renames them with the date, time, sequence and camera model, and drops them into my photos directory. It would be great if this could be part of the import process itself, even if that just meant having the “Import Photos” dialog capable of calling a user script.

  13. Wolfger:

    8.04 was released? When was that? ;-)

  14. Bastien:

    Except that Rhythmbox takes less than a second to read all its database and populate the treeview, and Tracker (dixit Jamie) would take 5 seconds to list all the audio files.
    So either Tracker is too slow, or it’s not the right answer (I didn’t see iTunes changing its way when Spotlight came into the Desktop on OS X…)

  15. HoellP:

    I think this idea is absolutely great and I’d love to see this in Ubuntu some time (Hardy +1, maybe?), even more when I consider the plans to use tracker instead of locate for console-based searches.
    For Hardy I think it’s more important to make trackerd absolutely reliable and as fast as possible. When I move big amounts of files to an indexed directory, tracker can totally hog my system.

    I’m not sure, but could there be a way to use tracker’s metadata for projects like TimeVault as well? I don’t know much about the mechanics of this stuff, but I guess it should be doable.
    Another program which comes to my mind is gnome-do which is a work-a-like to Katapult and is awfully slow at this stage of development.
    Once more, great idea, I hope it gets picked up.
    greets
    Paul

  16. Alessandro Delgado:

    Don’t you mean 7.10?

  17. William Lachance:

    This sounds cool and everything, but what about the case where I have a bunch of music on another computer, mounted in an NFS or Samba share? We can’t exactly have tracker searching potentially thousands of remote filesystems (in the case of a corporate network).

  18. Stijn:

    Oh I would love to see the (IMHO rather stupid) media library “imports” a.k.a. unnecessary copies go away. Please make it so! ;-)

  19. ubuntu_demon:

    8.04 isn’t out yet :)

  20. Dave Morley:

    Nice plan. It would be interesting to see if this idea gets any traction. It would save a boat load of disc space and on the plus side, if you had all three apps open, it would save on memory and power only having to access 1 file rather than 3 :)

  21. fsteinel:

    > The only other backends we’d need would be for remote media such as shared music
    > –why isn’t there a shared photos standard yet?–, online content such as last.fm or
    > flickr and devices that don’t act like disks; there are still some media players
    > and cameras out there which are designed around import/export APIs.
    A tracker bridge via http://www.conduit-project.org/ perhaps?

  22. Anonymous Coward:

    Don’t you know that Scott is ahead of times? Isn’t the whole post proof of it? :)

  23. Mårten:

    He’s posting from the future!

    But what he’s basically saying is that if Tracker is down I can’t listen to my music or watch my pictures…

  24. Matthias Liegend:

    “..tracker collects all of the metadata that Rhythmbox will need to start with — we expect it to come along and add additional metadata, such as the last time I played the track, where the album cover thumbnail is stored, etc.”

    Sounds to me like you are saying tracker would be responsible for adding play counts and thumbnails to the index. I believe it would make a lot more sense for Rhythmbox to write these to the index itself, as soon as they are triggered by the user.

    The idea, however, makes a world of sense. There is definitely duplication of function going on here. The only problem I see is beagle being left out. I wonder if it would eventually be possible to be able to drop either of the two in and have it just work with the gnome apps.

  25. claes:

    Can tracker retrieve the “latest saved files”? If so, with what delay?

  26. Philip:

    It was released next year, silly. :-P

  27. Adam Williamson:

    I don’t keep my music in my /home directory, and I don’t index it via Tracker (or Beagle or whatever) as it’s on an NFS share and it would be terribly slow.

    Always consider the corner cases. :)

  28. Stoffe:

    Yes, please! :) More or less exactly what I want. :)

    Although, you probably want to ask first before indexing any removable media just to be polite. And maybe there could be an option to, well optionally, store a copy on the index for the removable media on that very media. Not so nice for small sticks, but it would be nice for my removable HD if I plug it into my other Ubuntu system if it was already indexed… =)

  29. Diego Calleja:

    Great idea indeed

  30. Scott James Remnant:

    *cough* editorial change, it now says 7.10

  31. noshi:

    I want xmp metadata writing support!

  32. mif:

    Huh, make apps depend on Tracker, which runs cpu 100% for minutes at a time (and makes my laptop fan go nuts), crashes repeatedly on bittorrent downloads, etc. etc. Indexing needs to be REALLY robust and efficient, it’s just not good enough (yet).

  33. thorwil:

    For those concerned about unwanted files showing up / being played: there could be an “ignore” category, or even an option to only list material that is in an “include” category.

    But a default of making stuff just available will be nice for many users by removing the concept of importing. One thing less to understand and worry about.

    I think things will become tricky if it’s about sharing files with meta-data. Where would it go for a file you put online? If it’s a file-format that can’t transport such info? Maybe a wrapper file-format (/container)?

    Also, your friend might have tracks tagged “Rock’n’Roll”, but you prefer to have just “Rock”. OK, that sounds rather made up :) What I want to say: suddenly it could become interesting to know who tagged something, and to have conversions. Mapping one taxonomy to another.

  34. Brett:

    I wrote about this as an initial spec/idea for Ubuntu 8.04 at the Ubuntu Forums: http://ubuntuforums.org/showthread.php?t=598413. Thanks to 23meg, I found your post.

    I wrote about the need to integrate Tracker into Nautilus as the most important feature. Why? Because if I’m working on a file and save it, I need to be able to tag it - maybe from OpenOffice - but at the very least from Nautilus. Then I’ll have the ability to see all of my ‘work’ files or my ‘school’ files with a couple clicks of a button.

    One other thing I’m annoyed with is constantly having to enter a date into my file names, such as ‘./my_documents/journal/20071031-nature_and_its_warmth.odt’. Why do I enter my information that way? Because I would like to sort my data by date THEN title. You simply can’t order by multiple columns in Nautilus, nor have meta data such as date, author, etc.

    “But wait a minute, you can add author data to a document in OpenOffice and I know for a fact there is a ‘date created’ field.” Why yes, there is, I would reply, but Nautilus can’t search the OpenOffice meta data tags and ‘date created’ is very shaky when you’re constantly backing up, moving, transferring and wiping files.

    This way, if Tracker read OpenOffice meta data (or photo meta data or music meta data) and was integrated into Nautilus, I would be able to find any file I own with whichever specification I choose, without all of this manual work on organizing my files.
