LinuxDevCenter.com


Your personal information space (Dashboard and Beagle)


Andy Oram
Oct. 09, 2004 12:12 PM

URL: http://www.gnome.org/projects/beagle/...

Google doesn't want you to delete your mail. That note you tossed off to your spouse last night, asking which brand of cereal to buy at the grocery store, may be utterly irrelevant to you today, but to Google Mail it's highly marketable information.

A similar concern, less commercial but equally avaricious in the information sense, lies behind one of the projects from Ximian (now Novell) to generate the most buzz: founder Nat Friedman's Dashboard project. Despite a promising prototype, implementing Dashboard turned out to raise a lot of deep and difficult questions, but its supporters believe they have a way forward. A future version of Dashboard will be rebuilt on the Beagle project, led by Jon Trowbridge.

The GNOME Foundation is treating Dashboard and Beagle as extremely important. Trowbridge gave an informal keynote-like talk on them today at the 4th GNOME Developer's Summit.

The issue does not concern GNOME alone. Dashboard and Beagle are desktop-independent; they could be accessed by KDE as well. And Microsoft has announced a similar system that automatically indexes your entire computer system and turns up everything related to some topic of importance to you.

Reasons for Dashboard, etc.

The problem motivating these systems is the common "Where did I see that?" question. For instance, I told the GNOME Foundation executive director Tim Ney today that I had seen survey results suggesting that KDE is three times as popular as GNOME. (I don't consider the results necessarily accurate.)

Now I'm trying to figure out where I saw this survey. Was it a Web site I visit regularly, something on an RSS feed, an email sent by a colleague, or a hallucination induced by listening to too much modern jazz this week?

I don't think either Dashboard or Longhorn will help me search that last category any time soon. But they are supposed to help turn up results from all the other categories--and (thanks to real-time indexing) turn them up nearly instantly, even on a hard disk with multiple gigabytes of information in a variety of formats.

More than a super-grep

The Dashboard/Beagle vision is far more than a super-grep, or something able to search for keywords in files of different formats. (Windows has offered that for a long time.) Beagle already has time tracking, which means that if you read an email and visit a file a few seconds later, Beagle will remember that they're related even if there's no particular phrase that's featured prominently in both. Beagle also maintains a full-text index of every Web page you visit. Trowbridge would like to go further and keep track of the context in which you handle information. For instance, if you save a file from someone's email message, the file will contain a marker indicating a connection with that email message.
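The time-tracking idea can be sketched in a few lines of Python. This is only an illustration of linking items by how close together they were touched, not Beagle's actual code; the `Event` type, the `related_by_time` function, and the 30-second window are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One thing the user touched. Hypothetical type, not part of Beagle."""
    timestamp: float  # seconds since some starting point
    kind: str         # e.g. "email", "file", "web"
    item: str         # identifier of the thing touched


def related_by_time(events, window=30.0):
    """Pair up distinct items whose events fall within `window` seconds."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.timestamp - first.timestamp > window:
                break  # later events are even further away in time
            if second.item != first.item:
                pairs.append((first.item, second.item))
    return pairs


events = [
    Event(0.0, "email", "mail:budget-thread"),
    Event(5.0, "file", "/home/me/budget.ods"),
    Event(600.0, "web", "http://example.org/news"),
]
print(related_by_time(events))
```

Here the email and the spreadsheet are linked because they were handled five seconds apart, while the Web page visited ten minutes later is not. Choosing the window is the hard part: too short and real connections are missed, too long and everything is related to everything.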

Trowbridge complains that your computer throws away a lot of information you give it (such as the fact that you saved a file from an email message). But I wonder about the push to save so much metainformation. True, we now have the processing power and storage space to save all kinds of junk. But can we predict what information will really be useful? I'll return to this question at the end of this article.

What Dashboard and Beagle entail

It's worth briefly going over the architecture that supports the personal information space, because that helps to show how extensively a system must be changed to support it.

Fast search depends on an up-to-date database, whether one is talking about the spidering done all the time by Internet search engines or a repository of terms used by files on your own hard disk. A new kernel subsystem called inotify can report changes to files as they happen to interested userspace tools, and the new D-BUS message bus being developed for Linux lets desktop programs pass such events among themselves.
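To make the idea of collecting file changes concrete, here is a rough Python sketch. It is a polling stand-in, not inotify: inotify pushes events to the program as they happen, while this code has to rescan the directory tree and compare two snapshots. The function names `snapshot` and `diff` are invented for the illustration.

```python
import os


def snapshot(root):
    """Record (modification time, size) for every file under `root`."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # the file vanished between listing and stat
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def diff(old, new):
    """Compare two snapshots; return (created, modified, deleted) paths."""
    created = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return created, modified, deleted
```

An indexer would call `snapshot` periodically and feed the output of `diff` to whatever updates the index. The cost of rescanning a large disk on every pass is exactly why a kernel notification mechanism is attractive.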

Beagle depends on an indexing tool called Lucene to keep track of what's in various files on the system. It essentially checks everything except files in dot directories and others that traditionally contain throw-away data. As I already mentioned, it records the contents of Web pages you visit. It can also search your email, your IM logs, and anything else that exists as a file.
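The kind of index described above can be approximated with a toy inverted index in Python. This is a minimal sketch of the general technique, not Lucene and not Beagle's code: it maps each word to the set of files containing it and, as an assumed stand-in for the rule about throw-away data, skips only dot directories. The names `build_index` and `search` are hypothetical.

```python
import os
import re
from collections import defaultdict


def build_index(root):
    """Map each lowercased word to the set of file paths that contain it."""
    index = defaultdict(set)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune dot directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(path)
    return index


def search(index, query):
    """Return the files that contain every word in `query`."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

A real indexer also needs format-specific readers for email, IM logs, and cached Web pages, plus ranking of results; this sketch treats every file as plain text and returns unranked matches.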

The next step is to store the metainformation collected in various ways alongside the files. Microsoft's Longhorn will theoretically involve an entirely new filesystem called WinFS. (When this will happen is anybody's guess, but it won't happen soon.) One of Linux's strengths is its support for multiple filesystems, and Trowbridge doesn't expect them all to be enhanced just to support Beagle. However, many filesystems support "extended attributes," name-value pairs attached to files and often used to implement Access Control Lists and other new features. Beagle can use these to store its metadata.
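For readers who have not used extended attributes, here is a small Python sketch of attaching a note to a file about where it came from. It relies on `os.setxattr` and `os.getxattr`, which exist only on Linux, and it assumes the filesystem permits attributes in the `user.` namespace. The attribute name `user.example.origin` and both function names are invented for the example and are not what Beagle uses.

```python
import os

# Hypothetical attribute name chosen for this sketch, not Beagle's.
ATTR_NAME = "user.example.origin"


def tag_origin(path, origin):
    """Try to record where a file came from; return True on success."""
    if not hasattr(os, "setxattr"):
        return False  # platform without Linux-style extended attributes
    try:
        os.setxattr(path, ATTR_NAME, origin.encode("utf-8"))
        return True
    except OSError:
        return False  # e.g. the filesystem does not support user attributes


def read_origin(path):
    """Return the recorded origin, or None if there is none."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return os.getxattr(path, ATTR_NAME).decode("utf-8")
    except OSError:
        return None
```

A mail client that saved an attachment could call `tag_origin(saved_path, message_id)`, and a search tool could later call `read_origin` to find the message again. The fallbacks matter: because not every filesystem or platform supports these attributes, any tool built this way has to cope with metadata that silently fails to stick.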

For each file format or type of information (email, for instance) Beagle will have a back-end API to do searching. The developers are even looking for ways to associate metainformation with pictures. Beagle combines all the results and presents them in a single front-end API. Applications that want to do system-wide searches, therefore, will need to understand just the Beagle API in order to access all types of data on the system. The current utility used to demo Beagle is called best, for Bleeding Edge Search Tool.
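The split between per-type back ends and a single front end might look something like the following Python sketch. It shows only the shape of such a design; the class `SearchFrontEnd`, the helper `make_backend`, and the substring matching are assumptions for illustration and do not reflect the real Beagle API.

```python
from typing import Callable, Dict, List, Tuple

# A back end takes a query string and returns matching item names.
Backend = Callable[[str], List[str]]


class SearchFrontEnd:
    """Single entry point that fans a query out to every registered back end."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, kind: str, backend: Backend) -> None:
        self._backends[kind] = backend

    def search(self, query: str) -> List[Tuple[str, str]]:
        hits: List[Tuple[str, str]] = []
        for kind, backend in self._backends.items():
            hits.extend((kind, item) for item in backend(query))
        return hits


def make_backend(items: List[str]) -> Backend:
    """Build a toy back end that does case-insensitive substring matching."""
    def backend(query: str) -> List[str]:
        needle = query.lower()
        return [item for item in items if needle in item.lower()]
    return backend


front = SearchFrontEnd()
front.register("email", make_backend(["Re: KDE survey", "Lunch?"]))
front.register("file", make_backend(["survey-notes.txt", "todo.txt"]))
print(front.search("survey"))
```

An application calls only `front.search`, and a new type of data becomes searchable by registering one more back end, with no change to the callers.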

Privacy fears come to mind when one considers a tool that does instant searches. Remember that (currently) Dashboard and Beagle are meant for use by an individual on his or her personal data. One approach to the issue is to say "Privacy is overrated" and assume that one is doing the user a favor by presenting his or her entire disk contents on demand. Another approach would be to divide information into categories, such as to separate work data from personal data. But that's hard to do: asking the user to distinguish them is adding work, while trying to do it from context risks oversimplifying the complex lives led by users.

Indexing the infinite

I want Dashboard. I am intrigued by the idea that, instead of organizing and boiling down the information I receive and trying to get rid of what I don't need, I should go in the opposite direction and compulsively save information, expecting my computer to pluck out what I need later. A saying attributed to AI researcher Marvin Minsky claimed that his information store consisted of his friends. For this task I trust computers more than friends. (Sorry, Tim Ney.)

But I worry about clever schemes to track and save information--and not for privacy reasons. I just wonder whether we'll know what we'll want in the future.

Archaeologists have found marvelous ways to deduce ancient people's lifestyles from the facts they turn up. They make deductions based on whether an artifact is upside-down or right-side-up, and from chemical traces found nearby. Still, we often wish people in the past had left more clues.

We also do archaeological searches on our computer's data, which is just as strangely organized and off-balance as the MIT Stata Center that hosts today's GNOME summit. Once again, the data we left behind on our computers proves frustratingly inadequate for today's purposes. And I would guess that increasing the data we collect will do little to close the gap.

When one starts creating filesystem attributes and instrumenting applications, one makes choices that will continue to have impacts thirty years later. What new application will arise just a year or two from now that will make the Beagle developers kick themselves because they forgot to prepare for it?

So I'm not saying full system search is unfeasible. I'm just asking how long it takes to prepare a system for the search, in comparison to how long it takes for the system to become obsolete. I'd like to try the results, in any case.

Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is www.praxagora.com/andyo.

What would you search for?



Weblog authors are solely responsible for the content and accuracy of their weblogs, including opinions they express, and O'Reilly Media, Inc., disclaims any and all liability for that content, its accuracy, and opinions it may contain.

Creative Commons License This work is licensed under a Creative Commons License.


