After installing a "desktop search" (how I hate that term) product, Paul Kedrosky made a good observation:

All of this is a long way of saying that one thing I have discovered in making my computer more searchable is how slowly I add to my list of files. The combination of Microsoft's lousy search tool in Windows XP, and my own sloppy filing system, had me thinking that there were millions of files here, and that I was adding tens of new files by the day. There isn't, and I'm not.

Bingo.

Unlike the web, the amount of stuff on most desktops is finite and changes very slowly by comparison. The vast majority of it arrives in the form of web pages (and downloads), e-mail (and attachments), music purchases, and possibly the occasional CD-ROM, BitTorrent, or file-sharing network download.

And from the size point of view, the vast majority of space is consumed by music, images, and video. The density of "searchable data" in a 16MB movie clip is very different from that in a 16MB PowerPoint presentation, MS Office document, or PDF file.

When you think about the problem in those terms, indexing performance on a multi-gigahertz PC isn't the issue it might seem to be. Instead, you probably want to index as much data and metadata as possible.
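To make that concrete, here is a minimal sketch in Python of why re-indexing is cheap when little changes between passes. It is not how GDS or X1 work (I have no idea what either does internally); the function names `scan` and `changed_files` are made up for illustration, and it assumes that comparing file size and modification time is a good enough signal that a file needs another look.

```python
import os

def scan(root):
    """Walk root and return {path: (size, mtime_ns)} for every file found."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            snapshot[path] = (st.st_size, st.st_mtime_ns)
    return snapshot

def changed_files(previous, current):
    """Paths that are new, or whose size or mtime differ from the previous scan."""
    return sorted(p for p, meta in current.items() if previous.get(p) != meta)

def removed_files(previous, current):
    """Paths that were in the previous scan but are gone now."""
    return sorted(p for p in previous if p not in current)
```

Keep the last snapshot around, run `scan` again later, and hand only the output of `changed_files` to whatever parses the file formats. On a desktop where a handful of files change per day, that list stays short, so the expensive part (deciphering the contents) runs on very little data.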

The real trick is deciphering all those file formats. When you compare Google's toy with something a bit more sophisticated like X1's product, you notice that's one of the significant differences between them. It's no wonder that GDS is free.

It's also no surprise that both products grok Outlook e-mail.

Posted by jzawodn at December 05, 2004 08:23 PM

Reader Comments
# Timboy said:

Websearch secret #1: most of the web doesn't change at all either.

on December 5, 2004 09:10 PM
# Jeremy Zawodny said:

Heh, that's *two* secrets per post now.

We can't have that! :-)

on December 5, 2004 09:12 PM
# Otis said:

There is another important aspect of 'desktop search' - how do you find new or modified files, if you want to (re)index them as soon as possible? On the Web we use crawlers. What about the file system? Crawling it is one way, but there must be something better. GDS does it in real time. How? It looks like it hooks into the Winblows in some direct way (via MFC/.Net?) which lets it get notified of file system activity in real time...

on December 8, 2004 10:08 AM