A few weeks ago, I noticed that Google seemed to think that one of my blog entries is the best match for the search "Schwarzenegger for governor".

In the comments, a crowd of people came to chide me for bashing Google. I suppose I deserve it, since I've complained about PageRank on more than one occasion.

But at that time, I missed the fact that Phillip Winn noticed that removing the word "for" from the query changes the results.

So let's look at this closely. The first screenshot on the right (click to enlarge) is from the first search on Google. Notice what I've circled in red using my high-tech crayon. For those who can't read it, it says:

"for" is a very common word and was not included in your search. [details]

You'd expect then, that removing "for" from the query should produce the same results. After all, Google is telling me in no uncertain terms that they're ignoring "for" in my query.

Bullshit.

Try it yourself or look at the second screenshot on the right (click to enlarge). Notice that the results are different. Yes, I'm still in the results, but it's a different set of results in a different order. Even the number of documents matched is different.

They're not ignoring the word "for" in my query. It clearly factors into the method they're using to produce those results.

This got me wondering: what other lies does Google tell? Have you run into any?

Posted by jzawodn at August 21, 2003 11:51 AM

Reader Comments
# Courtney said:

As far as I understand it, according to an article in today's NYTimes, "Fishing for Information? Try Better Bait," Google is supposed to search the same with and without quotation marks, and that is very obviously not true.....

Plus you can use the + sign to require that some term be in the search, because it occasionally ignores terms - why, I'm not sure

and, despite the fact that it's supposed to be equally Boolean, the order of the words matters a great deal

on August 21, 2003 12:11 PM
# Ivan said:

Quotes are used to search for phrases instead of words, so obviously the search would be different with/without them. I can also totally understand them using word order, and would expect totally different search results for "Schwarzenegger governor" and "governor Schwarzenegger" for instance - word order should indicate importance of those terms.

But you're right Jeremy, searching for "Shwarzenegger +for governor" gives the same results but doesn't give the message about ignoring "for".

on August 21, 2003 12:20 PM
# Mike Hillyer said:

I have to agree with the importance of word order. When I do checkups on my page rank I find completely different results for a "MySQL VB" search versus a "VB MySQL" search.

on August 21, 2003 12:20 PM
# Nick said:

If you replace "for" with another stop word, like "the" or "a", you get the same results. So the fact that a stop word is included seems to affect the results, but which stop word it is doesn't.

on August 21, 2003 12:27 PM
# Ivan said:

Oh, so I think it's just that the weight of "governor" changes if there is a stop word before it.

Searching for "Shwarzenegger for governor" is the same as "Shwarzenegger the governor".

Searching for "Shwarzenegger governor" is the same as "Shwarzenegger governor for".

Basically because governor is in place #3 when you include any stopword, its significance is downgraded and that somehow alters google's search algorithm. If it was up to me, I'd file a bug report and tell them to revise the word order numbers when they remove stopwords.

on August 21, 2003 12:57 PM
# Courtney said:

Ever read With the Lightnings? Example of a librarian saving the day through her data mining abilities...:) These sorts of tips and tricks are what was supposed to be in the NYTimes article, but weren't. I'm sure you guys know that these search tips all depend on how the information is ordered in the database, and knowing how the database is set up is key to getting the most out of it - so I'd think good librarians should know database administration like the back of their hand....

But, unfortunately, the poor schmo who knows no better, (like the 12 year old I'm trying to train to do research), is susceptible to ads and random junk because they don't know how to search for stuff properly, and can't tell the difference between quality info and garbage.....


on August 21, 2003 01:02 PM
# Courtney said:

database = search algorithm for the poor shmos who just know how to tweak the system....:)

on August 21, 2003 01:06 PM
# Dave Winer said:

I've always wondered why the Google Weblog is the number one result for "weblog" on Google. Not saying it isn't, it's just a surprising result to pop out of their Page Rank algorithm.

on August 21, 2003 01:11 PM
# Arcterex said:

Doesn't google round robin through servers? Could the different results be the result of hitting different servers?

on August 21, 2003 01:43 PM
# Jirka said:

The results ARE different.

So every time I see that Google skipped a word from my request (with the incredible explanation "[something] is a very common word and was not included in your search"), I cannot find enough bad words for this kind of behavior.

I use certain words in my requests and somebody imposes on me his/her own judgments of what is better? They can advise me something different but they shouldn't ignore my choice this way. It makes me sick...

on August 21, 2003 01:51 PM
# Peter Grigor said:

Heya Jeremy:

The reason the "for" makes a difference in the search, even though the word "for" is not included in the search, is that it takes up a space in the phrase to be searched.

Now *any* word can be in that space in the search results--Schwarzenegger for governor; Schwarzenegger is governor; etc. The reason not including "for" in the search phrase makes a difference is that Google is then looking for the words 'Schwarzenegger' and 'governor' right beside each other first, and assigning the highest proximity score to those results.

So, not to mince words, the word "for" really isn't included in the search results, but the position it occupied is. :)

on August 21, 2003 02:37 PM
# Jeremy Zawodny said:

Peter: yes, that is the case (roughly).

But you have to agree that it's pretty misleading.

on August 21, 2003 03:31 PM
# Gerald said:

Yes, the solution is proximity, order, word distance.

Schwarzenegger+terrific+governor
Schwarzenegger+good+governor

on August 21, 2003 04:54 PM
# Jeremy C. Wright said:

YOU got first result? Damn, I was really hoping for that spot... ;)

Seriously though, Google isn't so much lying as not overloading you with information.

I mean, really:

"for" is a very common word and was not included in your search. That said, you putting the word "for" in actually messed up the search algorithm. Click this link if you'd like to see what your search would have been if we'd "truly ignored" (note the use of quotes, they are very important to us here at Google) your use of the word "for".

... ;)

on August 21, 2003 04:57 PM
# halla said:

I think that the difference is that google weighs the two words together better when there are only two words in your search.

on August 21, 2003 06:52 PM
# Jeremy Zawodny said:

But if 1 of 3 was ignored, then there *are* only 2 words in my search, right?

on August 21, 2003 07:39 PM
# Chris said:

Maybe it counts the space? I should look in the Google Hacks book I have...maybe it addresses it there.

on August 21, 2003 09:35 PM
# Jason Lotito said:

Does this really make that much of a difference? So they aren't giving us the whole truth, what's the big deal?

I don't see Yahoo! detailing exactly how they are performing their searches.

on August 21, 2003 09:54 PM
# Jeremy Zawodny said:

I'm not sure what you mean by "counts the space." If I had used 5 spaces, would it really do anything different? Of course not.

on August 21, 2003 10:16 PM
# critohyp said:

Ah yes, I guess Yahoo is soooo much better! They couldn't even TRY creating their own search engine until recently (which is such a rip-off of Google it's laughable), having previously had to license said technology from Inktomi and Google. I expect the "of" difference to show up in Yahoo, since all the Yahooligans show a total lack of creativity and are basically creating a carbon copy of Google's search engine.

on August 21, 2003 10:35 PM
# wil said:

OK. What *is* your beef with Google? Did they turn you down a job offer or something? Why should they tell you the truth about their searching algorithm? Why should they detail out their IP to the world and to their competitors (you)? Use Google and be happy with it, or don't use it and shut up. Surely?

on August 22, 2003 01:38 AM
# Dan Isaacs said:

You guys are killing me. He's an engineer. He found a quirk in a search engine that HE USES!! He's not "bashing" them in any way that I, as someone with no vested interest, would. He's making an informed observation about something HE USES and noticed a quirk in.

Attack the merits of the issue. Questioning the integrity of the person making them is the domain of small minds and Slashdot posters (to the degree those are discrete groups.) And pointing out Yahoo's flaws is not a relevant course of action, either. There is no defense of Yahoo stated or implied in Jeremy's post.

Jeremy is, as I said, an Engineer. He doesn't work for Yahoo's marketing dept. If you want to scream about bias, go comment on Scoble's blog. He's at least used to this nonsense.

on August 22, 2003 05:23 AM
# Jeremy C. Wright said:

In fact, I've found Jeremy to be incredibly balanced. He'll often say things like "I prefer Amazon for shopping", etc. He's up front that he works for Yahoo, but also up front about the things that he feels aren't as good as they could be.

He's not bashing, just posting about an ongoing saga of things he's encountering.

Pardon my French, but back off guys ;-)

on August 22, 2003 06:38 AM
# wil said:

I'm commenting more on the dialect and tone rather than content. "Lies google tells me" doesn't sound like engineer-speak to me, sounds more like either a personal gripe or indeed a marketing headline. An engineer would have picked a headline similar to "abnormalities with google search mechanism" or something of that tone.

on August 22, 2003 08:16 AM
# Hemo said:

I find all this interesting. Peter's explanation of how google searches seems to be true, and knowing how google searches when using these words (for, a, of, etc) I wonder if you could use them in a search to get google to *truly* search for phrases with *any* words in between key words, or is it just going to search for phrases with mincemeat (sic) words in between key phrases?

If I search for 'Schwarzenegger is is is governor' I wonder which phrase google would rank first, 'Schwarzenegger is not a governor', 'Schwarzenegger to run for governor', 'Schwarzenegger would be next governor'?

Knowing that I could use mincemeat words like is, for, the as placeholders, I could certainly use them to my advantage whilst searching if I knew my keywords would be likely to be separated by a certain number of words - misspelled words or not.

Then again, I might just have too much time on my hands lately..

on August 22, 2003 08:21 AM
# Hans said:

Yup, I also noticed this contradiction in Google's search:

http://www.hanskellner.com/archives/2002/11/14/google_not_really_ignoring_common_words.html

on August 22, 2003 10:03 AM
# Tux said:


http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=direct+x+vunerability

My blog ends up at number 1 in google, because I mistyped "vulnerability".

Strange, considering I posted the link a few days ago, and my blog isn't exactly a very high traffic site - more of a personal, note taking blog.

Is Google giving higher "pagerank" to bloggers on blogger.com??

on August 22, 2003 02:56 PM
# Jeremy Zawodny said:

Wil:

> OK. What *is* your beef with Google?

That they're lying. And that people seem to think they're infallible.

> Did they turn you down a job offer or something?

No. What makes you think I'd want to
interview at Google anyway?

> Why should they tell you the truth about their searching algorithm?

Because lying isn't a good way to build trust with their customers and
users.

> Why should they detail out their IP to the world and to their
> competitors (you)?

Nobody said they should.

> Use Google and be happy with it, or don't use it and shut
> up. Surely?

Did you know that people from Google read my weblog? There's a chance
that my writing this could get someone to think twice about using
misleading messages in their service. If that happens, there will be
more to like about Google, no?

Clearly I use and like Google. My blog has Google provided links on
every article page. It's my browser's default search engine.

But I guess you'd rather have me pretend that Google is perfect--like
too many other people already do...

on August 23, 2003 07:23 AM
# anand said:

Dave: The reason why the google weblog ranks number one for weblog is quite easy. As you know, google places a lot of importance on the link description text. When people link to the google weblog, they normally tend to write something like "google weblog" (since they cannot just say "google").

Now when people link to you they say "Dave Winer" and not "Dave Winer's weblog". If it was the latter, you would have been the number one for weblog.

on August 23, 2003 11:08 PM
# said:

Did you notice that Schwarzenegger for governor and Schwarzenegger governor both result in 205,000 results? It seems like it is not using "for" to determine which pages match. But the order is still different, just as governor Schwarzenegger is different.

on August 24, 2003 04:16 PM
# Craig said:

Maybe it doesn't affect the filtering, but does affect the sorting? That's how I'd implement it -- the stopword really is most expensive in the search, not the sort.

on August 26, 2003 05:18 PM
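
For anyone who wants to play with the idea, here is a rough sketch of the behavior Peter and Craig describe in the comments above: the stop word is dropped from the match, but the slot it occupied still counts when results are ordered. This is a toy model and not Google's algorithm. The stop-word list, the scoring rule, the sample documents, and every function name here are assumptions made up for illustration.

```python
# Toy model only: all names, the stop-word list, and the scoring rule are
# hypothetical. It illustrates "drop the stop word, keep its position".
STOP_WORDS = {"for", "the", "a", "of", "is"}


def parse_query(query):
    """Return (term, position) pairs; stop words are dropped, positions kept."""
    words = query.lower().split()
    return [(w, i) for i, w in enumerate(words) if w not in STOP_WORDS]


def matches(doc_words, terms):
    """Filtering step: a document matches if it contains every kept term."""
    return all(term in doc_words for term, _ in terms)


def proximity_score(doc_words, terms):
    """Sorting step: count places where the kept terms appear with the
    same spacing they had in the query (stop-word slots match any word)."""
    first_pos = terms[0][1]
    score = 0
    for start in range(len(doc_words)):
        if all(
            start + pos - first_pos < len(doc_words)
            and doc_words[start + pos - first_pos] == term
            for term, pos in terms
        ):
            score += 1
    return score


def search(query, docs):
    """Return matching documents, best proximity score first."""
    terms = parse_query(query)
    if not terms:
        return []
    hits = [(d, d.lower().split()) for d in docs]
    hits = [(d, words) for d, words in hits if matches(words, terms)]
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(hits, key=lambda h: -proximity_score(h[1], terms))
    return [d for d, _ in ranked]


DOCS = [
    "schwarzenegger governor race heats up",
    "schwarzenegger for governor rally today",
    "schwarzenegger is governor material",
]

with_stop_word = search("schwarzenegger for governor", DOCS)
without_stop_word = search("schwarzenegger governor", DOCS)
```

In this toy, both queries match the same three documents, but they come back in a different order, and swapping "for" with "the" changes nothing. That lines up with what Nick and Craig noticed, though it says nothing about why the reported match counts differed in my screenshots.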