December 10, 2007

Google Finds Less Search Results

Everybody should know that when you use a search engine, the number of search results is just an estimate. You only look at the first 10 or 20 results anyway and, in some cases, the search engine doesn't let you access more than a certain number of results. For example, Google only lets you see the top 1000 results, mostly for efficiency reasons.

"When you perform a search, the results are often displayed with the information: Results 1 - 10 of about XXXX. Google's calculation of the total number of search results is an estimate. We understand that a ballpark figure is valuable, and by providing an estimate rather than an exact account, we can return quality search results faster." (Google help center)

But recently something has changed in Google's algorithm that estimates the number of results. Here's a comparison between the number of results for [Moby] in May (notice the recently-launched bar that used gradients) and today:

Searching for Moby (May 17, 2007)

Searching for Moby (December 10, 2007)

Going from 15 million results to only 2 million is a big drop. For the same query, Yahoo estimates 18,900,000 results, Microsoft finds 7,730,000 results, while Ask only finds 4,089,000 results. Notice that all three of the other major search engines show bigger numbers than Google. You might think that this query is just an exception, but that's not the case. Almost every query shows far fewer results in Google than in other search engines.

A search for [Google] shows (the numbers may vary across different data centers):
* 132,000,000 results - Google (screenshot)
* 1,610,000,000 results - Yahoo
* 244,000,000 results - Windows Live
* 281,620,000 results - Ask.com

And even if this estimate has never been reliable, it's strange to see such an obvious inaccuracy. If you use complicated queries (more than 3-4 keywords), the estimates become more accurate and Google starts to show more results than other search engines.
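To put the gap in perspective, here is a minimal Python sketch that divides each engine's estimate by Google's, using only the figures quoted in this post. The dictionary layout and the `ratios_to_google` helper are just illustrative names, the Google figures for [Moby] are the rounded "15 million" and "2 million" mentioned above, and the numbers may differ across data centers.

```python
# Result-count estimates quoted in the post (December 10, 2007).
ESTIMATES = {
    "moby": {
        "Google": 2_000_000,        # "only 2 million", rounded in the post
        "Yahoo": 18_900_000,
        "Microsoft": 7_730_000,
        "Ask": 4_089_000,
    },
    "google": {
        "Google": 132_000_000,
        "Yahoo": 1_610_000_000,
        "Microsoft": 244_000_000,   # listed as Windows Live in the post
        "Ask": 281_620_000,
    },
}


def ratios_to_google(estimates):
    """Return each other engine's estimate divided by Google's estimate."""
    google = estimates["Google"]
    return {
        engine: count / google
        for engine, count in estimates.items()
        if engine != "Google"
    }


for query, estimates in ESTIMATES.items():
    for engine, ratio in ratios_to_google(estimates).items():
        print(f"[{query}] {engine}: {ratio:.1f}x Google's estimate")

# Google's own May vs. December estimate for [moby] (both rounded in the post).
print(f"[moby] May/December on Google: {15_000_000 / 2_000_000:.1f}x")
```

A ratio above 1 means the other engine reports more results than Google for that query. Since every number here is an estimate, the ratios say more about how the engines count than about the real size of their indexes.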

In other related news, Google started to treat subdomains the same as directories for some queries. "For several years Google has used something called host crowding, which means that Google will show up to two results from each hostname/subdomain of a domain name. That approach works very well to show 1-2 results from a subdomain, but we did hear complaints that for some types of searches (e.g. esoteric or long-tail searches), Google could return a search page with lots of results all from one domain. In the last few weeks we changed our algorithms to make that less likely to happen in the future," explains Matt Cutts.

Related:
Google - Yahoo Comparison
Persistent queries (Greasemonkey script)
Index size and estimation (given two search engines, what are the relative sizes of their indexes?)

17 comments:

  1. i read something about google deleting "spam-sites" from its index (e.g. link farms) and not including pages with malware in its search results.

    maybe this change decreases the number of results while improving quality and, of course, our security :)

  2. I just did the test with keyword "moby" and got the following result:

    "Résultats 1 - 10 sur un total d'environ 19'100'000 pour moby (0,24 secondes)"

    To me, this post sounds like disinformation.

  3. I get the same results as you at Google France or in other localized versions, but at google.com (the US version accessible at google.com/ncr) the numbers are much smaller. Check this:

    http://www.google.com/search?q=moby&hl=en&gl=us

    Different data centers may show different results.

  4. total no. of results don't matter much, most of the users leave after 40-50 results, it's the quality that matters

  5. Here's the thing... if you're logged in to your Google account, you get ~2 million results for 'moby'. If you're not logged in, you get 18 million. Duh, personalized search.

  6. I noticed a significant drop in the number of results for certain phrases I was tracking between September and October of this year -- from 4,300,000 results to 561,000 results in just one month (around the time of a large Page Rank update). That query is now at 217,000 results. I noticed drops in several other queries as well.

    Other background information:

    I do not have personalized search enabled.

    These queries were all made from the same physical location.

    The queries are all multi-word queries, with phrases in quotes, and the OR operator ("a b c" OR "a bc" OR abc).

  7. (Good find Ionut. I'm reposting a comment from the Google Blogoscoped forum thread...)

    As for the lower results quantity, I also don't think it may be too meaningful necessarily. I wonder if the big difference is due to Google handling permanent or temporary redirects differently? So that Yahoo would see "double" where Google sees only one? A small test shows this could indeed be the case:

    Google count for blog.outer-court.com (which is now redirecting to blogoscoped.com): ~2
    Google count for blogoscoped.com: ~12,200

    Yahoo count for blog.outer-court.com: ~15,584
    Yahoo count for blogoscoped.com: ~20,729

    So, Yahoo counted at least 15,582 more sort of "non-existing" pages than Google. Way to bloat your page count :)

    Also, maybe Google kicks some spam sites off the index faster (though I think they should merely lower their ranks, not completely stop indexing them, right)?

    But I guess the real question is: how likely do they actually let you find "exotic" pages? I mean that's the only use-case where you'd really need not just "the best" pages but also a really deep & far index to find even the smallest webpage that may contain info.
    I just formulated a hypothetical research query for instance, which reads [daniel gillespie clowes interview ink pen]. I was imagining that I'm looking for an interview with comic artist Daniel G. Clowes in which he details which tools he uses. I even used his middle name to only get interviews that go really deep about the subject matter:

    Google: ~129
    Yahoo: ~10

  8. Fewer, please! "Less results" means none at all!

  9. I'll ask about the results estimates; I think it's independent of the subdomain/subdirectory change that I discussed. But Philipp is right that the only way to truly compare index size is to do queries that return <1000 results and then count the actual number of results.

  10. Wow, that was quick. A member of the team found the issue as soon as I pointed this post out to them. We were doing undercounting in some cases, and they're fixing it.

  11. Anyone notice the performance times? The December search took almost 3 times as long to run as the May search.

    Granted it's still less than a quarter of a second...

  12. Ron Michael, I suspect that that was just the noise from a small sample size.

  13. Google is now shifting its algorithm towards web 2.0 socialization which emphasizes relevancy and consistency of content among its top page results. In other words, they have slimmed down on outdated static web pages which have not been updated in months, years. For instance, abandoned domains.

    Also, please note that this new algorithm change is a progressive implementation and may vary from server to server for a few days.

  14. Hi everyone,

    Just wanted to give you an update on this. There was a bug in the serving code that caused result estimates to be low by a factor of up to 40 (depending on the query and the language). This didn't affect the search results at all, just the calculation of the estimate. The fix is rolling out right now, so over the next hour or so the numbers should be back where they're supposed to be. I'll be keeping an eye on this thread in case anyone spots something else.

    Yonatan (Lead engineer for the system that had the problem)

  15. I got 22,900,000 search results for Moby...

  16. @josh.ma:
    Read the comment above from Yonatan Zunger. The problem has been fixed.

  17. Try a longer keyword, e.g. "health benefits of strawberries". The last time I used this keyword on Google (about 2 months ago) it showed about 1,900,000 results, but now it's only showing 189,000 results.

    Btw, before this bug showed up, the results returned lots of relevant content, but now the results are a little messy and kind of unrelated.
