Tuesday, February 12, 2008

Google Toolbar and 404 Error Pages

I find it strange that people overreact whenever Google does something. Many have an incorrect perception of the "don't be evil" mantra and like to say that Google has broken it every time the company does something debatable. I didn't hear too many people complaining that Internet Explorer replaces default 404 error pages with its own page, but when Google Toolbar does the same thing, it's suddenly "hijacking" web sites.

Let's take a look at a simple example of a site that doesn't have a custom 404 error page (they're very hard to find these days, so most sites won't fall into this category). If you try to go to news.speeple.com/sunflowers, here's what you see in IE7: a page with useful suggestions like "Retype the address" or "Go back to the previous page".



This is actually a page created by Internet Explorer and you can disable it in the advanced settings, by unchecking "Show friendly HTTP error pages". Here's the page returned by the server, which is displayed in most browsers (Firefox, Opera, etc.):



The latest version of Google Toolbar has a feature, disabled by default, that replaces IE's error pages with more useful suggestions: the site's homepage or subdomain, and some search queries that could help you locate the right page. The idea is that you probably clicked on a bad link or the page was relocated without using a redirect. Google's query segmentation is not perfect, but it usually does a pretty good job of transforming a URL into a useful query. To obtain the suggestions, the toolbar sends the URL to Google's servers, so this feature has privacy implications. More exactly, the suggestion page is obtained from:

http://linkhelp.clients.google.com/tbproxy/lh/fixurl?sourceid=navclient&hl=en&sd=com&error=http404&url=http://news.speeple.com/sunflowers
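To make the privacy point concrete, here is a minimal Python sketch of how such a request URL could be assembled. The endpoint and parameter names are copied from the URL quoted above; what each parameter means, the helper name `build_fixurl`, and the percent-encoding of the `url` value (the quoted URL shows it unencoded) are assumptions for illustration only. The sketch builds the string and does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint and parameter names are taken from the URL quoted in the post.
# Their exact meaning is assumed, not documented.
FIXURL_ENDPOINT = "http://linkhelp.clients.google.com/tbproxy/lh/fixurl"


def build_fixurl(failed_url: str, error: str = "http404") -> str:
    """Return a suggestion-page URL for a failed request (hypothetical helper)."""
    params = {
        "sourceid": "navclient",
        "hl": "en",
        "sd": "com",
        "error": error,
        "url": failed_url,  # the full address the user tried to visit
    }
    # urlencode percent-encodes the values; the post shows the raw form.
    return FIXURL_ENDPOINT + "?" + urlencode(params)


request_url = build_fixurl("http://news.speeple.com/sunflowers")

# Decoding the query string shows that the complete failed URL is part of
# the request, which is where the privacy implications come from.
print(parse_qs(urlsplit(request_url).query)["url"][0])
```

Whatever address failed to load ends up in the `url` parameter, so the receiving server sees every broken link the user follows while the feature is on.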


Google Toolbar only displays that page in three cases: default error pages (those smaller than 512 bytes), DNS errors and connection failures. The feature can be enabled from Google Toolbar's settings by checking "Browse by name in the address bar", an option that also performs searches when you enter keywords in the address bar.
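The rule is simple enough to sketch in a few lines of Python. This is only an illustration of the behavior as described here, not the toolbar's actual code: the function name `should_replace`, the use of `None` to stand for "no HTTP response at all", and the restriction to status 404 are all assumptions.

```python
from typing import Optional

DEFAULT_PAGE_LIMIT = 512  # bytes; the threshold quoted in the post


def should_replace(status: Optional[int], body: bytes) -> bool:
    """Decide whether a helper page replaces what the server sent.

    status is None when no HTTP response arrived at all
    (a DNS error or a connection failure).
    """
    if status is None:
        return True  # nothing came back, so there is no page to preserve
    if status == 404:
        # A short body is treated as the server's default error page.
        return len(body) < DEFAULT_PAGE_LIMIT
    return False  # successful responses and other statuses are left alone


default_page = b"<h1>Not Found</h1>The requested URL was not found on this server."
custom_page = b"<html>" + b"x" * 2048 + b"</html>"

print(should_replace(404, default_page))  # short default page
print(should_replace(404, custom_page))   # longer custom page
print(should_replace(None, b""))          # DNS or connection failure
```

Under this rule a site keeps full control of its error page simply by serving one that is 512 bytes or larger.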

So which of the three pages is more helpful for someone who ends up on a nonexistent page of a site that didn't bother to create a custom 404 error page?

Related:
Matt Cutts' reaction
Google tries to fix broken URLs
Browsing the web using Google Cache

18 comments
It could be useful because of Google Cache. This is a bit on the invasive side, though.
Hey
I really like your blog but I think you might be missing a minor point here.
(I don't have a Windows install, so I might be wrong. If so, I am very sorry and please delete this entry.)
IE7 will just display some tips on what to do (I suppose the page is stored locally in some DLL), while Google will try to get you to search. So Google is trying to push its search box everywhere. I think if Google had just replaced the text with slightly more helpful information, cut out its search box, and stopped trying to put its logo everywhere, people would not be so concerned. Also, is this a standard (locally saved) page, or what does Google transfer to display these links? And does it log the clicks somehow?
I don't think the purpose is to search, but searching is one of the ways to find that page, in case it exists. The first option is to go to the site's homepage and find the page from there. If the site has a sitemap, Google also links to it.

Unfortunately, the page is not generated locally because segmenting the URL is not an easy task and because the suggestions are dependent on the URL. So you're sent to a web page:

http://linkhelp.clients.google.com/tbproxy/lh/fixurl?error=http404&url=URL

You can test the link above with different URLs.
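To show why segmentation is the hard part, here is a deliberately naive Python sketch that turns a URL into keywords by splitting the path on punctuation. It is not Google's method, which isn't described anywhere in this post; the function name `naive_query_from_url` and the rule for dropping a file extension are assumptions made for the example.

```python
import re
from urllib.parse import urlsplit


def naive_query_from_url(url: str) -> str:
    """Turn a URL path into search keywords by splitting on separators only."""
    # Accept addresses written without a scheme, such as "www.example.com/page".
    parts = urlsplit(url if "//" in url else "//" + url)
    # Drop a trailing file extension such as ".html" (assumed rule).
    path = re.sub(r"\.[A-Za-z0-9]{1,5}$", "", parts.path)
    # Split on anything that is not a letter or a digit.
    words = re.split(r"[^A-Za-z0-9]+", path)
    return " ".join(word.lower() for word in words if word)


print(naive_query_from_url("http://example.com/blog/my-first_post.html"))
print(naive_query_from_url("http://example.com/googletoolbar"))
```

The first URL splits cleanly into separate words, but the second stays a single token because nothing in the URL marks where one word ends and the next begins. Recovering those boundaries needs a dictionary or a language model, which is one reason the suggestions are produced on a server.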

Maybe Google should provide a separate option for this feature and include more explanations. Overall, I think it's useful and you won't see the Google-generated page very often.
But... what happens if a webmaster has set up a particular page to display in case of a 404 error? For example:
http://pages.ebay.com/sefgzergheqarg
or
http://www.amazon.com/fzgegze
That's when Google's option is a problem (if it DOES redirect to a Google page in that case).
If the webmaster has set up a custom error page, you'll see that error page. Google Toolbar follows the same procedure as Internet Explorer: if the error page returned by the server is smaller than 512 bytes, it's treated as a default error page and replaced. Otherwise, the user sees the custom error page.

Most sites have a custom error page (including Amazon, eBay, Google, Yahoo, Facebook, etc.). By the way, do you know an important web site that doesn't have custom HTTP error pages?
I found some sites that don't return custom error pages:

http://www.hi5.com/hjhkhjh/jkljlj
http://www.myspace.com/jkljl/jkjkjlj (the IIS error page is bigger than 512 bytes)
http://rapidshare.com/kjljlkj/jkjljkl
http://www.baidu.com/kjljlkj/jkjljkl
http://www.imageshack.us/kjljlkj/
Are you sure you are not from Google?
Your views are always too biased.
The only point of this post was to say that if you think Google Toolbar hijacks 404 error pages, then Internet Explorer also hijacks them (both replace those pages with something else). I don't know if I'm biased, but I've always tried to say what I think:

* Google is your default search
* Google forces you to install Google Pack
* Froogle Checkout
* Search, no longer the main feature of Google Desktop
etc.
The "Google hijacking" and the "IE hijacking" are of a very different nature. Google is collecting valuable information from its users. IE offers a local feature. There is no "spying" from MS.
I don't think the purpose is to search. We get to the pages we are looking for by going to the site's home page.
I am a newbie and I am not very good at finding pages. If a link is broken, then there is no way I can find what I am looking for.
Important update. The feature is not enabled by default. I've uninstalled/reinstalled the toolbar and the feature was disabled. The custom error pages are part of the "browse by name", although Google doesn't explicitly mention this.
Yet another example of Google changing site content, less than 512 bytes or not, it is not what the site owner wants. This shows no respect by Google for content producers as usual. Not quite as evil as adding links to a page which clearly is facilitating the creation of an unauthorized derivative work. Hopefully one day someone will stand up to their masses of lawyers and start wiping their ass in court.
As already mentioned, Internet Explorer (the browser used by more than 70% of the people) doesn't respect the webmasters either. If webmasters cared about their users, they would create custom error pages with alternate links, site map, search box etc. By displaying:

"Not Found

The requested URL was not found on this server."

you're not very helpful.
>>>Important update. The feature is not enabled by default.

I just installed it and it was enabled by default.
@Justin:
That's strange. If I click on "Restore defaults", Browse by Name is disabled. The same happens when I uninstall/reinstall the toolbar. Maybe you already had an older version installed and Google Toolbar 5 preserved its settings.

--> Screenshot
I took the url you posted in comment 3 and used a non-existent url with the base url = my website (my website gets no respect from Google anyhow).

error=http404&url=www.egorg.com/library.html

When the error appears, nothing having to do with my website appears. The search term offered isn't even from my base site; it's treated like an anagram.

I know I'm in google's db. I even pay adwords to advertise the site. At least, please, offer the option of going to my home page.
"I find it very strange that people have abnormal reactions when Google does something."

Aside from the "Don't be evil" mantra, how many major corporations have actually created an aura of trust and corporate responsibility?

Why are Google and MS treated differently when doing seemingly the same thing? Many users trust Google (right or wrong) and expect the worst from MS.

People enjoy Google's customer-centric approach and are quick to keep the organization on its toes. The additional attention should be appreciated.
Yep, what happens when a site already has its own customized 404 page?