An unofficial blog that watches Google's attempts to move your operating system online since 2005. Not affiliated with Google.

September 24, 2006

News Publishers Want Full Control of the Search Results

After a Belgian press organization sued Google for copyright infringement and won, the World Association of Newspapers decided to create "an automated system for granting permission on how to use their content", reports Reuters. The system will be called Automated Content Access Protocol (ACAP).

If you're wondering why such a system would be useful, you're not the only one. "Since search engine operators rely on robotic 'spiders' to manage their automated processes, publishers' Web sites need to start speaking a language which the operators can teach their robots to understand. What is required is a standardized way of describing the permissions which apply to a Web site or Web page so that it can be decoded by a dumb machine without the help of an expensive lawyer."

The publishers seem to ignore the fact that there is already a system that lets you control which pages search engines may crawl: it's called robots.txt and it's available to every site owner. Perhaps some sites think their content is so valuable that they need a new permission system to match their inflated self-importance.
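For readers who haven't used it, here is a minimal sketch of how robots.txt rules are read by a well-behaved crawler. It uses Python's standard `urllib.robotparser` module; the site `news.example.com`, the crawler names `ExampleNewsBot` and `OtherBot`, and the `allowed` helper are made up for illustration and don't refer to any real publisher or search engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site:
# one named crawler is shut out entirely, everyone else
# may crawl everything except the /archive/ section.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /

User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on the imaginary site."""
    return parser.can_fetch(agent, "https://news.example.com" + path)


if __name__ == "__main__":
    for agent in ("ExampleNewsBot", "OtherBot"):
        for path in ("/2006/09/story.html", "/archive/2005/story.html"):
            print(agent, path, allowed(agent, path))
```

Note that this only expresses crawl permissions per crawler and per path. It says nothing about royalties or time limits, which is the extra layer the publishers are asking for.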

The publishers were kind enough to offer an example: "In one example of how ACAP would work, a newspaper publisher could grant search engines permission to index its site, but specify that only select ones display articles for a limited time after paying a royalty." It's a strange example. If you allow a search engine to crawl your site, you also allow it to display a small excerpt from the article (or at least the headline). If a site pays you to display full articles (like Yahoo News), that site already knows it has the right to republish the content. Other sites, like Google News, only aggregate and cluster articles and send visitors to the original source.

I think news sites should be treated the same as any other site, and it's not necessary to create a new system for giving permission to index a site.

Google Belgium homepage displays the court order
More about Google News
