You are invited to the Googleplex to visit the Google playground, and you are given read-only access to all the data gathered by Googlebot (and other Google bots). You can write programs that manipulate the data, and you can view the most visited sites, the most frequent queries, the most influential sites, and the queries for which these sites rank well. What would you do with all that power?
* list the most visited websites that used to be updated once a day, but haven't been updated for many months. I would like to see archive sites.
* list the top expressions used in the anchors to link to a certain site.
* create an interface that allows you to start with a site and then go to the most visited external link (in Google). The journey continues until a threshold is reached.
* define the distance between two sites: how many links do you have to click to get from a page on site A to a page on site B?
* start with a query and see how people modify that query to obtain better results.
* list the sites that deliver the most clicked news stories in Google News.
* list the most used words in a language.
* describe a site using vocabulary richness: how many different words do BBC News or MySpace use?
* list the results for a query not with respect to page relevance, but with respect to site relevance. Who is more entitled to talk about the Sony Ericsson K800i: BBC News or MobileReviews.com?
* what are people who use OS/2, BeOS or Amiga searching for?
* discover unknown parts of the web: what sites have no backlinks?
* create a verbose interface for Google that explains why a page was included in the results.
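The "distance between two sites" idea maps naturally onto a shortest-path search. Here is a minimal sketch, assuming the link data has already been collapsed into a site-level graph (a plain dictionary from each site to the sites it links to). The `site_distance` function, the `toy_links` data and the `.example` hostnames are all made up for illustration; nothing here reflects how Google actually stores or queries its link graph.

```python
from collections import deque


def site_distance(links, start, goal):
    """Return the minimum number of link clicks needed to get from
    `start` to `goal`, or None if `goal` cannot be reached.

    `links` is a hypothetical site-level graph: a dict mapping each
    site to an iterable of the sites it links to.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    # Breadth-first search: sites are visited in order of increasing
    # click count, so the first time we meet `goal` is the shortest path.
    while queue:
        site, clicks = queue.popleft()
        for neighbor in links.get(site, ()):
            if neighbor == goal:
                return clicks + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, clicks + 1))
    return None


# Toy data, invented for this example.
toy_links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "c.example": ["d.example"],
    "d.example": ["e.example"],
}

print(site_distance(toy_links, "a.example", "e.example"))
print(site_distance(toy_links, "e.example", "a.example"))
```

Note that the distance is not symmetric: links are one-way, so getting from A to B can take three clicks while B may never lead back to A at all.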