An unofficial blog that watches Google's attempts to move your operating system online since 2005. Not affiliated with Google.

Send your tips to gostips@gmail.com.

July 29, 2007

Google Indexing Many Web Pages in Real-Time

A year ago you had to wait days, if not weeks, to see your content indexed by Google. Now many web pages are indexed in 5-10 minutes. At least that's the case for many blog homepages, which are updated in the index almost in real time. Here you can see the homepage of a PageRank 4 blog:


...and here's the proof that the blog post was created 11 minutes earlier (the result is from Google Blog Search):


Google could use the ping feature of the blog search engine to get notifications when a site is updated. This doesn't work for all blogs, so there may still be a prioritization algorithm.
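To make the ping idea concrete, here is a minimal sketch of what such a notification could look like. It assumes the conventional `weblogUpdates.ping` XML-RPC method that blog ping services commonly accept. The blog name and URL are hypothetical, and the code only builds the request body without sending it anywhere.

```python
import xmlrpc.client


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Build the XML-RPC body for a weblogUpdates.ping call.

    Assumption: the receiving ping service follows the common
    weblogUpdates.ping convention (blog name, then blog URL).
    """
    return xmlrpc.client.dumps(
        (blog_name, blog_url),
        methodname="weblogUpdates.ping",
    )


if __name__ == "__main__":
    # Hypothetical blog details, for illustration only.
    print(build_ping_request("Example Blog", "http://blog.example.com/"))
```

A blogging platform would POST a body like this to the ping service each time a post is published, which lets the crawler learn about the update without polling the site.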

Update (less than half an hour later):

20 comments:

  1. Which, IMO, is why they bought Feedburner.

  2. I wrote about it two weeks ago.

    The post is in Polish, but you can get the most important thing from the screenshots - Google Blog Search got my new post (blog's PR=5) indexed just a minute after publishing, and the blog's home page was reindexed after just ~25 minutes.

    And since Google bought Feedburner, the indexing time is even better :)

  3. I don't think this has anything to do with FeedBurner since most feeds don't use FeedBurner.

  4. @Ionut Alex Chitu: FeedBurner has the "PingShot" option, which can ping the "Google Blog Search Pinging Service" (see screenshot) - I think this can speed up indexing. The same goes for being indexed by Technorati and other services like that.

  5. I've read statements from Google people saying that a lot of effort over the past year went into making their index more up to date, because a very large number of queries were about recent stories.

    It's still a matter of prioritization, but the priority isn't set by popularity, but by how often a site is updated.

    Ever realized that the official Google Blog always takes half a day to make its way to Google Reader?

    My own blog got updated faster after I started to write more posts.

    That's at least my observation for Google Reader...

  6. << FeedBurner has the "PingShot" option... >>

    But only for the 808,707 FeedBurner feeds. BlogSpot has more than 40,000,000 blogs -> at least 40,000,000 feeds.

  7. Correction: "Google indexing many web pages from blogs in real time", or any content associated with an RSS feed.

  8. I don't know how closely this relates, but I do notice that the Google Alerts service is sending me alerts about posts on various blogs (with the "as it happens" setting) about 30 minutes after they're posted (by me or by anyone web-wide).

  9. Google Blog Search is pinged by most blog services, so every time a blog is updated, Google is notified.

    Google Blog Search Pinging Service

  10. Yeah, while not everyone uses FeedBurner, my guess is that Google has grabbed every RSS feed they've come across while crawling, and just has a partition of their massive architecture monitor those feeds. Still, much faster than relying on the random surfer model of traditional crawling.

  11. Well, yes, this is not surprising. Most major blogging software has a "ping" feature set up by default - certainly WordPress has this. Google receives these pings and immediately knows to crawl that blog.

  12. Sometimes it takes less than a minute to get into the index! To my mind, incoming links from high-PageRank pages help a page get indexed faster.

  13. If all the blogs and news rolls are pinging Google about updates in real time, we should be able to build a visualization app that shows us how any particular story/meme spreads around the net.

    Now if I just had some free time...

  14. Yeah... we do the same thing with Spinn3r. We have our crawlers reprioritize based on ping traffic and other variables.

    Kevin

  15. Massimiliano Mancini, July 31, 2007 at 12:24 AM

    Hello, is PageRank important for the real-time index in Google? My web page (PR3) was indexed immediately, but not the URL of the post. If I search for the title of the post in Google, the result shows the right description, but the URL points to the home page of my blog.

  16. Google seems to be very fast at indexing the homepage. The actual posts aren't indexed that fast.

  17. How does it affect blogs with low PageRank, and newly created ones? I have a new blog which still hasn't been indexed after more than a week. Does quick indexing have any bearing on higher search rank, though?

  18. I received a Google Alert after a blog post on a Blogger site that has no ranking. Shortly after that the person who posted the blog sent me a link to the post. Google Alerts beat the blogger, of course it was a Google Blog site, but still - wow!

  19. I'm finding that the home page gets indexed first, then category and single post pages get indexed second, and the more often you add content, the more often the spiders come back. But if you don't have some static content, quickly scrolling content rolls off the pages and keyword rankings seem to plateau. 2 or 3 posts per week seems perfect if you don't have a bunch of incoming links. Speaking of incoming links, there's a company called Moguling.com, http://www.moguling.com, that gives you free business listings, and they actually ping Google in real time each time you do a post, so it proactively announces your content to Blogger and Google! Pretty cool.
