Vanessa Fox reports that Google's blog search engine has changed the way it indexes blog posts. Until now, Google Blog Search only indexed feeds, so the results weren't very good for sites that offered partial feeds. The site now offers a more comprehensive search by indexing the entire content of each page, including comments, navigation links and blogrolls.
"We have changed the way we index blog posts to include the full content of the page. We've had occasional complaints about the use of the feed content, particularly the problem with partial feeds. The indexing change has improved the results for a lot of queries, both because we have the full content of the page and because we extract links that are missing from the feeds. The downside of this change is that we see more results that match only the blogroll and other parts of the page that are common to all of a blog's posts," explains Jeremy Hylton. He says that the algorithm will be improved to exclude "the content that isn't really part of the post" to make the results more useful.
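Google hasn't said how it plans to filter out that shared content. As a rough illustration of the idea only, here's a minimal Python sketch that assumes each post page has already been split into text blocks and simply drops any block that appears on every page of the same blog. The function name `strip_shared_blocks` and the sample data are hypothetical, and a real system would almost certainly be more sophisticated.

```python
from collections import Counter


def strip_shared_blocks(pages: list[list[str]]) -> list[list[str]]:
    """Drop text blocks that appear on every page of the same blog.

    Illustrative assumption: each page is a list of text blocks
    (paragraphs, sidebar sections, and so on).
    """
    # With fewer than two pages, nothing can be identified as shared.
    if len(pages) < 2:
        return [list(page) for page in pages]

    # Count the number of pages each block appears on (once per page).
    counts = Counter(block for page in pages for block in set(page))
    shared = {block for block, n in counts.items() if n == len(pages)}

    # Keep only the blocks that are not common to all pages.
    return [[block for block in page if block not in shared] for page in pages]


# Hypothetical pages from one blog: the blogroll repeats on each of them.
pages = [
    ["Blogroll: Site A, Site B", "Post about partial feeds"],
    ["Blogroll: Site A, Site B", "Post about search tips"],
]
print(strip_shared_blocks(pages))
```

In this toy example the blogroll is removed from both pages, so a query that only matches the blogroll would no longer return every post on the blog.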
Here's an example of a comment from a Google OS post indexed by Google Blog Search:
Tip: if you want to find recent blog posts, don't sort the results by date. Just select "last 12 hours" or "last day" from the sidebar. This way, you'll get relevant results and you'll minimize the number of splogs (spam blogs) in the list of search results.
this is more wrong than i could tell anybody...
If someone uses partial feeds, he is not willing to give his whole content to everybody.
Now those who only give the content for clicks will continue to work this irrational way, and we will go straight back to the times of unrelated private homepages without any way of transferring data between them.
Google Blog Search now fails miserably to find the real author of news, because it tries to discover it from the body of the post and no longer from the RSS feed.
I imagine they could use a blog's feed as a "hint" to help determine which part of a blog post should be included in the search results, or indeed included at all.
I just want to know why I can't find my blog using Google web or blog searches.
Feeds are only duplicates of the web content, so I don't see any major loss here. The content is still available, and one can subscribe to a feed once they find the site they want.
I am glad to see Google finally do this. It will promote feeds that are more user friendly than bot friendly.
The info is still there. Very good to see them do some updates that make feeds easier.
It seems that this method will help web searches be more relevant. That is great for users and potential customers.
Thank goodness for full indexation. I had forgotten that it used to be only partial for feeds. My, how the Web has grown.