January 1, 2008

Google Artificially Promotes Recent Web Pages

Google paid a big price when it started to index pages faster and show them in the search results minutes after they're published. The problem is that you can't rank a page that has just been created because it has no backlinks, so Google artificially inflates the rankings of recently created pages based on historical data and the few backlinks it detects.

In some cases, if Google sees a lot of searches for a query that wasn't popular before, it assumes something has happened recently and shows more recent results.
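The two behaviors described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Google's algorithm: the names `is_query_spiking`, `Page` and `rank` are hypothetical, and the spike ratio, minimum volume, 24-hour freshness window and 2x boost are made-up numbers chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical thresholds; nothing here reflects Google's real values.
SPIKE_RATIO = 5.0       # today's volume must be at least 5x the historical average
MIN_DAILY_VOLUME = 100  # ignore tiny queries, whose ratios are noisy


def is_query_spiking(history: list[int], today: int) -> bool:
    """Return True if today's search volume is unusually high compared to past days."""
    if today < MIN_DAILY_VOLUME:
        return False
    baseline = mean(history) if history else 0.0
    # A query that was (almost) never searched before counts as spiking.
    if baseline == 0:
        return True
    return today / baseline >= SPIKE_RATIO


@dataclass
class Page:
    url: str
    score: float      # relevance from the usual signals (backlinks, title, etc.)
    age_hours: float  # time since the page was first crawled


def rank(pages: list[Page], spiking: bool,
         fresh_window: float = 24.0, boost: float = 2.0) -> list[Page]:
    """Sort pages by score, multiplying the score of recent pages when the query is spiking."""
    def effective_score(p: Page) -> float:
        if spiking and p.age_hours <= fresh_window:
            return p.score * boost  # artificial freshness boost
        return p.score

    return sorted(pages, key=effective_score, reverse=True)
```

With these made-up numbers, a five-hour-old blog post with weaker signals (6.0, boosted to 12.0) outranks an established page (10.0) only while the query is spiking, which is roughly the pattern described in this post. The weakness is also visible: the boost depends on age, not quality, so anyone who publishes a keyword-stuffed page quickly enough benefits from it.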

These two changes are extremely visible today. If you go to Google's homepage and click on the special logo that celebrates 25 years of TCP/IP and the New Year, you'll be sent to the search results for [January 1 TCP/IP], where you would normally expect a Wikipedia page as the top result. But the first page of Google's results has changed dramatically in the past few hours and all the results are new: most of them come from spam sites, pages that discuss Google's logo and quote from Wikipedia. Most notably, the top result is a Digg page that links to a newly created blog with a keyword-rich address, january-1-tcp-ip.blogspot.com, and a highly optimized title: "January 1 tcp/ip". Obviously, that blog hoped to take advantage of Google's new logo, and it succeeded: the two top results are Digg pages that link to that site, followed by the blog's homepage and a post from the same blog.

The site gets traffic both directly from Digg and from Google's homepage.

You can see at Google Trends that [january 1 tcp/ip] was the "hottest" query for December 31 in the US and continues to be very popular today.

It seems that Google can no longer send users to a search results page from a doodle, because the results can become unpredictable and expose a big flaw in Google's algorithms. The same bug can also be a feature if there's a devastating earthquake somewhere on the planet and people start searching for more information about it after they hear the news.

Update (9 hours later): Other blogs are taking advantage of the situation.

23 comments:

  1. Goes to show the biggest threat to Google Inc in 2008 remains, well, Google Inc.

  2. This is a really clever feature, but it does need some way to disable it; otherwise, if there is an earthquake, I won't be able to find information about the game Quake or about earthquakes in general.

    I'd suggest adding a tag to the search string, such as boostrecent:off or something, with a link on the search results page to add it if many of the results are boosted recent ones. That would allow them to get their search-results link back, too, because they'd be able to add the tag to that string.

    I'm so clever.

  3. Perhaps 2008 will be the year Google needs to regain its credibility after gaming its own system during the second half of 2007.

  4. Last night (7:50 PM PST), the number one result for this query was www.xomba.com/january_1_tcp_ip, which is now not even in the top 30.

  5. Interesting. When I first saw the new logo (6 PM PST), the top results were a Wikipedia page, followed by a Microsoft Research page. The two pages are now at #11 and #12, while all the recently created junk dominates the top 10. As usual, rankings depend on many factors: location, personalization, the data center you hit, etc.

    In fact, the artificial increase is obvious if you look at the results page. The real results start to appear at #11, while the first 10 results are there just because they've been created in the past X hours.

  6. Ionut, you have described a flaw in the algos within the Google farm.

    These 'natural' errors, or in better words, design flaws, actually dramatically increase misconceptions about information and knowledge.

    This literally means I can't "google" for more info!! Not sure if others really understand the implications of the change in the algos!!

  7. From what I see, Google has a special index for recent web pages (probably obtained from Google Blog Search, so it's biased towards blogs and sites that have feeds). When you enter a query, Google's algorithms determine whether it's a good idea to artificially promote pages from that index. Some of the conditions could be: many pages suddenly writing about something, a lot of people searching for a rather obscure query, etc.

    Google's algorithms are created by humans and include a lot of signals and conditions added manually. That means they also have flaws and they're subjective.

  8. Maybe Google can come up with a new tag to use for links? They handled their spam problems with nofollow, which eliminated the need to engineer a solution.

    Maybe a fresh tag now? Those who don't use the fresh_today tag would find their rankings dropped mysteriously. Companies who sell fresh posts could be knocked out of the market. *grin*

    Seriously, Google is great and all, but I'm starting to think they've taken their eye off the ball. Their main product is search and it's starting to slip. Amazing to think that someone else could become the search king.

  9. This is wrong. That post shows up at the top because it has the exact words in the title and the URL (and higher up in the page, too), not because it was recently created. Why would a Wikipedia entry be the top result for such a query? If you search only for January 1 or only for TCP/IP, Wikipedia shows up.

  10. Good news for frequent bloggers, bad news for high ranking quality content providers.

  11. If Google does in fact separate its index into blogs, social media, etc., then it should also let users see these pages separately. This would allow users to filter them out if required.

    Maybe I should just heed my own advice and use another engine (e.g., A9 springs to mind) from now on.

  12. Good idea; I hope spammers will not abuse it. Wikipedia will go on. If I need to research something, I check Wikipedia too.

  13. I have spent too much time googling and can therefore say, fully qualified:

    GOOGLING sucks.

    Not long ago the interface changed slightly. An interface is not really that important, but I HATED this change, more so because the results were getting worse (or at least, they stayed as bad as always)...

    It's kind of a love-hate relationship. Google has some nice things, but I actually hate googling itself (as in searching for information).

    So much commercial crap... it's annoying, too. It's my time that I lose here :(

  14. True. One explanation could be that the new pages were optimized for the keywords, but since when does Google rank sites created less than a day ago?

    Last year I created a blog and linked to it from a post that received a large number of backlinks. The new blog was added to Google's index after a month or so, but it has never ranked at all for any query.

  15. Yeah, but I liked the Deep Dish version much better.

  16. Ironically, the rank inflation discussed here puts this page first in Google's results for "Google OS". I was trying to look up more about the false rumor and came across this.

  17. Just after Christmas I created a blog/site using WordPress. Since I figured no one was linking to my blog, I didn't bother putting a password on it. To my surprise, Google indexed my blog within 2 days! I am puzzled as to how Google got the URL of my blog. The only explanation I have is that Google got it from the Google Analytics script I installed on my blog...

  18. WordPress pings Technorati and all its little blog-indexing chums, and Google indexes those. They have links to pretty much every WP and Blogger blog out there.

  19. This actually comes as no surprise to me as an avid blogger. Several of my fellow bloggers and I have been noticing this effect in Google for over three quarters of a year now. The truth is that Google is doing a good job of balancing new, fresh content with older, more familiar content. Essentially, a robust search engine needs to operate in this manner, unlike Yahoo, where it takes eons for relevant data to surface. Age shouldn't play a part in relevancy, but shame on the people who try to game the search engines for their 15 seconds of fame. People who operate in this manner are no different from the people who hang outside of elementary schools and try to sell our children crack cocaine.

    The problem isn't with Google or their algorithms; the problem is the typical one plaguing our society: "lack of personal responsibility for actions".

  20. I am very worried about something.

    Google is now heavily favouring blogs in SERPs. If someone were to create a blog to bad-mouth a company and kept it updated, Google would return the blog ahead of the company in the SERPs.

    This IS NOT A GOOD IDEA!

  21. Alex Chitu,
    Thanks for informing me about the mistake in my blog post. I'm very happy to see you as a visitor to my blog. I'm honored.

  22. Seems to be favoring pages that just mention a "new" topic or subject, while penalizing sites that actually have quality content. The perfect vehicle for black hat coders to take advantage of.

  23. After observing what has happened on some SERPs that we monitor regularly, this seems like a downgrade to the search results. I can understand bringing up recent content for "dynamic data" such as news, sports scores, etc., but it has caused some very valuable, relevant sites to fall out of sight even though their content, while not new, remains consistently valuable.

    For example, if a site about widgets is accurate, has good content, and the widget world has not changed, then why does it matter that the content is not new? To rank content by its age is to dismiss its value.

    Personally this looks like a screw up. Hopefully G will work out the nonsense part of this.

    Was the system really broken in the first place? Google's market share has been skyrocketing, so people must be happy . . . and then some MBAs "had a great idea". Pffffftttttt on that.
