April 26, 2007

Ranking Web Pages Based on Their History

A new Google patent describes several scores that could be used to rank search results. These scores draw on a document's history, from the moment Google first finds it to the present. That history could help Google determine whether the content is fresh, still useful, or outdated.

"Search engine may use the inception date of a document for scoring of the document. For example, it may be assumed that a document with a fairly recent inception date will not have a significant number of links from other documents (i.e., back links). For existing link-based scoring techniques that score based on the number of links to/from a document, this recent document may be scored lower than an older document that has a larger number of links (e.g., back links)."

"For some queries, documents with content that has not recently changed may be more favorable than documents with content that has recently changed. As a result, it may be beneficial to adjust the score of a document based on the difference from the average date-of-change of the result set. In other words, search engine may determine a date when the content of each of the documents in a result set last changed, determine the average date of change for the documents, and modify the scores of the documents (either positively or negatively) based on a difference between the documents' date-of-change and the average date-of-change."
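To make the date-of-change adjustment concrete, here is a minimal Python sketch. The patent excerpt gives no formula, so the linear adjustment, the `weight` value, the `prefer_stable` flag, and the function name are all assumptions made for illustration, not Google's actual method.

```python
from datetime import date

def adjust_scores(results, weight=0.01, prefer_stable=True):
    """Shift each score by its distance from the average date-of-change.

    results: list of (doc_id, base_score, last_changed) tuples, where
    last_changed is a datetime.date. The linear form and the weight are
    hypothetical; the patent only says scores move up or down.
    """
    ordinals = [changed.toordinal() for _, _, changed in results]
    average = sum(ordinals) / len(ordinals)
    # For queries that favor stable content, newer-than-average is penalized.
    sign = -1 if prefer_stable else 1
    adjusted = {}
    for doc_id, score, changed in results:
        # Positive when the document changed more recently than the average.
        diff_days = changed.toordinal() - average
        adjusted[doc_id] = score + sign * weight * diff_days
    return adjusted

results = [
    ("old-page", 1.0, date(2007, 1, 1)),
    ("new-page", 1.0, date(2007, 1, 11)),
]
stable_first = adjust_scores(results, prefer_stable=True)
fresh_first = adjust_scores(results, prefer_stable=False)
```

With `prefer_stable=True` the page that changed less recently ends up ahead; flipping the flag reverses the order, which matches the "either positively or negatively" wording in the quote.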

"Documents for which there is an increase in the rate of change might be scored higher than those documents for which there is a steady rate of change, even if that rate of change is relatively high. The amount of change may also be a factor in this scoring."
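One way to read this passage is that the trend in how often a page changes matters more than the raw frequency. The sketch below assumes change counts are collected over equal-length periods; the function name, the `boost` factor, and the averaging of period-to-period differences are hypothetical choices, not something the patent specifies.

```python
def change_rate_boost(changes_per_period, boost=0.1):
    """Return a score bonus when the rate of change is increasing.

    changes_per_period: counts of content changes in consecutive
    equal-length periods, oldest first (an assumed input format).
    """
    if len(changes_per_period) < 2:
        return 0.0
    # Period-to-period differences approximate the change in the rate.
    deltas = [later - earlier
              for earlier, later in zip(changes_per_period, changes_per_period[1:])]
    average_delta = sum(deltas) / len(deltas)
    # A steady or falling rate earns no bonus, however high the rate is.
    return boost * max(average_delta, 0.0)

steady_but_busy = change_rate_boost([10, 10, 10])
accelerating = change_rate_boost([1, 2, 4])
```

Here a page that is edited ten times in every period gets no bonus, while a page whose edits grow from one to four per period does.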

"Using this date as a reference, search engine may then monitor the time-varying behavior of links to the document, such as when links appear or disappear, the rate at which links appear or disappear over time, how many links appear or disappear during a given time period, whether there is trend toward appearance of new links versus disappearance of existing links to the document, etc. (...) By analyzing the change in the number or rate of increase/decrease of back links to a document (or page) over time, search engine may derive a valuable signal of how fresh the document is."
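The link-monitoring idea can be sketched as a small calculation over link events. This assumes link appearances and disappearances are recorded as day offsets from the reference date; the 30-day period, the function names, and the "late half minus early half" trend measure are illustrative assumptions only.

```python
from collections import Counter

def net_link_change_by_period(appeared, disappeared, period_days=30):
    """Net back links gained per period, oldest period first.

    appeared / disappeared: day offsets since the reference date at which
    a back link was first seen or found missing (assumed input format).
    """
    net = Counter()
    for day in appeared:
        net[day // period_days] += 1
    for day in disappeared:
        net[day // period_days] -= 1
    last_period = max(net) if net else -1
    return [net.get(period, 0) for period in range(last_period + 1)]

def link_trend(net_changes):
    """Average net change in the later half minus that of the earlier half."""
    half = len(net_changes) // 2
    if half == 0:
        return 0.0
    early = sum(net_changes[:half]) / half
    late = sum(net_changes[-half:]) / half
    return late - early

net = net_link_change_by_period(
    appeared=[1, 35, 40, 65, 70, 75],
    disappeared=[10],
)
trend = link_trend(net)
```

A positive trend means the page is gaining links faster than it used to, which the quote treats as a sign of freshness; a negative one suggests the page is going stale.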

If a page still gets links one year after it was created, Google might assume it's still useful. If a page is constantly updated (like Wikipedia pages), the content could be more relevant to the reader. These are some simple rules that could remove outdated pages from the top results.

{ via Russel Shaw. }

4 comments:

  1. This covers pretty much the same territory as the patent application published in March 2005: Information retrieval based on historical data (20050071741).

    It's been split up into multiple patent applications, with expanded claims sections, but I didn't notice much in the way of actual changes.

  2. Now it's clear. My wiki goes up, my blog stumbled :(

  3. That's not a good idea for bloggers :( Is it better to start article directories?

  4. Seems to me like this type of date/history-based analysis could be used to help weed out some of the more disreputable spam sites and such.

    Such a site, once reported, would be noted such that in order to get back into search results, they would have to set up new domains and stuff.

    But if Google gave less precedence to new sites, then the spam site's new domain would not be able to rise quickly in search results.

    I don't have a good enough knowledge of SEO to really say how this data could be used, but it seems like someone could probably figure out how to use their historical data for a purpose like this.

