Comments on Google Operating System: Detecting Near-duplicate Documents

The author is right, this is a big problem for sea...

2013-03-31T09:39:59.625-07:00

The author is right, this is a big problem for search engines. But it is also a problem for any big company producing lots of documents. Just think about space they take, maintenance they require and what value you get. If you need to find dups or near-duplicates you can try this program in Java: http://softcorporation.com/products/neardup/

I don't believe there is a problem with duplicate ...

2007-10-12T03:43:00.000-07:00

I don't believe there is a problem with duplicate content. Google already has the highly effective PageRank to sort out the most relevant pages for any given search phrase.

Why do they need to remove duplicate content when they already return the most relevant pages to their visitors?

At my blog I link to a 13-minute movie I made debunking the Duplicate Content Penalty Myth.

Neil Shearing.