An unofficial blog that has been watching Google's attempts to move your operating system online since 2005. Not affiliated with Google.

Send your tips to gostips@gmail.com.

July 3, 2008

Viacom Wanted the Source Code for Google's Search Engine, But Obtained YouTube's Server Logs

In the ongoing lawsuit between Viacom and Google over videos uploaded to YouTube that infringe Viacom's copyrights, Viacom wants to prove that the most popular videos watched on YouTube came from its programs. Viacom even claimed that Google's search results are biased toward infringing YouTube videos, so it asked for... Google's source code (and YouTube's source code too). Here are some excerpts from the ruling:
Plaintiffs move jointly pursuant to Fed. R. Civ. P. 37 to compel YouTube and Google to produce certain electronically stored information and documents, including a critical trade secret: the computer source code which controls both the YouTube.com search function and Google's internet search tool "Google.com". YouTube and Google cross-move pursuant to Fed. R. Civ. P. 26(c) for a protective order barring disclosure of that search code, which they contend is responsible for Google's growth "from its founding in 1998 to a multi-national presence with more than 16,000 employees and a market valuation of roughly $150 billion" and cannot be disclosed without risking the loss of the business.

The search code is the product of over a thousand person-years of work. There is no dispute that its secrecy is of enormous commercial value. Someone with access to it could readily perceive its basic design principles, and cause catastrophic competitive harm to Google by sharing them with others who might create their own programs without making the same investment. Plaintiffs seek production of the search code to support their claim that "Defendants have purposefully designed or modified the tool to facilitate the location of infringing content." (...) YouTube and Google maintain that "no source code in existence today can distinguish between infringing and non-infringing video clips -- certainly not without the active participation of rights holders".

Unfortunately for Viacom and Google's competitors, the request to provide the source code has been rejected. But another request, this time for YouTube's server logs, has been approved.
Defendants' "Logging" database contains, for each instance a video is watched, the unique "login ID" of the user who watched it, the time when the user started to watch the video, the internet protocol address other devices connected to the internet use to identify the user’s computer ("IP address"), and the identifier for the video. That database (which is stored on live computer hard drives) is the only existing record of how often each video has been viewed during various time periods. Its data can "recreate the number of views for any particular day of a video." Plaintiffs seek all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website. They need the data to compare the attractiveness of allegedly infringing videos with that of non-infringing videos.
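The ruling says the Logging database can "recreate the number of views for any particular day of a video." As a rough illustration of why, here is a minimal Python sketch that assumes each log row holds only the four fields the ruling lists. The `ViewRecord` type and the `daily_view_counts` function are hypothetical names made up for this example, not YouTube's actual schema or code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ViewRecord:
    # Hypothetical row layout mirroring the fields described in the ruling;
    # the real Logging database schema is not public.
    login_id: str
    start_time: datetime
    ip_address: str
    video_id: str


def daily_view_counts(records):
    """Count views per (video_id, calendar day) from raw log rows."""
    counts = Counter()
    for r in records:
        # Only the video ID and the date of the start time are used;
        # login_id and ip_address are ignored entirely.
        counts[(r.video_id, r.start_time.date())] += 1
    return counts


if __name__ == "__main__":
    logs = [
        ViewRecord("user1", datetime(2008, 7, 1, 9, 30), "192.0.2.1", "vidA"),
        ViewRecord("user2", datetime(2008, 7, 1, 14, 5), "192.0.2.2", "vidA"),
        ViewRecord("user1", datetime(2008, 7, 2, 8, 0), "192.0.2.1", "vidA"),
        ViewRecord("user3", datetime(2008, 7, 1, 20, 45), "198.51.100.7", "vidB"),
    ]
    for (video, day), n in sorted(daily_view_counts(logs).items()):
        print(video, day, n)
```

The per-day counts never touch the login ID or the IP address, which is why a popularity comparison like the one Viacom described could, in principle, be done on aggregated data.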

Google argued that the task would require significant resources, since the Logging database holds 12 TB of data, and that handing it over would violate users' privacy. The court was not persuaded by the privacy argument, pointing out that Google had previously stated in a blog post that an IP address, without additional information, cannot identify a person and therefore isn't personal information. The ruling concludes: "Therefore, the motion to compel production of all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website is granted."

Viacom also asked for other things: the schemas of Google's advertising database and Google Video's database, data about private YouTube videos, and more. The entire document is worth reading, as it's pretty entertaining.

Salon thinks that "all's not lost. Google might manage to reverse this decision on appeal, and Viacom, gauging the outrage, could decide to withdraw or limit its request." After all, getting YouTube's server logs just to determine the popularity of the infringing videos is an abuse: YouTube could have offered aggregated data about those videos.

Update: According to Search Engine Land, Google sent a letter to Viacom regarding the removal of personal data.
Given Plaintiffs' stated reasons for seeking information from the logging database -- to conduct proportionality analyses -- potentially personally identifiable information should be irrelevant. Indeed, Plaintiffs have previously represented that they do not desire to investigate users' viewing activities, and Viacom's general counsel is on record today stating that Viacom does not want to receive individuals' usernames and IP addresses. Accordingly, we request that Plaintiffs agree that YouTube may redact usernames and IP addresses from the viewing data in the interests of protecting user privacy.

Update 2 (July 15): "We are pleased to report that Viacom, MTV and other litigants have backed off their original demand for all users' viewing histories and we will not be providing that information," says the YouTube Blog.
