In case you were wondering how much information Google stores, the BigTable paper I mentioned last week gives some interesting insights.
Google's search crawler uses 850 TB of storage (1 TB = 1024 GB), so that's the amount of raw data crawled from the web. Google Analytics uses 220 TB, stored in two tables: 200 TB for the raw data and 20 TB for the summaries.
Google Earth uses 70.5 TB: 70 TB for the raw imagery and 500 GB for the index data. The second table "is relatively small (500 GB), but it must serve tens of thousands of queries per second per datacenter with low latency".
Personalized Search doesn't need too much data: only 4 TB. "Personalized Search stores each user's data in Bigtable. Each user has a unique userid and is assigned a row named by that userid. All user actions are stored in a table."
Google Base uses 2 TB and Orkut only 9 TB of data.
If we take into account that all this information is compressed (for example, the crawled data has a compression rate of 11%, so 800 TB becomes 88 TB), Google uses about 220 TB for all the services mentioned above. It's also interesting to note that the size of the raw imagery from Google Earth is almost equal to the size of the compressed web pages crawled by Google.
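As a quick sanity check, the compression arithmetic quoted above can be reproduced in a couple of lines of Python ("compression rate" here means compressed size as a fraction of the original):

```python
# Crawled-page figures quoted from the BigTable paper.
raw_crawl_tb = 800        # raw web page data, in TB
compression_rate = 0.11   # compressed size / original size

compressed_tb = raw_crawl_tb * compression_rate
print(f"{compressed_tb:.0f} TB")  # 88 TB
```

Google Earth's 70 TB of raw imagery is indeed in the same ballpark as that compressed crawl.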
Not a lot there, I could nearly fit that on my hard drive *sarcasm*
How did you find this out?
What about Gmail, Blogger, Googlepages, Calendar, Writely etc etc etc.
It would be nice to see the exact number.
Nice job Ionut.
Are you sure?
For a comparison:
The U.S. Library of Congress has claimed it contains approximately 20 terabytes of text.
Rapidshare has over 360 terabytes of space used for hosting files.
If Google has 24 billion pages and the crawled data needs 850 TB, an average page should be:
934,584,883,609,600 bytes / 24,000,000,000 pages = 38,941 bytes (about 38 KB)
Google must have more than 24 billion pages (or store multiple versions of the same page) as this value seems pretty big.
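A minimal sketch of that calculation in Python (the 24-billion-page count is the commenter's assumption, and 1 TB = 1024^4 bytes as in the post):

```python
TB = 1024 ** 4           # bytes per terabyte, binary units as in the post
crawl_bytes = 850 * TB   # 934,584,883,609,600 bytes
pages = 24_000_000_000   # assumed number of crawled pages

avg_page_bytes = crawl_bytes // pages
print(avg_page_bytes)    # 38941 bytes, i.e. roughly 38 KB per page
```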
I think the 850 TB of memory is just the RAM
The source is a paper [PDF] written by some Google engineers about BigTable, an interesting way of storing data.
For some perspective, big enterprises often have storage needs in the petabytes, not terabytes. While Google seems to have a lot of data from its web indexing, I am guessing it is less data (less information) than Shell Oil has.
So when we hear or think about Google setting out to index the world's information, they are perhaps only nibbling at it.
Or maybe Google is storing exabytes that aren't obvious to us...
If I could get even a fraction of that (merely about 0.01%), imagine how big it would be to me.
"the crawled data has compression rate of 11%, so 800 TB become 88 TB"
Er, that doesn't make sense. 800 TB with a compression rate of 11% means it's more like 712 TB.
To compress down to 88TB means the compression rate is more like 90%.
Not true. If the file size is S and the compression rate is X%, then the compressed file will have size S × X / 100.
Compression rate = (compressed_size / original_size) × 100.
See this Wikipedia article and check the compression rate displayed by your archiver.
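The two readings of "compression rate" can be contrasted directly; this small sketch uses the definition given above (compressed size as a percentage of the original):

```python
def compressed_size(original_size, rate_percent):
    """Compression rate as defined here: compressed = original * rate / 100."""
    return original_size * rate_percent / 100

# 800 TB of crawled pages at an 11% compression rate:
print(compressed_size(800, 11))          # 88.0 TB

# The other reading (size *reduced by* 11%) would instead give:
print(round(800 * (1 - 11 / 100), 1))    # 712.0 TB
```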
Well, that's a lot from a desktop perspective, I guess... but I know of some global banks that have a few petabytes of storage in their data centers...
And what about Google Video? How much does it use?
Do you know the "How Much Info" page?
http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/
11% compression??? I am from the information retrieval community and I can tell you that the index alone takes up 33% of the size of the original set of documents. Plus, Google stores all the pages and images, which should be a lot too. I'd say, no matter what Google does, it cannot be using less than 50% (425 TB) of space to hold its web crawl + index.
You should not have read so much of this page to make a comment....
> Anonymous said...
> yall like boys // who gives a shit about terabytes, petabytes and all that shit?? are you kidding me??
> Tuesday, November 27, 2007 6:27:00 PM PST
The individuals that are hacking me, please stop. strenght55@gmail.com 9174778373
I think it is just amazing that Google has taken on the task to make all knowledge available on the internet. Even still, only 23 percent of the world's population has access to the internet and two thirds of Chinese people still live on two dollars a day. Not very advanced, are we? Especially considering that only 1 percent of the people in the world ever go to college. We are dumb and dumber, though the future looks bright!
Dr. Doug Ikeler
Holy cow! That's a lot of bytes! I'm gonna go figure out how many bytes that is with a calculator! That's 2,252,800,000,000 bytes. Our hard drive only holds 250 GB.
How many computers do you use?
ReplyDeletewith an efficient algorithm, indexing can actually be compressed at a greater ratio with more indexed data. For example, alot of websites have similar long phrases like "In case you were wondering how much" 807 results. If you have more data, it is more likely that data will repeat and can therefore be linked to a larger index.
"In case you were wondering how much": 7 words × 807 results. Linked as an integer lookup by word: 807 × 7 × 4 × 8 ≈ 180 KB; linked to the phrase: 807 × 1 × 4 × 8 ≈ 26 KB.
Compression ratio due to the ability to compress larger phrases: 14%
must be outdated as soon as I hit 'post comment'
Wow, that's a lot of storage, I wonder what it is now. :)
What about Blogspot?
I wonder how much it is now, with YouTube as well. :D
Just hope that Google uses our data for good and not evil :)
What, 70 terabytes for Google Earth? I'm actually shocked that it's so low. I would have thought satellite imagery for even 1/8th of the planet would take up amounts of data storage I simply cannot imagine, and my scientific calculator cannot display!
Smoke some pot and think about this...
One day the world will be available in virtual reality. You will just be able to fly around in 3D, accurate to 1 cm / 100 megapixels.
How much storage will that take?
Any mathematicians care to take a guess? I'm sure someone can work out a formula. Mwahahahaha! God, I'll be up all night thinking about this now....
510,385,129 petabytes of information is all it would take to store every surface on the planet at microscopic scale: buildings, ants, people. Not too much if you think about it. Just find a scanner and get started... it should only take a few decades.
5,103,851,292,548,736,987,450,000 MB of information
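Interestingly, that petabyte figure matches the Earth's surface area in square kilometres (about 510 million km²). Here is a rough sketch of how such an estimate scales; the sampling density (1 byte per square centimetre) is a purely illustrative assumption, not the commenter's exact model:

```python
# Illustrative estimate: storage needed to image the Earth's entire surface.
# The surface area is real (~510 million km^2); the bytes-per-cm^2 budget
# is an assumption for the sake of the sketch.
EARTH_SURFACE_KM2 = 510_000_000

def storage_petabytes(bytes_per_cm2):
    cm2_per_km2 = 10 ** 10                # 1 km^2 = 10^10 cm^2
    total_bytes = EARTH_SURFACE_KM2 * cm2_per_km2 * bytes_per_cm2
    return total_bytes / 10 ** 15         # decimal petabytes

print(f"{storage_petabytes(1):,.0f} PB")  # 5,100 PB at 1 byte per cm^2
# Microscopic resolution multiplies this by many orders of magnitude.
```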
Haha! Brilliant question. And brilliant answers. Can you please show your work? I want to see your long division...
Sorry, off topic, but in response to Dr. Ikeler,
IT aside, you have touched on very important political agendas.
With extreme poverty there is wealth, as in your example of China, and also the 400 million in poverty in India.
Global politics dictates the 50 cent per day slave trade.
Essential knowledge: Money Masters, Constructing Fear, Life and debt.
We are VERY advanced, but wealth and knowledge is NOT for everyone.
Politics and media hype influence the young to believe that sport, not education, is the ticket to success. Just like drones in bee society work for the queen.
http://corpau.blogspot.com
I think it's all a lie. After all, we all came here from Google or some other search engine. There was previously a site, which I got from another search engine, that talked about it... it didn't interest me at that time... but now I can't find it!
Of course it's all wrong. Think about it this way: let's say I buy 1 TB HDDs. For argument's sake, I get them for about $300 each. Let's say I increase the amount they "claim" to about 1000 TB (which is a lot compared to what they say). So I could actually get the whole of Google's data for $300K? So, in effect, after spending, say, a million dollars more... I could actually start up my own company?
OF COURSE NOT!
And what about Gmail, Blogger, YouTube, etc.?
I think it's all a lie!
I had come across a good link on how much Google stores a few years back... from another search engine... but that whole site has vanished now, because it doesn't appear anywhere.
I can get 2000 TB for a maximum of $0.6 million. And, say, with an additional $1 million for running costs, etc... if I could lay my hands on the data, I could actually set up my own rival company?
OF COURSE NOT!
This is all a lie, people! The amount of data they store is enormous... and they are scared that people will be frightened and that environmentalists will be concerned, etc.
And what about Gmail, YouTube, etc.?!
I like Google... but I'm a bit scared about the influence it has... if it could go into the wrong hands and stuff.
@Anonymous:
1. The data is from 2006, so obviously it's no longer accurate.
2. The data is from a Google paper, so you can't claim it's "all a lie".
3. Google acquired YouTube in October 2006, after this paper was published.
4. Let's assume that Google indexed 10 billion web pages at that time and the average size of a page was 100 KB. To store all those pages, Google would've needed 931 TB, which is close to the value from Google's paper (850 TB).
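Point 4 is easy to verify (both the page count and the average page size are stated assumptions):

```python
KB = 1024
TB = 1024 ** 4

pages = 10_000_000_000       # assumed 10 billion indexed pages
avg_page_kb = 100            # assumed average page size, in KB

total_tb = pages * avg_page_kb * KB / TB
print(f"{total_tb:.0f} TB")  # 931 TB, close to the paper's 850 TB
```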
SERIOUSLY! THAT'S IT?! HOW FREAKIN' COOL! I totally thought Google had some unfathomably huge network of data shared among multiple affiliates to make this enormous dataset that they had access to, not actually stored themselves.
But on a side note:
Are people seriously freaking out about how much data Google has? How come nobody cares about NASA, the Russians or North Korea? I mean, I don't think Google's ever blown anything the f**k up...
And for those anons that are flipping out about how it must be a lie based on some point about how simply having "disk space" in the ballpark of 2000 TB must mEaN SoMetHInG, someone, please, slap them. Google does more than just "store stuff". They offer special ways of accessing/indexing info, plus provide services based on productivity, marketing, communication and education.
It's not impossible to start a search engine with some cash, a big HD and a dumb idea. It happens all the time.
BUT
then they get washed
AND
then we go back to google(or yahoo or msn or bing or whoever).
I mean, think: when Google started up, Yahoo was DOMINATING. When Facebook started up, Myspace was DOMINATING. Anybody could likely name their favorite pioneer in technology who came in with guns blazing at just the right time while someone else was monopolizing the scene. Making a new search engine now is exactly the same concept, just larger scale. Don't hate just because you're broke with no ideas.
And no, telling people how fake or impossible something is doesn't earn you any respect, nor does it make you sound smart. Unless you happen to be a talking snowball or dung pile. Then you deserve your own YouTube channel.
Any current information on that topic?
Great info... I recently came to know that Google does its server backup with a battery installed in each CPU!!! Now, if each CPU has a 1 TB hard drive... given the info in the article... its server farm size should not be too huge.
-- RIA Tweet (http://twitter.com/wave_)
Google is Cyberdyne
Which servers does Google use to store the data?
I'd like to see these figures updated... I bet in four years there have been some huge changes. Just how much sh*t are we (humans) storing via the interweb now (obviously, though my dog has a Facebook account and he's never bloody off it), with Google, Facebook, YouTube (also Google), Twitter, etc. etc.? There's just tons of "stuff" out there and it's growing fast.
ReplyDeleteAlright alright, so obviously this is a great topic as it has been going on for over 2 years now... I'm impressed anyways...
Google's compression ratio (which is somewhat impressive as well) may put a comparatively large dent in the amount of space Google uses; however, it does not save it from the petabytes upon petabytes of information that Google indexes. Think of every page across the web. Google is not text-only; it does images and videos as well...
Speaking of which, we're also dealing with YouTube, a Google-owned company. They may compress the videos into .mov files, but that isn't saving it from its mass of data either...
Now you must consider the estimated 500 GB of space in just code for Google's applications, if you will.
Google owns server farms everywhere; each of these little farms has terabytes upon terabytes of information, and they're spread out across the world!
Now think of how much data we're dealing with now, and then multiply that by three. Each byte of data that Google stores is backed up three times...
Compression or not, if you sum up every one of Google's applications and multiply it by three, you're getting petabytes upon petabytes of data...
At this point, dare I say it sarcastically, they probably have Googlebytes of indexed and compressed information on their mass servers. Yup, give Google their own unit of space, hands down...
lol, the mathematical number "1 google" (which is 10^100, or 1 followed by 100 zeroes) has a real-world application
Actually, the number 10^100 is googol. Google changed the spelling for reasons unknown.
That's a lot of data.
Think of it:
if Google had a virus-scanning system for all the files on the web (that is, Mediafire + Rapidshare + ...),
plus the virus database...
sooner or later Google will take over the whole world.
But still, it cannot store beyond the limit. If that is exceeded, how do they manage? And what is the capacity of that storage device?
Does Google's system have limited data storage? What will happen if all the 220 TB is used? This is a very interesting and informative post.
@Anonymous & @Storage Melbourne
What limit? Do you think Google has just one server with hundreds of drives plugged in?
They have hundreds of "blade" servers with quite a few disks in each; basically it's like one huge disk spread out over hundreds of servers, all over the world...
::smiles:: I can't help smiling about you lot thinking about all those USB cables to external caddies!!
@Kwong Tung Nan: What are you talking about? Virus? Rapidshare? What does this have to do with Google?
A google is a very large number... and when we get to a googleplex, that's a number that's hard to even imagine: a 1 with a google of zeros after it. Millions and billions of zeros, just wrap ya mind round that...++=---==== ERROR OUT OF CHEESE ERROR>>>... SYNTAX ERROR IN LINE 10... REDO FROM START =====++++_----==
It's googol and googolplex, not google.
Updated link to the PDF:
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
Google processes about 24 petabytes of data per day. To process that kind of data every day, Google probably has a few exabytes of data all together, and probably keeps adding more every month :o
Interesting!
Sitting here in 2016 commenting on a 2006 post
How does Google store and access petabytes of data? How do they manage hard drives? Are there any backup plans if the data is lost?