In case you were wondering how much information Google stores, the paper about BigTable I was talking about last week gives some interesting insights.
Google's search crawler uses 850 TB of information (1 TB = 1024 GB), so that's the amount of raw data from the web. Google Analytics uses 220 TB stored in two tables: 200 TB for the raw data and 20 TB for the summaries.
Google Earth uses 70.5 TB: 70 TB for the raw imagery and 500 GB for the index data. The second table "is relatively small (500 GB), but it must serve tens of thousands of queries per second per datacenter with low latency".
Personalized Search doesn't need too much data: only 4 TB. "Personalized Search stores each user's data in Bigtable. Each user has a unique userid and is assigned a row named by that userid. All user actions are stored in a table."
Google Base uses 2 TB and Orkut only 9 TB of data.
If we take into account that all this information is compressed (for example, the crawled data has a compression rate of 11%, meaning the compressed size is 11% of the original, so 800 TB become 88 TB), Google uses 220 TB for all the services mentioned above. It's also interesting to note that the size of the raw imagery from Google Earth is almost equal to the size of the compressed web pages crawled by Google.
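As a rough cross-check, the raw sizes quoted above can be tallied and compared with the 220 TB compressed total. This is only a sketch in Python: the dictionary keys and variable names are made up for illustration, and the overall ratio is derived from the numbers in this post, not taken from the paper.

```python
# Raw table sizes in TB, as quoted in the post (the labels are illustrative).
raw_tb = {
    "crawl": 850,
    "analytics": 220,
    "earth": 70.5,
    "personalized_search": 4,
    "base": 2,
    "orkut": 9,
}

# Compressed total for the same services, as stated in the post.
compressed_total_tb = 220

raw_total_tb = sum(raw_tb.values())
overall_ratio = compressed_total_tb / raw_total_tb

print(f"raw total: {raw_total_tb} TB")
print(f"compressed / raw: {overall_ratio:.1%}")
```

Under these assumptions the services add up to a bit over 1,150 TB of raw data, so the compressed total is roughly a fifth of the raw one.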
September 10, 2006
Not a lot there, I could nearly fit that on my hard drive *sarcasm*
How did you find this out?
What about Gmail, Blogger, Googlepages, Calendar, Writely etc etc etc.
It would be nice to see the exact number.
Nice job Ionut.
Are you sure?
For a comparison:
The U.S. Library of Congress has claimed it contains approximately 20 terabytes of text.
Rapidshare has over 360 terabytes of space used for hosting files.
If Google has 24 billion pages and the crawled data needs 850 TB, an average page should be:
934,584,883,609,600 bytes / 24,000,000,000 pages = 38,941 bytes (about 38 KB)
Google must have more than 24 billion pages (or store multiple versions of the same page) as this value seems pretty big.
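The division above can be reproduced in a few lines of Python. This is a minimal sketch: the 24 billion page count is the assumption made in the comment, and the variable names are only illustrative.

```python
TB = 1024 ** 4  # bytes per terabyte, using the post's 1 TB = 1024 GB convention

crawl_bytes = 850 * TB
pages = 24_000_000_000  # assumed page count from the comment above

avg_bytes = crawl_bytes / pages

print(crawl_bytes)                 # 934584883609600
print(round(avg_bytes))            # 38941
print(round(avg_bytes / 1024, 1))  # 38.0
```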
I think the 850 TB of memory is just the RAM.
The source is a paper [PDF] written by some Google engineers about BigTable, an interesting way of storing data.
For some perspective, big enterprises often have storage needs in the petabytes, not terabytes. While Google seems to have a lot of data from its web indexing, I am guessing it is less data (less information) than Shell Oil has.
So when we hear or think about Google setting out to index the world's information, they are perhaps only nibbling at it.
Or maybe Google is storing exabytes that aren't obvious to us...
If I could get some of that (merely a fraction, about 0.01%), how big that would be to me.
"the crawled data has compression rate of 11%, so 800 TB become 88 TB"
Er, that doesn't make sense. 800 TB with a compression rate of 11% means it's more like 712 TB.
To compress down to 88TB means the compression rate is more like 90%.
Not true. If the file size is S and the compression rate is X%, then the compressed file will have the size S*X/100.
Compression rate = (compressed_size/original_size) * 100.
See this Wikipedia article and check the compression rate displayed by your archiver.
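The disagreement comes down to two readings of "compression rate". Here is a small Python sketch of both; the function names are made up for illustration, and the first one follows the formula given above.

```python
def size_if_rate_is_ratio(original, rate_percent):
    """Rate read as compressed/original: 11% means the output is 11% of the input."""
    return original * rate_percent / 100


def size_if_rate_is_savings(original, rate_percent):
    """Rate read as space saved: 11% means the output is 89% of the input."""
    return original * (100 - rate_percent) / 100


original_tb = 800
print(size_if_rate_is_ratio(original_tb, 11))    # 88.0
print(size_if_rate_is_savings(original_tb, 11))  # 712.0
```

With the first reading, 800 TB at 11% gives 88 TB, which is the same as saying that 89% of the space is saved.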
Well, that's a lot when thinking from a desktop perspective, I guess... but I know of some global banks that have a few petabytes of storage in their data centers...
And what about Google Video? How much does it use?
Do you know the "How Much Info" page?
http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/
11% compression??? I am from the information retrieval community and I can tell you that the index alone takes up 33% of the size of the original set of documents. Plus, google stores all the pages and images. That should be a lot too. I'd say, no matter what google does, it cannot be using less than 50% (425 TB) of space to hold its web-crawl+index.
You should not have read so much of this page to make a comment....
>Anonymous said...
>yall like boys//who gives a shit about terabytes, petabytes and all that shit?? are you kidding me??
>Tuesday, November 27, 2007 6:27:00 PM PST
I think it is just amazing that Google has taken on the task to make all knowledge available on the internet. Even still, only 23 percent of the world's population has access to the internet and two thirds of Chinese people still live on two dollars a day. Not very advanced, are we? Especially considering that only 1 percent of the people in the world ever go to college. We are dumb and dumber, though the future looks bright!
Dr. Doug Ikeler
Holy cow! That's a lot of bytes! I'm gonna go figure out how many bytes that is with a calculator! That's 2,252,800,000,000 bytes. Our hard drive only holds 250 GB.
How many computers do you use???
With an efficient algorithm, indexing can actually be compressed at a greater ratio with more indexed data. For example, a lot of websites have similar long phrases like "In case you were wondering how much" (807 results). If you have more data, it is more likely that data will repeat and can therefore be linked to a larger index.
"In case you were wondering how much": 7 words * 807, linked as an integer lookup by word: 807*7*4*8 = 180KB. Linked to phrase: 807*1*4*8 = 26KB.
Compression ratio due to the ability to compress larger phrases: 14%
must be outdated as soon as I hit 'post comment'
Wow, that's a lot of storage, I wonder what it is now. :)
What about Blogspot?
I wonder how much it is now, with YouTube as well. :D
Just hope that Google uses our data for good and not evil :)
What, 70 terabytes for Google Earth? I'm actually shocked that it is so low. I would have thought satellite imagery for even 1/8th of the planet would take up amounts of data storage I simply cannot imagine. And my scientific calculator cannot display!
Smoke some pot and think about this...
One day the world will be available in virtual reality. You will just be able to fly around in 3D. Accurate to 1cm/100 Megapixels.
How much storage will that take?
Any mathematicians care to take a guess? I'm sure someone can work out a formula. Mwahahahaha! God, I'll be up all night thinking about this now....
510385129 petabytes of information is all it would take to store every surface on the planet at microscopic scale: buildings, ants, people. Not too much if you think about it. Just find a scanner and get started... it should only take a few decades.
5103851292548736987450000 MB of information
Haha! Brilliant question. And brilliant answers. Can you please show your work? I want to see your long division...
Sorry, off topic, but in response to Dr. Ikeler:
I.T. aside, you have touched on very important political agendas.
With extreme poverty there is wealth, as in your e.g. China and also 400 million in poverty in India.
Global politics dictates the 50 cent per day slave trade.
Essential knowledge: Money Masters, Constructing Fear, Life and debt.
We are VERY advanced, but wealth and knowledge is NOT for everyone.
Politics, media hype, influences the young that sport is the ticket to success, and not education. Just like in the human society, Drones in the bee society work for the queen.
http://corpau.blogspot.com
I think it's all a lie. After all, we all came here from Google or some other search engine. There was previously a site, which I got from another search engine, that talked about it... it didn't interest me at that time... but now I can't find it!
Of course it's all wrong. Think about it this way: let's say I buy 1 TB HDDs. For argument's sake, I get each for about $300. Let's say I increase the amount they "claim" to about 1000 TB (which is a lot compared to what they say). So I can actually get the whole data of Google for 300K? So, in effect, after spending, say, a million dollars more... I can actually start up my own company?
OF COURSE NOT!
And what about Gmail, blogs, YouTube, etc.?
I think it's all a lie!
I had come across a good link on how much Google stores a few years back... from another search engine... but that whole site has vanished now because it doesn't appear anywhere.
I can get 2000 TB for a max of 0.6 million. And, say, with an additional 1 million for running costs, etc... if I could lay my hands on the data, I can actually set up my own rival company?
OF COURSE NOT!
This is all a lie, people! The amount of data they store is enormous... and they are scared that people will be frightened and that environmentalists will be concerned, etc.
And what about Gmail, YouTube, etc.?!
I like Google... but I'm a bit scared about the influence it has... if it could go into the wrong hands and stuff.
@Anonymous:
1. The data is from 2006, so obviously it's no longer accurate.
2. The data is from a Google paper, so you can't claim it's "all a lie".
3. Google acquired YouTube in October 2006, after this paper was published.
4. Let's assume that Google indexed 10 billion web pages at that time and the average size of a page was 100 KB. To store all those pages, Google would've needed 931 TB, which is close to the value from Google's paper (850 TB).
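The estimate in point 4 can be checked quickly in Python. Both inputs are the assumptions stated there (10 billion pages, 100 KB per page), and the names are only illustrative.

```python
KB = 1024
TB = 1024 ** 4  # same 1 TB = 1024 GB convention as in the post

pages = 10_000_000_000     # assumed number of indexed pages
avg_page_bytes = 100 * KB  # assumed average page size

total_tb = pages * avg_page_bytes / TB
print(round(total_tb))  # 931
```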
SERIOUSLY! THAT'S IT?! HOW FREAKIN' COOL! I totally thought Google had some unfathomably huge network of data shared among multiple affiliates to make this enormous dataset that they had access to, not actually stored themselves.
But on a side note:
Are people seriously freaking out about how much data Google has? How come nobody cares about NASA, the Russians or North Korea? I mean, I don't think Google's ever blown anything the f**k up...
And for those anons that are flippin' out about how it must be a lie based on some point about how simply having "disk space" in the ballpark of 2000TB must mEaN SoMetHInG, someone, please, slap them. Google does more than just "store stuff". They offer special ways of accessing/indexing info, plus provide services based on productivity, marketing, communication and education.
It's not impossible to start a search engine with some cash, a big HD and a dumb idea. It happens all the time.
BUT
then they get washed
AND
then we go back to google(or yahoo or msn or bing or whoever).
I mean, think: when Google started up, Yahoo was DOMINATING. When Facebook started up, Myspace was DOMINATING. Anybody could likely name their favorite pioneer in technology who came in w/ guns blazin' at just the right time while someone else was monopolizing the scene. Making a new search engine now is exactly the same concept, just larger scale. Don't hate just cuz ur broke with no ideas.
And no, telling people how fake or impossible something is doesn't earn you any respect, nor does it make you sound smart. Unless you happen to be a talking snowball or dung pile. Then u deserve your own YouTube channel.
any current information on that topic?
Great info... I recently came to know that Google does its server backup with a battery installed in each CPU!!! Now if each CPU has a 1 TB hard drive... given the info in the article... its server farm size should not be too huge.
-- RIA Tweet (http://twitter.com/wave_)
Google is Cyberdyne
Which server does Google use to store the data....?
I'd like to see these figures updated... I bet in four years there have been some huge changes. Just how much sh*t are we (humans) (obviously, 'tho my dog has a Facebook account and he's never bloody off it) storing via interweb now, with Google, Facebook, Youtube (also Google), Twitter, etc. etc. There's just tons of "stuff" out there and growing fast.
Alright alright, so obviously this is a great topic as it has been going on for over 2 years now... I'm impressed anyways...
Google's compression ratio (which is somewhat impressive as well) may put a comparatively large dent in the amount of space Google uses; however, it does not save it from the Petabytes among Petabytes of information that Google indexes. Think of every page across the web. Google is not text only, it does images and videos as well...
Speaking of which, we're also dealing with YouTube, a Google-owned company. They may compress the videos into .mov files, but that isn't saving it from its mass of data either...
Now you must consider the estimated 500GB of space in just code for Google's applications, if you will.
Google owns server farms everywhere--each of these little farms has Terabytes among Terabytes of information, and they're spread out across the world!
Now think of how much data we're dealing with now, and then multiply that by three. Each byte of data that Google stores is backed up three times...
Compression or not, if you sum up every one of Google's applications and multiply it by three, you're getting Petabytes among Petabytes of data...
At this point, dare I say it sarcastically, they probably have Googlebytes of indexed and compressed information on their mass servers. Yup, give Google their own unit of space, hands down...
lol, the mathematical number "1 google" (which is 10^100, or 1 followed by 100 zeroes) has a real-world application
Actually, the number 10^100 is googol. Google changed the spelling for reasons unknown.
That's a lot of data
think of it,
if google has a virus system for all files on the web, that is mediafire+rapidshare+......
plus the virus database...
sooner or later google will take over the whole world.
But still, it cannot store beyond the limit. If that is exceeded, then how do they manage? And what is the capacity of that storage device?
Does Google's system have limited data storage? What will happen if all the 220 TB is used? This is a very interesting and informative post.
@Anonymous & @Storage Melbourne
What limit? Do you think Google has just one server with hundreds of drives plugged in?
They have hundreds of "blade" servers with quite a few disks in each. Basically it's like one huge disk spread out over hundreds of servers, all over the world...
::smiles:: I can't help smiling about you lot thinking about all those USB cables to external caddies!!
@Kwong Tung Nan: What are you talking about? Virus? Rapidshare? What has this to do with Google?
A google is a very large number.. when we get to a googleplex, that's a number that's hard to even imagine.. a 1 with a google of zeros after it.. millions and billions of zeros, just wrap ya mind round that....++=---==== ERROR OUT OF CHEESE ERROR>>>... SYNTAX ERROR IN LINE 10... REDO FROM START =====++++_----==
It's googol and googolplex, not google.
Updated link to the PDF:
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
Google processes about 24 petabytes of data per day. To process that kind of data per day, Google probably has a few exabytes of data altogether and probably keeps adding more every month :o
Interesting!
Sitting here in 2016 commenting on a 2006 post
HOW DOES GOOGLE STORE AND ACCESS PETABYTES OF DATA? HOW DO THEY MANAGE HARD DRIVES? ARE THERE ANY BACKUP PLANS IF THE DATA IS LOST?