August 11, 2009

On Google File System

Google File System is "a scalable distributed file system for large distributed data-intensive applications" created by Google. Initially used to store Google's search indexes and crawl data, GFS is now mostly used to store user-generated content.

ACM has an interesting interview with Sean Quinlan, who was a GFS tech lead and is now a principal engineer at Google. Some excerpts:
Although organizations don't make a habit of exchanging file-system statistics, it's safe to assume that GFS is the largest file system in operation (in fact, that was probably true even before Google's acquisition of YouTube). Hence, even though the original architects of GFS felt they had provided adequately for at least a couple of orders of magnitude of growth, Google quickly zoomed right past that.

One thing that helped tremendously was that Google built not only the file system but also all of the applications running on top of it. While adjustments were continually made in GFS to make it more accommodating to all the new use cases, the applications themselves were also developed with the various strengths and weaknesses of GFS in mind. "Because we built everything, we were free to cheat whenever we wanted to," Gobioff neatly summarized. "We could push problems back and forth between the application space and the file-system space, and then work out accommodations between the two."

The guys who built Gmail went to a multihomed model, so if one instance of your Gmail account got stuck, you would basically just get moved to another data center. Actually, that capability was needed anyway just to ensure availability. Still, part of the motivation was that they wanted to hide the GFS problems.

{ Thanks, Daniel. }

7 comments:

  1. Nice info here! Good insights in GFS...

  2. It is important to get a detailed idea about a topic like this, so that readers gain enough knowledge and information and can pick up some ideas from the posted article.

    Thanks for this post!

  3. I remember studying distributed systems 10 years ago in college, but it was all quite primitive compared to what Google, YouTube and Facebook have now implemented.

  4. I love hearing from Google's own people when it comes to such things; it makes everything a lot clearer.

  5. They will release it soon, but it isn't exactly what I was expecting; I don't like the cloud computing concept.

  6. Totally agree with the above posters; it is very interesting when Google allows us to listen to what they want!

    Thanks for the post Daniel

