Some random thoughts about the Google File System:

  • It is implemented as client libraries that must be linked into applications, rather than integrating with the OS at the standard VFS/vnode layer. In some people’s opinions (including mine, most of the time) that means it’s not really a filesystem.
  • Certain design decisions, such as the weak data consistency guarantees and appends that occur “at least once” rather than exactly once, make it unsuitable for a wide variety of potential applications (the first sketch after this list shows what that pushes onto readers).
  • The master/client relationship very closely resembles the one used in HighRoad (my second-to-last project at EMC), except for the aforementioned lack of OS integration. In particular, the I/O flow described in section 2.3 and the use of leases described in section 3.1 (see the lease sketch below) look extremely familiar to me.
  • The client/chunkserver relationship closely resembles that of the distributed block store that was my last project at EMC, except that it operates at a much coarser granularity (see the chunk-addressing sketch below) and without consistency guarantees.
  • Based on my own experience implementing versions of both layers, I can say with confidence that they need not be implemented together. In fact, doing so is IMO a mistake; separating the two functions makes for a much more maintainable system, and lets the lower layer be used by things other than filesystems as well (the interface sketch at the end of this list shows the kind of split I mean).
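
To make the “at least once” point concrete, here’s a minimal sketch of what that semantic forces onto every reader. The paper (section 2.7.2) says writers embed unique identifiers in records so readers can filter out duplicates; the Record type and its ID field here are my own invention, not anything from the paper.

    // Sketch of a reader coping with "at least once" record append: the
    // same record can show up more than once after a retried append, so
    // the application has to deduplicate.
    package main

    import "fmt"

    type Record struct {
        ID      uint64 // writer-assigned unique identifier (hypothetical)
        Payload []byte
    }

    // dedup returns records in order, dropping any whose ID was seen before.
    func dedup(in []Record) []Record {
        seen := make(map[uint64]bool)
        var out []Record
        for _, r := range in {
            if seen[r.ID] {
                continue // duplicate left over from a retried append
            }
            seen[r.ID] = true
            out = append(out, r)
        }
        return out
    }

    func main() {
        recs := []Record{{1, []byte("a")}, {2, []byte("b")}, {1, []byte("a")}}
        fmt.Println(len(dedup(recs))) // prints 2; the retried append is dropped
    }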
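
For comparison with HighRoad, here’s the lease mechanism of section 3.1 as I read it, reduced to a sketch: the master grants a time-limited lease on each chunk to one replica (the primary), and the primary then imposes a single serial order on all mutations to that chunk. The 60-second initial timeout is the paper’s; the types and names are mine.

    package main

    import (
        "fmt"
        "time"
    )

    type Lease struct {
        Primary string    // chunkserver currently holding the lease
        Expires time.Time // the real master extends this via heartbeats
    }

    type Master struct {
        leases map[uint64]Lease // chunk handle -> current lease
    }

    // GrantLease returns the existing primary for a chunk if its lease is
    // still live, otherwise grants a fresh lease to the requesting server.
    func (m *Master) GrantLease(chunk uint64, server string) Lease {
        if l, ok := m.leases[chunk]; ok && time.Now().Before(l.Expires) {
            return l // an unexpired lease stays with its primary
        }
        l := Lease{Primary: server, Expires: time.Now().Add(60 * time.Second)}
        m.leases[chunk] = l
        return l
    }

    func main() {
        m := &Master{leases: make(map[uint64]Lease)}
        fmt.Println(m.GrantLease(42, "chunkserver-a").Primary)
    }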
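
On granularity: a GFS client turns a byte offset into a chunk index and asks the master for the chunk handle and replica locations (section 2.4), with chunks fixed at 64 MB (section 2.5), orders of magnitude coarser than the blocks my block store dealt in. The ChunkLocation type is hypothetical.

    package main

    import "fmt"

    const chunkSize = 64 << 20 // 64 MB, per section 2.5

    type ChunkLocation struct {
        Handle   uint64   // globally unique chunk handle
        Replicas []string // chunkservers holding copies of the chunk
    }

    // chunkIndex converts a byte offset in a file into the fixed-size
    // chunk number the client sends to the master.
    func chunkIndex(offset int64) int64 {
        return offset / chunkSize
    }

    func main() {
        // A read at byte 200,000,000 lands in the file's third chunk.
        fmt.Println(chunkIndex(200_000_000)) // prints 2
    }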
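
And the separation I’m advocating in the last point, as two interfaces: a dumb replicated block store that knows nothing about files, and a namespace layer that is the only component that does. This is purely illustrative, how I’d carve it up, not anything from the paper.

    package fslayers

    // BlockStore stores and replicates fixed-size extents by handle. It
    // knows nothing about files, so databases, object stores, and the
    // like could sit directly on this layer.
    type BlockStore interface {
        Allocate() (handle uint64, err error)
        Read(handle uint64, off, n int64) ([]byte, error)
        Write(handle uint64, off int64, data []byte) error
    }

    // Namespace maps pathnames to sequences of block handles; it is the
    // only layer that knows what a "file" is.
    type Namespace interface {
        Create(path string) error
        Lookup(path string) (handles []uint64, err error)
    }

    // A filesystem is then just the composition of the two layers, and
    // either layer can be replaced or reused independently.
    type FileSystem struct {
        NS     Namespace
        Blocks BlockStore
    }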

It might sound like I’m dismissing GoogleFS (sorry guys, but the name GFS is already taken and it’s obnoxious for you to reuse it like that). That’s really not the case. Nobody knows as well as I do what a bitch it can be to make something like this work, and my hat’s off to anybody who can do it, even if they sidestep the kernel-integration and consistency parts. The atomic-append piece makes a great deal of sense in general, and particularly for their application. There’s a lot of good work here. The above are just the things that first struck me as I read the paper.