Sunday, February 13, 2011

The internal secrets of Git

Thursday night I attended a lecture at the Boulder Linux Users Group called Unlocking the Secrets of Git, given by Tom, one of the co-founders of GitHub. It was extremely eye-opening. Up until now I had viewed Git as simply a distributed version control system. Tom showed us how to manipulate Git's internal file format and demonstrated that Git is actually a filesystem in userspace with built-in versioning and synchronization. He showed that, by storing a SHA-1 hash of each file's contents, Git (1) is extremely fast at comparing files and (2) doesn't actually care about the file name; it only cares about the contents of files. This matters when you're renaming files: the filename is generally unimportant in the grand scheme of things.
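To make the hashing point concrete, here is a minimal sketch of how Git derives an ID for a file's contents: it takes the SHA-1 of a short header (`blob`, the size in bytes, and a NUL byte) followed by the raw bytes. The helper name `git_blob_id` and the file names below are my own, made up for illustration; this is not code from the talk.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file names: the name never enters the hash, only the bytes do.
files = {
    "README": b"hello world\n",
    "docs/renamed_readme.txt": b"hello world\n",
    "notes.txt": b"something else\n",
}
ids = {name: git_blob_id(data) for name, data in files.items()}

# Identical content gives an identical ID, so comparing two files is a
# comparison of two short strings rather than of the file bodies.
same = ids["README"] == ids["docs/renamed_readme.txt"]
different = ids["README"] != ids["notes.txt"]
```

The IDs computed this way should match what `git hash-object` reports for the same bytes, which is why a renamed file with unchanged contents looks like the same object to Git.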

Tom also showed us several open source projects that build upon the concept of Git as a filesystem. One was a highly efficient backup system. Another was a static site generator. There were many more. The point is that Git is destined to be more than version control; it will be a feature-complete platform for anything that requires a filesystem with versioning and synchronization.
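As a rough illustration of what "a filesystem with versioning" can mean, here is a toy in-memory store where content is keyed by its hash and each snapshot is just a mapping from paths to hashes. Everything in it (the `ToyStore` class, its methods, the file names) is hypothetical and greatly simplified; Git's real object format, with its trees and commits, is more involved than this.

```python
import hashlib


class ToyStore:
    """Hypothetical in-memory store: objects are keyed by the hash of their bytes."""

    def __init__(self):
        self.objects = {}    # content hash -> bytes
        self.snapshots = []  # each snapshot maps path -> content hash

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self.objects[key] = data  # storing the same bytes twice changes nothing
        return key

    def commit(self, tree: dict) -> int:
        """Record a snapshot of {path: bytes} and return its version number."""
        self.snapshots.append({path: self.put(data) for path, data in tree.items()})
        return len(self.snapshots) - 1

    def read(self, version: int, path: str) -> bytes:
        return self.objects[self.snapshots[version][path]]


store = ToyStore()
v0 = store.commit({"a.txt": b"one\n", "b.txt": b"two\n"})
# Rename a.txt and edit b.txt: only the edited content adds a new object.
v1 = store.commit({"renamed.txt": b"one\n", "b.txt": b"two, edited\n"})
```

Old versions stay readable through `store.read(v0, "b.txt")`, and the rename costs no extra storage because the unchanged bytes are already in the store under the same key.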

The critical component in the success of Git as a platform is libgit2, a C library for interacting with Git. It is critical for two reasons. First, many people had been re-creating the functionality of Git on their own; by putting that functionality into a library, the logic only has to be written once and can be used by everyone else. Second, libgit2 is being released under a permissive license that allows it to be easily used by many other people and projects without getting into any legal snafus.

Most importantly, Thursday night I realized that the tech community of Boulder is so complex and complete that I should never get bored here. I haven't lived here for a full six months yet, but already I feel like I can't leave this city.
