Tim Kellogg: January 2011

Wednesday, January 19, 2011

Mind control

I found this blog post about a couple of Harvard students who wrote some [GPL'd] software for controlling worms' minds. They can control how these worms move and even make them lay eggs!

The implications of this are obviously huge. This is only an academic project now, but in a couple of decades I wonder if we'll see animals used like machines. I guess there are several other ideas you could draw from this, but no matter how you view it, it's a fascinating idea.

Sunday, January 9, 2011

Declaring the Future of Programming

Programming languages have developed significantly over the past several decades. I hypothesize that this development has tended more towards declarative syntax than imperative. The future of programming languages will only become more declarative in the years to come.

In the beginning was machine code. Programmers wrote programs by stringing together arcane byte codes of instructions and parameters. Programs were getting pretty hard to read, so people wrote assemblers that let you write instructions in plain text, complete with comments. An assembler would process the source code and turn each instruction into its equivalent machine code. This is imperative programming in its purest form.

When the first C compilers appeared, C became popular because the programmer only had to declare what should happen in the program and the compiler would generate the necessary machine code for the target machine. That is why a carefully written C program can be compiled for Linux, Windows and Mac with few or no changes to the source code. However, C and C++ are still imperative languages in most other aspects because the thought process is still very much a "do this, now do this, now do this" algorithmic sequence of instructions.

Query Languages

The hallmark of declarative languages throughout history is probably SQL (referring strictly to set operations here). In SQL you describe the result set and let the DBMS decide the best way to produce that result set. For instance, consider this query:

select p.FirstName, p.LastName, a.AccountName
from Person p
inner join Account a
on p.PersonId = a.ResponsiblePerson
where a.IsActive = 1
order by p.FirstName, p.LastName

First we describe the columns that we want (logically this is evaluated near the end, after the from and where clauses, if you want to be technical). In the from clause we say what tables we want information from and specify how we want them matched up using the on clause of the join. In the where clause we specify the criteria for the rows that we want to show, and in the order by clause we describe the sort order.

All this was done strictly declaratively. If you have the opportunity to look at the execution plan, it all ends up being quite elaborate. It might consult two or three indexes before actually joining rows, selecting columns and ordering the result set, not to mention all the locking that takes place to avoid race conditions. If we had to write this in C# or Java code it would be an extremely gnarly component and would probably be buggy and slow.
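To make the contrast concrete, here is a rough sketch of what the same query looks like when the steps are spelled out by hand. It is written in Python for brevity and only illustrates the idea: the rows are plain dictionaries, the sample data is made up, and the function name active_accounts_by_person is hypothetical. A real DBMS would choose its own join strategy, indexes and locking.

```python
from operator import itemgetter


def active_accounts_by_person(people, accounts):
    """Hand-written equivalent of the query above, one explicit step at a time."""
    # We pick the join strategy ourselves: a lookup table keyed by PersonId.
    people_by_id = {person["PersonId"]: person for person in people}

    rows = []
    for account in accounts:
        if account["IsActive"] != 1:  # where a.IsActive = 1
            continue
        person = people_by_id.get(account["ResponsiblePerson"])  # inner join ... on
        if person is None:  # an inner join drops unmatched rows
            continue
        # select p.FirstName, p.LastName, a.AccountName
        rows.append((person["FirstName"], person["LastName"], account["AccountName"]))

    rows.sort(key=itemgetter(0, 1))  # order by p.FirstName, p.LastName
    return rows


# Made-up sample data, for illustration only.
people = [
    {"PersonId": 1, "FirstName": "Tim", "LastName": "Kellogg"},
    {"PersonId": 2, "FirstName": "Ada", "LastName": "Lovelace"},
]
accounts = [
    {"AccountName": "Acme", "ResponsiblePerson": 1, "IsActive": 1},
    {"AccountName": "Globex", "ResponsiblePerson": 2, "IsActive": 1},
    {"AccountName": "Initech", "ResponsiblePerson": 1, "IsActive": 0},
    {"AccountName": "Orphan", "ResponsiblePerson": 99, "IsActive": 1},
]
result = active_accounts_by_person(people, accounts)
```

Even this toy version has to commit to one join strategy up front, which is exactly the decision the SQL version leaves to the engine.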

Expression Trees in C#

Interestingly, .NET land is also developing into a declarative playground. The biggest step in this direction happened with LINQ and its expression trees. Now, the LINQ query syntax is declarative, but I'm referring to something more basic. Expression trees can be broken down at run time by a processor that can analyze the contents of a lambda that it was passed. For instance, NHibernate can receive a method call like:

var timsAccounts = accounts.Where(x => x.ResponsiblePerson == "Tim");

and pull out the meaning (ResponsiblePerson = Tim) and convert it into a SQL "where" clause at run time (sql = "where a.ResponsiblePerson = 'Tim'"). The implications of this are wild, and over the past few years the tools built on it have become very powerful. Examples include Fluent NHibernate, Moq, and Castle Windsor's fluent registration API. Castle Windsor and NHibernate both used to rely on XML configuration files but have since moved towards using expression trees in combination with dynamic proxies and interceptors to configure via code. This declarative approach is leading towards less code that has the potential to be more efficient.
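The mechanism can be imitated in a few lines. The sketch below is in Python and is only an analogy: C# gets its expression tree from the compiler, while here the tree is recorded through operator overloading. It is not how NHibernate is implemented, and every name in it (Row, Column, Comparison, to_where_clause) is hypothetical.

```python
class Comparison:
    """A one-node expression tree: <column> <operator> <value>."""

    def __init__(self, column, operator, value):
        self.column = column
        self.operator = operator
        self.value = value


class Column:
    """Represents a column; comparing it records the comparison instead of evaluating it."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Comparison(self.name, "=", value)


class Row:
    """Stand-in for the lambda parameter; any attribute access yields a Column."""

    def __getattr__(self, name):
        return Column(name)


def to_where_clause(predicate, alias="a"):
    """Run the lambda against a recording Row and translate the resulting tree to SQL."""
    tree = predicate(Row())
    # The value is returned as a bound parameter rather than spliced into the SQL text.
    return f"where {alias}.{tree.column} {tree.operator} ?", [tree.value]


sql, params = to_where_clause(lambda x: x.ResponsiblePerson == "Tim")
```

The lambda never runs against real data. It is only inspected, which is what lets a library decide for itself how to carry out the request.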

Treatise on Domain Specific Languages

The topic of domain specific languages deserves an entire blog post. SQL and CSS are the obvious examples, but there are hundreds more. In one of my internships a coworker wrote a DSL to specify sort order for dictionaries for arcane natural languages and scripts. A simple DSL is much easier to develop than a GUI for the same purpose and can often be easier for a non-techy user to learn and become productive in.

The sad news is that colleges and universities are putting less focus on compiler and parser classes, the assumption being that we already have all the languages we need, so why would we need more? The answer is simple: by providing a simple syntax to describe problems or solutions we can simplify the entire process of arriving at that solution. If the problem is abstracted away from the solution, the underlying engine is free to use constructs like multi-threading and other heavy optimizations. Sometime you should take a look at the byte codes that your compiler produces - ask yourself if you could have even thought of those sorts of mind-bending tricks.

We need domain specific languages because they simplify problems. They create more effective abstraction than even inversion of control frameworks. Unfortunately, fewer people are learning about string processing these days. How many of the people you have worked with actually consider themselves proficient in regular expressions or compiler generators? (yet two more declarative DSLs that simplify solutions)
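As a small illustration of the regular expression point, here is the same check written twice in Python: once as a pattern and once as a hand-rolled loop. The date format and both function names are hypothetical choices for the example.

```python
import re

# Declarative: describe what a match looks like and let the regex engine find it.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def looks_like_date_declarative(text):
    return ISO_DATE.fullmatch(text) is not None


# Imperative: spell out every step of the same check by hand.
def looks_like_date_imperative(text):
    if len(text) != 10:
        return False
    for index, char in enumerate(text):
        if index in (4, 7):
            if char != "-":
                return False
        elif char not in "0123456789":
            return False
    return True
```

Both functions only check the shape of the string, not whether the date exists on a calendar. The pattern states that shape in one line, while the loop buries it in index arithmetic.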

Conclusion

Anytime you write code that is less imperative, it allows the layer underneath more room to innovate efficient algorithms. Surely this isn't surprising since any good programmer would feel exactly the same way towards a micro-managing supervisor. So after saying all this, it should be clear why I believe that the future of programming is declarative. Declarative syntaxes allow us to simplify the problem by simply stating what the problem is (or describing what the solution looks like) and allowing the underlying engine to determine the solution. As such, I believe we will be seeing the number of domain specific languages multiply in the years to come.

Sunday, January 2, 2011

Would I choose Git again?

I wrote a post a few months ago about the reasons we chose to use Git over Subversion and I think it's time to follow up that post and write about how it's gone so far. We're an ASP.NET outfit, and as such there are a few considerations that might not apply to, say, the Linux kernel team. I'm going to break this up into three parts: my perspective, my team's perspective, and some tips for anyone who might want to also try using Git.

My Experiences With Git
I seriously love using Git. I make a branch for everything I do just like they recommend. An old-school member of our team made a comment, "we always considered branches as something to be avoided", hinting at SVN branches' trait of being hard to manage and keep in sync with the trunk. Git branches are very different from SVN branches - they are very light and easy to keep up to date.

Git has some seriously awesome merging mechanisms. First, you can select from a list of merge strategies (you really only need one of these, but hey, it's great to have choices just in case). Then they also have rebase and cherry-picking. These last two aren't regular merges: rebase replays the commits of a branch onto a new base one at a time, making several [and possibly hundreds of] incremental merges, and cherry-pick applies the change from a single commit that you choose. Because these schemes work commit by commit, you can actually do some serious refactoring and still apply patches to both the production and development branches with relatively little effort.

Our team develops and maintains a web application that our company sells as a service. As such, we don't spend time on installers or maintaining previous versions because the only versions that matter are the version that's in production and the development version. Git allows us to cherry-pick hotfixes from development into production (or vice versa) without really thinking much. This would have been a small nightmare in SVN (and invoke suicidal tendencies in TFS). Back when we were using TFS there really wasn't any process or procedure that went into hotfixes. You basically just updated production. With Git, it's incredibly easy to just stash whatever you're doing, check out the production branch, fix a critical bug, test & deploy it, and then cherry-pick it back into the dev branch. Git works well for people who get interrupted by escalations (everyone??).
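For anyone who wants that hotfix routine written down, here is a minimal sketch in Python that lists the git commands in order. The branch names "production" and "dev" and the helper names are assumptions for illustration. It also assumes you have uncommitted work to stash and that the fix ends up as the single newest commit on the production branch.

```python
import subprocess


def before_fix(production="production"):
    """Commands to park the current work and switch to the production branch."""
    return [
        ["git", "stash"],
        ["git", "checkout", production],
    ]


def after_fix(production="production", dev="dev"):
    """Commands to run once the fix is committed on production and deployed."""
    return [
        ["git", "checkout", dev],
        # A branch name resolves to its newest commit, so this picks the hotfix.
        ["git", "cherry-pick", production],
        ["git", "stash", "pop"],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; with subprocess.run, check=True stops at the first failure."""
    for command in commands:
        runner(command, check=True)
```

Call run_all(before_fix()), make and commit the fix by hand, then call run_all(after_fix()). The runner argument is only there so the sequence can be tried out without touching a real repository.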

My Team's Experiences
My team hates Git. Well, that's a bit harsh and premature, but there was some backlash when we first switched. About three weeks in I gave a brown bag lunch presentation on Git to teach everyone how to use it. After that people generally caught on to the basics, with the exception of some merging snafus.

Merging is actually an interesting point. TFS merging drove me nuts. Perhaps it was just the merge program, but I always felt like I had my hands tied. Now that we're using Git I feel free again to branch and merge at will, but one of my teammates seemed to be (at least at first) completely confused by Git merging. This was [probably] entirely due to the fact that Git Extensions didn't come with KDiff3 by default (they now offer a convenient all-in-one installer that includes KDiff3 & Git).

Another point of confusion in using Git GUIs was that TortoiseGit makes it very difficult to see what's different between local and remote repositories. I think the Tortoise crew made too much of an effort to make it feel like TortoiseSVN when in reality it left some very important questions unanswered (TortoiseSVN only has to answer 1 or 2 important questions, but Git GUIs need to answer 4 or 5 important questions). Among these unanswered questions are "what branch am I on?" and "have I pushed this to the server yet?". TortoiseGit doesn't provide a clear answer to either of these questions, so I had everyone make a switch to Git Extensions.

Tips for Future Git Users
We were forced to learn a few lessons pretty quickly. I'll list them here in paragraph format...

GUIs are still young. Most Git users are sick Linux users who live by vi & grep, so developing a decent GUI hasn't really been a priority for Git (there is an official Git GUI that ships with Git, but it possesses some serious suckage). If you work in a Microsoft/Windows outfit there is no conceivable way your coworkers will be happy with the command line, so a good GUI is critical. Use Git Extensions!

Setting up a central server is not entirely straightforward. While SVN is distributed as either a client or a server, Git has no reason to require a central server, so this was also an afterthought. Use gitolite on Linux. Install it through the package manager; it's very easy to get started that way and it's also easy to maintain.

SSH keys are problematic. Try to use PuTTY/plink to manage keys if possible. OpenSSH is very un-Windows-like.

Unit tests are good and they can make Git shine even brighter. If you maintain a generally complete unit test suite you can have Git utilize your test runner to quickly find where code started breaking. The "bisect" command can take a program or command that exits with 0 for success or 1 for failure (the standard exit codes, so a test runner that fails on an uncaught exception would work) and perform a binary search through past commits to find the first place where a test started failing. This could also work great if you're a scripting guru - write a short script to check for some text (like "CREATE TABLE X") in a particular file and Git will do the leg work.
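Such a script might look like the sketch below, written in Python. The script name check_text.py and the file name in the usage note are hypothetical, and the script assumes that a commit counts as "bad" once the text is present. Flip the return values if you are hunting for the commit where the text disappeared.

```python
import sys
from pathlib import Path


def exit_code(path, needle):
    """Return 0 ("good") while the text is absent and 1 ("bad") once it appears."""
    if not path.exists():
        # Assumption: a commit that predates the file counts as good.
        return 0
    contents = path.read_text(errors="replace")
    return 1 if needle in contents else 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_text.py <file> <text>
    sys.exit(exit_code(Path(sys.argv[1]), sys.argv[2]))
```

After marking one bad and one good commit with "git bisect start", "git bisect bad" and "git bisect good <commit>", you would hand it over with something like: git bisect run python check_text.py schema.sql "CREATE TABLE X". Git then checks out commits for you until it finds the first one where the script exits with 1.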

Conclusive Thoughts
Git is very powerful and can adapt to any workflow. If process is important to you, Git will enable you in whatever process you choose. If process isn't important, Git won't get in your way. It is very scalable via its distributed nature (see the dictator and lieutenants workflow). It's also great for small personal projects that I do in my spare time. I can still have code version controlled without sharing it with anyone, but when I want to I can push it to GitHub (another awesome idea). However, if your coworkers are generally stagnant and opposed to change, Git will drive them nuts and you will hate your life. Choose Git only if you want a program that will abstract away mundane tasks like merging but you don't mind having to change your world view towards version control.