This blog has been moved to http://info.timkellogg.me/blog/

Friday, December 30, 2011

Can Bad Code Ruin Your Career?

I started writing this post over a year ago. I was working at a large company where I was stuck in a mouse wheel - always running to keep up but never getting anywhere. The code I had to work with was downright terrible. This, among other things, prodded me into looking for another job. While I was starting my job search I was pondering this post and decided to not finish it because I wasn't sure if some prospective employer would hold it against me.

With that said...


I just finished reading through a messy Java file. It was the usual mess: a class with a 500 line god-method (similar to the god-object) and hundreds of instances of copy-and-pasted code. Besides the redundant code and lack of structure, the coder used nested loops over ArrayLists where a HashSet would have worked, and never once used generic collections, opting for the un-type-checked versions instead. After several hours of refactoring and renaming variables I finally got to a point where I could begin fixing the bug I was after. There were absolutely no unit tests - all of this code was written inline with HTML in a JSP.

I spend so much time reading bad code that sometimes I wonder if I am beginning to specialize in hacks. Is it possible to read so much bad code that you forget what good code looks like? Humans are an especially adaptive species, and I think it's definitely possible that a great programmer can be forced to work in the muck so long that they forget what good code looks like.

I've seen several situations where good developers produced bad code. These situations are almost always a product of an environment where features are more important than bug fixes. These companies typically invest heavily in sales and neglect IT and development costs. Or sometimes the problem is just that product management knows nothing of software development.

The 5 stages of grief


A recent coworker likened our job of working with brittle, badly designed code to the 5 stages of grief. While we were uneasily laughing about it I silently decided that this was more realistic than I wanted to believe.

For instance, imagine starting a new job. During the interview process you were interviewed by intelligent, enthusiastic developers and led to believe you were going to be working on cutting edge technologies - a dream, right? When you actually get to the job you find out that the code is so backwardly complicated that it's nearly impossible to touch anything without bringing the proverbial house of cards crashing down.

Grief Stage 1: Denial and Isolation

Obviously the code isn't the problem, you just weren't careful enough. They probably have specific guidelines and strategies that help them be more productive. It's probably just something wrong with me...

Grief Stage 2: Anger

Dammit! Who the hell even thinks of this crap? [more cursing...] Is this a god-object?? [hair gets thinner...]

Grief Stage 3: Bargaining

This is typically when you start plotting potential strategies to hide the ugliness of the code. Creativity and hopeful thoughts abound. Many IT managers will talk like they are very supportive of you at this stage.

Grief Stage 4: Depression

This is where reality strikes: those plans are bad for the business plan, because they involve spending less time on revenue-producing features. The IT managers who seemed so supportive now flip-flop to the CEO's side and deny you the means to cope with your problems.

Grief Stage 5: Acceptance

There are only two outcomes of this stage. Either (1) you accept that you can never fix the code so you decide to move on to another job or (2) you accept that you can never fix the code so you give up on trying. This is what separates good coders from bad.

Conclusion


Again, I started this post over a year ago. I've seen a lot of bad code. At my most recent job I almost took the "give up on trying" path in the acceptance stage. Luckily we hired a great older developer who snapped me out of it. I just started my new job today; I think I will be much happier.

So can bad code ruin your career? My answer is a resounding YES! But it doesn't have to. Honestly, stage 5 can have better endings, but that inevitably requires understanding on the part of management - a scarce resource.

Wednesday, December 28, 2011

Behavior Driven Development in C#

I've been a fan of Test Driven Development since I worked in an XP shop. But every time the work gets bigger and more complex I struggle not to get lost in the sheer volume of tests. I remember many early conversations with my elders about unit test naming conventions. The [method]_[input]_[output] convention breaks down badly when your inputs become things like mocks, or when there are more than one or two inputs; same with outputs.

When a coworker introduced me to BDD earlier this year, it really clicked and flowed naturally. The idea of writing tests so they read like sentences out of a book or spec seems like the answer to all my questions. The ruby rspec is beautiful:
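
(a sketch of the style - Calculator is just a stand-in for a class under test)

describe Calculator do
  before :each do
    @calculator = Calculator.new
  end

  it "adds two numbers" do
    @calculator.add(2, 3).should == 5
  end

  it "raises on division by zero" do
    lambda { @calculator.divide(1, 0) }.should raise_error(ZeroDivisionError)
  end
end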


The organization of the tests forces you to focus on the expectations of your test and to highlight descriptive assertions. This is especially useful for complicated setups with lots of mocks. I put as much of my setup code as I can in one of those before :each blocks, so that the assertions are limited to simple inputs and one or two observations about the outputs.

There have been a number of people in the .NET community who have attempted BDD but [imo] failed to grasp the simplicity. NBehave is a complete overhaul of unit testing that uses attributes like xUnit. As a result, NBehave doesn't really look anything like rspec - which isn't necessarily a bad thing. However, the thing I like about rspec is its ability to describe things at arbitrary depth, which is handy when testing complex code:
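
(a sketch of the kind of nesting I mean - Account and friends are made up)

describe Account do
  context "when the account is overdrawn" do
    before :each do
      @account = Account.new(:balance => -50)
    end

    it "refuses further withdrawals" do
      @account.withdraw(10).should be_false
    end

    context "and the owner is a premium member" do
      before :each do
        @account.owner = PremiumMember.new
      end

      it "waives the overdraft fee" do
        @account.overdraft_fee.should == 0
      end
    end
  end
end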


This spec is able to describe the possible modes that the object under test can be in (complex inputs). This is made possible by rspec's arbitrary nesting depth - a trick that is much harder to pull off in C#.

My current approach to BDD in C# usually looks like this:
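
(a representative shape - Account is a stand-in for the real class under test)

[TestFixture]
public class When_depositing_into_an_empty_account
{
    private Account _account;

    [SetUp]
    public void Given()
    {
        // all the setup lives here, like a before :each block
        _account = new Account();
        _account.Deposit(100m);
    }

    [Test]
    public void The_balance_reflects_the_deposit()
    {
        Assert.That(_account.Balance, Is.EqualTo(100m));
    }
}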


I think this is the simplest BDD layer I can slap on top of NUnit. And simple is important to me because (a) I do a lot of open source projects and I want to keep the barrier to entry for contributions low and (b) the people I work with tend to resist change. When people are resistant to change, it's hard to rationalize using something other than NUnit or introducing lots of nested lambdas.

NUnit remains the most popular unit testing framework and has excellent support with a GUI runner, console runner, and IDE integration with R#, TestDriven.NET, and others. Given all that support, I would really rather not abandon NUnit if possible.

FluentAssertions is a nice simple BDD layer on top of NUnit (or whatever you use). It doesn't change the structure of our spec above, but it does change the structure of our assertion to
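
(continuing the hypothetical account spec)

_account.Balance.Should().Be(100m);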


This assertion is [imo] very clean and succinct. I like how it reads even more clearly than NUnit's fluent syntax. Last weekend I was thinking about this and decided to explore an idea for a BDD extension to NUnit that is even clearer than FluentAssertions. The project, named BehavioralNUnit for now, is hosted at github. The earliest goal for the project was simply to use operator overloading to make the assertions even more like rspec. For instance, I want to be able to make the previous assertion:
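
(a sketch of the goal syntax - Expect is illustrative, not a finished API)

Expect(_account.Balance) == 100m;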


I was able to do this, but the C# compiler insists that such an expression be assigned to something, so I [haven't yet] added another concept somewhat analogous to "it" in rspec:
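
(again a sketch of the direction, not final syntax)

it["the balance reflects the deposit"] = Expect(_account.Balance) == 100m;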


This is most similar to NSpec's approach, using an indexer instead of a method. This appeals to me because I sometimes find matching parentheses to be a pain (I guess I just like ruby & coffeescript). Then again, I don't like NSpec because it feels like it was written by one of those whining .NET developers who wishes dearly he could get a RoR job - it doesn't abide by .NET conventions at all.

I still have a ton of ideas to hash out with Behavioral NUnit. I'm convinced that BDD in C# can be simpler and more beautiful than it currently is. If you have input or ideas, please fork the repository & try out your ideas (pull requests are welcome).

Monday, December 26, 2011

Why I hate generated code

If you've worked with me for any amount of time you'll soon figure out that I often profess that "I hate generated code". This position comes from years of experience with badly generated code. Let me explain.

The baby comes with a lot of bathwater

In the past year I had an experience with a generated data layer where CodeSmith was used to generate a table, 5 stored procedures, an entity class, a data source class, and a factory class for each entity. My task was to convert this code into NHibernate mappings.

The interesting thing about this work was how little of the generated code was actually being used. I'm sure, in the beginning, the developer's thoughts were along the lines of "oh look at all this code I don't have to write manually :D". However, after some time, subsequent developers' thoughts were more like "with all this dead code, it's hard to find real problems". It's funny how exciting breakthroughs turn into headaches down the road. The table is always used, but some entities are created & read yet never modified, and others are only created during migrations and only read from at run time.

Code generators often produce code you don't need. Since all code requires maintenance, dead code is just a liability because it doesn't provide any benefit. I always delete dead code and commented out code (it'll live on in version control, no need to release it into production).

There are several professional developer communities that generate code as a way of life. Ruby on Rails comes prepackaged with scripts to generate models, views, and controllers in a single command. ASP.NET MVC will generate controllers and views with a couple clicks. And if you've ever used either of these frameworks, you'll probably find yourself deleting a lot of generated code.

The problem of transient code generation

The issue that I keep running into with my policy of hating code generation is that it's nearly impossible to be a professional software engineer and not generate code. The most fundamental problem is compilers. When you run a compiler over your source code, it generates some sort of machine readable code that is optimized for various goals like speed or debugging or different platform targets.

While I hate code generators, it's hard to argue how I could possibly hate compilers. They allow me to write code once and compile it several different ways and achieve different goals. Therefore, I have to introduce my first caveat - I don't hate all generated code, I only hate generated source code.

This problem of hating generated code is complicated further by the fact that NHibernate generates source code too. You don't ever check in the code that NHibernate generates because it's generated at run time. The most obvious code NHibernate generates is the SQL written in the background to query & perform DML operations. (For those questioning whether SQL is source code, consider how SQL is compiled into an execution plan prior to execution.) It's also hard to argue that I hate this kind of code generation, because it doesn't suffer from the same problems as the CodeSmith-generated code. The SQL is generated just-in-time - only when it's needed - so no extra code accumulates.

Since NHibernate and compilers do code generation in a way that I like, I'm going to refine my statement to "I hate generated persistent code". This generally means, I still hate generated code when the resulting code sticks around long enough for a fellow developer to have to deal with it.

The thin line between good and bad code generation

When is generated code persistent and when is it transient? We already decided that code generation isn't so bad when it happens during or after the compilation process. But my statement is that I hate persistent generated code. There are other cases of code generators generating transient source code. One such example is in iSynaptic.Commons.

Since C# doesn't yet (and probably won't ever) include variadic templates or variadic generic types, writers of .NET APIs often write some really redundant code to account for all combinations of generic methods or types. I know I've done it. This example uses a T4 template to produce a C# file with a *.generated.cs extension. The T4 template is executed on build, but its output is not excluded from version control.
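
To illustrate the kind of redundancy I mean, here is a hand-written sketch (not the actual iSynaptic.Commons code) of guard-clause helpers repeated for each arity:

public static class Guard
{
    public static void NotNull<T1>(T1 arg1, string name1)
    {
        if (arg1 == null) throw new ArgumentNullException(name1);
    }

    public static void NotNull<T1, T2>(T1 arg1, string name1, T2 arg2, string name2)
    {
        NotNull(arg1, name1);
        if (arg2 == null) throw new ArgumentNullException(name2);
    }

    // ...and so on for every arity you support - exactly the boilerplate
    // a T4 template can crank out at build time
}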

I do like this approach because it takes a DRY approach to a redundant problem without much complication. Another thing I really like about this approach is that T4 templates are a standard part of Visual Studio and are executable from Mono as well. As such, they can be considered a free tool that is openly available (important for open source projects) and, more importantly, are executed as part of the build process.

Another thing I like about this approach is the usage of partial classes to separate the generated portion of the class from the non-generated portion. This minimizes the amount of code that is sheltered from refactoring tools (code inside the *.tt file).

The thing I hate about this particular iSynaptic.Commons example is that the generated file is included in version control. I think, perhaps, this is reduced to a small pet peeve of mine since the generated code isn't wasteful and is updated on every build. Still, I would like a mechanism to (a) have the file ignored from the IDE's perspective and (b) ignored from version control. I wouldn't want anyone to mistakenly edit the file when they should be editing the T4 template.

Summary

The end result of my thought is "I hate source code that is generated prior to the build process". I want to further say that I also hate generated code that is checked into version control, but that is a lesser point. Code generation can still be a useful tool, as seen in the cases of NHibernate and T4 templates. But even then it should be used wisely and with care; generating excess code becomes a liability that detracts from the overall value of a product.

Thursday, December 1, 2011

Defining Watergile

At my current employer we've had a layer of management placed above us that fervently preaches the mightiness of agile. This management devotes much lecture time to informing us of the proper procedure for planning a product. First you gather requirements, architect the entire system, and write detailed requirements documents - good enough that developers don't need to refine them any further and QA knows exactly what to test. When requirements are written for the entire system - 12-24 months in advance - then you begin coding. After you're done coding, QA begins to test.

To be clear, anyone reading the previous paragraph should be scratching their head and thinking to themselves, "gee, that sounds a lot like waterfall". Well, it is - hence the portmanteau watergile (we considered agilfall, but it just doesn't roll off the tongue as well).

The trouble is, even though we coined the term just recently, this watergile thing is a frigging pandemic. Every time I crack open a fresh copy of SD Times there seems to be some guy telling you that you need to be measuring KSLOC and a billion other software metrics but at the same time claiming that agile is the only way. It wouldn't be so scary except that this is the source of direction for software development managers.

It's no wonder watergile is so widespread; IT managers are fed a constant stream of B.S. mixed messages. How could anyone make sense of any of it without dismissing most of it? The truth is, waterfall is hard and so is agile. Anything in between is just ad-hoc and set up to fail. If you are a development manager reading this, find those tech magazines on the corner of your desk and show them to the recycling bin. They're worthless and distract from progress.

Sunday, November 6, 2011

The Pain and Glory of C

I don't normally write much C code, but this past week I was fiddling around with it to solve some programming puzzles. When I say C I mean straight C (without the ++ or #). Completely un-object-oriented; just structures, helper functions and malloc/free. It took me 3 days (a total of probably 9 hours) to write a fully functional 250-300 SLOC solution to a puzzle (complete with huge memory leaks). All of this brings me to the burning question - who would ever want to write programs in C?

C++ has developed over the years. I recently looked at some of the enhancements in C++11, which include the auto keyword (like var in C#), better reference-counting "smart pointers", lambdas and closures. Obviously, C++ is developing and progressing. C hasn't had a spec change since 1999, and even then it wasn't exactly dramatic. We still don't have any OO or reference-counted pointers.

Have you ever tried interfacing with a library in C? It's very cumbersome. You have to read all the documentation and call the right my_library_object_*() functions at the right times. Everything is hands-on, nothing is left to imagination. You have to remember what memory you allocated so you can free it sometime later when you're sure you don't need it anymore (and then recursively free sub-structures and arrays).

I think anyone can see the warts in C. But it's easy to forget the simplistic beauty. There aren't many operators in C, and there's only one way to cast. Sure, before C99 you couldn't even declare and initialize a counter variable inline in a for-loop. But the complex syntax of C++ is scary in comparison, with all its member::accessors, templates, 5-6 ways to cast a variable and a slew of gotchas. Sure, C has its share of gotchas, but the language is so small that anyone who's spent any significant time programming C can list most of them for you (probably not so true with C++).

So why not C#? Well, it's freaking slow!! Think about when people were converting their business apps from VB6 to C#. Sure the maintainability of the code improved by leaps and bounds, but almost everyone noticed the performance difference and wondered how the same program could be so slow.

Recently Microsoft unveiled some information to developers about the upcoming Windows 8 release and its Metro interface. One of the biggest surprises to developers is how hard Microsoft is trying to sell C/C++ and how C#/.NET is falling by the wayside. The driving factor is that Apple has snappy user interfaces while Windows Forms apps are known for being slow and boring. So Microsoft created a new WinRT UI toolkit for Windows 8 that intends to never block the UI thread. Operations that take longer than ~50ms should use async code so that the UI can continue to feel responsive. (This sounds eerily similar to Node.JS, but with a lot more code.)

Obviously Microsoft wants developers to develop faster apps by going back to C/C++; maybe we should consider taking them seriously. But I think the more likely direction is development being done primarily in one of the common dynamic languages like Ruby/Python/Node.JS, with the code that needs a speedup written as C modules. The reference implementations of Ruby and Python are written in C (not C++) and interface very well with C. I've seen lots of math-intensive Python libraries composed partly of C code (some with increasing portions written in C). I could also see the popularity of Node.JS increasing if it were applied beyond web/networking apps to things like non-blocking UI. (After all, this is basically what WinRT is.)

I don't know about you, but I'm going to be spending some time tuning up my C/C++ skills. History has been known to repeat, and I think it is now repeating yet again.

Monday, October 31, 2011

Occupy Wall Street Is Not Stupid

Earlier today I was talking with someone who exclaimed, "Occupy Wall Street, that's so stupid!". I then proceeded to explain that OWS is trying to say "hey, this capitalism thing isn't really working right now". It's not to say that capitalism never worked; it's just pointing out that there are some significant holes in it right now.

I believe that by now, most people (except some in Boulder) realize that communism has also failed. Now, communism didn't fail because God hates communists. It failed because it wasn't maximizing the total economic prosperity of all people. The people behind OWS have also realized [, I naively assume,] that capitalism in America is also no longer maximizing the total economic prosperity.

In America today you see thousands of families that incurred large amounts of debt to a disgustingly rich minority. This rich minority (an oligarchy) forced these families out of their homes and into slavery. You might recognize that this looks a lot like the economic system that capitalism replaced - feudalism.

OWS protesters are also crying out about the death grip that rich and powerful businesses have on our federal government. Some even claim that presidential elections are completely rigged (I probably wouldn't go that far). Either way, the government that our American forefathers created is completely absent from the one we have now. We've become so obsessed with being the most powerful country that we sacrificed the values and virtues that made us who we are.

The Occupy Wall Street movement is right, our system is broken. Yes, there are many broken systems out there, but that's not a reason to not change them. Protest is an important political mechanism that has been proven to work in the past. We need it to work now. The only problem I have with OWS is that it seems to be an incoherent jumble of complaints with no real answers. But I suppose that's where real change begins.

Friday, September 30, 2011

Quiet Time

Recently, we instituted a "core hours" policy among our developers that essentially equates to 4 hours of quiet time every day. During the hours of 10-12 and 2-4 developers aren't allowed to interrupt each other, nor can QA, product managers, or anyone else in the office interrupt developers. If you need help on a problem you have to either work through it on your own or wait until after the quiet time.

The policy hasn't been in effect very long, but I've immediately noticed a significant jump in productivity. I would say I'm 1.5-2 times as productive now that I'm not getting interrupted every 15 minutes. I've also noticed that I just plain enjoy coming to work more now.

When we were talking about instituting the policy some were worried that it would be a problem that you couldn't clear up issues and roadblocks immediately. In practice, however, I think it isn't too much to ask everyone to wait [up to] two hours to clear roadblocks. In fact, it ends up forcing developers to solve their own problems.

When I first started with this company I was isolated in a room by myself with entire days to myself. The isolation was too much; I often felt like I was being confined in a prison. Obviously I'm not advocating that total isolation is any kind of real solution. It's impractical to suggest that developers can complete their work successfully in total isolation. It takes a lot of dialog to produce quality software. But it's also impractical to suggest that they can get any work done when they're being pestered every 5-30 minutes.

I highly recommend some sort of quiet time in any work place. In my opinion, the benefits are definitely not limited to just software engineering either.

Thursday, September 15, 2011

AutoMapper And Incompleteness

This is part 2 of a series. Read part 1

Earlier I talked about the Law of Demeter and how view models help us better adhere to it. I also briefly outlined how AutoMapper makes view models practical. While AutoMapper is a great tool, it isn't completely fulfilling. Let me explain.

As I pointed out previously, some of the behaviors in AutoMapper make it feel incomplete. The first is that you can't map two view models to the same model and back.

A much bigger problem with AutoMapper is that view models can't extend models. I'm not sure why they decided to disallow this usage, but it causes a cascade of code duplication (very un-DRY). Take a look at these classes:
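
(reconstructed for illustration - the attribute details are representative)

public class Account
{
    public int? Age { get; set; }

    [Required]
    public string Name { get; set; }

    [RegularExpression("CO[0-9]+")]
    public string AccountCode { get; set; }
}

public class AccountViewModel
{
    // note: not nullable, unlike the model's Age
    public int Age { get; set; }

    [Required]
    public string Name { get; set; }

    [RegularExpression("CO[0-9]+")]
    public string AccountCode { get; set; }
}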



There are a few things wrong here. Age is a nullable int on the model, but the view model has a plain int. If a null slips through, this could cause a crashing error. While AutoMapper has an AssertConfigurationIsValid method, it doesn't test for this sort of case. You'll have to write unit tests for this; luckily you can use NetLint to easily test for these sorts of flukes.

Another issue is the validation attributes. The facts that account codes look like CO11582 and that all accounts must have a name are descriptors of the domain (which the model is modelling). They aren't facts about the view (although they have to be expressed in the view); they are part of the model. Every time you create another AccountViewModelX derivative, AutoMapper requires you to copy these attributes. This is a massive failure in the attempt to keep code DRY.

Another issue I have is that when I'm creating a view model I'm never sure what properties need to be created. I usually have to split the window and copy properties from model to view model (which screams obscenities at the idea of DRY code).

One solution that I keep coming back to is to have view models extend models. For instance, see this implementation:
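
(continuing the same hypothetical account classes)

public class AccountViewModel : Account
{
    // Age, Name, AccountCode - and their validation attributes -
    // are inherited from the model, so there is nothing to copy
    public SelectList AccountTypes { get; set; }
}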



Here, you don't have to type out all those properties a second (or third) time. They're just available. You also won't make the mistake of marking Age as non-nullable or forget to copy the validation attributes. It's all done for you by the compiler - no need to write extra tests.

There are still some issues with this approach, and other approaches (such as encapsulation) that you can take. Perhaps there will be a part 3.

Monday, September 12, 2011

View Models, AutoMapper, and The Law of Demeter

The Law of Demeter was created with the intent of simplifying object hierarchies and structures. Obviously it's not a blanket sort of law (it doesn't seem to apply to DSLs or fluent interfaces), but it is handy to keep in mind when modelling a domain.

A classic example of a Law of Demeter violation is the name example: passing the view a model that has a nested name object (Model.Name.First, Model.Name.Last, etc.) versus passing a flattened view model (Model.FirstName, Model.LastName, etc.). I think this is a great application of view models.

I like the idea of view models because they're a great way to express view-specific business logic. The FirstName/LastName flattening is one example, but view models are also great for holding the data necessary to populate drop down lists and summary views. Beyond code, view models are a good example of the .NET community's ability to innovate new solutions to old problems (akin to my thoughts about the ruby community).

Yes, But...

While I definitely understand the benefits of view models, I'm still trying to figure out the best way to use them. When first creating view models, the urge is to write and populate them by hand. This quickly becomes very tiresome. Enter AutoMapper.

AutoMapper is an object-to-object mapper designed very specifically for flattening models into view models. It bases its decisions on conventions and provides a fluent interface for the remaining anomalies. It is a savior for those writing view models by hand.
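
A typical mapping looks something like this (a sketch against the 1.x-era static API; Person and PersonViewModel are stand-ins):

// conventions cover most properties; ForMember handles the anomalies
Mapper.CreateMap<Person, PersonViewModel>()
    .ForMember(vm => vm.FirstName, opt => opt.MapFrom(p => p.Name.First))
    .ForMember(vm => vm.LastName, opt => opt.MapFrom(p => p.Name.Last));

var viewModel = Mapper.Map<Person, PersonViewModel>(person);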

AutoMapper works in only one direction: you take an existing model and map its data into a view model. Going backwards, however, is another story. One big limitation of AutoMapper is that you can't map from two different source types to the same destination type. This makes it difficult or impossible to use AutoMapper for bidirectional mappings (for instance, if you want to use AutoMapper when updating the model from a FormCollection).

There is quite a bit more I want to say on this matter, which I will continue in a second part.

Monday, September 5, 2011

Introducing comboEditable

I'll admit, comboEditable is an extremely dry name for an open source project (I would have used something like Project Bierstadt but it's not really that descriptive). Like everything else I develop and share publicly, this came out of necessity.

In Windows there is a UI concept of an editable combo box. Basically you're given a drop down list of options and if you can't find the option you're looking for, you just type in another (see the demo if you're having trouble visualizing). This concept does not exist on the web or anywhere outside Windows applications. I assume that UX designers across the globe unanimously decided that an editable combo box is a UI kludge, but I still think it's a handy control.

It is an unobtrusive jQuery plugin that takes regular HTML DOM as input and transforms it into an editable combo box (a text box, hidden field and several divs, if you're wondering). The unobtrusive part means that if scripts are disabled, the user still gets a combo box - just not an editable one.

If you find yourself in need of an editable combo box, head over to the jQuery plugin page or download it at github. Also, take a look at the demo to see usage.

Monday, August 29, 2011

Parenthetical Thesis on Ruby.NET (or IronGem (or whatever the kids call it these days))

Since college I've always been a huge fan of dynamic languages. I was really into Python for a long time, and in the past year or so I've picked up Ruby. It's well known that the open source/dynamic language world has always looked down on the .NET/Java world as somehow inferior. While having a conversation with a colleague about ruby versus .NET, I stumbled on a conclusion.

Ruby has some great features like mixins, monkey patching, a REPL. I also love how blocks make closures such an accessible and natural way to program. Ruby makes easy things easy and hard things fun.

On the other hand, C# is one of the most beautiful typesafe languages (although F# is gaining favor with me). Linq and expression trees provide functionality that you literally cannot reproduce in dynamic languages (it requires knowledge of types, which dynamic languages theoretically shouldn't care about). With the crazy stuff that people are doing with expression trees (building SQL statements, mapping objects, selecting properties, etc) it makes it hard to say I'd rather be doing ruby.
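
For a taste of what I mean (Person is a stand-in):

// an expression tree carries the structure of the lambda, not just a compiled delegate
Expression<Func<Person, string>> expr = p => p.FirstName;
var body = (MemberExpression)expr.Body;
Console.WriteLine(body.Member.Name); // prints "FirstName"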

While C# has some analogous constructs (extension methods are kind of like a lesser form of monkey patching), it still suffers from some of the classical faults of static languages (there can be a lot of extra code just to deal with types and to play nicely with the compiler). At the same time, the compiler also writes tests for you: a contract states that you will have these methods, whereas in ruby you can't ever be completely sure they'll actually be there - something you'd have to write unit tests for.

The conclusion I came to was that, at this point in time, there really isn't a compelling reason why ruby is better than .NET or vice versa. Except for one thing - the communities. The ruby community is nearly too much fun. In Boulder, where I live, there are several companies that host regular hackfests. There are also annual ruby conventions where people get together, socialize, and share new ideas. In the .NET world we have some of those perks, but we're notoriously laden with deadbeats. I can't tell you how many lame coworkers I've had who have little interest in improving themselves or the code they write. While in the Ruby world, they're not just interested in themselves or the code they write, but also in the community around them.

Despite all the debate, I'll probably keep my current job. I love the people I work with and I like participating in the .NET open source world (there really aren't any deadbeats in any sector of the open source world, by definition).

Saturday, August 27, 2011

Launching personal website

I spent some time today and solidified my personal website (http://tkellogg.github.com). I'm pretty excited about this website just because it's a great demonstration of single page apps. Each of my main links doesn't actually take you to a different page - it uses a JavaScript routing engine (backbone) to load and display new content.

I do have some plans for the site, but there are so many more important things to deal with these days. If I get to them, I want to start a picasa site and load images into the site using the gdata api (like how I load blog posts now), and also integrate with github to list out my repositories and activity.

Monday, August 8, 2011

Maybe Node isn't so bad

I know in previous posts I bashed Node.js a bit. I've done some thinking about it and I was struck by a revelation. If you write a Node app that serves to a browser you can use the same code on client & server. That means you can use frameworks like Backbone to manage your business logic on both on the server and on the client inside a browser.

The implications for this are huge. I've toyed with the idea of using Backbone + ASP.NET MVC together for a while now but I kept tripping up on all that code duplication between Backbone models and C# models. Node could be what launches the browser into a universal rich client host (and yes, HTML5 will help too).

The other crazy idea I had about using node is that it means fewer languages to learn. Imagine if you wrote JavaScript-intensive apps with Node and backed them with couchbase on the DB end. You would have JavaScript in your view, JavaScript for business logic and JavaScript in the DB. A new developer would face the smallest learning curve IT has seen in decades, probably ever. This could change the landscape of IT forever. It wouldn't be such a bad idea to build a development team around that concept.

Wednesday, July 27, 2011

Git is a platform

This evening I stuck my head in at quickleft's hackfest in downtown Boulder. They gave a great intro to ruby & sinatra. Sinatra is mind-bendingly simple. It makes you wonder why you've been doing anything but sinatra.

Anyway, while I was playing around at the hackfest they introduced heroku, which is a cloud platform for ruby. Heroku uses git to let you manage your application's files on the server. Pushing a brand new repo creates a new domain name and sets up the infrastructure for your app. They built a very cool application on top of the git platform.

Github has been doing this for a while. I blogged earlier about github and the things they've done with git. The most public things include git as a blogging/wiki engine as well as a static website generator (github pages). You can also fork git-achievements and broadcast your mastery over git, like I did. Honestly, the things you can do with git are endless since it is, after all, nothing more than a versioning filesystem in user space.

I think this is the biggest thing that separates git from other version control systems. No one has done anything with SVN beyond simple pre- or post-commit hook scripts. TFS has a lot of application infrastructure built around it, but none of it builds on top of its version control system. Neither does Mercurial or Bazaar, even though they are also distributed version control systems.

The git folks really focused on defining git as a standard rather than an application. By that I'm referring to how they defined objects, trees, packfiles, etc. (see progit) instead of focusing on developing an application. For much of its lifetime git was nothing but a hodgepodge of shell scripts and C libraries. Nowadays there are several varying implementations of git. The fact that git is so widely programmatically accessible makes it insanely easy to leverage inside programs. I'm still waiting for a .NET app to do something big with git#...or maybe I could.

Sunday, July 10, 2011

Semantic versioning

I've seen some interesting software version sequences. Like Windows 3, 3.1, 3.11, 95, 98, ME, XP, Vista, 7. Or Oracle DBMS v5, v6, 7, 8, 8i, 9i, 10g, 11g (what does the g mean??). I've seen all sorts of version schemes to designate major versions, minor versions, patches, and other types of releases. (The worst ones are always when marketing gets involved.)

Tom Preston-Werner formalized the major-minor-point release (X.X.X) scheme at semver.org. I highly recommend that anyone who considers themselves a professional developer read every word of it. The beauty of semantic versioning is that there isn't anything new or innovative about it at all; it's all what you already know to be true. All versions below 1.0.0 are development versions. Once 1.0 hits, the public interface is solidified. If and only if you break backwards compatibility do you have to increase the major version. Minor versions and point releases (1.X.0 and 1.0.X) are for various levels of new features and bug fixes.

When you release software labeled with semantic versions, you make it easy for people to quickly assess how significant the release is (I might skip a point release and upgrade for minor releases, but I might avoid a major release due to the incompatibilities it might cause). It also forces the developers to exercise restraint in breaking compatibility with previous releases.

The trouble with semantic versions in the corporate world is that marketing always has ulterior motives. They want to release a major version to make the product feel alive; they want to downplay breaking changes to a minor version to keep customers; or they want to introduce new terms that mean nothing to the average user (XP for eXPerience, Vista because it sounds cool). Those names are great for development code-names but they detract from a buyer's experience (I use the term buyer loosely to mean any potential user) in determining compatibility between products.

In .NET assemblies, there are four segments supported by the AssemblyVersion and AssemblyFileVersion attributes (major, minor, build number, revision). This seems fine until you want to release alphas, betas and release candidates. The semantic version for a 1.0 beta release would be 1.0.0beta1, indicating that this is the first beta for the 1.0.0 release (you can use any string of alphabetical characters, not just beta). In a .NET assembly you do this as follows:

[assembly: AssemblyVersion("1.0.0")]
[assembly: AssemblyFileVersion("1.0.0.253")]
[assembly: AssemblyInformationalVersion("1.0.0beta1")]

The new attribute here is obviously AssemblyInformationalVersion, which is used to specify more arbitrary strings. It will show up in the Windows properties dialog as the product version (otherwise AssemblyVersion is used). Also, the AssemblyFileVersion is used to indicate build numbers. So while working on the 1.0.0 release, we also have a continuous integration environment like TeamCity or Hudson building the code each night and incrementing the build number. However, continuous integration environments shouldn't have any impact on what you actually tag the version as.

As Tom says in the article, kinda sorta following the standard doesn't reap much benefit. But once we all start releasing software that conforms exactly to this standard, then users can more efficiently understand which two components are compatible and which aren't. I believe this applies to all software, not just software that supplies a public API.

Tuesday, June 28, 2011

Got a backbone?

Earlier, I posted about those lame hipster developers, as I call them. Mainly, I just find it a little hard to believe that anyone can create a truly scalable JavaScript app using node.

Recently I stumbled into Backbone (or rather, I kept hearing about it and finally checked it out). Backbone is a bare bones MVC framework for JavaScript that is meant to give your JavaScript apps structure without weighing them down. Also, more important, Backbone is by no means mutually exclusive with jQuery. Actually, they complement each other quite nicely.

Back to those hipster developers. I don't often like to admit that a badly dressed 20-year-old can be right, and I still won't go so far as saying node.js is really a presentable solution for anything on the server, but the fact that they're expanding the infrastructure around JavaScript is really pushing me to think about how I can evolve my own .NET work. For me, Backbone is where it starts.

An Answer to Uncontrollably Messy JavaScript
I've written a lot of pages with big long blocks of jQuery chains and anonymous functions. It's such a huge pain to maintain or refactor that I sometimes end up rewriting. Part of the problem is simply that the code is messy. But even when I break it down into smaller nugget-sized functions, I still have a fist-full of spaghetti code that is prone to unchecked regressions. I definitely need to test my code, but code in that state is all but untestable.

Backbone lets you organize your code into Models, Views, Controllers, and Collections. If you go all the way with Backbone, you're going to be creating pageless apps where you load the page once and never reload it (like GMail). Everything is data fed to the page via JSON services. Controllers let you bind bookmarks to functions (e.g. when a link with href="#!/inbox" gets clicked, it is routed to an inbox function and handled there). Views bind models to HTML. They also keep the models bound to the HTML, so when newer, fresher data arrives, the models are rebound to the page where necessary.
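
For example, the inbox route might be declared like this (2011-era Backbone, where routers were still called controllers):

var Mail = Backbone.Controller.extend({
  routes: {
    "!/inbox": "inbox"
  },

  inbox: function() {
    // fetch fresh JSON and (re)render the inbox view here
  }
});

new Mail();
Backbone.history.start();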

By modularizing code according to the MVC pattern, unit testing becomes significantly easier. Most of your normal issues like mocking the DOM & XHR become less important because your code is broken into smaller pieces. Besides being easier to test, it's just plain easier to understand also.

When testing, if you do require mocking facilities, I've heard that SinonJS is excellent for all types of mocking, and comes with built in server & XHR mocks. Also, a coworker is pushing me towards Behavior Driven Development and so Jasmine is a natural winner for a test framework.

I've heard people stress that Backbone is for web applications, not web sites. But at the same time, I don't think you need to go completely single-page to use Backbone either. In .NET, I don't really want to go single-page because MVC provides so much. But some of my pages that involve several page states could be dramatically simplified with an MVC approach. At bare minimum, I want to be able to simplify and test my client-side logic.

Sunday, June 26, 2011

Introducing NetLint

Last week our QA guys wrote up a bug saying that one of our new pages wasn't working. After a little investigation I figured out it was just a JavaScript file that was inadvertently merged out of existence while resolving merge conflicts. We've also had cases where the app would run locally on developer boxes but fail miserably when deployed to the test environment.

I don't really like giving the QA guys an excuse to blemish my reputation with bug reports, so I threw together a little tool to prevent this from ever happening again. Enter NetLint...

NetLint processes Visual Studio project files (*.csproj, *.fsproj, etc) and compares the files referenced in the project file with the files that actually exist on disk. So if a JavaScript file exists on disk but isn't in the project file, NetLint will throw an exception summarizing this and any other discrepancies.

I also set up NetLint with simple file globbing functionality, so all files under bin/ and obj/ are ignored by default (you can also add custom patterns). I run NetLint from a unit test, so whenever anyone resolves merge conflicts they will instantly know if they missed a file.

The future of NetLint will be a staging ground for testing conventions. I'm licensing it under the MIT license, so hopefully no one should have any reservations due to licensing. I also created a NuGet package to make it even easier to use.

Tuesday, May 24, 2011

Hipster developers

I'd like to know what the deal is with these new hipster developers, as I like to call them. You know, those guys who adore those new languages and frameworks until they start catching on. I mean, you have to respect them for putting in that initial work to bring technology forward, but eventually they just become a headache. Honestly, does node even have a chance of being a truly scalable solution?

Friday, May 13, 2011

Some useful git aliases

Git aliases are a great way to do more with less typing. Our team uses submodules to an extent that can sometimes be confusing, and some of these aliases help to clarify behavior. These are a few of my favorites.

git lg

This gives you a nicely formatted semi-graphical log view with users, branches, and remotes

git config --global alias.lg "log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %C(green)%an%Creset %Cgreen(%cr)%Creset' --abbrev-commit --date=relative" 

git latest

This does a git pull on the current repository as well as all submodules

git config --global alias.latest '!sh -c "git pull && git submodule foreach \"git pull\""'

git virgin (getting to a pure state)

This will reset your changes and delete all untracked and ignored files (includes bin/ and obj/ directories)

git config --global alias.virgin '!sh -c "git reset HEAD --hard && git clean -fXd && git clean -fd"'

git harem (a whole lot of virgins)

This does a virgin for your repository as well as all submodules

git config --global alias.harem '!sh -c "git virgin && git submodule foreach \"git virgin\""'

Wednesday, April 20, 2011

Scripting with rake

Rake is a great twist on traditional make (honestly, I never really liked Ant or NAnt). On the surface it looks more like make than Ant or NAnt, but you can leverage the full syntax and standard library of Ruby (and there are no weird rules about tabs). As a .NET developer, albacore augments rake nicely with tasks for MSBuild (building Visual Studio projects and solutions), NUnit, the ASP.NET precompiler, modifying your AssemblyInfo.cs (like for bumping the version number), and many more.

Since rake is just ruby code, you can do just about anything, and most file manipulation routines are even easier to write in rake because most everything is already imported and ready to use. Unlike make, Ant, and NAnt, you don't have to start a separate project just to develop tools to use in a rakefile - just write a ruby function!
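
For example (a trivial sketch; the paths are made up):

# any plain ruby function is available to every task
def newest_mtime(glob)
  FileList[glob].map { |f| File.mtime(f) }.max
end

task :report do
  puts "last source change: #{newest_mtime('src/**/*.cs')}"
end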

Building dependencies first
A lot of people who aren't already familiar with build languages make some common mistakes. Among them: not using dependencies correctly. For instance, given a website solution that references a framework solution:

msbuild :framework do |msb|
  msb.solution = 'framework/src/framework.sln'
end

msbuild :website do |msb|
  msb.solution = 'src/website.sln'
end

task :default => [:framework, :website]

The default task is the task that's executed when you just type rake at the CLI. The reason this script is terrible is that it's procedural and inflexible. If I run rake website, the build fails because framework hasn't been built yet. Instead, each task should specify what other tasks it directly relies on. This script should change to:

msbuild :framework do |msb|
  msb.solution = 'framework/src/framework.sln'
end

msbuild :website => :framework do |msb|
  msb.solution = 'src/website.sln'
end

task :default => :website

This way both rake and rake website work the same. This leverages rake's dependency framework, which is at the core of all build languages.

Using file tasks
The other point that people often forget is that build languages are oriented around files. Make tasks were oriented around questions like "does this file need to be created?". This is where rake's file task comes in very handy. For instance, the above tasks can become:

$framework_dll = 'framework/src/framework/bin/Debug/framework.dll'

file $framework_dll => :framework

$website_dll = 'website/bin/Debug/website.dll'

file $website_dll => :website

msbuild :framework do |msb|
  msb.solution = 'framework/src/framework.sln'
end

msbuild :website => $framework_dll do |msb|
  msb.solution = 'src/website.sln'
end

task :default => $website_dll

This makes it so that framework and website are only built when their output files are missing - the builds won't even be attempted otherwise.

Arbitrary scripting
Rake is a great platform for hosting arbitrary scripts that you might write to automate your development process. I have scripts to bump the assembly version and subsequently commit to git, and to deploy to our test server, and I plan to make tasks to interact with redmine via its REST API (something certainly not possible in NAnt). Basically, any little task that I might write a script for (which is quite a bit) can be imported into the rakefile and mounted as a task (yes, ruby is very modular).

Wednesday, April 13, 2011

Automocking containers are not just for mocks

In my last post I introduced MoqContrib's automocking container. In this post I want to describe what sets it apart from MoqContrib's previous automocking container and all other automocking containers that I've heard of thus far.

A Castle.Windsor contributor said that for unit tests, "it's recommended that you don't use the container at all, or if the test setup gets too dense because of dependencies, use an AutoMockingContainer." This is in response to a stack overflow question regarding how to remove components in order to replace them with mocks. There are others that agree with him.

I don't agree with Mauricio or Derek (from the links above). I strongly believe that there are several reasons to let an automocking container have real services registered that aren't mocks. The primary reason is for integration tests. This is where you are testing a system of modules, a subset of the entire system, but you still need to isolate those modules to just the system under test (SUT). So while the dependencies within the SUT are going to be implemented with real implementations, everything else is mocked. This is a partially mocked situation.

One of the big reasons to use an automocking container is just to simplify everything. Sure, your setups are starting to get pretty long for unit tests, but sometimes you run into issues where there is already a component registered, so you can't register a mock without first removing the original component. This is very tedious and totally ruins any love you might have had for your IoC container.

In MoqContrib 1.0 the container will favor the last component registered over everything else. This is handy because you can do setups by exception: for an integration test fixture you can set up everything as a production implementation and then just mock components as needed - or the other way around, overriding with production implementations. I believe this will lead to much cleaner tests and much less time tracking down "how that friggin' component got registered".

As for the progress of the 1.0 release: I had originally said it would go out last weekend. However, there have been some problems getting the community on board, and I also realized it was missing several important features. I will release a preview as soon as I get the current code stable.

Wednesday, April 6, 2011

Introducing MoqContrib Auto-mocking Container

The past couple weeks I have been working on an auto-mocking inversion of control container for Moq Contrib. The first results are almost ready to release in the form of an Alpha. The first container to be released will be Castle.Windsor, later we will release an Autofac container.

You will be interested in this project if you use an IoC container in conjunction with unit tests and mocking (with Moq). You probably find yourself writing setups like:

[SetUp]
public void Given()
{
 _service = new Mock<IService>();
 Container.Register(Component.For<IService>().Instance(_service.Object));
}

[Test]
public void I_did_something() 
{
 var test = new TestThingy();
 test.DoSomething();
 
 _service.Verify(x => x.Something(), Times.Once());
}

When you use an auto-mocking container, the container will create a mock at resolve-time for any service that doesn't already have a component registered. So in the above example, the setup drops out completely, as there's no need to explicitly create and register the mock:

[Test]
public void I_did_something() 
{
 var test = new TestThingy();
 test.DoSomething();
 
 // the automocking container supplied the mock; we just verify against it
 _service.Verify(x => x.Something(), Times.Once());
}

We will release an alpha version of the Castle.Windsor auto-mocking container later this week. Soon after we will add an Autofac container and start working towards a regular release schedule. If you are interested, visit the site at codeplex and give feedback through the discussion groups.

Happy Mocking!

Wednesday, March 23, 2011

Object Incest

Note: I thought I had read this term somewhere else, but since a quick internet search turned up only dirty videos, I think I may be the sole "coiner" of the term.


Many inexperienced developers (and experienced ones too) have been known to make several common mistakes in object oriented design. Hence the coining of the terms anti-pattern and code smell to refer to patterns of development (like design patterns) that lead to convoluted, overly complex code that costs exponentially more to maintain and exhibits little value.

Object incest is a pattern where two unrelated classes are intimately dependent on each other. Simply put, if object A directly relies on object B and B relies directly on A, you have two incestual objects. This usually happens to intermediate developers who realize that they need separation of concerns and break a class into two classes without actually breaking the dependencies. While it is understandable (and almost respectable) why a developer might commit object incest, it is no less dangerous and harmful to a code base full of child objects.

Here is an example of object incest:

class Brother {
 public Sister MySister { get; set; }

 private void GetMyHairBrushed() {
  MySister.BrushHair(this);
 }

 public void DefendFromBullies(Sister sis) {
  // ...
 }
}

class Sister {
 public Brother MyBrother { get; set; }

 public void BrushHair(Brother bro) {
  // ...
 }

 private void GetRidOfBullies() {
  MyBrother.DefendFromBullies(this);
 }
}

This is wrong because the two objects are so involved that it's hard to tell them apart, breaking the principle of separation of concerns. You can fix this by extracting roles from the objects as interfaces. Then each object depends on some kind of object that can fulfill a role: a brother needs someone to brush his hair; a sister needs someone to defend her from bullies.

// the roles, extracted as interfaces
interface IDefenderOfTheWeak { void DefendFromBullies(IWeakling weakling); }
interface IWeakling { }
interface IPersonWithHair { }
interface IHairBrusher { void BrushHair(IPersonWithHair hairyPerson); }

class Brother : IDefenderOfTheWeak, IPersonWithHair {
 public IHairBrusher MyHairBrushPartner { get; set; }
 
 private void BrushMyHair() {
  MyHairBrushPartner.BrushHair(this);
 }
 
 public void DefendFromBullies(IWeakling weakling) {
  // ...
 }
}

class Sister : IWeakling, IHairBrusher {
 public IDefenderOfTheWeak Defender { get; set; }
 
 public void BrushHair(IPersonWithHair hairyPerson) {
  // ...
 }
 
 private void FightOffBullies() {
  Defender.DefendFromBullies(this);
 }
}

In the second example, the two objects are no longer reliant on each other. Now they only rely on the roles that each of them provides. Down the road it will be much easier to create other objects that implement those interfaces (roles), like Husband and Wife.

Thursday, March 17, 2011

Unit testing databases - with NHibernate!

One of the pesky problems with databases is unit testing the database portion of your application. For instance, it's enough of a pain to tear down and restore data to its original state, but it's even harder if your application code requires you to commit changes. A while ago I saw this stack overflow question that said you could wrap all your code in a TransactionScope like:

using (new TransactionScope())
{
    // Database access code here
}

When .Dispose() is called at the end of the using block, the code is supposed to roll back all transactions, even if they were committed. After reading the documentation I realized that any new transactions will use this ambient transaction scope, and hence be rolled back when the scope rolls back at the end of the using block.

This all seems like a great idea for ADO.NET code, but I was skeptical of using this with NHibernate because I know NHibernate does funny things with the session and how it creates transactions. Even though I've known about this trick for some time, I never trusted it or even took the time to actually test it...until now.

I tested this idea inside the scope of our application code, which I'm basically just pasting here - so bear with some of the abstraction we have built up in IGenericDAO and Container.

[Test]
public void CheckNHibernateMappings()
{
    using (new TransactionScope())
    {
        // IGenericDAO is our abstraction layer for accessing NHibernate
        var dao = Container.Resolve<IGenericDAO<WorkflowTransition>>();
        var obj = new WorkflowTransition() { FromFk = 1, ToFk = 2, IsAllowed = true, WorkflowFk = 1, RightFk = 1 };
        dao.Save(obj);
        dao.CommitChanges();

        var selected = dao.SelectById(obj.WorkflowTransitionId);
        Assert.That(selected.WorkflowTransitionId, Is.GreaterThan(0));
        Assert.That(selected.ToFk, Is.EqualTo(2));
    }
}

I placed a breakpoint on the line right after CommitChanges(). I debugged the unit test and when it stopped at the breakpoint I ran this query in SSMS:

select * from WorkflowTransitions with (nolock)

The query returned the row I had just inserted. The nolock table hint means to ignore any locks that might be on the table and read everything, even uncommitted data. This means we can see the results of NHibernate's insert statement without having to mess with the SQL Profiler. (If you run the query without the nolock hint, it hangs until it times out.) I then let the test finish and ran the query again. This time the row was gone!

Apparently, this TransactionScope is fully capable of rolling back all transactions, even if they were created automagically by NHibernate. I presume this means it will work with any ORM framework, not just NHibernate.

Monday, March 14, 2011

Introducing ObjectFlow

I've been assigned to create a light and flexible workflow for two separate projects. After doing some research, I found that there really aren't any workflow frameworks out there that are light and easy to use and understand. I noticed that objectflow lets you define workflows in C# with an easy-to-read fluent interface, but after digging into it I realized it was missing some crucial features. For instance, there was no clear way to pause a workflow in the middle so that a real person can interact with it.

I contacted the maintainer of the project and have contributed a large portion of functionality that makes it easy to define workflows that include people. Here is a sample workflow:

var open = Declare.Step();
var wf = new StatefulWorkflow<SiteVisit>("Site Visit Workflow")
  .Do(x => x.GatherInformation())
  .Define(defineAs: open)
  .Yield(SiteVisit.States.Open)
  .Unless(x => x.Validate(), otherwise: open)
  .Do(x => x.PostVisit());

// And send an object through
var visit = new SiteVisit();
wf.Start(visit);

// It stops at the Yield, maybe persist it in a database and display a page to the user
wf.Start(visit);

// extension methods to check if it's still in the workflow
if (visit.IsAliveInWorkflow("Site Visit Workflow"))
    wf.Start(visit);

This workflow is fairly simple and demonstrates how you can create a module for defining workflow and isolate all business logic in data objects (models and view-models work great here). I was initially concerned with the idea of creating conditional goto constructs, but after more thought I decided that this shouldn't be a significant problem as long as workflows stay simple and there is a clear separation between business logic and workflow logic.
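
For reference, the SiteVisit model might look something like this - the member names are inferred from the workflow definition above, not prescribed by ObjectFlow:

class SiteVisit {
 public enum States { Open }

 // All business logic lives on the model; the workflow only sequences it
 public void GatherInformation() { /* collect details about the visit */ }
 public bool Validate() { return true; /* true once the gathered data checks out */ }
 public void PostVisit() { /* finalize and record the visit */ }
}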

There is a lot more to this project - and to the features I contributed. However, I haven't yet put much effort into the official documentation, so perhaps I'll write about this more after developing the core documentation a little further. I think this is an excellent solution for companies who want to quickly throw together workflows without a significant barrier to understanding. I think I will continue developing on ObjectFlow as long as I have something I feel I can add.

Friday, March 4, 2011

Crass grammar drives me crazy

I recently had a conversation with someone that went something like:

Me: Yeah, I went to the Sunflower market down on 287 & South Boulder Rd
PersonX: That's one long ass walk

How am I supposed to reply to that? I could say, "Not really, I wasn't ass walking the whole way" or "Yes, my ass is long, I should get in shape". No wonder people have such a hard time learning English...

Thursday, March 3, 2011

I'm becoming a DVCS snob

Today I was looking at open source workflow frameworks for work and paused on objectflow. I almost decided not to use the library because they're still using SVN or TFS (I'm not really sure which) even though CodePlex supports Mercurial.

I'm coming in with the idea that I may contribute to the project if I find, down the road, that I have something worth adding. Submitting patches seems so painful compared to a simple pull request. The workflow of a distributed version control system (DVCS) makes sharing code so incredibly easy that it causes me psychological pain to think about going back to SVN.

On the other hand, one benefit of objectflow being available as SVN is that I can easily use git-svn to create a git clone that can be included as a submodule. It wouldn't be quite as straightforward if it were a Mercurial repository. Submodules are an excellent feature of Git!

Saturday, February 26, 2011

NUnit Extension Methods

I've always used NUnit for testing code, so it's naturally the framework I'm most familiar with (I haven't used anything else). I learned unit testing using the classic Assert.AreEqual(expected, actual) methods. However, I was finding my tests slightly confusing to read - I sometimes can't remember which comes first, expected or actual.

More recently I've been getting into v2.5 and its new constraint-based asserts - Assert.That(actual, Is.EqualTo(expected)). I think this makes a lot of sense, and I find myself reaching for Assert.That most of the time just because it reads naturally.

Recently, a coworker created a few extension methods that I'm finding quite handy:

public static class ShouldExtensions {
    // Extension methods must live in a static class; the class name is arbitrary
    public static void ShouldBe(this object @this, object expected) {
        Assert.AreEqual((dynamic)expected, (dynamic)@this);
    }
    public static void ShouldNotBe(this object @this, object expected) {
        Assert.AreNotEqual((dynamic)expected, (dynamic)@this);
    }
    public static void ShouldBeNull(this object @this) {
        Assert.IsNull(@this);
    }
    public static void ShouldNotBeNull(this object @this) {
        Assert.IsNotNull(@this);
    }
}

I've completely fallen in love with how this reads: actual.ShouldBe(expected). It also makes me giggle to write actual.ShouldBeNull() (don't you love extension methods?). This makes unit testing so easy...
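
A test using these helpers might read like the following (Calculator is just a hypothetical class under test):

[Test]
public void Adding_two_and_two_should_be_four() {
    var calc = new Calculator();     // hypothetical system under test
    var result = calc.Add(2, 2);

    result.ShouldBe(4);              // reads like a sentence
    calc.LastError.ShouldBeNull();   // assumes Calculator exposes a LastError
}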

Sunday, February 13, 2011

The internal secrets of Git

Thursday night I attended a lecture at the Boulder Linux users group called Unlocking the Secrets of Git, given by Tom, one of the co-founders of GitHub. This was extremely eye-opening. Up until now I had viewed Git as simply a distributed version control system. Tom showed us how to manipulate Git's internal file format and demonstrated that Git is actually a filesystem in userspace with built-in versioning and synchronization. He demonstrated how, by storing the SHA1 hash of file contents, Git is (1) extremely fast at comparing files and (2) doesn't actually care about file names - it just cares about the contents of files. This is important when you're renaming files; the filename is generally unimportant in the grand scheme of things.
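
To illustrate the contents-over-names point: a blob's id is just the SHA1 of a tiny header plus the file's bytes, so the file name never enters the hash at all. Here's a rough C# sketch of what git hash-object computes:

using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

class GitBlob {
 static string BlobId(byte[] contents) {
  // Git hashes "blob <size>\0" followed by the raw contents -
  // rename a file and its blob id doesn't change
  var header = Encoding.ASCII.GetBytes("blob " + contents.Length + "\0");
  var hash = SHA1.Create().ComputeHash(header.Concat(contents).ToArray());
  return string.Concat(hash.Select(b => b.ToString("x2")));
 }

 static void Main() {
  Console.WriteLine(BlobId(Encoding.ASCII.GetBytes("hello\n")));
 }
}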

Tom also showed us several open source projects that build upon the concept of Git as a filesystem. One was a highly efficient backup system. Another is a static site generator. There were many more. The point here is that Git is destined to be not just version control; it will be a feature-complete platform for anything that requires a filesystem with versioning and synchronization.

The critical component to the success of Git as a platform is libgit2, a C library for interacting with Git. It's critical partly because many people have been re-creating the functionality of Git over and over; by combining that functionality into one library, the logic only has to be written once and can be reused by everyone else. It's also critical because libgit2 is being released under a permissive license that allows it to be used by many other people and projects without getting into any legal snafus.

Most importantly, Thursday night I realized that the tech community of Boulder is so complex and complete, I should never get bored here. I haven't lived here for a full six months yet but already I feel like I can't leave this city.

Wednesday, January 19, 2011

Mind control

I found this blog post about a couple Harvard students who wrote some [GPL'd] software for controlling worms' minds. They can control how these worms move and even make them lay eggs!

The implications of this are obviously huge. This is only an academic project now, but in a couple of decades I wonder if we'll see animals used like machines. I guess there are several other ideas you could draw from this, but no matter how you view it, it's a fascinating idea.

Sunday, January 9, 2011

Declaring the Future of Programming

Programming languages have developed significantly over the past several decades. I hypothesize that this development has tended more towards declarative syntax than imperative, and that programming languages will only become more declarative in the years to come.

In the beginning was machine code. Programmers wrote programs by stringing together arcane byte codes of instructions and parameters. Programs were getting pretty hard to read, so assemblers were created to let you write instructions in plain text, complete with comments. An assembler would process the source code and turn each instruction into its equivalent machine code. This is imperative programming in its purest state.

When the first C compiler was written it quickly became popular, because the programmer only had to declare what should happen and the compiler would generate the necessary machine code to make it happen. That's why you can write a C program that compiles for Linux, Windows, and Mac with zero changes to the source code. However, C and C++ are still imperative languages in most other respects, because the thought process is still very much a "do this, now do this, now do this" algorithmic sequence of instructions.

Query Languages

The hallmark of declarative languages throughout history is probably SQL (referring strictly to set operations here). In SQL you describe the result set and let the DBMS decide the best way to produce it. For instance, consider this query:

select p.FirstName, p.LastName, a.AccountName
from Person p
inner join Account a
on p.PersonId = a.ResponsiblePerson
where a.IsActive = 1
order by p.FirstName, p.LastName

First we describe the columns that we want (this actually happens last, if you want to be technical). In the from clause we say what tables we want information from and specify how we want them matched up using the on clause of the join. In the where clause we specify what criteria for the rows that we want to show and in the order by we describe the sort order.

All this was done strictly declaratively. If you have the opportunity to look at the execution plan, it all ends up being quite elaborate: the DBMS might consult two or three indexes before actually joining rows, selecting columns, and ordering the result set - not to mention all the locking that takes place to avoid race conditions. If we had to write all that by hand in C# or Java it would be an extremely gnarly component, and probably buggy and slow.

Expression Trees in C#

Interestingly, .NET land is also developing into a declarative playground. The biggest step in this direction happened with Linq and its expression trees. The Linq query syntax is declarative, but I'm referring to something more basic: expression trees can be broken down at run time by a processor that analyzes the contents of a lambda it was passed. For instance, NHibernate can receive a method call like:

var timsAccounts = accounts.Where(x => x.ResponsiblePerson == "Tim");

and pull out the meaning (ResponsiblePerson equals "Tim") and convert it into a SQL where clause at run time (where a.ResponsiblePerson = 'Tim'). The implications of this are wild, and in recent months and years they have become very powerful. Examples include Fluent NHibernate, Moq, and Castle Windsor's fluent registration API. Castle Windsor and NHibernate both used to rely on XML configuration files but have since moved towards expression trees, in combination with dynamic proxies and interceptors, to configure via code. This declarative approach leads to less code with the potential to be more efficient.
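
Here's a minimal sketch of what it looks like to pull a lambda apart at run time - nothing NHibernate-specific, just the BCL's expression API, with a stand-in Account class:

using System;
using System.Linq.Expressions;

class Account {
 public string ResponsiblePerson { get; set; }
}

class ExpressionDemo {
 static void Main() {
  // The compiler hands us a data structure describing the lambda,
  // not a compiled delegate
  Expression<Func<Account, bool>> expr = x => x.ResponsiblePerson == "Tim";

  var body = (BinaryExpression)expr.Body;        // the == node
  var member = (MemberExpression)body.Left;      // x.ResponsiblePerson
  var constant = (ConstantExpression)body.Right; // "Tim"

  // Prints: where a.ResponsiblePerson = 'Tim'
  Console.WriteLine("where a.{0} = '{1}'", member.Member.Name, constant.Value);
 }
}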

Treatise on Domain Specific Languages

The topic of domain specific languages deserves an entire blog post. SQL and CSS are the obvious examples, but there are hundreds more. In one of my internships a coworker wrote a DSL to specify sort order for dictionaries of arcane natural languages and scripts. A simple DSL is much easier to develop than a GUI for the same purpose, and can often be easier for a non-techy user to learn and become productive in.

The sad news is that colleges and universities are putting less focus on compiler and parser classes, the assumption being that we have all the languages we need - why would we want more? The answer is simple: by providing a simple syntax to describe a problem or a solution, we can simplify the entire process of arriving at that solution. If the problem is abstracted away from the solution, the layer underneath is free to leverage constructs like multi-threading and highly optimized algorithms. Sometime you should take a look at the byte code your compiler produces - ask yourself if you could have even thought of those sorts of mind-bending tricks.

We need domain specific languages because they simplify problems. They create more effective abstractions than even inversion of control frameworks. Unfortunately, fewer people are learning about string processing these days. How many people have you worked with who actually consider themselves proficient in regular expressions or compiler generators (two more declarative DSLs that simplify solutions)?
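
Regular expressions are a nice small-scale demonstration of the declarative win: you state the shape of the text you want and the engine works out how to find it.

using System;
using System.Text.RegularExpressions;

class RegexDemo {
 static void Main() {
  // Declare *what* a phone number looks like, not *how* to scan for it
  var match = Regex.Match("call me at 303-555-0147", @"\b\d{3}-\d{3}-\d{4}\b");
  if (match.Success)
   Console.WriteLine(match.Value); // 303-555-0147
 }
}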

Conclusion

Anytime you write code that is less imperative, you give the layer underneath more room to innovate with efficient algorithms. That shouldn't be surprising - any good programmer feels exactly the same way about a micro-managing supervisor. So after saying all this, it should be clear why I believe the future of programming is declarative. Declarative syntaxes let us simplify the problem by simply stating what it is (or describing what the solution looks like) and allowing the underlying engine to determine the solution. As such, I believe we will see the number of domain specific languages multiply in the years to come.

Sunday, January 2, 2011

Would I choose Git again?

I wrote a post a few months ago about the reasons we chose to use Git over Subversion, and I think it's time to follow that up with how it's gone so far. We're an ASP.NET outfit, so there are a few considerations that might not apply to, say, the Linux kernel team. I'm going to break this up into three parts: my perspective, my team's perspective, and some tips for anyone who might want to try Git too.

My Experiences With Git
I seriously love using Git. I make a branch for everything I do, just like they recommend. An old-school member of our team commented, "we always considered branches as something to be avoided", hinting at how hard SVN branches are to manage and keep in sync with the trunk. Git branches are very different from SVN branches - they are very light and easy to keep up to date.

Git has some seriously awesome merging mechanisms. First, you can select from a list of merge algorithms (you really only need one of these, but hey, it's great to have choices just in case). Then there are rebase and cherry-picking. These last two aren't regular merges; their algorithms look at the history of the repository and make several [possibly hundreds of] incremental merges. Because these schemes take history into account, you can do some serious refactoring and still apply patches to both the production and development branches with relatively little effort.

Our team develops and maintains a web application that our company sells as a service. As such, we don't spend time on installers or maintaining previous versions, because the only versions that matter are the one in production and the development version. Git allows us to cherry-pick hotfixes from development into production (or vice versa) without really thinking much about it. This would have been a small nightmare in SVN (and would invoke suicidal tendencies in TFS). Back when we were using TFS there wasn't really any process or procedure around hotfixes - you basically just updated production. With Git, it's incredibly easy to stash whatever you're doing, check out the production branch, fix a critical bug, test and deploy it, and then cherry-pick it back into the dev branch. Git works well for people who get interrupted by escalations (everyone??).

My Team's Experiences
My team hates Git. Well, that's a bit harsh and premature, but there was some backlash when we first switched. About three weeks in, I gave a brown bag lunch presentation on Git to teach everyone how to use it. After that, people generally caught on to the basics, with the exception of some merging snafus.

Merging is actually an interesting point. TFS merging drove me nuts. Perhaps it was just the merge program, but I always felt like I had my hands tied. Now that we're using Git I feel free again to branch and merge at will, but one of my teammates seemed (at least at first) completely confused by Git merging. This was [probably] entirely due to the fact that Git Extensions didn't come with KDiff3 by default (they now offer a convenient all-in-one installer that includes both KDiff3 and Git).

Another point of confusion with Git GUIs was that TortoiseGit makes it very difficult to see what's different between local and remote repositories. I think the Tortoise crew made too much of an effort to make it feel like TortoiseSVN, when in reality that leaves some very important questions unanswered (TortoiseSVN only has to answer one or two important questions, but a Git GUI needs to answer four or five). Among these unanswered questions are "what branch am I on?" and "have I pushed this to the server yet?". TortoiseGit doesn't provide a clear answer to either, so I had everyone switch to Git Extensions.

Tips for Future Git Users
We were forced to learn a few lessons pretty quickly. I'll list them here in paragraph format...

GUIs are still young. Most Git users are sick Linux users who live by vi & grep, so developing a decent GUI hasn't really been a priority for Git (there is an official Git GUI that ships with Git, but it possesses some serious suckage). If you work in a Microsoft/Windows outfit there is no conceivable way your coworkers will be happy with command line, so a good GUI is critical. Use Git Extensions!

Setting up a central server is not entirely straightforward. While SVN is distributed as either a client or a server, Git has no inherent need for a central server, so this was also an afterthought. Use gitolite on Linux; install it via your package manager - it's very easy to get started with and easy to maintain.

SSH keys are problematic. Try to use PuTTY/plink to manage keys if possible. OpenSSH is very un-Windows-like.

Unit tests are good, and they can make Git shine even brighter. If you maintain a generally complete unit test suite, you can have Git use your test runner to quickly find where code started breaking. The bisect command (see git bisect run) takes a program or command that exits with 0 for success or non-zero for failure and performs a binary search through past commits to find the first place where a test started failing. This can also work great if you're a scripting guru - write a short program that checks for some text (like "CREATE TABLE X") in a particular file and Git will do the leg work.
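
Here's a minimal sketch of such a check (the file name and search string are placeholders); compile it and hand the executable to git bisect run:

using System;
using System.IO;

class BisectCheck {
 static int Main() {
  // Exit 0 = this commit is good, non-zero = bad;
  // run across history with: git bisect run BisectCheck.exe
  var schema = File.ReadAllText("schema.sql"); // hypothetical file
  return schema.Contains("CREATE TABLE X") ? 1 : 0;
 }
}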

Conclusive Thoughts
Git is very powerful and can adapt to any workflow. If process is important to you, Git will enable whatever process you choose; if process isn't important, Git won't get in your way. It's very scalable thanks to its distributed nature (see the dictator-and-lieutenants workflow). It's also great for the small personal projects I do in my spare time - I can keep code version controlled without sharing it with anyone, and when I want to I can push it to GitHub (another awesome idea). However, if your coworkers are generally stagnant and opposed to change, Git will drive them nuts and you will hate your life. Choose Git only if you want a tool that abstracts away mundane tasks like merging and you don't mind changing your world view of version control.