Tim Kellogg - Tinkering with ideas. Code, language, software engineering<br />
<br />
<h2>Blog is moving (2012-04-22)</h2>
Thanks to everyone who reads my blog. Recently I made the move to using Github pages for my blog. I also <a href="http://timkellogg.me/" target="_blank">bought a domain name</a> that sounds kind of cool. Now I have a place where I can add other content besides blog posts.<br />
<br />
I made the switch because now I can write blog posts in markdown in Vim, and offline, so I can blog while riding the bus or on an airplane. I also found that writing code in Blogger was a huge pain, but <a href="https://github.com/mojombo/jekyll" target="_blank">jekyll</a> provides inline code highlighting for a large number of syntaxes. After considering it for a while, I realized the benefits of switching far outweighed the features Blogger offers.<br />
<br />
I've already written some <a href="http://info.timkellogg.me/blog/2012/03/24/why-object-ids-primary-keys-are" target="_blank">pretty</a> <a href="http://info.timkellogg.me/blog/2012/04/18/code-coverage-metrics" target="_blank">interesting</a> <a href="http://info.timkellogg.me/blog/2012/04/22/why-open-source-is-worth-your-time/" target="_blank">posts</a>. I have a lot more planned, so please stick with me. If you follow this blog via the atom feed, I also have <a href="http://info.timkellogg.me/blog/atom.xml" target="_blank">a new atom feed</a>, so please update your links!<br />
<br />
<h2>Why Object IDs & Primary Keys Are Implementation Details (2012-03-24)</h2>
Recently <a href="http://blog.timkellogg.me/2012/03/abstract-data-layer-part-1-object-id.html" target="_blank">I wrote a post</a> about a project I'm working on with an abstracted data layer concept that can work in the context of either a relational or a document data store. In retrospect I think I brushed too quickly over the details of why I think object identifiers (and primary keys) are a part of the implementation that should be hidden when possible. To explain what I mean I'll use a surreal-world story.<br />
<br />
<span style="font-size: large;">The Situation</span><br />
<br />
You are the chief software engineer at a software company. One day your product manager comes to you with a list of ideas for a new product where users can post definitions to slang words, like a dictionary. He says people are going to love this new app because everyone has a different idea of what words mean. After talking with him to establish ubiquitous language and identify nouns and verbs, you crank up some <a href="https://twitter.com/#!/search/%23codingmusic" target="_blank">coding music</a> and hack out some model classes.<br />
<br />
<script src="https://gist.github.com/2166519.js?file=Word-Definition.cs">
</script>
<br />
A weekend later you finish coding the app using Int32s (<span style="font-family: 'Courier New', Courier, monospace;">int</span>) as the identity data type for most of your models because it's usually big enough and works well as a primary key. Honestly, you didn't really think about it because it's what you always do.<br />
<br />
After the launch your app quickly gains popularity with the user base doubling every day. Not only that, but as more definitions get posted, more people are attracted to the site and post their own word definitions. While reviewing the exponential data growth figures, your DBA decides that <span style="font-family: 'Courier New', Courier, monospace;">Definition.Id</span> should be changed to an Int64 (<span style="font-family: 'Courier New', Courier, monospace;">long</span>) to accommodate the rapidly multiplying postings.<br />
<br />
Let's stop for a minute and review what the <i>business needs</i> were. Your product manager wants an app where people can post words and definitions. Each word has many definitions. There's no talk in the business domain of tables and primary keys. But you included those concepts in the model anyway, because that's how you think about your data.<br />
<br />
The DBA chose to make the ID into a larger number to accommodate a larger amount of data. So now to help optimize the database, you are forced to update all your <i>business logic</i> to work nicely with the <i>data logic</i>.<br />
<br />
<span style="font-size: large;">Data Logic Was Meant to Live in the Database</span><br />
<br />
The trouble with tying data logic closely to business logic is that the database isn't part of your business plan. As your application grows you'll have to tweak your database to squeeze out performance - or even swap it out for <a href="http://cassandra.apache.org/" target="_blank">Cassandra</a>. Databases are good at data logic because they are declarative. You can usually tune performance without affecting how the data is worked with. When you place an index, it doesn't affect how you write a SELECT or UPDATE statement, just how fast it runs.<br />
<br />
At the same time, databases are also very procedural things. When you put business logic in stored procedures you lose the benefits of object oriented programming. It also makes unit tests complicated, slow, and fragile (which is why most people don't unit test the database). In the end, it's best to let your database optimize how data is stored and retrieved and keep your domain models clean and focused on the business needs.<br />
<br />
<span style="font-size: large;">The Type of the Object ID Is an Implementation Detail</span><br />
<br />
Let's say you hire a new COO who lives in Silicon Valley and thinks the latest coolest technology is always the gateway to success. With the new growth he decides that you should rewrite the dictionary application to use <a href="http://www.mongodb.org/display/DOCS/Introduction" target="_blank">MongoDB</a> because it's the only way your application can scale to meet the needs of the business. While evaluating Mongo you draw out what an example word and definitions might look like when stored as <a href="http://bsonspec.org/" target="_blank">BSON</a>:<br />
<br />
<script src="https://gist.github.com/2166519.js?file=Word-definitions.json">
</script>
<br />
In Mongo, <a href="http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-EmbeddingandLinking" target="_blank">you usually would store the Definitions inline with the Word</a>. Now there is no need for a Definition.Id or Definition.WordId because all of this is implicit. Not only that, but Word.Id is now an <a href="http://www.mongodb.org/display/DOCS/Object+IDs" target="_blank">ObjectId</a> - a very different 12 byte number that includes time and sequence components. In order to update your application to work with Mongo, you'll have to update all ID references to use these ObjectIds.<br />
<br />
The ID is an implementation concern. In a centralized SQL database, sequential integers make sense. In a distributed environment like Mongo, ObjectIDs offer more advantages. Either way, the type of your ID is an implementation detail.<br />
<br />
<span style="font-size: large;">Encapsulation Requires That You Hide Implementation Details</span><br />
<br />
Most OO programmers understand that encapsulation means that an object <i>has</i> or <i>contains</i> another object. However, some forget that a <a href="http://en.wikipedia.org/wiki/Encapsulation_(object-oriented_programming)" target="_blank">large part of encapsulation</a> is that you should keep the <a href="http://stackoverflow.com/a/1777728/503826" target="_blank">implementation details</a> of an object hidden from other objects. When the details of an object leak into other objects, the contract is broken and you <a href="http://www.joelonsoftware.com/articles/LeakyAbstractions.html" target="_blank">lose the benefits of the OO abstraction</a>.<br />
<br />
Any ORM tool should give you the ability to select protected (if not private) members of the object to be persisted. If it doesn't, it's not worth using, because it forces too great a compromise in design. This is how we should have been allowed to write our objects from the start:<br />
<br />
<script src="https://gist.github.com/2166519.js?file=Word-definitions-correct.cs">
</script>
<br />
<span style="font-size: large;">But Dynamic Languages Diffuse The Problem</span><br />
<br />
If you're in a dynamic language like Ruby or Node.js this is less of an issue. Most of my argument hinges on the idea that your API will latch onto the object's ID and insist that all methods that use it will match. This is really just a constraint of strict statically typed languages. Even implicit typing will mitigate the issue some.<br />
<br />
You'll notice above that I got around the constraint by using <span style="font-family: 'Courier New', Courier, monospace;">object</span> as the ID type. This is really what you want. It's telling the compiler and API that you really shouldn't care what the type is - it's an implementation detail. You shouldn't run into many problems as long as you are keeping the ID properly encapsulated within the object.<br />
<br />
<br />
<h2>Abstract Data Layer Part 1: Object ID Types And Conventions (2012-03-19)</h2>
In February I went to the MongoDB conference in Boulder. That day was my first real taste of any sort of document oriented database. Since then I've played around with Mongo in C#, Node.JS, and natively in the Mongo shell, and I can't help feeling overwhelmingly happy when thinking about how I can use Mongo for a project.<br />
<br />
At Alteryx we're entering a project with some specific business needs. We require an extremely fast and scalable database, hence Mongo. But we also need to package our product for on-premise installations, which I hear requires that we also support certain SQL databases.<br />
<br />
<i>...I don't actually understand why enterprises insist on using SQL. I'm told that enterprise DBA's want control over everything, and they don't want to learn new products like MongoDB. To me, it seems that 3rd party products that are bought would be exempt from DBA optimizations & other meddling. But I guess I wouldn't know what it takes to be an enterprise DBA, so I'll shut up about this now. Just my thoughts...</i><br />
<div>
<br />
Since relational databases are a lot different than document oriented databases I decided to use NHibernate as an ORM since they've already figured out a lot of the hard problems. I chose NHibernate over Entity Framework mainly because I already know NHibernate, and I know that it has good support across many databases. Nothing against EF in particular.<br />
<br />
I've been working on this for a week or so. I've gotten pretty deep into the details so I thought a blog post would be a good way to step out and think about what I've done and where I'm going. The design is mostly mine (of course, I stand on the backs of giants) and really just ties together robust frameworks.<br />
<br />
<span style="font-size: x-large;">Convention Based Object Model</span><br />
<br />
In order to remain agnostic toward relational/document structure, I decided that there would have to be some basic assumptions or maxims. I like the idea of convention-based frameworks and I really think it's the best way to go about building this kind of infrastructure. Also, conventions are a great way to enforce assumptions and keep things simple.<br />
<br />
<span style="font-size: large;">IDs Are Platform Dependent</span><br />
<br />
It's not something I really thought about before this. In relational databases we'll often use an integer as the object ID. They're nice because they're small, simple, and sequential. However, Mongo assumes that you want to be extremely distributed. Dense sequential IDs (like int identity) run into all kinds of race conditions and collisions in distributed environments (unless you choose a master ID-assigner, which kind of ruins the point of being distributed).<br />
<br />
MongoDB uses <a href="http://www.mongodb.org/display/DOCS/Object+IDs" target="_blank">a very long (12 byte) semi-sequential number</a>. It's semi-sequential in that every new ID is a bigger number than the IDs generated before it, but not necessarily just +1. Regardless, it's impractical to use regular integers in Mongo and also a little impractical to use long semi-sequential numbers in SQL.<br />
<br />
As a result, I chose to use <span style="font-family: 'Courier New', Courier, monospace;">System.Object</span> as the ID type for all identifiers. NHibernate can be configured to use objects as integers with native auto-increment after some tweaking. The Mongo C# driver also supports object IDs with client-side assignment.<br />
<br />
Ideally, I would like to write some sort of <span style="font-family: 'Courier New', Courier, monospace;">IdType</span> struct that contains an enumeration and object value (I'm thinking along the lines of a discriminated union here). This would help make IDs be more distinctive and easier to attach extension methods or additional APIs. I'd also like to make IDs protected by default (instead of public).<br />
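<br />
To make that idea concrete, here is a rough sketch of what such an <span style="font-family: 'Courier New', Courier, monospace;">IdType</span> struct might look like - the enumeration values and member names are my own invention, not code from the actual project:<br />
<br />
<pre>
// Hypothetical sketch: a struct that tags the raw identifier value with the
// kind of store that produced it, so the rest of the code never has to care.
public enum IdKind { Unassigned, Int32, Int64, ObjectId }

public struct IdType
{
    private readonly IdKind kind;
    private readonly object value;

    public IdType(IdKind kind, object value)
    {
        this.kind = kind;
        this.value = value;
    }

    public IdKind Kind { get { return kind; } }
    public object Value { get { return value; } }
    public bool IsAssigned { get { return kind != IdKind.Unassigned; } }

    public override string ToString()
    {
        return value == null ? string.Empty : value.ToString();
    }
}
</pre>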
<br />
<span style="font-size: large;">The Domain Object</span><br />
<br />
I also created a root object for all persistent objects to derive from. This is a fairly common pattern, especially in frameworks where there is a lot of generic or meta-programming.<br />
<br />
<script src="https://gist.github.com/2130909.js?file=DomainObject-simple.cs">
</script>
<br />
I had <span style="font-family: 'Courier New', Courier, monospace;">DomainObject</span> implement an <span style="font-family: 'Courier New', Courier, monospace;">IDomainObject</span> interface so that in all my meta-programming I can refer to <span style="font-family: 'Courier New', Courier, monospace;">IDomainObject</span>. That way there shouldn't ever be a corner case where we can't or shouldn't descend from <span style="font-family: 'Courier New', Courier, monospace;">DomainObject</span> but have to anyway (separate implementation from interface).<br />
<br />
<script src="https://gist.github.com/2130909.js?file=User-Name.cs">
</script>
<br />
The User and Name objects are simple, as you can expect any NHibernate object model to look like. The idea is to keep them simple and keep business and data logic elsewhere.<br />
<br />
<span style="font-size: large;">Are You Interested?</span><br />
<br />
From what I can tell, I think we're breaking ground on this project. It doesn't seem like too many people have tried to make a framework to support both relational and document data stores. Initially I was hesitant to support both relational and document stores. But I think there are some excellent side effects that I will outline in upcoming posts.<br />
<br />
The content I've written about so far is only a small fraction of what it took to get this on its feet. Someone once said that <a href="http://tom.preston-werner.com/2011/11/22/open-source-everything.html" target="_blank">you should open source (almost) everything</a>. So, if you (or anyone you know) would like to see the full uncensored code for this, let me know so I can start corporate conversations in that direction. </div>
<br />
<h2>Discriminated Unions in C# Mono Compiler (2012-03-10)</h2>
Recently I've been using F# a bit. F# is .NET's functional language (the syntax of F# 1.0 was backward compatible with OCaml, but 2.0 has diverged enough to make it more distinct). Learning F# was a huge mind-shift from the C-family of languages. Of all the features of F#, like implicit typing, tail recursion, and monads, many people list discriminated unions as their favorite.<br />
<br />
Discriminated unions feel like C# enums on the surface. For instance, a union that can represent states of a light switch:<br />
<br />
<script src="https://gist.github.com/2012928.js?file=union.fs">
</script><br />
<br />
This example is really no different from C# enums. Discriminated unions, however, can hold data. For instance, consider when our light switch needs to also be a dimmer:<br />
<br />
<script src="https://gist.github.com/2012928.js?file=discriminated-union.fs">
</script><br />
<br />
In C# we would have had to rewrite this whole program to handle the new dimmer requirement. Instead, we can just tack on a new state that holds data.<br />
<br />
When you're deep in the F# mindset, this structure makes perfect sense. But try implementing a discriminated union in C#. There's the enum-like part, but there's also the part that holds different sizes of data. There's <a href="http://stackoverflow.com/a/2321922/503826" target="_blank">a great stackoverflow answer</a> that explains how the F# compiler handles discriminated unions internally. It requires 1 enum, 1 abstract class and <i>n</i> concrete implementations of the abstract class. It's quite over-complicated to use in every-day C#.<br />
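<br />
To see why it feels over-complicated in every-day C#, here is a rough hand-written rendering of that encoding for the light switch example above. The member names are my own approximation of that pattern, not the F# compiler's exact output:<br />
<br />
<pre>
// One tag enum, one abstract base class, and one concrete subclass per case.
// Only Dimmed carries data.
public abstract class LightSwitch
{
    public enum Tag { Off, On, Dimmed }

    public abstract Tag Case { get; }

    public sealed class Off : LightSwitch
    {
        public override Tag Case { get { return Tag.Off; } }
    }

    public sealed class On : LightSwitch
    {
        public override Tag Case { get { return Tag.On; } }
    }

    public sealed class Dimmed : LightSwitch
    {
        private readonly int level;
        public Dimmed(int level) { this.level = level; }
        public int Level { get { return level; } }
        public override Tag Case { get { return Tag.Dimmed; } }
    }
}

// Consuming it means switching on the tag and downcasting to reach the data:
//   switch (light.Case)
//   {
//       case LightSwitch.Tag.Dimmed:
//           var dimmed = (LightSwitch.Dimmed)light;
//           // use dimmed.Level ...
//           break;
//   }
</pre>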
<br />
Nevertheless, I really want to use discriminated unions in my C# code because of how easy they make state machines & workflows. I've been brainstorming how to do this. There are several implementations as C# 3.5 libraries, but they're cumbersome to use. I've been looking at the source code for the mono C# compiler, and I think I want to go the route of forking the compiler for a proof-of-concept.<br />
<br />
I'm debating what the syntax should be. I figure that the change would be easier if I re-used existing constructs and just tweaked them to work with the new concepts.<br />
<br />
<script src="https://gist.github.com/2012928.js?file=case-block.cs">
</script><br />
<br />
I've been debating if the Dimmed case should retain the regular case syntax or get a lambda-like syntax:<br />
<br />
<script src="https://gist.github.com/2012928.js?file=case-lambda.cs">
</script><br />
<br />
I'm leaning toward the lambda syntax due to how C# usually handles variable scope. I've barely just cloned the mono repository and started reading the design documents to orient myself with the compiler. This could be a huge project, so I'm not sure how far I'll actually get. But this is a very interesting idea that I want to try hashing out.<br />
<br />
<h2>One Thing I Learned From F# (Nulls Are Bad) (2012-02-29)</h2>
Recently I started contributing to <a href="https://github.com/jaredpar/VsVim" target="_blank">VsVim</a>, a Visual Studio plugin that emulates Vim. When starting the project, Jared Parsons decided to write the bulk of it in F#. He did this mostly as a chance to learn a new language but also because it's a solid first class alternative to C#. For instance, F#'s features like pattern matching and discriminated unions are a natural fit for state machines like Vim.<br />
<br />
This is my first experience with a truly functional language. For those who aren't familiar with F#, it's essentially OCaml.NET (the <a href="http://en.wikibooks.org/wiki/F_Sharp_Programming" target="_blank">F# book</a> uses OCaml for its markup syntax), but also draws roots from Haskell. It's a big mind shift from imperative and pure object oriented languages, but one I'd definitely recommend to any developer who wants to be better.<br />
<br />
Since I've been working on VsVim, I've been using F# in my spare time but C# in my regular day job. The longer I use F# the more I want C# to do what F# does. The biggest example is how F# handles nulls.<br />
<br />
In C# (and Ruby, Python, and any imperative language) most values can be null, and null is a natural state for a variable to be in. In fact (partly due to SQL), null is used whenever a value is empty or doesn't exist yet. In C# and Java, null is the default value for any member reference, you don't even need to explicitly initialize it. As a result, you often end up with a lot of null pointer exceptions due to sloppy programming. After all, it's kind of hard to remember to check for null every time you use a variable.<br />
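<br />
As a contrived C# illustration of that failure mode (the types and names here are made up for the example):<br />
<br />
<pre>
using System.Collections.Generic;
using System.Linq;

public class Cat { public string Name; public string Owner; }

public static class Shelter
{
    // Returns null when nothing matches - the "empty" state that feels natural
    // in C#, but that nothing forces the caller to check for.
    public static Cat Find(IEnumerable<Cat> kittens, string name)
    {
        return kittens.FirstOrDefault(k => k.Name == name);
    }

    public static string ShoutOwner(IEnumerable<Cat> kittens, string name)
    {
        // Late-night sloppiness: no null check, so a missing kitten becomes a
        // NullReferenceException far away from the code that caused it.
        return Find(kittens, name).Owner.ToUpper();
    }
}
</pre>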
<br />
In F#, nothing is null (that's not entirely true, but in its natural state it's true enough). Typically you'll use options instead of null. For instance, if you have a function that fails to find or calculate something you might return null in imperative languages (and the actual value if successful). However, in F# you use an option type and return None on failure and Some value on success.<br />
<br />
<script src="https://gist.github.com/1941345.js">
</script>
<br />
Here, every time you call find(kittens) you get back an option type. This type isn't a string, so you can't just start using string methods and get a null pointer exception. Instead, you have to extract the string value from the option type before it can be used.<br />
<br />
At this point you might be thinking, "why would I want to do that? It looks like a lot of extra code". However, I challenge you to find a crashing bug in VsVim. Every time we have an instance of an invalid state we are forced to deal with it on the spot. Every invalid state is dealt with in a way that makes sense.<br />
<br />
If we wrote it in C# it would be incredibly easy to get lazy while working late at night and forget to check for null and cause the plugin to crash. Instead, the only bugs we have are behavior quirks. If we ever have a crashing bug, the chances are the null value originated in C# code from Visual Studio or the .NET Framework and we forgot to check.<br />
<br />
<i><a href="http://news.ycombinator.com/item?id=3648104" target="_blank">Discussion on HN</a></i>Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com2tag:blogger.com,1999:blog-6849760623609771363.post-74785641341043917692012-02-10T06:16:00.000-08:002012-02-10T06:16:26.594-08:00C# Reflection Performance And RubyI've always known that reflection method invocations C# are slower than regular invocations, but I've never never known to what extent. So I set out to make an experiment to demonstrate the performance of several ways to invoke a method. Frameworks like <a href="http://nhforge.org/" target="_blank">NHibernate</a> or the <a href="http://www.mongodb.org/display/DOCS/CSharp+Language+Center" target="_blank">mongoDB driver</a> are known to serialize and deserialize objects. In order to do either of these activities they have to scan the properties of an object and dynamically invoke them to get or set the values. Normally this is done via reflection. However, I want to know if the possibility of <a href="http://en.wikipedia.org/wiki/Memoization" target="_blank">memoizing</a> a method call as an expression tree or delegate could offer significant performance benefits. On the side, I also want to see how C# reflection compares to Ruby method invocations.<br />
<br />
I posted the full source to <a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark">a public github repo</a>. To quickly summarize, I wrote code that sets a property on an object 100 million times in a loop. Any setup (like finding a <span style="font-family: 'Courier New', Courier, monospace;">PropertyInfo</span> or <span style="font-family: 'Courier New', Courier, monospace;">MethodInfo</span>) is not included in the timings. I also checked the generated IL to make sure the compiler wasn't optimizing the loops. Please browse the code there if you need the gritty details.<br />
<br />
Before I get into the implementation details, here are the results:<br />
<br />
<iframe frameborder="no" height="300px" scrolling="no" src="http://www.google.com/fusiontables/embedviz?&containerId=gviz_canvas&q=select+col0%2C+col1+from+2840399+&qrs=where+col0+%3E%3D+&qre=+and+col0+%3C%3D+&qe=+limit+6&viz=GVIZ&t=BAR&width=500&height=300" width="500px"></iframe>
<br />
<br />
You can see that a reflection invoke is on the order of a hundred times slower than a normal property (set) invocation.<br />
<br />
Here's the same chart but without the reflection invocation. It does a better job of showing the scale between the other tests.<br />
<br />
<iframe frameborder="no" height="300px" scrolling="no" src="http://www.google.com/fusiontables/embedviz?&containerId=gviz_canvas&q=select+col0%2C+col1+from+2840399+where+col1+%3C+'25000'&qrs=+and+col0+%3E%3D+&qre=+and+col0+%3C%3D+&qe=+limit+5&viz=GVIZ&t=BAR&width=500&height=300" width="500px"></iframe>
<br />
<br />
Obviously, the lesson here is to directly invoke methods and properties when possible. However, there are times when you don't know what a type looks like at compile time. Again, object serialization/deserialization would be one of those use cases.<br />
<br />
Here's an explanation of each of the tests:<br />
<br />
<span style="font-size: large;">Reflection Invoke</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L59" target="_blank">link</a>)<br />
<br />
This is essentially <span style="font-family: 'Courier New', Courier, monospace;">methodInfo.Invoke(obj, new[]{ value })</span> on the setter method of the property. It is by far the slowest approach to the problem. It's also the most common way to solve the problem of insufficient pre-compile time knowledge.<br />
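<br />
In code, the timed portion looks roughly like this - the target type and property name are placeholders, not the exact ones from the benchmark repo:<br />
<br />
<pre>
using System.Reflection;

public class Example { public string Value { get; set; } }

public static class ReflectionInvokeBenchmark
{
    public static void Run()
    {
        // The PropertyInfo/setter lookup happens once, outside the timed loop.
        PropertyInfo property = typeof(Example).GetProperty("Value");
        MethodInfo setter = property.GetSetMethod();

        var target = new Example();
        for (int i = 0; i < 100000000; i++)
        {
            setter.Invoke(target, new object[] { "some value" });
        }
    }
}
</pre>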
<br />
<span style="font-size: large;">Direct Invoke</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L47" target="_blank">link</a>)<br />
<br />
This is nothing other than <span style="font-family: 'Courier New', Courier, monospace;">obj.Property = value</span>. It's as fast as it gets, but impractical for use cases where you don't have pre-compile time knowledge of the type.<br />
<br />
<span style="font-size: large;">Closure</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L92" target="_blank">link</a>)<br />
<br />
This isn't much more flexible than a direct invoke, but I thought it would be interesting to see how the performance degraded. This is where you create a function/closure (<span style="font-family: 'Courier New', Courier, monospace;">Action<ExampleType, string> action = (x,y) => x.Property = y</span>) prior to the loop and just invoke the function inside the loop (<span style="font-family: 'Courier New', Courier, monospace;">action(obj, value)</span>). At first sight it appears to be half as fast as a direct invoke, but there are actually two method calls involved here, so it's actually not any slower than a direct invoke.<br />
<br />
<span style="font-size: large;">Dynamic Dispatch</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L78" target="_blank">link</a>)<br />
<br />
This uses the C# 4.0 dynamic feature directly. To do this, I declared the variable as dynamic and assigned it using the same syntax as a direct invoke. Interestingly, this performs only 6x slower than direct invoke and about 20x faster than reflection invoke. Take note, if you need reflection, use dynamic as often as possible since it can really speed up method invocation.<br />
<br />
<span style="font-size: large;">Expression Tree</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L110" target="_blank">link</a>)<br />
<br />
The shortcoming of most of the previous approaches is that they require pre-compile time knowledge of the type. This time I tried building an expression tree (a C# 3.0 feature) and compiling a delegate that invokes the setter. This makes it flexible enough that you can call any property of an object without compile-time knowledge of the name, as long as you know the return type. In this example, like the closure, we're indirectly setting the property, so two method calls. With this in mind, it took almost 2.5 times as long as the closure example, even though they should be functionally equivalent operations. It must be that expression trees compiled to delegates aren't actually as simple as they appear.<br />
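<br />
For reference, building such a setter delegate can look something like the following. This is a sketch rather than the repo's exact code; it assumes you know the declaring type and the property's type at compile time, but not the property's name:<br />
<br />
<pre>
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class SetterCache
{
    // Memoize a property setter as a compiled delegate: pay the expression tree
    // compilation cost once, then invoke it like any other delegate.
    public static Action<TTarget, TValue> Build<TTarget, TValue>(string propertyName)
    {
        MethodInfo setter = typeof(TTarget).GetProperty(propertyName).GetSetMethod();
        ParameterExpression target = Expression.Parameter(typeof(TTarget), "x");
        ParameterExpression value = Expression.Parameter(typeof(TValue), "y");
        MethodCallExpression body = Expression.Call(target, setter, value);
        return Expression.Lambda<Action<TTarget, TValue>>(body, target, value).Compile();
    }
}

// Built once, then called inside the timing loop:
//   var set = SetterCache.Build<Example, string>("Value");
//   set(target, "some value");
</pre>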
<br />
<span style="font-size: large;">Expression Tree with Dynamic Dispatch</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L141" target="_blank">link</a>)<br />
<br />
Since the expression tree approach requires compile-time knowledge of the return type, it isn't as flexible. Ideally you could use C# 4.0's covariance feature and cast it to <span style="font-family: 'Courier New', Courier, monospace;">Action<object, object></span>, which compiles but fails at runtime. So for this one, I just assigned the closure to a variable typed as <span style="font-family: 'Courier New', Courier, monospace;">dynamic</span> to get around the compile/runtime casting issues.<br />
<br />
As expected, it's the slowest approach. However, it's still 16 times faster than direct reflection. Perhaps memoizing method calls like this - for property sets and gets - would actually yield a significant performance improvement.<br />
<br />
<span style="font-size: x-large;">Compared To Ruby</span><br />
<br />
I thought I'd compare these results to Ruby where all method calls are dynamic. In Ruby, a method call looks first in the object's immediate class and then climbs the ladder of parent classes until it finds a suitable method to invoke. Because of this behavior I thought it would be interesting to also try a worst-case scenario with a deep level of inheritance.<br />
<br />
To do this fairly, I initially wrote a <span style="font-family: 'Courier New', Courier, monospace;">while</span> loop in Ruby that counted to 100 million. I rewrote the while loop in <span style="font-family: 'Courier New', Courier, monospace;">n.each</span> syntax and saw the execution time get cut in half. Since I'm really just trying to measure method invocation time, I stuck with the <span style="font-family: 'Courier New', Courier, monospace;">n.each</span> syntax.<br />
<br />
<iframe frameborder="no" height="300px" scrolling="no" src="https://www.google.com/fusiontables/embedviz?&containerId=gviz_canvas&q=select+col0%2C+col1+from+2846447+&qrs=where+col0+%3E%3D+&qre=+and+col0+%3C%3D+&qe=+limit+4&viz=GVIZ&t=BAR&width=500&height=300" width="500px"></iframe>
<br />
<br />
I honestly thought C# Reflection would be significantly faster than Ruby with 5 layers of inheritance. While C# already holds a reference to the method (<span style="font-family: 'Courier New', Courier, monospace;">MethodInfo</span>), Ruby has to search up the ladder for the method each time. I suppose Ruby's performance could be due to the fact that it's written in C and specializes in dynamic method invocation.<br />
<br />
Also, it interests me why C# dynamic is so much faster than Ruby or reflection. I took a look at the IL code where the dynamic invoke was happening and was surprised to find a <span style="font-family: 'Courier New', Courier, monospace;">callvirt</span> instruction. I guess I was expecting some sort of specialized <span style="font-family: 'Courier New', Courier, monospace;">calldynamic</span> instruction (<a href="http://java.sun.com/developer/technicalArticles/DynTypeLang/" target="_blank">Java 7 has one</a>). The answer is actually a little more complicated. There seems to be several calls - most are <span style="font-family: 'Courier New', Courier, monospace;">call</span> instructions to set the stage (<span style="font-family: 'Courier New', Courier, monospace;">CSharpArgumentInfo.Create</span>) and one <span style="font-family: 'Courier New', Courier, monospace;">callvirt</span> instruction to actually invoke the method.<br />
<br />
<span style="font-size: x-large;">Conclusion</span><br />
<br />
Since the trend of C# is going towards using more <a href="http://msdn.microsoft.com/en-us/library/bb397947.aspx" target="_blank">Linq</a>, I find it interesting how much of a performance hit developers are willing to exchange for more readable and compact code. In the grand scheme of things, the performance of even a slow reflection invoke is probably insignificant compared to other bottlenecks like database, HTTP, filesystem, etc.<br />
<br />
It seems that I've proved the point that I set out to prove. There is quite a bit of performance to be gained by memoizing method calls into expression trees. The application would obviously be best in JSON serialization, ORM, or anywhere when you have to get/set lots of properties on an object with no compile-time knowledge of the type. Very few people, if any, are doing this - probably because of the added complexity. The next step will be to (hopefully) build a working prototype.<br />
<br />
<br />
<h2>Thoughts on the C# driver for MongoDB (2012-02-03)</h2>
I recently started a new job with a software company in Boulder. Our project this year is rewriting the existing product (not a clean rewrite, more like rewrite & evolve). One of the changes we're making is using <a href="http://www.mongodb.org/">MongoDB</a> instead of <a href="http://en.wikipedia.org/wiki/Transact-SQL">T-SQL</a>. Since we're going to be investing pretty heavily in Mongo we all attended the mongo conference in Boulder on Wednesday. The information was great and now I'm ready to dig into my first app. Today I played around with some test code and made some notes about features/shortcomings of the <a href="http://www.mongodb.org/display/DOCS/CSharp+Driver+Tutorial#CSharpDriverTutorial-Introduction">C# driver</a>.<br />
<br />
First of all, the so-called "driver" is much more full-featured than a typical SQL driver. It includes features to map documents directly to CLR objects (from here on I'll just say <i>document</i> if I mean Mongo BSON document and <i>object</i> for CLR object). There are plans to support Linq directly from the driver. So right off I'm impressed with the richness of the driver. However, I noticed some shortcomings.<br />
<br />
For instance, all properties in the document must be present (and of the right type) in the object. I perceived this as a shortcoming because this is unlike regular JSON serialization where missing properties are ignored. After thinking a little further, this is probably what most C# developers would want since the behavior caters toward strongly typed languages that prefer fail-fast behavior. If you know a particular document might have extraneous properties that aren't in the object, you can use the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">BsonIgnoreExtraElements</span> attribute.<br />
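<br />
For example - the class and property names here are just illustrative:<br />
<br />
<pre>
using MongoDB.Bson.Serialization.Attributes;

// Extra fields in the stored document are silently dropped during
// deserialization instead of causing an error.
[BsonIgnoreExtraElements]
public class Word
{
    public object Id { get; set; }
    public string Text { get; set; }
}
</pre>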
<br />
Thinking about this behavior, refactor-renaming properties could be non-trivial. You would have to run a data migration script to rename the property (mongo does have <a href="http://www.mongodb.org/display/DOCS/Updating#Updating-%24rename">an operation</a> for renaming fields). It would be great if the driver had a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">[BsonAlias("OldValue")]</span> attribute to avoid migration scripts (maybe I'll make a pull request).<br />
<br />
Something I liked was that I could use <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">object</span> for the type of the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">_id</span> property instead of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">BsonObjectId</span>. This will keep the models less coupled to the Mongo driver API. Also, the driver already has a bi-directional alias for <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">_id</span> as <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Id</span>. I don't know any C# developers who wouldn't squirm at creating a public property named _<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">id</span>.<br />
<br />
This brings me to my biggest issue with the C# mongo driver. All properties must be public. This breaks the encapsulation and SRP principles. For instance, most of the time I have no reason to expose my <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Id</span> (or <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">_id</span>) property as public. NHibernate solves this by hydrating protected fields. I would like this to be solved very soon (but there are some issues with this since there aren't any mappings).<br />
<br />
Last, it has poor support for C# 4.0 types. Tuple doesn't fail, but it's serialized as an empty object ({ }). There is also zero support AFAIK for dynamic.<br />
<br />
In conclusion, there's some room for improvement with Mongo's integration with .NET but overall I have to say I'm impressed. Supposedly Linq support is due out very soon, which will make it unstoppable (imo). Also, we haven't started using this in a full production environment yet, so there will most likely be more posts coming on this topic.<br />
<br />
<h2>BDD ideas for structuring tests (2012-01-02)</h2>
Lately <a href="http://timkellogg.blogspot.com/2011/12/behavior-driven-development-in-c.html">I've been thinking a lot</a> about the best way to do BDD in C#. So when I saw Phil Haack's post about <a href="http://haacked.com/archive/2012/01/02/structuring-unit-tests.aspx">structuring unit tests</a>, I think I had a joyful thought. Earlier I had been thinking in terms of using my <a href="http://www.blogger.com/">Behavioral NUnit</a> experimental project to hash out Haack's structuring idea with better BDD integration.<br />
<br />
In short, his idea is to use nested classes. There is the normal one-to-one class-to-test-class mapping, but each method under test gets its own inner class. To use his example:<br />
<br />
<script src="https://gist.github.com/139e0c2fd267001623f1.js?file=haacked.cs">
</script>
<br />
In this example the Titleify and Knightify methods (imo two terrible uses of the -ify suffix) have corresponding test classes dedicated to testing only one method. Each method in the class (or Fact, in the case of xUnit - I actually haven't used xUnit, but it seems to encourage a somewhat BDD-style readability) tests one aspect of the method, much like the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">it</span> method is used in rspec.<br />
<br />
I generally like Haack's test structure. For example, he points out how it plays nicely with Visual Studio's natural class/method navigation which makes the tests even more navigable. The only issue I have with it is that I dislike having 1000+ SLOC classes - tests or regular. If I were to adopt this method, I would probably break each of those inner classes into separate files (and use partial classes to break up the top class).<br />
<br />
My practice for a long time was to have one whole namespace per class under test. Consider <a href="https://github.com/tkellogg/objectflow/tree/master/objectflow.stateful.tests.unit">my tests for objectflow</a>. I actually picked up this practice from Garfield Moore, objectflow's original developer. Each class (or significant concept) has a namespace (e.g. objectflow.stateful.tests.unit.PossibleTransitions or PossibleTransitionTests). Each class in that namespace is named according to essentially what the Setup does. Some examples: WhenGivenOnlyBranches, WhenGivenOnlyYields, etc.<br />
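<br />
Roughly, that layout reads like this - the fixture and assertion details are illustrative, not copied from objectflow:<br />
<br />
<pre>
using NUnit.Framework;

namespace Objectflow.Stateful.Tests.Unit.PossibleTransitions
{
    // The class name describes the setup; each test describes one expectation.
    [TestFixture]
    public class WhenGivenOnlyBranches
    {
        private int _transitionCount;

        [SetUp]
        public void Given()
        {
            // arrange a workflow that contains only branches
            _transitionCount = 0;
        }

        [Test]
        public void it_should_not_yield_any_transitions()
        {
            Assert.That(_transitionCount, Is.EqualTo(0));
        }
    }
}
</pre>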
<br />
I like the way these tests read. It's very easy to find a particular test or to read up on how a particular method is supposed to operate. But in practice this has led to very deep hierarchies, often with single class namespaces. Further, I find that creating a whole new class for each setup tends to create too much extra code. As a result, I have a hard time sticking closely to this practice.<br />
<br />
More recently I've felt a little overwhelmed with my original practice so I've evolved it slightly. Now I've started doing the one-to-one class to test mapping like commonly practiced. But each test has its own method that does setup. For instance:<br />
<br />
<script src="https://gist.github.com/139e0c2fd267001623f1.js?file=my-new-bdd.cs">
</script>
<br />
I also sometimes use this small variation of that structure where I keep the BDD sentence-style naming scheme but use TestCase attributes to quickly cover edge cases.<br />
<br />
<script src="https://gist.github.com/139e0c2fd267001623f1.js?file=my-new-bdd-2.cs">
</script>
<br />
I often use some hybrid of the last two approaches. Especially if I would otherwise be using a TestCase attribute that breaks the BDD readability, I'll break the setup code into one of those Given_* support setup methods and reuse it between two different test methods.<br />
<br />
I generally like my most recent ways of structuring tests because of its readability and the ability to gain excellent edge case coverage by adding additional test cases. But I do really like Haack's structuring, so I may find myself adopting part of his suggestion and further evolving my tests.<br />
<br />
As far as this applies to Behavioral NUnit, I want to explore the possibility of a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Describe</span> attribute that mimics the usage of rspec's <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">describe</span> method. One idea is to make the new attribute generate another hierarchical level of test cases.<br />
<br />
<h2>Can Bad Code Ruin Your Career? (2011-12-30)</h2>
I started writing this post over a year ago. I was working at a large company where I was stuck in a mouse wheel - always running to keep up but never getting anywhere. The code I had to work with was downright terrible. This, among other things, prodded me into looking for another job. While I was starting my job search I was pondering this post and decided to not finish it because I wasn't sure if some prospective employer would hold it against me.<br />
<br />
<h2>
With that said...</h2>
<br />
I just finished reading through a messy Java file. It was the usual mess of a class with a 500 line god-method (similar to the god-object) and hundreds of instances of copy-pasted code. Besides the redundant code and lack of structure, the coder also used nested loops through ArrayLists when they could have used a HashSet, and didn't once use generic collections, using the un-type-checked versions instead. After several hours of refactoring and renaming variables I finally got to a point where I could begin fixing the bug I was after. There were absolutely no unit tests - all this code was written inline with HTML in a JSP.<br />
<br />
I spend so much time reading bad code that sometimes I wonder if I am beginning to specialize in hacks. Is it possible to read so much bad code that you forget what good code looks like? Humans are an especially adaptive species, and I think it's definitely possible that a great programmer can be forced to work in the muck so long that they forget what good code looks like.<br />
<br />
I've seen several situations where good developers produced bad code. These situations are almost always a product of an environment where features are more important than bug fixes. These companies typically invest heavily in sales and neglect IT and development costs. Or sometimes the problem is just that product management knows nothing of software development.<br />
<br />
<h2>
The 5 stages of grief</h2>
<br />
A recent coworker likened our job of working with brittle, badly designed code to the 5 stages of grief. While we were uneasily laughing about it I silently decided that this was more realistic than I wanted to believe.<br />
<br />
For instance, imagine starting a new job. In the interview process you were interviewed by intelligent, enthusiastic developers and were led to believe you were going to be working on cutting edge technologies - a dream right? When you actually get to the job you find out that the code is so backwardly complicated that its nearly impossible to touch anything without bringing the proverbial house of cards crashing down.<br />
<br />
<b>Grief Stage 1: Denial and Isolation</b><br />
<br />
Obviously the code isn't the problem, you just weren't careful enough. They probably have specific guidelines and strategies that help them be more productive. It's probably just something wrong with me...<br />
<br />
<b>Grief Stage 2: Anger</b><br />
<br />
Dammit! Who the hell even thinks of this crap? [more cursing...] Is this a god-object?? [hair gets thinner...]<br />
<br />
<b>Grief Stage 3: Bargaining</b><br />
<br />
This is typically when you start plotting potential strategies to hide the ugliness of the code. Creativity and hopeful thoughts abound. Many IT managers will talk like they are very supportive of you at this stage.<br />
<br />
<b>Grief Stage 4: Depression</b><br />
<br />
This is where reality strikes: fixing the code is bad for the business plan because it involves spending less time on revenue-producing features. The IT managers that seemed so supportive now flip flop to the CEO's side and deny you the ability to cope with your problems.<br />
<br />
<b>Grief Stage 5: Acceptance</b><br />
<br />
There are only two outcomes of this stage. Either (1) you accept that you can never fix the code so you decide to move on to another job or (2) you accept that you can never fix the code so you give up on trying. This is what separates good coders from bad.<br />
<br />
<h2>
Conclusion</h2>
<br />
Again, I started this post over a year ago. I've seen a lot of bad code. At my most recent job I almost took the "give up on trying" path in the acceptance stage. Luckily we hired a great older developer who snapped me out of it. I just started my new job today, I think I will be much happier.<br />
<br />
So can bad code ruin your career? My answer is a resounding YES! But it doesn't have to. Honestly, stage 5 can have better endings, but that inevitably requires understanding on behalf of management - a scarce resource.<br />
<br />
<h2>Behavior Driven Development in C# (2011-12-28)</h2>
I've been a fan of Test Driven Development since I worked in an XP shop. But every time the work starts getting bigger and more complex I always struggle to not get lost in the magnitudes of tests. I remember many early-on conversations with my <i>elders</i> about unit test naming conventions. The [method]_[input]_[output] convention starts to break down badly when your inputs become things like mocks, or if there ends up being more than 1 or 2 inputs; same with outputs.<br />
<br />
When <a href="http://twitter.com/#!/mjezzi">a coworker</a> introduced me to BDD earlier this year, it really clicked and flowed naturally. The idea of writing tests so they read like sentences out of a book or spec seems like the answer to all my questions. The ruby <a href="http://rspec.info/documentation/">rspec</a> is beautiful:<br />
<br />
<script src="https://gist.github.com/1528845.js?file=simple_rspec.rb">
</script>
<br />
The organization of the tests forces you to focus on the expectations of your test and highlight descriptive assertions. This is especially useful for complicated setups with lots of mocks, etc. I put as much of my setup code as possible in one of those <span style="font-family: 'Courier New', Courier, monospace;">before :each</span> blocks, so that the assertions are limited to simple inputs and one or two observations about the outputs.<br />
<br />
There have been a number of <a href="http://persistall.com/archive/2007/11/05/further-thoughts-on-bdd-in-c.aspx">people</a> in the .NET community that have attempted BDD but [imo] failed to grasp the simplicity. NBehave is a complete overhaul of unit testing that uses attributes like <i>x</i>Unit. As a result, NBehave doesn't really look at all like rspec - which really isn't a bad thing, necessarily. However, the thing I like about rspec is its ability to describe things of arbitrary depth, which is handy when testing complex code:<br />
<br />
<script src="https://gist.github.com/1528845.js?file=complex_rspec.rb">
</script>
<br />
This spec is able to describe possible modes that the object under test can be in (complex inputs). This is made possible by rspec's arbitrary nesting depth. This is definitely a language feature that is much harder to implement in C#.<br />
<br />
My current approach to BDD in C# usually looks like<br />
<br />
<script src="https://gist.github.com/1528845.js?file=BDD.cs">
</script>
<br />
I think this is the simplest BDD layer I can slap on top of NUnit. And simple is important to me because (a) I do a lot of open source projects and I want to keep the barrier to entry for contributions low and (b) the people I work with tend to resist change. When people are resistant to change, it's hard to rationalize using something other than NUnit or introducing lots of nested lambdas.<br />
<br />
NUnit remains the most popular unit testing framework and has excellent support with a GUI runner, console runner, and IDE integration with R#, TestDriven.NET, and others. Given all that support, I would really rather not abandon NUnit if possible.<br />
<br />
<a href="http://fluentassertions.codeplex.com/">FluentAssertions</a> is a nice simple BDD layer on top of NUnit (or whatever you use). It doesn't change the structure of our spec above, but it does change the structure of our assertion to<br />
<br />
<script src="https://gist.github.com/1528845.js?file=FluentAssertion_BDD.cs">
</script>
<br />
This assertion is [imo] very clean and succinct. I like how it reads even clearer than NUnit's fluent syntax. Last weekend I was thinking about this and I decided to explore an idea to make a BDD extension to NUnit that is even clearer than FluentAssertions. The project, BehavioralNUnit for now, is hosted at <a href="https://github.com/tkellogg/BehavioralNUnit">github</a>. The earliest goal for the project was simply to use operator overloading to make the assertions even more like rspec. For instance, I want to be able to make the previous assertion:<br />
<br />
<script src="https://gist.github.com/1528845.js?file=BehavioralNUnit_simple.cs">
</script>
<br />
I was able to do this, but I realized that the C# compiler was insisting that this expression needed to be assigned to something, so I [haven't yet] added another concept somewhat analogous to "it" in rspec:<br />
<br />
<script src="https://gist.github.com/1528845.js?file=BehavioralNUnit_complex.cs">
</script>
<br />
This is most similar to <a href="http://nspec.org/">NSpec's approach</a> of using an indexer instead of a method. This appeals to me because I sometimes find matching parentheses to be a pain (I guess I just like ruby & coffeescript). Then again, I don't like NSpec because it feels like it was written by one of those whining .NET developers that wishes dearly he could get a RoR job - it doesn't abide by .NET conventions at all.<br />
<br />
I still have a ton of ideas to hash out with Behavioral NUnit. I'm convinced that BDD in C# can be simpler and more beautiful than it currently is. If you have input or ideas, please <a href="https://github.com/tkellogg/BehavioralNUnit">fork the repository</a> & try out your ideas (pull requests are welcome).<br />
<div>
<br /></div>
<br />
<h2>Why I hate generated code (2011-12-26)</h2>
If you've worked with me for any amount of time you'll soon figure out that I often profess that <i>"I hate generated code"</i>. This position comes from years of experience with badly generated code. Let me explain.<br />
<br />
<h2>
The baby comes with a lot of bathwater</h2>
In the past year I had an experience with a generated data layer where <a href="http://www.codesmithtools.com/">CodeSmith</a> was used to generate a table, 5 stored procedures, an entity class, a data source class, and a factory class for each entity that was generated. My task was to convert this code into <a href="http://nhforge.org/">NHibernate</a> mappings.<br />
<br />
The interesting thing about this work is how little of the generated code was actually being used. I'm sure, in the beginning, the developer's thoughts were along the lines of <i>"oh look at all this code I don't have to write manually :D"</i>. However, after some time, subsequent developers' thoughts were along the lines of <i>"with all this dead code, it's hard to find real problems"</i>. It's funny how some exciting breakthroughs turn into headaches down the road. The table is always used, but some entities are created & read but never modified, others are only created during migrations and only read from during run time.<br />
<br />
Code generators often produce code you don't need. Since all code requires maintenance, dead code is just a liability because it doesn't provide any benefit. I always delete dead code and commented out code (it'll live on in version control, no need to release it into production).<br />
<br />
There are several professional developer communities that generate code as a way of life. <a href="http://guides.rubyonrails.org/command_line.html">Ruby on Rails</a> comes prepackaged with scripts to generate models, views, and controllers in a single command. <a href="http://www.asp.net/mvc/tutorials/older-versions/controllers-and-routing/creating-a-controller-cs">ASP.NET MVC</a> will generate controllers and views with a couple clicks. And if you've ever used either of these frameworks, you'll probably find yourself deleting a lot of generated code.<br />
<br />
<h2>
The problem of transient code generation</h2>
The issue that I keep running into with my policy of hating code generation is that it's nearly impossible to be a professional software engineer and not generate code. The most fundamental problem is compilers. When you run a compiler over your source code, it <i>generates</i> some sort of machine readable code that is optimized for various goals like speed or debugging or different platform targets.<br />
<br />
While I hate code generators, it's hard to argue how I could possibly hate compilers. They allow me to write code once and compile it several different ways and achieve different goals. Therefore, I have to introduce my first caveat - I don't hate all generated code, <i>I only hate generated source code</i>.<br />
<br />
This problem of hating generated code is complicated further by the fact that NHibernate generates source code too. You don't ever check in the code that NHibernate generates because it's done at run time. The most obvious way NHibernate generates code is the SQL that is written in the background to query & perform DML operations. (For those questioning if SQL is source code, consider how SQL is compiled into an execution plan prior to execution). It's also hard to argue that I hate this kind of code generation because it doesn't suffer from the same problems of the CodeSmith generated code. It only generates code <i>just-in-time</i> meaning that it's only generated when needed, so there isn't any extra code generated.<br />
<br />
Since NHibernate and compilers do code generation in a way that I like, I'm going to refine my statement to <i>"I hate generated persistent code"</i>. This generally means I still hate generated code when the resulting code sticks around long enough for a fellow developer to have to deal with it.<br />
<br />
<h2>
The thin line between good and bad code generation</h2>
When is generated code persistent and when is it transient? We already decided that code generation isn't so bad when it happens during or after the compilation process. But my statement is that I hate persistent code. There are other cases of code generators generating transient source code. One such example is in <a href="https://github.com/iSynaptic/iSynaptic.Commons">iSynaptic.Commons</a>.<br />
<br />
Since C# doesn't yet (and probably won't ever) include <a href="http://insanecoding.blogspot.com/2010/03/c-201x-variadic-templates.html">variadic templates</a> or variadic generic types, writers of .NET APIs often write some really redundant code to account for all combinations of generic methods or types. I know I've done it. This example uses <a href="https://github.com/iSynaptic/iSynaptic.Commons/blob/master/Application/iSynaptic.Commons/FuncExtensions.tt">a T4 template</a> to produce a C# file with a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">*.generated.cs</span> extension. The T4 template is <a href="https://github.com/iSynaptic/iSynaptic.Commons/blob/master/Application/iSynaptic.Commons/iSynaptic.Commons.csproj#L195">executed on build</a>, but the output is not excluded from version control.<br />
<br />
I do like this approach because it applies DRY to a redundant problem without much complication. Another thing I really like about this approach is that <a href="http://msdn.microsoft.com/en-us/library/bb126445.aspx">T4 templates</a> are a standard part of Visual Studio and are <a href="http://tirania.org/blog/archive/2009/Mar-10.html">executable from Mono</a> as well. As such, they can be considered a free, openly available tool (important for open source projects) and, more importantly, one that executes as part of the build process.<br />
<br />
Another thing I like about this approach is the usage of partial classes to separate the generated portion of the class from the non-generated portion. This minimizes the amount of code that is sheltered from refactoring tools (code inside the *.tt file).<br />
<br />
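To make the split concrete, here's a rough sketch of the pattern (illustrative only - the member names are made up and the real FuncExtensions.tt generates different code than this):<br />
<br />
<pre class="brush: csharp">
using System;
using System.Collections.Generic;

// FuncExtensions.cs - the hand-written half, fully visible to refactoring tools
public static partial class FuncExtensions
{
    public static Func<T, TResult> Memoize<T, TResult>(this Func<T, TResult> self)
    {
        var cache = new Dictionary<T, TResult>();
        return t =>
        {
            TResult result;
            if (!cache.TryGetValue(t, out result))
                cache[t] = result = self(t);
            return result;
        };
    }
}

// FuncExtensions.generated.cs - the half the T4 template stamps out on every build,
// one overload per arity so nobody has to hand-write the repetition
public static partial class FuncExtensions
{
    // partially applies the last argument of a two-argument function
    public static Func<T1, TResult> Apply<T1, T2, TResult>(this Func<T1, T2, TResult> self, T2 t2)
    {
        return t1 => self(t1, t2);
    }

    // ...the template keeps looping for higher arities...
}
</pre>
<br />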
The thing I hate about this particular iSynaptic.Commons example is that the generated file is included in version control. I think, perhaps, this is reduced to a small pet peeve of mine since the generated code isn't wasteful and is updated on every build. Still, I would like a mechanism to (a) have the file hidden from the IDE's perspective and (b) have it ignored by version control. I wouldn't want anyone to mistakenly edit the file when they should be editing the T4 template.<br />
<br />
<h2>
Summary</h2>
The end result of all this thinking is <i>"I hate source code that is generated prior to the build process"</i>. I want to further say that I also hate generated code that is checked into version control, but this is a bit of a lesser point. However, code generation can be a useful tool, as seen in the cases of NHibernate and T4 templates. But even so, code generation should be used wisely and with care. Generating excess code can become a liability that detracts from the overall value of a product.Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-81851478173928340662011-12-01T20:59:00.001-08:002011-12-01T21:24:03.964-08:00Defining WatergileAt the place of my current employment we've had a layer of management placed above us that fervently preaches the mightiness of agile. This management devotes much lecture time to informing us of the proper procedure for planning a product. First you gather requirements, architect the entire system, and write detailed requirements documents - good enough that developers don't need to refine them any further and QA knows exactly what to test. When requirements are written for the entire system - 12-24 months in advance - then you begin coding. After you're done coding, QA begins to test.<br />
<br />
To be clear, anyone reading the previous paragraph should be scratching their head and thinking to themselves, "gee, that sounds a lot like waterfall". Well it is, hence the portmanteau <i>watergile </i>(we considered agilfall but it just doesn't roll off the tongue as well).<br />
<br />
The trouble is, even though we coined the term just recently, this watergile thing is a frigging pandemic. Every time I crack open a fresh copy of <a href="http://www.sdtimes.com/">SD Times</a> there seems to be some guy telling you that you need to be measuring KSLOC and a billion other software metrics but at the same time claiming that agile is the only way. It wouldn't be so scary except that this is the source of direction for software development managers.<br />
<br />
It's no wonder <i>watergile</i> is so widespread: IT managers are fed a constant stream of B.S. mixed messages. How could anyone make sense of any of it without dismissing most of it? The truth is, waterfall is hard and so is agile. Anything in between is just ad-hoc and set up to fail. If you are a development manager and reading this, find those tech magazines on the corner of your desk and show them to the recycling bin. They're worthless and distracting to progress.Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-55842059131772632562011-11-06T21:35:00.000-08:002011-11-06T21:35:27.701-08:00The Pain and Glory of CI don't normally write much C code, but this past week I was fiddling around with it to solve some programming puzzles. When I say C I mean straight C (without the ++ or #). Completely un-object-oriented; just structures, helper functions and malloc/free. It took me 3 days (a total of probably 9 hours) to write a fully functional 250-300 SLOC solution to a puzzle (complete with huge memory leaks). This all brings me to the burning question - who would ever want to write programs in C?<br />
<br />
C++ has developed over the years. I recently looked at some of the enhancements in C++11 which include the auto keyword (like var in C#), better reference counting "smart pointers", lambdas and closures. Obviously, C++ is developing and progressing. C hasn't had a spec change since 1999, and even then it wasn't exactly dramatic. We still don't have any OO or reference counting pointers.<br />
<br />
Have you ever tried interfacing with a library in C? It's very cumbersome. You have to read all the documentation and call the right my_library_object_*() functions at the right times. Everything is hands-on; nothing is left to the imagination. You have to remember what memory you allocated so you can free it sometime later when you're sure you don't need it anymore (and then recursively free sub-structures and arrays).<br />
<br />
I think anyone can see warts in C. But it's easy to forget the simplistic beauty. There aren't many operators in C, and there's only one way to cast. Sure, you still can't create & initialize a counter variable inline in a for-loop (at least not before C99). But the complex syntax of C++ is scary in comparison, with all its member::accessors, template&lt;T&gt; classes, 5-6 ways to cast a variable and a slew of gotchas. Sure, C has its share of gotchas, but the language is so small that anyone who's spent any significant time programming C can list most of them out for you (probably not so true with C++).<br />
<br />
So why not C#? Well, it's freaking slow!! Think about when people were converting their business apps from VB6 to C#. Sure the maintainability of the code improved by leaps and bounds, but almost everyone noticed the performance difference and wondered how the same program could be so slow.<br />
<br />
Recently Microsoft unveiled some information to developers about the upcoming Windows 8 release and its Metro interface. One of the biggest surprises to developers is how hard Microsoft is trying to sell C/C++ and how C#/.NET is falling by the wayside. The driving factor is that Apple has snappy user interfaces and Windows Forms are known for being slow and boring. So Microsoft created a new WinRT UI toolkit for Windows 8 that intends to never block the UI thread. Operations that take longer than ~50ms should use async code so that the UI can continue to feel responsive. (This sounds eerily similar to <a href="http://nodejs.org/">Node.JS</a> but with a lot more code).<br />
<br />
Obviously Microsoft wants developers to build faster apps by going back to C/C++, so maybe we should consider taking them seriously. But I think the more likely direction is development being done primarily in one of the common dynamic languages like Ruby/Python/Node.JS, with the code that needs a speedup written as C modules. All of those general purpose scripting languages are written in C (not C++) and interface very well with C. I've seen lots of math-intensive Python libraries composed partly of C code (some with increasing portions written in C). I could also see the popularity of Node.JS increase if it were applied not just to web/networking apps but also to non-blocking UI. (After all, this is basically what WinRT is).<br />
<br />
I don't know about you, but I'm going to be spending some time tuning up my C/C++ skills. History has been known to repeat itself, and I think it is repeating yet again.Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-35039978542593500372011-10-31T21:33:00.000-07:002011-10-31T21:33:48.456-07:00Occupy Wall Street Is Not StupidEarlier today I was talking with someone who exclaimed, "Occupy Wall Street, that's so stupid!". I then proceeded to explain to them that OWS is trying to say <i>"hey, this capitalism thing isn't really working right now". </i>It's not to say that capitalism never worked, it's just pointing out that there are some significant holes in it right now.<br />
<br />
I believe that by now, most people (except some in Boulder) realize that communism has also failed. Now, communism didn't fail because <i>God hates communists</i>. It failed because it wasn't maximizing the total economic prosperity of all people. The people behind OWS have also realized [, I naively assume,] that capitalism in America is also no longer maximizing the total economic prosperity.<br />
<br />
In America today you see thousands of families that incurred large amounts of debt to a disgustingly rich minority. This rich minority (an oligarchy) forced these families out of their homes and into slavery. You might recognize that this looks a lot like the economic system that capitalism replaced - feudalism.<br />
<br />
OWS protesters are also crying out about the death grip that rich and powerful businesses have on our federal government. Some even claim that presidential elections are completely rigged (I probably wouldn't go that far). Either way, the government that our American forefathers created is completely absent and void from our current government. We've become so obsessed with being the most powerful country that we sacrificed the values and virtues that made us who we are.<br />
<br />
The Occupy Wall Street movement is right, our system is broken. Yes, there are many broken systems out there, but that's not a reason to not change them. Protest is an important political mechanism that has been proven to work in the past. We need it to work now. The only problem I have with OWS is that it seems to be an incohesive jumble of complaints with no real answers. But I suppose that's where real change begins.Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com1tag:blogger.com,1999:blog-6849760623609771363.post-87054106998109576312011-09-30T19:01:00.000-07:002011-09-30T19:01:51.183-07:00Quiet TimeRecently, we instituted a "core hours" policy among our developers that essentially equates to 4 hours of quiet time every day. During the hours of 10-12 and 2-4 developers aren't allowed to interrupt each other, nor can QA, product managers, or anyone else in the office interrupt developers. If you need help on a problem you have to either work through it on your own or wait until after the <i>quiet time</i>.<br />
<br />
The policy hasn't been in effect very long, but I've immediately noticed a significant jump in productivity. I would say I'm 1.5-2 times as productive now that I'm not getting interrupted every 15 minutes. I've also noticed that I just plain enjoy coming to work more now.<br />
<br />
When we were talking about instituting the policy some were worried that it would be a problem that you couldn't clear up issues and roadblocks immediately. In practice, however, I think it isn't too much to ask everyone to wait [up to] two hours to clear roadblocks. In fact, it ends up forcing developers to solve their own problems.<br />
<br />
When I first started with this company I was isolated in a room by myself for entire days. The isolation was too much; I often felt like I was being confined in a prison. Obviously I'm not advocating that total isolation is any kind of real solution. It's impractical to suggest that developers can complete their work successfully in total isolation. It takes a lot of dialog to produce quality software. But it's also impractical to suggest that they can get any work done when they're being pestered every 5-30 minutes.<br />
<br />
I highly recommend some sort of <i>quiet time</i> in any workplace. In my opinion, the benefits are definitely not limited to just software engineering either.Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-22239829781054865842011-09-15T21:19:00.000-07:002011-09-15T21:27:04.083-07:00AutoMapper And IncompletenessThis is part 2 of a series. Read <a href="http://timkellogg.blogspot.com/2011/09/view-models-automapper-and-law-of.html">part 1</a>.<br />
<br />
Earlier I talked about the Law of Demeter and how view models help us better adhere to it. I also briefly outlined how AutoMapper makes view models practical. While AutoMapper is a great tool, it isn't completely fulfilling. Let me explain.<br />
<br />
As I pointed out previously, some of the behaviors in AutoMapper make it feel incomplete. The first is that you can't map two view models to the same model and back.<br />
<br />
A much bigger problem with AutoMapper is that view models can't extend models. I'm not sure why they decided to disallow this usage, but it causes a cascade of code duplication (very un-DRY). Take a look at these classes:<br />
<br />
<script src="https://gist.github.com/1221098.js?file=ModelsAndViewModels.cs">
</script><br />
<br />
There are a few things wrong here. Age is a nullable int on the model but the view model has just an int. If a null slips through, this could cause a crashing error. While AutoMapper has an AssertConfigurationIsValid method, it doesn't test for this sort of case. You'll have to write unit tests for this; luckily, you can use <a href="https://github.com/tkellogg/NetLint">NetLint</a> to easily test for these sorts of flukes.<br />
<br />
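To make the mismatch concrete, here's a minimal sketch (the class and property names are assumed for illustration; the real classes are in the gist above):<br />
<br />
<pre class="brush: csharp">
public class Account
{
    // the domain allows an unknown age
    public int? Age { get; set; }
}

public class AccountViewModel
{
    // a null Age coming from the model has nowhere to go here
    public int Age { get; set; }
}

// AutoMapper will happily configure Account -> AccountViewModel, and
// Mapper.AssertConfigurationIsValid() won't complain. The problem only
// shows up at run time, when a null Age actually flows through the mapping.
</pre>
<br />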
Another issue is the validation attributes. The facts that account codes look like CO11582 and that all accounts must have a name are descriptors of the domain (which the model is modelling). They aren't facts about the view (although they have to be expressed in the view); they are part of the model. Every time you create another AccountViewModelX derivative, AutoMapper requires you to copy these attributes. This is a massive failure in the attempt to keep code DRY.<br />
<br />
Another issue I have is that when I'm creating a view model, I'm not sure what properties need to be created. I usually have to split the window and copy properties from model to view model (this screams obscenities at the idea of DRY code).<br />
<br />
One solution that I keep coming back to is to have view models extend models. For instance, see this implementation:<br />
<br />
<script src="https://gist.github.com/1221166.js?file=gistfile1.cs">
</script><br />
<br />
Here, you don't have to type out all those properties a second (or third) time. They're just available. You also won't make the mistake of marking Age as non-nullable or forget to copy the validation attributes. It's all done for you by the compiler - no need to write extra tests.<br />
<br />
There are still some issues with this approach, and other approaches (such as encapsulation) that you can take. Perhaps there will be a part 3.Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-23680665145451248712011-09-12T20:14:00.000-07:002011-09-12T20:14:37.516-07:00View Models, AutoMapper, and The Law of Demeter<div>
The <a href="http://haacked.com/archive/2009/07/14/law-of-demeter-dot-counting.aspx">Law of Demeter</a> was created for the intent of simplifying object hierarchies and structures. Obviously it's not a blanket sort of law (doesn't seem to apply to <a href="http://www.themomorohoax.com/2009/02/25/how-to-write-a-clean-ruby-dsl-part-2-line-by-line-with-machinist-rails">DSL's </a>or fluent interfaces). But it is handy to keep in mind when modelling a domain. </div>
<div>
<br /></div>
<div>
A classic example of a shortcoming of the Law of Demeter is the name example: passing a model to a view that has a name object (<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Model.Name.First</span>, <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Model.Name.Last</span>, etc) versus passing a flattened view model (<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Model.FirstName</span>, <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Model.LastName</span>, etc). I think this is a great application of view models.</div>
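<div>
<br /></div>
<div>
A minimal sketch of that flattening (the class and property names are assumed for illustration):</div>
<div>
<pre class="brush: csharp">
// domain model - a view bound to this reaches two levels deep (Model.Name.First)
public class Name
{
    public string First { get; set; }
    public string Last { get; set; }
}

public class Person
{
    public Name Name { get; set; }
}

// flattened view model - the view only ever reaches one level deep
public class PersonViewModel
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
</pre>
</div>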
<div>
<br /></div>
<div>
I like the idea of view models because they're a great way to express view-specific business logic. The FirstName/LastName flattening is an example, but they're also great for holding data necessary to populate drop-down lists and summary views. Beyond code, view models are also a good example of the .NET community's ability to innovate new solutions to old problems (akin to <a href="http://timkellogg.blogspot.com/2011/08/parenthetical-thesis-on-rubynet-or.html">my thoughts about the ruby community</a>).</div>
<div>
<br /></div>
<div>
<span class="Apple-style-span" style="font-size: large;"><b>Yes, But...</b></span></div>
<div>
<span class="Apple-style-span" style="font-size: large;"><b><br /></b></span></div>
<div>
While I definitely understand the benefits of view models, I'm still trying to figure out the best way to use them. When first creating view models the urge is to write and populate them by hand. This quickly becomes very tiresome. Enter <a href="http://automapper.codeplex.com/">AutoMapper</a>. </div>
<div>
<br /></div>
<div>
AutoMapper is an object-to-object mapper designed very specifically for flattening models into view models. It bases its decisions on conventions and provides a fluent interface for the remaining anomalies. It is a savior for those writing view models by hand.</div>
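<div>
<br /></div>
<div>
With the Person sketch from earlier, the configuration might look something like this (a sketch against AutoMapper's classic static API; by convention AutoMapper would automatically fill a destination property named NameFirst, while FirstName is exactly the kind of anomaly that needs the fluent interface):</div>
<div>
<pre class="brush: csharp">
using AutoMapper;

// one-time configuration, typically in Application_Start
Mapper.CreateMap<Person, PersonViewModel>()
    .ForMember(vm => vm.FirstName, opt => opt.MapFrom(p => p.Name.First))
    .ForMember(vm => vm.LastName, opt => opt.MapFrom(p => p.Name.Last));

// catches unmapped destination members at startup
Mapper.AssertConfigurationIsValid();

// in a controller action: flatten the model before handing it to the view
Person person = LoadPersonSomehow();   // placeholder for however you load the model
PersonViewModel viewModel = Mapper.Map<Person, PersonViewModel>(person);
</pre>
</div>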
<div>
<br /></div>
<div>
AutoMapper works only in one direction. You take an existing model and map and migrate the data into a view model. Going backwards, however, is another story. One big limitation of AutoMapper is that you can't map from two different source types to the same destination type. This makes it difficult or impossible to use AutoMapper to do bidirectional mappings (for instance, if you want to use AutoMapper when updating the model from FormCollection).</div>
<div>
<br /></div>
<div>
There is quite a bit more I want to say on this matter, which I will continue in a second part.</div>
Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-58506742610128673882011-09-05T18:48:00.000-07:002011-09-05T18:48:40.778-07:00Introducing comboEditableI'll admit, <a href="https://github.com/tkellogg/comboEditable">comboEditable </a>is an extremely dry name for an open source project (I would have used something like <i>Project Bierstadt</i> but it's not really that descriptive). Like everything else I develop and share publicly, this came out of necessity.<br />
<br />
In Windows there is a UI concept of an editable combo box. Basically you're given a drop down list of options and if you can't find the option you're looking for, you just type in another (see the <a href="http://tkellogg.github.com/comboEditable/">demo</a> if you're having trouble visualizing). This concept <i>does not exist</i> on the web or anywhere outside Windows applications. I assume that UX designers across the globe unanimously decided that an editable combo box is a UI kludge, but I still think it's a handy control.<br />
<br />
It is an unintrusive jQuery plugin that uses the regular HTML DOM as input and transforms it into an editable combo box (a text box, hidden field and several divs, if you're wondering). The <i>unintrusive</i> part means that if scripts are disabled, the user still gets a combo box, just not an editable combo box.<br />
<br />
If you find yourself in need of an editable combo box, head over to the <a href="http://plugins.jquery.com/project/comboEditable">jQuery plugin page</a> or download it at <a href="https://github.com/tkellogg/comboEditable">github</a>. Also, take a look at the <a href="http://tkellogg.github.com/comboEditable/">demo</a> to see usage.Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-91998143005635334582011-08-29T21:24:00.000-07:002011-08-29T21:24:29.476-07:00Parenthetical Thesis on Ruby.NET (or IronGem (or whatever the kids call it these days))Since college I've always been a huge fan of dynamic languages. I was really into Python for a long time and in the past year or so I've picked up Ruby. It's well known that the open source/dynamic language world has always looked down on the .NET/Java world as somehow inferior. While having a conversation with a colleague about ruby versus .NET, I stumbled on a conclusion.<br />
<br />
Ruby has some great features like mixins, monkey patching, and <i>a <a href="http://en.wikipedia.org/wiki/Read-eval-print_loop">REPL</a></i>. I also love how blocks make closures such an accessible and natural way to program. Ruby makes easy things easy and hard things fun.<br />
<br />
On the other hand, C# is one of the most beautiful typesafe languages (although F# is gaining favor with me). LINQ and expression trees provide functionality that you literally cannot reproduce in dynamic languages (it requires knowledge of types, which dynamic languages theoretically shouldn't care about). With the crazy stuff that people are doing with expression trees (building SQL statements, mapping objects, selecting properties, etc), it's hard to say I'd rather be doing ruby.<br />
<br />
While C# has some analogous ruby constructs (extension methods are kind of like a lesser form of monkey patching), it still suffers from some of the classical faults of static languages (there can be a lot of extra code just to deal with types and to play nicely with the compiler). At the same time, the compiler also writes tests for you (a contract states you will have these methods, whereas in ruby you can't ever be completely sure they'll actually be there; that's something you'd have to cover with unit tests).<br />
<br />
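As a tiny illustration of the extension method point (my own sketch, not taken from any real library):<br />
<br />
<pre class="brush: csharp">
using System;

public static class StringExtensions
{
    // feels a bit like reopening String in ruby, except it can only add
    // methods - you can't replace or remove existing behavior the way a
    // true monkey patch can
    public static string Shout(this string self)
    {
        return self.ToUpper() + "!";
    }
}

// usage: "hello".Shout() returns "HELLO!"
</pre>
<br />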
The conclusion I came to was that, at this point in time, there really isn't a compelling reason why ruby is better than .NET or vice versa. Except for one thing - the communities. The ruby community is nearly too much fun. In Boulder, where I live, there are several companies that host regular hackfests. There are also annual ruby conventions where people get together, socialize, and share new ideas. In the .NET world we have some of those perks, but we're notoriously laden with deadbeats. I can't tell you how many lame coworkers I've worked with who have little interest in improving themselves or the code they write. While in the Ruby world, they're not just interested in themselves or the code they write, but also <a href="http://codeforamerica.org/">in the community around them</a>.<br />
<br />
Despite all the debate, I'll probably keep my current job. I love the people I work with and I like participating in the .NET open source world (there really aren't any deadbeats in any sector of the open source world, by definition).Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-5872980606161901902011-08-27T22:25:00.000-07:002011-08-27T22:25:54.810-07:00Launching personal websiteI spent some time today and solidified my personal website (<a href="http://tkellogg.github.com/">http://tkellogg.github.com</a>). I'm pretty excited about this website just because it's a great demonstration of single-page apps. Each of my main links doesn't actually take you to a different page - it uses a JavaScript routing engine (<a href="http://documentcloud.github.com/backbone/">backbone</a>) to load and display new content.<br />
<br />
I do have some plans for the site, but there are so many more important things to deal with these days. If I can get to them I want to start a picasa site and load images into the site using the gdata api (like how I load blog posts now) and also integrate with github to list out my repositories and activity.Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-76395485591671650792011-08-08T20:18:00.000-07:002011-08-08T20:18:51.020-07:00Maybe Node isn't so badI know in <a href="http://timkellogg.blogspot.com/2011/06/got-backbone.html">previous</a> <a href="http://timkellogg.blogspot.com/2011/05/hipster-developers.html">posts</a> I bashed <a href="http://nodejs.org/">Node.js</a> a bit. I've done some thinking about it and I was struck by <a href="http://codeofrob.com/archive/2011/04/30/5-reasons-to-give-node-js-some-love.aspx">a revelation</a>. If you write a Node app that serves to a browser you can use the same code on client & server. That means you can use frameworks like <a href="http://documentcloud.github.com/backbone/">Backbone</a> to manage your business logic both on the server and on the client inside a browser.<br />
<br />
The implications for this are huge. I've toyed with the idea of using Backbone + ASP.NET MVC together for a while now but I kept tripping up on all that code duplication between Backbone models and C# models. Node could be what launches the browser into a universal rich client host (and yes, HTML5 will help too).<br />
<br />
The other crazy idea I had about using node is that this means fewer languages to learn. Imagine if you wrote JavaScript-intensive apps with Node and backed it up with <a href="http://www.couchbase.com/">couchbase</a> on the DB end. You would have JavaScript in your view, JavaScript for business logic and JavaScript in the DB. The learning curve for a new developer to become productive would be the smallest learning curve that IT has seen in decades, probably for all time. This could change the landscape of IT forever. It wouldn't be such a bad idea to build a development team around that concept.<br />
<div><br />
</div>Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-19862792124182280232011-07-27T21:08:00.000-07:002011-07-27T21:08:48.253-07:00Git is a platformThis evening I stuck my head in at <a href="http://quickleft.com/">quickleft's</a> <a href="http://quickleft.com/blog/tag/hackfest">hackfest</a> downtown boulder. They gave a great intro to ruby & sinatra. Sinatra is mind-bendingly simple. It makes you wonder why you've been doing anything but sinatra.<div><br />
</div><div>Anyway, while I was playing around at the hackfest they introduced heroku, which is a cloud platform for ruby. Heroku uses git to let you manage your application's files on the server. Pushing a brand new repo creates a new domain name and sets up the infrastructure for your app. They built a very cool application on top of the <i>git platform</i>.</div><div><br />
</div><div>Github has been doing this for a while. I blogged earlier about <a href="http://timkellogg.blogspot.com/2011/02/internal-secrets-of-git.html">github</a> and the things they've done with git. The most public things include git as a blogging/wiki engine as well as a static website generator (github pages). You can also fork <a href="https://github.com/icefox/git-achievements">git-achievements</a> and broadcast your mastery over git, <a href="http://tkellogg.github.com/git-achievements/">like I did</a>. Honestly, the things you can do with git are endless since it is, after all, nothing more than a versioning filesystem in user space.</div><div><br />
</div><div>I think this is the biggest thing that separates git from other version control systems. No one has done anything with SVN beyond simple pre- or post-commit hook scripts. TFS has a lot of application infrastructure built <i>around</i> it, but it doesn't build <i>on top</i> of its version control system. Neither does Mercurial or Bazaar, even though they are also distributed version control systems. </div><div><br />
</div><div>The git folks really focused on defining git as a standard rather than an application. By that I'm referring to how they defined objects, trees, packfiles, etc (see <a href="http://progit.org/book/ch9-0.html">progit</a>) instead of focusing on developing an application. For much of its lifetime git was nothing but a hodgepodge of shell scripts and C libraries. Nowadays there are <a href="https://github.com/igorgue/git-sharp/wiki">several</a> <a href="http://www.jgit.org/">varying</a> <a href="http://libgit2.github.com/">implementations</a> <a href="http://deadpuck.net/blag/serving-git/">of</a> <a href="http://git-scm.com/">git</a>. The fact that git is so widely programmatically accessible is making it insanely easy to leverage inside programs. I'm still waiting for a .NET app to do something big with git#...or maybe I could.</div>Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-79330194853281364772011-07-10T17:05:00.000-07:002011-07-10T17:05:18.893-07:00Semantic versioningI've seen some interesting software version sequences. Like Windows 3, 3.1, 3.11, 95, 98, ME, XP, Vista, 7. Or Oracle DBMS v5, v6, 7, 8, 8i, 9i, 10g, 11g (what does the <i>g</i> mean??). I've seen all sorts of version schemes to designate major versions, minor versions, patches, and other types of releases. (The worst ones are always when marketing gets involved).<br />
<br />
<a href="https://github.com/mojombo">Tom Preston-Werner</a> formalized the major-minor-point release (<i>X.X.X</i>) scheme at <a href="http://semver.org/">semver.org</a>. I highly recommend anyone who considers themselves a professional developer to read every word in the article at <a href="http://semver.org/">semver.org</a>. The beauty of semantic versioning is that there isn't anything new or innovative about it at all. It's all what you already know to be true. All versions <1.0.0 are development versions. Once 1.0 hits, the public interface is solidified. If and only if you break backwards compatibility you have to increase the major version. Minor versions and point releases (1.X.0 and 1.0.X) are for various levels of new features and bug fixes.<br />
<br />
When you release software labeled with semantic versions you make it easy for people to quickly assess how significant the release is (I might skip a point release and upgrade to minor releases, but I might avoid a major release due to the incompatibilities it might cause). It also forces the developers to exercise restraint in breaking compatibility with previous releases.<br />
<br />
The trouble with semantic versions in the corporate world is that marketing always has ulterior motives. They want to release a major version to make the product feel alive; they want to downplay breaking changes to a minor version to keep customers; or they want to introduce new terms that mean nothing to the average user (XP for <i>eXPerience, </i>Vista because it sounds cool). Those names are great for development code-names but they detract from a buyer's experience (I use the term buyer loosely to mean any potential user) in determining compatibility between products.<br />
<br />
In .NET assemblies, there are four segments supported with the AssemblyVersion and AssemblyFileVersion attributes (major, minor, build number, revision). This seems fine until you want to release alphas, betas and release candidates. The semantic version for a 1.0 beta release would be 1.0.0beta1 indicating that this is the first beta for the 1.0.0 release (you can use any string of alphabetical characters, not just <i>beta</i>). In a .NET assembly <a href="http://stackoverflow.com/questions/64602/what-are-differences-between-assemblyversion-assemblyfileversion-and-assemblyinf">you do this as follows</a>:<br />
<br />
<pre class="brush: csharp">[assembly: AssemblyVersion("1.0.0")]
[assembly: AssemblyFileVersion("1.0.0.253")]
[assembly: AssemblyInformationalVersion("1.0.0beta1")]
</pre><br />
The new attribute here is obviously <a href="http://msdn.microsoft.com/en-us/library/system.reflection.assemblyinformationalversionattribute.aspx">AssemblyInformationalVersion</a>, which is used to specify more arbitrary strings. It will show up in the Windows properties dialog as the assembly version (otherwise <a href="http://msdn.microsoft.com/en-us/library/system.reflection.assemblyversionattribute(v=vs.71).aspx">AssemblyVersion </a>will be used). Also, the <a href="http://msdn.microsoft.com/en-us/library/system.reflection.assemblyfileversionattribute.aspx">AssemblyFileVersion </a>is used to indicate build numbers. So while working on the 1.0.0 release, we also have a continuous integration environment like <a href="http://www.jetbrains.com/teamcity/">Teamcity </a>or <a href="http://hudson-ci.org/">Hudson </a>building the code each night and incrementing the build version. However, continuous integration environments shouldn't need to have any impact on what you actually tag the version as.<br />
<br />
As Tom says in the article, kinda sorta following the standard doesn't reap much benefit. But once we all start releasing software that conforms <i>exactly</i> to this standard, then users can more efficiently understand which two components are compatible and which aren't. I believe this applies to all software, not just software that supplies a public API.Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com2tag:blogger.com,1999:blog-6849760623609771363.post-27511755348521468662011-06-28T21:09:00.000-07:002011-06-28T21:09:46.795-07:00Got a backbone?Earlier, I posted about those lame <a href="http://timkellogg.blogspot.com/2011/05/hipster-developers.html#links">hipster developers</a>, as I call them. Mainly, I just find it a little hard to believe that anyone can create a truly scalable JavaScript app using <a href="http://nodejs.org/">node</a>.<br />
<br />
Recently I stumbled into <a href="http://documentcloud.github.com/backbone/">Backbone</a> (or rather I kept on hearing about it and finally checked it out). Backbone is a bare-bones MVC framework for JavaScript that is meant to help give your JavaScript apps structure without weighing them down. More importantly, Backbone is by no means mutually exclusive with jQuery. Actually they complement each other quite nicely.<br />
<br />
Back to those hipster developers. I don't often like to admit that a badly dressed 20-year-old can be right, and I still won't go so far as saying node.js is really a presentable solution for anything on the server, but the fact that they're expanding the infrastructure around JavaScript is really pushing me to think about how I can evolve my own .NET work. For me, Backbone is where it starts.<br />
<br />
<span class="Apple-style-span" style="font-size: large;">An Answer to Uncontrollably Messy JavaScript</span><br />
I've written a lot of pages with big long blocks of jQuery chains and anonymous functions. It's such a huge pain to maintain or refactor that I sometimes end up rewriting. Part of the problem is just simply that the code is messy. But even when I break it down into smaller nugget-sized functions I still have a fistful of spaghetti code that is prone to unchecked regressions. I definitely need to test my code, but code tangled up like that is nearly impossible to test.<br />
<br />
Backbone lets you organize your code into Models, Views, Controllers, and Collections. If you go all the way with Backbone, you're going to be creating pageless apps where you load the page once and never reload it (like GMail). Everything is data fed to the page via JSON services. Controllers let you bind bookmarks to functions (e.g. when a link with href="#!/inbox" gets clicked, it gets routed to an inbox function and handled there). Views bind models to HTML. They also keep the models bound to the HTML, so when newer, fresher data arrives, the models are rebound to the page where necessary.<br />
<br />
By modularizing code according to the MVC pattern, unit testing becomes significantly easier. Most of your normal issues like mocking the DOM & XHR become less important because your code is broken into smaller pieces. Besides being easier to test, it's just plain easier to understand also.<br />
<br />
When testing, if you do require mocking facilities, I've heard that <a href="http://sinonjs.org/">SinonJS</a> is excellent for all types of mocking, and comes with built in server & XHR mocks. Also, a coworker is pushing me towards <a href="http://behaviour-driven.org/">Behavior Driven Development</a> and so <a href="http://pivotal.github.com/jasmine/">Jasmine</a> is a natural winner for a test framework.<br />
<br />
I've heard people stress that Backbone is for web applications, not web sites. But at the same time, I don't think you need to go completely single-page to use Backbone either. In .NET, I don't really want to go single-page because MVC provides so much. But some of my pages that involve several page states could be dramatically simplified with an MVC approach. At bare minimum, I want to be able to simplify and test my client-side logic.Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com0tag:blogger.com,1999:blog-6849760623609771363.post-90779121335745550302011-06-26T16:34:00.000-07:002011-06-26T16:34:46.832-07:00Introducing NetLintLast week our QA guys wrote up a bug that one of our new pages wasn't working. After a little investigation I figured out it was just a JavaScript file that was inadvertently merged out of existence while resolving merge conflicts. We also had something like this happen where the app would run locally on developer boxes but would fail miserably when we deployed to the test environment.<br />
<br />
I don't really like giving the QA guys an excuse to blemish my reputation with bug reports, so I threw together a little tool to prevent this from ever happening again. Enter <a href="https://github.com/tkellogg/NetLint">NetLint</a>...<br />
<br />
NetLint processes Visual Studio project files (*.csproj, *.fsproj, etc) and compares the files listed in the project file with the files that actually exist on disk. So if a JavaScript file exists on disk but isn't in the project file, NetLint will throw an exception summarizing this and any other discrepancies.<br />
<br />
I also set up NetLint with simple file globbing functionality, so all files under bin/ and obj/ are ignored by default (you can also do custom patterns). I run NetLint from a unit test, so whenever anyone resolves merge conflicts they will instantaneously know if they missed a file.<br />
<br />
In the future, NetLint will be a staging ground for <a href="http://devlicio.us/blogs/krzysztof_kozmic/archive/2011/03/09/testing-conventions.aspx">testing conventions</a>. I'm licensing it under the MIT license, so hopefully no one will have any reservations due to licensing. I also created <a href="http://nuget.org/List/Packages/NetLint">a NuGet package</a> to make it even easier to use.Anonymoushttp://www.blogger.com/profile/12067028473388265245noreply@blogger.com1