This blog has been moved to http://info.timkellogg.me/blog/

Monday, December 26, 2011

Why I hate generated code

If you've worked with me for any amount of time you'll soon figure out that I often profess that "I hate generated code". This position comes from years of experience with badly generated code. Let me explain.

The baby comes with a lot of bathwater

In the past year I had an experience with a generated data layer where CodeSmith was used to generate a table, 5 stored procedures, an entity class, a data source class, and a factory class for each entity that was generated. My task was to convert this code into NHibernate mappings.

The interesting thing about this work is how little of the generated code was actually being used. I'm sure, in the beginning, the developer's thoughts were along the lines "oh look at all this code I don't have to write manually :D". However, after some time, subsequent developer's thoughts were along the lines of "with all this dead code, it's hard to find real problems". It's funny how some exciting breakthroughs turn into headaches down the road. The table is always used, but some entities are created & read but never modified, others are only created during migrations and only read from during run time.

Code generators often produce code you don't need. Since all code requires maintenance, dead code is just a liability because it doesn't provide any benefit. I always delete dead code and commented out code (it'll live on in version control, no need to release it into production).

There are several professional developer communities that generate code as a way of life. Ruby on Rails comes prepackaged with scripts to generate models, views, and controllers in a single command. ASP.NET MVC will generate controllers and views with a couple clicks. And if you've ever used either of these frameworks, you'll probably find yourself deleting a lot of generated code.

The problem of transient code generation

The issue that I keep running into with my policy of hating code generation is that it's nearly impossible to be a professional software engineer and not generate code. The most fundamental problem is compilers. When you run a compiler over your source code, it generates some sort of machine readable code that is optimized for various goals like speed or debugging or different platform targets.

While I hate code generators, it's hard to argue how I could possibly hate compilers. They allow me to write code once and compile it several different ways and achieve different goals. Therefore, I have to introduce my first caveat - I don't hate all generated code, I only hate generated source code.

This problem of hating generated code is complicated further by the fact that NHibernate generates source code too. You don't ever check in the code that NHibernate generates because it's done at run time. The most obvious way NHibernate generates code is the SQL that is written in the background to query & perform DML operations. (For those questioning if SQL is source code, consider how SQL is compiled into an execution plan prior to execution). It's also hard to argue that I hate this kind of code generation because it doesn't suffer from the same problems of the CodeSmith generated code. It only generates code just-in-time meaning that it's only generated when needed, so there isn't any extra code generated.

Since NHibernate and compilers do code generation in a way that I like, I'm going to refine my statement to "I hate generated persistent code". This generally means, I still hate generated code when the resulting code sticks around long enough for a fellow developer to have to deal with it.

The thin line between good and bad code generation

When is generated code persistent and when is it transient? We already decided that code generation isn't so bad when it happens during of after the compilation process. But my statement is that I hate persistent code. There are other cases of code generators generating transient source code. One such example is in iSynaptic.Commons.

Since C# doesn't yet (and probably won't ever) include variadic templates or variadic generic types, writers of .NET API's often write some really redundant code to account for all combinations of generic methods or types. I know I've done it. This example uses a T4 template to produce a C# file with a *.generated.cs extension. The T4 template is executed on build but not ignored from version control.

I do like this approach because it takes a DRY approach to a redundant problem without much complication. Another thing I really like about this approach is that T4 templates are a standard part of Visual Studio and are executable from Mono as well. As such, they can be considered a free tool that is openly available (important for open source projects) and, more importantly, are executed as part of the build process.

Another thing I like about this approach is the usage of partial classes to separate the generated portion of the class from the non-generated portion. This minimizes the amount of code that is sheltered from refactoring tools (code inside the *.tt file).

The thing I hate about this particular iSynaptic.Commons example is that the generated file is included in version control. I think, perhaps, this is reduced to a small pet peeve of mine since the generated code isn't wasteful and is updated on every build. Still, I would like a mechanism to (a) have the file ignored from the IDE's perspective and (b) ignored from version control. I wouldn't want anyone to mistakenly edit the file when they should be editing the T4 template.

Summary

The end result of my thought is "I hate source code that is generated prior to the build process". I want to further say that I also hate generated code that is checked into version control, but this is a bit of a lesser point. However, code generation can be a useful tool; as seen in the cases of NHibernate and T4 templates. But even still, code generation should be used wisely and with care. Generating excess code can become a liability that detracts from the overall value of a product.

No comments:

Post a Comment