This blog has been moved to http://info.timkellogg.me/blog/

Monday, March 19, 2012

Abstract Data Layer Part 1: Object ID Types And Conventions

In February I went to the MongoDB conference in Boulder. That day was my first real taste of any sort of document-oriented database. Since then I've played around with Mongo in C#, Node.js, and natively in the Mongo shell. I also can't help feeling overwhelmingly happy when I think about how I could use Mongo for a project.

At Alteryx we're starting a project with some specific business needs. We require an extremely fast and scalable database, hence Mongo. But we also need to package our product for on-premise installations, which I hear requires that we also support certain SQL databases.

...I don't actually understand why enterprises insist on using SQL. I'm told that enterprise DBAs want control over everything, and they don't want to learn new products like MongoDB. To me, it seems that purchased third-party products would be exempt from DBA optimizations and other meddling. But I guess I wouldn't know what it takes to be an enterprise DBA, so I'll shut up about this now. Just my thoughts...

Since relational databases are very different from document-oriented databases, I decided to use NHibernate as the ORM for the relational side, since its developers have already figured out a lot of the hard problems. I chose NHibernate over Entity Framework mainly because I already know NHibernate, and I know that it has good support across many databases. Nothing against EF in particular.

I've been working on this for a week or so. I've gotten pretty deep into the details, so I thought a blog post would be a good way to step back and think about what I've done and where I'm going. The design is mostly mine (of course, I stand on the shoulders of giants) and really just ties together robust frameworks.

Convention Based Object Model

In order to remain agnostic toward relational/document structure, I decided that there would have to be some basic assumptions or maxims. I like the idea of convention-based frameworks and I really think it's the best way to go about building this kind of infrastructure. Also, conventions are a great way to enforce assumptions and keep things simple.

IDs Are Platform Dependent

It's not something I really thought about before this. In relational databases we'll often use an integer as the object ID. They're nice because they're small, simple, and sequential. However, Mongo assumes that you want to be extremely distributed. Dense sequential IDs (like int identity) run into all kinds of race conditions and collisions in distributed environments (unless you choose a master ID-assigner, which kind of ruins the point of being distributed).

MongoDB uses a long (12-byte) semi-sequential value, the ObjectId. It's semi-sequential in that each ID begins with a timestamp, so new IDs generally sort after older ones, though not by exactly +1, and not strictly when several machines generate IDs within the same second. Regardless, it's impractical to use regular integers in Mongo and also a little impractical to use long semi-sequential numbers in SQL.
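To make "semi-sequential" concrete, here is a minimal sketch of how a 12-byte ID of this shape can be assembled: a 4-byte big-endian timestamp, five bytes identifying the generator, and a 3-byte counter. This is an illustration only, not the Mongo C# driver's ObjectId type; `SketchObjectId`, `Generate`, and `ToHex` are hypothetical names, and the five middle bytes are random stand-ins for the machine and process bytes.

```csharp
using System;
using System.Threading;

// Illustrative only: NOT the MongoDB driver's ObjectId implementation.
public static class SketchObjectId
{
    private static readonly DateTime Epoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Assumption: random bytes stand in for the machine/process identifier.
    private static readonly byte[] Node = MakeNodeBytes();
    private static int _counter = new Random().Next();

    private static byte[] MakeNodeBytes()
    {
        var bytes = new byte[5];
        new Random().NextBytes(bytes);
        return bytes;
    }

    public static byte[] Generate(DateTime utcNow)
    {
        var id = new byte[12];

        // Bytes 0-3: seconds since the Unix epoch, big-endian, so that
        // byte-wise ordering follows creation time.
        uint seconds = (uint)(utcNow - Epoch).TotalSeconds;
        id[0] = (byte)((seconds >> 24) & 0xFF);
        id[1] = (byte)((seconds >> 16) & 0xFF);
        id[2] = (byte)((seconds >> 8) & 0xFF);
        id[3] = (byte)(seconds & 0xFF);

        // Bytes 4-8: which generator produced the ID.
        Array.Copy(Node, 0, id, 4, 5);

        // Bytes 9-11: a counter that separates IDs made in the same second.
        int c = Interlocked.Increment(ref _counter);
        id[9] = (byte)((c >> 16) & 0xFF);
        id[10] = (byte)((c >> 8) & 0xFF);
        id[11] = (byte)(c & 0xFF);

        return id;
    }

    public static string ToHex(byte[] id)
    {
        return BitConverter.ToString(id).Replace("-", "").ToLowerInvariant();
    }
}
```

No central ID-assigner is involved: every client can mint IDs on its own, and IDs created a second or more apart still sort in creation order.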

As a result, I chose to use System.Object as the type for all identifiers. After some tweaking, NHibernate can be configured to treat these object-typed IDs as integers with native auto-increment. The Mongo C# driver also supports object IDs with client-side assignment.

Ideally, I would like to write some sort of IdType struct that contains an enumeration and an object value (I'm thinking along the lines of a discriminated union here). This would make IDs more distinctive and make it easier to attach extension methods or additional APIs. I'd also like to make IDs protected by default (instead of public).
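A rough sketch of what such an IdType struct might look like, assuming just two kinds of ID. The `IdKind` enum, the factory method names, and the choice to store a Mongo-style ID as a hex string are all hypothetical, picked for illustration rather than taken from the actual project.

```csharp
using System;

// Hypothetical discriminator: which flavor of ID the value holds.
public enum IdKind { Integer, ObjectId }

// A poor man's discriminated union: a tag plus an untyped value.
public struct IdType : IEquatable<IdType>
{
    private readonly IdKind _kind;
    private readonly object _value;

    private IdType(IdKind kind, object value)
    {
        _kind = kind;
        _value = value;
    }

    public IdKind Kind { get { return _kind; } }
    public object Value { get { return _value; } }

    // Relational-style ID (e.g. an int identity column).
    public static IdType FromInteger(int value)
    {
        return new IdType(IdKind.Integer, value);
    }

    // Document-style ID: 12 raw bytes, kept as a hex string so that
    // equality compares contents rather than array references.
    public static IdType FromObjectIdBytes(byte[] bytes)
    {
        if (bytes == null || bytes.Length != 12)
            throw new ArgumentException("Expected exactly 12 bytes.", "bytes");
        string hex = BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
        return new IdType(IdKind.ObjectId, hex);
    }

    public bool Equals(IdType other)
    {
        return _kind == other._kind && object.Equals(_value, other._value);
    }

    public override bool Equals(object obj)
    {
        return obj is IdType && Equals((IdType)obj);
    }

    public override int GetHashCode()
    {
        int valueHash = _value == null ? 0 : _value.GetHashCode();
        return ((int)_kind * 397) ^ valueHash;
    }

    public override string ToString()
    {
        return _kind + ":" + _value;
    }
}
```

Because the tag travels with the value, extension methods can switch on `Kind` instead of guessing what a bare `object` contains.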

The Domain Object

I also created a root object for all persistent objects to derive from. This is a fairly common pattern, especially in frameworks where there is a lot of generic or meta-programming.


I had DomainObject implement an IDomainObject interface so that in all my meta-programming I can refer to IDomainObject. That way there shouldn't ever be a corner case where we can't or shouldn't descend from DomainObject but have to anyway (separate implementation from interface).


The User and Name objects are simple, as you'd expect of any NHibernate object model. The idea is to keep them simple and keep business and data logic elsewhere.
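The shape described above might look roughly like the following. This is a sketch under assumptions, not the project's actual code: the `Email`, `First`, and `Last` properties are invented for illustration, and the ID is exposed through a public getter even though a protected one would be preferable. Members are virtual because NHibernate's lazy-loading proxies generally expect that.

```csharp
using System;

// Meta-programming code refers to this interface, never the base class.
public interface IDomainObject
{
    object Id { get; }
}

// Root for persistent objects. The ID is typed as object so it can hold
// an integer under SQL or an ObjectId-like value under Mongo.
public abstract class DomainObject : IDomainObject
{
    // Only the data layer (or a subclass) assigns the ID.
    public virtual object Id { get; protected set; }
}

// A plain value-like object with no ID of its own; in Mongo it could
// live inside the User document as a sub-document.
public class Name
{
    public virtual string First { get; set; }
    public virtual string Last { get; set; }
}

// Hypothetical properties; no business or data logic lives here.
public class User : DomainObject
{
    public virtual string Email { get; set; }
    public virtual Name Name { get; set; }
}
```

A new `User` has a null `Id` until the data layer assigns one, whether that comes from a SQL identity column or from client-side generation in the Mongo driver.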

Are You Interested?

From what I can tell, we're breaking ground on this project. It doesn't seem like many people have tried to build a framework that supports both relational and document data stores. Initially I was hesitant to support both. But I think there are some excellent side effects that I will outline in upcoming posts.

The content I've written about so far is only a small fraction of what it took to get this on its feet. Someone once said that you should open source (almost) everything. So, if you (or anyone you know) would like to see the full uncensored code for this, let me know so I can start corporate conversations in that direction.

3 comments:

  1. Tim, can you further explain why you would like to make your Id protected? What might make sense for you is to set up your Id with a private backing field that is only initialized in the constructor. This way, whenever you initialize a User you are forced to also provide an Id. Once you have the private backing field, the NHibernate mappings can be set to Access Field, which tells NHibernate to map to the private backing field. Let me know if that makes sense or if that helps you out any.

  2. I want the Id to be protected because it is an implementation detail that shouldn't be exposed outside the object. Like I was saying earlier, the type of the Id is dependent on which database you choose, and the fact that there even is an Id is also an implementation detail. For instance, Mongo doesn't require IDs for sub-documents.

    Also, if at a later point you decide to refactor a sub-document into its own top-level document collection in Mongo, you have to add IDs to the new documents. I would consider this type of refactoring to usually be a performance tuning task (similar to creating indexes). So naturally it's a concern of the data layer, not the model or business logic.

    The trouble with actually making it protected is that so many frameworks expect the ID to be exposed. That's probably because relational databases always expect you to have an ID, so many MVC frameworks are designed around that maxim. We're using WCF, so we might actually be able to get away from that concept.

  3. I see that you are inspired by the possibility of Mongo in the work. In the continuation of your article, and as an alternative example I want to draw your attention to this item due diligence data room
