Tuesday, September 10, 2013

“New Rules” for library developers

I love Bill Maher’s show “Real Time” on HBO, and I am a big fan of the “New Rules” segment in particular, where he lays down new (and usually unusual) rules for his favourite targets. (This is one of my favourites, and there are more here.)

Of late, I’ve been thinking about announcing a set of “New rules” myself – but for library developers. Here are a few I can think of right away:

1. You will NOT accept a base class type as a parameter to a function unless you mean it.

This is one of the sad legacies of Java. You have a class that you think might have multiple implementations, so you create a class hierarchy. You then want to use this class in some function and, in conformance with traditional OO principles, you accept a base class (or interface) object as a parameter. But you know that there are multiple incompatible implementations of the interface, so in your code you specialize by adding a runtime check on the type of the object passed. Days go by, and an intrepid developer looks at your interface and develops a class that implements it. A third developer comes along and invokes your function with an object of this new class as the parameter. The interface allows it, the compiler allows it, but when the program runs, everything crashes!

Think this is a fairy tale? A classic .NET example of this is LINQ, which in theory takes a DbConnection parameter, but is really expecting a SqlConnection object (one that establishes SQL Server connections). So it does a runtime check on the type of the object, which I’m quite sure is something like:

if (connection.GetType() != typeof(SqlConnection))
    throw new ArgumentException("connection"); // my guess at the failure path

Enter the Microsoft Enterprise Library and its ReliableSqlConnection class, which establishes a reliable connection to a SQL Azure database. Create a connection of this type and pass it to LINQ, and LINQ immediately throws an exception, even though this is a SQL Server connection and LINQ supports SQL Azure databases!
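
To make the failure mode concrete, here is a minimal sketch in Java rather than C#, using made-up types: Connection, PlainConnection, RetryingConnection and QueryRunners are all hypothetical, not LINQ or Enterprise Library code. brittleRun reproduces the kind of exact-type check guessed at above, while run and runPlainOnly show two honest alternatives: either rely only on what the interface guarantees, or name the concrete type in the signature.

```java
// Hypothetical types for illustration; none of these are real .NET or JDBC APIs.
interface Connection {
    String execute(String sql);
}

final class PlainConnection implements Connection {
    @Override
    public String execute(String sql) {
        return "result of: " + sql;
    }
}

// A decorator that another team might write later against the same interface.
final class RetryingConnection implements Connection {
    private final Connection inner;
    private final int maxAttempts;

    RetryingConnection(Connection inner, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.inner = inner;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public String execute(String sql) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return inner.execute(sql);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // non-null here because maxAttempts >= 1
    }
}

final class QueryRunners {
    private QueryRunners() {}

    // Anti-pattern: the signature promises "any Connection", the body does not.
    static String brittleRun(Connection connection, String sql) {
        if (connection.getClass() != PlainConnection.class) {
            throw new IllegalArgumentException(
                "Unsupported connection type: " + connection.getClass().getSimpleName());
        }
        return connection.execute(sql);
    }

    // Option A: mean it. Use only what the interface guarantees.
    static String run(Connection connection, String sql) {
        return connection.execute(sql);
    }

    // Option B: say what you need, so the compiler rejects everything else.
    static String runPlainOnly(PlainConnection connection, String sql) {
        return connection.execute(sql);
    }
}
```

Passing a RetryingConnection to brittleRun compiles and then throws IllegalArgumentException at runtime, whereas passing it to runPlainOnly does not compile at all, which is a much better time to find out.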

2. You will not have hidden dependencies between your types

A class should be a single, cohesive unit of abstraction. Everything a method of the class needs should be created in the constructor, created in a static constructor, or taken as a parameter.

Consider the RetryManager and RetryPolicy classes in the Enterprise Library. You use a RetryPolicy to specify how many times an operation should be retried. What is not obvious, however, is that the RetryPolicy class expects the RetryManager class to have been initialized earlier. Now, my question is: if that is the case, why not do it in the static constructor of RetryPolicy? Or why not have the RetryManager create instances of RetryPolicy?
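
Here is a rough sketch, in Java, of the second suggestion: the manager hands out the policies, so the dependency is visible in the API itself. SimpleRetryManager and SimpleRetryPolicy are simplified stand-ins I made up for illustration; they do not mirror the Enterprise Library classes or their signatures.

```java
import java.util.function.Supplier;

// Simplified stand-ins, NOT the Enterprise Library classes.
final class SimpleRetryPolicy {
    private final int maxAttempts;

    // Package-private constructor: code outside the library's package has to
    // go through SimpleRetryManager, so the ordering cannot be gotten wrong.
    SimpleRetryPolicy(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    // Runs the operation, retrying on RuntimeException up to maxAttempts times.
    <T> T execute(Supplier<T> operation) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw last; // non-null here because maxAttempts >= 1
    }
}

final class SimpleRetryManager {
    private final int defaultMaxAttempts;

    SimpleRetryManager(int defaultMaxAttempts) {
        if (defaultMaxAttempts < 1) {
            throw new IllegalArgumentException("defaultMaxAttempts must be >= 1");
        }
        this.defaultMaxAttempts = defaultMaxAttempts;
    }

    // The only way to get a policy is to hold a manager first.
    SimpleRetryPolicy defaultPolicy() {
        return new SimpleRetryPolicy(defaultMaxAttempts);
    }

    SimpleRetryPolicy policy(int maxAttempts) {
        return new SimpleRetryPolicy(maxAttempts);
    }
}
```

With this shape, "initialize the manager first" stops being tribal knowledge: there is simply no policy to use until a manager exists.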

3. You will NOT generate user documentation exclusively from code comments.

This is another tragic consequence of Java’s influence on the programming world. When introduced, Javadocs were a boon to programmers – you wrote all your documentation as comments and there was a tool that automatically generated formatted HTML documentation, freeing developers from mucking around with Word and repeating everything one wrote in their code comments.

An unfortunate consequence of this is what I call “instant documentation”. Normally, when developers document functions, they are focused on documenting the behaviour of the function, and not the big picture or the 10,000m view. Extrapolate this to all functions and classes in the code, and the entire documentation becomes about the trees in the forest while excluding the forest itself. Users of the API are now unaware of why a certain structure or behaviour exists, unaware of the interdependencies between the API components (except when bitten badly by them), and unaware of the thought process behind the behaviour, all of which remain locked in the minds of the creators.

Sadly, there is now no incentive to write classic documentation like that of early versions of MFC, or like that in the *nix man pages. No engineer gets to the Principal level by writing good documentation. What is worse is that even books are now being written this way, essentially printing copies of generated documentation. This is a shame, because it reduces developers to search-engine users rather than system-aware engineers, and it prevents the transfer of useful insights from those who build these APIs to those who use them.

4. Your test cases shall be your sample code

I have been frustrated time and again by the sparseness and triviality of the sample code that accompanies libraries. Why not supply the test code written to test the API as samples? At least in theory, test code should cover corner cases, combinations of functions, and other conditions that would interest users of the function or class.

5. You will make simple things easy, complex things possible

This is probably the golden rule of API development. The API must be easy to use for the most common scenarios: most tasks should take one or a few function calls. At the same time, users who have complex requirements should be able to meet them, maybe by writing complex code.
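
One way this can look in practice, sketched in Java with a made-up CsvWriter class (the class, its options and its quoting rules are my assumptions, not any real library): the common case is a single static call with sensible defaults, and the uncommon case goes through a builder that exposes the knobs.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; not taken from any real library.
final class CsvWriter {
    private final char separator;
    private final boolean quoteAll;

    private CsvWriter(char separator, boolean quoteAll) {
        this.separator = separator;
        this.quoteAll = quoteAll;
    }

    // Simple things easy: one call, default separator, quote only when needed.
    static String toCsv(List<List<String>> rows) {
        return new Builder().build().write(rows);
    }

    // Complex things possible: callers who need control ask for a builder.
    static Builder builder() {
        return new Builder();
    }

    String write(List<List<String>> rows) {
        return rows.stream()
            .map(row -> row.stream()
                .map(this::field)
                .collect(Collectors.joining(String.valueOf(separator))))
            .collect(Collectors.joining("\n"));
    }

    // Quotes a field if requested, or if it contains a special character.
    private String field(String value) {
        boolean needsQuotes = quoteAll
            || value.indexOf(separator) >= 0
            || value.indexOf('"') >= 0
            || value.indexOf('\n') >= 0;
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    static final class Builder {
        private char separator = ',';
        private boolean quoteAll = false;

        Builder separator(char separator) {
            this.separator = separator;
            return this;
        }

        Builder quoteAll(boolean quoteAll) {
            this.quoteAll = quoteAll;
            return this;
        }

        CsvWriter build() {
            return new CsvWriter(separator, quoteAll);
        }
    }
}
```

Most callers would only ever write CsvWriter.toCsv(rows); the few who need semicolons or forced quoting write CsvWriter.builder().separator(';').quoteAll(true).build().write(rows) and pay for the complexity only when they need it.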

6. You will NOT write incomplete APIs

Completeness of an API is, of course, a relative measure. It is easy, in the name of completeness, to slide down the slope of adding too much functionality: for instance, adding an encrypt() method to a String class! On the other hand, today’s agile API writers fall into the YAGNI trap by leaving out necessary functionality, like a cache that does not have a clear() method! APIs must have the full set of methods that a typical user would consider necessary.

7. You will avoid the “Kingdom of nouns”

APIs are no different from books. Have too many characters in your book, and the resulting mental overload will turn off most readers (with honourable exceptions of George R R Martin and Tom Clancy). So, think about every word you use for your class and function names. Class names are concepts – use as few of them as necessary. Use standard verb-ese for function names – like store/retrieve instead of AddToStore and GetFromStore – the simpler names have the full meaning of the longer ones with none of the conceptual overload. And never, never overload the meaning of a common verb to do something other than what its common usage suggests.

8. Your code shall be idempotent

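Idempotency is probably the most important property in today’s multi-threaded, distributed environments: making the same call twice must leave things exactly as making it once would. If your API depends on state that every call silently changes, it is inherently unsuitable for a world where computations fail and are retried!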
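
A small Java sketch of the difference, using a made-up Ledger class and a caller-supplied request id (both are assumptions for illustration, one common technique among several): creditUnsafe changes the balance on every call, while credit applies a given request at most once, so a retry after a lost response is harmless.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical example class; the request-id scheme is an assumption.
final class Ledger {
    private long balanceCents = 0;
    private final Set<String> appliedRequestIds = new HashSet<>();

    // Not idempotent: a retried call credits the account a second time.
    synchronized void creditUnsafe(long amountCents) {
        balanceCents += amountCents;
    }

    // Idempotent per request id: repeating the same request has no further
    // effect. Returns true only the first time the request is applied.
    synchronized boolean credit(String requestId, long amountCents) {
        if (!appliedRequestIds.add(requestId)) {
            return false; // already applied, so this retry is a no-op
        }
        balanceCents += amountCents;
        return true;
    }

    synchronized long balanceCents() {
        return balanceCents;
    }
}
```

The sketch keeps every request id in memory forever, which a real service could not afford; it is only meant to show the shape of the contract.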

Before I conclude, apologies for the lack of examples, the poor organization and the length of the post. I’ve been away from writing for too long now :(

Needless to say, even though I present examples from the Microsoft world, these problems are not exclusive to it.