How to manage “Chapter 11”
“Chapter 11” is my slightly tongue-in-cheek term for the time in a project just after you declare technical debt “bankruptcy” and decide to do something about it.
Technical debt is that inevitable accumulation of items in your codebase that could be better. They slow you down, and often end up on a list somewhere as something to fix when you get around to it. Round tuits being in short supply, they often stay on that list for a long, long time.
Technical debt tends to increase the cost (in time, a.k.a. money) to add each new feature or change to your system. Over time, this cost climbs, usually not steadily or regularly, but in an inexorable path that makes each new change a bit harder than it should be. Often this leads to taking shortcuts, applying the “broken windows” logic that if there’s already a mess in the code, what’s a bit more mess?
It’s often very subtle: you don’t realize you’re getting that much slower on every feature request, and you don’t realize you’re avoiding doing some features because the code in the area you need to change is “scary”, but it happens.
Point of no returns
In this context, I’m defining technical debt bankruptcy as the point where the cost of adding new features or changes to an existing system exceeds the lifetime value of those features.
For instance, you’ve got this customer invoicing system that has devolved into a steaming morass, and you need to make a change to make it easier to send dunning letters, let’s say. The change, because of the mess, will cost you $10,000. You expect the system to be in production for another five years before it’s replaced, and you’re going to save $1,000 a year by adding the feature compared to typing up the letters manually. In this case, it’s nice and clear-cut: get a typewriter, you’re better off.
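The arithmetic in that example can be written down as a tiny sketch. The function names are mine, and the model is deliberately naive: it assumes a constant yearly saving and ignores discounting and ongoing maintenance cost.

```python
def lifetime_value(annual_saving: float, years_remaining: float) -> float:
    """Total saving a feature yields over the remaining life of the system."""
    return annual_saving * years_remaining


def is_worth_building(change_cost: float, annual_saving: float,
                      years_remaining: float) -> bool:
    """True when the change pays for itself before the system is retired."""
    return lifetime_value(annual_saving, years_remaining) > change_cost


# Figures from the dunning-letter example above.
change_cost = 10_000
annual_saving = 1_000
years_remaining = 5

print(lifetime_value(annual_saving, years_remaining))                  # 5000
print(is_worth_building(change_cost, annual_saving, years_remaining))  # False
```

The change would have to cost less than $5,000, or the system would have to live more than ten years, before the feature beats the typewriter.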
Unfortunately, in the real world it’s seldom that obvious: it’s hard to say what the cost of the change is until after you’ve done it, as estimating gets harder and harder in a messy system. You’ve generally got no real idea what a feature will save (or earn) you, and you probably don’t know how long the system will continue to be used. The only safe thing to do is to keep technical debt to a manageable minimum, so you can proceed without needing a crystal ball. This often doesn’t happen.
What this adds up to is that it’s not that clear when you’ve crossed the point of diminishing returns until you’re way past that point, and it’s painfully obvious.
When it does become clear you’ve passed that point of diminishing returns, what are the options?
Essentially there are two choices in most cases: Rewrite the system in question, or clean it up so it doesn’t cost a fortune to make every change. There’s less of a difference between these two than it might seem.
Here’s where the decision gets even harder: There are serious disadvantages in rewriting a system, especially if it takes a while and the old one is still in motion, with some changes still getting pushed through. You’re chasing a moving target. If you re-write with the same developers that made the mess in the first place, you’ll probably just have a shiny new mess, as bad as the old one, in the end.
Of course, once the replacement is in place we reap the benefits, assuming we build the new stuff with all of the proper techniques and the eye to quality that we (presumably) did not bring to the old one. Otherwise we’ve just doomed ourselves to repeat history, and doubled our cost in the process.
No, let’s assume we know what we’re doing with the new system, and that it will be done the “right way”. How do we find time and resources to get the new system off the ground?
This is sometimes described as replacing the wings on the airplane while it’s still in flight. How do you do that? Very, very carefully.
Another approach is to simply build a new airplane, fly it up alongside the old one, and transfer the passengers from one to the other. This is where we construct our new system and migrate the users and data from one to the other in an organized fashion. The disruption to the passengers is definitely greater than in the first case (where they may not even notice the new stuff happening), but it may be overall safer, as we’re not disconnecting bits from the old airplane mid-flight at all.
In both cases, a key factor is measuring the functionality of the existing system.
You can’t replace a thing unless you can measure a thing, and tell what it’s currently doing. Each feature must be sufficiently well understood to be able to say if its functionality can be adequately replaced by the new system, and its value must be understood to determine if it’s even necessary to replace it: maybe it’s better to just remove that feature, not implementing it in the new system.
Often in older systems you will find features that aren’t being used at all, and haven’t been in years. It would be a waste of developer dollars to replace these, but proving that they’re indeed not being used can sometimes be difficult. You can’t guess, you must measure.
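One way to measure rather than guess is to record an event every time a feature is used and then look for features with no recent events. Here is a minimal sketch of that idea. The feature names, the (name, timestamp) event format and the one-year window are all assumptions made up for illustration; in a real system the events would come from your own logs or instrumentation.

```python
from collections import Counter
from datetime import datetime, timedelta


def unused_features(known_features, usage_events, now, window_days=365):
    """Return the features with no recorded use inside the window.

    usage_events is an iterable of (feature_name, timestamp) pairs.
    """
    cutoff = now - timedelta(days=window_days)
    recent = Counter(name for name, ts in usage_events if ts >= cutoff)
    # Counter returns 0 for names it has never seen.
    return sorted(f for f in known_features if recent[f] == 0)


# Hypothetical data, for illustration only.
now = datetime(2024, 1, 1)
events = [
    ("print_invoice", datetime(2023, 12, 20)),
    ("print_invoice", datetime(2023, 11, 2)),
    ("export_csv", datetime(2021, 3, 14)),  # last used years ago
]
features = ["print_invoice", "export_csv", "fax_statement"]

print(unused_features(features, events, now))
# ['export_csv', 'fax_statement']
```

A feature showing up in this list is a candidate for removal, not proof: a year-end report might legitimately run once every twelve months, so pick the window with that in mind.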
For other features, it’s easy to see what they look like they’re doing, or even what they were intended to do. It’s not uncommon, though, for those features to be used in a way that was not originally foreseen, so don’t make assumptions. Measure – in this case, by communicating with users and asking them about the tasks they use features to perform, to derive a true business value. Then you can write a proper use case to re-build the feature in the new system with confidence.
So, if you’ve determined via a set of decoupled functional tests exactly what your existing system does, how do you go about injecting this re-write into your Agile process?
You might be tempted to write stories to replace each feature – e.g. “re-implement feature X”. In my opinion, this is a mistake, on several levels.
Firstly, it means that the replacement must be prioritized by your customer proxy and planned into a sprint (if you Scrum, or gotten into your backlog if you Kanban). The problem is that technical debt is not easily visible at the business level, so it will tend to be prioritized at “negative infinity” (as a colleague put it so succinctly), i.e. it’ll never get to be important enough to get done – therefore you go on pouring money into the black hole of the existing system.
Secondly, if we’re defining a new feature as a replacement, exact or otherwise, of an existing feature, then it has, by definition, no direct business value. You already have a feature to do that operation, you can’t (and won’t) justify another one that does the exact same thing. You can’t even write a story for it in the classic “as a… I want… so that” format, which is also a smell.
No, I’d consider this type of replacement of a feature, even if it’s a real re-write, as a refactor. This means it’s not a separate story, it’s handled as a part of a “real” story.
Let’s look at this scenario: You want to replace feature X with a new version. The next time any story touches feature X, you “build in” the time and effort required to re-write feature X into the estimate. Let’s say story Y was to add a new doohickie to feature X, and would cost 5 dev-bucks (whatever that is – some arbitrary unit of cost), assuming it’s applied to the existing feature X (not its rewritten replacement). If story Y is applied to the new feature X, it would only cost 2 dev-bucks. To replace feature X (without story Y) will cost 10 dev-bucks. So the rewrite alone costs 5 dev-bucks more than just doing story Y on the existing feature X: for those 5 extra dev-bucks you get the new feature X, then for another 2, you get story Y. That makes 12 dev-bucks in total, 7 more than the quick route.
You estimate 12 dev-bucks to do story Y, and the first sub-task in story Y is “re-write feature X”. This assumes you’ve already got functional tests around feature X – if you don’t, then you need to build that cost in as well.
Ouch, you say? Well, yes, there’s some expense here for sure. The good news is that the next time you touch feature X, you start to go the other way – it takes way less time to make your change than if you hadn’t spent the 12 dev-bucks now.
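To see when you “start to go the other way”, here is a rough sketch using the dev-buck figures above. It rests on one big assumption that is mine, not something the example guarantees: every later story that touches feature X costs the same as story Y did (5 on the old code, 2 on the new).

```python
def cumulative_cost(stories: int, per_story: int, upfront: int = 0) -> int:
    """Total dev-bucks spent after a number of stories on feature X."""
    return upfront + per_story * stories


def break_even_story(old_per_story: int, new_per_story: int, rewrite_cost: int):
    """First story count at which the rewrite path is cheaper in total.

    Returns None if the rewrite never pays off.
    """
    if new_per_story >= old_per_story:
        return None
    n = 1
    while (cumulative_cost(n, new_per_story, rewrite_cost)
           >= cumulative_cost(n, old_per_story)):
        n += 1
    return n


# Figures from the example: 5 per story on old X, 2 on new X, 10 to rewrite.
for n in range(1, 6):
    print(n, cumulative_cost(n, 5), cumulative_cost(n, 2, upfront=10))
# 1 5 12
# 2 10 14
# 3 15 16
# 4 20 18
# 5 25 20

print(break_even_story(5, 2, 10))  # 4
```

Under that assumption the rewrite has paid for itself by the fourth story that touches feature X, and every story after that is pure saving.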
Now, this is, of course, to state the blindingly obvious, not as easy as it sounds. We’re assuming here that you can re-write feature X in isolation, or that you can even test it in isolation from your functional tests. This is where a system designed for modularity and easy testing is really handy, and those are not two attributes of many legacy systems. You may be required to do a fair bit of refactoring just to decouple your existing features sufficiently to get to this point.
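Once a feature can be called in isolation, the functional tests around it can be as simple as running the same inputs through the old and new versions and comparing the answers. The sketch below shows the shape of such a harness. The late-fee feature and both implementations are invented stand-ins, not code from any real invoicing system.

```python
def legacy_late_fee(balance_cents: int, days_overdue: int) -> int:
    # Stand-in for the tangled legacy logic we want to replace.
    fee = 0
    if days_overdue > 30:
        fee = balance_cents * 2 // 100
    if days_overdue > 60:
        fee = balance_cents * 5 // 100
    return fee


def new_late_fee(balance_cents: int, days_overdue: int) -> int:
    # Stand-in for the rewritten feature.
    if days_overdue > 60:
        rate_percent = 5
    elif days_overdue > 30:
        rate_percent = 2
    else:
        rate_percent = 0
    return balance_cents * rate_percent // 100


def find_mismatches(old_impl, new_impl, cases):
    """Run both implementations on every case and collect disagreements."""
    mismatches = []
    for args in cases:
        expected = old_impl(*args)
        actual = new_impl(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


# Cover the boundaries (30/31 and 60/61 days) as well as ordinary values.
cases = [(balance, days)
         for balance in (0, 1, 999, 10_000, 123_456)
         for days in (0, 30, 31, 60, 61, 365)]

print(find_mismatches(legacy_late_fee, new_late_fee, cases))  # []
```

The old system's behaviour is treated as the specification here, quirks included. If a mismatch turns up, it is a question for the users (is the old behaviour a bug or a feature someone relies on?) rather than something to fix silently.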
Nobody said declaring technical debt bankruptcy was easy or fun, after all, but in some cases it may be the responsible and professional thing to do.
The hard work starts when you consider how to recover, whether it’s a clean re-write, or a hybrid refactor as we describe here.