How To Recognize a Bad Codebase

We learn to recognize a bad bit of code quickly as our code-fu grows. Arbitrary side effects smell bad and crazy one-liners frustrate us. It becomes easier to identify which lines of a codebase you might want to clean up to improve the overall quality of the work.

There is a line between codebases with bad code in them and bad codebases. When do we learn to recognize this, and what are the signs that the problem is far-reaching, not localized? A bad codebase is an expensive codebase. It is difficult to work with and difficult to collaborate on with others. Identifying what makes a codebase bad is key to knowing when, where, and why to improve it. Improving the overall code quality reduces the overall code cost. I'm thinking about software in economic terms these days, and I'm hoping we can turn the recession to our favor by pushing the mantra: Bad Code is Expensive Code.

The costs of code come from three actions: adding features, fixing bugs, and understanding the code. Adding features is an obvious source of code cost, and every time you want to expand a product's abilities you're going to pay accordingly. Fixing bugs is both obvious and subtle. While it's obvious that you need to fix the bugs you see, costs can also be added very subtly, in ways you can't actually detect (more on this later). Understanding the code, to most minds, might be entirely subtle and never obvious. New developers, existing developers moving to new areas, and users trying to understand the behavior emerging from the collection of code all need to understand these things, and the more expensive the code is to understand, the less likely they will.

I feel no need to expand on the cost of adding to a codebase. What will hit us are the subtle points. The cost of bugs explodes with subtle misunderstandings, leading to the conclusion that a lack of understanding of the code is the single greatest source of its increasing cost. This comes partly through the obvious need to understand the code, and partly through the more subtle costs that misunderstanding adds to fixing bugs and even to properly expanding the feature set. The problems manifest as the actual bugs in the software.

The sign of a bad codebase is a difficult-to-debug codebase.

Now we only need to know the causes of difficult debugging to know the signs of a bad codebase.

Does the codebase lack tests? No tests mean you can't be sure a change doesn't break more than it was intended to fix. Locating the source of a problem is hugely expensive when you're verifying correctness manually instead of via automated testing. There are fantastic techniques of binary debugging, narrowing a range of changesets down to the exact change that introduced a bug. This is so expensive with manual testing that it might as well be impossible, while with tests it's one of the greatest debugging tools you could ever have at your disposal: it can automatically tell you exactly what code caused your bug. It can debug for you, but only in a codebase that started out good.
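To make the binary debugging idea concrete, here is a minimal sketch in Python. It is only an illustration, not any particular tool's implementation: the names first_bad_change and is_good are hypothetical, and it assumes the changes are ordered oldest to newest, that the newest change is known to be bad, and that once the bug appears it stays in every later change.

```python
from typing import Callable, Sequence, TypeVar

Change = TypeVar("Change")


def first_bad_change(
    changes: Sequence[Change],
    is_good: Callable[[Change], bool],
) -> Change:
    """Return the earliest change for which is_good() is False.

    Assumptions (hypothetical setup, for illustration only):
    - changes are ordered oldest to newest,
    - the last change is known to be bad,
    - once a change is bad, every later change is bad too.
    """
    if not changes:
        raise ValueError("need at least one change to search")

    lo, hi = 0, len(changes) - 1
    # Invariant: changes[hi] is bad, and every change before index lo is good.
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(changes[mid]):
            lo = mid + 1  # the bug was introduced after mid
        else:
            hi = mid      # the bug was introduced at mid or earlier
    return changes[lo]


# Stand-in for an automated test suite: pretend change 37 introduced the bug.
tested = []


def suite_passes(change: int) -> bool:
    tested.append(change)  # record each run so the cost is visible
    return change < 37


culprit = first_bad_change(list(range(1, 101)), suite_passes)
print(culprit, "found after", len(tested), "test runs")
```

With a hundred changes the search needs only a handful of test runs. That is trivial for an automated suite and painful if each run means a person clicking through the product by hand.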

Does the codebase lack documentation? If your understanding of the code comes mostly from trial and error or from asking other developers, then you lack documentation, or code clear enough to document itself. Every time you add a feature or fix a bug, you're debugging not just the code but your understanding of how it functions. Clear code, concise comments, and good documentation let you focus on the breakage of the code, and not the breakage of your understanding of its design.

Does the codebase grow or shrink? We might think a growing codebase is a universally good sign, but it's not so. A shrinking codebase can be a great sign. It means two things. Firstly, it means an increase in quality when the amount of code shrinks while maintaining or increasing the value (not to be confused with cost) of the code. For example, if you can make a function clearer by finding more concise ways of expressing the same ideas, you reduce how much code there is to understand to get the same job done. Secondly, a shrinking codebase tells you that the code is understandable enough to be refactored, which is a little deceptive. The better the quality of your code, the easier it becomes to improve the quality even further.
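As a small illustration of shrinking code without losing value, here is a hypothetical before and after in Python. The function names and the account data are invented for this sketch. Both versions compute the same total, but the second leaves far less to read and far fewer places for a bug to hide.

```python
def total_active_verbose(accounts):
    """Sum the balances of active accounts, the long way."""
    total = 0
    index = 0
    while index < len(accounts):
        account = accounts[index]
        if account["active"] == True:
            total = total + account["balance"]
        index = index + 1
    return total


def total_active(accounts):
    """Sum the balances of active accounts, the concise way."""
    return sum(a["balance"] for a in accounts if a["active"])


# Invented sample data, only for comparing the two versions.
sample_accounts = [
    {"balance": 10, "active": True},
    {"balance": 25, "active": False},
    {"balance": 5, "active": True},
]
print(total_active_verbose(sample_accounts) == total_active(sample_accounts))
```

Concise is not the same as cryptic. The goal is fewer ideas to hold in your head, not fewer characters on the screen.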

Take this as a three-point test. How do your current projects score?
