How To Own Your Mistakes

Today was a very troubling and frustrating day for both me and one of my best clients. This is my declaration of ownership of my own failure to prevent it. The short story: right after I declared the "make the site more stable" milestone complete and shipped out an invoice, the site spent its most unstable day ever while I frantically put it on stilts and duct-taped it to the wall. For the long version, read on.

I had already spent roughly a week and a half on an impromptu milestone to increase the reliability and stability of the site, and had been greenlit to apply hours to better build, test, and deployment processes. This is a good thing and it still stands as such. Now, the site wasn't fragile before, but a couple of incidents understandably raised concern about long-term quality. We had a few instances of corrupt MySQL logs, ran out of space on our EBS volume, and, embarrassingly, I have on occasion deployed code and then found bugs, even a broken page, despite testing locally and trying to be careful. The choice to spend time specifically on a better foundation was a good one.

This isn't about that time I spent, but another post may be.

Thursday we flipped the switch to the new system: all new instances on EC2, a migration to PostgreSQL, and a whole new deployment process that includes spawning a new "staging" instance that clones our production web server and lets us test new versions before rolling them out to the public. Everything looked good. I spent some time correcting a couple of hiccups, and at the end of the day, when things had been running and seemed stable and golden, I declared the milestone complete (and in this arrangement that means invoicing for a payment, so it's not just an ego issue).

I woke up the next morning to find the site had been down for a few hours. It was unavailable about a dozen times throughout the rest of the day, and I clocked about 7.5 hours today getting everything in line. It has been running for longer than that now, without problem, and we seem to be in the clear.

Situations like this require us to look inward and ask what we could have done differently to keep a problem from escalating into a crisis, and I've spent much of today, both while working on the issues and afterwards, trying to understand this. Much of what I can say now is speculation. While there are many things I could have or should have done, there are few that I can know for certain would have been "the" things to make a difference.

Priorities are the one area where I am confident that a different choice would have avoided what happened today. A service should not run without thorough watchdogs. Websites should be exposed to realistic test traffic before launch. I can test my code and comment it well, but the upfront work needs to be in place to ensure that my new code is actually servicing requests.
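To make the watchdog idea concrete, here is a minimal sketch in Python. It is not the setup used on this site; the `supervise` function, its parameters, and the `notify` callback are hypothetical names chosen for illustration. The sketch restarts a process whenever it exits and reports every exit, up to a restart limit.

```python
import subprocess
import time
from typing import Callable, List


def supervise(command: List[str],
              notify: Callable[[str], None],
              max_restarts: int = 3,
              delay_seconds: float = 1.0) -> int:
    """Run command, restart it each time it exits, and return the restart count.

    notify is a hypothetical hook: in practice it might send an email or
    page someone. Here it only needs to accept a message string.
    """
    restarts = 0
    while True:
        process = subprocess.Popen(command)
        exit_code = process.wait()  # blocks until the supervised process exits
        notify(f"{command[0]} exited with code {exit_code}")
        if restarts >= max_restarts:
            notify("restart limit reached, giving up")
            return restarts
        restarts += 1
        time.sleep(delay_seconds)  # brief pause so a crash loop does not spin


# Example call (the command is a placeholder, not a real service):
# supervise(["my-web-server", "--port", "8000"], notify=print)
```

In practice an established process supervisor is likely a better choice than a hand-rolled loop; the point of the sketch is only that a restart and a notification should happen together.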

Can you always make these claims?
  • Our site's pages and resources are tested automatically, and broken pages and other issues are reported to us
  • We can test new code in a copy of our production environment before it actually becomes production
  • If something goes wrong, our server processes are restarted and we are informed before the users know, or even if they never know
I know, from now on, I will.
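
The first claim in that checklist can be approximated with very little code. The following is a rough sketch, assuming the pages to check are known in advance as a list of URLs; `fetch_status` and `find_broken_pages` are hypothetical helper names, and a real check would probably also look at page content rather than status codes alone.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, and so on


def find_broken_pages(urls: Iterable[str],
                      fetch: Callable[[str], int] = fetch_status) -> Dict[str, int]:
    """Map each URL that did not answer with a 2xx status to the status seen."""
    broken = {}
    for url in urls:
        status = fetch(url)
        if not 200 <= status < 300:
            broken[url] = status
    return broken
```

Run on a schedule, with the result sent to whoever is responsible for the site, a check like this reports a broken page before a user does. The `fetch` parameter exists so the check itself can be tested without touching the network.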
