Sunday, February 25, 2007

Minimal Working Examples: How to, Why, and Who cares

When you have a problem and you rush to colleagues, or to strangers on IRC and mailing lists, you've got to present a problem they'll want to help you fix, along with all the information they need to fix it. You can't give them information they don't need, because any extra work sifting through your unrelated code reduces the chances that anyone will put in the time to help you.

We can state a few rules about seeking help with code.
  1. Ask the question clearly and don't be ambiguous about your intentions and requirements.
  2. If you need to include code, it needs to include all important context.
  3. Present the problem without reference to out-of-context issues.
Don't come in with a link to your entire body of code telling us it doesn't work. What doesn't work is asking for help like that. Besides telling us exactly how things don't work, and what they are doing compared to what you are expecting, you need to give us code that specifically and only demonstrates the problem directly at hand. This is our golden "Minimal Working Example", where "working" means that it works just enough to show us how it's broken. You need to reproduce the situation causing your code to break, without showing us the environment your code is in when it breaks. That means taking code segments out of their modules and even out of the insides of functions, and surrounding them with just enough jumpstart to fail the same way they did in the original code.

Before you even get around to asking your question, you might solve it simply by isolating the problem into your example. When you remove the problem code from everything else, you also remove the distractions of everything going on around it. You might remove another part of the code to reduce things to the minimal example, and suddenly find the problem gone, identifying the removed code as the source of your problems. If you think isolating test cases sounds familiar, then you know enough that I shouldn't have to tell you these minimal working examples should already exist in the form of your unit test suites. When something goes wrong, you should already have had a test to catch it, and you should add one if you didn't. If the problem can be isolated now, keep it isolated for later.

Remember what is important to your problem. If you can't figure out some particular pysqlite2 issue, and you're working with data you extracted from XML files grabbed from a remote server, you can bet the XML, HTTP, and all the logic to process them are not worth your time to show anyone. Your example only needs to show the data you have to push through SQL, and no one should need to see where it came from. If your components are more tightly woven, and separating them isn't possible or is even moderately difficult, then you have a serious design flaw, and extracting the problem example has revealed a way to clean up your code and likely solve many latent problems, all at once.
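
As a rough sketch of what such an example could look like, assume the trouble is a parameterized query that blows up. The table, the rows, and the bug itself are all invented for illustration, and the standard library's sqlite3 module (the bundled descendant of pysqlite) stands in for pysqlite2. Hard-coded rows replace everything the XML and HTTP layers would have produced, and Python 3 is assumed.

```python
import sqlite3

# Hypothetical sample data: stands in for whatever the XML/HTTP code produced.
rows = [("widget", 3), ("gadget", 5)]

conn = sqlite3.connect(":memory:")  # no files, no server, nothing to set up
conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)

# Expected: one row for "widget".
# Actual: an exception, because a bare string is treated as a sequence
# of six one-character parameters instead of a single parameter.
found = None
error = None
try:
    found = conn.execute(
        "SELECT qty FROM items WHERE name = ?", "widget"
    ).fetchall()
except sqlite3.ProgrammingError as exc:
    error = exc

print("result:", found)
print("error:", error)

# What a helper would likely point out: pass the parameters as a tuple.
fixed = conn.execute(
    "SELECT qty FROM items WHERE name = ?", ("widget",)
).fetchall()
print("fixed:", fixed)
```

Anyone can paste this into a file and run it, and the comments say what was expected and what happened instead. Nothing about XML or HTTP is left to wade through.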

Once proper testing, documentation, and isolation have left you up the creek without a paddle, that's where community support comes in. Come to us with the example that tells us right away what the problem is, what it's doing, and the obvious thing you think it should have done instead. We can all run this code and approach it from the same direction as you, so we know exactly what your problem is and where to approach the solution.

Friday, February 23, 2007

Standard Gems: SimpleHTTPServer.test()

Have you ever needed to share some files real quick with real low setup? You can start up a web server from the current directory on port 8000 with a single line of Python.

python -c "from SimpleHTTPServer import test; test()"
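
The one-liner above targets Python 2. In Python 3 the module was folded into http.server, and `python -m http.server 8000` does the same job from the command line. If you want the same thing from inside a script, a minimal sketch might look like the following; `make_file_server` is a made-up helper name, not part of the standard library.

```python
import http.server
import socketserver


def make_file_server(host="", port=8000):
    """Return a server that shares the current directory.

    Hypothetical helper for illustration; an empty host listens on
    all interfaces, as the original one-liner does.
    """
    handler = http.server.SimpleHTTPRequestHandler
    return socketserver.TCPServer((host, port), handler)


# To actually serve files until interrupted:
#     make_file_server().serve_forever()
```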

Standard Gems: colorsys

Lots of us do color work. Maybe you're writing a full-fledged Photoshop killer, or you just want a utility to do some bulk conversions or color scheme generations. Well, in any case, you might find the often-forgotten colorsys module handy.

You can easily convert between different color component systems with these included functions:

hls_to_rgb(h, l, s)
hsv_to_rgb(h, s, v)
rgb_to_hls(r, g, b)
rgb_to_hsv(r, g, b)
rgb_to_yiq(r, g, b)
yiq_to_rgb(y, i, q)
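
All of these take and return floats, and for RGB, HLS, and HSV every component lies between 0 and 1, so 8-bit channel values need scaling first. Here is a small sketch of a hue rotation built on top of them; `rotate_hue` is a hypothetical helper, and Python 3 is assumed so that `round()` returns integers.

```python
import colorsys


def rotate_hue(rgb255, degrees):
    """Rotate the hue of an 8-bit (r, g, b) tuple by the given angle.

    Hypothetical helper, not part of colorsys.
    """
    # Scale 0..255 channels down to the 0..1 floats colorsys expects.
    r, g, b = (channel / 255.0 for channel in rgb255)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Hue is a fraction of a full turn, so wrap it around at 1.0.
    h = (h + degrees / 360.0) % 1.0
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


print(rotate_hue((255, 0, 0), 120))  # pure red turned a third of the way round
```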

Wednesday, February 21, 2007

Book Review: Practices of an Agile Developer

Software development is widely known as having the largest gap of any discipline between the best known practices and the actual practices utilized in the field. The most depressing part of that statement is how inaccurate the qualifier of "widely known" probably is, because if the gap were as well and widely known as it should be, the gap would be shrinking much faster than it is. Is the gap even shrinking at all?

One of the things on the good end of this gap is the collection of practices known as Agile Development. The creation of software is fluid and must be adaptable at a moment's notice, defined as it is created, and grown rather than built, say the mantras of the agile development proponents. They contrast with the traditional and aging mentality of designing everything before writing anything, following a walled path to the goal, and building monoliths of code. The differences between the old side and the new side are great, and this is the gap we need to wade across to solve the ever-growing problem of software quality nightmares.

Practices of an Agile Developer does a fantastic job of taking you across that gap, step by step, and bestowing the benefits of engineering yoga. First to last page is packed with excellent tips, techniques, proven methodologies, and stories of horror and salvation by the blessing of a better way to create software.

There is a lot in this book that I was not aware of and a lot I was well aware of, and even some things that I tried actively to practice. I didn't know how much of the practices I used were part of the "agile" moniker, but they worked. Because I knew a lot of my favorite practices had something to do with this strange movement I kept getting glimpses of, I picked up a copy of the book at Borders and decided to fill in the blanks between what I knew.

I found most of the book to be a reinforcement of things, but there was a sizable number of completely new thoughts to ponder. Iterative development, unit testing, small and non-breaking changes, and utilizing the tools like bug trackers and source control, were all related to my real world experience, which made them even more mentally valid.

For anyone who can program and wants to become a better developer, this is a great volume to pick up. At 178 pages before the appendixes, you can run through and absorb the content quickly and begin putting it to good use. I've seen improvement in my own work. It takes willpower to put the things you learn to regular use on your own, but it's worth the effort. Whether you have heard of agile development, learned some things about it, or know nothing at all of the issues, you will find this a good read if you want to be better at what we love to do: create good software.

Monday, February 19, 2007

RTFM Not Just a Disgruntled Reply

Are you or have you been new to something technical? Of course. Have you asked a question when you were lost? Have you been told, by those you trusted to enlighten your path, "RTFM!"? Well, you are not alone, and if you felt you got a raw deal, you are not alone in that either. However, you are wrong. "RTFM" is perfectly valid and, despite the opinion of many, very good advice in your time of need, indeed.

The camp of the knowledge seekers is separated into two groups, with the line between them varying depending on the context. The first and largest group is the active knowledge seekers, who are after some bit of information. The second and smaller group is those who have that information. The seeking group has two options to get what they need: utilizing known resources, such as books, articles, and tutorials; or asking those who have previously sought and found, and who can give them the information they seek quickly, without wading through entire volumes of documentation.

The knowledge holders are becoming personal googles.

When you turn a sage into a personal google, you injure the spirit of both the knowledgeable and the Google. It is insulting to someone who takes time out of their day, away from their job and family, volunteering for your sake, because they would prefer actually interesting questions, and if you can read it in "The 'Freaking' Manual", then it's not so interesting a problem to solve. When you are after such trivial issues, you have a perfect opportunity to use the wonderful free service offered to you by the many choices of search engine. By going to the knowledgeable with small questions, you waste their time and misuse the technology they enjoy, which does nothing but discourage them from volunteering their time. Then, when you actually need their help, they are gone, and Google has reduced in its usefulness because you finally buckled down and RTFM.

RTFM now, so you still have someone to help you later.

Pure Functions in the Python Standard Library?

I am toying with a small package for working with and developing pure functions in Python. An important part of this is allowing the pure functions you write to use as much of the standard library as possible, while remaining pure. What functions in the standard library could you call pure? Builtin types, the entire math library, the itertools module, etc. are on my list, but I don't want to miss things. Any suggestions?

So far I have a decorator that ensures A) a function only takes immutables and known-pure functions as arguments, and B) any global names used within the function are pure functions. It is primitive, but a start. I hope to take this further with some actual uses, such as creating worker processes locally and remotely and pushing pure functions to them. This also has interesting potential for optimizing large computations, like executing functions before they are called when the potential arguments are known.
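
To make the two checks concrete, here is one possible sketch of such a decorator. It is not the package described above: the names `pure`, `PURE_MODULES`, `PURE_CALLABLES`, and the short whitelist are all assumptions made for illustration, Python 3 is assumed, and inspecting `co_names` is a crude approximation (it also lists attribute names, so an attribute that happens to share a name with an impure builtin would be rejected).

```python
import builtins
import functools
import math

# Assumed whitelists; a real package would need a much longer, vetted list.
IMMUTABLE_TYPES = (int, float, complex, str, bytes, type(None))
PURE_MODULES = [math]
PURE_CALLABLES = [abs, len, min, max, sum, round]


def _known_pure(obj):
    # Identity comparison, so unhashable objects are handled safely.
    return any(obj is known for known in PURE_MODULES + PURE_CALLABLES)


def _immutable(value):
    if isinstance(value, IMMUTABLE_TYPES):
        return True
    if isinstance(value, (tuple, frozenset)):
        return all(_immutable(item) for item in value)
    return False


def pure(func):
    """Reject impure globals at decoration time, impure arguments at call time."""
    missing = object()
    # B) every global or builtin name the body refers to must be known-pure.
    for name in func.__code__.co_names:
        target = func.__globals__.get(name, getattr(builtins, name, missing))
        if target is not missing and not _known_pure(target):
            raise TypeError("%s uses impure name %r" % (func.__name__, name))

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # A) arguments must be immutable values or known-pure functions.
        for value in list(args) + list(kwargs.values()):
            if not (_immutable(value) or _known_pure(value)):
                raise TypeError("impure argument: %r" % (value,))
        return func(*args, **kwargs)

    PURE_CALLABLES.append(wrapper)  # pure functions may call each other
    return wrapper


@pure
def hypotenuse(a, b):
    return math.sqrt(a * a + b * b)


print(hypotenuse(3, 4))
```

With this sketch, passing a list to `hypotenuse` raises TypeError at call time, and decorating a function that calls `print` raises TypeError at definition time.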

Pure functions are interesting and might be a nice addition to our Python toolkits.

Sunday, February 18, 2007

PyCon Recordings?

Is anyone planning on handling the recording and posting of talks at PyCon '08? I can't make it, yet again, and I really hope the talks can be available to the non-attenders this year. YouTube would take them kindly, as well (wink wink).

Saturday, February 17, 2007

Standard Gems: shlex.split()

str.split() is so well known, but a simple step beyond leaves a lot of pythonistas lost: how do you split without breaking up embedded strings? How do you split "1 '2 3' 4" into ['1', '2 3', '4']? Why, shlex.split("1 '2 3' 4"), of course! The shlex module is a lexical analyzer and includes this little useful utility for us.
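
Side by side, the difference between the two looks like this (a minimal sketch; the variable names are arbitrary):

```python
import shlex

line = "1 '2 3' 4"

naive = line.split()        # splits on every run of whitespace
tokens = shlex.split(line)  # honors the quotes, shell style

print(naive)   # ['1', "'2", "3'", '4']
print(tokens)  # ['1', '2 3', '4']
```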

Standard Gems: calendar.month_name

This is part of a new series I want to keep up with. There are a lot of hidden gems in the Python standard library, which gets larger all the time. As the number of packages and modules grows, and as those grow in size themselves, it becomes harder and harder for all of us to keep everything in mind all the time. There are large parts of the standard library I have never used or even looked at once, because they have never been needed by anything I have done. This means that when I do have a need for these things, I don't know they exist. Perhaps one of the greatest reasons for reinventing the wheel is simply ignorance of the wheel existing in the first place! I see the same problem in others all the time. This series, "Standard Gems", is an attempt to get things out there that some people maybe have not seen or known of, and will later find useful when the need sparks memory of the gem.

If you have any suggestions for gems, please drop me a line!


Ever needed to get the real name, even localized, of a month by its number? 3 is "March" and 8 is "August", etc. Well, calendar.month_name is a pseudo-sequence that gives just what you need! Try it out the next time you need to display some date information.

Note: this is sequence-like, but the months are indexed from 1 to 12, so don't try 0 for January; index 0 just holds an empty string. This is moderately misleading, especially when it raises IndexError on a bad number, rather than a KeyError.
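
A quick sketch of both the lookup and the indexing quirk follows. The month names shown assume the default (English) locale, and `month_label` is a hypothetical wrapper for callers who would rather get an explicit error than an empty string for 0.

```python
import calendar

print(calendar.month_name[3])        # March
print(calendar.month_name[8])        # August
print(len(calendar.month_name))      # 13, because index 0 is a placeholder
print(repr(calendar.month_name[0]))  # ''


def month_label(number):
    """Return the month name, rejecting anything outside 1..12.

    Hypothetical helper, not part of the calendar module.
    """
    if not 1 <= number <= 12:
        raise ValueError("month must be in 1..12, got %r" % (number,))
    return calendar.month_name[number]
```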

Monday, February 12, 2007

Extreme Code - Version Control For One File

You should start using version control systems when you only have a single file.

You don't need a version control system until you have a large project and lots of collaborators.

Those are two statements I've heard from individuals of great caliber, and each carried weight coming from its side. Version control is important, and no developer worth their weight in USB keys will tell you to never use a VCS, but exactly when, where, and why you should use one is not so agreed upon. The differences in opinion come from culture, experience, and industry. Many members of the free software community, where version control is almost impossible to avoid when you need to collaborate across the world, seem more exposed to and ready to use CVS or Subversion. Some individuals have horror stories about projects without version control, and others have never seen practical benefit in their time as developers. Yet others simply work in areas of the industry that have less call for it, or less cultural motivation. Web designers rarely version control HTML and stylesheets, and game developers have a short code life cycle to begin with. Can we find the happy middle ground between always using version control and avoiding it like a ten-ton plague? If we look at it from the two extremes, we just might.

Extremely Using Version Control

You should create the repository to track and house your files before you even create the first file. There isn't any reason to wait, and you'll see the benefits immediately. The extra time spent creating the repository and tracking changesets when you only have a handful of files is still worth the benefit, because the benefit is not reduced in proportion to the number of files or the number of developers. You can have a small, single file and a single developer, and you'll still be better off with a VCS than without, I guarantee it!

The use of a repository isn't a passive act. You can't just record a changeset when you feel like it, and treat the repository like a second-class tool. Committing is not what you do when the code seems OK; it's something you plan ahead for and plan your work around. From an untouched state, you decide your plan of attack on a problem and work on that, and only that. When you record the changes, you need to be able to attribute them to a particular goal, to have the repository in a valid state with a logical progression. Before the patch, some bug existed, and after the patch, this bug was resolved. This is clear, logical, linear, and how the repository helps both remind you of the past and guide you to the future.

As a single developer, you have just as much reason to make full use of version control as a team of hundreds of developers. The structure it gives you will lend itself well to your progress and keep you from becoming lost in a sea of changing code. If, among a hundred programmers, a mistake can be made and rollbacks or history needed, then a single programmer without peer review will obviously have their own share of problems with the code written months ago. When you know you have a large number of changes to make, and you are unsure of the progression, the freedom of working in a controlled branch will give you the peace of mind to just have at the code and swing it into gear.

There might be larger benefits as the number of files and the number of developers increase, but version control is not something you should wait for. It is an essential component of any project, and should be employed from day one as part of your regular infrastructure of development. Collaboration is not the only, or even the best, thing these systems have to offer. We've talked about the branching, structuring, and good habits that the use of version control systems and regimens bestows on a developer, and they apply to each of us individually before they apply to us as a group.

Avoid Using Version Control

Developing is hard work and when you're just getting started with something or working on your own, you don't need any added distractions. Bringing in tools like a version control system is just asking for trouble when you have all the big issues to deal with in a new project. You're so constantly changing your code and your structure that putting something between you and your fresh code is asking for nothing but trouble. You can also import the code into a repository down the road, when things get complex, more people are getting involved, or just when it feels right. The VCS is just one more thing to worry about and you have enough of those to handle all on your own with a brand new baby project.

New developers, especially, should stay away from this kind of non-development detail. Version control might be important, but it's not a detail the developer should have to care about, and even though you can't avoid that forever, when you're just getting your feet wet with the world of code, you need to expose yourself to a limited number of facets at a time. No budding programmer cares about version control when the code they are writing is so terrible they know they'll throw it away when they learn enough to have disdain for their own creations. Education and learning can't be jeopardized by bothering students with trivial things.

The original creation of version control systems, usually attributed to the SCCS utility of UNIX systems, was intended to lock portions of the source while you worked on them, and existed solely as a coordination tool for multiple developers. So, obviously there is no point in their use if you're the only developer. No matter how complicated your project gets, if you are the only person writing the code, there is no confusion to be had, and you can rest easy. Open the files you need, edit the code as you need, and you'll be fine.


Spark some discussion. Make a comment. Play the Devil's Advocate and argue on the side you would not normally stand for. Fresh perspective is a good thing to have.

Wednesday, February 07, 2007

Extreme Code - Examining Software Development Through Polarized Eyes

Developers are opinionated people. There are issues in our line of work with very bold lines between one side and the next, arguing over this point and that. I will not debate whether emacs or vim is the superior editor, but I'm going to take a look at some important issues from the extreme lines, from the far polarized groups that chant all-or-nothing for their answer, from the points of view where the distractions of opposition won't lead us astray. This is not to say I agree with either side, as I will try to look at them both, separately, but I think looking at some issues, temporarily, with a few doors shut is a good way to get the full view of that aspect.

I'll be sure to disclose my personal preferences, but I want to be as neutral in the matters as I can be.

This series will be called "Extreme Code" because of the "extreme" views we'll examine on each of the issues. The first edition will be published tomorrow, and is entitled "Extreme Code - Version Control For One File".

I write here about programming, how to program better, and things I think are neat and are related to programming. I might write other things at my personal website.

I am happily employed by the excellent Caktus Group, located in beautiful and friendly Carrboro, NC, where I work with Python, Django, and Javascript.
