Friday, December 26, 2008

How To Measure The Merb/Rails Merge

This is a two-part posting, hopefully. The first part is a request from me to my Rails-using readers. (Do I have Rails-using readers?) Can you leave lots of comments telling a non-Ruby guy like me what the Rails/Merb merger I've been hearing about means for the wider web developer community?

Saturday, November 22, 2008

How To Be Disappointed in Something You Recommend

I recommend purchasing Expert Python Programming, by Tarek Ziadé. I am extremely disappointed in this book, but I'm recommending it specifically if you already have a good grasp of Python.

You see, I was really looking forward to recommending this book. I had hoped that the many people I know who have a good developer head on their shoulders, but have not approached Python seriously before, would find this a perfect introduction to sit down with. While I'm really pleased with the writing and structure of the content, I'm afraid this is a book suffering from severe editing oversights. There are subtle mix-ups in terminology in many places and some code samples that are simply and absolutely incorrect.

This is where I made my decision:

>>> from threading import RLock
>>> lock = RLock()
>>> def synchronized(function):
...     def _synchronized(*args, **kwargs):
...         lock.acquire()
...         try:
...             return function(*args, **kwargs)
...         finally:
...             lock.release()
...     return _sychronized
>>> @locker
... def thread_safe():
...     pass

I'm not going to point out the two actual mistakes here (I suspect most people who notice will only notice one of them). I want to demonstrate that the problem can be subtle for someone new to Python who otherwise has a good understanding of software development. This leaves the text applicable to a much smaller readership than the one it would otherwise have been perfect for. I want to repeat how much I really liked the writing, and that I really am recommending this book. I simply want to express my simultaneous disappointment. I'm really looking forward to posting a glowing review of a second edition of this book.


A closing note...

I sat on this book for the last two weeks, trying to decide what to do about my decision on it. Honestly, it was a difficult choice to write about it at all. I am certainly not making any friends at Packt. Make your own decision with this free sample chapter.

Thursday, November 13, 2008

How To Follow the Book Meme

This is not a bug.

I traced the book meme that is going around back to this guy. My nearest was The Best Software Writing. Somehow, I really like what this random peek into a book gave me to post. Somehow quite fitting.

The rules for this meme thing are:
  • Grab the nearest book.
  • Open it to page 56.
  • Find the fifth sentence.
  • Post the text of the sentence in your journal along with these instructions.
  • Don’t dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.

Wednesday, November 05, 2008

How To Spin Your Wheels in Content

The week has been giving me little time to write, with lots of things to run around and do. I'll be waiting on my auto work to get done tomorrow and have more free time going forward. This blog a day thing has been difficult to do with any value so far, but I'll make it all average out. Anyway, I'm cheating today with a preview. This is a list of things I will absolutely be writing in the coming weeks.

  • How To Understand AppEngine Datastore Under the Hood - Part 3 (by request)
  • At least one new "How To Test Django ..." post
  • Per chapter reviews of the exciting new book, Expert Python Programming by Tarek Ziadé. For now, please check out this free chapter!
  • Announcement of a new project many people are aware of to help guide both new and moderate Python developers
See you 'round the tubes.

Tuesday, November 04, 2008

How To Vote

I voted.

Being a geek, I took a keen interest in the touchscreen voting machine used in my county. There was a checkbox within the larger label for each option. This kept everyone safe from the common "finger slip" mistake that has plagued the touchscreen voting machines. I also took note of a window along the side, where I could watch my votes being recorded on the paper receipts. I feel pretty confident most voters will completely ignore this physical verification.

There were some parallels I drew between this and a recent interest of mine: touchscreen interfaces. Voting machines might make us angry every few years, but mobile phones and PDAs annoy us each and every day. We're reaching a critical point in the cheapness of touchscreen displays, making them the inevitable first-class interface. We need to come to some standards and practices that give them the consistency and predictability that we get using our common interfaces today. At the same time, I'm glad to see a place where some creativity can happen seriously.

Monday, November 03, 2008

How To Own A T-Mobile G1 for One Week

So I've had my G1 for about a week now and I'm happier with it every day. There are still some issues I have, but none that are related to the most important piece: the Android operating system. I love the software available and new things come at a decent pace. There are some network issues, but limited to my particular side of the apartment complex. I don't know anywhere else in town that I don't have good coverage.

Interestingly, to me, my favorite application is Bubble, a basic bubble level app that works vertically or horizontally or to test the level of a surface. I don't have a lot of use for it, but it highlights some of the things I like most about open, portable devices. I imagine a real reduction not just in the number of devices I need (I'll be selling my iPod soon) but just the number of things, period.

Now, I have felt like there is a lack of games for the system. More, that the games there are have been pretty much feeling like prototypes pushed to the Market because it feels cool. I'm really hoping this will change, but I think the mobile gaming market is going to need a good cross-platform solution before we see really nice things. I'm sure the iPhone people will tell me how they have better games, but I can't imagine really good entertainment available until we get a common platform for Android, the iPhone, Windows Mobile and Blackberry devices, and their brethren. Get on it, Adobe. If you lose mobile, you lose your foothold.

I've started to get really interested in mobile development. Some serious thought went into hacking in the Android SDK to get Jython on the device, but I feel confident it will be done and I simply don't have the cycles for it. It has given me a new mindset in my web work, however, and I'm giving some real consideration to the problems there are doing just about anything on the mobile web. Yeah, this Android Browser can handle just about any page I've tossed at it, and I'm really happy about that. We can't deny, however, that it just doesn't work to implement the same on something completely different. I don't like dragging text and I hate horizontal panning with a passion.

Sunday, November 02, 2008

How To Blog a Day for November 2008

So, the recent interest in National Blog Posting Month has made me shift gears from NaNoWriMo to, obviously, posting once a day every day this month. Obviously, I missed the first, because I was out of town. I could have managed it anyway, but the change in plans was late in the game and I was out late that night. This still gives me a chance to flex the writing muscles and I'm really happy to take part in it. Thanks to everyone who has inspired each other to keep this going this month.

This is my semi-mandatory and semi-cheating meta-post about the month, where I get to say I posted for the day. Really, I'm just filling time by writing about the fact that I'm going to be writing. To give this emptiness a little meat, I'll say that I have something of a schedule. I have a few topics to be covered in some fun posts and even some things to cover in a series of posts. It will be a fun month.

Everyone, keep me honest. Yell at me if I miss a day.

Thursday, October 30, 2008

How To Call It A Day

This week hasn't been great for my productivity. It has been a series of days overshadowed by a series of things coming up. Between standing in line at the DMV, computer issues, and today helping my brother-in-law with a very sudden move, it feels like typing is an unfamiliar act. (Unless it's on the T-Mobile G1, which I'll be reviewing this weekend.)

Today, I helped load a seventeen-foot U-HAUL truck, made a few last minute stops, drove said truck just over an hour south and helped unload it into a storage shed. I've never loaded and unloaded a complete truck in one day, in all the several moves I've made over the last years. I was always able to stretch them over two days, with a nice sleep in the middle. After all that, I had to drive the truck back to drop it off. I barely made it. My dear wife hit traffic on her way to pick me up, so I sat and I waited. I listened to the mechanic at the drop-off location declaring how "North Carolina is McCain country," which was informative of him. I enjoyed some Mike and Ike candies.

So finally getting home, eating dinner, and putting my son to sleep, I sit down at this familiar, glowing box. What code can I get out before it's time to call it a day? What debugging and planning can I get in before the consciousness must be suspended? How do I make use of what day I have left?

I did a lot today, even if I couldn't work today. Sometimes, knowing when to quit is the best productivity choice you can make. I'll see you in the morning, Internet.

Tuesday, October 28, 2008

How To Backport Multiprocessing to 2.4 and 2.5

Just let these guys do it for you.

My hat's off to them for this contribution to the community. It is much appreciated and will find use quickly, I'm sure. I know I have some room for it in my toolbox. Hopefully, the changes will be taken back to the 2.6 line so that any bugfixes that come will help stock Python and the backport.

So, if you don't follow 2.6/3.0 development you might not be aware of multiprocessing, the evolution of integrating the pyprocessing module into the standard library. It was cleaned up and improved as part of its inclusion, so it's really nice to have the result available to the larger Python user base that is still on 2.5 and 2.4. Although some edge cases might still need to be covered, the work became stable quickly.

Here's an overview in case you don't know, so hopefully you can see if it would be useful for any of your own purposes. I think, starting out, there is more potential for this backport than the original multiprocessing module. Thus, I hope this introduction is found useful by a few people.

>>> from multiprocessing import Process, Pipe
>>>
>>> def f(conn):
...     conn.send([42, None, 'hello'])
...     conn.close()
...
>>> parent_conn, child_conn = Pipe()
>>> p = Process(target=f, args=(child_conn,))
>>> p.start()
>>> print parent_conn.recv()   # prints "[42, None, 'hello']"
[42, None, 'hello']
>>> p.join()

This is an example from the multiprocessing docs, utilizing its Pipe abstraction. The original idea was emulating the threading model. The provisions are basic, but give you what you need to coordinate other Python interpreters. Aside from pipes, there are also queues, locks, and worker pools provided. If you're working on a multicore system with a problem that can be broken up for multiple workers, you can stop complaining about the GIL and dispatch your work out to child processes. It's a great solution and this makes it a lot easier, giving the anti-thread crowd a nice boost in validation and ease-of-convincing. That's a good thing for all of us, because it means software that takes advantage of our new machines and more people who can write that software without the problems threading always gave us. Of course, some things, like locks, can be problematic in the wrong situation, so don't think I'm calling anything a silver bullet. The point is, it improves. Nothing perfects, and I know that.
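For the worker pools mentioned above, here is a minimal sketch of handing a list of jobs to child processes with Pool. The names square and parallel_squares are made up for illustration, and the worker count of two is an arbitrary assumption. The worker function lives at module level so the child processes can find it.

```python
from multiprocessing import Pool


def square(n):
    # The work each child process performs on one item.
    return n * n


def parallel_squares(numbers, workers=2):
    # Hypothetical helper: spread the items across a small pool of
    # child processes and collect the results in the original order.
    pool = Pool(processes=workers)
    try:
        return pool.map(square, numbers)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(parallel_squares([1, 2, 3, 4]))
```

The main guard matters on platforms that start children by re-importing the script, so it is worth keeping even where fork makes it unnecessary.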

Monday, October 27, 2008

How To Review Memiary in 5 Easy Steps

This is how to review Memiary in 5 easy steps:
  1. Forget what you did yesterday. Check!
  2. Decide that all problems can be solved not just with software, but by adding new software just for that purpose. Check!
  3. Get written about on the popular ReadWriteWeb so people find you. Check!
  4. Be nifty enough to grab someone's attention when they try out the new service. Check!
  5. Surpass a plain text file in convenience, flexibility, privacy, and install base. Damn! Maybe next time.
The best way to solve a problem is to avoid needing to solve it in the first place.

How to Underestimate Google App Engine

Yeah, AppEngine has been around for a while. That doesn't make my general AppEngine article less timely. Hey, I don't just write about stuff because it's hip. In a few months, I'll announce what Google Chrome means for the web landscape. Seriously.

Although a lot of people believe Google App Engine is a very big thing and extremely important to the landscape of the web, I get the strong impression from outside the camp that it's more of a toy, and I want to address that. As with my quick review of App Engine itself, it's hard to make real calls when everything is still beta, but we're working with what we've got here. The people who see the real potential of App Engine feel it and the people who just think it's neat Just Don't Get It. What is there to get that so many developers are missing, and why would those of us who do get it think it's important enough to be evangelical about, as I'm doing right now?

Once again, making any claims or arguments in this discussion has to start by defining what we're talking about in the first place. Are we talking about the choice of first runtime (Python) and included libraries (namely, the Django templates)? Is Google's design of the Datastore API and what other service APIs they provide the important factor to praise or ignore? Perhaps the details of their hosting plan make it all worth gold, regardless of what software they put on top of all that iron? Going with my previous post on App Engine, it all comes down to the experience we need to discuss.

For Newbies This Means...

People just starting out with Python, web development, or even programming at all have a great opportunity here. They can focus on writing code in a very low-barrier environment and not worry about a lot of the details of deployment and hosting that got in the way before. The river between the Field of Writing Code and the Field of Running Your Website has been reduced to a trickle that can easily be waded across.

Few deny the benefit of the lower barrier here for the uninitiated, but there may be some misplaced valuation. There is more to benefit these individuals than staving off their eventual need to understand how to manage and deploy to their own hosting solution. New developers are in an amazing position that none of the rest of us are privy to: they may go entire careers without knowing how to set up a web server. This is wonderful. Does it mean they are incapable of it or that we'll get a flood of developers who are less able to perform? I believe we see a change in the way we learn. We're narrowing disciplines and we aren't wasting mental cycles and man hours having every gear understand the entire machine. If you write code well, then learn that and just that. Ignore the rest.

Every coder today knows something about designing, even if they aren't good at it. Anyone who has written a line of code, HTML, or CSS for the web has probably configured a database at some point. We take this as normal and expected, and just rites of passage. We fit ourselves into specialties as we gain experience in our niche, but we all have an expectation of knowing a little about a lot. We seem averse to the idea that the next generation will know how to do their job well, and not at all the jobs we know, but don't do on a regular basis.

For Experienced Hobbyists This Means...

Even when a developer reaches the point that hosting things themselves, managing servers, and configuring databases is feasible, none of it is necessarily worth the effort. All of that is time that could be spent solving the real problem at hand, with your family, or on your real job. The ability to do something doesn't negate the cost in time and effort of doing it. Being able to handle a problem when it arises does not make it meaningless to avoid the possibility of that adversity in the first place.

For Serious Ventures This Means...

With the flood of tiny little apps being launched on App Engine, the question of a large scale app being unrolled on the platform is a big one. Will anyone really build businesses hosted on App Engine? Can Google be trusted with your code? Will this platform offer the real power and opportunity needed to meet the demands of a growing business? None of these are particularly interesting to me in this context, because the deeper question is whether the benefits that work for tinkerers and hobbyists extend to "serious" work. I think they do.

Sunday, October 26, 2008

How To Test Django Template Tags - Part 2

In Part 1 I wrote about my method of testing the actual tag function for a custom Django template tag. I said I would follow it with a Part 2 on testing the rendering of the resulting Node. This is that follow-up post for any of you who were waiting for it.

Testing the rendering poses some more problems than our little tag function. Rendering is going to do a bit more, including loading the templates, resolving any variables in question, doing any processing on those results (like looking up a record from the database based on the variable value), and finally populating a new context to render the tag's template with. How do we test all of that, without actually doing any of that? This is the goal we're reaching here, for unittests. We want each test to be so specific that we test what something will do, without actually relying on those things it does. We aren't testing any of those things, just that our render() method does them.

What can we mock easily? get_template() is an easy call, so we can patch that to return a mock inside of our test. Our render() needs to load the template, do its processing, and then render the template. We can assert the rendering was done properly afterwards, thanks to the mock template.

So far...

@patch('django.template.loader.get_template')
def test_link_to_email_render(self, get_template):
    node = LinkToEmail(obfuscate=False, email=Mock())
    node.email.resolve.return_value = 'bob@company.com'

    ...

But now we get to our problem. We have to call our render method to test it, and it's expecting a Context to be passed. Normally, we want to mock things we aren't directly testing, but it doesn't always present itself as easy.

As of mock 0.4.0 the Mock class does not support subscripting, and contexts are dict-like objects. My first inclination? Just pass a dictionary. Unfortunately, the context also has an important attribute, autoescape, which needs to be inherited by the context we use inside the render() method, and dictionaries don't have this.

class ContextMock(dict):
    autoescape = object()

@patch('django.template.loader.get_template')
def test_link_to_email_render(self, get_template):
    node = LinkToEmail(obfuscate=False, email=Mock())
    node.email.resolve.return_value = 'bob@company.com'

    context = ContextMock({})

We're making progress and we're at the point where we need to actually call the render() method. Now, after its basic processing it's going to create the Context in which to render the template. For the sake of limiting what "real things" we invoke during our test, this might be something we want to mock.

class ContextMock(dict):
    autoescape = object()

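# Stacked patches are applied bottom-up, so the mock for the decorator
# closest to the function (Context) is passed in first.
@patch('django.template.loader.get_template')
@patch('django.template.Context')
def test_link_to_email_render(self, Context, get_template):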
    template = get_template.return_value
    node = LinkToEmail(obfuscate=False, email=Mock())
    node.email.resolve.return_value = 'bob@company.com'

    context = ContextMock({
        'email': Mock(),
        'obfuscate': Mock(),
    })

    node.render(context)
    template.render.assert_called_with(Context.return_value)

    args, kwargs = Context.call_args
    assert kwargs['autoescape'] is context.autoescape
    assert args[0]['email'] is context['email']
    assert args[0]['obfuscate'] is context['obfuscate']

The testing itself is pretty basic. We want to make sure the mocked context is given to the template to use in rendering and that the context properly inherits the autoescape property. We also test that the context matches the data we're giving. In the end, this was pretty easy. I actually cleaned up the code I based this on in response to writing the article and discovering cleaner ways to do it.

We need to put some thought into our tests. Often we are tempted to take shortcuts. We might write a unittest which simply calls the function, maybe checks the result, and we call it a day. We need to test different conditions under which a function is called. We need to ensure we are testing reliably, and using things like mocks helps us ensure that when our test calls the function, we know what the world looks like to that function. Mocks are our rose colored glasses.

This two parter on testing Django template tags is hopefully the start of more similar writings on specific testing targets. Many of them will likely focus on Django, for two reasons. Firstly, I think there is a lack of good testing practices in the Django world, from what I see. Secondly, I'm in the process of adding tests to a not-small codebase and these posts both document my journey and guide me.
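On the earlier point about Mock not supporting subscripting in mock 0.4.0: the mock library that later landed in the standard library as unittest.mock provides MagicMock, which does. A rough sketch of a dict-like context built that way follows; make_context is a hypothetical helper written for this illustration, not part of Django or mock, and it assumes a Python new enough to ship unittest.mock.

```python
from unittest.mock import MagicMock


def make_context(values, autoescape=True):
    # Hypothetical helper: a mock that answers context['key'] lookups
    # from a plain dict and carries an autoescape attribute.
    context = MagicMock()
    context.__getitem__.side_effect = values.__getitem__
    context.autoescape = autoescape
    return context


context = make_context({'email': 'bob@company.com', 'obfuscate': False})
assert context['email'] == 'bob@company.com'
# Unlike a dict subclass, the mock also records how it was used.
context.__getitem__.assert_called_with('email')
```

The dict subclass used in the post is arguably still the simpler choice; the mock version only earns its keep when the test needs to assert on which keys were read.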

How To Test

This is an index of different articles I've written covering techniques for testing specific software components. The number is small, but will grow in time. Initially, expect a heavier lean towards Django topics.

  • How To Test Django Template Tags Parts One and Two

Saturday, October 25, 2008

How To Test Django Template Tags - Part 1

I'm involved in a project that has gone for a long time without tests and everyone involved knows tests are rilly rilly important. There is a point where acknowledged best practices simply meet the reality of the development mind and it doesn't always work out like you'd hope. We know tests are important, but we need to resolve this ticket right freaking now. You understand. The point was reached that this just couldn't continue and the costs of fixing the same bugs over and over were way too obvious to ignore. Tests are now popping up for things we're fixing and new things we're writing. As it happens, I came across my first real need to create a custom template tag. Of course, I wanted to test it. So how do you test something that is so entrenched in the Django processing pipeline as a template tag?

Incidentally, I'm just going to assume you either know all about testing and Django template tags or you can follow along just fine.

Testing breaks down into individual functions and we try to keep them individually small, to be easier to test and less likely to be broken. The simpler something is, the more likely you actually understand it. So our custom template tag is really two functions: the tag parser and the renderer. The first is the function we actually tell Django to call when it needs to parse our tag. The second is the render() method of a Node subclass.

Here is an example of a kind of tag we might be working with. It creates a link to an email address, and optionally can obfuscate it instead. For example, the obfuscate flag might come from whether or not the page is being viewed by an anonymous user or a friend.

{% link_to_email "bob@company.com" do_obfuscate %}

The parsing comes first, which I do in LinkToEmail.tag(), a classmethod.

...
@classmethod
def tag(cls, parser, token):
    parts = token.split_contents()
    email = template.Variable(parts[1])
    try:
        obfuscate = parts[2]
    except IndexError:
        obfuscate = False
    return cls(email, obfuscate)

So we have two conditions that can happen here. Either the tag is used with just an email and we default to not obfuscating, or we are told to obfuscate by the optional second tag parameter. To simplify this post, the second parameter is simply given or not. If it's given, we obfuscate; we don't resolve it as a variable like the email.

So we need to test this function getting called when the parser gives us the different possible sets of tokens we're dealing with. Mocking comes in handy.

@patch('django.template.Variable')
def test_tag(self, Variable):
    parser = Mock()
    token = Mock(methods=['split_contents'])

    token.split_contents.return_value = ('link_to_email', 'bob@company.com')

Now we actually call the tag method to test it.

    node = LinkToEmail.tag(parser, token)

    self.assertEqual(node.email, Variable.return_value)
    assert not node.obfuscate

This is the axiom of good testing: we're only testing one thing at a time. We don't actually invoke any template processing to test our one little tag. We don't even let the function we're testing do anything else that might break, except for a pretty innocent creation of an instance of our node. That's OK, because it can't break:

def __init__(self, email, obfuscate):
    self.email = email
    self.obfuscate = obfuscate

The only things it tries to do outside of the function we're testing are calling split_contents() to parse the parameters and creating a template.Variable instance, but we mock both. We control what split_contents() returns, instead of relying on actually parsing a template. We replace template.Variable with a Mock instance, so it doesn't do anything other than record that it was called and let us test some things about how it was called and what the tag() method did with the result.

We'll also want a second test where split_contents() returns three items and we verify the obfuscate parameter was handled properly.
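That second test might look something like the sketch below. It is written against unittest.mock from newer Pythons rather than mock 0.4.0, and it uses a stand-in template namespace so it runs without Django; both are assumptions of this illustration. The tag() body is the one shown earlier.

```python
from types import SimpleNamespace
from unittest.mock import Mock, patch

# Stand-in for the django.template module (assumption: only here so the
# sketch runs without Django installed).
template = SimpleNamespace(Variable=lambda name: ('variable', name))


class LinkToEmail(object):
    def __init__(self, email, obfuscate):
        self.email = email
        self.obfuscate = obfuscate

    @classmethod
    def tag(cls, parser, token):
        parts = token.split_contents()
        email = template.Variable(parts[1])
        try:
            obfuscate = parts[2]
        except IndexError:
            obfuscate = False
        return cls(email, obfuscate)


def test_tag_with_obfuscate_flag():
    parser = Mock()
    token = Mock()
    # Three items this time: the optional second parameter is present.
    token.split_contents.return_value = (
        'link_to_email', 'bob@company.com', 'do_obfuscate')

    with patch.object(template, 'Variable') as Variable:
        node = LinkToEmail.tag(parser, token)

    Variable.assert_called_once_with('bob@company.com')
    assert node.email is Variable.return_value
    assert node.obfuscate == 'do_obfuscate'


test_tag_with_obfuscate_flag()
```

As in the first test, nothing is actually parsed; the only difference is the extra item returned by split_contents().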

In an effort to remember that I don't usually read any blog post longer than this, I'm not making this longer. So, I'll make it two parts. Tomorrow, I'll write about the larger issue of testing the template renderer, while trying to keep our test as clean as possible. It is a little trickier.

Read Part 2 on testing tag rendering.

Monday, October 20, 2008

How To Limit Your Possibilities

So, this was going to be a post about the Python module, subprocess. I'm a big fan of subprocess and there are a lot of problems that are easier to solve by using it. We reduce thirteen distinct facilities into one class. We reduce a diverse ecosystem of interfaces into one, uniform interface. The subprocess module is good, both by itself and as a symbol for what Python stands for. I won't be writing my original post about subprocess.

It isn't that subprocess isn't important, or that I don't think I can express myself properly, but that it brought up something else I should write about right now: What should I write about?

Is this a blog about software development or is this a blog about Python development? Does it need to be only one? I'm looking for my direction here. I'm not going to stretch this out, because if I do, you won't read it. And truth be told, I want you to read it. I want you to enjoy reading what I write. At heart, I am a writer. I take no shame in admitting that I love watching my graph in Google Analytics rise on every post I make. But, this is also about expressing myself, as a developer. And that is no more a Python developer than a software developer. I can't abstract everything I write.

The final answer to what my direction is? I don't have one, and that's just fine.

Saturday, October 18, 2008

How To Recognize a Bad Codebase

We learn to recognize a bad bit of code quickly as our code-fu grows. Arbitrary side-effects smell bad and crazy one-liners frustrate us. It becomes easier to identify what lines of a codebase you might want to clean up to improve the overall quality of the work.

There is a line between codebases with bad code in them and bad codebases. When do we learn to recognize this and what are the signs that the problem is far reaching, not localized? A bad codebase is an expensive codebase. It is difficult to work with and difficult to collaborate with others on. Identifying what makes a codebase bad is key to knowing when, where, and why to improve it. Improving the overall code quality reduces the overall code cost. I'm thinking about software in economic terms these days, and I'm hoping we can turn the recession to our favor by pushing the mantra Bad Code is Expensive Code.

Costs of code come from three actions. Adding features costs, fixing bugs costs, and understanding costs. Adding features is an obvious source of code cost, and every time you want to expand a product's abilities you're going to pay appropriately. Fixing bugs is both obvious and subtle. While it's obvious that you need to fix the bugs you see, it can be very subtle when costs are added that you can't actually detect (more on this later). Understanding the code, to most minds, might be entirely subtle and never obvious. New developers, existing developers moving to new areas, and users trying to understand the behavior emerging from the collection of code all need to understand these things, and the more expensive it is to understand, the less likely they will.

I feel no need to expand on the cost of adding to a codebase. What will hit us are the subtle points. Bug costs explode against the subtle misunderstandings, leading to the conclusion that a lack of understanding of the code is the single greatest source of increasing its cost. This comes through the partially obvious need to understand the code and the more subtle costs that a lack of understanding adds to fixing bugs, and even to properly expanding the feature set. The problems manifest as the actual bugs in the software.

The sign of a bad codebase is a difficult-to-debug codebase.

Now we only need to know the causes of difficult debugging to know the signs of a bad codebase.

Does the codebase lack tests? No tests mean you can't be sure a change doesn't break more than it was intended to fix. Locating the source of a problem is hugely expensive when you're manually verifying correctness, instead of via automated testing. There are fantastic techniques of binary debugging, narrowing a changeset range down to the exact change that introduced a bug. This is so expensive with manual testing that it might as well be impossible, while with tests it's one of the greatest debugging tools you could ever have at your disposal: It can automatically tell you exactly what code caused your bug. It can debug for you, but only in a codebase that started out good.
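The binary debugging idea is just a binary search over history, with the test suite answering "is it broken here?" at each step. A minimal sketch, assuming the changesets are in order, the last one is known to be broken, and once the bug appears it stays; first_bad_changeset and is_broken are names invented for this illustration.

```python
def first_bad_changeset(changesets, is_broken):
    # Assumes changesets are ordered oldest to newest, the newest one is
    # broken, and every changeset after the culprit is broken too.
    lo, hi = 0, len(changesets) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_broken(changesets[mid]):
            hi = mid        # the culprit is mid or something earlier
        else:
            lo = mid + 1    # the culprit came after mid
    return changesets[lo]


# Twenty revisions, with the bug introduced in revision 113.
revisions = list(range(100, 120))
checked = []


def is_broken(revision):
    # Stand-in for "check out this revision and run the tests".
    checked.append(revision)
    return revision >= 113


culprit = first_bad_changeset(revisions, is_broken)
```

With twenty revisions this takes a handful of test runs instead of up to twenty, which is the whole economic argument: each run is cheap only if it is automated.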

Does the codebase lack documentation? If your understanding of the code comes mostly from trial and error or asking other developers, then you lack documentation or enough clear code to self-document. Every time you add a feature or fix a bug, you're debugging more than the code, but your understanding of how it functions. Clear code, concise comments, and good documentation let you focus on the breakage of the code, and not the breakage of your understanding of its design.

Does the codebase grow or shrink? We might think a growing codebase is a universally good sign, but it's not so. A shrinking codebase can be a great sign. It means two things. Firstly, it means an increase in quality when the amount of code reduces while maintaining or increasing the value (not to be confused with cost) of the code. For example, if you can make a function clearer by finding more concise ways of expressing the same ideas, you reduce how much code there is to understand to get the same job done. A shrinking codebase also tells you that the code is understandable enough to be refactored, which is a little deceptive. The better the quality of your code, the easier it becomes to improve the quality even further.

Take this as a three point test. How do your current projects score?

Monday, August 11, 2008

How to Understand AppEngine Datastore Under the Hood: Part 2 - The Raw Datastore API

If you haven't yet read the first part of this series, feel free to start from the beginning with Part 1 - An Overview of the Underview

Every AppEngine developer is familiar with the module. In Part 1 I introduced what goes on under the hood of this API, to give everyone a better understanding of what they are taking advantage of. Now, in Part 2, I'm going to detail the actual API that is used to work with the raw entities behind our Model instances. At this time I am unsure if anything in this API is subject to change, but I doubt anything is subject to drastic flux and I'm fairly confident everything here is safe for actual use, as much as anything else in AppEngine.

Module: google.appengine.api.datastore

Our main focus here is the Entity class. Everything supports it, from the Get, Put, and Delete functions to the Query class. Their uses are obvious. As previously explained, each entity is essentially a property bag and will take any given properties to the datastore for storage, query, and retrieval. Now, the entity is much more flat than its abstract cousin, the Model. It stores and retrieves the values, and then its job is done. It will tell you the key of a reference, but it's up to you to request the actual entity based on that key.

Here is a full round trip for creating, storing, querying, and retrieving an entity at this low-level API.


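from google.appengine.api import datastore

# Create a raw entity of kind 'test' and set a property, dictionary-style
e = datastore.Entity(kind='test')
e['name'] = 'My Test Entity'
datastore.Put([e]) # The list must be of entities of the same kind only

# Query.Get(limit) returns a list of results, so take the first one
also_e = datastore.Query(kind='test').Get(1)[0]

# Same stored data, but a separate in-memory object
assert e == also_e
assert e is not also_e

datastore.Delete(e)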


This is a very basic overview. We'll look at the details very soon. The entity is used very much like a dictionary, with value types restricted to datastore-compatible types of str, unicode, int, float, datastore.Key, or lists of one of these types.

One detail to note is that there are no provisions in place to ensure that Entities are cached or that when loading an entity, an existing instance with the same key is reused. This means that two entities (or Models) could represent the same persisted record, and changes to one or both that conflict will meet a race condition. This is something I would like to see change in the overall Datastore API. For now, keep it in mind and consider a cache of your own.
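
A cache of your own can be as small as a dictionary keyed by entity key. The following is a minimal sketch, not part of the datastore API: `EntityCache` and `load_from_store` are hypothetical names, and the loader stands in for whatever actually fetches the entity (in a real application it would presumably wrap `datastore.Get`).

```python
class EntityCache(object):
    """Hand back the same in-memory object every time a key is requested."""

    def __init__(self, loader):
        self._loader = loader   # callable: key -> freshly loaded entity
        self._by_key = {}

    def get(self, key):
        # Only hit the loader the first time a key is seen.
        if key not in self._by_key:
            self._by_key[key] = self._loader(key)
        return self._by_key[key]

    def forget(self, key):
        # Drop a cached instance, for example after deleting the entity.
        self._by_key.pop(key, None)


def load_from_store(key):
    # Stand-in loader: returns a new dict on every call, just as two
    # separate loads of the same entity return two separate objects.
    return {'key': key, 'name': 'My Test Entity'}


cache = EntityCache(load_from_store)
first = cache.get('k1')
second = cache.get('k1')   # same object as `first`, not a second copy
```

Because both names point at one object, a change made through `first` is visible through `second`, which avoids the conflicting-copies problem within a single request. It does nothing for conflicts between separate requests or processes.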

For complete API details, look in the AppEngine SDK's copy of this module. It is not the same as what runs on the AppEngine servers, but the API matches for all the public functions and classes.

While researching this I came across an interesting detail about the keys as represented by the datastore library. Every key is basically a trio of the Kind, ID, and the application identifier. Most of us are familiar just with the hash-looking form of the entity key and know that entities have numeric IDs, but we shouldn't rely on them as strongly as the keys. A little investigation into the source reveals that every key is actually a Protocol Buffer message, and that the hash-like key we see is actually the encoded PB message in url-safe base64, containing all three components. The keys are actually full paths to individual entities, mapped by application, kind, and ID. This intrigued me enough to attempt loading an entity by key giving another application name (of my own), to which I received an interesting error "BadRequestError: untrusted app shell cannot access app foo's data". The interesting thing about the error is that it doesn't tell us one application cannot access another's data, but that this particular application can't access this specific other application's data. Does this mean a future feature will allow it? The possibilities here are very exciting.
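
To show why such a key looks like a hash but really isn't one, here is a toy sketch of the general idea. It is not the datastore's real encoding: JSON stands in for the Protocol Buffer message, and `encode_key` and `decode_key` are hypothetical helpers invented for this example.

```python
import base64
import json


def encode_key(app, kind, entity_id):
    # Serialize the three components, then make the result URL-safe.
    # JSON is only a stand-in here; the real keys use Protocol Buffers.
    payload = json.dumps([app, kind, entity_id]).encode('utf-8')
    return base64.urlsafe_b64encode(payload).decode('ascii')


def decode_key(encoded):
    # Reverse the steps: base64-decode, then parse the three components.
    payload = base64.urlsafe_b64decode(encoded.encode('ascii'))
    app, kind, entity_id = json.loads(payload.decode('utf-8'))
    return app, kind, entity_id


opaque = encode_key('shell', 'test', 42)
```

The string in `opaque` looks opaque, yet anyone can decode it back into the application, kind, and ID. Nothing is hidden, only packed.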

Please vote on Reddit and/or Digg this article.

How to Understand AppEngine Datastore Under the Hood: Part 1 - An Overview of the Underview

There are a lot of wrong perceptions about the datastore in Google AppEngine. People both familiar with AppEngine and foreign to it don't really understand what the datastore is. There is a deeper system underneath the nice API we are given. Understanding the guts can help us understand the skin. We may also find there are times when we must shed the skin for new clothing.

The biggest misconception about the datastore is the assumption that "kinds" are anything like "tables". You could use a set of entity kinds similar to the way you would use a set of tables, but they simply are different beasts, entirely. A table controls a strict requirement on the structure of its rows. Every entity, on the other hand, is free to hold any properties of allowed types. The published Model API is all an abstraction provided to give us a nice interface on top of an otherwise much looser foundation.

Many people would be very surprised to learn that a given kind doesn't actually require anything of its entities, but from the right angle it makes perfect sense. Meeting the kind of scalability requirements the datastore is designed for places interesting limitations. Schema changes can't get in the way when you could have such a large dataset that no operation can ever effectively operate on the entire set at once. This means what was a simple matter of ALTER TABLE in SQL is practically impossible in this new world, as the logistics behind updating and migrating potentially millions of entities to a new schema grow beyond the acceptable resources to give to a schema change. However, if we allow flexibility, we can simply start creating new entities in the updated form and make sure that when we load one of the previous versions, we're prepared to use or upgrade it on the spot. For this and other reasons, allowing all entities to be free-form is the simplest direction to provide the foundation we need.
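
Upgrading on the spot can be sketched with plain dictionaries standing in for entities. Everything specific here is an assumption made for illustration: the `schema_version` property, the `upgrade` function, and the example change of splitting `name` into two properties.

```python
CURRENT_VERSION = 2


def upgrade(entity):
    """Bring a free-form entity (a plain dict here) up to the current shape."""
    version = entity.get('schema_version', 1)
    if version < 2:
        # Hypothetical change: version 1 stored a single 'name' property,
        # version 2 splits it into 'first_name' and 'last_name'.
        first, _, last = entity.pop('name', '').partition(' ')
        entity['first_name'] = first
        entity['last_name'] = last
    entity['schema_version'] = CURRENT_VERSION
    return entity


# An entity written before the change and one written after it.
old = {'name': 'Ada Lovelace'}
new = {'schema_version': 2, 'first_name': 'Grace', 'last_name': 'Hopper'}
```

Calling `upgrade` wherever entities are loaded means old and new entities can live side by side, and each old one is migrated only when it is actually touched.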

With a better understanding of our foundation we can better understand the abstractions in google.appengine.ext.db, with the Model subclasses most AppEngine developers know. I've seen quite a few people asking about migrating to changes in their db.Model subclasses, not understanding why or how their existing entities will change to match the newly defined properties. The behavior and how to work with it are a lot easier to understand when you view the individual entities as independent property bags, and not rows following a defined column schematic. We can also come to understand db.Expando as closer to the wire, so to speak, than its stricter Model cousin.

Perhaps a more exciting gain from this different view of the datastore is that we aren't bound by the published Model-centric API at all. In fact, we can access the underlying Entity class directly, providing us with a simple, persisted mapping object, without anything building on top of it. If we need some structure to our persistence, but the provided API simply isn't to taste, then an understanding of this layer gives us what we need to build our own variant datastore API. We may even use this understanding to provide implementations compatible with previous ORM solutions, but powered by the entities and BigTable, rather than traditional SQL databases. The possibilities open up with our deeper understanding.
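
As a rough sketch of what building your own structure on top of a property bag might look like, here is a tiny typed wrapper around a dictionary. The `Record` and `Person` classes are hypothetical, and the plain dict is only a stand-in for a raw entity.

```python
class Record(object):
    """Tiny structured view over a free-form property bag (a dict here)."""

    fields = {}   # property name -> required Python type

    def __init__(self, bag=None, **values):
        self.bag = {} if bag is None else bag
        for name, value in values.items():
            self[name] = value

    def __setitem__(self, name, value):
        # Enforce structure in our own layer; the bag itself accepts anything.
        expected = self.fields.get(name)
        if expected is None:
            raise KeyError('%s has no field %r' % (type(self).__name__, name))
        if not isinstance(value, expected):
            raise TypeError('%r must be %s' % (name, expected.__name__))
        self.bag[name] = value

    def __getitem__(self, name):
        return self.bag[name]


class Person(Record):
    fields = {'name': str, 'age': int}


p = Person(name='Ada', age=36)
```

All of the rules live in the wrapper, so a different wrapper with different rules could sit over the very same bag.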

The more variation we have in what everyone is doing on AppEngine, the more value it has to all of us. Take this information and do something exciting. Share it and we'll all reap the benefits.

Look for Part 2: The Raw Datastore API

Please vote on Reddit and/or Digg this article.

Saturday, August 02, 2008

How to Bubble the Good of Twitter to the Top

The aftermath of the quakes in California saw a lot of talk about Twitter getting the word spread, from the trenches, very quickly. Chris O'Brien heralded it as a sign that NextNewsRoom is doing something right. A lot of people were talking about it. Twitter carried the news before any news agency. First is one thing, but quality control is something else. The flood of messages reached a point that it's almost assured no one read every quake tweet that was sent. There were just too many of them. Can anyone imagine the flood that would have been seen if Twitter existed and was popular on the morning of 9/11? It would have been maddening.

We can take this situation and ask two questions. How can we form something better from the flood of tiny messages? Do we even want to? Can we find some way of filtering both relevant and "good" posts and could we pull some larger picture from all the little pieces? Of course, doing so would take resources, and those are either iron, eyes, or time. What can we spare that is worth the result? Maybe at any cost, it's just not worth the result. Does this new source of news simply fill a gap the old misses, not threatening the established zones?

I'm really interested in what kind of system we could implement to condense a stream of tweets into something larger and more thought out, but it poses a lot of problems. It would either take a lot of processing power to analyze and merge a stream or a lot of people doing it manually. Either way has costs, and reducing either resource would lead to the results taking too long to be relevant.

What would any system like this filter out? When you could have hundreds or thousands of people reporting on an event at the same time, you could get a lot of redundancy, so you'd want to filter that. If twenty people break the same news at a trade show, we only need the fact once. Can language processing do this? Human eyes would probably have even more trouble. What humans could do is read the stream, through filters, and summarize it as they read. Maybe retweets need a bigger status? This could repeat up the ranks of relevancy and importance.

Some solution to this perceived problem may or may not be possible, but in the end, we may not care. Twitter certainly isn't the only end-all, be-all communication mechanism, despite what some enthusiasts may seem to believe. At the end of the day, its uses are limited, and limits don't have to be a bad thing.

Sunday, July 27, 2008

How to Delay First Impressions of Google App Engine

Most of the buzz about the App Engine has died down, except among the developers actually using the platform. When the first public announcements were made, I was a part of the original group of developers first given access. This privilege was wasted. I did nothing with it. This has changed, which is a topic for a different post. I thought I'd take a moment to make my mark on the "What I think about Google App Engine" wall.

A proper review is difficult, for a number of reasons. Namely, there is a very vague understanding of the difference between this "Preview Release" and what we'll have when it launches with all official status and commercial potential. We can only hope that any big changes won't interfere with our existing applications. If that does happen then existing players might turn to AppDrop for help, although any mass migration is highly doubtful. In any case, what everyone is talking about is what App Engine is now, not what it might turn into at some unknown point. (As a side note, it would be really great if that point was less unknown!)

What is a review of App Engine really about? Their choice of included libraries, as well as the promotion of Django templates into such a de facto standard (already having Guido's blessing) is one thing to talk about. The differences between BigTable and traditional relational databases are a big topic of debate. We can discuss the development server and deployment method, as well as the control panel available to us, if we want to focus away from the actual programming for a moment. In the end, what we really need to care about is the thing Google set out to solve: experience.

They changed the experience in two meanings of the word. A less experienced developer can now make a successful launch. All developers can have a much better experience. The experience, both in history and present, is of the full cycle of development. App Engine isn't doing anything for writing code. That is all low-bar when you look at the tools they use that were already available and in wide use, like Python and Django. The value-added spice of App Engine is what you do when you aren't writing code.

People complain about the choice of language, if they aren't already Python lovers. Some people just don't like the choice of included template system. There are complaints about BigTable in App Engine and its problems compared to a "normal" database. These debates are all bunk. People are complaining about the very things that don't matter one bit. In the end, we might develop a hundred libraries for doing the same thing, because we all think we know the slightly better way to do it, but App Engine exposes our primary flaw: we're developers. We're great at solving problems, building solutions, and writing code. We run out of steam when it comes to doing something with it. Our problem solving desire is a largely academic one.

If a software developer solved world hunger, he would blog about it and move on to the next project.

App Engine would read that post and actually go out and implement the solution the developer forgot about. It does a great job at what it solves, and I love using it. What I'm doing with it and why not enough developers care about it are both stories for other posts.

Friday, July 25, 2008

How to Defend Twitter's Spam-Fighting Follow Throttling

So, the twittersphere is in an uproar about those dropped follower counts. Is everyone more afraid of the lost high-count vanity or that so many people follow without thought that we might never regain many of the legitimate follows? Either way, there is a lot of complaining about the apparent service mishap from the company that we shell out none of our hard-earned dollars to. The mistake is one thing, but I see quite a bit of sentiment against the very method they undertook to combat the spam problem. I challenge that claim, because I think they're on the right track limiting follows, and I'm going to explain why.

For Popular People This Means...


You're popular by how many people follow you, not the other way around. You can go on your way, with thousands of people hanging on your every toilet flush, and Twitter can still limit those damn spammers from following you along with ten-thousand other ego filled, txt-fingered masters of the twitterverse.

For "Community Managers" This Means...

Now, ReadWriteWeb makes a claim that so-called "community managers" are harmed by these changes. Examples include Comcast, JetBlue, and Pandora, who use Twitter to keep in touch with their customer base. Now, kudos to some random guy at each of these corporations signing up under his employer's name. However, a reasonable use case falling under this category of twitter account just shouldn't be worried with how many people they can follow. Just like the populars, it's all about how many people are listening to you, because what those peasants have to say doesn't even register to you.

No one is reading closely to a timeline filled by thousands of follows!

For Spammers This Means...

Don't follow thousands of people when only a couple dozen morons fall for your bullshit.

For Twitter This Means...

Discourage the need for any legitimate uses of massive follow lists, you blue bird lovers. The value of following anyone breaks down soon after hitting three digits, so figure out why people are doing that in the first place. There is a small set of reasons that are even conceivably plausible.

Heavy twitter users who have migrated over to IM or TXT based usage may have discovered a nearly hidden feature about your follow lists: it is two tiered. That's right, you have important people and everyone else, but this is only revealed if you start to use IM or TXT and filter the updates you get, likely to reduce phone charges. I found a different usage, because limiting the updates was great, even when I have unlimited txt on my plan. When I started using desktop clients more often than my phone, I wanted that back. Bring this to the forefront and let us have our active follows and our passive follows. We should probably only care to see our passive follows on some broad timeline, versus our narrow timeline. While you're at it twitter, let's make this taggable, but that's a whole other story.

Aside from this, the only other reason I see is the number of things you can't do on twitter without explicitly following someone, or being followed by them. Open up direct messages (optionally), even without follows. Let us do more without following people. Of course, the addition of passive follows, as I mentioned previously, would do just as well to fix this.

The number one benefit these changes would have would come from expanding the notification options to ignore the passive follows. That is, don't tell me if someone is following me passively, because they don't really care about me, so I don't really care about them. They can put whatever restrictions they want on active follows, within reason, and we can all still keep track of thousands of twitterers, without looking like spammers. All you need to do is attack the number one reason spammers mass-follow: they're abusing twitter to send plain old fashioned e-mail spam, with a very crappy costume.

In The End

Summing up with the bold lines:
  • No one is reading closely to a timeline filled by thousands of follows!
  • The value of following anyone breaks down soon after hitting three digits
  • they're abusing twitter to send plain old fashioned e-mail spam, with a very crappy costume
Concluding easily that automated checks and limits on following lots of people are fine, because only spammers have a real reason to do it.

Monday, July 07, 2008

How To Host Every Language in Every Language

Atul writes:
Last week, Scott Petersen from Adobe gave a talk at Mozilla on a toolchain he’s been creating—soon to be open-sourced—that allows C code to be targeted to the Tamarin virtual machine. Aside from being a really interesting piece of technology, I thought its implications for the web were pretty impressive.
The next steps Scott took are the most interesting, because he starts using this to build stock Python and Ruby runtimes that are hosted on Tamarin. This is a fascinating solution to one of our biggest itches: more languages on more platforms.

Imagining the sheer number of languages (most) this opens up to running on any Tamarin run-time (Flash and Firefox 4) is mind boggling. Go on, let your mind be boggled. Combine this with the basic idea being targeted to other platforms and you've got a lot of possibilities. Target other bytecode, like Java or .Net, and you open up more possible cross-builds than you can count. Platforms begin to fade on the borders.

At the same time, Mozilla is already busy learning to convert DLR bytecode to Tamarin bytecode, so I guess Java is the only bytecode left anyone (maybe) cares enough about getting to run on it. Down the road, could this mean Flash (and Firefox 4) will be the only platform supporting, essentially, any and all languages and libraries, in some form or another? Impressive.

Not only would Tamarin support Python, but potentially all major implementations of the language. Choices are great.

Of course, the same will be done for every platform, and once again the pattern repeats itself. Vendors will fight over control of the platform, just to be made irrelevant by one layer above them.

Saturday, February 23, 2008

How To Destroy the Handheld Game Dominator

I couldn't even pluralize "dominator" because Nintendo won't let Sony in the door. Nintendo has the handheld game market locked tighter than Fort Knox. This won't be the first place to call out the "Apple is entering the handheld gaming market" flag, but I do think I can lay out the steps they would (or should) take that can lend credibility to the idea. If nothing else, I hope someone there is reading.

Apple can't do this alone, but they have a very good friend in another company with a name that starts with A: Adobe. The pair would be the ultimate contender into the very tight market and the approach is amazingly simple. Flash is coming to the iPhone and iTouch, and I'll hope they make bookmarking Flash games easy and give us the option to "fullscreen" them on the devices. Explicit offline caching wouldn't hurt either. The next step is obviously to allow flash apps and games to be installed directly for quick access and immediately the devices have an interactive media platform with an amazingly rich community of developers and user support.

Do we even need a separate device? The only thing needed would be upgrades to the lines that would probably happen anyway. More memory, speed, and storage are always nice. The touch could spawn some more sensitive actuators and allow some different control types.

The only mistake they could make here is to require any physical medium.

Outside of the physical aspects the entire approach just hinges on how they market the devices in the coming years and if they can price a model competing against the DS and PSP.

Thursday, January 31, 2008

How To Perfect the Keyboard and Mouse

This is my dream so don't squash it for sounding trivial. This is my window to the world, the tools of my job, and the outlet of my creativity! I want the Perfect Keyboard and the Perfect Mouse.

  • Operate as NiMH battery chargers when plugged into USB for power
  • Lighted keyboard to type in darker conditions. Must be adjustable
  • Must be configurable to PC and Mac layouts
  • Would be handy to configure to DVORAK layout, as well
  • Retractable USB cables
  • Keyboard functions as USB hub, even wirelessly
  • Scroll ball instead of a scroll wheel. I do love my Mighty Mouse
  • Weights for mouse, with storage in keyboard
  • Trackball (or even a nub) in the keyboard to lean back and browse with
  • Splittable keyboard with locking adjustments
I am going to spend the rest of my life replacing perfectly good keyboard and mouse combos if no one solves this simple list of requirements.

The adjustable keyboard is probably the hardest part, combined with the other requirements I want fit into it. I'd like to pull the keyboard apart at a split, adjust the angle, and lock it into positions. The numpad would be handy to detach or just adjust, but it doesn't bother me as much.

I use a cheap Micro Innovations set right now, and they serve me well. I use the new slim Apple keyboard and a Mighty Mouse at work. Everyone else at the office hates the Mighty Mouse, except one girl upstairs who I do not know. I have taken my place as maintainer of these holy relics, so that I will always have them to love upon.

Looking at the current market of adjustable keyboards gives my wallet a sharp pain in the money fold. Not that I need permission from the little lady to make such a purchase, but she's said there is no problem. I think she just wants me to bitch a little less about hand cramps and joint pain. No trouble in the wrists that would point to something serious, so don't worry. I always use a wrist pad and meticulously adjust my keyboard, pad, and chair to keep the arms at the best position. I'm a stickler for ergonomics, and it's the arms and hands that get the bulk of that attention.

Stay comfortable, people.

EDIT September 9, 2009 I moved onto a Logitech EX110 set a couple months ago and the feet already cracked in half and broke off, simultaneously. I am getting by with a flat keyboard for the moment and have made the decision to get a Kinesis Freestyle, but I haven't decided on the details yet. Has anyone tried these? Can anyone recommend good setups with them? Alternatively, can anyone suggest other makers of two-part keyboards, maybe even with wireless models?

Saturday, January 26, 2008

How To Expose the Guts of Twitter (A post about Starling)

Twitter does a lot of queuing. I mean, a lot. We know other people have a need for some good queuing, so much that Amazon even released Amazon Queue Service, not so long ago. There has never really been a common queue server, and maybe that is because it's so simple that no one has really had the need to push one hard into the public eye. At least, as public as our eyes are.

Enter Starling, the internal queue system of Twitter, recently released to the public. Written in Ruby, and I don't even mind! Pointed there by my ever-pointing buddy, David Novakovic, Starling does nothing absolutely remarkable, but someone has to get the light. What is interesting is their choices. Starling uses the MemCached protocol, so your clients are probably already prepared to use it, they just need to treat the queues a little differently from the mappings. The typical MemCached get-operation now removes the item from the queue. The keys function as identifiers for the queues. I don't think it could have been simpler. I'm planning to look at setting up Starling for testing on my linux servers and my Macbook, and to try and find something interesting in the way of using it. I have some plans I could utilize it in, and maybe bring it to the office later.
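
The change in semantics is easy to see with an in-memory stand-in. This sketch does not talk to Starling or speak the MemCached wire protocol; `QueueStore` is a hypothetical class that only mimics the set/get behavior described above, where a set appends to the named queue and a get removes the oldest item.

```python
from collections import deque


class QueueStore(object):
    """In-memory stand-in: memcached-style set/get with queue semantics."""

    def __init__(self):
        self._queues = {}   # queue name (the "key") -> deque of items

    def set(self, key, value):
        # A set appends to the named queue instead of overwriting a value.
        self._queues.setdefault(key, deque()).append(value)
        return True

    def get(self, key):
        # A get removes and returns the oldest item, or None when empty.
        queue = self._queues.get(key)
        if not queue:
            return None
        return queue.popleft()


store = QueueStore()
store.set('jobs', 'resize image 1')
store.set('jobs', 'resize image 2')
```

Two sets followed by three gets on `'jobs'` would hand back the first item, then the second, then `None`. An ordinary cache would return the second value every time.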

Now that Starling has some attention and gives us something of a standard for queue protocols (I love reusing protocols!), if anyone has different needs or just wants to scratch an itch in their language of choice, let's make the smart move and take the same protocol route. Queues may be a small thing, but it's the small things we really need to agree on more. Anyone up for a MemCached-protocol to Amazon Queue Service bridge?

Tuesday, January 22, 2008

How To Blog For Choice

So I vowed to write more and blog more and the year has plenty of time left in it, so don't worry about me. The past month has been amazing, and that's why I haven't had the time to write. I'll be scheduling it soon, so a resurgence in content is imminent. I try to keep on my tech topic, but I do far too little activism on the things I believe in, and it's high time I changed that. Don't worry, politics will not become a staple of this blog, but I'm likely reviving my personal blog. But, no one reads that, so how vocal can I be about something with no readers?

Today is Blog for Choice Day

We're supposed to be a logical bunch. We spend our careers thinking about things and being intelligent. When you think about something long enough, there are obvious realizations that everyone comes to. People that think about tracking version changes all realize you need good version control. Any group of people trying to coordinate understands the need for issue trackers. Software is designed in chaos, but a small bit of thought leads us all to the same conclusions.

Do we reach the same ends outside our industry? Is thought universal? There are a lot of things in the world that people take for granted, and those who think about them for some time come to the same conclusions, which the non-thinkers just don't understand. There is no secret that the more educated a person is, the more likely they are to be atheist. Health conscious members of society are more likely to wander away from the steaks and mix up some soy-shakes.

Is it any different with the right for a family to decide when they grow? I can't see the logic in forcing every mistake to live through a strained budget, a broken family, or to drag down the life of an aspiring teenager. Life is a precious miracle and the biggest way we can waste it is to let it find its way in the inopportune spots of the world. We don't do the gift of consciousness any favors by making it deal with a life that didn't have room for it.

The wrong-right won't even budge to save the life of a mother that could go on to birth more, healthier children. They think it's somehow more humane to bring broken babies into broken families, and take teenagers who made mistakes out of school instead of giving them a second chance to build a life with a family the right way. They can call choice supporters baby killers, and I have a friend who does just that, but all they support is inhumanely putting babies into doomed lives.

Think about it.

I was an accident.

Friday, January 04, 2008

How To Walk Backwards to HTML 5: Follow Up

This is a follow up to my first How To Walk Backwards to HTML 5 article. The one comment I got in the first twenty-four hours pointed out a lack of explanation on my part for a few things. I know about the current HTML 5 specification. I've read most of it, reviewed plans and others' reactions, etc. My views on HTML 5 are not out of a lack of knowledge, but are a reaction to my knowledge of HTML 5.

I think what HTML 5 looks to be shaping into is the wrong direction.

The creation of the layout specific tags is a response to what was coined "div hell", but it isn't the right solution. We all have different needs for what we need HTML to represent and it gets abused into representing everything from resumes to tetris clones. Abandon schemas and doctypes and just let us write the tags that have meaning for our cases. Hey, we can do that with XML namespaces! Give us the tools to discover formatting and layout rules and control the pages intelligently.

If you need an article tag, fine. Use it and have fun, but maybe it just doesn't do anything for me.

The need to post this article was rekindled when my colleagues spent the better part of twenty minutes debating the default rendering properties of the paragraph element. Can you imagine when we start adding even more layout and content specific tags to the new spec? The result is going to be disastrously inconsistent, because there is just more to be inconsistent about.

Thursday, January 03, 2008

How To Start 2008

So this is my obligatory start-of-2008 post. I know I haven't written much lately, but work was busy and then there were the holidays, and I'm making a commitment to really revitalize my blog. Part of that may be that my adsense, after years of blogging, has only hit half the required minimum balance for payment. But, I'm not in it for the money. Not unless there was a lot of money in it!

For 2007 this means... that I need to wrap up the last year

We moved back to North Carolina when the Pennsylvania winter cleared up, and I'll admit that the summer was a bit rough. I lost my most steady contract when funding went sour, shortly before the move, but you know what? Staying home with the family was great without a lot of work to be done, and we got by OK. I enjoyed the time.

After a while, I started CharPy, the Charlotte Python Group. We're still small and growing, but the first meeting gave me a lead on a full-time position at SocialServe.com, where I'm now happily employed. We're bringing the group back up, after a holiday hiatus, and looking forward to a year of expanding in any way we can.

For New Years Resolutions this means... I have some promises to make

I'm going with the usual diet and exercise. I quit smoking years ago and I've very recently become a vegetarian, so the time is ripe to really hit the health bandwagon with an exercise routine, however small but consistent. Dropping 10 pounds over the holidays is pretty encouraging.

More important to me is my new dedication to do something non-code and creative at least once a day. So, that means I'm either writing, drawing, painting, or practicing the guitar, my lost skill. I'm actually scheduling some time to read fiction, as my non-fiction reading is just consuming all my literary life.

For Blogging this means... I need to expand and focus

I'll be posting my writings on Spilt Mind and writing about what I read and watch on Mental Outlash, two blogs I barely touch. I'll try to write some personal things on my original blog, but I don't even think anyone has it in a reader or ever looks at it. I'll likely do something like cross-blog linking, so be aware that posts may now contain links to recent posts in my other blogs.

For Projects this means... one of my back-burners needs to cook

Keeping all of my own projects on hiatus can't continue. I need to take one of them off, but probably not until the summer. I'm guessing my domain, jigspace.com, is going to get used soon, so look out for it.

For 2008 this means... unoriginally, I have predictions

Everyone else is doing it!
  • Kindle will be called a success, but won't be. Kindle 2 will be slimmer, cheaper, and the pricing model will drop
  • Android phones will have terrible advertising. iPhone will continue to be sexier, regardless of any real comparison
  • IronPython will adopt the standard library and I've got my fingers crossed for this one
  • At least one major PC gaming title will release with simultaneous Mac, Linux, and Windows support
  • Glow technology will find its way into backlights
  • Cross-browser extension platforms will emerge so we can break the chains of Firefox
  • Nintendo will open up the WiiStore for indie game developers
  • Apple will open the iPhone and iTouch to compete with Android
  • Someone will release an Android compatibility layer, if partial, for iPhone
  • An over-the-counter recreational drug will be announced
Good year everyone!
I write here about programming, how to program better, things I think are neat and are related to programming. I might write other things at my personal website.

I am happily employed by the excellent Caktus Group, located in beautiful and friendly Carrboro, NC, where I work with Python, Django, and Javascript.
