Skip to main content

How To Turn Web Development Around (Part 3)

When I complained about the problem, I promptly outlined some ideas about solving it, vaguely. Now, I want to narrow that outline into systems I actually use. I do most of my work with Django, some hobby time is spent with App Engine and Twisted, and I enjoy Amazon Web Services, so I'm thinking from these perspectives when I approach this. Parts one and two were broad, but some of this might only apply to fewer of you. Either ignore those or adapt to whatever you use.

Django's cache layer sucks. Simply stated and simply true. Any time I decide I can cache something, I should ask myself if I could have built it before I even had the request in the first place. Doing that with the template caches simply isn't possible. It should be possible and it should be the first path you take, instead of forcing us to go out of our way to do the better thing. Anything I might want to cache, I also might want to be sure I'm not doing in more place than once, and forcing them inline in my templates does not help this. The template caches imply a copy-and-paste method of reuse when a cached portion is used in more place than one. When I define a cache block, I name it and I specify a set of keys. This is exactly the information, that when changed, I should just generate that block as a static snippet to be inserted. If it weren't for the lacking in reuse mechanics, I would advocate parsing all your templates for cache blocks and pre-generating them. Instead, we need to pull the cached contents out of the normal templates and use the existing names and keys to find the generated snippets.

On the more basic level, there are some abstractions that need to be injected into Django-proper to really be useful, by means of what they would standardize. We have no current means of standardizing our cache keys in a way that different applications can cooperate about what data is where and how to get it. Even the types that are taken for granted in Django have no useful standards. If they did, I would be able to drop a QuerySet object into the cache in a way that another query can find to reuse. And, when memcached is by far the most likely cache backend to be used, we would be providing a mechanism that abstracted away its limitations in entry size, allowing us to trust dropping our QuerySet in safely.

Denormalization should be normal. I have revision tracking in a document system, and from a normalization perspective it makes sense that each version hold a foreign key to either its previous or next version, but not both. From a practicality perspective, if I have one version I want to know the previous and next versions without doing a new query. Our Resources might offer a solution, by giving us some place outside of our model to allow denormalized data. I could generate a record of my documents with all the revision information queried and built and stored in one flat record, while keeping my base model clean.

Queuing work should be as accessible as doing work. There is little or nothing inhibiting a developing from dropping one little query or action into an existing operation. I've recently built a weighted sort to replace our basic date and time based order for posts. This means generating scores for all the posts and updating those when posts or votes change. Now, whenever we calculate scores we account for the age of all votes and the relative scores and age of all posts and votes together. In other words, this is something I'd prefer not to add to the cost of a user actually posting content or voting on something. It would have been extremely easy for me to call one generate_scores() function, but it takes thought, planning, and infrastructure to have this done after the request is handled.

Borrowing from existing Python canon makes sense, so I think multiprocessing is a candidate for use here, in one form or another. multiprocessing.Pool.apply_async() without a result returned fits the bill for an interface to call some function at another time, possibly in another process. Any function that works when passed through multiprocessing into another process should also work when queued up for execution at some later time, so borrowing here reusing existing semantics developers should be familiar with.


Comments

mike bayer said…
Make sure you consider Beaker, either in part or whole, before inventing your own caching layer. This is the caching framework used by Pylons and Turbogears 2.

Popular posts from this blog

Respect and Code Reviews

Code Reviews in a development team only function best, or possible at all, when everyone approaches them with respect. That’s something I’ve usually taken for granted because I’ve had the opportunity to work with amazing developers who shine not just in their technical skills but in their interpersonal skills on a team. That isn’t always the case, so I’m going to put into words something that often exists just in assumptions.
You have to respect your code. This is first only because the nature and intent of code reviews are to safeguard the quality of your code, so even having code reviews demonstrates a baseline of respect for that code. But, maybe not everyone on the team has the same level of respect or entered a team with existing review traditions that they aren’t acquainted with.
There can be culture shock when you enter a team that’s really heavy on code reviews, but also if you enter a team or interact with a colleague who doesn’t share that level of respect for the process or…

CARDIAC: The Cardboard Computer

I am just so excited about this.


CARDIAC. The Cardboard Computer. How cool is that? This piece of history is amazing and better than that: it is extremely accessible. This fantastic design was built in 1969 by David Hagelbarger at Bell Labs to explain what computers were to those who would otherwise have no exposure to them. Miraculously, the CARDIAC (CARDboard Interactive Aid to Computation) was able to actually function as a slow and rudimentary computer. 
One of the most fascinating aspects of this gem is that at the time of its publication the scope it was able to demonstrate was actually useful in explaining what a computer was. Could you imagine trying to explain computers today with anything close to the CARDIAC?

It had 100 memory locations and only ten instructions. The memory held signed 3-digit numbers (-999 through 999) and instructions could be encoded such that the first digit was the instruction and the second two digits were the address of memory to operate on. The only re…

How To Care If BSD, MIT, or GPL Licenses Are Used

The two recent posts about some individuals' choice of GPL versus others' preference for BSD and MIT style licensing has caused a lot of debate and response. I've seen everything as an interesting combination of very important topics being taken far too seriously and far too personally. All involved need to take a few steps back.

For the uninitiated and as a clarifier for the initiated, we're dealing with (basically) three categories of licensing when someone releases software (and/or its code):
Closed Source. Easiest to explain, because you just get nothing.GPL. If you get the software, you get the source code, you get to change it, and anything you combine it with must be under the same terms.MIT and BSD. If you get the software, you might get the source code, you get to change it, and you have no obligations about anything else you combine it with.The situation gets stickier when we look at those combinations and the transitions between them.

Use GPL code with Closed S…