Monday, June 25, 2007

Validation for MySpace Hating

The hating of MySpace is not unique, but any professional-seeming information to back it up is rare. The findings are probably dead on with what I would expect, and I don't even see Facebook or know anyone on it. I do see the people on MySpace and the kind of people that I definitely do not see there. Social classes in the United States are always interesting, because there is a different dynamic than the expected class lines. Although income certainly comes into play, it is not the definitive factor.



Reading about the division reaffirms my desire to use Facebook. However, I don't know anyone on Facebook. All of my friends are on MySpace, including those running their own businesses, those with families, any of the younger members of my family, and the ones making far more money than I. The social divisions that mark MySpace are also what tie me to it.



apophenia: viewing American class divisions through Facebook and MySpace

Saturday, June 23, 2007

Factual Google

Google is building fact mining into the search engine. Coming across a little article over at The Best Article Every Day, I got wind that Google Spreadsheets can look up certain statistical and financial information. You can have formulas that include things like the latest Microsoft stock quote or the boiling point of sodium. This seemed interesting, so I played with it a bit, but quickly changing the formula to experiment was awkward. "Can I just Google this stuff?" I thought. Yes. Read on for my findings.

The documentation for the Spreadsheet function, GoogleLookup, talks about entities and attributes. "Pluto" is an entity and "mass" is an attribute. As it turns out, you can just search for "mass of Pluto" or "birth rate in Canada" and are presented with a new type of search result.

We can see that Google seems to be pulling facts from the websites they index. They are structuring the information into subjects and properties about them. The feature has some large holes of missing functionality. "boiling point of sodium" gives a fact, but the system fails to parse any of the hits for "boiling point of mercury". The information we can get seems a little hit and miss. The community needs to put effort into documenting all of the entities and attributes.

One interesting result is that searching for "mass of Pluto" doesn't just give us a fact result, but what appears to be a Google calculator result. This means they are recognizing the mass in both value and units. We can even use "mass of Pluto" in any calculation we would give to Google calculator.

As the shift is made from finding relevant documents to just giving us the information directly, we might wonder what the future of the search engine is. I expect we'll see someone in the next year bring Google to court for yet another lawsuit about what they can or cannot scrape from a website. When you have a nice site with good information, and Google just gives the users the data, you probably worry about the effect on your traffic. If it does affect traffic, then will the sites Google is grabbing the information from even remain active? Where will they get facts from when their fact pulling eliminates their sources?

Thursday, June 21, 2007

The Stand Up Desk

My back and legs hurt, but this might be a solution: the Stand Up Desk. There are different ways to implement this. Some people shell out the money for adjustable desks. You could place a shelf with an extra monitor and keyboard at standing height and attach them to your machine with splitters. I'm looking into the kind of adjustable mount arm attached to portables in hospitals, to install behind my desk and allow my screen and a keyboard and mouse to adjust up easily, without needing the rest of the desk to move. I think alternating sitting and standing will be nice. Until then, I'll stand up to read.



The Stand Up Desk - lifehack.org

Wednesday, June 20, 2007

Implicit Interfaces and the Web

The best interface to software might be doing nothing at all. Implicit interfaces are gaining mindshare. This is not a new idea. Amazon improves your experience based on your habits, for example. Google increasingly employs subtle, personal weighting of our search results. In The Implicit Web, Alex Iskold talks about the services of Amazon, Google, and Last.fm. All of them take advantage of the implicit actions of their users. Last.fm lets us track, publish, and find songs we listen to and like, and after installation, I forget it most of the time I use it.

Implicit Today

A number of services have risen that really should be implicit, but are not. This might be caused by implicit interfaces' very nature of being unseen. Although they can be wonderful ways to interact with our networks, they are difficult to deploy. Developing the algorithms to translate user behavior into user interaction, without hindering the user experience, can be difficult. Even coming up with an idea for employing implicitness is difficult.

The ultimate implicit application might be Google, when taken in terms of number of users. Their intuitive PageRank system took millions of web pages linking to one another and turned them into a social ranking system. Digg, reddit, and their clones are hot news these days; however, we can't deny that they have done little more than turn what was implicit into something explicit. The change has good and bad qualities. An ironic note: Google seems completely unimpressed with social services, being the only major player expressing no interest in a service like social bookmarks. At least, this might appear to be the case at first glance. However, when we take note that Google's entire business is built on the idea of utilizing the links on our web pages as votes, we find they were ahead of the game and have the largest social bookmarking site on the internet. The only missing feature is associating the websites with actual people.
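To make the "links as votes" idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is only an illustration: the `pagerank` function, the 0.85 damping value, and the four-page `web` are assumptions made up for this example, and whatever Google actually runs is certainly far more involved.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (a sketch, not Google's code).

    links maps each page name to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link is a vote: split this page's rank among its targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: nobody "votes" explicitly, they just link.
web = {
    "blog": ["docs", "news"],
    "news": ["docs"],
    "forum": ["docs", "blog"],
    "docs": [],
}
ranks = pagerank(web)
best = max(ranks, key=ranks.get)
```

Nobody in this toy web was asked to rate anything. The page that most others link to ends up on top purely as a side effect of ordinary linking, which is the implicit part.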

Why the Explicitness

If Google was so successful with the first massively deployed implicit interface, why would sites adapt the pattern into explicit voting systems? The migration from searching to sifting is a probable cause. The original Google model works great for mostly static content. Asking the popular search engine "What's new?" is not easy, and this is an angle explicit services employ. Social networks are nothing new, but the personal and explicit aspects are newly pushed. A search engine tells you which webpages are popular, but treats knowing who agrees as less important. It also has a hard time distinguishing between things you like and things you do not like.

Implicit Tomorrow

We need to evaluate what makes a good system, which explicit interfaces can become implicit, and what naturally implicit features to improve. Embracing the implicit areas leads to a higher level of user involvement, because users can be involved even when they are unaware of it. However, making the user aware of the effects of their implicit interactions might be exactly the sort of thing the user needs to understand these services are actually there and valuable. There is little market for sites that ask you to manually rank books and movies and then recommend more to you. Amazon made its business on doing just that, because it takes information automatically and makes it obvious to the user what value they are getting. I routinely buy books from my Amazon page, because I know my habits have tuned it into a great place for me to find what I need. The implicit is there, but I explicitly take advantage of it.

Monday, June 18, 2007

Google Your Spellchecker

Feature volume rises as applications and services merge, and soon we will need the power of Google within single applications. Of course, the reasons for this lend themselves to the idea that we will not have single applications in the future. As applications migrate into services, and services combine and interact, the whole of software is evolving into a massive software ecosystem. Every piece of software can integrate with, broadcast to, and pull from a host of other global services. The number of "features" available at any point is rocketing to unimaginable heights. Until we can automate the integration, filtering, and aggregation of the mass of services we have for working with the same data set, we do not benefit as fully from their availability.

Jeff Atwood brought this up in context of Office 2007's Ribbon and the Scout plug-in that may not see the light of day, for internal political reasons at Redmond. The apparent story is that adding a feature to search their interface, even optionally, would undermine their attempts at marketing the glory that is the Ribbon. Of course, a searchable Ribbon is leagues beyond the traditional mess of menus and toolbars. Embrace of this concept would do nothing but benefit them, and give a head start in giving users a compass to navigate the ocean of features coming to them. Usability is about to transform from a gentle drift to a tidal wave.

I want to expand on this, but it is for another post. Features adapt into web services. Microformats and service discovery replace plug-in systems. The interfaces of our applications will become a search engine of features, contextualized to the present task. When I can gather some information and thoughts on these subjects, I want to produce something interesting to gather the ideas into one place.

Office 2007 and Blogging

I finally started running my copy of Office 2007, and I wish I had abandoned Open Office earlier.

Everything is a lot more snappy and responsive than I expected. The common wisdom of each new version of Office requiring hardware upgrades seems unwarranted in face of this. Certainly, it is furiously faster than Open Office. I don't expect to make as much use of Google Docs and Spreadsheets, either. Word is taking up 20 megabytes in memory, while Firefox is eating 300 MB. Which one I prefer to keep running is obvious.

Now, I tried to write blogs with Open Office, but I found no plug-ins to get it to post to Blogger. You would really think I could use Google Docs, but somehow they don't properly support posting to their own blogging service from their own word processor service! Multiple blogs on one account are not supported. Posting draws the title from the first line in the document, even if a title is present and differs from it, meaning the title appears repeated in the final post. Meanwhile, Word 2007 actually includes support to operate with Blogger, a competitor's service, and supports multiple blogs. This is out of the box, as well.

Lately, I took some heat for my hard views on the whole IronPython versus Python issue, so I want to clear up some things about my opinion and my open mindedness. I will be looking at IronPython for writing plug-ins for Office, and here it doesn't bother me that things will be missing, because I am not using the other things. My first hopeful project: a free, and actually available version of Scout, the ribbon search that politics killed.

One thing that has disappointed me is the static nature of the Ribbon, which is not how I understood it to be. This could be the product of my usage patterns thus far, but I have several times expected it to adapt to me, if it really did that. For example, when I select some text during the writing of a blog post, the hyperlink options should appear. It just seems that is not how the Ribbon works, but am I alone in thinking that was the whole idea?

Object Orientation Has Little to Do With “Objects”

I would like to declare that the word "Object" in "Object Oriented Programming" is damaging to its benefits. If this seems counter-intuitive, you should keep reading. This is a case where the title is harmful to the subject. Some people take things too far and imagine some requirement for the concept of an object, and forbid anything outside their definition. If we understand the real benefits of OOP, the inappropriateness of such object-enthusiasm becomes clear.

Do objects matter? Using a traffic simulation example, we'll say we have instances of a Car class. We add lots of methods, such as accelerate(speed_diff), and implement logic to stop the virtual car at a virtual red light. The non-OO alternative would be functions operating on data describing the state of the vehicle. When we add motorcycles, the non-OO version requires a new function to operate on the new kind of data; or, so we are told. We know the OO way of doing things is to create a Vehicle class and inherit from it in both Car and Motorcycle. Somewhere along the way, we lose emphasis on the effect we actually benefit from.

We benefit from the interface of the "objects", not their virtue of being objects. Too often the consensus you hear focuses on completely irrelevant aspects. Methods, classes, and objects are completely without value if you do not employ the real benefits. The real benefit is that objects have shape, and multiple objects can have the same shape. This can manifest as a single function operating on both cars and motorcycles, for example. It is an obvious benefit to have accelerate() versus accelerate_car() and accelerate_motorcycle(). It does not matter that we pass something you can call an object to the function, but that we can pass different things which act similarly enough to be handled uniformly. A very non-OO way would be a function which takes the current speed and the acceleration, and returns the new speed. The caller would need to get the information, call the function, and change the speed wherever it is stored. Here, the user is stuck depending on the internals, rather than the shape of the externals.
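A small sketch may help show what "shape" means here. Car, Motorcycle, and accelerate(speed_diff) come from the example above; the `speed` and `leaning` attributes and the `speed_up_all` helper are names assumed only for illustration. Note that there is deliberately no shared Vehicle base class.

```python
class Car:
    def __init__(self):
        self.speed = 0

    def accelerate(self, speed_diff):
        self.speed += speed_diff


class Motorcycle:
    def __init__(self):
        self.speed = 0
        self.leaning = False  # state a Car never has

    def accelerate(self, speed_diff):
        self.speed += speed_diff


def speed_up_all(vehicles, speed_diff):
    """Works on anything with an accelerate() method, whatever its class."""
    for vehicle in vehicles:
        vehicle.accelerate(speed_diff)


traffic = [Car(), Motorcycle()]
speed_up_all(traffic, 10)
speeds = [vehicle.speed for vehicle in traffic]
```

The function never asks what it was handed. The two classes are unrelated, yet they can be treated uniformly because they share the same shape.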

There are some common situations where I hear complaints from newcomers to the Python language. The misunderstanding of what OO means, and what a language should do, leads to misunderstandings of Python as a language.

Getting the length of an object is a great example. You find many Ruby and Java programmers confused or upset that we have no length property on all our objects with a length. The interesting part is the claim that this actually makes Python "less Object Oriented." The fact here is we have a perfectly acceptable model, with common interfaces in a variety of different Python objects. Duck typing is, perhaps, the ultimate goal of object orientation. Mappings, sequences, and iterations are other great examples of shape importance in Python.
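As a sketch of that common interface, the built-in len() works on any object that implements the `__len__` special method, which is how strings, lists, and dictionaries all answer the same question. The `Playlist` class and the `describe` function below are hypothetical names used only for this example.

```python
class Playlist:
    """A made-up container that takes part in the len() protocol."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        # len(playlist) ends up calling this method.
        return len(self._songs)

    def __iter__(self):
        # Supporting iteration gives it the shape of a sequence, too.
        return iter(self._songs)


def describe(collection):
    """Cares only that the argument has a length, not what it is."""
    return "%d items" % len(collection)


results = [
    describe("abc"),
    describe([1, 2]),
    describe({"a": 1}),
    describe(Playlist(["x", "y", "z", "w"])),
]
```

One spelling, len(thing), covers built-in and user-defined types alike, so the interface is arguably more uniform than a property that each class is free to name differently.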

Two top reasons are code reuse and design sanity. Centering on interfaces gives us both cheaply. We can reuse code, because we only care about how it acts, and not what it is. The design of the code is cleaner, because we can remove all reference and care related to what we are dealing with and treat it uniformly.

Saturday, June 16, 2007

The Software Prosumer

The effects of prosumerism are well documented in the evolving economy of content, but the pattern applies equally well and valuably to software creation.

You are most likely a consumer, and if you're an American you think that is a Really Good Thing, most likely. The producers pushed that, and benefit from it. That is not to say we don't benefit from the relationship. Have you seen the price of tube socks at Wal-Mart? I can live with that.

Nonetheless, someone always has to complain, attack the norm, and think they know better. Anyone betting on the dominant future of the “prosumer” is likely riding on that current negativity. We don't just read the news. We filter, amend, and combine it. Every novel today spawns even more words of fan fiction. Slashdot would be completely worthless without its prosumer users. The barriers between those who produce and those who consume are blurring and the two are mingling. The party is just getting started.

Prosumerism in Software Development

We are already seeing the effects and benefits of adapting the prosumer identity to software developers. Things like free software and Greasemonkey, which allows anyone with a little JavaScript know-how to alter existing websites, are good examples of the software prosumer. Ideally, the prosumer will consume more than produce, and what is produced can be consumed on the same level by peers. I think we have seen this with Greasemonkey and user scripts. I might not even use Firefox if it were not for the user scripts I employ. Obviously you can't say Firefox is somehow above or better than extensions to it which are more important to the user than the product itself.

There is an obvious lack of interest and motivation among software developers to promote the prosumer. You can attribute that to a subtle fear among developers of providing the users with a way of replacing them. The more development can be done among the users, the fewer real developers we need. However, we can also realize that the more development can be done among the users, the more can be done for the product by all. The more flexible we make our products, the more work the users will do for us. Prosumer software cultures are also great ways to get free marketing, dedicated users, and to satisfy user needs you could never find time for or even be aware of.

Plugin systems are a great way to encourage prosumerism. Some products are developed almost entirely as extendable frameworks, with all the “real” work done in plugins. When the traditional producers work in the same environment as their prosumer base, the ability of those prosumers will rise. When the traditional consumers' only path to prosumerism is fraught with difficulty, hackish patching, and little or no producer support, the results do nothing but harm both sides of the equation.

The next time you're doing a project, keep some things in mind. If you need a new component and you could open the plug-in API a little to allow making it an extension, do so. If you can develop in an accessible (not compiled) language, do so. When you have some spare cycles, set up a repository of user contributions. A few small steps can go a long way.
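Opening the plug-in API a little does not have to be a big project. As a minimal sketch, assuming a hypothetical text-processing tool, a plug-in system can be as small as a dictionary and a decorator. The `PLUGINS` registry, the `plugin` decorator, and the two example plug-ins are all invented for this illustration.

```python
# Registry mapping a plug-in name to the function that implements it.
PLUGINS = {}


def plugin(name):
    """Decorator that registers a function under the given name."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


# A "built-in" feature, written by the producers with the same API...
@plugin("upper")
def shout(text):
    return text.upper()


# ...and one that a user could just as easily have contributed.
@plugin("reverse")
def backwards(text):
    return text[::-1]


def run_plugins(text, names):
    """Apply the named plug-ins to the text, in order."""
    for name in names:
        text = PLUGINS[name](text)
    return text


result = run_plugins("hello", ["upper", "reverse"])
```

Because the built-in feature goes through the same registry as the contributed one, producers and prosumers end up working in the same environment.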

Some examples and references:

Wednesday, June 13, 2007

Advertising Forgot You Remember

Do you remember when you were walking down the street and you saw that billboard for an injury law firm, so you punched the billboard and were teleported magically to their offices?

How about that commercial break during an episode of Friends, for a brand of toothpaste? Do you remember kicking your TV and bottles of toothpaste falling out of it with the shattered glass and smoke?

If you don't remember these events, why do online advertisers want you to hit their banners right then and there, which is so different from how you are used to getting advertisements? Because, you know and they know, that if they do their job right, you'll remember them later, when you need to.

In this light, I propose a new advertising model for the Web: AdMarks. I see ads for things all the time that I would buy, or want to buy, but that doesn't mean I can buy them right now, or have an immediate need for them. I'm not about to follow the banner, open a new window up, read about it, bookmark it, and come back weeks later to my bookmarked ad. I might if it were easier, though. We need advertisements that bookmark when clicked.

Of course, we need useful things to do with those bookmarks later. We should be able to search them, for when we want to buy that thing we saw a week ago. When I search a shopping meta-search site, stuff I AdMarked should come up immediately before anything else. When I'm walking around in Target with my smartphone, the Bluetooth chips in the shelves should tell my phone to alert me about something I was interested in. When I happen on the website of the thing, it should know I was interested in one of their products and tell me more about it. When I've earned enough Customer Reward Points, they should just send me one, since they know I want it anyway.

I'd love to see Google do this with AdSense.

Python, IronPython, Apples, and Oranges

While Fuzzyman is over at the voidspace, talking about how great it is that, in IronPython, str and unicode are the same things, I'm over here getting more worried every day about the segmentation of Python and IronPython.
IronPython is a new implementation of the Python ... maintaining full compatibility with the Python language.
From the IronPython homepage.

They should go ahead and drop that last qualifier. I want to make something very clear, and that is that I absolutely hate writing this post. The IronPython project is really great, and I've been impressed by what it has done, and by Microsoft's embrace of the language. Admiration does not trump worry, in this case. A number of issues make IronPython simply not Python. I've been raising this issue more and more recently, so it is about time I wrote about it at moderate length.

In IronPython, str is unicode

Now, it may be true that Python plans to drop the current behavior, make str unicode, and add a separate type specifically for dealing with byte strings (See PEP 358). However, that is not the case yet, and jumping the gun and making str and unicode the same type is an absolutely incorrect non-solution. This is not just a matter of taste, but a situation where IronPython is absolutely wrong. I can make two arguments against this.

IronPython does not encode or decode between str and unicode

One of the most important issues in dealing with unicode is the difference between unicode strings of text and encoded strings, or bytestreams containing encoded text, which may be decoded into understandable unicode (Joel has covered all this). IronPython cannot make this distinction. In Python, a str with a non-ASCII "byte" cannot be converted to unicode if you don't tell it the encoding being used. This is no flaw, it is the law. IronPython, effectively having no str type, just assumes that bytes of 128 and above are the corresponding codepoints. This is the correct behavior only if the bytes happen to be Latin-1, and nothing says they are. That's right. They just guess, give you a likely bad result, and let it go.
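The difference is easy to demonstrate. The sketch below is written for Python 3, where the byte string type is called bytes and the text type is called str, purely so that it runs on a current interpreter; the two play the roles that str and unicode play in the discussion above. The sample word and the variable names are made up for the example, and the "guess" line imitates the byte-to-codepoint assumption described above rather than calling anything in IronPython.

```python
# Text encoded as UTF-8: the e-acute becomes the two bytes 0xC3 0xA9.
raw = "caf\u00e9".encode("utf-8")

# Without being told the encoding, the safe default (ASCII) refuses to guess.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Told the right encoding, decoding recovers the original four characters.
correct = raw.decode("utf-8")

# Treating every byte as the codepoint with the same number is a guess.
# It matches what a Latin-1 decode would produce, which is wrong here.
guessed = "".join(chr(byte) for byte in raw)
```

The guess yields five characters instead of four, with the accented letter replaced by two unrelated ones. Failing loudly, as the ASCII decode does, at least tells you an encoding decision is missing.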

When There Is No Bytestring, You Have to Look Elsewhere

So what happens when you truly need to work with byte strings in IronPython, which pretends byte strings are unicode strings? Well, you have to look elsewhere. The entire .Net API is at your fingertips, so look no further than System.Byte and System.Array. Sounds easy, but the danger here should be obvious. Any Python code assuming, correctly, that str is a byte string type is subject to implosion within IronPython, and any IronPython code "properly" handling byte data simply can't be imported outside IronPython at all.

Language and Library

Does syntax alone make a language? Maybe one day it could, but those days died out. Python is far more than its clean, beautiful syntax. The libraries that come in the standard library provide even more value. As a foundation for all the software built on top, these packages are fundamental to the success of Python. Yes, your code looks beautiful all on its own, but all on its own it does not have an embedded database, configuration parser, and mail and web servers. Right there you have a basis for a huge number of applications, without even leaving the language's vanilla installation.

IronPython does not include any of these, so if you write software using them, don't expect it to run on the .Net runtime just because IronPython claims compatibility. You can probably access all the same facilities, but you have to do so through the .Net APIs of similar facilities. I am not even sure that the same facilities are provided there. The sad fact about a lot of this is that many of the libraries not included in IronPython would actually work perfectly, without change, if they were included in the distribution.

Because of this, we have to resort to things I consider terrible, like two different Python scripts, both doing some basic HTTP downloads, and both being completely incompatible because they rely on entirely different APIs: IronPython through .Net APIs and the real Python through urllib2 or httplib.

Conclusion


IronPython takes the syntax, but stops short of the language. The problem is one for both Python and IronPython lovers. In Python land, we're seeing what appears to be an influx of interest from the IronPython (also, via Silverlight) world, but all those new developers are creating completely incompatible code. IronPython advocates, on the other hand, look silly to think they are promoting the Python language, and are completely missing out on hundreds of great libraries, years of built up community, and synergy that isn't just a buzzword.

I really want this to all work out. IronPython, can we get along?

Tuesday, June 12, 2007

What Human Beings Can Do

Human beings are capable of some truly amazing feats.

Somehow, I still can't get that twitter app to list my updates for me. The web-based editor I'm writing this in has odd bugs in the buttons. At least twice a day, I need to restart Firefox, so the rest of my computer doesn't crawl and cry. We can move upside down mountains, yet common bugs in our software still elude us.


Something is seriously wrong with this picture.

Patent Peer Processing

Finally, some good news about technology patents. We have known of the problems with the system for a long time, and now that things are starting to turn around, the burden is on the people to take the power being given them and make a difference and show that this works.

I am calling on everyone who has the slightest time and knowledge to contribute to this new system, because the results affect you just as much as the rest of us. That includes all non-US citizens, because we do live in a global village, and anything anywhere can affect everyone everywhere.

We need to make sure the new system is set up in a way that we can consume and digest the information in the same way we filter, rank, pass, and project information around the blogosphere today. That means ensuring that feeds are set up from the PTO, establishing aggregators and tagging conventions, and working toward trusted patent review bloggers. We can use the same tools we have been employing to digest insane amounts of our own information and apply all of that to locating the best, worst, and silliest of claims by the patenters.

It might be great that any individual can read, review, research, and respond to the patents for the PTO to utilize in their decisions, but there is only so much an individual can do, even when there are many such individuals. When we turn all of us individuals into a group, a community, a patent chomping machine, we can do something that actually betters the entire world.

I'm Backpacking it Now

After finally spending some time using Backpack, to really get an idea of what it is all about, I decided it was one to stick with. I was already at the 5 page limit for free accounts, so I went to upgrade to Basic for $5 a month. Thankfully, at the last minute I remembered that 37signals themselves post coupons. First month is free, so no risk.

This also means I can try the calendar service, and have file upload capability. I'm trying out the use of page sharing to coordinate with a client.

The Chaos Theory of User Ingenuity

There is just no telling what those crazy users are going to do. As a recent post at Worse Than Failure makes us realize, they can do some impressively unpredictable things. The case in question has bank tellers using the Windows Task Manager (Ctrl+Alt+Del) to manually kill a process for an annoying dialog their employers had the developers make un-cancellable as an error checking precaution. I am simultaneously dumbfounded at their incompetence for thinking it fine to repeatedly hard kill processes as a form of annoyance reduction and sheerly amazed that the users knew enough to even try it in the first place.

The lesson can be applied in a lot of places. We need to do more than predict what the user will do: we need to make our software robust enough to stand up to the random environmental attacks it will take from the users' strange and completely unpredictable behavior. The user could be clicking on our links or importing our packages (end user versus developer) and inevitably they will do what you did not account for. Account for the unforeseeable.

Account for End User Ingenuity

Software is annoying and the most annoying things will be avoided. The lengths we go to in working around limitations, real or perceived, are huge. That is exactly what the bank tellers were trying to do. The dialog in question made them double check money counts on large amounts, but they trusted themselves and each other enough to learn and pass on the technique of subverting the required dialog to save just a few seconds every few transactions. Yes, it did not even come up on a typical basis, so don't expect frequency to predict the likelihood of tampering. The user might put up with an annoying main menu for years, but abuse a glitch to skip a step in a process they only use every few weeks.

Probably the single most effective way to combat dangerous ingenuity of end users is the feedback mechanism. Let the user subvert through you, not around you. Enable responsive adaptation to their needs, and tweak the hell out of the interface to shave off those milliseconds. Milliseconds add up when you're on your feet all day.

Account for Developer Ingenuity

We can take this story and adapt it to ourselves. We know there are things we do to software that only for-pay websites would show you. No one is more abusive to software than those who create it, and when we deal with the internals we only have more strings to pull. Whether you develop libraries consumed by other developers, or want to avoid abusing the libraries you use, there are steps you can take to keep usage on the path.


Here, our single greatest ally is reduction. Take away optional parameters no one has asked for yet. Don't implement a function that has no use case. Eliminate type checking to allow proper ingenuity through duck typing, while being prepared to properly accommodate common patterns that arise, which you never foresaw. Give the other developers constraints by giving them less to work with, but let the pieces they have flex into shapes they need, so you can take their feedback and adapt the code to officially support every unofficial dirty deed they bend it over for.
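As a sketch of what dropping a type check buys, compare two versions of a tiny helper. Both functions and their names are hypothetical, invented for this illustration. The strict one insists on a list, while the duck-typed one accepts anything that yields lines when iterated.

```python
import io


def count_lines_strict(source):
    """Over-checked version: refuses everything but a real list."""
    if not isinstance(source, list):
        raise TypeError("expected a list of lines")
    return len([line for line in source if line.strip()])


def count_lines(source):
    """Duck-typed version: counts non-blank lines from any iterable."""
    return sum(1 for line in source if line.strip())


lines = ["alpha\n", "\n", "beta\n"]
stream = io.StringIO("alpha\n\nbeta\n")

from_list = count_lines(lines)
from_stream = count_lines(stream)  # a use the author never planned for

# The strict version rejects the very same data in file-like form.
try:
    count_lines_strict(io.StringIO("alpha\n\nbeta\n"))
    strict_accepts_stream = True
except TypeError:
    strict_accepts_stream = False
```

The duck-typed function is shorter, has fewer parameters and checks to misuse, and still flexes to cover file-like objects, generators, and whatever else another developer bends it around.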

Account for Your Ingenuity

Who uses your software more than you do? Maybe the most dangerous person to watch out for is yourself. No one has more access to pushing the limits of the software than you do. The users can find ways to subvert your interfaces. Other developers can exploit oversights in the API. You, on the other hand, can bend the entire thing to your will. If you think a high math function would be useful in all the places you happen to use the special file format library you develop over on Google Projects, don't add it right away. Ask yourself if it really belongs there, if anyone else will use it, or if you would accept the patch coming from someone else when you did not want the feature yourself. As much as the users and other developers can take advantage of your software, you need to look over your own shoulder more than anyone, but there are a lot more of them than there are of you (hopefully!), so don't let your guard down from their side, either.

Monday, June 11, 2007

Pythonic Defined

Introduction
Losing is Good
Strings
Dictionaries
Conclusion

Introduction

Veterans and novices alike of Python will hear the term "pythonic" thrown around, and even a number of the veterans don't know what it means. There are times I do not know what it means, but that doesn't mean I can't define a pretty good idea of what "pythonic" really means. Now, it has been defined at times as being whatever the BDFL decides, but we'll pull that out of the picture. I want to talk about what the word means for us today, and how it applies to what we do in the real world.

Languages have their strengths and their idioms (ways of doing things), and when you exploit those you embrace the heart of that language. You can often tell when a programmer writing in one language is actually more comfortable with another, because the code they write is telltale of the other language. Java developers are notorious for writing Java in every language they get their hands on. How can you write one language in another? The answer to that is the exact opposite of what a term like "pythonic" means. A programmer coming to Python from C might write everything in functions and avoid classes, while a programmer coming from Java might refuse to ever use a function and often wants to create a separate module for every single class. These are the telltale influences of their comfort languages on their Python coding. The situation can occur in any migration between languages. Following those languages' idioms when writing Python is incorrect, but when writing in those languages, it is embracing the language. Doing the same in Python, with Python's own idioms, is pythonic.

In the Real World

You will not truly understand "pythonic" without seeing it and experiencing it in the real world. You'll know you understand it when you can reliably identify what is not pythonic. However, we can speed up your time to making those judgement calls through examples and talking about what makes them the way they are.

Losing is Good

Newcomers to the language are often nervous about dynamic typing. Most don't really understand what it means, or why it can be a good thing. The most common newcomer thought about dynamic typing is that you can assign any type of value to any variable. They like to think of variables as void * in C, which is a large mistake. Variables do not change type in Python. Instead, names have no type association at all, and the objects they point to (reference) describe their own types. The nervousness and misunderstanding of dynamic typing lead to over-zealous use of isinstance() and type() as type checking facilities, even when the code is absolutely, perfectly valid without them.
If your code is still correct when some section is removed, remove the
section.
I think the best example of this has to be the dangerous desire to add, for example, isinstance(x, list) at the beginning of a function that expects an argument x to be a list. The programmer thinks safety has been added by blocking the function from being called with anything not expected. What was accomplished is making the function slower, more brittle, and less pythonic. No longer can the function take a tuple, the keys of a dictionary, a generator, or an xrange object. Newcomers are not always aware of those things, or of how valuable it is to use them all interchangeably when the interface you need from each is the same subset: iteration.
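
As a minimal sketch of that point, here is a hypothetical function, total(), which is not from any library. It needs nothing but iteration from its argument, so it checks nothing. The example is written to run on Python 2.6 or later, including Python 3.

```python
def total(numbers):
    """Sum anything that can be iterated over; no isinstance() check."""
    result = 0
    for n in numbers:
        result += n
    return result

print(total([1, 2, 3]))              # a list
print(total((1, 2, 3)))              # a tuple
print(total({1: 'a', 2: 'b'}))       # iterating a dictionary yields its keys
print(total(n for n in range(4)))    # a generator
```

Adding an isinstance(numbers, list) guard to total() would reject three of those four calls without making the first one any safer.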

There are other cases where removing something actually gives you something. Make functions smaller, combine functions that do the same things and are named differently, and turn a class into a function where it does not add any properties or methods, such as a case where all the logic is in the __init__ method. If you are having trouble catching an exception you don't know what to do with, don't catch it at all and let your caller deal with it. One of the best feelings you can get is cutting the lines of code in half, while simultaneously making the code gain new features and more speed.
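
A small sketch of the class-into-function case, using made-up names (WordCounter and count_words are only for illustration):

```python
# Before: a class whose only logic lives in __init__.
class WordCounter(object):
    def __init__(self, text):
        self.count = len(text.split())

# After: the same behavior as a plain function.
def count_words(text):
    return len(text.split())

print(WordCounter("losing is good").count)
print(count_words("losing is good"))
```

The function version has less to read and less to call, and nothing was lost, because the class never added any state or methods worth keeping.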

Strings

Although the string may be the most common kind of data in a programming language, next to the integer, how strings are handled between languages varies as wildly as David Hasselhoff's popularity between the States and Germany. A great many constructs with strings are good in one language and absolutely terrible in another. Sometimes the difference is only visual, and other times it has to do with the very nature of what a string is from language to language.
Immutable strings referenced by untyped names require completely
different semantics than typed languages or those with mutable strings.
String concatenation, understanding of when to use string formatting, and grasping how string manipulations work are sometimes barriers faced by anyone not familiar with Python. I think we can break down the situation into a small set of rules to live by.
  • Connecting only two strings? Concatenation with the + operator is alright.

  • Connecting more than two strings? Use ''.join(iterable_of_strings), because every + on immutable strings builds a new intermediate string, and Python can't optimize a chain of + away, as it does not know the operands are strings due to untyped names.

  • Are you using a format string that holds nothing but a single specifier, such as "%d" or "%r"? Then you are wastefully formatting where a simple call suffices, such as str(n) for an integer n and repr(x), respectively.

  • Are you using a regular expression where the split, strip, or replace methods would suffice? Then do not use the regular expression.
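
The rules above can be sketched in a few lines. The variable names are only for illustration, and the code runs on both Python 2 and Python 3.

```python
name = "Guido"
words = ["pythonic", "is", "as", "pythonic", "does"]

greeting = "Hello, " + name      # two strings: + is alright
sentence = " ".join(words)       # many strings: join them in one pass

count = 42
as_text = str(count)             # instead of "%d" % count
as_repr = repr(name)             # instead of "%r" % name

# split and strip are enough here; no regular expression needed.
line = "  key = value  "
key, value = [part.strip() for part in line.split("=")]

print(greeting)
print(sentence)
print(key + ":" + value)
```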

Dictionaries

Responding to a post at Blue Sky On Mars, this section has been added to deal with the issue of classes and objects versus dictionaries.

You'll find dictionaries to be one of the most flexible and powerful concepts in Python. Much of the infrastructure of the language is actually built on top of them. We need to cover when you should use a dictionary and when you should not.

A common error of those new to Python, and to languages with proper hash tables, is to badly overuse them when applying them directly. People from C draw an obvious connection between a dictionary and a struct, and it makes sense. However, the connection only holds as long as the dictionary is not applied as widely as a struct is in C code. Using dictionaries as general data structures is actually a very non-pythonic thing to do, despite the pythonic nature of dictionaries themselves. Python provides a rich, featureful object and class system, so abusing dictionaries only lessens the power you have at your fingertips.
  • Type less.
    enemy.health is better than enemy["health"]
    enemy = Enemy(100, 50, 30) is better than enemy = {"health": 100, "attack": 50, "defense": 30}
  • Add methods, now or later.
    enemy.hit(player) is better than hit(enemy, player)
  • Default values, lazily computed values, and deprecation of values are not possible with a plain dictionary.
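
Here is one way the Enemy class from the list above might look. The default numbers, the is_alive property, and the damage formula are all assumptions made up for this sketch, and the player is just a second Enemy to keep it short.

```python
class Enemy(object):
    def __init__(self, health, attack=10, defense=5):   # default values
        self.health = health
        self.attack = attack
        self.defense = defense

    @property
    def is_alive(self):
        # Computed each time it is read; callers still write enemy.is_alive.
        return self.health > 0

    def hit(self, target):
        """Reduce the target's health by our attack minus its defense."""
        damage = max(self.attack - target.defense, 0)
        target.health -= damage
        return damage

enemy = Enemy(100, 50, 30)
player = Enemy(80)          # attack and defense fall back to the defaults
enemy.hit(player)
print(player.health)
print(player.is_alive)
```

None of this needed to exist on day one. The point is that enemy.health leaves room to add it later, where enemy["health"] does not.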

Time is Your Teacher

Nothing is going to teach you the real meaning of "pythonic" except experience and a lot of exposure to all kinds of Python code. Your best measure of your understanding of the concept is surely how well you can spot the code which is not pythonic, rather than the code which is. Code will begin to smell bad, and code smell is one of your most powerful tools as a developer, and one of the only tools that cannot be taught.

Patience is your key.

Standard Gems: collections

This until-recently-lonely module only houses two alternative collection types, deque and defaultdict, but promises useful things today and more to come. Anytime we have a good place to put things, we find more things to put there. With the new defaultdict type, collections is finally more than just that thing you use to get a deque: it's a full-fledged utility library. More optimized collection types (chains, B-Trees, and bags, anyone?) are sure to come, so keep an eye on it in every new Python release changelog, and maybe you'll get an early Christmas present.

Here is a quick rundown of what is offered today, using possibly silly examples.

from collections import deque

d = deque()
d.extend(xrange(10))
while d:
    print d.popleft()

What you see here is that deque acts like a list but has mirror versions of many end-modifying operations, like append, extend, and pop, which operate on the 'left' side. A list is far less efficient with insertion and popping from anywhere but the end of the list. This makes deque great for First-In-First-Out structures, where a list is more suited for a First-In-Last-Out setup.
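
To put the two side by side, here is a short sketch of a deque used as a queue and a list used as a stack. It runs on both Python 2 and Python 3.

```python
from collections import deque

queue = deque()              # First-In-First-Out
queue.append('a')
queue.append('b')
queue.append('c')
first_out = queue.popleft()  # removes from the left end

stack = []                   # First-In-Last-Out
stack.append('a')
stack.append('b')
stack.append('c')
last_out = stack.pop()       # removes from the right end

queue.appendleft('z')        # the 'left' mirror of append
print(first_out)
print(last_out)
print(list(queue))
```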

import collections

dd = collections.defaultdict(lambda: None)
dd['a'] = 1
dd['b'] = 2
print dd['c'] or 3

Here we automatically handle a nonexistent key with a default value, None. A factory callable is used, so that we can actually return different values, but the factory is not given the key. One interesting use is itertools.count().next as the factory, which means every missing key is filled in with the next integer from an incrementing counter.
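
A sketch of that counting factory follows. It wraps the counter in a lambda that calls the built-in next(), which assumes Python 2.6 or later and also works on Python 3; the names counter and ids are just for illustration.

```python
import itertools
from collections import defaultdict

counter = itertools.count()
ids = defaultdict(lambda: next(counter))

ids['apple']     # missing key: filled with the next integer
ids['pear']      # missing key: filled with the one after that
ids['apple']     # already present: the counter is not touched

print(sorted(ids.items()))
```

This gives every new key a stable, unique integer the first time it is seen, which is handy for numbering things as you encounter them.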

Sunday, June 10, 2007

Dynamic Hell in Python Names

There is a question I often see from Python newbies: How to use the contents of a string as a variable name. In other words, to dynamically create a variable based on some runtime-found name. A lot of these users come from more static backgrounds and have heard the benefits of dynamic languages. I have to wonder if these are signs of their over-eagerness to exploit that dynamic nature.

exec "%s = %d" % (raw_input(), input())
Example of dynamic variable names in action. To a pythoner, obviously not pythonic.

d = {}
d[raw_input()] = int(raw_input())
Less bad. This is why dictionaries exist, so we use them. Also, we remove the potentially dangerous and frivolous use of input() in favor of int(raw_input()).
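
The same idea without the keyboard input, as a sketch that runs on both Python 2 and Python 3. The helper store_value() is hypothetical and exists only for this example.

```python
def store_value(table, name, raw_value):
    """Keep a runtime-chosen name as a dictionary key, not as a variable."""
    table[name] = int(raw_value)    # int() raises ValueError on bad input
    return table

values = {}
store_value(values, "lives", "3")
store_value(values, "score", "1200")

print(values["lives"] + values["score"])
```

The names only ever exist as data, so nothing the user types can collide with or overwrite a real variable in the program.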

There are a lot of things that different programming languages are good for and things they are bad for. There are times when their negative points can be exploited, of course. There are far more times when their positive points can be exploited, and turned into disaster. Pointers in C, for example, are useful in the cases where they are used properly and smartly, but terrible when wielded by the wrong hand with evil or ignorant intent. The same can be true for a lot of a dynamic language's features.

People who try to avoid knowing the names and number of variables they have, who create classes on the fly with runtime information, and who build and evaluate Python code in strings are looking at the flexibility of the language and really want to use it for their own good. The problem is that great power requires great responsibility, as we all know. What most of us don't realize is that great responsibility requires great wisdom, and we can't buy wisdom, read it, or request it on IRC. That means the great power that lures these newcomers to the language is inevitably what will keep them from fully experiencing the gifts it bestows. This is not a flaw or a unique attribute of the language, of any language, or of anything at all which benefits mankind and requires knowledge. Just don't overexert yourself. Don't do anything for the sake of paradigms or languages or buzzwords. Problems are neutral, so solve them for the solution's sake, not the implementation's sake.

[foo() for i in range(10)]
An obvious abuse of a list comprehension as a for loop one-liner. (We know this, because the resulting list is thrown away.)
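
The plain loop says the same thing without building a list nobody wants. In this sketch foo() is a hypothetical stand-in for any function called only for its side effect.

```python
calls = []

def foo():
    """Stand-in for a function that is called only for its side effect."""
    calls.append(len(calls))

for i in range(10):
    foo()

print(len(calls))
```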

What I find most odd is the reaction these individuals give when you tell them what they are doing wrong, and what to do instead. If they want dynamic variable names, we tell them to use a dictionary; they complain about the inefficiency of hash table lookups versus variable names (from a C perspective, often). The simple thing I cannot understand is how they expect the dynamic variables to work, if not through a hash table. If they think dynamic variables work like C stack locals, yet can be dynamic, why would they think that magical technique would not be applied to dictionaries? In other words, if they recognize the two achieve the same end, why do they think they are such wildly different routes?

Update: I added some examples the day after initially posting this.

Giving Google My Soul

This is about Eric Schmidt's comments on Google's future in personalized searches and the outcry around it. It is a little bit delayed, I know.

Like any of us are private today. You can find me on MySpace and follow my friend lists to pictures of my sisters kissing their boyfriends. Do you think we live in a private kind of world? Nearly every page I surf to I tag somewhere, and everyone has a record of where I've been in this web of the world. We put our family photos up on Flickr for complete strangers to sit back and enjoy, in whatever way they happen to enjoy them. We are not a private kind of people, regardless of what we might think. Privacy is both an illusion and a pain in the ass. Now, before you call me a moron, a pushover, or a sellout, just bear with me for a moment and keep reading.

We Are Not Private People

Orwell warned us and boy are we scared as hell. Big Brother, Echelon, unauthorized wire tapping, and a whole host of other threats to our privacy are out there. Never do I discount that there are things out there misusing information about us, both that which is public and which is private. However, what constitutes appropriate and not in dealing with the ocean of data these organizations have to work with is obviously a completely uneaten cake. These are only the threats we impose on ourselves.

Privacy is increasingly a hindrance to profitability, so any time companies can make more money by violating your privacy, they may justify it. The thing is, most of the time we probably don't even notice it. When is the last time you read a EULA before clicking agree? Hell, we don't even read things we actually sign with our own hands. I might call for an uprising to actually read these and refuse to agree to that which we do not agree with. I will not give this advice, because I firmly believe we will only harm ourselves by disagreeing in over-reaching, unfounded, fear-driven ways. Of course, there are things we should watch out for, but, more often than not, I think we'll do better to back off on our worries a bit.

Despite the outcry we think we create about these issues, convenience is more important than anything else in our lives. I'll give out my credit card number to every store I shop at instead of taking the time to count cash. I toss it around to multiple online services and stores, and that surely has its risk, but it also has its reward. In my lifetime I can expect some level of inappropriate use of those accounts. I do not doubt that someone, at some time, will charge something to my account that I did not approve. However, the amount that could be lost is worth the risk. When you factor in that I'll almost definitely get the cash back from my bank, it's a small price to pay for the convenience and time savings I get from passing the number around in the first place. In much the same way, we don't stop using e-mail just because spam is a huge problem. It wastes our time, and time is money, so it effectively takes our money, but it can't do so to the degree that the value of the medium is lost.

This doesn't even bring into the picture all the multitude of ways we rip down the walls between ourselves and give privacy a swift kick in the face. Does MySpace not tell us something about how private people want to be? They don't.

We Are Public People

The droves of teenagers and, yes, adults on MySpace, Facebook, and other social sites are both proof that we like publicity and a force slowly shifting what care we have about privacy into the history books. We'll tell the world everything about ourselves on our blogs. More and more teenagers today write in their MySpace journals, for everyone to read, about the same things they would have scribed into a private journal under their mattress just a decade ago. When we tell the world our secret fears, how offended are we when a company remembers we bought a pair of Nike sneakers?

I surf the web for pornography. See? Publicity, transparency, and general openness are taking over. No more seedy stores behind fences. As we migrate into the open, everything becomes public and acceptance rises as a result. What we would have been embarrassed to do privately a few years ago we just laugh about today, because open availability changes the public perception of which acts are acceptable. As individuals we are part of a huge social world, and only when we embrace connecting to that society in as many ways and as intimately as possible are we now able to feel like we're a part of it. As that connectedness skyrockets, the barriers between people and their fears of one another break down. Don't talk to strangers? Soon there will be no strangers.

Beyond the personal, we find repeatedly that openness serves financial and political needs. We force large companies to publish their finances, and politicians who blog are more connected to their supporters. Today we can keep tabs on the governing bodies in near real time, when in the past we wouldn't know what new laws were enacted until we were arrested for breaking them. Fortune 500 companies open their private conversations into blogs and reveal to the world what was closely guarded in their past. Can we accept the benefits of opening up those who run our world when we ourselves pretend we still want to hide behind a curtain? What improves the organization will improve the individual.

New Results, Old Information

Does Google really want to take any more information than we already give them? What are we pouring into their systems already, telling them our deepest desires and letting them hold onto every word we speak? Google knows when I order a pizza; they could even be recording the calls for all I know, because I don't dial numbers, they do it for me. Every page I come across is recorded. Every one of them.

People complained when Google wanted to provide context-targeted advertising in Gmail, because of concerns about them looking at the sensitive information in our email. We had no problem with them having the information, but we freak out when they use it, even without any human eyes involved. Do you see the disconnect there? "Here, corporation," we say, "take all this personal information about me and hold on tight." But the minute they want something in return, something that benefits us, we pretend to care. That tells me we have complaints only because we feel we must, not because we genuinely care. If we did, we wouldn't want the information given in the first place.

I really think we care less about it than we let on. I don't think we should worry so much. We need openness from them, but we can return it with just the same. Everyone wins. We get awesome services and they get the money to provide them.
I write here about programming, how to program better, things I think are neat and are related to programming. I might write other things at my personal website.

I am happily employed by the excellent Caktus Group, located in beautiful and friendly Carrboro, NC, where I work with Python, Django, and Javascript.
