Python, IronPython, Apples, and Oranges

While Fuzzyman is over at the voidspace, talking about how great it is that, in IronPython, str and unicode are the same things, I'm over here getting more worried every day about the segmentation of Python and IronPython.
IronPython is a new implementation of the Python ... maintaining full compatibility with the Python language.
From the IronPython homepage.

They should go ahead and drop that last qualifier. I want to make something very clear, and that is that I absolutely hate writing this post. The IronPython project is really great, and I've been impressed by what it has done, and by Microsoft's embrace of the language. Admiration does not trump worry, in this case. A number of issues make IronPython simply not Python. I've been raising this more and more recently, so it is about time I wrote at moderate length about it.

In IronPython, str is unicode

Now, it may be true that Python plans to drop the current behavior, make str unicode, and add a separate type specifically for dealing with byte strings (See PEP 358). However, that is not the case yet, and jumping the gun and making str and unicode the same type is an absolutely incorrect non-solution. This is not just a matter of taste, but a situation where IronPython is absolutely wrong. I can make two arguments against this.

IronPython does not encode or decode between str and unicode

One of the most important issues in dealing with unicode is the difference between unicode strings of text and byte streams containing encoded text, which may be decoded into understandable unicode (Joel has covered all this). IronPython, by design, cannot make this distinction. In Python, a str containing a non-ASCII "byte" cannot be converted to unicode unless you tell it which encoding is in use. This is no flaw, it is the law. IronPython, having effectively no str type, just assumes that bytes of 128 and above are the corresponding codepoints. That happens to match Latin-1 and is wrong for text in any other encoding, UTF-8 included. That's right. They just give you a result that is usually wrong, and let it go.
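
To make the difference concrete, here is a minimal sketch. It assumes Python 3 names, where bytes plays the role this post gives to str and str plays the role of unicode, so it illustrates the principle and is not IronPython code.

```python
# Assumption: Python 3, where bytes ~ the post's "str" and str ~ "unicode".
raw = "\u00e9".encode("utf-8")  # e-acute as UTF-8: the two bytes 0xC3 0xA9

# Told the right encoding, decoding recovers the original text.
decoded = raw.decode("utf-8")

# With no encoding given, only ASCII is safe to assume, and that fails loudly.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Treating each byte as the codepoint of the same number, which is what
# the post describes, silently produces two wrong characters instead.
as_codepoints = "".join(chr(b) for b in raw)

print(repr(decoded), ascii_failed, repr(as_codepoints))
```

The byte-as-codepoint result is the same thing you get from decoding as Latin-1, so it is only right when the data really was Latin-1.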

When There Is No Bytestring, You Have to Look Elsewhere

So what happens when you truly need to work with byte strings in IronPython, which pretends byte strings are unicode strings? Well, you have to look elsewhere. Of course, the entire .Net API is at your fingertips, so look no further than System.Byte and System.Array. Sounds easy, but the danger here should be obvious. Any Python code assuming, correctly, that str is a byte string type is liable to break within IronPython, and any IronPython code "properly" handling byte data simply can't be imported outside IronPython at all.

Language and Library

Does syntax alone make a language? Maybe once it could, but those days are gone. Python is far more than its clean, beautiful syntax. The modules that come in the standard library provide even more value. As a foundation for all the software built on top, these packages are fundamental to the success of Python. Yes, your code looks beautiful all on its own, but all on its own it does not have an embedded database, configuration parser, and mail and web servers. Right there you have a basis for a huge number of applications, without even leaving the language's vanilla installation.

IronPython does not include any of these, so if you write software using them, don't expect it to run on the .Net runtime just because IronPython claims compatibility. You can probably access similar facilities, but you have to do so through the .Net APIs. I am not even sure that all the same facilities are provided there. The sad fact about a lot of this is that many of the libraries not included in IronPython would actually work perfectly, without change, if they were included in the distribution.

Because of this, we have to resort to things I consider terrible, like two different Python scripts, both doing some basic HTTP downloads, and both being completely incompatible because they rely on entirely different APIs: IronPython through .Net APIs and the real Python through urllib2 or httplib.
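
As a rough sketch of the split, this is the kind of shim a script ends up needing just to download a page on either implementation. The load_fetcher name is hypothetical, the .Net branch assumes System.Net.WebClient is reachable from IronPython, and the last branch is only a Python 3 fallback so the sketch can be run today.

```python
def load_fetcher():
    """Return (backend_name, fetch) for whichever HTTP API can be imported."""
    try:
        import urllib2  # the standard library route on CPython 2.x
        return "urllib2", lambda url: urllib2.urlopen(url).read()
    except ImportError:
        pass
    try:
        # Assumption: running on IronPython with the .Net class library.
        from System.Net import WebClient
        # DownloadString returns text, not bytes, so the two backends
        # do not even agree on the type of the result.
        return "dotnet", lambda url: WebClient().DownloadString(url)
    except ImportError:
        pass
    # Fallback for Python 3, where urllib2 became urllib.request.
    from urllib.request import urlopen
    return "urllib.request", lambda url: urlopen(url).read()


backend, fetch = load_fetcher()
print("HTTP backend in use:", backend)
```

Every script that wants to run in both worlds has to carry something like this around, which is exactly the duplication I am complaining about.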

Conclusion


IronPython takes the syntax, but stops short of the language. The problem is one for both Python and IronPython lovers. In Python land, we're seeing what appears to be an influx of interest from the IronPython (also, via Silverlight) world, but all those new developers are creating completely incompatible code. IronPython advocates, on the other hand, look silly to think they are promoting the Python language, and are completely missing out on hundreds of great libraries, years of built up community, and synergy that isn't just a buzzword.

I really want this to all work out. IronPython, can we get along?

Comments

Fuzzyman said…
So you have the same problems with pywin32, py2exe, jython or indeed any platform specific module or implementation? (except you don't seem to devote so much energy to railing against these ;-)

I also think that you underestimate the value of having Python on a new platform. It isn't "just syntax", but the whole semantics of the Python language, which really takes the pain out of programming on the .NET platform. This is very valuable - and the .NET framework is pretty rich, so being able to use Python there is a good thing!

Also, ConfigParser seems to work fine with IronPython - and possibly some of the other modules you mention. Did you *try* them at all?
Fuzzyman said…
To make it clear - at Resolver we have a 'large' IronPython application, which uses many modules from the Python standard library as well as third party Python libraries - and it works *great*.

Sure, there are some issues. Efforts to create cross-compatibility layers would be (MUCH) more effective than complaints!

By the way, urllib and urllib2 already work with IPCE and will soon work with IronPython. Part of the problem is that these modules (as well as other parts of the standard library) rely on undocumented features or even implementation details.

The inspect module decompiles bytecode - how is this ever going to work on another implementation? (An unpatched inspect.getargspec works on neither IronPython nor Jython - and that is the fault of the Python standard library, not these implementations.)

Some of the problems highlight things that need to change in Python...
Paddy3118 said…
It is good to see a large corporation get publicly behind Python but this is the same Microsoft that tried to wrest control of Java from Sun with its own slightly different flavour of "Java for Windows from Microsoft".
It would be good to see Microsoft make all the right noises about IronPython and seeking compatibility - then back it with actions of course :-)

- Paddy.
Manuel said…
Quoting myself from another forum:

> I don't agree with Calvin. Having IronPython, Jython, PyPy attempt to perfectly duplicate the CPython would *harm* Python as a language.

> The community is learning, slowly, exactly what we mean by "str", "unicode", "bytes". These bugs are *good*, they are opportunities to learn.

> I believe Guido explicitly said that he was learning about unicode issues "in the wild" for Python3k from Jython and IronPython.

> "str", "unicode", "bytes" mean different things than they did three years ago. They just do.

Now, replying to your article:

First of all, Python, CPython, IronPython, Jython, PyPy are all changing entities. As changing entities, we can only legitimately criticize their trajectories. I see zero evidence that IronPython will "fork" the Python community. So much activity is being spent to make the Python library run in IronPython, in the FePy community and in the Microsoft sanctioned community.

> IronPython takes the syntax, but stops short of the language.

> IronPython advocates, on the other hand, look silly to think they are promoting the Python language, and are completely missing out on hundreds of great libraries, years of built up community, and synergy that isn't just a buzzword.

Pure FUD. Nothing else to call it. Unless you possess evidence that forking the community is one of the goals of the IronPython project, how could you possibly defend these statements? Surely you aren't suggesting that they should have solicited your permission before they announced a "1.0" version?

> I really want this to all work out. IronPython, can we get along?

Let me answer your question with another question. What are you, personally, willing to do, to get along? If "getting along" is the goal, why does the full weight of the effort fall on those that choose to code in IronPython?

Is it "getting along", or is it "do what I want"?

In reality, Python is in flux. This is a feature, not a bug.

Python design features are being formed the correct way: they are informed by the different implementation decisions of different implementations of Python, and then collecting "in the wild", "real-world" best practices.
Anonymous said…
It sounds like the ongoing approach is for Python language implementations to be as different as the library implementations.

Jython's different destructor/garbage collection mechanism seemed to set this tone early on.

But when these differences trickle all the way up to fundamental data type/STRING implementation differences per platforms, you've got a powerful point.

Python will continue to be the cross-platform problem-domain solver, but like C, the cross-platform compiler, it will be riddled with underlying mechanism riddles. Python: Garbage collection, C: Struct alignment.

Bring on the #pragma's:
import __cpython_strict__ ;-)
Calvin Spealman said…
Fuzzyman,

I never said Python was perfect, so obviously issues where implementation details are relied on or are the basis for a module (like bytecode decompiling) need to be fixed. The modules that rely on implementations need to be fixed, and the ones that center on implementations need to be delegated in usage and marked, maybe renamed with an underscore to denote their status as internal.

Furthermore, you either missed or ignored the part where I did say that these modules do work in IronPython most of the time, and that my issue is with their lack of inclusion, not their lack of working. The sad fact about a lot of this is that many of the libraries not included in IronPython would actually work perfectly, without change, if they were included in the distribution.

I don't want to argue with you more, so I'll stop attacking the issue and focus on promoting repair of the problems I see. But, they had to be mentioned first.
Paul Boddie said…
"In IronPython, str is unicode"

This rant is about ten years late: the same issue applies to Jython. I'd accept that this can be confusing, though, and that Jython has lacked a distinct "plain" or "byte" string type, disregarding byte arrays, of course, which is what you'd use in Java.
You realize that Jython has exactly the same str==unicode issue, right? I've endorsed this approach for both versions from the start. So I don't know what you are so bent out of shape about.
Anonymous said…
The time factors have negated any real-world testimony of concern over this issue. Is there any analysis that can shed further light or pre-emptive protective communication?

Do any of the string unittests run inconsistently between CPython, IronPython and Jython?

Should they?
Anonymous said…
IronPython? What's that?

Oh, a .NET implementation. What's that?

Oh, a Microsoft centric vision of the world. Who cares? Really?
