
Concurrency and Stability. More Zen for Python.

I was planning to write something about the current conversations about concurrency on the Python-3000 list, a topic that keeps being brought up and never gets resolved to anyone's satisfaction. Unrelatedly, I went to check on my older blog, which I had thought of resurrecting for some non-development articles, and found a draft from a previous time the topic came up. I read it and was surprised to find that it proposes pretty much exactly what was put on the table this past weekend: walling off multiple interpreters in a single process and controlling the messages passed between them. The same technique scales across multiple cores, multiple processors, or multiple machines.

I've decided to take the easy way out and just post the original draft with minor editing. I enjoy how spot-on I turned out to be about what now seems to be becoming an acceptable solution. The original was written nearly a year ago. Does this mean my predictions are worth something? Decide for yourself!

Concurrency is a hot topic on the Python mailing lists lately. There is a strong push to get some kind of native concurrency into Python, because the 3.0 branch is a rare opportunity to do things we otherwise couldn't, since they would break old code. If we don't get something in now, particularly something that can scale to hundreds of thousands of tasks and take advantage of multiple processors, we may not get another chance at what could be the best improvement to the language until the next major version, 4.0.

A large part of the problem stems from how horrible an idea threads really are, as they are typically implemented. Threads, at the basic level, are just multiple pre-emptive tasks running with access to the same memory space. This can be a boost for performance, but it is hell to control properly. The threads must be synchronized so they access their shared resources without clobbering each other. This can be done, but it is very error-prone and very difficult to debug.
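To make that burden concrete, here is a minimal sketch (my own illustration, not from the original draft) of the locking that even a shared counter demands. Forget the lock and the result is silently wrong under load:

    import threading

    counter = 0
    lock = threading.Lock()

    def increment(times):
        global counter
        for _ in range(times):
            # Without the lock, this read-modify-write can interleave
            # with the other thread's and silently lose increments.
            with lock:
                counter += 1

    threads = [threading.Thread(target=increment, args=(100000,))
               for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # 200000, but only because every access was locked.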

The solutions seem to lean toward two ends of the spectrum: cooperative tasks and processes. With cooperative tasks, each concurrent unit runs until it says "OK, I'll let someone else run now," so no explicit synchronization is needed, because, ideally, nothing happens without you knowing it.
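Generators make a tiny cooperative scheduler easy to sketch. The example below is my own illustration of the idea: each task runs until it yields, so no locks are involved.

    from collections import deque

    def scheduler(tasks):
        # Round-robin over generator-based tasks.
        queue = deque(tasks)
        while queue:
            task = queue.popleft()
            try:
                next(task)        # Run the task up to its next yield.
            except StopIteration:
                continue          # The task finished; drop it.
            queue.append(task)    # Otherwise give it another turn later.

    def worker(name, count):
        for i in range(count):
            print(name, i)
            yield                 # "OK, I'll let someone else run now."

    scheduler([worker("a", 3), worker("b", 2)])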

Processes, on the other hand, are basically threads without shared memory. This is how processes are implemented at the OS level. Some, including Guido van Rossum, even think that is the way we should go: multiple system processes communicating via pipes and sockets.
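As a sketch of that direction, here is what it looks like using the multiprocessing module, which did not exist when this draft was written (it arrived in Python 2.6):

    from multiprocessing import Process, Pipe

    def worker(conn):
        # This runs in a separate OS process with its own memory;
        # the pipe is the only shared state.
        value = conn.recv()
        conn.send(value * 2)
        conn.close()

    if __name__ == "__main__":
        parent_conn, child_conn = Pipe()
        child = Process(target=worker, args=(child_conn,))
        child.start()
        parent_conn.send(21)
        print(parent_conn.recv())  # 42
        child.join()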

What I propose is a process implementation within Python itself. This would offer lightweight process execution, where each process would consist of a thread of execution and an "object space". Each thread would only be able to access objects within its own space. Communication would occur through channels between processes, which can be used like generators (gen.send(10), for example). I've created a basic implementation, available on my webserver, here.
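Roughly, the channel idea looks like this, with a thread and a queue standing in for real isolated spaces. The Channel class and its names here are my invention for illustration, not the prototype's API:

    import threading
    from queue import Queue  # Queue.Queue on Python 2

    class Channel:
        # One side calls send(), the other iterates, mirroring
        # generator usage like gen.send(10).
        def __init__(self):
            self._queue = Queue()

        def send(self, value):
            self._queue.put(value)

        def __iter__(self):
            return self

        def __next__(self):
            return self._queue.get()

    def doubler(inbox, outbox):
        # A stand-in "process": reads from one channel, writes to another.
        for value in inbox:
            outbox.send(value * 2)

    inbox, outbox = Channel(), Channel()
    worker = threading.Thread(target=doubler, args=(inbox, outbox))
    worker.daemon = True  # The loop above never ends on its own.
    worker.start()
    inbox.send(10)
    print(next(outbox))  # 20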

With my basic demo, you can create object spaces, each with its own global and local dictionaries. There is no real protection, but it shows how this would work and offers an idea of how it would be used if we had truly protected object spaces. You can run a function in a space with the run method, which takes a response function as its first argument; the response function is called when the given function returns and is passed that function's return value. If a function is already executing in the space, the request is queued until its turn.
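A toy reconstruction of the API described above might look like this. Only the names ObjectSpace and run come from that description; everything else is my guesswork:

    import threading
    from queue import Queue

    class ObjectSpace:
        # Toy stand-in: a worker thread draining a request queue, so a
        # request made while another function is executing waits its turn.
        def __init__(self):
            self._requests = Queue()
            worker = threading.Thread(target=self._loop)
            worker.daemon = True
            worker.start()

        def _loop(self):
            while True:
                response, func, args = self._requests.get()
                response(func(*args))  # Pass the return value to the callback.

        def run(self, response, func, *args):
            self._requests.put((response, func, args))

    def compute(x, y):
        return x * y

    done = threading.Event()

    def on_result(value):
        print("compute returned", value)  # compute returned 42
        done.set()

    space = ObjectSpace()
    space.run(on_result, compute, 6, 7)
    done.wait()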


I need to dig up that demo/prototype code. I barely remember writing it, but I remember being pleased with the results. I'll look around and resurrect it. Perhaps it will serve as an interesting proof of concept for a possible solution to some of our concurrency problems. I wish I could put more work into a solution now, but at the moment I have little practical use for concurrency of this type. Maybe in a while I can find justification to spend time on it.

Comments

Anonymous said…
Lightweight processes and good communication primitives, the way Erlang does them, are really great. Something similar for Python would be a real benefit.
Anonymous said…
See also the parallel/pprocess module for similar thinking:

http://www.python.org/pypi/parallel
Fazal Majid said…
I don't buy this "we can't do threads right, therefore they must be useless or evil" mentality. If you want to manage pools of Oracle connections, for instance, threading is your only viable option.

That said, having higher-level IPC facilities available than socket, spread, mmap, or the like would probably help. A transparently multi-process implementation of Queue.Queue, for instance.
Paddy3118 said…
'Parallel' processing should be built on top of OS processes not threads, with the OS supplying base level memory protection and inter-process communication. Python libraries should abstract away the OS provided services and possibly provide higher level functionality such as transparently running 'parallel' processes across a network of machines with different OS's.
The accent should be on ease of development and maintenance rather than the raw speed of each process, as parallel programming has proved difficult; now that multi-core machines seem to be the way to get more bang per buck, Python needs to evolve.
- Paddy.
Anonymous said…
"each process would consist of a thread of execution and an "object space""

That's the "obviously correct" solution, of course. Just post the code already!
Unknown said…
Yeap, looks just like how Erlang solved the concurrency problem. Doesn't Stackless Python do something similar too?
