Standard Gems: collections

This until-recently-lonely module houses only two alternative collection types, deque and defaultdict, but it offers useful things today and promises more to come. Any time we have a good place to put things, we find more things to put there. With the new defaultdict type, collections is finally more than just that thing you use to get a deque: it's a full-fledged utility library. More optimized collection types (chains, B-trees, and bags, anyone?) are sure to come, so keep an eye on the changelog of every new Python release, and maybe you'll get an early Christmas present.

Here is a quick rundown of what is offered today, using possibly silly examples.

from collections import deque

d = deque()
d.extend(xrange(10))
while d:
    print d.popleft()  # prints 0 through 9, oldest first

What you see here is that deque acts like a list but has mirrored versions of many end-modifying operations (appendleft, extendleft, and popleft) that operate on the 'left' side. A list is far less efficient at inserting and popping anywhere but its end, because every later element has to shift. This makes deque great for First-In-First-Out structures, whereas a list is better suited to a First-In-Last-Out setup.
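To make the mirrored operations concrete, here is a minimal sketch. It is written for Python 3 (so print is a function and range replaces xrange), and the variable names are only illustrative.

```python
from collections import deque

# FIFO queue: append on the right, pop from the left.
queue = deque()
queue.append("first")
queue.append("second")
queue.append("third")
fifo_order = [queue.popleft() for _ in range(3)]

# The mirrored "left" operations work on the other end.
d = deque([3, 4])
d.appendleft(2)       # deque([2, 3, 4])
d.extendleft([1, 0])  # added one at a time, so they land reversed: deque([0, 1, 2, 3, 4])
d.append(5)           # deque([0, 1, 2, 3, 4, 5])
leftmost = d.popleft()
rightmost = d.pop()

print(fifo_order)
print(leftmost, rightmost, list(d))
```

Note that extendleft pushes its items one by one, so the sequence you pass in ends up in reverse order on the left side.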

import collections

dd = collections.defaultdict(lambda: None)
dd['a'] = 1
dd['b'] = 2
print dd['c'] or 3  # 'c' is missing, so dd['c'] is None and 3 is printed

Here we automatically handle a nonexistent key with a default value, None. The default comes from a factory callable, so each missing key can get a different value, but the factory is called with no arguments and never sees the key. One interesting use is itertools.count().next as the factory, which fills every missing key with an automatically incrementing integer.
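The counter trick can be sketched as follows. This version assumes Python 3, where the bound method is spelled itertools.count().__next__ rather than .next; the labels list is made up for illustration.

```python
import itertools
from collections import defaultdict

# Each missing key triggers one call to the factory, which takes no arguments.
# (In Python 2 the factory would be itertools.count().next.)
ids = defaultdict(itertools.count().__next__)

labels = ["apple", "pear", "apple", "fig"]  # hypothetical input
assigned = [ids[label] for label in labels]

print(assigned)
print(dict(ids))

# Looking up a missing key also stores the default in the dict,
# unlike dict.get(), which leaves the dict unchanged.
dd = defaultdict(lambda: None)
value = dd["c"] or 3
print(value, "c" in dd)
```

Repeated labels get the same integer back, because the factory only runs the first time a key is missing. The last lines show a side effect worth remembering: merely reading dd["c"] inserts the key.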

Comments

Anonymous said…
For me the most common uses of defaultdict are cases where I need to accumulate a count or a list. For example a word count:

d=defaultdict(int)
for word in words:
  d[word] += 1

or to accumulate a list of first names corresponding to last names:

d=defaultdict(list)
for first, last in names:
  d[last].append(first)
mae said…
With the queue I have always had the feeling that it is a thread-related queue and not a normal data structure. Does it have some kind of performance issue? I prefer to use [].append and [].remove(0), which for some reason is faster than [].insert(0) and [].pop().


As for the dict.

I like to use it as follows for groups of data that are optional: self.items=collections.defaultdict(str)

I have this nice use case in which I have a set of messages that represent a flow, some of them optional. The above just lets me set a nonexistent message to ''.

You need a bigger trick to make the default something like MISSING; you have to do as follows:

self.items=collections.defaultdict(lambda : "MISSING")

which is a bit ugly but works nicely.
