Standard Gems: collections

This until-recently-lonely module houses only two alternative collection types, deque and defaultdict, but it offers useful things today and promises more to come. Any time we have a good place to put things, we find more things to put there. With the new defaultdict type, collections is finally more than just that thing you use to get a deque: it's a full-fledged utility library. More optimized collection types (chains, B-Trees, and bags, anyone?) are sure to come, so keep an eye on the changelog of every new Python release, and maybe you'll get an early Christmas present.

Here is a quick rundown of what is offered today, using possibly silly examples.

from collections import deque

d = deque()
d.extend(range(10))
while d:
    # popleft() removes and returns the leftmost item
    print(d.popleft())

What you see here is that deque acts like a list but has mirror versions of many end-modifying operations, like append, extend, and pop, which operate on the 'left' side: appendleft, extendleft, and popleft. A list is far less efficient at inserting or popping anywhere but its end, because every later item has to shift over, while a deque handles both ends in constant time. This makes deque great for First-In-First-Out structures, whereas a list is better suited to a First-In-Last-Out setup.
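As a minimal sketch of those mirrored operations, here is a slightly longer silly example. It assumes Python 3 (so print is a function), and the names queue, stack, and job are just illustrative:

```python
from collections import deque

# A deque supports the same end-modifying calls as a list, plus "left" mirrors.
d = deque([3, 4, 5])
d.append(6)            # add on the right, like list.append
d.appendleft(2)        # mirror: add on the left
d.extendleft([1, 0])   # adds items one at a time, so they end up reversed
print(list(d))         # [0, 1, 2, 3, 4, 5, 6]

# First-In-First-Out: append on the right, pop from the left.
queue = deque()
for job in ("a", "b", "c"):
    queue.append(job)
fifo_order = [queue.popleft() for _ in range(3)]
print(fifo_order)      # ['a', 'b', 'c']

# First-In-Last-Out: a plain list is fine, because only its right end is touched.
stack = []
for job in ("a", "b", "c"):
    stack.append(job)
filo_order = [stack.pop() for _ in range(3)]
print(filo_order)      # ['c', 'b', 'a']
```

Note that extendleft pushes its items one by one, which is why [1, 0] lands as 0, 1 at the front.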

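import collections

# The factory is called with no arguments whenever a key is missing
dd = collections.defaultdict(lambda: None)
dd['a'] = 1
dd['b'] = 2
print(dd['c'] or 3)  # 'c' is missing, so the default None falls through to 3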

Here we automatically handle a nonexistent key with a default value, None. The default comes from a factory callable rather than a fixed value, so it can return a different value on each call, but the factory is called with no arguments and never sees the missing key. Note that looking up a missing key also stores the default under that key. One interesting use is itertools.count().next as the factory, which fills every missing key with an automatically incrementing integer.
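A short sketch of the counter trick, assuming Python 3, where the bound method is spelled __next__ rather than next. The name ids and the sample keys are hypothetical:

```python
import itertools
from collections import defaultdict

# Each missing key triggers one call to the factory, which returns the next integer.
ids = defaultdict(itertools.count().__next__)

for name in ["spam", "eggs", "spam", "ham"]:
    ids[name]  # a plain lookup is enough to assign an id on first sight

print(dict(ids))  # {'spam': 0, 'eggs': 1, 'ham': 2}

# A lookup of a missing key stores the default, so the key exists afterwards.
dd = defaultdict(lambda: None)
dd["c"]
print("c" in dd)  # True
```

The second half shows the side effect worth remembering: merely reading dd["c"] adds the key, so use dd.get("c") or a membership test when you only want to peek.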

Comments

Kent said…
For me the most common uses of defaultdict are cases where I need to accumulate a count or a list. For example a word count:

from collections import defaultdict

d = defaultdict(int)
for word in words:
    d[word] += 1

or to accumulate a list of first names corresponding to last names:

d = defaultdict(list)
for first, last in names:
    d[last].append(first)

Jorge said…
With the queue I have always had the feeling that it is a thread-related queue and not a normal data structure. Does it have some kind of performance issue? I prefer to use [].append and [].remove(0), which for some reason is faster than [].insert(0) and [].pop().


As for the dict:

I like to use it as follows for groups of data that are optional:

self.items = collections.defaultdict(str)

I have this nice use case in which I have a set of messages that represent a flow, some of them optional. The above just lets me set a nonexistent message to ''.

You need a bigger trick to make the default something like MISSING; you have to do as follows:

self.items = collections.defaultdict(lambda: "MISSING")

which is a bit ugly but works nicely.
