alan little’s weblog

one simple & obvious way

18th July 2005 permanent link

One of the key qualities of python is supposed to be (and is, mostly) its cleanness and elegance. In contrast to perl, where no two programmers are ever supposed to do the same thing in the same way, in python there’s supposed to be “one simple and obvious way to do it”. Let’s look at this, for a couple of different values of “it”.

I discovered when I profiled my current project that it spends nearly half its time in a single function, and that function spends most of its time on a very common and obvious python programming scenario:

you want to look something up in a dictionary, but it might not be there

What’s the one simple and obvious way to do this? There are three. (At least).

Another python programming principle, we are often told, is it is easier to ask forgiveness than permission. So just try it, then worry about what to do if it fails:

try:
    value = d[key]
except KeyError:
    value = default

That should work nicely as long you’re expecting to usually find what you’re looking for. If your expected hit rate is low (it isn’t in my particular situation, but it could be), then instead of raising lots of exceptions it might be better to look before you leap:

if key in d:
    value = d[key]
else:
    value = default

But wait. We are also told we should use built in functions from the core python libraries wherever possible, because they are written in C code that is highly optimised and more efficient than anything we are likely to come up with ourselves. And Alex Martelli in Python in a Nutshell points out that python dictionaries have a get method that deals with precisely this situation:

value = d.get(key, default)

That saves us three lines of code (succinct is always good, right?) and should be at least as fast as the other two, if not even somehow magically faster. Er, no, as it turns out:

from time import time
lookups = 10000

hitrates = (1.00, 0.99, 0.9, 0.75, 0.5, 0.1)
for hitrate in hitrates:
    d = dict([(i,i) for i in range(1, lookups * hitrate)])
    start = time()
    for i in lookups:
        try:
            value = d[i]
        except KeyError:
            value = 0
    print 'Hit rate %f -- time for *try*: %f' % (hitrate, (time() - start))
    start = time()
    for i in lookups:
        if i in d:
            value = d[i]
        else:
            value = 0
    print 'Hit rate %f -- time for *if*: %f' % (hitrate, (time() - start))
    start = time()
    for i in lookups:
        value = d.get(i, 0)
    print 'Hit rate %f -- time for *get*: %f' % (hitrate, (time() - start))

10,000 dictionary lookups with various hit rates. Times in milliseconds, with python 2.3 running on a 1 GHz Powerbook with OS X 10.3.9:

Hitrate try if get
100% 21 29 30
99% 22 28 30
90% 22 27 29
75% 59 25 28
50% 116 23 28
10% 193 17 26
python dictionary lookup timings

try turns out, not surprisingly, to be the fastest approach if you’re sure what you’re looking for will nearly always be there. This is probably the normal case. Once your hit rate drops off a few percent, though, the cost of raising and handling all those exceptions becomes catastrophic, and then it’s faster to look before you leap. get is always slightly slower than look before you leap. Presumably it’s doing the same thing plus some overhead for the function call. I find it hard to see what it’s there for – why clutter up the standard library with a function that saves three simple & obvious lines of code at the expense of mediocre performance?

What about another common and obvious programming scenario?

you want to develop something that has a web front end

What’s the one simple and obvious way to do this? Zope? Webware? CherryPy? Quixote? Twisted? Subway? (…)

related entries: Programming

all text and images © 2003–2008