alan little’s weblog


10th November 2005

Although I’m interested in Smalltalk, I also have projects I want to work on now in a language I can already use productively. These days that basically means Python. Python has its frustrating aspects, but one of the great things about it is that it has really good libraries for lots of things. Fredrik Lundh’s elementtree, for example, has been my XML handler of choice for a while now – it provides a reasonably simple and clean interface whilst also being faster and more efficient than the rudimentary XML tools that come as standard with Python.

elementtree is written in pure interpreted Python; there’s also cElementTree, a version written in C that Fredrik says is 15-20 times faster and uses 2-5 times less memory. This is interesting: one of the projects I have in mind involves working with some fairly large XML files. So download, uncompress, install. Change one line in my source code to use cElementTree instead of the Python version, and my tests pass first time. Another win for open source installation.
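For the curious, the one-line change is just the import. Here is a minimal sketch of the idea, assuming the module names the two packages were distributed under (`cElementTree` and `elementtree.ElementTree`). The last fallback, to the copy of the library that later shipped in the standard library as `xml.etree.ElementTree`, is added here only so the snippet runs on newer Pythons, and the sample XML is made up.

```python
# Prefer the C implementation, fall back to a pure-Python one.
try:
    import cElementTree as ET                  # standalone C version
except ImportError:
    try:
        import elementtree.ElementTree as ET   # standalone pure-Python version
    except ImportError:
        import xml.etree.ElementTree as ET     # standard-library copy (Python 2.5+)

# Made-up sample document, only to show that the calling code is unchanged
# whichever implementation ends up bound to the name ET.
SAMPLE = "<library><track><name>So What</name></track></library>"

root = ET.fromstring(SAMPLE)
track_names = [t.findtext("name") for t in root.findall("track")]
print(track_names)
```

Everything after the import talks to `ET` and doesn't care which implementation is behind it, which is why the tests didn't need touching.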

Fredrik’s performance claim appears to be true, even an understatement, for the largest XML file I happen to have lying around just now.

Time to load and parse my iTunes library file, an 11 MB Apple plist, on a 1 GHz G4 Powerbook with Python 2.3:

(py)ElementTree 1.2: 70 to 80 seconds, memory used 160 MB

cElementTree 1.0.2: 3.3 to 3.5 seconds, memory used 32 MB
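Taking the two sets of figures above at face value, the ratios work out to roughly 20 to 24 times faster and 5 times less memory. The arithmetic, as a quick sketch (the variable names are mine):

```python
# Figures copied from the two measurements above.
py_seconds = (70.0, 80.0)   # (py)ElementTree 1.2, fastest and slowest run
c_seconds = (3.3, 3.5)      # cElementTree 1.0.2, fastest and slowest run
py_mb, c_mb = 160.0, 32.0   # memory used by each

# Least and most favourable pairings of the two timing ranges.
speedup_low = py_seconds[0] / c_seconds[1]    # 70 / 3.5
speedup_high = py_seconds[1] / c_seconds[0]   # 80 / 3.3
memory_ratio = py_mb / c_mb                   # 160 / 32

print("speed-up: %.0fx to %.0fx" % (speedup_low, speedup_high))
print("memory:   %.0fx less" % memory_ratio)
```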

(Unfair comparison with Python’s built-in xml.dom.minidom, which makes no claim to be either fast or compact: 267 seconds to parse the file, plus approximately a week to clean up after itself, memory used 573 MB)
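If you want to try something similar on your own files, here is a rough sketch of a load-and-parse timer. It is not the script behind the numbers above: it assumes a plain wall-clock measurement with `time.time()`, uses the standard-library `xml.etree.ElementTree` as a stand-in parser, and writes a small made-up plist-shaped file instead of a real iTunes library. `time_parse` is a hypothetical helper name, and memory use is not measured here at all.

```python
import os
import tempfile
import time
import xml.etree.ElementTree as ET   # stand-in; swap in the parser under test


def time_parse(path, parse=ET.parse):
    """Return (elapsed seconds, whatever `parse` returned) for one run."""
    start = time.time()
    result = parse(path)
    elapsed = time.time() - start
    return elapsed, result


# Stand-in for the real file: a small, made-up plist-shaped document.
fd, sample_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as f:
    f.write("<plist><dict>")
    for i in range(1000):
        f.write("<key>%d</key><string>track %d</string>" % (i, i))
    f.write("</dict></plist>")

elapsed, tree = time_parse(sample_path)
root = tree.getroot()
print("parsed <%s> in %.3f seconds" % (root.tag, elapsed))
os.remove(sample_path)
```

Passing a different callable as `parse` (for example `xml.dom.minidom.parse`) times that parser instead; only the `getroot()` call afterwards is specific to ElementTree-style results.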

UPDATE: in Part Two, we are unimpressed by Ruby’s REXML. In Part Three, we look at VisualWorks Smalltalk, and think about whether the whole exercise has any value.


all text and images © 2003–2008