alan little's weblog

the beauty of python

1st October 2003 permanent link

After only a few weeks of developing in python in my limited spare time, I'm discovering that it's remarkably easy to do things that look complicated and daunting at first sight. I've already mentioned the state-of-the-art software engineering principles on which AYAWT is based, two of which are:

  1. Do The Simplest Thing That Could Possibly Work
  2. For any given text-processing problem, XSLT is unlikely to be The Simplest Thing That Could Possibly Work

Item (2) notwithstanding, it would be difficult to build a web page generator without some kind of templating mechanism. Fortunately, python comes with one built in: string substitutions. In their simplest form they look like this:

'the %s is %s' % ('grass', 'green')

What this says is "replace the %s bits in the string with the values from the supplied tuple, in the order in which you find them", resulting in 'the grass is green'. Then there is a slightly more sophisticated version where the tokens to be replaced have names, and are looked up in a dictionary (hash table) of names and values.

'%(colour)s are the %(vegetable)ss' % 
	{'vegetable':'carrot', 'colour':'orange'}

... produces 'orange are the carrots'. This was what I decided was The Simplest Thing That Could Possibly Work for my templating mechanism.

The templates look something like this:

<!-- xhtml headers, link to stylesheet etc. ... -->
<title>%(title)s</title>

%(body)s

        <!-- etc etc ... -->

And I was fine as long as I made sure I had the same tokens in all my templates (which I did by starting off with the same template for every page), but that clearly wasn't going to be a particularly flexible long term solution. What I needed for more flexibility was something that would parse all the token names out of the template, and dynamically build the dictionary of token values by mapping the names to properties of my page class. Not exactly a major problem conceptually, but in any language I've ever used before it would be rather more than I would expect to be able to knock out in half an hour on the train. Not in python, though, where it turns out to be as easy as:

import re # standard python regular expression library

# regexp to extract the token names from the template
rex = re.compile(r'%\((?P<value>.+?)\)')

class page(object):

    # ... initialisation code omitted
    def build(self):
        tokenvalues = {} # empty dict
        for token in rex.findall(self.template):
            attr = self.__getattribute__(token)
            # call it if it's a method, otherwise use the value directly
            tokenvalues[token] = (callable(attr) and attr()) or attr

        outfile = open(self.url, 'w')
        outfile.write(self.template % tokenvalues)
        outfile.close()
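To see the whole thing end to end, here's a minimal self-contained sketch of how a class like this behaves - the template text, the title attribute and the body method are invented for illustration, and it returns the finished page as a string rather than writing a file:

```python
import re

# same regexp as above, extracting the token names from the template
rex = re.compile(r'%\((?P<value>.+?)\)')

class page(object):
    # template and attribute names invented for illustration
    template = ('<html><head><title>%(title)s</title></head>'
                '<body>%(body)s</body></html>')
    title = 'the beauty of python'

    def body(self):
        # token values can be plain attributes or methods
        return '<p>hello from ayawt</p>'

    def build(self):
        tokenvalues = {}
        for token in rex.findall(self.template):
            attr = self.__getattribute__(token)
            # call it if it's a method, otherwise use the value directly
            tokenvalues[token] = (callable(attr) and attr()) or attr
        return self.template % tokenvalues

print(page().build())
```

Running it prints the assembled page, with 'title' filled in from the attribute and 'body' from the method - no per-template configuration anywhere.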

Some metrics: the whole thing is about a dozen lines of code, and took roughly 40 minutes to write and test.

There are existing python templating libraries, Webware PSP and Cheetah being two of the ones I've heard of. I'm pretty confident that it would have taken me more than 40 minutes to install, test and learn one of them and rewrite my templates. I'm absolutely certain it would take me more than 40 minutes to learn XSLT. And I think my approach is fairly flexible - in principle I could switch to a "standard" weblog templating language like Blogger or Movable Type just by changing my regular expression and adding a few methods (assuming I were willing to give up all the cool better-than-Blogger-or-Movable-Type things that AYAWT will (?) be capable of one day).

Even for a novice python programmer like me, there are obvious ways to boil this down still further. Using python properties instead of attributes & methods would avoid the inelegant (callable(attr) and attr()) or attr bit without having to write heaps of ugly java-style get methods. And python's built-in dict() constructor already builds a dictionary out of a list of key/value pairs, so you could actually write the whole thing as a one-liner (note: pseudocode):

open(self.url, 'write').write(self.template % \
    dict([(tag, self.__getattribute__(tag)) \
    for tag in re.compile(r'%\((?P<value>.+?)\)'). \		
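For what it's worth, here's a sketch of the property-based variant - it uses present-day decorator syntax, and the template and names are invented for illustration. A property reads like a plain attribute, so the callable() dance disappears entirely:

```python
import re

rex = re.compile(r'%\((?P<value>.+?)\)')

class page(object):
    # template and names invented for illustration
    template = '%(colour)s are the %(vegetable)ss'
    colour = 'orange'              # a plain attribute

    @property
    def vegetable(self):           # computed, but read like an attribute
        return 'carrot'

    def build(self):
        # no callable() check needed: getattr works uniformly
        tokenvalues = dict((token, getattr(self, token))
                           for token in rex.findall(self.template))
        return self.template % tokenvalues

print(page().build())   # → orange are the carrots
```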

But would the one-liner actually be better, other than for deriving a smug sense of one's own cleverness bordering on that of Lisp programmers? Elegance and succinctness are in principle Good Things, and I don't have any problem with the legibility of the shorter version, but I can think of a number of practical arguments in favour of the slightly more verbose approach. Python's significant whitespace is generally nice once you get over the initial shock, but does get in the way of making elaborate one-liners legible - line continuation backslashes are ugly and a pain to type, and cause irritating syntax errors when you forget one. Complex nested expressions are harder to debug because you can't easily get at the intermediate stages. Compiling the regular expression in advance is probably slightly faster (minor benefit) and keeps the nasty regexp syntax as far away as possible from civilized code (major benefit). And I don't know how, in the one-liner, to get at and close the implicit file handle that we're writing to - it would be nice if python automagically closed it when it goes out of scope (and perhaps it does?).
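On that last point: CPython does in fact close a file as soon as the last reference to it disappears, though that's an implementation detail rather than a language guarantee, and making the close explicit only costs a try/finally. A small sketch (the temporary file path here is invented, standing in for self.url):

```python
import os
import tempfile

# an invented output path, standing in for self.url
path = os.path.join(tempfile.mkdtemp(), 'page.html')

outfile = open(path, 'w')
try:
    outfile.write('the %(colour)s %(vegetable)s' %
                  {'colour': 'orange', 'vegetable': 'carrot'})
finally:
    outfile.close()   # runs whether or not the write succeeds

print(open(path).read())   # → the orange carrot
```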

But, leaving aside stylistic questions of slightly verbose versus as-terse-as-possible, python is really very nice.

related entries: Programming

all text and images © 2003–2008