alan little’s weblog

transliterator

22nd July 2005

I needed something for a project I’m working on that would let me easily enter romanised Sanskrit text on a normal keyboard (or better still, find romanised Sanskrit text on the internet) and then convert it to proper devanagari Sanskrit text in unicode.

Since I’m acutely aware that, as Phillip Eby puts it: “Python as a community is plagued by massive amounts of wheel-reinvention. The infamous web framework proliferation problem is just the most egregious example”, I did a bit of searching first to see if somebody had already done something similar that I could use, adapt or contribute to. It appears not: I found lots of people telling me how “transliterating” code line-for-line from other programming languages into python produces programs that are un-pythonic, ugly and slow (so don’t do it, folks), but nothing that looked anything like what I wanted.

So I wrote transliterator.py. Version 0.1 is available here for download in case anybody else needs something similar. It even has documentation of sorts.

From a command line it works like this:

python transliterator.py text inputFormat outputFormat > outputFile

… assuming you have python installed. Mac and Linux users do. Windows users can get it here. text can be either the actual text you want transliterated, or the name of a file containing it. inputFormat and outputFormat are what you want to transliterate from and to, e.g. “HarvardKyoto”, “Devanagari”. If you don’t specify an output file, the transliterated results will just be shown on the screen.
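The "text or file name" behaviour is easy to picture as a small helper. What follows is only a sketch of how such an argument could be resolved, not the code in transliterator.py: the function name resolve_text and the UTF-8 default are assumptions made for illustration.

```python
import os


def resolve_text(arg, encoding="utf-8"):
    """Hypothetical helper: treat arg as a file name if such a file exists,
    otherwise treat it as the literal text to transliterate."""
    if os.path.isfile(arg):
        # The argument names an existing file, so use its contents.
        with open(arg, encoding=encoding) as f:
            return f.read()
    # No such file: the argument itself is the text.
    return arg
```

One consequence of this design is that a piece of text which happens to match a file name in the current directory is read as a file, so a stricter tool might want an explicit flag instead.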

See the documentation for examples of how to call transliterator from another python program, which allows you to set up your own transliterations and have more control over input and output encodings.

Version 0.1 supports transliteration to and from Sanskrit Devanagari using IAST, ITRANS and Harvard-Kyoto transliterations. Here’s an at-a-glance table of common Sanskrit transliterations. I think it wouldn’t be too big a job to adapt transliterator to modern Indian languages. ISO-9 transliteration for Cyrillic (Russian only) is in there too, but I haven’t made any serious attempt to test that yet. It also supports users adding their own transliterations.
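To give a feel for what a table-driven transliteration involves, here is a minimal sketch that converts Harvard-Kyoto to IAST by trying the longest matching key first. It is not taken from transliterator.py: the names HK_TO_IAST and transliterate are made up for this example, and the table lists only the Harvard-Kyoto symbols that differ from IAST. Going all the way to Devanagari takes more than a lookup table (inherent vowels, vowel signs, virama), which is why the sketch stops at IAST.

```python
# Illustrative only: a hypothetical table, not the one shipped with transliterator.py.
# Harvard-Kyoto symbols that differ from IAST; everything else passes through unchanged.
HK_TO_IAST = {
    "lRR": "ḹ", "RR": "ṝ", "lR": "ḷ",
    "A": "ā", "I": "ī", "U": "ū", "R": "ṛ",
    "M": "ṃ", "H": "ḥ", "G": "ṅ", "J": "ñ",
    "T": "ṭ", "D": "ḍ", "N": "ṇ", "z": "ś", "S": "ṣ",
}


def transliterate(text, table):
    """Replace each longest-matching key of table in text with its value."""
    # Try longer keys first so that "RR" is not read as two separate "R"s.
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No key matches at this position: copy the character as it is.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate("saMskRta", HK_TO_IAST))
```

A user-defined transliteration in this scheme would just be another dictionary passed as the table argument.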

