First Pythonic Impressions
It seems like forever ago that I first encountered Python,
since it's been the majority of what I've written software in
for work ever since.
I wrote my first Python code on 2007-11-23, following the
tutorial on the Python
website, generating the usual collection of small scripts to test
various bits of the language.
It quickly became clear that Python has an interesting and
relatively clean style of expression, along with notably different
performance tradeoffs than some other languages I've used.
The notes below are all from those first few days in 2007,
and are mostly about quirks other programmers might identify with,
as well as the ongoing appreciation for the ability to do quite
a bit with relatively small amounts of code.
Whitespace as Control, and Quirky Control Over Whitespace
hilo.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import random

def Game():
    'Plays one game of Hilo.'
    min, max = 0, 100
    target = random.randint(min, max)   # randint includes both endpoints
    attempts = 0
    while True:
        text = raw_input('Guess #' + str(attempts + 1) + ': ')
        try: guess = int(text.strip())
        except ValueError:
            print 'Oops - %s is not an integer' % repr(text)
        else:
            # inclusive bounds, so a target of 0 or 100 can be guessed
            if min <= guess <= max:
                attempts = attempts + 1
                if guess < target: print 'Too low'
                elif guess > target: print 'Too high'
                else: break # you've won!
            else:
                print 'Oops - must be between %d and %d' % (min, max)
        # no `finally:' clause
    print '%d is correct on attempt #%d!' % (target, attempts)

try: Game()
except KeyboardInterrupt: print 'Interrupt received'
except EOFError: print 'EOF received'
finally: print 'exiting.'
#---eof
Once past generating the obligatory Hello World
and various code snippets
to test parts of the tutorial, I wrote a small version of Hilo to test
flow constructs, input from stdin, and exception handling with a project
where I'd be fitting the language to the problem rather than the other
way around. Some of my initial reactions were:
- It's funny that the character encoding directive is based on
Emacs file variables.
- I expected to hate using whitespace for flow control, but it does
reduce visual clutter a bit and removes the control-vs-indentation
synchronization issue seen in many other languages.
- On the other hand, Bash can't save paragraphic Python one-liners
in the history file without splitting them, even with
shopt -s lithist cmdhist, due to
using newline as an inter-command history delimiter.
- Exception handling in Python has a clean feel.
- The inability to fully control the generation of additional spaces
in print output is truly annoying. It has the comma
to suppress newlines - why in the world isn't there an obvious way to
override its idea of whether you should have extra spaces injected?
Should I really have to resort to sys.stdout.write for
something so minor?
- The
% operator is cute, in an anti-C++-STL kind of way
(a contrast to the STL's dropping of the form() method from
streams).
- I understand why Python doesn't want to assume the
character encoding for output to files (or possibly to terminals),
and the use of
c.encode('utf-8') to choose an encoding
on output is straightforward, albeit cumbersome.
The unicodedata.name function's great, too - I'd
love to see this kind of character identification in the Emacs
modeline.
- BUT:
While working with a piece of Unicode text
(the classic Japanese 「いろは」 poem),
I encountered some unpleasantness.
The fact that one can output Unicode from a Python program to a
terminal, then be completely unable to redirect that output
to a file without the program crashing
(much less
cat the results from said file back to your tty),
or spontaneously encoding your output in some other character set,
seems to violate a fundamental expectation of datastream transitivity
for the Unix realm.
It seems like a perfect case for an overriding environment variable
that I'm simply not seeing anywhere.
Being able to open files via the codecs.open call
looks like it might help, but not being able to change the default
except on a system-wide level grates.
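As far as I can tell, the redirect failure above comes from Python picking an output encoding only when stdout is a terminal. One possible workaround is to wrap the underlying byte stream in an explicit UTF-8 writer, so the encoding no longer depends on where the output goes. The sketch below uses Python 3 syntax rather than the Python 2.5 of these notes, and the helper names utf8_writer, describe and demo are made up for illustration. (Later releases, Python 2.6 and up, also honor a PYTHONIOENCODING environment variable, which looks like the sort of override wished for above.)

```python
import codecs
import io
import unicodedata

def utf8_writer(byte_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return codecs.getwriter('utf-8')(byte_stream)

def describe(text):
    """Return (character, Unicode name) pairs for each character in text."""
    return [(ch, unicodedata.name(ch, 'UNNAMED')) for ch in text]

def demo():
    """Encode the opening of the iroha poem into an in-memory 'file'."""
    fake_file = io.BytesIO()        # stands in for a redirected stdout
    out = utf8_writer(fake_file)
    out.write('いろは\n')           # text in, UTF-8 bytes out
    return fake_file.getvalue()
```

For real output, the same wrapper could be applied to sys.stdout.buffer, so that redirecting to a file and printing to the terminal produce the same bytes.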
From 7 to 100 Frames per Second
About the only feature I didn't want to carry over (yet?) was from the
C++/Life experiment with a one-thread-per-cell approach, where the cells
communicated directly with their neighbors, yielding some
7000 threads in a 140x50 cell automaton and a rather large number
of mutexes. Reducing the per-thread stacksize to 1 KiB was a key enabler.
Next, I started writing yet another Life cellular automaton, since I
wanted to see how Python felt with respect to a familiar project that I'd
written in a C/C++ environment where function calls are relatively cheap,
and which would force me to reconceive the design once I was faced with a
working, but slow, program. I also wanted to try out classes this time,
deriving TTY and curses-focussed subvariants from base classes Cell and
Grid, find the terminal size with TIOCGWINSZ even outside
of curses, use select to check for input, and do profiling.
The profiling was done with 140x50 cell grids, allowed to settle to
a collection of trivial spinners and fixed configurations.
Some of the surprises at this stage included:
- Having
sys.argv and the in keyword
makes certain types of argument parsing really convenient.
- Code like this is much more eloquent than the C equivalent, and
the leading doc line gives a warm, LISPy fuzzy feeling:
import sys
import select
def StdinPending():
    'Returns whether there is input pending on stdin.'
    # a timeout of 0 polls without blocking
    irdy, ordy, erdy = select.select([sys.stdin],[],[],0)
    return sys.stdin in irdy
- On the other hand, the crippled
lambdas,
harsh distinction between statements and expressions,
and the somewhat high expense of function calls,
are clear proof that this isn't LISP.
- Having to explicitly call
del on instances to
get their __del__ destructors to fire before
leaving the function in which they were instantiated, even
with no other (obvious) references to them,
was counterintuitive, and seems to have other issues.
- The Hotshot profiler's quite nice, although it takes a surprisingly
long time to convert its output into a textual summary. It also
seems to interfere with instance GC and triggering of
__del__
destructors - enough to make me curious if I've misinterpreted their
intended use.
- Having to pass a buffer into an ioctl of an appropriate length
for the result, yet have the result returned along a different
channel, is just strange (the docs allege a fourth parameter
to change this, but it didn't seem to work). This works, though:
import sys
import struct, fcntl, termios
def TiocgWinsz():
    'Return the terminal (stdout) size as (cols, rows, pixel width, pixel height).'
    # the buffer only sizes the result; struct winsize is four unsigned shorts
    buf = struct.pack('HHHH', 0, 0, 0, 0)
    result = fcntl.ioctl(sys.stdout.fileno(), termios.TIOCGWINSZ, buf)
    rows, cols, pxwidth, pxheight = (struct.unpack('HHHH', result))
    return cols, rows, pxwidth, pxheight
- Although a bit of a surprise, there are two competing types of
classes in Python (currently), and the newer style derived from
the
object built-in class works quite nicely.
Especially nice was using super to call
parent classes' __init__ constructors without having
to name the parent explicitly.
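As a rough illustration of the super point in the last item, here is a minimal sketch in Python 3 syntax. The class names Grid and GridTty come from these notes, but the attributes and the blank_row method are invented for the example and are not the actual life.py code.

```python
class Grid(object):
    """Base class: owns the grid dimensions (hypothetical attributes)."""
    def __init__(self, width, height):
        self.width = width
        self.height = height

class GridTty(Grid):
    """TTY variant: adds a display character on top of the base class."""
    def __init__(self, width, height, alive_char='*'):
        # Python 2.5 spelling: super(GridTty, self).__init__(width, height)
        # Either way, the parent class is never named here.
        super().__init__(width, height)
        self.alive_char = alive_char

    def blank_row(self):
        """Return one empty display row, as wide as the grid."""
        return ' ' * self.width
```

If the base class is later renamed or another layer is slipped in between, the __init__ call in the subclass should not need to change.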
life.pydebugrc
# Use: PYTHONSTARTUP=life.pydebugrc python
from life import * # a debugging convenience

def gc(x,y): # grid cell
    return g._grid[x][y]

def s(): # show status
    print 65 * '-'
    # side-by-side grid views (rows for normal, debug, neighbors)
    for r3 in zip(g.rows(), g.drows(), g.nrows()):
        print '|'.join(r3)
    print 65 * '-'

def up():
    g.Update()
    s()

# interactive utilities acting on required grid `g'
def al(): return filter(lambda c: c._alive, g._cells)
def em(): return filter(lambda c: not c._alive, g._cells)
def bi(): return filter(lambda c: 3 == c._neighbors , em())
def di(): return filter(lambda c: not 2 <= c._neighbors <= 3, al())

g = GridTty(21,7) # setup - small, 3 side-by-side views, and a glider
[ g._grid[x][y].Born() for (x,y) in [(8,4),(9,4),(10,4),(9,2),(10,3)] ]
s()
Yielding an initial display of:
$ PYTHONSTARTUP=life.pydebugrc python
Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
-----------------------------------------------------------------
                     |                     |000000000000000000000
                     |                     |000000001110000000000
         *           |         x           |000000001121000000000
          *          |        . *          |000000013532000000000
        ***          |        x**          |000000011322000000000
                     |         .           |000000012321000000000
                     |                     |000000000000000000000
-----------------------------------------------------------------
>>>
- Running Python with a $PYTHONSTARTUP script specific to testing
the life.py module, including scripting an initial testing
scenario and adding some interactive-specific utility functions,
greatly eased the edit/test/debug cycle.
- Initial runs of the Python life automaton were pretty low on the
performance scale, only yielding about 7 fps at 140x50 at 90% CPU
load. In contrast, a more complex C version does 250 fps at the
same grid size with 0.3% CPU load. I'll call that about 10700 times
more efficient for the moment (250/7 in frame rate, times 90/0.3 in
CPU load). Tweaking various parts of the Python
code, particularly those generating lots of function calls, has so
far brought it up, by way of 10, 20, 36, 47, and 53 fps, to about
100 fps, ~14x faster,
and cut the 10700 factor down to about 750, a huge improvement.
Still, I'm not really happy with the unchanged 90% CPU load.
- Poking around for ways to write-protect class and instance attributes
gives the impression that one can arrange, at least with the newer
class approach, to cause an exception to be raised whenever code tries
to violate the protection.
- Python function call performance is slow enough, especially with
the number of function calls the life.py code had in loops, to notably
interfere with attempts to structure code in certain ways for
readability or data modelling purposes. Hopefully I'll run into or
discover some alternative approaches to this issue later.
- Certain permutations encouraged by Python comprehensions are PERL-like
in their inversive nature. Similar to situations in PERL where the
flow control constructs can end up following the controlled parts,
there are Python chunks like:
[ stdscr.addstr(y+1, x+1, grid[x][y].str())
  for y in self.yvals()
  for x in self.xvals()
  if grid[x][y]._alive ]
That chunk is actually a bit naïve, considering it could generate
a lot of function calls for cells whose screen representations
hadn't changed.
Keeping a list of cells which were recently
subject to Born() and Dies()
would be faster...
...and it is, with an increase from 47 to 53 fps,
with GridCurses' __init__
making a dict of cell-to-location pairs (indexed by repr()
of the objects, perhaps a bit flippant),
the Update method now recording changed cells in
_changed, and this replacement code in the display method:
for c in self._changed:
    (x,y) = self._locations[repr(c)]
    stdscr.addstr(y+1, x+1, c.str())
- Maintaining sets is more efficient than repeatedly using filters to
extract lists of members of interest. Replacing filters along the
lines of:
def Update(self):
    alive = filter(lambda c: c._alive , self._cells)
    empty = filter(lambda c: not c._alive , self._cells)
    [...]
with code transferring cells between the Grid sets
._alive and ._empty
(different from the Cell's boolean ._alive seen above),
instead of extracting data from ._cells,
brought the speed from 53 fps up to ~90 fps.
- The code would clearly be faster if the class separation between
Cell and Grid was removed, but that seems like it wouldn't be true
to the idea of the code being an emulation-in-the-small of a larger
program.
- Running
help(Cell) and pydoc life
showed some areas where the embedded doc in my life.py could
be improved - especially when the -*- coding: utf-8 -*-
directive and all of the top-of-file comments were exposed in
the alleged “DESCRIPTION” section of the man-page-like
output of pydoc.
- Unfortunately,
pydoc explodes if any of the doc
strings contain Unicode characters outside of the ASCII range.
Considering that there are cases where embedded doc strings,
or automated tests embedded in them, might require
non-ASCII to even make sense, this has to be considered a
pydoc bug if it is, itself, the cause of the problem.
Especially considering that the life.py file has a
coding directive explicitly demanding Unicode UTF-8, of which
pydoc could have taken note.
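The sets-versus-filters change described a few items up can be sketched roughly as follows. This is a minimal Python 3 illustration under my own assumptions, not the actual life.py code: the method names born and dies on the grid, and the public alive attribute on the cell, are hypothetical stand-ins.

```python
class Cell(object):
    """A cell that only knows whether it is alive."""
    def __init__(self):
        self.alive = False

class Grid(object):
    """Keeps alive/empty membership in sets instead of re-filtering a list."""
    def __init__(self, cells):
        self._cells = list(cells)
        # Built once; afterwards only cells that change state are touched.
        self._alive = set(c for c in self._cells if c.alive)
        self._empty = set(self._cells) - self._alive

    def born(self, cell):
        """Move a cell from the empty set to the alive set."""
        cell.alive = True
        self._empty.discard(cell)
        self._alive.add(cell)

    def dies(self, cell):
        """Move a cell from the alive set to the empty set."""
        cell.alive = False
        self._alive.discard(cell)
        self._empty.add(cell)
```

Each update then costs time proportional to the number of cells that changed, where the two filter calls walked every cell on every generation.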
Using SWIG to Integrate Python and a C varargs Function
- or Oh, why did I have to pick a variadic function first?
- or Oh, should I have tried ctypes first?
Of course, the real vexer here was starting my SWIG experimentation
with a variadic function like printf , except worse, since
this one allows for format continuations in later parameters. This
ends up interleaving format strings and data, making the determination
of the number of args basically impossible without running the
full parsing pass outright on the whole chain of format string(s).
That didn't stop me from rewriting the variadic function in question
to have an alternate form taking its following args in a vector, macroizing
around the va_arg calls to allow the same code chunk to live in both
functions via an include. Once SWIG started grumbling about the
void** second parameter, though, I decided that my second
pass at this would be with something other than a worst-case scenario.
The latter, of course, went better, reaching some use of
vectors fairly quickly, allowing interaction as shown below,
based on a gratuitously-recursive C function along the lines of
int Sum(int count, Int *vec), with
Int just being a struct wrapped around
an int (swigc is just the name of the
test module, not an official name).
Trying to %extend int rather than Int
resulted - unsurprisingly - in errors, but with Int,
the following works just fine:
>>> vec = Vec([11, 12, 13, 14, 15, 16, 17])
>>> swigc.Sum(vec.len(),vec.vec())
98
>>>
The %typemap approach is looking like the way to go next,
since it should allow the Vec class to be jettisoned and the simpler
passing of native Python lists into function calls.
However, I'm also reading (and hearing) some rumors that a completely
different mechanism, ctypes, might be preferable to SWIG in
situations without an existing base of SWIG code.
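For comparison, here is a rough sketch of how the same list-to-vector hand-off might look with ctypes, in Python 3 syntax. It only shows the marshalling side: the field name value inside Int, the helper names, and the library path in the comments are all assumptions, and py_sum is a pure-Python stand-in so the example runs without the C library.

```python
import ctypes

class Int(ctypes.Structure):
    """Mirror of a C struct wrapping a single int (field name is assumed)."""
    _fields_ = [('value', ctypes.c_int)]

def make_int_vec(values):
    """Copy a Python list of ints into a C array of Int structs."""
    array_type = Int * len(values)
    return array_type(*[Int(v) for v in values])

def py_sum(vec):
    """Pure-Python stand-in for the C Sum(), used only to check the data."""
    return sum(item.value for item in vec)

# With the real shared library available (path and name are hypothetical),
# the call itself might look something like this:
#   lib = ctypes.CDLL('./libswigc.so')
#   lib.Sum.argtypes = [ctypes.c_int, ctypes.POINTER(Int)]
#   lib.Sum.restype = ctypes.c_int
#   total = lib.Sum(len(vec), vec)
```

No interface file or generated wrapper is involved here; the price is that the struct layout has to be restated by hand on the Python side.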
AJAX with Python
While experimenting with an initial AJAX webpage, I realized that by
default I was about to promote my sh-based CGI backend to PERL, and that
going to Python instead might be more interesting. It only took a couple
of minutes to find a convenient Python library (via import cgi)
for CGI data parsing
as well as an instant debugger option of:
import cgitb; cgitb.enable(display=0, logdir="/tmp")
Between these two, I found myself with Python instead of PERL in under
five minutes, and without the need to drag in the PERL CGI parsing
functions I've been using for so long now.
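A minimal sketch of what such a CGI backend for an AJAX page might look like, in Python 3 syntax. The cgi and cgitb modules used at the time have since been removed from the standard library (as of Python 3.13), so this version swaps in urllib.parse.parse_qs for the query parsing; the function names and the name parameter are invented for the example.

```python
import json
import os
from urllib.parse import parse_qs

def handle_request(query_string):
    """Build a complete CGI response (header plus JSON body) for one request."""
    params = parse_qs(query_string)
    # parse_qs maps each key to a list of values; take the first 'name'.
    name = params.get('name', ['world'])[0]
    body = json.dumps({'greeting': 'Hello, %s' % name})
    return 'Content-Type: application/json\r\n\r\n' + body

def main():
    """Entry point when a web server runs this file as a CGI script."""
    print(handle_request(os.environ.get('QUERY_STRING', '')), end='')
```

Keeping the request handling in a plain function that takes the query string makes it possible to exercise the backend from the interactive prompt, without a web server in the loop.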