Mon, 28 Jul 2008

Richard Gabriel writes, in [Performance and Evaluation of Lisp
Systems](http://www.dreamsongs.com/Books.html):

> The performance profiles for free/special lookup and binding are
> very different depending on whether you have deep or shallow
> binding.  In shallow-binding implementations, times for function
> call and internal binding of free/special variables are inflated
> because of the additional work of swapping bindings.  On some
> deep-binding systems, referencing a dynamically bound variable
> (which includes all variable references from the interpreter) can
> require a search along the access path to find the value.  Other
> systems cache pointers to the value cells of freely referenced
> free/special variables on top of the stack; caching can take place
> upon variable reference/assignment or upon entry to a new lexical
> contour, and at each of these points the search can be one
> variable at a time or all/some variables in parallel.
> Shallow-binding systems look up and store into value cells, the
> pointers to which are computed at load time.  Deep-binding systems
> bind and unbind faster than shallow-binding systems, but
> shallow-binding systems look up and store values faster.
> Context-switching can be performed much faster in a deep-binding
> implementation than in a shallow-binding one.  Deep binding
> therefore may be the better strategy for a multi-processing Lisp.

It occurred to me that there is actually a continuum between these
two, using hash tables, as was done in many Forth implementations.
Suppose you hash each symbol into one of eight buckets, and have one
alist hanging off each bucket.  Then saving a context requires saving
up to eight cells --- more than one, as in deep binding, but
potentially fewer than the total number of variable values being
restored, as in shallow binding; and looking up a value requires a
linear search along only one eighth of the total number of dynamic
variables.

Also, the hash table size need not be fixed; it can change over time,
or from program to program.

When the hash table size is 1 --- that is, there's only one hash
bucket --- you have traditional deep binding.  When it's large enough
that no hash bucket has more than one variable in it, you have
traditional shallow binding.  If you had about sqrt(number of
variables) buckets, then both lookup and swapping of bindings would be
O(sqrt(number of variables)).

This may seem a bit irrelevant today, since "special variables"
(i.e. dynamically-scoped variables) are used very rarely.

Sat, 26 Jul 2008

Like everything else posted to kragen-hacks without a notice to the
contrary, this is in the public domain.

I wrote a page about the original demo at
<http://canonical.org/~kragen/demo/klappquadrat.html>.

#!/usr/bin/python
"""Recreation of T$'s Klappquadrat intro in Python with Pygame and Numeric.

This is a recreation of the 64-byte version, and I think 31
instructions.  By contrast, this is 44 lines of code, about 1600
characters.  On the other hand, you can change this from 320x200x256
to, say, 640x480x512 by changing the screensize= and ncolors= lines to
say (640, 480) and 512.

Kragen Javier Sitaker wrote this recreation, but T$ is to credit for
the original intro.

"""

import pygame, sys
from Numeric import zeros, subtract, array, arange, where, take, shape, indices

screensize = (320, 200)
ncolors = 256

def colors(masks, levels):
    "Compute a grayscale pixel from bit masks and a floating-point level [0,1)"
    return sum([int(mask * level) & mask for mask, level in zip(masks, levels)])

def clamp(a, b, c):
    "Threshold b between lower limit a and upper limit c."
    d = where(a < b, b, a)
    return where(d < c, d, c)

def redraw(screen, buf, palette, frames):
    x, y = indices(screensize)
    # this 256 is not ncolors; it's a timing/pacing thing
    buf += ((x + frames) & (y + frames)) >> (frames % 256) >> 3
    buf %= ncolors
    pygame.surfarray.blit_array(screen, take(palette, buf))

def main(argv):
    pygame.init()
    screen = pygame.display.set_mode(screensize, pygame.FULLSCREEN)

    buf = zeros(screensize)
    fiery_rgb_integers = clamp(0, subtract.outer(arange(ncolors) + ncolors/8,
                                                 ((array([0, 1, 2]) * ncolors)
                                                  / 4)),
                               ncolors / 4)
    masks = screen.get_masks()[:3]
    # I'm not sure this palette is exactly right; it only goes to 63
    # in the original...
    palette = array([colors(masks, levels/float(ncolors/4))
                     for levels in fiery_rgb_integers])

    frames = 0
    while 1:
        ev = pygame.event.poll()
        if ev.type == pygame.NOEVENT:
            frames += 1
            redraw(screen, buf, palette, frames)
            pygame.display.flip()
        elif ev.type == pygame.KEYDOWN: break
        elif ev.type == pygame.QUIT: break

if __name__ == '__main__': main(sys.argv)

Thu, 24 Jul 2008

(Warning, this note is kind of rambling with no real point.)

[Finkel](http://www.nondot.org/sabre/Mirrored/AdvProgLangDesign/)
says, describing Modula-3's subtyping rules:

> If every value of one type is a value of the second, then the first
> type is a called a 'subtype' of the second. For example, a record
> type TypeA is a subtype of another record type TypeB only if their
> fields have the same names and the same order, and all of the types
> of the fields of TypeA are subtypes of their counterparts in TypeB.

Initially, this struck me as violating ML's "value restriction" and
therefore being unsafe, but I was wrong.  Suppose we say

    type Ushort = 0..65535;
         Long = -2147483648..2147483647;
         TypeB = record
           Data: Long;
         end;
         TypeA = record
           Data: Ushort;
         end;

Now it seems that `Ushort` is a subtype of `Long`, because every value
of type `Ushort` is a value of type `Long`.  So it is safe to assign a
value of type `Ushort` to a variable of type `Long`.  `TypeA` and
`TypeB` have fields of the same names in the same order, and the
single field in `TypeA` has a type that is a subtype of the single
field in `TypeB`.

However, I guess if you're passing by value, you can safely copy a
record of type `TypeA` into a variable declared as a record of type
`TypeB`, and you're still safe with no run-time type checks.  

Mutable Aliases Introduce Problems
----------------------------------

It's only once you get into aliasing that you start running into
problems; if you have a record that is mutably accessible through both
`TypeA` and `TypeB` pointers, you can set its `Data` to `-1` through
the `TypeB` pointer and then access it through the `TypeA` pointer.

It seems that if you could ensure that the `TypeB` pointer were
`const` --- that is, didn't allow modification of the pointed-to value
--- you could avoid this.  That would allow the following subtyping
rule:

    if                    TypeA is a subtype of TypeB
            -------------------------------------------------------
    then    pointer to TypeA is a subtype of pointer to const TypeB

That allows you to copy a pointer to `TypeA` into a variable declared as
pointer to const `TypeB`.  Which is pretty much equivalent to copying a
`TypeA` into a variable declared as `TypeB`, but more efficient.

Arrays
------

When it comes to arrays of the same element type but varying sizes,
the definition of "A is a subtype of B" that works in the above
subtyping rule is "A's indices are a superset of B's indices".  That
is, if you have an `int[100]` array, it's perfectly OK to copy a
pointer to it someplace that is expecting a pointer to a `const
int[10]` array; no run-time type checks will be needed.  Actually,
though, that's true even if that pointer isn't `const` --- which I
guess is Finkel's point when he distinguishes "extensions" from
"subtypes".

Lack of Conclusions
-------------------

Anyway, just some interesting things I hadn't realized before about
safe static typing.  There's a subtyping relation that still applies
after you take mutable references, and another one that doesn't, and
the one that doesn't can be pretty broad.

Mon, 21 Jul 2008

* Kragen Javier Sitaker <kragen at canonical.org> [2008-07-21 09:40]:
> now expressed in kg*s/s, or merely kg.  So our two
> relativity-based equivalences (mass as energy, and time as
> distance) turned out to be merely two facets of the same
> equivalence.

This is how Einstein stumbled into E = mc², of course! In a
letter to his friend Conrad Habicht, he wrote:

    Eine Konsequenz der elektrodynamischen Arbeit ist mir noch in
    den Sinn gekommen. Das Relativitätsprinzip in Zusammenhang
    mit den Maxwellschen Grundgleichungen verlangt nämlich, daß
    die Masse direkt ein Maß für die im Körper enthaltene Energie
    ist; das Licht überträgt Masse. Eine merkliche Abnahme der
    Masse müßte beim Radium erfolgen. Die Überlegung ist lustig
    und bestechend; aber ob der liebe Herrgott nicht darüber
    lacht und mich an der Nase herumgeführt hat, das kann ich
    nicht wissen.

Or in English:

    Another consequence of the work on electrodynamics has also
    occured to me. Namely, the principle of relativity in
    connection with Maxwell’s equations requires that its mass
    be a direct measure of the energy contained in a body; light
    transfers mass. A noticeable decrease of mass should occur in
    the case of radium. The thought is amusing and intriguing;
    but whether the good Lord isn’t laughing about it and hasn’t
    led me down the garden path, I cannot know.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>

I was reading Raphael Finkel's book "Advanced Programming Language
Design", and he mentions a language somebody named Finkel invented in
the 1970s called AL.  AL had dimensional analysis built in, with four
base dimensions: 'time', 'distance', 'angle', and 'mass'.

(As Finkel says, this is the kind of "type safety" that's really
needed in everyday calculations.  His language did it statically, but
you can do it dynamically too, as units(1) does.)

He points out that 'angle' really shouldn't be included since radians
are really dimensionless, which leaves us with 'time', 'distance', and
'mass'.  But no 'speed', 'force' or 'energy'.

However, they can be derived from the base units; speed is a ratio
distance/time --- e.g. m/s.  Acceleration is speed/time, or m/s², and
force can be thought of as mass * acceleration, or kg*m/s².  Finally,
energy is force * distance, so you can express it in units of
kg*m²/s².  This is in fact how units(1) represents it.

But do we really need all three of 'time', 'distance', and 'mass'?
The speed of light provides a natural conversion factor between time
and distance, and mass can be equivalently measured as energy,
according to the well-known formula E = mc².  So speed is really just
a dimensionless quantity, and acceleration is the reciprocal of a time
interval, namely the time to reach the speed of light at that constant
acceleration; so its units are really 1/s.  So force can really be
measured merely with kg/s.

Unfortunately this doesn't really help us get rid of mass: energy is
now expressed in kg*s/s, or merely kg.  So our two relativity-based
equivalences (mass as energy, and time as distance) turned out to be
merely two facets of the same equivalence.

But I suspect that in everyday calculations, the equivalence of mass
and energy is rarely useful; it's far more likely to hide errors than
to reduce the amount of work necessary.  With time, distance, and
mass, the only incommensurable quantities I commonly run into with
commensurable units are:

- torque and energy;
- various dimensionless quantities;
- perhaps pressure and stress;
- stress and young's modulus.

Collapsing mass with energy and time with distance creates many more
"units collisions".

Sat, 19 Jul 2008

Robert Hayden made a joke back in 1993 or so, called the "Geek Code";
certain cypherpunks were including PGP public keys at the bottoms of
all their mail messages, so he decided to make something similar but a
little more revealing, essentially an encoding of your answers to a
Cosmo-style multiple-choice personality quiz about what kind of a geek
you were.  It caught on, and there are now Geek Codes in people's
.signatures all over the net.

More recently some people (whose names I unfortunately forget, but
they're at <http://hackerkey.com/>) created a modernized version of
the "Geek Code" called the "Hacker Key".  Here's mine:

v4sw7YCMSUPLhw6+8ln6pr7OFck4ma7u7+9Lw2Dm!g/l6Di6e0t1TMb9AGORDHMSen6g3a3Xs7MRr!p!

If properly decoded, this tells you a great deal about my personality:
some of my favorite programming languages, my aspirations, some of my
favorite book authors, my sex life, and the kinds of TV I like.  Some
of it is even accurate.  The hackerkey.com web site has a script that
will decode this key for you.  But I was in a hacking mood one day,
and so I wrote this script.

It would probably have been smarter to include a parser for the Hacker
Key text file, rather than reformatting it into a Python data
structure using Emacs macros.  Oh well.

Despite that shortcoming and some ugliness in the domain, I think the
code is reasonably well structured.  It probably suffers a bit from
reinventing the wheel, though.

Like everything else posted to kragen-hacks without any notice to the
contrary, I abandon any copyright in this program, so unless anyone
else owns copyright in it, it is in the public domain.  However, in
this case, you could argue that the program is largely a copy of the
hackerkey.com textfile, so maybe the folks who wrote that have a
copyright in this program.

#!/usr/bin/python
# parse the Hacker Key (hackerkey.com) v4
import re, sys
# I bet there's a bug with + and / on ChoicesWithOther

kragens_hacker_key = """
v4sw7YCMSUPLhw6+8ln6pr7OFck4ma7u7+9Lw2Dm!g/l6Di6e0t1TMb9AGORDHMSen6g3a3Xs7MRr!p!
"""

width = 80

def wrap(hanging, text, width):
    "Word wrap with a possible hanging indent."
    lines = [hanging]
    indent = ' ' * len(hanging)
    for word in text.split(' '):
        if len(lines[-1]) + len(word) > width:
            lines[-1] = lines[-1].rstrip()
            lines.append(indent)
        lines[-1] += word + ' '
    return lines

class Category:
    def print_banner(self, category, rest, indent):
        for line in wrap(indent,
                         '%s: (%s%s)' % (self.name, category, rest),
                         width):
            print line
        return '    ' + indent
    def present_val(self, prefix, char, val):
        for line in wrap("%s%s - " % (prefix, char), val, width):
            print line
    def parse(self, tail):
        groups = re.match('(?s)([-A-Z0-9/!+.]*)(.*)', tail)
        return groups.groups()


class Version(Category):
    def explain(self, category, rest, indent):
        print "Version %s (%s%s)" % (rest, category, rest)

class NotKnown(Category):
    def explain(self, category, rest, indent):
        if rest: print "%snot understood: %s%s" % (indent, category, rest)

class Choices(Category):
    def __init__(self, name, choices, next = NotKnown()):
        self.name, self.choices, self.next = name, choices.copy(), next
        for char, desc in [
            ('!', 'I staunchly refuse to participate in the category.'),
            ('$', 'I do this for a living.'),
            ('?', "I don't know anything about this category.")]:
            # setdefault to deal with '$': "I'm a prostitute.",
            self.choices.setdefault(char, desc)
    def __repr__(self):
        return '%s(%r, %r, %r)' % (self.__class__.__name__,
                                   self.name,
                                   self.choices,
                                   self.next)
    def present(self, prefix, char):
        self.present_val(prefix, char, self.choices[char])
    def explain_item(self, item, indent):
        if item in self.choices:
            self.present(indent, item)
        elif len(item) == 3:        # x+y or x/y
            from_, mid, to_ = item
            fromterms = {'+': 'Currently:        ', '/': 'Ranging from:'}
            toterms   = {'+': 'Striving to reach:', '/': 'to:          '}
            self.present("%s%s " % (indent, fromterms[mid]), from_)
            self.present("%s%s " % (indent, toterms[mid]), to_)
        elif len(item) == 2 and item[1] == '/':
            self.present_val(indent, item[0], self.choices[item[0]] +
                             ' (Said trait is not very rigid, may change '
                             'with time or with individual interaction)')
        else:
            return False
        return True
    def parse_items(self, string):
        return re.findall('.(?:[/+].?)?', string)
    def explain(self, category, rest, indent):
        items = self.parse_items(rest)
        if items: indent = self.print_banner(category, rest, indent)
        uncomprehended = []
        for item in items:
            if not self.explain_item(item, indent):
                uncomprehended.append(item)
        self.next.explain(category, ''.join(uncomprehended), indent)
    def renamed(self, newname):
        return self.__class__(newname, self.choices, self.next)

class ChoicesWithOther(Choices):
    """
    Two categories have an 'Other' option that breaks the simple regex parser.

    This option uses 'O' and runs until a '/'.  But 'pr', 'g', 'b',
    and 'u' have 'O' options that are just single letters.  That's why
    the parsing belongs to particular classes, and this is the one
    that does it differently.

    """
    def parse(self, tail):
        groups = re.match('(?s)((?:O[-A-Za-z0-9/!+.]*/|[-A-NP-Z0-9/!+.])*)(.*)',
                          tail)
        return groups.groups()
    def parse_items(self, string):
        return re.findall('O[-A-Za-z0-9/!+.]*/|[-A-NP-Z0-9/!+.]', string)
    def explain_item(self, item, indent):
        if item.startswith('O'):
            self.present_val(indent, 'O', 'Other: ' + item[1:-1])
            return True
        else:
            return Choices.explain_item(self, item, indent)

class Age(Category):
    name = "Age"
    choices = {
        '[0-9][0-9]': 'Actual Age',
        '[0-9]X': 'I only feel like updating my age once per decade!',
        'F': '100+ (Old Fart)',
        'I': 'Immortal',
        'N': 'N/A, or None of your business',
    }
    def explain(self, category, rest, indent):
        indent = self.print_banner(category, rest, indent)
        for k, v in self.choices.items():
            if re.match(k + '$', rest):
                self.present_val(indent, rest, v)
                break
        else:
            NotKnown()(category, rest, indent)

class Politics(Category):
    """
    Renders the Politics category.

    For version 4, use the ratings from the Political Compass 
    (www.politicalcompass.org) website.  Just list your "Economic 
    Left/Right" and "Social Libertarian/Authoritarian" scores with a slash 
    between them, or round up/down to the nearest whole number to save space 
    (preferred).  Yes, the (-) symbol is not valid PGP, but at least the 
    category is now somewhat useful.

    If you only list one number in this category it will be assumed to be 
    the Social Libertarian/Authoritarian rating, as social issues tend to be 
    more important to hackers
    """
    name = 'Politics (www.politicalcompass.org)'
    def explain(self, category, rest, indent):
        if rest in '?!':
            return Choices(self.name, {}).explain(category, rest, indent)
        
        indent = self.print_banner(category, rest, indent)
            
        if '/' in rest:
            economic, social = rest.split('/', 1)
        else:
            economic, social = None, rest
        if economic is not None:
            print "%sEconomic Left/Right: %s" % (indent, economic)
        print "%sSocial Libertarian/Authoritarian: %s" % (indent, social)

decoders = {
    'v': Version(),
    'sw': Choices('Software Hacking', {
        '9': "I'm Bill Joy, Eric Raymond or JWZ.",
        '8': "I am an uberhacker; I wrote my shell/debugger/editor/compiler in "
             "30 lines of code.  Other people use and love my hacks.",
        '7': "Live to Hack, Hack to Live!",
        '6': "There is nothing better than an elegant hack.",
        '5': "Average, I've made a few software hacks in my day.",
        '4': "I've hacked code once or twice, software isn't my thing.",
        '3': "I don't hack software at all.  I'm a structured programmer!",
        '2': "I'm not even a programmer, much less a hacker!",
        '1': "I'm a manager and/or work at IBM.",
    }, Choices('Favored languages', {
        'C': "C(++)",
        'B': "(Visual) Basic",
        'L': "Lisp",
        'H': "PHP",
        'P': "Perl",
        'J': "Java",
        'G': "Prolog",
        'R': "Ruby",
        'Y': "Python",
        'U': "Unix Shell",
        'A': "Ada",
        'S': "Assembler",
        'M': "Scheme",
        'F': "Fortran",
    })),
    'hw': Choices('Hardware Hacking', {
        '9': "I am Steve Wozniak.",
        '8': "I have my own eeprom burner in the basement.  I won't use what I "
             "didn't make myself.",
        '7': "I have had my hardware designs used in actual products.",
        '6': "I've toyed with a few hardware drawings, but never made my own hardware. "
             "I know PC hardware like the back of my hand.",
        '5': "I built every PC in my home from the ground up.  Newegg knows me by "
             "my first name.",
        '4': "I've built a PC or two in my day. It's just easier to get them pre-made.",
        '3': "People ask me what USB stands for, I know a thing or two about hardware.",
        '2': "I know how to put my computer together without a diagram.",
        '1': "Dude!  You got a Dell!",
    }),
    'ln': Choices('Language Hacking', {
        '9': "I am J.R.R. Tolkien.",
        '8': "I've had my pet language used and studied by others.",
        '7': "People who don't know me have used words I've coined.  I've written "
             "my own artificial language.",
        '6': "I am known for certain words or phrases, my friends use my linguistic "
             "creations regularly.",
        '5': "I've coined a phrase or made up a new word or two.",
        '4': "I'm a grammar nazi; people hate to talk to me because I correct them "
             "mid-sentence.",
        '3': "I hate people who don't follow the basic rules of $LANG, which I "
             "strive to speak properly.",
        '2': "I'm illiterate and/or can only speak IM: l8r sk8r!",
        '1': "I am a Slashdot editor.",
    }),
    'pr': Choices(
        "Programming - Hey, some hackers think they're programmers ;-)", {
            '9': "I only program in ADA, and LOVE IT!",
            '8': "I'm currently programming my IDE-brain connectivity link.",
            '7': "I do a lot of programming, and spend a lot of time in my "
                 "IDE of choice.",
            '6': "I'm definitely a programmer, not a hacker.  I like it "
                 "basic, a text editor and a compiler/debugger.",
            '5': "I'm your average programmer, I prefer to think I'm a hacker",
            '4': "I will sacrifice elegant design for performance or size",
            '3': "Comments are for sissies!  If it was hard to write, it "
                 "should be hard to read.",
            '2': "I can write hello world, but my programs don't do that much.",
            '1': "I can't program at all.  ",
        },
        Choices('Programming methodology', {
            'O': "Object Oriented",
            'S': "Structured",
            'A': "Aspect Oriented",
            'U': "Unified",
            'R': "Rational Unified",
            'P': "Procedural",
            'F': "Functional",
        })
    ),
    'ck': Choices('Cracking (The malicious act of hacking into systems '
                  'that gets all the headlines; what non-technical people '
                  "consider 'hacking')", {
        '9': "I work for @Stake or write for 2600.",
        '8': "I write the 'sploits that all the kiddies use.  I would be a "
             "professional black hat except for this economy.",
        '7': "Black Hat - Cracking is old hat to me.  I only compromise a "
             "system if it looks like a challenge.  Script kiddies worship me.",
        '6': "Grey Hat - Some of my cracking is for good, some is for "
             "evil.  It really depends on who benefits from what I do.",
        '5': "I try to break into systems occasionally.  It's for educational "
             "purposes, that's it!",
        '4': "White Hat - I study exploits because I'm the one who has to "
             "patch the systems when they get released.  I subscribe to CERT.",
        '3': "I've tried once or twice, but it felt wrong.  I stick to warez.",
        '2': "I don't ever try to break into a computer, that's against the law!",
        '1': "I barely know how to crack an egg!",
    }),
    'ma': Choices('Mathematics (highest completed Math course)', {
        '9': "Advanced Calc/Theory",
        '8': "Differential Equations",
        '7': "Calc II",
        '6': "Probability/Statistics",
        '5': "Linear Algebra",
        '4': "Calc I",
        '3': "I finished HS Math and realized that was enough for me",
        '2': "Still in secondary school/high school & doing just fine.",
        '1': "I'm angry at numbers.  There's like, too many of 'em and stuff.",
    }),
    'u': Choices('Unix', {
        '9': "I am Ken Thompson.",
        '8': "I am a guru.  Everyone asks me for help with their Unix " 
             "machines. Unix is more than an OS, it's my religion!",
        '7': "I use exclusively Unix on all my computers.  If it's not Unix, "
             "it's CRAP!",
        '6': "I really like Unix; I've installed Linux once or twice but "
             "primarily use Windows or MacOS for my daily needs.",
        '5': "Unix is okay, it's just a tool like any other.",
        '4': "Mac OS X is as close to Unix as I like to get.",
        '3': "I dislike Unix, it's too cryptic, and what's this shell crap! "
             "The GUI is far superior!",
        '2': "Unix is an abomination.  It's one of those dead OSes that "
             "doesn't realize it's dead yet.",
        '1': "I am Bill Gates.",
    }, Choices("Most (least) favorite *nix OSes", {
        'L': "Linux",
        'I': "Irix",
        'A': "AIX",
        'S': "(Open)Solaris",
        'F': "FreeBSD",
        'N': "NetBSD",
        'O': "OpenBSD",
        'B': "Other BSD",
        'H': "HP-UX",
        'M': "MacOS X",
        'T': "Tru64",
    })),
    'w': Choices('Windows', {
        '9': "I work for Microsoft, they call me code monkey....well "
             "just monkey.",
        '8': "I install every beta build of the newest Windows release "
             "I can get off the 'net.  I want to work at Microsoft.",
        '7': "I've developed my own Windows programs, VB and .NET 0wn me.",
        '6': "I really like Windows; I have my desktop theme and screensaver "
             "just they way I want them.",
        '5': "Windows is OK, I'm pretty indifferent about it.",
        '4': "When are they going to stop stealing other people's ideas "
             "and come up with something INNOVATIVE?  I still run it though, "
             "what else is there?",
        '3': "I keep Windows on my hard drive only to test new hardware I "
             "buy. Linux rulez!",
        '2': "I have totally annihilated Windows off my hard and haven't "
             "looked back!",
        '1': "I've never actually run Windows, I am completely untainted!",
    }, Choices('Favorite Windows OS(es)', {
        'D': "MS-DOS/Windows 3.x",
        'N': "Windows NT",
        'T': "Windows 2000",
        'W': "Win95/98/Me",
        'C': "Windows CE/Mobile",
        'X': "Windows XP",
        'U': "Windows Server 2003",
        'V': "Vista",
        'G': "I use Windows machines but only with Cygwin installed",
    })),
    'm': Choices("MacOS X", {
        '9': "I am Steve Jobs.",
        '8': "I've written books about programming with Cocoa and Carbon.  See you "
             "at the WWDC!",
        '7': "I love it!  The power of Unix and a slick aqua GUI are the "
             "height of computing. I've written the occasional app or two.",
        '6': "I like the new MacOS.  Finally stability AND ease of use!",
        '5': "MacOS X is okay.  A nice GUI front-end for a Unix OS.",
        '4': "I don't like it.  There's not enough emphasis on the CLI.",
        '3': "I hate it.  The GUI takes up half my RAM and it still locks up on me "
             "even though it's UNIX based!",
        '2': 'I despise MacOS X.  I miss "Classic" MacOS (<= 9).',
        '1': "I despise MacOS X.  I miss the Apple ][.",
    }),
    'l': ChoicesWithOther("Linux", {
        '9': "I am Linus Torvalds or Alan Cox.",
        '8': "I am an active kernel hacker/package maintainer.  If Linux didn't "
             "exist, I might have to leave my house once in a while!",
        '7': "I'm a power user.  I've developed my own mini-distribution on a "
             "whim.  I can't count how many kernels I've compiled.",
        '6': "I really like Linux.  I've tried a few distributions and am almost "
             "decided on one.  Windows sucks.",
        '5': "Linux is okay.",
        '4': "I don't like Linux.  It still relies too much on the command line.",
        '3': "I hate Linux.  It's not really UNIX, it's a cheap knock off.  This "
             "whole open sores thing will never work!",
        '2': "I despise Linux and its commie supporters.  Thank God there's a "
             "great company like Microsoft around and a great OS like Windows!",
        '1': "I work for SCO.",
    }, Choices('Favorite distribution', {
        'A': "Damn Small",
        'D': "Debian",
        'E': "(Open)SuSE",
        'F': "Fedora Core",
        'G': "Gentoo",
        'I': "(Lin|Free)spire",
        'K': "Knoppix (et al)",
        'L': "Linux From Scratch",
        'M': "Mandriva",
        'P': "Puppy",
        'R': "Red Hat",
        'S': "Slackware",
        'U': "Ubuntu (et al)",
        'V': "Vector",
    })
    ),
    'i': ChoicesWithOther("IDE/Text Editor environment", {
        '9': "Xcode",
        'A': "Anjuta",
        'N': "Notepad (sick)",
        '8': "Visual Studio",
        'C': "Eclipse",
        'P': "Powerbuilder",
        '7': "CodeWarrior",
        'D': "Delphi",
        'T': "Textpad",
        '6': "emacs",
        'E': "ee",
        '5': "pico/nano",
        'F': "(v)FTE",
        '4': "jove",
        'G': "GNUstep",
        '3': "jed",
        'H': "HTE",
        '2': "vi (and clones)",
        'I': "NEdit",
        '1': "ed",
        'J': "jEdit",
        '0': "cat",
        'L': "Idle",
    }),
    'e': Choices("Education (highest level)", {
        '9': "I'm omniscient, you insensitive clod!",
        '8': "PhD.",
        '7': "Master's Degree.",
        '6': "Bachelor's Degree.",
        '5': "Trade/Technical school/Associates",
        '4': "Some college",
        '3': "High school diploma.",
        '2': "Currently in secondary school.",
        '1': "Elementary/Middle school.",
        '0': "We don't need no education.",
    }),
    't': Choices("Television (most hackers don't watch much)", {
        '9': "I'm Max Headroom.",
        '8': "I've long since given up other things like a job, talking to other "
             "people and leaving my house.  I have an IV and dialysis so I don't have "
             "to worry about pesky bodily functions.",
        '7': "I watch TV religiously.  The television is never shut off on my house.  "
             "I have a generator so I can watch TV even when there's a power outage!",
        '6': "I watch a lot of TV.  I have a TiVo just so I can get every show "
             "of my favorite series!",
        '5': "I watch TV a few hours a day, and have a few favorite shows.",
        '4': "I watch TV once in awhile.  There's just not much good on anymore.",
        '3': "TV is trash.  I'd rather read a book or go for a walk.",
        '2': "TV is just the (Devil|Government|Big Business)'s tool of control!  "
             "Throw out your TV!",
        '1': "I'm Amish",
    }, Choices("Hacker TV Series", {
        'T': "Star Trek",
        'N': "TNG",
        'D': "DS9",
        'V': "Voyager",
        'E': "Enterprise",
        'B': "Babylon 5",
        'S': "Stargate SG1/Atlantis",
        'H': "Xena/Hercules",
        'L': "LEXXX",
        'F': "Farscape",
        'R': "Red Dwarf",
        'X': "The X Files",
        'M': "Monty Python",
        'A': "Adult Swim",
        'G': "Battlestar Galactica",
        'W': "Doctor Who",
    })),
    'b': Choices('Books', {
        '9': "I'm obsessed with reading.  I actually write my own "
             "short stories/poetry as well as reading books.",
        '8': "I'm kind of a bookworm, I try to read a book a week/month.",
        '7': "I read actual books, not only technical references.",
        '6': "I pick up the odd book.  I know my way around the local Borders.",
        '5': "I read books occasionally.  You mean besides O'Reilly books?  Oh, then no.",
        '4': "I don't read books, the web is enough of a reading outlet for me.",
        '3': "I haven't read a book since college.",
        '2': "I haven't read a book since high school.",
        '1': "I'm illiterate (See ln1).",
    }, Choices('Hacker Favorite Books and Authors', {
        'A': "Isaac Asimov",
        'D': "The New Hacker's Dictionary",
        'G': "William Gibson",
        'H': "Hitchhiker's Guide",
        'I': "Illuminatus Trilogy",
        'L': "C. S. Lewis",
        'K': "Philip K. Dick",
        'M': "Man/texinfo pages",
        'O': "O'Reilly technical books",
        'P': "Harry Potter series",
        'R': "Request for Comments (RFCs)",
        'S': "Neil Stephenson",
        'T': "J R R Tolkien",
    })),
    'en': Choices('Encryption', {
        '9': "I have my own encryption algorithm named after me.",
        '8': "I can crack Enigma in my head.",
        '7': "I use GnuPG for all email and have many cryptographic "
             "filesystems on my hard drive.  If you want my data you're "
             "going to have to earn it!",
        '6': "I use encryption frequently.  Want to sign my key?",
        '5': "I like and use encryption.",
        '4': "I don't use encryption.   I have nothing to hide.",
        '3': "Encryption is needless overhead.  If you use it, you must have "
             "something to hide!",
        '2': "Only terrorists use encryption.  The government should have back "
             "doors to all encryption mechanisms.",
        '1': "Only the government should be able to use encryption to be able to "
             "keep its secrets safe!",
    }),
    'g': Choices("Gaming", {
        '9': "I am John Carmack.",
        '8': "I've developed the occasional game.  People consider me to be a "
             "gaming guru.  People disconnect from servers when they see me "
             "log on.",
        '7': "I play all the time, when I'm not hacking, eating or sleeping.",
        '6': "I play often, I have a console or two and/or quite a few PC "
             "games.",
        '5': "I'll pick up the occasional game.",
        '4': "I used to play video games back in the 8-bit days, but not "
             "anymore.",
        '3': "I don't like video games, that's time I could be hacking.",
        '2': "I don't like video games, that's time I could be sleeping.",
        '1': "I suck at video games.",
    }, Choices('Genre Categories', {
        'R': "Graphical RPG",
        'A': "Action/Adventure",
        'S': "Sports",
        'Z': "Puzzle",
        'T': "Text Based/MUD",
        'O': "Shooter",
        'V': "Massively Multiplayer games (how do you find time to hack??)",
        'G': "Realtime Strategy",
    }, # XXX this nesting is suboptimal
    Choices('Console Categories', {
        'C': '"Classic" (i.e. dead) systems - Nintendo [S]NES, Sega',
        'M': "Modern systems (PS2/3, XBOX (360), GameCube/Wii).",
        'H': "Handhelds (Gameboy/PSP)",
        'P': "PC Gamer",
    }))),
    'a': Age(),
    's': Choices('Sex', {
        '9': "I'm Jenna Jameson.",
        '8': "I'm a nympho.  If I don't have sex every 6 hours I get totally antsy.",
        '7': "I've had more than my share of sex.  I'm a stud!",
        '6': "I've had sex more than a few times.  Still out there in the dating "
             "scene so let's not get into numbers.",
        '5': "I've had sex.  Next subject.",
        '4': "I haven't had sex, not that I haven't had the opportunity but I'm "
             "saving myself.",
        '3': "I haven't even gotten past second base.  Once my face clears up "
             "I'm gonna get some though!",
        '2': "Sex is dirty!  Save it for someone you love!",
        '1': "I'm a member of the clergy.  None for me, thanks.",
        '0': "I'm a eunuch.",
        '$': "I'm a prostitute.",

        'M': "Male",
        'F': "Female",

        'G': "Gay/Lesbian",
        'B': "BiSexual",

        'S': "Single",
        'R': "Married",
        'I': "Involved (dating)",

        'T': "Transvestite",
        'D': "BDSM",

        'W': "Swinger",
    }),
    'r': Choices("Religion", {
        '9': "I Am God, you foolish mortal!",
        '8': "Pantheist - The Universe IS God.",
        '7': "Pagan/Wiccan - I worship nature moreso than a deity.",
        '6': "Polytheist - I believe in multiple Gods.",
        '5': "Monotheist - There is only one God who exists.",
        '4': "Theist - God exists and interacts with the world and His/Her/Its "
             "followers.",
        '3': "Deist - God exists, but does not intervene in the world.",
        '2': "Agnostic - Unsure of God's existence.",
        '1': "Atheist - There is no God.",
    }),
    'p': Politics(),
}

decoders['g/l'] = decoders['l'].renamed('GNU/Linux')

def parse(key):
    while True:
        tail = re.search('(?s)([a-z/]+)(.*)', key)
        if not tail: break
        category = tail.group(1)
        decoder = decoders.get(category, NotKnown())
        rest, key = decoder.parse(tail.group(2))
        decoder.explain(category, rest, '')


if __name__ == '__main__':
    if len(sys.argv) > 1:
        for key in sys.argv[1:]:
            parse(key)
    else:
        print "testing:"
        parse(kragens_hacker_key)
        print
        print "Done with Kragen's key; showing others"
        parse("v4l7OgNewSense/s7")
        parse("v4g/lOgNewSense/s6")
        parse("v4p-4.20/-5.10r3")
        parse("p-4/-5")
        parse("p6.10")
        parse("p6")
        parse("u5/")


Thu, 17 Jul 2008

I was reading [Doron Zeilberger's Opinion
65](http://www.math.rutgers.edu/~zeilberg/Opinion65.html) about the
Gelfand Principle, which he ascribes to Israel Gelfand.  I'll restate
what he says here.

He says it is better to begin by explaining that 1+2 = 2+1, and then
go on to explain that this is true for all pairs a+b = b+a and that
this is called the commutative property of addition, than to begin by
giving the name, explaining that it means that always a+b = b+a, and
then by giving an example.  And he points out that 1+2 = 2+1 is the
best example to use, because there's nothing particularly special
about it.  0+1 = 1+0 is true, but in general 0+x = x+0 = x, and so
someone might think that this commutative property is a special
feature of 0.  (In linear algebra, the identity matrix works this way:
for all conformable M and identity matrices I, IM = MI, even though
matrix multiplication is not commutative in general.)  And 1+1 = 1+1,
but that's because of the reflexive property of equality, not the
commutative property of addition.  So 1+2 = 2+1 is the simplest
nontrivial example.  It's a better example than, say, 
424 + 501 = 501 + 424, because it's easier to prove: 1+2 is 1+(1+1),
and 2+1 is (1+1)+1, and addition is associative, so those are the same.

In general, he argues, "Whenever you state a new concept, definition,
or theorem, (and better still, right before you do) [you should] give
the *simplest* possible non-trivial example."  He also says:

> The Gelfand Principle should also be used in research articles. It
> is much easier to follow a new definition or theorem after a simple
> example is first given. Even proofs would be easier to follow if
> they are first spelled out concretely for a special case.

I agree.  In fact, I've often groused to myself about mathematical
papers being written in the opposite style.  (I've spent some time
lately reading John Backus's "Can Programming Be Liberated from the
von Neumann Style?", which is not technically a math paper but
contains theorems anyway, and it would be considerably improved by an
application of this principle.)  I've wondered whether other people
--- certain mathematicians maybe --- actually find it easier to
understand things in what seems to me like a backwards order: theorem
first, then example.

(Amusingly, Zeilberger states the principle before he gives the above
example of it, thus violating the principle he is trying to promote;
however, he follows it in part, in that the commutative principle is
probably the simplest possible nontrivial example of stating a
theorem.)

Apparently, however, other people also feel that the Gelfand approach
is the correct one.  In response to Zeilberger's article, Tim Gowers
wrote:

> I've just looked at your opinions page for the first time for a
> while, and read your article on two pedagogical principles. I was
> particularly interested in the first [the Gelfand Principle],
> because as a result of editing the Princeton Companion I have become
> incredibly conscious of it myself -- I'm tempted to say that I
> discovered it independently. Of course, it doesn't bother me that
> Gelfand got there first -- it is SO clearly correct that it would be
> a miracle if I had not been anticipated. Instead, we have the
> depressing miracle that something so obvious should be practised by
> such a small percentage of mathematicians. I feel quite evangelistic
> about this, and have already started a one-man (except that now I
> see that you are an ally) campaign to publicize the principle.  For
> example, a few weeks ago I was asked to give a talk about the
> Princeton Companion, and EXAMPLES FIRST was one of the main themes
> (which I illustrated by an example first: I gave a ridiculous and
> unmemorable definition of a "C-space" which was in fact a
> mathematical model of a car, and as soon as the word "car" was
> uttered, the definition was magically easier to remember).
> 
> I had always been aware, of course, of the value of giving the
> simplest non-trivial example. The thing that has really struck me is
> the value of giving it FIRST. I think it is very important to stress
> that this is an independently important part of the Gelfand
> principle (or else, if you were not including it, a separate and
> equally important principle).
> 
> Here is my "proof" that it is better to start with concrete examples
> and proceed to abstract definitions than it is to begin with the
> abstract definitions. If you give the example first, then it is easy
> for the reader to understand, so not much effort is needed to
> remember anything. Then, when you are presented with the abstract
> definition, you have a mental picture of an example, so the various
> components of the abstract definition become labels that you attach
> to this picture. If, on the other hand, you give the abstract
> definition first, then the components are meaningless, so you have
> no choice but to memorize them as if you were learning Chinese
> vocabulary or something.  Then when you see the example, you have to
> go back and see how this meaningless stuff does in fact mean
> something. But that effort of memorization should have been
> unnecessary!

Dijkstra, unsurprisingly, disagrees (in EWD757-3):

> There exists a school (I wouldn't call it a "school of thought")
> that believes in "teaching by example" and in "discovery by
> example".  I don't.  I concluded EWD376, which describes in detail
> the actual steps in which I had solved a problem from graph theory,
> with:
> 
> > "Finally we draw attention to the fact that we did not draw a
> > single example to explain what we were talking about or (even
> > worse!) to discover what the program should do.  And this, of
> > course, is as it should be."

So what's the analogue in computer programming?  You could argue that
it's test-first programming, where you write the unit test before you
write the code, and hopefully people will read it in the same
sequence.  The unit test is usually a better unit test if it's exactly
this simplest non-trivial case.  (Maybe it's better if there's an
additional test that doesn't try to be simple, just in case you were
mistaken about how trivial the case was.)

But in the mathematical case, you don't just write down the premises
of the example, write down the conclusion, baldly assert that some
theorem exists to connect the two, and then proceed to explaining what
that theorem is and proving it in the general case.  Instead, you
demonstrate that the conclusion is true of that particular example,
and then state the theorem and proof for the general case.  This is
much more similar to walking through the program as it executes the
test case in a debugger and looking at all the intermediate values.
 
There was a thread on LtU about "[Ivory Towers and Gelfand's
Principle](http://lambda-the-ultimate.org/node/924)", on motivating
language features with examples:

> If an example has a solution that is nearly as good without a given
> language feature, then that example is not a good motivation for
> that feature. Perhaps not following this principle is partly what
> earned FP it's ivory tower reputation.

There was also [a thread about this on
LiveJournal](http://jcreed.livejournal.com/899682.html?view=2631778#t2631778).

Mon, 14 Jul 2008

[Raphael Finkel's
book](http://www.nondot.org/sabre/Mirrored/AdvProgLangDesign/) says,
"Enumerating binary trees (see Chapter 2) is quite difficult in most
languages, but quite easy in CLU."  Of course I am not enthusiastic
about the idea that CLU has any merits not shared by my own favorite
languages, and so I am curious how hard it would be in, say, Python or
Scheme.

So, I thought, enumerating the binary trees of a particular size
should be fairly straightforward in Python:

    # With a generator.
    def enum_bin_tree(n):
        if n == 0:
            yield None
        for leftsize in range(n):
            for left_tree in enum_bin_tree(leftsize):
                for right_tree in enum_bin_tree(n - leftsize - 1):
                    yield left_tree, right_tree

    # With a multi-for list comprehension.
    def enum_bin_tree_eager(n):
        if n == 0: return [None]
        return [(left_tree, right_tree)
                for leftsize in range(n)
                for left_tree in enum_bin_tree_eager(leftsize)
                for right_tree in enum_bin_tree_eager(n - leftsize - 1)]

    # With a simple nested loop.

    def enum_bin_tree_simple(n):
        if n == 0: return [None]
        rv = []
        for leftsize in range(n):
            for left_tree in enum_bin_tree_simple(leftsize):
                for right_tree in enum_bin_tree_simple(n - leftsize - 1):
                    rv.append((left_tree, right_tree))
        return rv

Or in Scheme:

    (define (enum-bin-tree n)
      (if (= n 0) '(())
          (mapappend (lambda (left-size)
                       (mapappend (lambda (left-tree)
                                    (map (lambda (right-tree)
                                           (list left-tree right-tree))
                                         (enum-bin-tree (- (- n left-size) 1))))
                                  (enum-bin-tree left-size)))
                     (iota n))))
    (define (iota n) (iotaplus (- n 1) '()))
    (define (iotaplus n rest) 
      (if (< n 0) rest (iotaplus (- n 1) (cons n rest))))
    (define (mapappend fn lst)
      (if (null? lst) '()
          (append (fn (car lst)) (mapappend fn (cdr lst)))))

Then I looked at Finkel's pseudo-CLU version; it is 24 lines, compared
to 14 for the Scheme version and 6-8 for the Python versions.
However, it happens to be very similar to the first (7-line) Python
version above; the only differences in the algorithm are the position
of the -1 and its use of side effects in place of constructing new
tree nodes.

A variant of the approach above can be used to enumerate the sentences
of a given length in at least some context-free languages; the tricky
part is making sure that the recursion terminates.  I think it will
work for grammars with no epsilon-productions and no cycles in which a
nonterminal can expand to itself A -> A.  I'm not quite sure if it can
be expanded to include those languages; I'm pretty sure allowing the
cycles don't add any power (you can rewrite the grammar to an
equivalent grammar without them) but I'm not sure about the
epsilon-productions.

Enumerating binary search tree keys
-----------------------------------

Another example Finkel's book gives, which is perhaps more to the
point, is comparing two binary trees to see if their nodes have the
same value in inorder traversal.  This is very similar to the
samefringe problem, in that the recursive formulation of inorder
traversal is very straightforward:

    def inorder_traverse(fn, bintree):
        if type(bintree) is type(()):
            left, middle, right = bintree
            inorder_traverse(fn, left)
            fn(middle)
            inorder_traverse(fn, right)

A nonrecursive procedural formulation, on the other hand, is
considerably more opaque.

    def inorder_traverse_nr(fn, bintree):
        stack = [("node", bintree)]
        while stack:
            action, arg = stack.pop()
            if action == "node":
                if type(arg) is type(()):
                    left, middle, right = arg
                    stack.push(("node", right))
                    stack.push(("visit", middle))
                    stack.push(("node", left))
            else:                           # action == "visit"
                fn(arg)
            
And if you want to be able to get those items on demand, for example
so that you can compare them with the items being produced from
another such traversal, you end up structuring it into some objects:

    class Inorder_Iterator:
        def __init__(self, bintree): self.stack = [Node(bintree)]
        def next(self):
            if self.stack: return self.stack.pop().next(self)
            raise StopIteration
        def push(self, other): self.stack.push(other)
        def __iter__(self): return self
    class Node:
        def __init__(self, bintree): self.node = bintree
        def next(self, stack):
            if type(self.node) is type(()):
                left, middle, right = self.node
                stack.push(Node(right))
                stack.push(Visit(middle))
                stack.push(Node(left))
            return stack.next()
    class Visit:
        def __init__(self, val): self.val = val
        def next(self, stack): return self.val

By contrast, Python generators let you write this:

    def inorder_traverse(bintree):
        if type(bintree) is type(()):
            left, middle, right = bintree
            for item in inorder_traverse(left): yield item
            yield middle
            for item in inorder_traverse(right): yield item

That's 6 lines of code instead of the 19 of the explicit object
version.  Both are noticeably shorter, however, than the 25-line
pseudo-Simula version with coroutines that Finkel presents (in chapter
2, subsection 2, p.33, figure 2.8).

Sat, 12 Jul 2008

Lucene comes with some "demo" applications that demonstrate how to use
it.  I translated one of them from Java into Jython, and it still
works pretty much the same way, but takes an extra ten seconds to
start up; here's the code, which is noticeably but not dramatically
shorter than the Java version.

I wrote this because I wrote a list of desiderata for making a
full-text index of my mail for the Nth time, and I realized that
Lucene pretty much had all the items on my list, so maybe I'd be
better off biting the bullet and using Java and Lucene instead of
writing another text indexer from scratch.

Jython seems to make Java a *lot* easier to deal with.  Just being
able to interactively import a package or class, inspect its
attributes, instantiate it, and so on, makes a big difference in my
experience of using Java.  (I wish it included a way to interactively
inspect the signatures and doc comments of the things thus inspected.)
And Java now works out of the box on Debian, thanks to `gij`, which is
another big plus, and even Sun's Java is supposed to be free software
now, although I haven't looked lately to see if they've finished that
process.  It's too bad my laptop is still too small and slow to run
Eclipse, and for some reason my `gcj-4.1` documentation is missing.

#!/usr/bin/env jython

"""A Jython version of org.apache.lucene.demo.IndexFiles, the Lucene demo.

I haven't gotten this working in `jythonc` yet, because of what I
think is a classpath problem.
"""

# Because this is a modified version of IndexFiles.java from the
# Lucene distribution, it carries the same licensing:

# Copyright 2004 The Apache Software Foundation
# Copyright 2008 Kragen Javier Sitaker

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import sys

from java.util import Date
from java.io import IOException, File, FileNotFoundException

from org.apache.lucene.index import IndexWriter
from org.apache.lucene.analysis.standard import StandardAnalyzer
from org.apache.lucene.demo import FileDocument

# Jython, being 2.1, doesn't have True and False.  Fortunately it does
# seem to map 1 to Java "true" when appropriate, which is kind of
# scary.
True = 1

def main(argv):
    usage = "jython %s <root_directory>" % argv[0]
    if len(argv) != 2:
        sys.stderr.write("Usage: %s\n" % usage)
        sys.exit(1)

    start = Date()

    try:
        writer = IndexWriter("index", StandardAnalyzer(), True)
        indexDocs(writer, File(argv[1]))

        writer.optimize()
        writer.close()

        print Date().getTime() - start.getTime(), "total milliseconds"

    except IOException, e:
        print " caught a", e.getClass()
        print " with message:", e.getMessage()

def indexDocs(writer, file):
    # do not try to index files that cannot be read
    if not file.canRead(): return
    if file.isDirectory():
        # "or []" because an IO error could occur, it says
        for ii in file.list() or []:
            indexDocs(writer, File(file, ii))
    else:
        print "adding", file
        try:
            writer.addDocument(FileDocument.Document(file))
        except FileNotFoundException, fnfe:
            # at least on Windows, some temporary files raise this
            # exception with an "access denied" message, and checking
            # if the file can be read doesn't help, it says.
            pass

if __name__ == '__main__': main(sys.argv)

Thu, 10 Jul 2008

>From a [comment on the Smoothspan blog, by Damon Edwards][0] of
dev2ops.org:

> A troubling trend I've noticed is how the benefits of "rock
> star" software development teams (small, highly skilled,
> highly motivated) are increasingly neutralized by poor
> operations.
> 
> In an ops heavy SaaS and on-demand world, the software
> development phase becomes an increasingly smaller part of an
> application's overall lifecycle. Time and time again we see
> great code sitting behind the bottleneck of QA, staging,
> performance testing, and then production deployment.
> 
> On the project plan, the "rock star" teams repeatedly deliver
> great code in record time... but at all but the smallest of
> enterprises, their "delivery" of code is a long way from where
> the business is actually realizing the benefit.

I don't know whether this is true or false, but if it's true, it
represents the reversal of a [trend touted by Philip Greenspun in
1998][1]:

> When I graduated from MIT in 1982, my classmates and I had but one
> choice if we wanted to get an idea to market: Join a big
> organization. When products, even software, needed to be distributed
> physically, you needed people to design packaging, write and mail
> brochures, set up an assembly line, fill shelves in a warehouse,
> fulfill customer orders, etc. We went to work for big companies like
> IBM and Hewlett-Packard. Our first rude surprise was learning that
> even the best engineers earned a pittance compared with senior
> management. Moreover, because of the vast resources that were needed
> to turn a working design into an on-sale product, most finished
> designs never made it to market. "My project was killed" was the
> familiar refrain among Route 128 and Silicon Valley engineers in
> 1982.
> 
> How does the Web/db world circa 1998 look to a programmer? If Joe
> Programmer can grab an IP address on a computer already running a
> well-maintained relational database, he can build an interesting
> service in a matter of weeks. By himself. If built for fun, this
> service can be delivered free to the entire Internet at minimal
> cost. If built for a customer, this service can be launched without
> further effort. Either way, there is only a brief period of several
> weeks during which a project can be killed. That won't stop the site
> from being killed months or years down the road, but very seldom
> will a Web programmer build something that never sees the light of
> day (during my entire career of Web/db application development,
> 1994-1998, I have never wasted time on an application that failed to
> reach the public).

And [by Paul Graham in 2001][2]:

> One of the most important changes in this new world is the way
> you do releases. In the desktop software business, doing a
> release is a huge trauma, in which the whole company sweats
> and strains to push out a single, giant piece of code. Obvious
> comparisons suggest themselves, both to the process and the
> resulting product.
> 
> With server-based software, you can make changes almost as you
> would in a program you were writing for yourself. You release
> software as a series of incremental changes instead of an
> occasional big explosion. A typical desktop software company
> might do one or two releases a year. At Viaweb we often did
> three to five releases a day.

I see four possible interpretations:

1. Damon Edwards is wrong, and in fact the software development phase
   is not "becoming an increasingly smaller part of an application's
   overall lifecycle" as a result of "SaaS", which is what Graham
   called "server-based software" and what Philip Greenspun called "a
   service", and in fact the per-user cost to deploy software is
   continuing to shrink.

   This is a plausible answer; Moore's Law continues to grind away
   giving us more MIPS per watt and MIPS per CPU, the cost of
   bandwidth was still falling last I checked, and perhaps more
   importantly, EC2 and S3 and Hadoop and Puppet and MogileFS and
   `aptitude` and Xen and `monit` and Cacti and `backuppc` and `nginx`
   and `perlbal` and `memcached` and Nagios and Erlang and Varnish and
   Capistrano are reducing the amount of human effort it takes to
   administer a given number of CPUs and increasing the number of
   users each MIPS can support.  Maybe Damon sees operations becoming
   more difficult because the internet is still growing, and so the
   biggest services have to deal with more users now than in 2001 or
   1998.

2. The effort required to deploy software on centralized servers
   really is growing out of proportion to the effort required to write
   it in the first place, and that's because of the dependence on
   centralized servers.  If the software could run on the machines of
   its users, the way Firefox or BitTorrent or Skype or Emacs does,
   the users would be the ones deploying it.  Of course, they might
   still find that difficult, but that wouldn't be visible to the
   software authors; they would just see that nobody was using their
   software, not the hours of frustration expended trying to install
   it.  But a lot of that deployment effort can still be moved into
   software (that's the point of InstallShield, `aptitude`,
   `easy_install`, RubyGems, CPAN.pm, Fink, Darwin Ports, Xen,
   AppEngine, and so on, although the diversity of items in that list
   suggests that the job is far from over), and with the effort that
   can't be, people can avoid duplication of effort by sharing
   solutions online.

   Just because the software runs on its users' machines doesn't mean
   it can't be providing a networked service; consider BitTorrent or
   Skype or, for that matter, Sendmail, ircd, or INN.

3. The effort required is growing, but not because of centralized
   servers.  But I do not know of any other plausible candidates.

4. A [post by Jesse Robbins on O'Reilly Radar][3] suggests that some
   startups get their operations highly automated early on, so they
   can easily deploy their changes, while others screw up and end up
   with a mess, and spend lots of time on operations.  If this is
   correct, then Damon Edwards is wrong in thinking that operations
   *inevitably* consumes a greater and greater proportion of the
   resources available; he's just working with dumb groups who dig
   themselves into big holes.

[0]: http://smoothspan.wordpress.com/2007/11/27/why-small-software-teams-grow-large-and-other-software-development-conundrums/#comment-1114
[1]: http://philip.greenspun.com/panda/future "The Future chapter of Philip and Alex's Guide to Web Programming"
[2]: http://www.paulgraham.com/road.html "Paul Graham's Road Ahead"
[3]: http://radar.oreilly.com/archives/2007/10/operations-advantage.html "Operations is a competitive advantage"

Tue, 08 Jul 2008

Last night there was a protest (by the Partido Obrero --- and
"Quebracho"?) that turned into a riot.  Fortunately nobody was killed,
but 20 people or so were injured, half split between police and
non-police.  Today there is a march to protest, as far as I can tell,
the injuries to the non-police.

So Beatrice is down by Congreso at the start of the march with our
friend Caitlin Kelly.  They're taking pictures.  I'm going to meet them
there soon.

I hope there are no deaths or injuries today, and I especially hope we
aren't among them.

Kragen

Mon, 07 Jul 2008

Relative links are great; they let you move your whole tree of HTML
files from one place to another and still retain the internal link
structure.  However, they start to suffer when you have multiple
levels of directory structure: is that `href="../../style.css"` or
`href="../../../style.css"`?  It's a bit confusing, and even if you
don't get confused, you still have to modify links when you copy them
from one file to another.

What would be more helpful would be the ability to say "up to a
directory named foo".  Suppose you have this setup:

    kragen/
      index.html
      resume.html
      style/style.css
      images/
        kragenlogo.png
        headshot.jpg
      blog/
        1.html
        2.html
        archive/
          2008-03.html

Now, suppose there's some text in `2008-03.html` that was originally
in `2.html` or one of its siblings.  It would be nice if that text
didn't have to be changed from `<a href="1.html">` to `<a
href="../1.html">`.  You can write `<a href="/kragen/blog/1.html">`,
but in addition to being verbose, that makes it hard to use a tree of
HTML that you've downloaded with `wget -r` or something similar.

Suppose you could instead write `<a href="$blog/1.html">`, meaning "go
up until you find an ancestor directory named `blog`, then use its
children".  Now you can write things like `<img
src="$kragen/headshot.jpg">` freely, and copy and paste them among all
the files.

By itself, this would be a backwards-incompatible change to browsers
and the URL spec, but it could degrade gracefully.  You could program
your web server to generate redirects for backwards-compatibility,
while implementing the change in newer browsers.  Compatibility
problems would only arise if someone had a relative link to a
directory whose name began with "$" whose name otherwise duplicated
that of a directory higher up in the hierarchy.

Sat, 05 Jul 2008

#!/usr/bin/python
import re, sys, StringIO
"""Enumerate sentences of a context-free grammar.

Some sample, and kind of creepy, output:

Python writes this.
Python writes Python.
Python writes code.
Python writes you.
Python writes me.
...
the Python is code.
the Python is you.
the Python is me.
the code writes this.
...
6 lines are Python.
6 lines are code.
6 lines are you.
6 lines are me.
the generators are this.
the generators are Python.
...
by contrast, this is code.
by contrast, this is you.
by contrast, this is me.
the Python writes 6 generators.
the Python writes 6 lines.
the Python writes the generators.
the Python writes the lines.
...
by contrast, the lines are Python.
by contrast, the lines are code.
by contrast, the lines are you.
by contrast, the lines are me.
...
6 generators let you think me.
6 generators let me write this.
6 generators let me write Python.
...
the lines let Python think this.
the lines let Python think Python.
...
by contrast, the code writes the generators.
by contrast, the code writes the lines.
by contrast, the code is 6 generators.
by contrast, the code is 6 lines.
...
Python writes the Python code generators.
Python writes the Python code lines.
...
code writes 6 lines of code.
code writes 6 lines of you.
code writes 6 lines of me.
code writes the generators of this.
code writes the generators of Python.
code writes the generators of code.
code writes the generators of you.
code writes the generators of me.
"""

default_cfg = '''
sentence "."

sentence:
        singular-subject " " singular-predicate
        plural-subject " " plural-predicate
        sentence-modifier ", " sentence
sentence-modifier:
        "by contrast"
plural-subject:
        plural-noun
singular-subject:
        mass-noun
        singular-subject-pronoun
        "the " mass-noun
singular-subject-pronoun:
        "that"
        "this"
mass-noun:
        "Python"
        "code"
plural-noun:
        number " " simple-plural-noun
        plural-article " " simple-plural-noun
        plural-noun " " prepositional-phrase
plural-article:
        "the"
singular-article:
        "a"
        plural-article
prepositional-phrase:
        preposition " " object
preposition:
        "of"
simple-plural-noun:
        mass-noun " " simple-plural-noun
        "generators"
        "lines"
number:
        "6"
plural-predicate:
        plural-transitive-verb-phrase " " object
plural-transitive-verb-phrase:
        "let " object " " infinitive
        "are"
infinitive:
        "write"
        "think"
object:
        "this"
        plural-noun
        mass-noun
        "you"
        "me"
singular-predicate:
        singular-transitive-verb-phrase " " object
singular-transitive-verb-phrase:
        "writes"
        "is"
'''


class Literal:
    "A literal string in the grammar."
    def __init__(self, contents):
        self.contents = contents.replace(r"\n", "\n")
    def alternatives(self, nwords, grammar):
        if nwords == 1:
            yield self.contents

class Nonterminal:
    "A nonterminal in the grammar; expands into some phrase."
    def __init__(self, name):
        self.name = name
    def alternatives(self, nwords, grammar):
        for production in grammar[self.name]:
            for alternative in production.alternatives(nwords, grammar):
                yield alternative

class Concatenation:
    "A concatenation of two literals, nonterminals, and concatenations."
    def __init__(self, left, right):
        self.left, self.right = left, right
    def alternatives(self, nwords, grammar):
        for ii in range(1, nwords):
            for left_alt in self.left.alternatives(ii, grammar):
                for right_alt in self.right.alternatives(nwords - ii,
                                                         grammar):
                    yield left_alt + right_alt

def concatenation(items):
    "Produce a concatenation of an arbitrary nonzero number of terms."
    assert len(items) > 0
    if len(items) == 1: return items[0]
    elif len(items) == 2: return Concatenation(items[0], items[1])
    else: return Concatenation(items[0], concatenation(items[1:]))

def parse_production(line):
    "Turn a production line into a Concatenation, Nonterminal, or Literal"
    tokens = re.findall(r'\s*([-\w]+|"(?:[^"\\]|\\.)+")', line)
    rv = []
    for tok in tokens:
        if tok.startswith('"'):
            rv.append(Literal(tok[1:-1]))
        else:
            rv.append(Nonterminal(tok))
    return concatenation(rv)

class CouldntParse(Exception): pass

class Grammar:
    "A container for the productions and the first rule."
    def __init__(self, productions, first_rule):
        self.first_rule = first_rule
        self.productions = productions
    def __getitem__(self, x):
        return self.productions[x]
    def alternatives(self, nwords):
        return self.first_rule.alternatives(nwords, self)

def parse_grammar(infile):
    current_rule = None
    first_rule = None
    productions = {}
    for line in infile:
        if line.startswith(" "):
            productions[current_rule].append(parse_production(line))
        elif line.endswith(":\n"):
            current_rule = line[:-2]
            productions[current_rule] = []
        elif re.match(r"\s*$", line) or re.match(r'#', line):
            pass
        elif first_rule is None:
            first_rule = parse_production(line)
        else:
            raise CouldntParse(line)
    return Grammar(productions, first_rule)

if __name__ == '__main__':
    if len(sys.argv) == 1:
        infile = StringIO.StringIO(default_cfg)
    else:
        infile = file(sys.argv[1])
    grammar = parse_grammar(infile)
    ii = 0
    while True:
        for sentence in grammar.alternatives(ii):
            print sentence
        ii += 1


Thu, 03 Jul 2008

* Kragen Javier Sitaker <kragen at canonical.org> [2008-07-03 09:40]:
> I agree that the code needs rewriting, but inspired by Scheme,
> I would do it without magic numbers, without nine separate
> variables for the struct options pointers, and with conditions
> and their consequents on the same line:

Your version isn’t tabular though. This is something I like about
both of the other rewrites better than yours. However, Spinellis
waffles on way too long to determine which array element to pick
and Steele commits the terrible sin of copypasting the x-related
conditional onto every one of the three lines.

Here’s how I’d fix both of these problems:

    struct options *locations[3][3] = {
         {upleft,  upper,  upright },
         {left,    normal, right   },
         {lowleft, lower,  lowright},
    };
    int h = x == last   ? 2 : !!x;
    int v = y == bottom ? 2 : !!y;
    op = &(locations[v][h])[w->orientation];

This makes use of magic numbers and exploits the C type system
quirks relating to ints and boolean logic. Is that bad style?

Or is it idiomatic C?

In any case it’s short enough to grasp quickly and the central
intent of the code is communicated visually by a table. I believe
quite strongly that code driven by tabular declarative sections,
even if the imperative part is somewhat tricky or even messy, is
easier to understand than simpler but more abstractly written code.

I do not see this as the be-all end-all version either, btw; if
there was a way to get the offset of a named member of a struct
in C, it would be possible to meld the advantages of your code
and mine. Think of the machine code that would be generated from
my code: there will multiplications with sizeof(options) in
there. The machine code version of your code would do the job
directly.

-- 
*AUTOLOAD=*_;sub _{s/(.*)::(.*)/print$2,(",$\/"," ")[defined wantarray]/e;$1}
&Just->another->Perl->hack;
#Aristotle Pagaltzis // <http://plasmasturm.org/>

// Diomidis Spinellis's book "Code Reading: An Open Source
// Perspective" has this messy code in chapter 2
// > http://www.spinellis.gr/codereading/spinellisch02.pdf
// from netbsdsrc/games/worms/worms.c:419:

op = &(!x ? (!y ? upleft : (y == bottom ? lowleft : left)) :
(x == last ? (!y ? upright : (y == bottom ? lowright : right)) :
(!y ? upper : (y == bottom ? lower : normal))))[w->orientation];

// He rewrites it as follows, the idea being to improve its
// readability:

struct options *locations[3][3] = {
     {upleft, upper, upright},
     {left,    normal, right},
     {lowleft, lower, lowright},
};
int xlocation, ylocation;
// Determine x offset
if (x == 0)
     xlocation = 0;
else if (x ==  last)
     xlocation = 2;
else
     xlocation = 1;
// Determine y offset
if (y == 0)
     ylocation = 0;
else if (y ==  bottom)
     ylocation = 2;
else
     ylocation = 1;
op = &(locations[ylocation][xlocation])[w->orientation];

// In my opinion, the above is actually worse than the original, as a
// result of being so much longer.  However, then Spinellis suggests
// this version, which he credits to Guy Steele:

op =
  &(        !y ? (!x ?  upleft : x!=last ?  upper :  upright ) :
     y!=bottom ? (!x ?    left : x!=last ? normal :    right ) :
                 (!x ? lowleft : x!=last ?  lower : lowright )
   )[w->orientation];


// I agree that the code needs rewriting, but inspired by Scheme, I
// would do it without magic numbers, without nine separate variables
// for the struct options pointers, and with conditions and their
// consequents on the same line:

struct bar { struct options *left, *middle, *right } upper, middle, lower;
struct bar *a = (y == 0      ? &upper :
                 y == bottom ? &lower :
                               &middle);
struct options *b = (x == 0    ? a->left  :
                     x == last ? a->right :
                                 a->middle);
op = &b[w->orientation];