Wed, 28 Nov 2007

My 31st year of life began November 18th, 2006, and ended Novenber
18th, 2007.

Just before this year began, I arrived in Argentina, which is now my
home.  I spent most of the year in Argentina, although I also went
back to the US for three months, and visited Uruguay for a few weeks
and Chile and Brazil for a week each.

The biggest thing I've done this year is that I've learned Spanish.  I
now feel comfortable saying that I can speak Spanish, even though I
still have to ask for a lot of clarification.  Surprisingly, on my
trip to Brazil for the International Free Software Forum FISL, I was
able to understand a surprising amount of the Portuguese spoken at the
conference, thanks in large part to my wonderfully helpful and
supportive fellow members of CouchSurfing, but also the similarity
between Portuguese and Spanish.

I had a lot of other memorable experiences:

- I wrote a draft of an essay called, "What's Wrong With HTTP?" about
  how to replace much of the current web with decentralized social
  software.

- I wrote a draft of an essay called, "the end-to-end principle in
  human society: scholarly writing and freedom of speech", which got a
  lot of good feedback, but needs a lot of work.

- I wrote an essay entitled, "my evolution as a programmer," which
  several people have liked a lot.

- At the beginning of the year, on my birthday, I participated in what
  may have been history's largest pillow fight, the Lucha de Almohadas
  Buenos Aires.  There were several thousand participants (nobody
  knows how many exactly) and it went on for several hours.  I
  probably lost some parts of my brain to vascular damage during this
  fight --- I was pretty punch-drunk by the end.

- Beatrice and I went to the two weddings of four dear friends: Sarah
  Torpey, Johann Hibschman, Mary Corey March, and Topher Saari.

- Beatrice and I celebrated our fourth wedding anniversary and the
  sixth anniversary of our meeting.

- I visited Itaipú dam, which was at the time the largest
  hydroelectric power plant in the world, and which is orders of
  magnitude larger than any other machine I've ever seen.  I stood in
  a room with a single rotating turbine shaft that supplied
  approximately the entire electricity demand of Paraguay, one of
  twenty such shafts in the plant.  This degree of centralization is,
  I think, an outdated relic of the twentieth century --- it creates
  vulnerabilities to coercion --- but I'm glad I got a chance to see
  it.

- Some thief privatized my laptop (a lovely Thinkpad that was a gift
  from Jesse Andrews) early in the year, here in Buenos Aires.  I've
  bought a couple more older Thinkpads for US$300 each since then, but
  they aren't nearly as nice.

- The hard disk on my main server crashed.  I ran the machine off
  LNX-BBC for several months, but I still haven't properly recovered
  it.

- I became acquanted with tango nuevo music like that of Supervielle
  and Tanghetto.

- With my friend Nadia, I went to my first milonga, although I didn't
  try to tango because I didn't know how.

- I went to the Feria de Libros, the world's largest retail book fair.

- I broke my front fork and flew over the handlebars of my bicycle,
  just a few weeks ago.  I was unhurt, but getting the rest of the way
  home took a while.

- I got my first MRI, as part of a neuroscience experiment at the
  University of California at San Francisco on how visual perception
  works.  This accounted for my entire income for the year, which I
  think was US$60.

- Beatrice got her first MRI as well.

- I got my first speeding ticket in years.

- I met Beatriz Busaniche, one of the most important free software
  activists in the world; she has in the past done a number of
  important jobs at the Free Software Foundation, Latin America, and
  is currently working with Fundación Via Libre.

- I wrote my first program on an OLPC XO laptop prototype, a graphics
  hack for its Open Firmware boot PROM.  (The laptop in question
  didn't have a working operating system installed at the time.)

- I wrote my first interesting programs in the programming languages
  OCaml, Forth, and 386 assembly.

- I got Bicicleta to a point where it could run some programs, and
  wrote a metacircular interpreter in it to clarify its semantics.

- I went to a number of conferences and unconferences: a couple of
  SuperHappyDevHouses, the Bar Camp Block Party for the anniversary of
  the original Bar Camp, the FISL I mentioned before, Bar Camp Buenos
  Aires, Wiki Wednesday, Dorkbot, and the CaFeConf free software
  conference here in Buenos Aires.

- I moved into an apartment across the street from Buenos Aires's
  first skyscraper, a gorgeous Art Deco thing that currently houses
  the Sofitel.

- I saw a lot of live music performances.  A couple of performances of
  Kevin Johansen and the Nada, Roger Waters, a couple of Pink Floyd
  imitation bands, New Order (at Personalfest), the national symphony
  orchestra, Supervielle (at Personalfest and later at the ND/Ateneo),
  some jazz band, Tanghetto, and some other things I've forgotten.

- I did my first install of OpenWRT on a Linksys WRT54GL.

- Beatrice and I took care of a vacant hotel belonging to a friend of
  her family for three months.

- I lived with a fairly minimal volume of physical possessions for
  most of the year.

- Beatrice and I spent three months couchsurfing with generous friends
  in the San Francisco Bay Area, a bounty for which I am immensely
  grateful; it was equaled only by the opportunity to spend time with
  so many people I care deeply about.

- I seem to have nearly healed the anal fissure that had been plaguing
  me since 2001.  It's only bled once in the last few months.

- I did my first interesting computer hardware hack, modifying an
  instance of Limor Fried's minipov2 to be able to sense light with
  one of its LEDs: http://pobox.com/~kragen/light_sensing/

- I designed my first font, starting with my handwriting:
  http://pobox.com/~kragen/oilpencil/ --- I use the "pencilitalic" as
  the default font for my web browsing now, even though it leaves a
  lot to be desired.

- I spent some time reading and providing comments on drafts of my
  friend Linley's book, "Who's Afraid of Marie Curie?", which I
  strongly recommend.

- I met Andrew Cooke and his partner Paulina for the first time, after
  corresponding with them occasionally for many years.

- I met Darius Bacon for the first time, after corresponding with him
  occasionally for many years.

- Beatrice and I hosted a series of "creativity labs" where a variety
  of artsy people got together to discuss their creative processes and
  try all kinds of goofy exercises to see if they would help.

- I visited New York for the first time in my life, with the exception
  of sleeping through it on the train between D.C. and Boston a couple
  of times a few years ago.

- Had lunch with Jonathan Edwards, who is working on Subtext, a system
  based on ideas similar to Bicicleta, at MIT.  This is the second
  time I'd had a chance to meet up with him.

- I went to RoboGames, which is the new name for Robolympics, where I
  saw robots fighting each other.  One of them caught on fire.

- I was called a slut for probably the first time in my life.  It was
  meant in a nice way (the utterer would wear the same title proudly),
  but I don't think it was really deserved.

- I went to the Fire Arts Festival and saw Dance Dance Immolation, the
  Flaming Lotus Girls' Serpent Mother, and a production of the Odyssey
  that included lightning from giant tesla coils, aerial dance with
  silks, a woman doing unbelievable contortions supported entirely by
  her teeth, ballet, break dancing, opera, giant fireballs rolling
  toward me, fire dancing, and ten-meter-tall welded sculptures.

- I saw Mother and Child, the prototypes for those ten-meter-tall
  welded sculptures, in their new home along the waterfront in San
  Francisco.  They look much smaller underneath the Bay Bridge than
  they did when I first met them, alone at the end of a kilometer of
  flaming footprints in the middle of the desert.

- I learned to make empanadas.

- I helped my friends Chip and Annie move to Colorado, my friend
  Elaine move to Arizona, and my friend Celeste move across town.

- I helped build robots for the OrbSwarm.

- I saw the CandyFab 2000, the first of a new wave of low-cost 3D
  printers, at Dorkbot.

- I watched my first Ultimate game, and the first soccer game I'd seen
  in a long time.

- I visited the Palace of the Legion of Honor and saw a number of
  Rodin sculptures for the first time.

- In Buenos Aires, I participated in a Hacklab for the first time;
  this is where I modified my minipov2 to be able to sense light.

- Here at a screening in Buenos Aires, I saw Youth Without Youth, a
  really intense and enjoyable film, but I think I will have to see it
  again to understand it.

- I made my first SIP phone call from my own computer.

- I won Prince of Persia (the original) for the first time.  I played
  Ur-Quan Masters for essentially the first time.

- I learned to bark like a dog, quite realistically.

- I read a lot of Forth systems: eForth 1.0, Bill Muench's later ITC
  eForth, parts of F-83, and Rich Jones's Forth implementation written
  in gas.

- During our first year (minus three months) in Buenos Aires, we have
  been visited by seven friends from the US, and six more have either
  visited since my birthday or will visit in the next few months.

- I read a bunch of books and papers; among them:
  - the beginning of Yochai Benkler's Wealth of Networks, which I
    think describes an economic change as fundamental as that of its
    namesake
  - most of Olin Shivers's dissertation about ORBIT, but I think I
    need to read it again a time or two to understand it
  - Craig Chambers's and Urs Hölzle's dissertations about Self
  - Ronald Coase's original paper in which he proposed the idea of
    transaction costs
  - Feeley and Dubé's PICBIT paper
  - Dan Bernstein's "10 years of qmail 1.0" paper
  - Naylor and Runciman's "Reduceron" paper, which explains how they
    think they can get Haskell to run five times faster than C on a
    Pentium by compiling it into FPGAs, because they got it to run
    half as fast on a five-year-old FPGA with a thirtieth of the
    Pentium's clock rate
  - Kiczales' "The Art of the Metaobject Protocol," which I had wanted
    to read for years and which Andrew Cooke was kind enough to lend
    me;
  - Vernor Vinge's "Rainbows End", which had been recommended to me
    before, but which is now on the web at vrinimi IIRC
  - Frank Herbert's "Dune" again, after reading the following item:
  - the first ever book of literary criticism that I bothered to
    finish: Tim O'Reilly's exegesis and analysis of the Dune trilogy.
  - "All you wanted to know about the HiPE compiler (but might have
    been afraid to ask)"
  - a bunch of papers about new Bentonite clay composite materials
  - Abdulaziz Ghuloum's wonderful tutorial on an incremental approach
    to compiler construction
  - Alexey Karpenko's report on the weather balloon he, his dad, and a
    couple of friends launched to 32km in elevation, in the lower
    stratosphere
  - All the stuff in the "Project Rho Atomic Rocket" repository, about
    how to calculate plausibility for hard-science space operas
  - part of Raph Levien's dissertation
  - the "Haaarg, world!" series of blog posts, about teaching a
    six-year-old to program in Python
  - some documents about the SCORE project to produce a thermoacoustic
    "Stove for COoking, Refrigeration, and Electricity" for the
    world's poor
  There were a lot of others; these are mostly from the last few
  months.

- I wrote a bunch of little bits of software, mostly in the nature of
  exercises:
  - a simple real-time 3-D rendering engine in 350 lines of JavaScript
  - a Lisp 1.5 interpreter in under 4kB of C (this is much larger than
    Lisp interpreters that have been entered into the IOCCC in past
    years)
  - a lambda-calculus interpreter in about 600 lines of JavaScript,
    using combinator graph reduction and therefore running painfully
    slowly; this was on top of the combinator graph reduction engine I
    had written earlier.  Maybe the most interesting part of this is
    that it includes a shift-reduce bottom-up parser in which it is
    practical to manually write out the reduction rules
  - Some DHTML translations of some old kragen-hacks posts
  - A sort of étude on various approaches to laziness in Python
  - A DHTML sketch of part of the Bicicleta UI
  - A DHTML program to draw curves of Lamé
  - A grayscale version of Naor and Shamir's visual one-time pads, in
    140 lines of Python and PostScript
  - A toy interpreter of a subset of APL in 250 lines of OCaml
  - A little dynamic query view thing in DHTML to help me buy a
    laptop
  - A boolean expression evaluator in the Gaim text replacement
    plugin, which is only intended to correct common misspellings as
    you type, but turns out to be able to do arbitrary computation
  - Real-time audio harmonic synthesis in 30 lines of Python with
    Numeric and PyGame
  - A tiny stack-machine bytecode interpreter in 386 assembly --- I
    can't call it a Forth interpreter yet

- I finished transcribing George Soros's Google talk on "The Age of
  Fallibility".  Although I can't find the final third of it at the
  moment, and haven't posted it online yet.

Frustrations
------------

I feel a little frustrated with how little I've been accomplishing.
I attribute that to the following causes:
- not keeping my surroundings in order; e.g. keeping servers running,
  computer files findable, backed up, and on my laptop where
  appropriate, papers properly filed, search indexes updated, mail
  readers efficient, laptops in physically operable condition, clothes
  washed and put away so I don't waste time finding them, etc. (This
  is not as bad as it sounds; the apartment usually needs a few
  minutes of picking up to make it neat enough to be proud to show to
  company, but there are never piles of stuff lying around on the
  floor, except in the designated Dirty Laundry Area.)
- not following through; I've started many more little projects than
  I've finished.  In many cases the trigger for this seems to be
  realizing that the project is either going to take much more effort
  than I expected, or that it is not going to be as good as I had
  expected.
- paradoxically, another cause is following through on the wrong
  things --- winning Prince of Persia, for example, which took several
  hours of puzzle-solving that I could have perhaps more productively
  spent on getting a server back up and running.  In the larger scheme
  of things, this is the thing that costs most people their careers:
  they spend their entire careers working on things that don't matter
  very much.  (I don't mean that they don't have careers as a result;
  I mean that they waste their entire careers.)  For example, the vast
  majority of software that has been written has been discarded after
  being used by only a few people for a short time, and many people
  have spent their careers writing that software.
- spending too much time "preparing" (e.g. reading, sketching, asking
  for feedback) and not enough time actually making stuff.  I think.
- I don't have a really good social support structure to help keep me
  motivated to do interesting stuff.  In a way, this is part of the
  objective of living in Buenos Aires --- I think that a lot of the
  things that excite people in Silicon Valley are kind of stupid, but
  when I'm there, I share in that same excitement.  But, like everyone
  else, I benefit greatly from collaborators and mentors.  I can't
  blame my current status on the environment; among others, Aaron
  Swartz, Andrew Cooke, Aristotle Pagaltzis, Benjamin Sergeant, Ben
  Sittler, Christoph Toshok, Danny O'Brien, Darius Bacon, Eric
  Tiedemann, Eugen Leitl, Jason Evans, Jo Walsh, Leito Monk, Matthew
  O'Connor, Michael Leonhard, Perry Lorier, Pupeno, Richard
  Uhtenwoldt, Rohit Khare, Ryan Tomayko, Seth Schoen, Shae Erisson,
  Slava Pestov, Stefan Sittler, Strata Chalup, Zooko O'Whielacronx,
  and especially Dave Long and Jesse Andrews, have all been generous
  with their time in sharing their expertise and helping me with my
  own pet projects, but I haven't been nearly as responsive and
  appreciative as they deserve; and if I were to work on, say,
  software that other people were already using, then I would probably
  get a lot more motivating responses.

I was extremely social during the three months in the Bay Area, and
immediately after coming back; during that time, I didn't achieve very
much toward my larger goals, nor did I keep up with email.  Now I feel
that I've let the pendulum swing a bit too far back the other way;
although I still talk with dozens of people in the course of a normal
day, I haven't followed up on my contacts from various conferences; I
haven't established working relationships with local free-software
organizations; I haven't found a way to fulfill my moral obligation to
share what knowledge I have by teaching --- haven't even become
acquainted with the local educational system really; I haven't put
most of my little software projects in a demoable state, and
explaining them is a lot harder than demoing them would be.

Sat, 24 Nov 2007

# Simple token-threaded "Forth" interpreter. -*- asm -*-
# by Kragen Javier Sitaker; dedicated to the public domain,
# i.e. I relinquish whatever exclusive rights copyright law
# gives me with regard to this work.
# Major parts taken from Richard W.M. Jones's public-domain
# JONESFORTH 42 by Richard W.M. Jones <rich at annexia.org>
# http://annexia.org/forth

# This program just outputs "hello, world, hello" under Linux.

# to compile:   
# gcc -m32 -nostdlib -static -o tokthr2 tokthr2.S


### Why Small Things are Interesting

# There are still a lot of computers out there that have tens of
# kilobytes of memory or less, and they cost a lot less than,
# say, a cellphone.  Cellphones are apparently still too
# expensive for half the world's population.  I want to see how
# close I can get to having a comfortable programming
# environment in a smaller device.

# Some smallish microcontroller chips from five different
# manufacturers, with current Digi-Key prices:
# Name              bytes RAM  bytes ROM  MHz  price    
# ATtiny2313        128        2048       20   US$1.38  
# ATMega48-20AU     512        4096       20   US$1.62  
# MSP430F1111AIPW   128        2264       8    US$2.43  
# LPC2101           2048       8192       70   US$2.52  
# H8/300H Tiny      1536       8192       12   US$3.58  
# M16C/R8C/Tiny/1B  1024       16384      12   US$3.54  
# SX28AC/SS         136        3072       50   US$2.79  

# More practically and short-termly, small projects can take
# less time to finish, and I feel like I need to learn about
# different approaches to implementing programming languages.

### Why This is Small

# The normal Forth representation of a function is as an array
# of pointers to the other functions it calls, in sequence; a
# few of those other functions may move the interpreter pointer
# around in that array, or snarf up a constant that's stored in
# the array, or stuff like that, but for the most part, the
# functions just get called in sequence.  This is called
# "threaded code" and it's fairly compact, especially on 16-bit
# systems where the pointers are only two bytes.

# A traditional approach taken by Forth implementations to
# reduce code size even further is called "token threading".
# Rather than making arrays of 16-bit or 32-bit pointers, they
# make lists of 8-bit indices into an array of pointers.  This
# has two advantages:

# 1. the indices are one fourth the size of a list of 32-bit
#    pointers;
# 2. it is possible to save these lists of indices somewhere
#    outside of memory and continue to use them even after
#    making some changes to the code, as long as the same
#    indices in the table have the same meanings.  So, for
#    example, you could write some boot firmware in this
#    "bytecode".

# It also has some disadvantages:
# 1. You run out of space in the table.  Even a fairly minimal
#    full Forth system contains close to 256 subroutines.  You
#    can mitigate this by packing, say, two 12-bit pointers
#    every three bytes, or maybe by having a special bytecode
#    that looks up the next byte in an extended table.
# 2. It's slower and makes the machine-code part of the program
#    take more space.  The traditional LODSW; JMP AX version of
#    $NEXT from the eForth Model, which fetches and executes the
#    next execution token in the threaded list, is three bytes
#    and two instructions; my 'next' here is 41 bytes and 14
#    instructions, which is big enough that I jump to it (2
#    bytes) rather than making an assembler macro.  Which blows
#    your branch target buffers to pieces.  Oh well.  The
#    performance penalty is probably two orders of magnitude
#    over native code, but I haven't measured it yet.  I
#    measured an earlier version on my 700MHz PIII laptop on an
#    empty loop at about a factor of 3.5 over simple
#    direct-threading, which in turn is on the order of 10 times
#    slower than machine code.

# Anyway, so this is an example program built using this
# technique.  It implements two Forthlike stacks and interpreted
# subroutines, but not yet the ability to define new subroutines
# at run-time.

### What's Here

# I've implemented all of the primitives from C. H. Ting and
# Bill Muench's public-domain (?) eForth Model 1.0, except for
# the following:
# - I haven't implemented their lowercase "next" (as in FOR
#   NEXT) because I think it's a bad idea, it's complex, and it
#   can be implemented at a higher level if you really need it;
# - I didn't implement !IO because it's a no-op in this context;
# - I haven't yet implemented ?RX, although I think it's
#   possible to implement it on top of syscall5, using select();

# However, most of it is untested and therefore probably broken.
# Procedure call and return and the system calls do work.

# Currently registers are used as follows:
# %esi --- interpreter pointer; points to next byte to execute
# %ebp --- return stack pointer; points to last thing pushed.  This stack, 
#          like the other one, grows downwards.
# %esp --- data stack pointer; points to last thing pushed.  This is
#  	   the processor's standard stack pointer; "push" and "pop"
#  	   instructions use it, which makes assembly code to use it
#  	   quite concise.  The Intel "call" and "ret" instructions
#  	   would also use this stack, but they aren't used in this
#  	   program.
# flags --- the "down" direction flag must be cleared.
#           Fortunately this seems to be the case by default.

# It's probably missing a couple of primitives needed because of
# the token-threading implementation strategy; the address of
# the token table probably needs to be knowable, at least.

# Direct and indirect threading, the normal Forth approaches to
# allowing unrestricted coexistence of words written in assembly
# language and interpreted Forth, both had heavy space costs
# here --- close to 100% for the bytecode currently in the
# system.  So the inner interpreter checks, for each bytecode,
# whether it is in the range of bytecodes whose interpretations
# are in native code, and picks the relevant code path.  This
# avoids consuming any space per-word for this distinction, but with
# what I assume is a heavy performance cost.

### How Small This Is

# eForth 1.0's machine-code part seems to be 171 instructions
# and 399 bytes, including some data that's mixed in there with
# it.

# Last I checked, this program uses only 19 different
# instructions: jmp, jz; push, pop, lodsb, lodsl, xchg, mov;
# movsbl, inc, and, xor, or, lea, cdq, rcl, add, idiv; and int.

# At the moment, this program is 229 bytes in 125 machine-code
# instructions, plus 13 bytes of read-only data, 104 bytes of
# token table for the 52 currently-defined words, and another
# 143 bytes of bytecode in those words, and then another 22
# bytes in the main program, for a total of 511 bytes.

# One important thing that's missing here is the dictionary
# structure, which will minimally use up another hundred bytes
# or so.

### Other Things I Tried

# I tried switching to caching the top of the data stack in a
# register, on the theory that it would shorten things like
# 'and'.  Currently 'and' is pop %eax; pop %ebx; and %ebx,
# %eax; push %eax; jmp next.  If top of stack is cached in %eax
# instead of being stored in memory, this becomes pop %ebx; and
# %ebx, %eax; jmp next, which is considerably shorter.
# However, most things don't change, and other things become
# longer due to the extra work to save top-of-stack.  I tried
# using both %ebx and %eax as the top-of-stack cache.

# In the version using %ebx as top-of-stack, the total size of the
# machine-code part was 216 bytes, 115 instructions, compared to 197
# bytes, 112 instructions for the version using the current strategy.
# In the version using %eax as top-of-stack, it was only 215 bytes,
# but that's still worse than 197.

# In previous versions, all routines were machine-code routines
# that you could just jmp to.  High-level bytecode words began
# with "call dolist", which took the saved %eip off the stack
# and stuck it in %esi.  Unfortunately, that added 5 bytes to
# each bytecode word; as I write this, the bytecode region is
# 143 bytes and contains 24 word definitions --- so 5 bytes each
# would have been 120 bytes of overhead, or 84%!  It also would
# have required a region to be both executable and writable to
# support run-time routine definition, which is kind of a pain
# thse days.

# In previous versions, the token-table entries were 32 bits
# each (instead of 16 bits as they are now), which added another
# 2 bytes of overhead per word.  There are currently 52 words,
# so that's another 104 bytes of shaved overhead.

### What's Wrong With This Program

# - It's a long way from doing anything useful.
# - Only these words are tested: hello, sub1, type, comma,
#   world, newline, dolit_s8, dot, bye, exit, branch_on_0,
#   c_bang, drop, dup, swap, negative, umplus, divmod,
#   syscall5, syscall3, zero, syscall1, rpop, rpush, one,
#   dolit_32, neg1, add, emit, tuck, udot, udot_nospc,
#   udot_nonzero, branch, invert, add1, negate, xor.  These are
#   not tested, and therefore may be broken: execute, bang, at,
#   c_at, rp_at, rp_bang, sp_at, sp_bang, over, and, or, r_at.
# - There's no dictionary structure yet.
# - It probably needs another couple of primitives.
# - There's no checking for stack overflow or underflow, but
#   they will break things.

### The Beginning of the Program

# .. include system library header so we know __NR_exit = 1 and
# __NR_write = 4
#include <asm/unistd.h>

### The token table

        ## I was frustrated with the unreadability of my
        ## bytecode lists; I was counting token table entries
        ## by hand and writing bytecodes numerically.  So I
        ## wrote a macro to help.

        ## Note that we are using a separate .subsection
        ## directive because gas 2.17 doesn't support putting
        ## that in the .pushsection line, even though it is
        ## documented to do so; see message from Maciej
        ## W. Rozycki on 2007-10-11, subject "Re: How to use
        ## .pushsection?",
        ## http://sourceware.org/ml/binutils/2007-10/msg00176.html
        ## for more details)

        ## The first few entries in the table of bytecodes are
        ## all defined in machine code; the rest are all
        ## defined in bytecode.  The inner interpreter examines
        ## each bytecode to determine which category it falls
        ## in in order to figure out how to execute it,
        ## including what base address to add its offset to.
        ## This sucks for extensibility but rocks for
        ## compactness.

        .macro define_bytecode name, origin
        .pushsection .data      # save current position, go to data section
        .subsection 1           # and subsection 1, where we put the addrs
        b_\name = (. - token_table) / 2 # define b_foo as the index of this ptr
        .ifeq b_\name - 256
        .error "\name got bytecode 256"
        .endif
        #.int \name              # insert pointer which will be resolved next:
        .short \name - \origin # insert offset which will be resolved next
        .popsection             # return to where we were, and
\name:                          # define the name
        .endm
        .macro defasm name
        define_bytecode \name, machine_code_primitives
        .endm
        .macro defbytes name
        define_bytecode \name, bytecode_start
        .endm

        .data 1                 # Start putting stuff in data subsection 1
        .align 4
        ## table to define the "bytecode" instructions
token_table:

        .data 3
instructions:
	# And here is the actual "program" in that bytecode.
        .byte b_hello           # string "hello" and count
        .byte b_sub1            # subtract 1 from count: "hell"
        .byte b_type            # spit it out
        .byte b_comma, b_type, b_world, b_type # ", world"
        .byte b_comma, b_type, b_hello, b_type, b_newline, b_type
        # test the "dot" command to print out numbers
	.byte b_dolit_s8, -120, b_dot
        .byte b_dolit_s8, 104, b_dot, b_newline, b_type
        .byte b_bye

### The Return Stack
# We put Forth return addresses here, but programs can also use
# it for other purposes.

        .bss
        .space 4096
initial_return_stack_pointer:   

### Initialization
        	
        .text                   # the following stuff goes in the text segment
        .global _start          # declare _start as a global symbol 
                                # (otherwise ld won't be able to find it)
_start:                         # this is the entry point for ELF I guess
	mov $initial_return_stack_pointer, %ebp
        mov $instructions, %esi # %esi is the interpreter pointer register
        jmp next                # and now we start the interpreter.
                                # (somewhat silly since we could just
                                # fall through..)

### The Machine-Code Primitives
# Also next (aka the address interpreter or inner interpreter)
# is in this section.

machine_code_primitives:
                
# dolit_s8 takes a signed 8-bit literal from the instruction
# stream and pushes it onto the stack.

        defasm dolit_s8
        lodsb
        movsbl %al, %eax
        push %eax
        jmp next

        defasm dolit_32         # more general dolit
        lodsl
        push %eax
        jmp next
        
        defasm exit             # Return from a colon defn.
        xchg %ebp, %esp
        pop %esi
        xchg %ebp, %esp
        jmp next

        defasm execute          # Run xt on data stack.
        pop %eax                # Here 'xt' is the one-byte token.
        jmp execute_eax

# Branch if top of stack is 0 (implementing IF).
# Both branch instructions take a signed byte offset from the bytecode
# stream.
        defasm branch_on_0
        pop %eax
        and %eax, %eax
        jz branch
        inc %esi                # skip 1-byte jump offset
        jmp next

        defasm branch
        lodsb
        movsbl %al, %eax        # same insn size as cbtw; cwde
        add %eax, %esi
        jmp next	

# Store a cell.
        defasm bang
        pop %ebx
        pop (%ebx)              # I'm amazed this is legal
        jmp next
# Fetch a cell.
        defasm at
        pop %ebx
        push (%ebx)             # I'm amazed this is legal too
        jmp next

# Store a byte.
        defasm c_bang
        pop %ebx
        pop %eax
        mov %al, (%ebx)         # push and pop don't do bytes
        jmp next

# Fetch a byte.
        defasm c_at
        pop %ebx
        xor %eax, %eax
	mov (%ebx), %al
        push %eax
        jmp next
        
# Get the return stack pointer.
        defasm rp_at
        push %ebp
        jmp next
        
# Set the return stack pointer.
        defasm rp_bang
        pop %ebp
        jmp next
        
# Pop the return stack to the data stack ( R> )
        defasm rpop
        push (%ebp)
        lea 4(%ebp), %ebp       # add or xchg/pop: same size
        jmp next
        
# Push the return stack from the data stack ( >R )
        defasm rpush
        lea -4(%ebp), %ebp
        pop (%ebp)
        jmp next

# Get the data stack pointer (before it gets pushed).
        defasm sp_at
        push %esp               # safe on 286 and later
        jmp next

# Set the data stack pointer.
        defasm sp_bang
        pop %esp
        jmp next
        
# Pop the stack.
        defasm drop
        pop %eax
        jmp next
        
# Push a copy of TOS.
# eForth 1.0 used BX to index the stack here, for a couple of
# reasons: on the 8086, SP got decremented prior to the fetch,
# and also wasn't valid as a base or index register.
        defasm dup
        push (%esp)
        jmp next
        
# Swap top two stack items ("exch" in PostScript)
        defasm swap
        pop %eax
        pop %ebx
        push %eax
        push %ebx
        jmp next
# Stack manipulation ( w1 w2 -- w1 w2 w1 )
# technically not necessary, but it's so easy and tiny
        defasm over
        push 4(%esp)           
#        jmp next               fall through because "next" is next
	
# "next" fetches the next bytecode and runs it.  It's placed
# here in the middle of the bytecode definitions so that more
# of them can use the short two-byte jump form to get to it.

next:
        xor %eax, %eax          # set %eax to 0
        xor %ebx, %ebx          # clear high half of %ebx
        lodsb                   # load %al from where %esi points
                                # (%esi is the interpreter pointer)
execute_eax:
        ## load offset of new word into %ebx
        mov token_table(,%eax,2), %bx  # bx := token_table[eax * 2bytes]
        cmp $last_asm_bytecode, %eax
        jbe next_primitive      # if primitive, handle primitive word
        ## otherwise, handle a bytecode definition or "colon list"
        # save old %esi on return stack
        xchg %ebp, %esp
        push %esi
        xchg %ebp, %esp
        lea bytecode_start(%ebx), %esi
        jmp next

next_primitive:
        lea machine_code_primitives(%ebx), %ebx
        jmp *%ebx
        

# Push true if n negative. ( n -- f )
        defasm negative
        pop %eax                
        cdq
        push %edx
        jmp next

# Bitwise operators:
        defasm and
        pop %eax
        pop %ebx
        and %ebx, %eax
        push %eax
        jmp next

        defasm or
        pop %eax
        pop %ebx
        or %ebx, %eax
        push %eax
        jmp next

        defasm xor
        pop %eax
        pop %ebx
        xor %ebx, %eax
        push %eax
	jmp next
        
# add two unsigned numbers, returning sum and carry.
# ( u1 u2 -- u3 cy )
        defasm umplus
        xor %ecx, %ecx
        pop %eax
        pop %ebx
        add %ebx, %eax
        rcl $1, %ecx
        push %eax
        push %ecx
        jmp next

# Divide double-precision by single-precision, unsigned.
# UM/MOD from eForth.  ( udl udh un -- ur uq )
        defasm divmod
        pop %ebx
        pop %edx
        pop %eax
        idiv %ebx
        push %edx
        push %eax
        jmp next

# Copy the top of the return stack onto the data stack.
# May be the traditional Forth word "I".
        defasm r_at
        push (%ebp)
        jmp next

# syscall5:   
# Linux system call with up to 5 arguments
# This is no longer the fashionable way to make system calls
# in Linux.  Now you're supposed to use SYSENTER on newer
# CPUs, and rather than have you figure out which one to use,
# the kernel mmaps a chunk of code called a VDSO into your
# memory space at a random address and tells you where to
# find it using the ELF auxiliary vector.  Then you're
# supposed to invoke the dynamic linker or something to parse
# the ELF executable mysteriously manifested in this way by
# the kernel, and then resolve an undefined symbol in libc
# into calls to it.  See "What is linux-gate.so.1?"
# http://www.trilithium.com/johan/2005/08/linux-gate/
# "The Linux kernel: System Calls" by Andries Brouwer, 2003-02-01
# http://www.win.tue.nl/%7Eaeb/linux/lk/lk-4.html
# "About ELF Auxiliary Vectors" by Manu Garg
# http://manugarg.googlepages.com/aboutelfauxiliaryvectors

# But the old int $0x80 approach still works, thank goodness,
# because all of that is *way* more than these ten
# instructions.
        defasm syscall5
        pop %edi
        ## we have to save %esi for the interpreter
        mov %esi, -4(%ebp)
        pop %esi
        pop %edx
        pop %ecx
        pop %ebx
        pop %eax
        int $0x80
        push %eax
        mov -4(%ebp), %esi
        jmp next

        last_asm_bytecode = b_syscall5

### Basic Interpreted Words
        ## a macro for defining interpreted words
        ## Because after I left off b_exit once, I wasted a long
        ## time trying to figure out what was wrong
        .macro def name, bytes:vararg
	defbytes \name
        .byte \bytes
        .byte b_exit
        .endm

        .data 2                 # separate subsection from token table
bytecode_start:
# System call with three arguments.
        def syscall3, b_zero, b_zero, b_syscall5
# System call with one argument.
        def syscall1, b_zero, b_zero, b_syscall3
        def bye, b_dolit_s8,__NR_exit, b_zero, b_syscall1 # exit program
        def zero, b_dolit_s8,0            # push 0
        def one, b_dolit_s8,1
	
# This word outputs a string whose address and count are on 
# the stack.  ( b u -- )

        defbytes type
        .byte b_rpush, b_rpush  # move two args onto rstack
                                # system call is __NR_write:    
        .byte b_dolit_s8,__NR_write
        .byte b_one             # push constant 1: stdout
        .byte b_rpop, b_rpop    # move two args back from rstack
        .byte b_syscall3        # call syscall with 3 args
        .byte b_drop            # discard return value
        .byte b_exit            # return

# The next few words exist just to poke string addresses
# and lengths onto the stack so "type" can print them.
        .macro make_counted_string name, contents
        defbytes \name
        .byte b_dolit_32        # dolit_32 pushes a 32-bit
        .int string_\name       # literal --- an addr, here
                                # now push literal length and return
        .byte b_dolit_s8,string_length_\name, b_exit
        .pushsection .rodata    # define the actual string:
string_\name:
        .ascii "\contents"
        ## Here we count the length of the string --- computers
        ## are for counting bytes so people don't have to!
        string_length_\name = . - string_\name
        .popsection
        .endm

        make_counted_string hello, "hello"
        make_counted_string world, "world"
        make_counted_string comma, ", "
        make_counted_string newline, "\n"

### Some More Basic Words

        def neg1, b_dolit_s8, -1   # ( -- -1 )
        def add, b_umplus, b_drop  # ( a b -- a+b ) drop the carry
        def sub1, b_neg1, b_add    # ( n -- n-1 )
        def rot, b_rpush, b_swap, b_rpop, b_swap # ( a b c -- b c a )
        def unrot, b_rot, b_rot    # ( a b c -- c a b )
        def tuck b_dup, b_unrot    # ( a b -- b a b )

# emit: output a single byte.  eForth calls this "TX!".

# This version is 11 bytes, including the buffer byte, plus the 2-byte
# token table pointer. a machine-code version I wrote the other day
# was 28 bytes.  However, I also added rot, unrot, and tuck to support
# this function, and they total 11 bytes, plus 6 bytes of overhead.
# For a total of 11+2+11+6 = 30 bytes.  Not winning yet on size over
# x86 asm!  But we're getting close.

emit_buffer:
        .byte 0
        defbytes emit
        .byte b_dolit_32
        .int emit_buffer
        .byte b_tuck            # save a copy of address for b_type
        .byte b_c_bang          # store into emit buffer
        .byte b_one, b_type, b_exit # output one-byte buffer

### "u." prints out an unsigned number.
# I had a version of this in x86 machine code in 52 bytes (23
# instructions), essentially exactly the same code as here.
# This is 31 bytes, plus 6 bytes of overhead, plus I had 
# to define b_divmod (9 bytes plus 2 bytes overhead).  Now we are
# starting to win!

        defbytes udot            # print space after number
        .byte b_udot_nospc, b_dolit_s8, 0x20, b_emit, b_exit 
        defbytes udot_nospc      # print number without space
        .byte b_dup, b_branch_on_0,2, b_udot_nonzero, b_exit
        .byte b_drop, b_dolit_s8, '0, b_emit, b_exit
	defbytes udot_nonzero
	.byte b_zero, b_dolit_s8,10, b_divmod # divide by 10
        .byte b_dup, b_branch_on_0,3, b_udot_nonzero # recurse if nonzero
	.byte b_branch,1        # else
        .byte b_drop            # drop zero quotient
        .byte b_dolit_s8, '0, b_add, b_emit # print digit
	.byte b_exit
	
### Add signed numeric output, ".".  This cost 20 bytes plus 8 bytes
# of overhead, but added some fundamental numeric operations; only 12
# of those 28 bytes are specific to "."

        # logical not: return true for 0, false (0) otherwise
	#def zeq, b_branch_on_0,2, b_zero, b_exit, b_neg1
        # logical bitwise not
        def invert, b_dolit_s8, -1, b_xor
        def add1, b_one, b_add
        # arithmetic negation
        def negate, b_invert, b_add1
        # print signed number
        defbytes dot
        .byte b_dup, b_negative, b_branch_on_0,4
        .byte b_dolit_s8, '-, b_emit, b_negate # in the negative case
        .byte b_udot, b_exit

### Obviously the next thing to do is to add ".S", print the
# stack, so that I can stop having to investigate problems by
# using gdb.

Coase argues that firms grow as long as the costs of organizing
transactions outside the firm, using the market pricing mechanism, are
smaller than the costs of organizing those same transactions inside the
firm, using management.

Today we are seeing a trend toward running more and more of our
networked applications inside of Google's data center.  (Or Amazon's, or
whatever.)  Perhaps this happens because the costs of organizing
exchanges of information between administrative domains on the internet
--- not necessarily using a market pricing mechanism, but not involving
complete mutual trust --- are nonzero, and the costs of organizing those
same exchanges inside of a single administrative domain are sometimes
smaller.

Suppose this is the case.  What can we predict?  What kinds of
technological or social developments might increase the costs (or
decrease the efficacy) of exchanges of information between
administrative domains?  Spam is an obvious example; so is software
insecurity in general.  Proprietary systems software, or systems
software that is not designed for internet scale, could also have such
effects.

What kinds of technological or social developments might decrease the
costs or increase the efficacy of these exchanges of information?
Better systems security, better decentralized source control systems (so
that software improvements can be shared more readily between
administrative domains, standardized protocols covering a wider range of
the needed functionality.

But the explanation for these trends might not be that inter-domain
costs are going up; it might also be that intra-domain costs are going
down.  Things like the Google Filesystem, Mapreduce, rsync, backuppc,
Xen, VMWare, and the like, may allow larger systems to be run by fewer
sysadmins.

(Riffs on a discussion tonight with Rohit Khare, who's visiting us here
in Argentina.)

Sat, 17 Nov 2007

I edit my address book in Emacs --- it's a text file, I can grep it, I
can isearch it, I keep it in CVS --- and I prefer keeping things
consistent in it.  So here's a little utility I wrote to reformat
phone numbers into a consistent style.

;;; still doesn't handle the formats 4154975513 and 415 4975513
(defun reformat-us-phone-number ()
  "Reformat a phone number in a US format into my normal phone number format."
  (interactive)
  (save-excursion
    ; XXX these regexen break at end of buffer
    (if (looking-at "[0-9]\\{3\\}[-. ][0-9]\\{3\\}[-. ][0-9]\\{4\\}[^0-9]")
	(progn
	  (insert "+1 ")
	  (forward-char 3)
	  (delete-char 1)
	  (insert " ")
	  (forward-char 3)
	  (delete-char 1)
	  (insert " ")
	  t)
      (if (looking-at
	   "([0-9]\\{3\\})[-. ]?[0-9]\\{3\\}[-. ][0-9]\\{4\\}[^0-9]")
	  (progn 
	    (insert "+1 ")
	    (delete-char 1)
	    (forward-char 3)
	    (delete-char 1)
	    (if (looking-at "[-. ]") (delete-char 1))
	    (insert " ")
	    (forward-char 3)
	    (delete-char 1)
	    (insert " ")
	    t)
	nil))))

(defun magic-reformat-us-phone-number ()
  "Reformat a US phone number somewhere in the vicinity."
  (interactive)
  (or (reformat-us-phone-number)
      (save-excursion (forward-char 1) (reformat-us-phone-number))
      (save-excursion (backward-char 1) (reformat-us-phone-number))
      (and (looking-at "1-[0-9]\\{3\\}")
	   (progn (delete-char 2) (reformat-us-phone-number)))
      (save-excursion (backward-char 12) (reformat-us-phone-number))
      (save-excursion (backward-char 13) (reformat-us-phone-number))
      (save-excursion (backward-char 14) (reformat-us-phone-number))))
(global-set-key [C-M-tab] 'magic-reformat-us-phone-number)

Sun, 11 Nov 2007

# Simple direct-token-threaded "Forth" interpreter. -*- asm -*-
# by Kragen Javier Sitaker; dedicated to the public domain,
# i.e. I relinquish whatever exclusive rights copyright law
# gives me with regard to this work.
# Major parts taken from Richard W.M. Jones's public-domain
# JONESFORTH 42 by Richard W.M. Jones <rich at annexia.org>
# http://annexia.org/forth

# This program just outputs "hello, world, hello" under Linux.

# to compile:   
# gcc -m32 -nostdlib -static -o tokthr tokthr.S


### Why Small Things are Interesting

# There are still a lot of computers out there that have tens of
# kilobytes of memory or less, and they cost a lot less than,
# say, a cellphone.  Cellphones are apparently still too
# expensive for half the world's population.  I want to see how
# close I can get to having a comfortable programming
# environment in a smaller device.

# Some smallish microcontroller chips from five different
# manufacturers, with current Digi-Key prices:
# Name              bytes RAM  bytes ROM  MHz  price    
# ATtiny2313        128        2048       20   US$1.38  
# ATMega48-20AU     512        4096       20   US$1.62  
# MSP430F1111AIPW   128        2264       8    US$2.43  
# LPC2101           2048       8192       70   US$2.52  
# H8/300H Tiny      1536       8192       12   US$3.58  
# M16C/R8C/Tiny/1B  1024       16384      12   US$3.54  
# SX28AC/SS         136        3072       50   US$2.79  

# More practically and short-termly, small projects can take
# less time to finish, and I feel like I need to learn about
# different approaches to implementing programming languages.

### Why This is Small

# The normal Forth representation of a function is as an array
# of pointers to the other functions it calls, in sequence; a
# few of those other functions may move the interpreter pointer
# around in that array, or snarf up a constant that's stored in
# the array, or stuff like that, but for the most part, the
# functions just get called in sequence.  This is called
# "threaded code" and it's fairly compact, especially on 16-bit
# systems where the pointers are only two bytes.

# A traditional approach taken by Forth implementations to
# reduce code size even further is called "token threading".
# Rather than making arrays of 16-bit or 32-bit pointers, they
# make lists of 8-bit indices into an array of pointers.  This
# has two advantages:

# 1. the indices are one fourth the size of a list of 32-bit
#    pointers;
# 2. it is possible to save these lists of indices somewhere
#    outside of memory and continue to use them even after
#    making some changes to the code, as long as the same
#    indices in the table have the same meanings.  So, for
#    example, you could write some boot firmware in this
#    "bytecode".

# It also has some disadvantages:
# 1. You run out of space in the table.  Even a fairly minimal
#    full Forth system contains close to 256 subroutines.  You
#    can mitigate this by packing, say, two 12-bit pointers
#    every three bytes, or maybe by having a special bytecode
#    that looks up the next byte in an extended table.
# 2. It's slower and makes the machine-code part of the program
#    take more space.  The traditional LODSW; JMP AX version of
#    $NEXT from the eForth Model, which fetches and executes the
#    next execution token in the threaded list, is three bytes
#    and two instructions; my 'next' here is 12 bytes and four
#    instructions, which is big enough that I jump to it (5
#    bytes) rather than making an assembler macro.  Which blows
#    your branch target buffers to pieces.  Oh well.  I measured
#    the speed hit on my 700MHz PIII laptop on an empty loop at
#    about a factor of 3.5 over simple direct-threading, which
#    in turn is on the order of 10 times slower than machine
#    code.

# Anyway, so this is an example program built using this
# technique.  It implements two Forthlike stacks and interpreted
# subroutines, but not yet the ability to define new subroutines
# at run-time.

### What's Here

# I've implemented all of the primitives from C. H. Ting and
# Bill Muench's public-domain (?) eForth Model 1.0, except for
# the following:
# - I haven't implemented their lowercase "next" (as in FOR
#   NEXT) because I think it's a bad idea, it's complex, and it
#   can be implemented at a higher level if you really need it;
# - I didn't implment !IO because it's a no-op in this context;
# - I haven't yet implemented ?RX, although I think it's
#   possible to implement it on top of syscall5, using select();
# - I haven't implemented TX!, although it would be
#   straightforward to implement on top of "type".

# However, most of it is untested and therefore probably broken.
# Procedure call and return and the system calls do work.

# Currently registers are used as follows:
# %esi --- interpreter pointer; points to next byte to execute
# %ebp --- return stack pointer; points to last thing pushed
# %esp --- data stack pointer; points to last thing pushed
# flags --- the "down" direction flag must be cleared.
#           Fortunately this seems to be the case by default.

# I'll probably try the colorForth thing of keeping the top of
# the data stack in a register and see if that makes things
# smaller.  It probably won't make them faster, since most of
# this interpreter's time will be spent in the interpreting
# loop anyway.

# It's probably missing a couple of primitives needed because of
# the token-threading implementation strategy; the address of
# the token table probably needs to be knowable, at least.

### How Small This Is

# The pure machine-code part of the program runs from 0x08048074
# to 0x08048139, which is 197 bytes.  This is fairly small.  I
# haven't been able to compile or disassemble eForth 1.0 to see
# how big its machine-code part is, but it seemed to be 171
# instructions, while this is 112.  So this is fairly small.

# It uses only 19 different instructions: jmp, jz; push, pop,
# lodsb, lodsl, xchg, mov; movsbl, inc, and, xor, or, lea,
# cdq, rcl; int, ret, and call.

# The bytecode part of the program, including the five-byte
# machine-code header on each word, runs from 0x08048139 to
# 0x080481a8, which is another 111 bytes.  At present this just
# includes some system-call glue, a word to exit cleanly, words
# to push literal zero and one, a word to output a string, and a
# few constant strings.

# These two parts add up to 308 bytes.

# Additionally there is some read-only data: 38 pointers to
# words, 13 bytes of "main program", and then some string data:
# "helloworld, \n".  This totals 178 bytes.

# The whole program, therefore, is 486 bytes.  Why "strip" only
# reduces it to 836 bytes, I don't know.  But I don't think that
# problem will carry over to microcontroller contexts.

# One important thing that's missing here is the dictionary
# structure, which will minimally use up another hundred bytes
# or so.

### What's Wrong With This Program

# - It's a long way from doing anything useful.
# - The bytecode is numeric and unreadable, and it confuses
#   disassemblers.
# - Most of the primitives are untested and therefore probably
#   broken, except for dolit_s8, dolit_32, dolist, exit, and
#   syscall5.
# - There's no dictionary structure yet.
# - It probably needs another couple of primitives.
# - There's no checking for stack overflow or underflow, but
#   they will break things.

### The Beginning of the Program

# .. include system library header so we know __NR_exit = 1 and
# __NR_write = 4
#include <asm/unistd.h>

### The token table
	
        .section .rodata        # stuff goes in ELF's read-only data section
        .align 4
token_table:                    # table to define the "bytecode" instructions
        ## These are pointers to machine-code subroutines
        ##   0      1     2      3      4      5
        .int bye, type, hello, world, comma, newline
        ##     6         7          8      9     10
        .int dolit_s8, dolit_32, dolist, exit, execute
        ##       11        12      13   14    15     16    17
        .int branch_on_0, branch, bang, at, c_bang, c_at, rp_at
        ##     18      19    20     21      22      23    24
        .int rp_bang, rpop, rpush, sp_at, sp_bang, drop, dup
        ##    25    26      27     28   29  30     31     32
        .int swap, over, negative, and, or, xor, umplus, r_at
        ##     33        34       35   36     37
        .int syscall5, syscall1, zero, one, syscall3
instructions:
	# And here is the actual "program" in that bytecode.
        .byte 2, 1, 4, 1, 3, 1, 4, 1, 2, 1, 5, 1, 0

### The Return Stack
# We put Forth return addresses here, but programs can also use
# it for other purposes.

        .bss
        .space 4096
initial_return_stack_pointer:   

### Initialization
        	
        .text                   # the following stuff goes in the text segment
        .globl _start           # declare _start as a global symbol 
                                # (otherwise ld won't be able to find it)
_start:                         # this is the entry point for ELF I guess
	mov $initial_return_stack_pointer, %ebp
        mov $instructions, %esi # %esi is the interpreter pointer register
        jmp next                # and now we start the interpreter.
                                # (somewhat silly since we could just
                                # fall through..)

### The Machine-Code Primitives
# Also dolist (aka DOCOL) and next (aka the address interpreter
# or inner interpreter) are in this section.

# dolit_s8 takes a signed 8-bit literal from the instruction
# stream and pushes it onto the stack.
dolit_s8:
        lodsb
        movsbl %al, %eax
        push %eax
        jmp next
dolit_32:                       # more general dolit
        lodsl
        push %eax
        jmp next
# Process colon list by popping saved %eip from stack and 
# putting it into %esi, saving old %esi on return stack.
dolist:                         # approach from eForth 1.0
        xchg %ebp, %esp         # 8 bytes, 5 insns.
        push %esi               # The sub $4, %ebp (or lea)
        xchg %ebp, %esp         # mov %esi, (%ebp) approach was 
        pop %esi                # 1 byte longer.
        jmp next
exit:   xchg %ebp, %esp         # Return from a colon defn.
        pop %esi
        xchg %ebp, %esp
        jmp next
execute:                        # Run xt on data stack.
        ret                     # Here 'xt' is the actual addr,
                                # not the one-byte token.

# Branch if top of stack is 0 (implementing IF).
# Both branch instructions take a signed byte offset from the bytecode
# stream.
branch_on_0:
        pop %eax
        and %eax, %eax
        jz branch
        inc %esi                # skip jump offset
        jmp next
branch:
        lodsb
        movsbl %al, %eax        # same insn size as cbtw; cwde
        add %eax, %esi
        jmp next	

# Store a cell.
bang:   pop %ebx
        pop (%ebx)              # I'm amazed this is legal
        jmp next
# Fetch a cell.
at:     pop %ebx
        push (%ebx)             # XXX illegal
        jmp next

# Store a byte.
c_bang: pop %ebx
        pop %eax
        mov %al, (%ebx)         # push and pop don't do bytes
        jmp next
# Fetch a byte.
c_at:   pop %ebx
        xor %eax, %eax
	mov (%ebx), %al
        push %eax
        jmp next
# Get the return stack pointer.
rp_at:  push %ebp
        jmp next
# Set the return stack pointer.
rp_bang:
        pop %ebp
        jmp next
# Pop the return stack to the data stack ( R> )
rpop:   push (%ebp)
        lea 4(%ebp), %ebp       # add or xchg/pop: same size
        jmp next
# Push the return stack from the data stack ( >R )
rpush:  lea -4(%ebp), %ebp
        pop (%ebp)
        jmp next

# Get the data stack pointer (before it gets pushed).
sp_at:  push %esp               # safe on 286 and later
        jmp next
# Set the data stack pointer.
sp_bang:
        pop %esp
        jmp next
# Pop the stack.
drop:   pop %eax
        jmp next
# Push a copy of TOS.
# eForth 1.0 used BX to index the stack here, for a couple of
# reasons: on the 8086, SP got decremented prior to the fetch,
# and also wasn't valid as a base or index register.
dup:    push (%esp)
        jmp next
# Swap top two stack items ("exch" in PostScript)
swap:   pop %eax
        pop %ebx
        push %eax
        push %ebx
        jmp next
# Stack manipulation ( w1 w2 -- w1 w2 w1 )
# technically not necessary, but it's so easy and tiny
over:   push 4(%esp)           
#        jmp next               fall through because "next" is next
	
# "next" fetches the next bytecode and runs it.  It's placed
# here in the middle of the bytecode definitions so that more
# of them can use the short two-byte jump form to get to it.

next:
        xor %eax, %eax          # set %eax to 0
        lodsb                   # load %al from where %esi points
                                # (%esi is the interpreter pointer)
        lea token_table(,%eax,4), %eax  # eax := token_table + eax * 4bytes
        jmp *(%eax)             # now %eax points at an address in the token
                                # table; so we fetch that address from the
                                # token table and jump to it.

# Push true if n negative. ( n -- f )
negative:
        pop %eax                
        cdq
        push %edx
        jmp next
# Bitwise operators:  
and:    pop %eax
        pop %ebx
        and %ebx, %eax
        push %eax
        jmp next
or:     pop %eax
        pop %ebx
        or %ebx, %eax
        push %eax
        jmp next
xor:    pop %eax
        pop %ebx
        xor %ebx, %eax
        push %eax
	jmp next
# add two unsigned numbers, returning sum and carry.
# ( u1 u2 -- u3 cy )
umplus:
        xor %ecx, %ecx
        pop %eax
        pop %ebx
        add %ebx, %eax
        rcl $1, %ecx
        push %eax
        push %ecx
        jmp next

# Copy the top of the return stack onto the data stack.
# May be the traditional Forth word "I".
r_at:   push (%ebp)
        jmp next

# syscall5:   
# Linux system call with up to 5 arguments
# This is no longer the fashionable way to make system calls
# in Linux.  Now you're supposed to use SYSENTER on newer
# CPUs, and rather than have you figure out which one to use,
# the kernel mmaps a chunk of code called a VDSO into your
# memory space at a random address and tells you where to
# find it using the ELF auxiliary vector.  Then you're
# supposed to invoke the dynamic linker or something to parse
# the ELF executable mysteriously manifested in this way by
# the kernel, and then resolve an undefined symbol in libc
# into calls to it.  See "What is linux-gate.so.1?"
# http://www.trilithium.com/johan/2005/08/linux-gate/
# "The Linux kernel: System Calls" by Andries Brouwer, 2003-02-01
# http://www.win.tue.nl/%7Eaeb/linux/lk/lk-4.html
# "About ELF Auxiliary Vectors" by Manu Garg
# http://manugarg.googlepages.com/aboutelfauxiliaryvectors

# But the old int $0x80 approach still works, thank goodness,
# because all of that is *way* more than these ten
# instructions.
syscall5:
        pop %edi
        ## we have to save %esi for the interpreter
        mov %esi, -4(%ebp)
        pop %esi
        pop %edx
        pop %ecx
        pop %ebx
        pop %eax
        int $0x80
        push %eax
        mov -4(%ebp), %esi
        jmp next

### Basic Interpreted Words

# System call with three arguments.
syscall3:
        call dolist
        .byte 35, 35, 33, 9     # push two 0s, syscall5, ret
# System call with one argument.
syscall1:       
        call dolist
        .byte 35, 35, 37, 9     # push two 0s, syscall3, ret
bye:    call dolist
        .byte 6,__NR_exit, 35, 34, 9
zero:   call dolist             # push 0
        .byte 6,0, 9
one:    call dolist             # push 1
        .byte 6,1, 9

	
# This word outputs a string whose address and count are on 
# the stack.  ( b u -- )
# "call dolist" magically treats whatever follows as bytecode.
# This confuses disassemblers.
type:    call dolist
        .byte 20, 20            # move two args onto rstack
        .byte 6,__NR_write      # system call is __NR_write
        .byte 36                # push constant 1: stdout
        .byte 19, 19            # move two args back from rstack
        .byte 37                # call syscall with 3 args
        .byte 23                # discard return value
        .byte 9                 # return

# The last few words exist just to poke string addresses
# and lengths onto the stack so "type" can print them.
# (I probably should have just written a gas macro...)
hello:  call dolist
        .byte 7                 # dolit_32 pushes a 32-bit
        .int hello_string       # literal, an addr here
        .byte 6,5, 9	        # push literal 5 and return
        .section .rodata        # define the actual string:
hello_string:
        .ascii "hello"
        .text

world:  call dolist
        .byte 7
        .int world_string
        .byte 6,5, 9	
        .section .rodata
world_string:
        .ascii "world"
        .text

comma:  call dolist
        .byte 7
        .int comma_string
        .byte 6,2, 9            # comma string only 2 in length
        .section .rodata
comma_string:
        .ascii ", "
        .text

newline:        
        call dolist
        .byte 7
        .int newline_string
        .byte 6,1, 9
        .section .rodata
newline_string:
        .ascii "\n"
        .text

Sat, 10 Nov 2007

An HTML attachment was scrubbed...
URL: http://lists.canonical.org/pipermail/kragen-discuss/attachments/20071110/22708647/attachment.htm

Fri, 09 Nov 2007

// Simple direct-token-threaded (in the Forth sense) interpreter. -*- asm -*-
// by Kragen Javier Sitaker; dedicated to the public domain, i.e. I
// relinquish whatever exclusive rights copyright law gives me with
// regard to this work.
// Major parts taken from Richard W.M. Jones's public-domain JONESFORTH 42
// by Richard W.M. Jones <rich at annexia.org> http://annexia.org/forth


// This program just outputs "hello, world, hello" under Linux.

// to compile:   
// gcc -m32 -nostdlib -static -o sdtbi simple-direct-threaded-bytecode.S


// The normal Forth representation of a function is as an array of
// pointers to the other functions it calls, in sequence; a few of
// those other functions may move the interpreter pointer around in
// that array, or snarf up a constant that's stored in the array, or
// stuff like that, but for the most part, the functions just get
// called in sequence.  This is called "threaded code" and it's fairly
// compact, especially on 16-bit systems where the pointers are only
// two bytes.

// A traditional approach taken by Forth implementations to reduce
// code size even further is called "token threading".  Rather than
// making arrays of 16-bit or 32-bit pointers, they make lists of
// 8-bit indices into an array of pointers.  This has two advantages:

// 1. the indices are one fourth the size of a list of 32-bit pointers;
// 2. it is possible to save these lists of indices somewhere outside
//    of memory and continue to use them even after making some
//    changes to the code, as long as the same indices in the table
//    have the same meanings.  So, for example, you could write some
//    boot firmware in this "bytecode".

// It also has some disadvantages:
// 1. You run out of space in the table.  Even a fairly minimal full
//    Forth system contains close to 256 subroutines.  You can
//    mitigate this by packing, say, two 12-bit pointers every three
//    bytes, or maybe by having a special bytecode that looks up the
//    next byte in an extended table.
// 2. It's slower and makes the machine-code part of the program take
//    more space.  The traditional LODSW; JMP AX version of $NEXT from
//    the eForth Model, which fetches and executes the next execution
//    token in the threaded list, is three bytes and two instructions;
//    my 'next' here is 12 bytes and four instructions, which is big
//    enough that I jump to it (5 bytes) rather than making an assembler
//    macro.  Which blows your branch target buffers to pieces.  Oh well.

// Anyway, so this is an example program built using this technique.
// It doesn't implement Forthlike stacks or interpreted subroutines or
// the ability to define new subroutines; it just runs a bunch of
// machine-code subroutines one after the other.

// include system library header so we know __NR_exit = 1 and __NR_write = 4
#include <asm/unistd.h>
	
        .section .rodata        // stuff goes in ELF's read-only data section
token_table:                    // table to define the "bytecode" instructions
	// These are pointers to machine-code subroutines
        //   0      1      2      3      4      5
        .int exit0, print, hello, world, comma, newline
instructions:
	// And here is the actual "program" in that bytecode.
        .byte 2, 1, 4, 1, 3, 1, 4, 1, 2, 1, 5, 1, 0
        	
        .text                   // the following stuff goes in the text segment
        .globl _start           // declare _start as a global symbol 
                                // (otherwise ld won't be able to find it)
_start:                         // this is the entry point for ELF I guess
        mov $instructions, %esi // %esi is the interpreter pointer register
        jmp next                // and now we start the interpreter.
                                // (somewhat silly since we could just
                                // fall through..)
	
// The rest of the program consists just of the definitions of the
// six things in the token_table.
next:
        xor %eax, %eax          // set %eax to 0
        lodsb                   // load %al from where %esi points
                                // (%esi is the interpreter pointer)
        lea token_table(,%eax,4), %eax  // eax := token_table + eax * 4bytes
        jmp *(%eax)             // now %eax points at an address in the token
                                // table; so we fetch that address from the
                                // token table and jump to it.
	
	// This is no longer the fashionable way to make system calls
	// in Linux.  Now you're supposed to use SYSENTER on newer
	// CPUs, and rather than have you figure out which one to use,
	// the kernel mmaps a chunk of code called a VDSO into your
	// memory space at a random address and tells you where to
	// find it using the ELF auxiliary vector.  Then you're
	// supposed to invoke the dynamic linker or something to parse
	// the ELF executable mysteriously manifested in this way by
	// the kernel, and then resolve an undefined symbol in libc
	// into calls to it.  See "What is linux-gate.so.1?"
        // http://www.trilithium.com/johan/2005/08/linux-gate/
        // "The Linux kernel: System Calls" by Andries Brouwer, 2003-02-01
        // http://www.win.tue.nl/%7Eaeb/linux/lk/lk-4.html
        // "About ELF Auxiliary Vectors" by Manu Garg
        // http://manugarg.googlepages.com/aboutelfauxiliaryvectors

        // But the old int $0x80 approach still works, thank goodness,
        // because all of that is *way* more than these three
        // instructions.

exit0:         
        mov $0, %ebx            // first argument to system call: 0
        mov $__NR_exit, %eax    // system call to call: exit
        int $0x80               // make a system call
	
print:  
        mov $1, %ebx  // stdout
        // we expect address in %ecx and length in %edx here
        mov $__NR_write, %eax
        int $0x80
        jmp next

        // The rest of the code exists just to poke string addresses
        // and lengths into %ecx and %edx so print can print them.
        // (I probably should have just written a gas macro...)
hello:
        mov $hello_string, %ecx
        mov $5, %edx
        jmp next
        .section .rodata
hello_string:
        .ascii "hello"
        .text

world:
        mov $world_string, %ecx
        mov $5, %edx
        jmp next
        .section .rodata
world_string:
        .ascii "world"
        .text

comma:
        mov $comma_string, %ecx
        mov $2, %edx
        jmp next
        .section .rodata
comma_string:
        .ascii ", "
        .text

newline:
        mov $newline_string, %ecx
        mov $1, %edx
        jmp next
        .section .rodata
newline_string:
        .ascii "\n"
        .text

Thu, 01 Nov 2007

#!/usr/bin/python
"""Demonstrate harmonic synthesis in Python.

Python itself is slow enough that if we had to execute multiple Python
bytecodes per sample, we'd probably not keep up, even on modern
hardware.  But with Numerical Python ("Numeric") we can run the inner
loops of harmonic synthesis in C and have "NSL" --- no stinking loops!

And PyGame is awesome in that (a) it's easy to get stuff up and
running quickly and (b) it comes with an interface to Numeric.

"""

import pygame, time, random, Numeric, pygame, pygame.sndarray
sample_rate = 44100

def sine_array_onecycle(hz, peak):
    "Compute one cycle of an N-Hz sine wave with given peak amplitude."
    length = sample_rate / float(hz)
    omega = Numeric.pi * 2 / length
    xvalues = Numeric.arange(int(length)) * omega
    return (peak * Numeric.sin(xvalues)).astype(Numeric.Int16)
def sine_array(hz, peak, n_samples = sample_rate):
    """Compute N samples of a sine wave with given frequency and peak amplitude.

    Defaults to one second."""
    return Numeric.resize(sine_array_onecycle(hz, peak), (n_samples,))
def second_harmonic(hz):
    "Compute a wave with a strong second harmonic."
    return sine_array(hz, 16384) + sine_array(hz * 2, 16384)
def brass(hz):
    "Compute a sound with some odd harmonics.  Doesn't really sound brassy."
    return sum([sine_array(hz, 4096),
                sine_array(hz * 3, 4096),
                sine_array(hz * 5, 4096)])
def play_for(sample_array, ms):
    "Play given samples, as a sound, for N ms."
    sound = pygame.sndarray.make_sound(sample_array)
    sound.play(-1)
    pygame.time.delay(ms)
    sound.stop()
    
def main():
    "Play some funky sounds, as a demo."
    pygame.mixer.pre_init(sample_rate, -16, 1) # 44.1kHz, 16-bit signed, mono
    pygame.init()
    play_for(sine_array(440, 4096), 500)
    play_for(second_harmonic(440), 500)
    play_for(brass(440), 500)

if __name__ == '__main__': main()