Sat, 31 Mar 2007

Great stuff, as usual. I wouldn't make all the initial IDs random.
Reason: you can e.g. set the MAC from a WGS84 GPS position fix
(about /m^2 Earth surface resolution, IIRC). Ideally, the ID
should be a real 3d coordinate, of course.

By prepopulating the ID landscape (there's probably some
critical density) you'd get lots of domains which will speed up
your annealing.

Refining is basically: one bit for the hemisphere, another bit
for a yet another bisection, iterate.

There's more to the ID encoding, you have to minimize
the amount of bits flipping for nodes in circular orbits
(anything moving on Earth surface also applies).

I wouldn't also limit this to just direct neighbours,
a node can sometimes see quite far -- but you'd
get other distance metrics, such as signal strenght
or a relativistic pingpong measurement.

On Sat, Mar 31, 2007 at 03:37:02AM -0400, Kragen Javier Sitaker wrote:
> Quick summary: it might work but I haven't figured out how to make it
> work.
> 
> <html><head><title>Random bit-vector network addressing</title>
> <!-- see end of file for explanation, where it's put so that it's readable in the browser -->
> <script src="../MochiKit-1.3.1/lib/MochiKit/MochiKit.js"></script>
> <script type="text/javascript">
> var bits_per_cell, bits_per_row, cells_per_row, cells_per_col, table

-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820            http://www.ativel.com
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE

Quick summary: it might work but I haven't figured out how to make it
work.

<html><head><title>Random bit-vector network addressing</title>
<!-- see end of file for explanation, where it's put so that it's readable in the browser -->
<script src="../MochiKit-1.3.1/lib/MochiKit/MochiKit.js"></script>
<script type="text/javascript">
var bits_per_cell, bits_per_row, cells_per_row, cells_per_col, table

var randbits = []
function replenish_bits() {
   var num = Math.random()
   for (;;) {
      num *= 2
      if (num > 1) {
	 randbits.push(1)
	 num -= 1
      } else if (num == 1) {
	 if (randbits.length) return
	 else num = Math.random()
      } else {
	 randbits.push(0)
      }
   }
}
function random_bit() {
   if (!randbits.length) replenish_bits()
   return randbits.pop()
}
function random_bitvector(length) {
   return map(random_bit, range(length)).join('')
}

function how_many_bits_per_row(total_bits) {
   /* make roughly square */
   var max = Math.ceil(Math.sqrt(bits_per_cell * 4))
   var min = Math.ceil(max/2)
   var ii = max
   for (var ii = max; ; ii--) {
      if (ii == min || total_bits % ii == 0) return ii
   }
}

function reset() {
   bits_per_cell = $('bits_per_cell').value
   cells_per_row = $('cells_per_row').value
   cells_per_col = $('cells_per_col').value
   bits_per_row = how_many_bits_per_row(bits_per_cell)
   table = map(function() { 
      return map(function() { 
	 return random_bitvector(bits_per_cell) 
      }, range(cells_per_row))
   }, range(cells_per_col))
   refresh()
}

function display_cell(cellvalue) {
   var rv = []
   while (cellvalue.length) {
      rv.push(cellvalue.substr(0, bits_per_row))
      rv.push(BR())
      cellvalue = cellvalue.substr(bits_per_row)
   }
   return rv
}

function refresh() {
   replaceChildNodes('dest', TABLE({cellPadding: 2, cellSpacing: 0},
      map(function(ii) {
	    return TR(null, map(function(jj) {
	       var tcc = function(ev) { turn_cells_colors(ii, jj) }
	       var rv = TD(null, display_cell(table[ii][jj]))
	       connect(rv, 'onclick', tcc)
	       return rv
	    }, range(table[0].length)))
      }, range(table.length))))
}

function similarity(r0, c0, r1, c1) {
   var b0 = table[r0][c0]
   var b1 = table[r1][c1]
   var count = 0
   forEach(range(b0.length), function(ii) {
      if (b0[ii] == b1[ii]) count++
   })
   return count / b0.length
}

function similarity_color(r0, c0, r1, c1) {
   var sim = similarity(r0, c0, r1, c1)
   if (sim == 1) return '#77ff77'
   /* Map similarities of one-third and up into colors ranging from
    * white, through pink, into red. Bizarrely fromHSV(1, 0, 1) is black,
    * though fromHSV(1, 0.001, 1) or fromHSV(1, -0.001, 1) are white */
   return Color.fromHSV(1, 1.5*sim - 0.499, 1).toHexString()
}

function turn_cells_colors(r0, c0) {
   set_cell_backgrounds(function(ii, jj) {
      return similarity_color(r0, c0, ii, jj)
   })
}

function colors_by_bit(n) {
   set_cell_backgrounds(function(ii, jj) {
      return table[ii][jj][n] == '1' ? 'white' : 'gray'
   })
}

function set_cell_backgrounds(cell_color_func) {
   var domtable = $('dest').firstChild
   var ii = 0
   var jj = 0
   function turn_a_cell() {
      domtable.childNodes[ii].childNodes[jj].style.background = 
         cell_color_func(ii, jj)
      jj++
      if (jj == table[0].length) {
	 ii++
	 jj = 0
      }
      if (ii < table.length) setTimeout(turn_a_cell, 0)
   }
   turn_a_cell()
}

function neighbors(ii, jj) {
   function in_bounds(arg) {
      var ii = arg[0], jj = arg[1]
      return ii >= 0 && jj >= 0 && ii < table.length && jj < table[0].length
   }
   function get_cell(arg) { return table[arg[0]][arg[1]] }
   return map(get_cell, filter(in_bounds, 
			       [[ii, jj], [ii-1, jj], [ii, jj-1], 
				[ii+1, jj], [ii, jj+1]]))
}

function majority_rule_digit(digits) {
   var counts = [0, 0]
   forEach(digits, function(digit) { counts[digit] += 1 })
   if (counts[0] > counts[1]) return 0
   else if (counts[1] > counts[0]) return 1
   else return random_bit()
}

function random_rule_digit(digits) {
   var counts = [0, 0]
   forEach(digits, function(digit) { counts[digit]++ })
   return (Math.random() < counts[0] / (counts[0] + counts[1])) ? 0 : 1
}

function map_digits(digit_rule, neighbors) {
   var rv = map(digit_rule, map(function(ii) { 
      return map(function(neighbor) { return neighbor[ii] },
		 neighbors)
   }, range(neighbors[0].length)))
   return rv.join('')
}
current_digit_rule = random_rule_digit

function change_rule(select_object) {
   var o = select_object.options[select_object.selectedIndex].value
   if (o == 'majority_rule_digit') current_digit_rule = majority_rule_digit
   else current_digit_rule = random_rule_digit
}

function mix_cell(ii, jj) {
   return map_digits(current_digit_rule, neighbors(ii, jj))
}

function propagate(continuation) {
   var newtable = []
   var ii = 0
   var jj = 0
   var newrow = []
   /* explicit CPS loop to avoid "script has stopped responding" message */
   function do_cell() {
      newrow.push(mix_cell(ii, jj))
      jj++
      if (jj == table[0].length) {
	 jj = 0
	 newtable.push(newrow)
	 newrow = []
	 ii++
      }
      if (ii < table.length) setTimeout(do_cell, 0)
      else {
	 table = newtable
	 refresh()
	 if (continuation) continuation()
      }
   }
   do_cell()
}

function propagate_n(n) {
   if (n > 0) propagate(function(){propagate_n(n-1)})
}

function init() {
   reset()
}

</script>
</head>
<body onload="init()">
<h1>Random bit-vector network addressing</h1>
<p style="font-size: 2em">
<button onclick="propagate_n($('n').value)">propagate</button> <input id="n" 
    value="1" size="3"> times using a
    <select onchange="change_rule(this)" id='current_digit_rule'>
        <option selected="selected" value="random_rule_digit">stochastic</option>
        <option value="majority_rule_digit">mostly deterministic</option>
    </select> rule, click on a cell to color other cells by similarity, 
    <button onclick="colors_by_bit($('bit').value)">color</button> by bit
    <input id="bit" value="0" size="3" /> (
      <button onclick="$('bit').value++; colors_by_bit($('bit').value)"
        >up</button>,
      <button onclick="$('bit').value--; colors_by_bit($('bit').value)"
        >down</button>),
    or <button onclick="reset()">reset</button> with 
    <input id="cells_per_col" value="15" size="3"> rows of 
    <input id="cells_per_row" value="15" size="3"> cells containing
    <input id="bits_per_cell" value="32" size="4"> bits each.
</p>
<div id="dest"></div>

<h2>Explanation</h2>

<p>So I had this very simple idea for doing geographical addressing
and routing in ad-hoc mesh networks.  Every node starts by generating
a random ID, and then for some number of generations, executes the
following procedure:</p>

<ol>
<li>Send all your neighbors your current ID.</li>
<li>Receive all your neighbors' IDs.</li>
<li>Given your own ID and your neighbors' IDs, calculate a new ID for
yourself that is somehow more similar to your neighbors' IDs than your
old ID was.</li>
</ol>

<p>The idea is that, after some number of generations, much of the
local variation will be smoothed out of the ID space, so that you can
route a message to a node by forwarding it to whichever one of your
neighbors has the ID most similar to the destination node's ID, or
possibly choose one of your neighbors at random weighted by that
function.</p>

<p>There's a bit of tension in this idea --- if the smoothing process
is totally effective, every node will end up with the same ID, making
them useless for addressing, and if it's not that effective but still
rather too effective, there won't be much local variation in ID, so it
will be hard to figure out which direction a message should go in.
But if it's not smooth enough, the message won't make progress towards
its destination.</p>

<p>So this is a little experiment to try this idea out.  Each node has
a 32-bit randomly-generated address to start out; it's topologically
connected to the cells above and below it and to its left and right;
"similarity" is Hamming distance, new IDs are calculated bit by bit,
and there are two algorithms available for making an ID bit "more
similar" to those of a node's neighbors':</p>

<dl>

<dt>stochastic</dt>

<dd>The node's and neighbors' 0 bits (in a particular position,
e.g. bit #11) are counted; the new bit is 0 with probability N/M,
where N is the number of 0 bits found, and M is the total number of
bits (5 for most nodes, ranging down to 3 in the corners).  So if all
the bits examined are the same, the new bit is guaranteed to have the
same value; but if e.g. a single bit out of 5 is 0, then the new bit
is 0 with 20% probability and 1 with 80% probability.</dd>

<dt>mostly deterministic</dt>

<dd>The same set of bits are examined, but the new bit is the value
taken by the majority of the old bits (or uniformly randomly selected
if there was a tie, which can happen at the edges)</dd>

</dl>

<h2>Results</h2>

<p>It doesn't work as well as I would like.  Neither of the algorithms
is very effective at eliminating "local optima" on 32-bit IDs in a
15x15 grid, even after a large number of generations.  The
mostly-deterministic algorithm does eliminate the majority of local
bumpiness in the closeness function, so messages started nearby would
propagate successfully.</p>

<p>Neither algorithm is "effective" enough to produce noticeable
numbers of identical IDs.</p>

<p>Unsurprisingly, both algorithms seem to work if the number of bits
is larger than the number of nodes, as long as you run them long
enough.  But maybe that's just because I haven't been able to wait
very long to run largish simulations with a lot of bits.</p>

<p>The deterministic algorithm seems to mostly settle down after a few
generations, maybe 10 or 20, which means that no new information is
being transmitted at that point.  This is bad news, since in 10 or 20
generations, information can only flow (optimistically) 10 or 20 hops.
It gets a little bit better with more bits, but not a lot.</p>

<h2>Other directions</h2>

<p>It might be possible to rescue this approach in one or more of the
following ways:</p>

<ul>
<li>Randomly flip bits from time to time as a modification to the
mostly-deterministic algorithm.</li>

<li>Rather than using vectors of individual bits, use vectors of small
integers, perhaps of 2 or 3 bits each; then "more similar" IDs might
be a median, or stochastically chosen from near the median.</li>

<li>Change some bits more slowly than others, or change some bits
according to the stochastic algorithm, and the others according to the
majority-rule algorithm.</li>

<li>Gradually "anneal" the IDs: start with the stochastic algorithm
(or an algorithm where the bits are just uniformly selected each
step), then gradually increase the probability weight from the local
majority.</li>

</ul>

</body>
</html>

Mon, 26 Mar 2007

* Kragen Javier Sitaker <kragen at pobox.com> [2007-03-20 19:15]:
> Aristotle Pagaltzis writes:
> > * Kragen Javier Sitaker <kragen at pobox.com> [2006-11-11 09:37]:
> > > # Generate a really big page of small inline JPEG images from a
> > > [fugly code omitted]
> > A cleaner version that crawls the filesystem without external aid:
> > [most of cleaner version omitted too]
> >         open my $fh, '<', $filename or die "Can't open $filename: $!";
> 
> You don't like use Fatal?  :)

So-so. I think its output format is pretty awful (and generating
that doubles the otherwise small module’s code!), though it is
tolerable. So I do use Fatal when it saves me substantial work,
but not for just one or two `open`s.

> > This code uses a minor trick: the <> operator will open and
> > read all files listed in @ARGV sequentially, so the code
> > stuffs the names of the data files of interest into that
> > array, then turns them into URL file names, then uses the
> > diamond operator to read them.
> 
> Yeah, I think that part would have taken me a while to figure
> out without the explanation --- although I've used while (<>) a
> thousand times, I've never used it to read a bunch of filenames
> the Perl program itself had stuffed into @ARGV.

Interesting; I picked it up from the community-accepted canonical
idiomatic way to write a `slurp` function:

    sub slurp {
        my ( $filename ) = @_;
        local ( @ARGV, $/ ) = $filename;
        return <>;
    }

Here you end up with a single-entry `@ARGV` and an undefined `$/`
for the duration of the function, which causes `<>` to gobble up
the file whose name was just put into `@ARGV`. (Sidenote since we
mentioned Fatal previously: when using `<>`, Perl will supply its
own (though non-fatal) error messages.)

(Actually, though, that is unsafe: the list of variables to
localise should  read `@ARGV, $ARGV, ARGV, $/`. Alternatively, in
Perl 5.8 and newer, you can localise the entire ARGV glob:

    local ( *ARGV, $/ ) = [ $filename ];

But globs are an ugly wart in Perl 5, and I like to pretend they
don’t exist wherever possible; not to mention how much subtlety
is involved in how/why that line work on top of the idioms in the
rest of the snippet. (I’m not going to explain it here.) So I
prefer the longer version, which has the bonus that it will run
on old perls.)

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>

Considered as a programming language, Bicicleta has some drawbacks.

1. It tends to err silently.  If you use it in the way I expect people
   normally will, with every argument to every function having a
   sensible default, the consequence is that forgetting or misspelling
   the argument in the function call will produce a wrong result
   rather than an error message.  And, by design, there's no way to
   detect and complain about extra arguments --- for example, if you
   misspell the argument name.

   You can reduce the first part of this problem by overriding all the
   arguments with errors in the public version of a function, or
   deleting them entirely from the definition, but I expect this to be
   uncommon.

2. There's no way to override the behavior on derivation.  "No way to
   detect extra arguments" is a special case of this.

3. There's no multiple inheritance.

4. There's no static typing.

5. Class hierarchies are likely to be relatively deep.  This has been
   an obstacle to understanding in other languages in the past, such
   as Smalltalk, but I am hoping that Bicicleta's user interface
   improvements will compensate.

6. You can define new methods on objects inherited from existing
   classes, and you can evaluate code in a context where it will use
   your modified objects instead of the basic ones, but it might be to
   add new methods to standard library objects, just as in Java or
   Python.

I expect that these drawbacks will be less important than its advantages.

Sun, 25 Mar 2007

> I wonder if the Symbolics microcode was horizontal?  If they used
> vertical microcode while the Dorado was horizontal, maybe that might
> explain the difference between their perception of Moore's Law and
> Alan Kay's --- Kay consistently complains that modern CPU
> architectures run Smalltalk and other late-bound things slowly, which
> is sort of the opposite of the apocryphal Symbolics comment I cited
> earlier.

I don't know Kay's argument well enough to address it directly, but I  
believe a modern CPU architect would say the critical factor is to run  
fast stuff fast, so early-bound things go as quickly as possible, and  
late-bound things wind up going as slowly as necessary -- if the only way  
to make late-bound things appear to run in the same number of clocks as  
early-bound is to slow down the machine, that's a tradeoff they've thus  
far been unlikely to make.

(this implies there ought to be a hierarchy of binding: late-binding for  
code whose data lives on the net or on disk, early-binding for code whose  
data lives in L1 or in registers, and a suitable mix for everything in  
between)

But I don't think this is the opposite of the Symbolics comment:

> I have a vague memory that some of the Symbolics folks said that when
> they reimplemented their 36-bit CPU as an Alpha AXP program
> ("OpenGenera"), it fit in the L1 cache of the Alpha, and it ran Lisp
> programs about as fast as you'd expect microcode to run on a machine
> with that kind of clock speed.

The two closest references I was able to track down were:
http://www.unlambda.com/lispm/memo528.html
http://pt.withy.org/publications/VLM.html

My understanding is the first, AI Memo 528, was roughly the architecture  
for all the lisp machines with the possible exception of LMI's K-machine.   
(After reading Stallman's inspiration for GNU in the demise of the AI lab,  
I have to wonder to what extent the "AI winter" came about because lisp  
machines were sold largely to projects flush with Star Wars money?   
Afterwards, with no raygun, they appeared more exotic than practical -- I  
am personally glad to have picked the web over Lucid in the mid-90's.   
This hypothesis implies there should now be an "AI spring" with the modern  
DOD, which, at least in robotics and speech tagging (consider automated  
wiretaps), seems to be true)

The second was an abstract for a paper that was never written.  They  
appear to have coded in a style similar to microcode, making use of  
multiple dispatch, paying attention to cache conflicts, and requiring 64  
bit data paths.  Given that simulating hardware is an early-bound problem  
(and that they had a wider and faster machine than the one being  
emulated), it doesn't sound surprising they got decent performance from a  
modern architecture.

Squint at emulation just right, and it looks like programming in general.   
One has an abstract problem, for which one wants to compute an abstract  
solution:

     A ---> A'

which (if one is not EWD, who saw no need to suffer from the diseases  
which he studied :-) tends to involve using a concrete machine to  
calculate a concrete solution:

     C ---> C'

then coding the abstract into the concrete (A->C), and decoding the result  
(C->A)

     A ...> A'
     |      ^
     v      |
     C ---> C'

That is, we want to emulate the abstract computation with our concrete  
calculation.

I think I now understand why Dijkstra was interested in adjunctions: when  
F:A->C and G:C->A form an adjunction, then GF is a nice endofunctor, and  
GFGF => G(FG)F => GF reduces to another nice endofunctor, (and GFGFGF is  
indeed associative) so when we string a bunch of these together we get a  
monad.

> Maybe [debugging] accounts, in part, for the great interest in static
> correctness proofs in the functional programming community?  OCaml has
> a reportedly very nice debugger that works only with its bytecode, not
> with its native code, and they simply try very hard to ensure that
> they have the same behavior so you never have to debug the native
> code.  Having a static type system that permits type erasure without
> loss of safety helps with this.

Yes, exactly.  The way to debug easily is to behave exactly as above, and  
map between the concrete and abstract at each step (GF...GF).  An  
interpreter does this "for free", that is, ignoring execution time.  The  
way to run quickly is to erase the intermediate (normally identity)  
C->A->C steps, and run the problem strictly in the concrete domain  
(Gccc...ccccF).  Compilers do this "for free", but then if we want to look  
at "the middle of" the computation, we must then have a debugger to  
reconstruct the equivalent A' from any given C' (and possibly synthesize a  
new C'', should we alter A' to A'').

Horowitz and Hill point out that similar games are played in hardware with  
sequential PLDs -- if one has the gates, it's easier to make a strictly  
synchronous design which keeps its (abstract) outputs in the registers.   
If one doesn't have the gates, it's more efficient to make a design which  
keeps the bare minimum of (concrete) state in the registers.

-Dave

:: :: ::

> It's too bad gdb hasn't acquired a decent scripting language, all
> these years after the Plan9 paper (was it "Acid: a debugger built from
> a language"?)

Sounds like the right paper.  I wonder to what extent Forthers,  
Smalltalkers, and Lispers use the multilevel nature of their languages to  
script debugging?

> If I remember correctly, bass started out using ColorForth as a basis?
> I'm afraid it's been a number of years since I looked at it.  Is there
> a place to see it these days?

It was on a laptop that didn't survive the move, so now it's on a bare  
2.5" disk and a few backup CDs packed somewhere in the "cave".  (a  
basement area that supposedly contains supplies in case of nuke attack,  
but in the case of swiss tends to be used for wine, prosecco, and skis;  
and in the case of at least some americans is stuffed full of things  
which, had we had half a mind when moving, would never have been shipped  
across the atlantic in the first place.  What's the news on you two?  Is  
it BA?)

I should dig it out sometime and GPL it, but it never progressed much  
beyond what you've seen.  The register allocation wasn't for general use,  
but was to allow full bandwidth when doing sequences of loads and stores  
-- somewhat like how the concatenative people are experimenting with  
interspersing linear code with stack-shuffling.  I was happy fairly with  
the code, but never came up with an equivalently structured approach to  
data literals.  (or rather, never came up with an approach that didn't  
seem to eventually lead to a traditional language)  I was interested in  
YAML for this purpose before it grew; now it'd be tempting to just put in  
something JSON-like (which would require changing the code syntax to  
preserve duality) and call it done.

The fellow who provides my alpine elevation data uses a homebrew language  
(which he actually ported from moto to intel at some point) that wound up  
in a similar place, but as his interests are array processing and  
raytracing, he's never bothered implementing much in the way of data types  
either.

Someday, I'd like to look at the initialization stuff in Virgil to see if  
it'd solve one of the big problems: literals, like code, live in readonly  
("text") and are expressed abstractly; we'd like for the intermediate  
variable values which live in data to be expressed concretely.  I had a  
decorator kludge in bass for literals that was a half-assed approach,  
something more elegant would not only handle specializing data  
representations but would also be able to specialize code fragments so the  
same source might have several machine-code translations to run in  
different monads -- and at this point the problem is to avoid simply  
reinventing macros.

The effort to stay 1-1 (and to prioritize sequences over graphs) may just  
be too restrictive; the approach I would try now would involve an algebra  
whose top level values would still be hardware-oriented bytestrings, but  
whose intermediates could be more symbolic.  It still wouldn't be powerful  
enough to turn a declarative text into an FSM, but at least it'd be able  
to transpose a COND form into a decision table or do peephole tail call  
optimization.

> There's the apocryphal story of Leonard Zubkoff implementing a
> debugger in a weekend for a new platform, rather than waiting weeks
> for someone else to do the job; I wonder how much more or less his
> debugger contained than DEBUG.COM.

Probably much more -- RPG's story is that it was a "symbolic UNIX  
debugger": when one controls the compiler symbols are simple, and  
ptrace(2) breaks things up into a debugging process and a debugged  
process, which also simplifies things.  So I'd imagine it at least had  
symbols and breakpoints {go, stop, step into}, and probably had expression  
evaluation and possibly step over.  (step out requires parsing opcodes,  
but the RT was a RISC, which would have helped)

> What do you mean, "loops with tails"?  You mean that you don't tend to
> factor the code into a lot of small functions?

Calling a function involves a label, so lots of small functions would mean  
lots of labels, which is what we're trying to avoid.  The head of a loop  
requires a label, but any straight-line code we can tack onto the end  
doesn't.  So we tend to wind up with chunks that consist of reducing  
something to a canonical form (the loop), followed by addition of more  
data which will be reduced in another chunk (the tail).  I guess it's  
effectively an apply(Y, eval(X)) -- somewhat reminiscent of writing in sed  
and fairly reminiscent of the algebraists' habit of canonically splitting  
maps into projection followed by injection.

> I wonder why DEBUG couldn't rememember labels or breakpoints.
> FIG-Forth ran on the same hardware and remembered the names of every
> routine in the system.  Maybe DEBUG had to fit into a tiny static
> corner of a 64K memory segment, to avoid limiting its applicability to
> small programs?  But it isn't restricted to disassembling or debugging
> tiny-model programs.

Of course if I was going to write more than 512 bytes of code that way,  
and wanted an interactive environment, I'd probably wind up with something  
like a forth.  The history seems to be that DEBUG.COM, like EDIT.COM, was  
part of the "quick and dirty OS" that Microsoft bought, and had been  
written figuring that someone else would soon replace them.

> Have you been following Ian Piumarta's Pepsi/Coke/Jolt/Albert work?
> He's very much focused on questions like how to allow you to modify
> Array>>#at: at runtime without breaking everything.  (Of course, if
> your modification is broken, you will still have problems...)

No, thanks for the links!  I think the reason for the ST-80 spec in ST at  
the back of the blue book was not only because metacirculars are cool, but  
also because they did cross development/upgrades from time to time by  
building up a foreign image and a foreign VM inside the usual image, and  
then bootstrapping them to standalone.

> I also think that's the first 8-page conference paper I've ever seen
> that includes a complete and working (?) synthesizable CPU design (in
> noweb and Verilog) in the text of the paper.  (It is synthesizable,
> isn't it?  My Verilog is not up to snuff.  The actual Verilog seems to
> be about 240 lines.)
>
> It's too bad that he doesn't report on how much FPGA space it needed
> or how fast it ran.

True.  I think FPGA boards are pretty big and cheap these days (so it  
should be possible to "try it and see"), but haven't had the excuse to  
play with one.  Any recommendations from kragen-discuss?

Sat, 24 Mar 2007

The experience of running the REPL included here is a far cry from the
spreadsheet-like exploration/debugging UI I have in mind, but it at
least allows interactive testing of simple programs.  It should get a
lot better in the coming days.

Here's a sample interactive session:

    Beauty:~/devel/bicicleta kragen$ ./bicicleta_repl
    Bicicleta version 3, Copyright (C) 2007  Kragen Javier Sitaker
    Bicicleta comes with ABSOLUTELY NO WARRANTY; for details see the file
    "COPYING".
    This is free software, and you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    '()' = 2 + 3 * 4    # note: no operator precedence, rather deliberately
    "20"
    '()' = {x=1, y=2}.y  # define an object literal, call a method/read a property
    "2"
    '()' = {'yo mama' = "so fat", 'my mama' = "don't you talk bout my mama!"}.'yo mama'  # method names contain arbitrary characters if you quote them
    "so fat"
    '()' = -3
    parse error
    '()' = 3.negated  # no prefix operators in the language
    "-3"
    '()' = 3 < 4
    "true"
    '()' = 3 < 3
    "false"
    '()' = 3 <= 3
    "true"
    '()' = {self: x=1, y=self.x+2}.y   # you can define one property in terms of another
    "3"
    '()' = {self: x=1, y=self.x+2}{x=5}.y  # overriding the value of x in a derived object
    "7"
    '()' = {'()' = "hi"}.'()'  # method names with weird characters again, but
    "hi"
    '()' = {'()' = "hi"}()  # this one has syntactic sugar to call it
    "hi"
    '()' = {self: x=1, '()' = self.x+2}(x=5)  # including with overrides
    "7"
    '()' = {self: arg1=1, '()' = self.arg1+2}(5)  # positional args are arg1, arg2...
    "7"
    '()' = {fac: arg1 = 3, n=fac.arg1, '()' = (fac.n < 2).if_true(then = 1, else = fac.n * fac(fac.n - 1))}(5) # here's a recursive factorial.
    "120"
    '()' = {self: '[]' = {idx: 4, '()' = idx.arg1 * 2 + 1}}[45]  # '[]' is called by the foo[] syntax
    "91"
    '()' = {this: x=43, y=44, show={userdata=this}}  # try to display an object instead of a string or a number, and ...
    #16=#17=#14# {
    this: x = 43 (in
    (omitted 330 lines where it dumps the entire standard library)
    '()' = 3.5 / 3.1  # there is floating-point math; mixed-mode is still broken
    "1.12903225806"

There is a tar.gz file of all this source code, plus a little more, at
http://pobox.com/~kragen/sw/bicicleta-3.tar.gz for easier downloading
--- about 33 kilobytes.  There are already an earlier version at
http://pobox.com/~kragen/sw/bicicleta-1.tar.gz (and 0).  I'm sorry
it's such a huge email.

Unlike everything posted to kragen-hacks without a notice to the
contrary, this code is not in the public domain; I retain copyright,
but you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any
later version.

First, here's the build script that compiles the REPL and other
things.  I wrote it in sh because I'm on a Mac without make, and I
don't know where to download make.


#!/bin/sh
# Build script for prototype Bicicleta interpreter
set -ve
: ${OCAMLC=ocamlc} ${EXTRAS=}
$OCAMLC -c bicicleta_syntax.ml
ocamlyacc bicicleta_parser.mly
$OCAMLC -c bicicleta_parser.mli
$OCAMLC -c bicicleta_parser.ml
ocamllex bicicleta_lexer.mll
$OCAMLC -c bicicleta_lexer.ml
ocaml $EXTRAS bicicleta_syntax.cmo bicicleta_parser.cmo bicicleta_lexer.cmo \
    bicicleta.ml  # for regression tests
$OCAMLC -c bicicleta.ml
ocaml $EXTRAS bicicleta_syntax.cmo bicicleta_parser.cmo bicicleta_lexer.cmo \
    bicicleta.cmo bicicleta_lib.ml  # more regression tests
$OCAMLC -c bicicleta_lib.ml
$OCAMLC -c bicicleta_repl.ml
$OCAMLC bicicleta_syntax.cmo bicicleta_lexer.cmo bicicleta_parser.cmo \
    bicicleta.cmo bicicleta_lib.cmo bicicleta_repl.cmo -o bicicleta_repl
ocaml $EXTRAS bicicleta_syntax.cmo bicicleta_lexer.cmo bicicleta_parser.cmo \
    bicicleta.cmo bicicleta_lib.cmo bicicleta_dump.ml  # to see show_bicexpr go
$OCAMLC -c bicicleta_run_script.ml
$OCAMLC bicicleta_syntax.cmo bicicleta_lexer.cmo bicicleta_parser.cmo \
    bicicleta.cmo bicicleta_lib.cmo bicicleta_run_script.cmo \
    -o bicicleta_run_script


Here's bicicleta_syntax.ml, which defines the structure of Bicicleta
expressions and values:


(* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA *)
type methods = NoDefs
               | Definition of string * bicexpr * methods (* name, body, ... *)
and bicexpr = Name of string
              | Call of bicexpr * string
              | Literal of string option * methods
              | Derivation of bicexpr * string option * methods
              | StringConstant of string
              | Integer of int
              | Float of float
              | NativeMethod of (lookup -> bicobj)
and userdata = UserString of string
              | UserInteger of int 
              | UserFloat of float
and bicobj = ProtoObject 
    (* Five positional parameters is bad style; six is worse.
       Derive of name, selfname, body, env, next, cache *)
             | Derive of string * string option * bicexpr * 
                 lookup * bicobj * (string, bicobj) Hashtbl.t option ref
             | UserData of userdata
             | Error of string * string
and lookup = Phi | Add of string * bicobj * lookup ;;
(* other potential definition of lookup:
  type lookup ;;
  type add = { name: string; value: string; next: lookup; } ;;
  type lookup = Phi | Add of lookup ;; *)


It is used in the usual way by bicicleta_parser.mly, an ocamlyacc
file, to produce Bicicleta expressions from token streams:

/* -*- mode: tuareg; compile-command: "./build" -*- */
/* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA */
%{
open Bicicleta_syntax;;

(* Reorder the definitions so the earliest ones are outermost; there's
no semantic reason for this, but changing it would require changing
the existing parse regression tests, and also show_bicexpr.  It's also
a convenient place to add names for positional arguments. *)
let mk_defs deflist = 
  let rec fd_defs argcount = function
      [] -> NoDefs
    | (Some name, expr) :: defs -> 
        Definition(name, expr, fd_defs argcount defs)
    | (None, expr) :: defs ->
        Definition("arg" ^ string_of_int argcount, expr,
                  fd_defs (argcount + 1) defs)
  in fd_defs 1 (List.rev deflist) ;;
%}
%token <string> NameTok
%token <string> StringTok
%token <string> OpTok
%token <string> IntTok
%token <string> FloatTok
%token Lparen Rparen Lbrace Rbrace Colon Period Newline Comma Equals Eof
%token Lsquare Rsquare
/* As written, this grammar has five ambiguities: is 1 + 2 + 3 supposed
   to be (1 + 2) + 3 or 1 + (2 + 3), and is 1 + x(3) supposed to be 
   1 + (x(3)) or (1 + x)(3) (with equivalent cases for 1+x[3], 1+x{3},
   and 1+x.y)?  The first one is resolved with %left OpTok below, and
   the others are resolved with the %nonassoc declaration, which gives
   those tokens higher precedence than OpTok by virtue of coming later
   in the source file. ocamlyacc -v is helpful for diagnosing this! */
%left OpTok
%nonassoc Period Lparen Lbrace Lsquare
%start main
%type <Bicicleta_syntax.bicexpr> main
%%

/* 22 productions --- a fairly small grammar.  If you try to write a
   grammar for Common Lisp, you end up with 16 productions to handle
   lists, dotted pairs, symbols, strings, integers, floats, ratios,
   chars, quote, quasiquote, unquote, unquote-splicing, and vectors,
   but that leaves out #=, ##, #*, #:, #|...|#, #+, #-, #., #a, #c,
   #b, #o, #x, #r, $p, #s, ",.", and user-defined read macros.  Also,
   some macros are pretty hairy; the CLHS entry for LOOP has a BNF
   grammar with 34 nonterminals. */

main:
    expr Eof { $1 }
expr:
    expr Period NameTok   { Call($1, $3) }
  | NameTok               { Name $1 }
  | StringTok             { StringConstant $1 }
  | literal               { Literal(fst $1, snd $1) }
  | expr literal          { Derivation($1, fst $2, snd $2) }
  | expr Lparen definitions Rparen { Call(Derivation($1, None, mk_defs $3),
                                         "()") }
  | expr Lsquare definitions Rsquare { Call(Derivation(Call($1, "[]"), None,
                                                          mk_defs $3), 
                                           "()") }
  | Lparen expr Rparen    { $2 }
  | IntTok                { Integer (int_of_string $1) }
  | FloatTok              { Float (float_of_string $1) }
  | expr OpTok expr       { Call(Derivation(Call($1, $2), None,
                                               mk_defs [None, $3]), "()") }
literal:
    Lbrace NameTok Colon definitions Rbrace { ((Some $2), mk_defs $4) }
  | Lbrace definitions Rbrace { (None, mk_defs $2) }
definitions:
    NameTok Equals expr   { [Some $1, $3] }
  | definitions def_separator NameTok Equals expr { (Some $3, $5) :: $1 }
  | expr { [None, $1] }
  | definitions def_separator expr { (None, $3) :: $1 }
  | /* empty */ { [] }
def_separator:
    Comma { () } | Comma Newline { () } | Newline { () }


The token streams, for their part, are produced by the lexer,
bicicleta_lexer.mll, which is written in ocamllex:

{ (* -*- mode: tuareg; compile-command: "./build" -*- *)
(* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA *)
    open Bicicleta_parser ;;
    let show_bictoken = function
        | NameTok name -> "NameTok " ^ name
        | StringTok string -> "StringTok " ^ string
        | OpTok op -> "OpTok " ^ op
        | IntTok num -> "IntTok " ^ num
        | FloatTok num -> "FloatTok " ^ num
        | Newline -> "Newline"
        | Lparen -> "(" | Rparen -> ")" | Lbrace -> "{" | Rbrace -> "}"
        | Colon -> ":"  | Period -> "." | Comma -> ","  | Equals -> "="
        | Lsquare -> "[" | Rsquare -> "]"
        | Eof -> "Eof" ;;
    (* this has to be in the standard library somewhere, right? *)
    (* the previous version of this code was arithmetic-heavy and
       bug-prone, but probably a lot faster.  Shorter too if you count
       list_of_string and string_of_list. *)
    let list_of_string string = 
      let rv = ref [] in 
        for i = String.length string - 1 downto 0 do 
          rv := string.[i] :: !rv 
        done ; 
        !rv ;;
    let rec string_of_list = function
        [] -> "" 
      | x :: y -> String.make 1 x ^ string_of_list y ;;
    exception ScrewedUpUnquotableList ;;
    let unquote_list = function x :: tail ->
      let rec unquote_tail accum = function
          [_] -> List.rev accum
        | '\\' :: x :: tail | x :: tail -> unquote_tail (x :: accum) tail
        | [] -> raise ScrewedUpUnquotableList
      in unquote_tail [] tail
      | [] -> raise ScrewedUpUnquotableList ;;
    let unquote s = string_of_list (unquote_list (list_of_string s)) ;;
}

let alus = ['A'-'Z' 'a'-'z' '_']
let alnumus = alus | ['0'-'9']
let ident = alus alnumus*

let quoted_name = "'" ([^ '\'' '\\'] | "\\'" | "\\\\" )* "'"

let string = '"' ([^ '"' '\\'] | "\\\"" | "\\\\")* '"'

(* see my rationals post: any sequence of these chars, except a single '=' *)
let non_eq_op_char = ['~' '`' '!' '@' '$' '%' '^' '&' '*' 
                      '-' '+' '<' '>' '?' '/' '|' '\\'] 
let op_char = non_eq_op_char | '='
let operator = non_eq_op_char | op_char op_char+

let integer = ['0'-'9']+
let float = ['0'-'9']* '.' ['0'-'9']+
(* let special = ['(' ')' '{' '}' ':' '.' '\n' ',' '='] *)
let lwsp = [' ' '\t']

let comment = '#' [^ '\n']* '\n'?
rule next =
   parse 
     lwsp { next lexbuf } (* skip whitespace; trick stolen from manual *)
   | ident as name { NameTok name }
   | quoted_name as name { NameTok (unquote name) }
   | string as str { StringTok (unquote str) }
   | operator as op { OpTok op }
   | integer as num { IntTok num }
   | float as num { FloatTok num }
   | '\n' { Newline } | comment { Newline }
   | '(' { Lparen } | ')' { Rparen } | '{' { Lbrace } | '}' { Rbrace } 
   | ':' { Colon }  | '.' { Period } | ',' { Comma }  | '=' { Equals }
   | '[' { Lsquare } | ']' { Rsquare }
   | eof { Eof }


The actual evaluation (and unit testing) is taken care of by this
file, bicicleta.ml:


(* -*- mode: tuareg; compile-command: "./build" -*- *)
(* translation of metacircular_bicicleta_interpreter into OCaml *)
(* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA *)
(* Because the types and the functions are all mutually recursive, I'm
   declaring them all (in bicicleta_syntax.ml), then putting pretty
   much all the code in a single let rec. *)
(* for interactive use:
#load "bicicleta_lexer.cmo" ;;
#load "bicicleta_parser.cmo" ;;
#load "bicicleta.cmo" ;;
open Bicicleta ;;
open Bicicleta_syntax ;;
*)
open Bicicleta_syntax ;;
let rec get gkey = function
    | Phi -> (Error("self-name not found", gkey))
    | Add(key, value, next) -> if gkey = key then value else get gkey next ;;

exception CantCreateCache ;;  (* can't happen *)
let sys_expr = Call(Name "prog", "sys") ;;
let (call_count: (string, int) Hashtbl.t) = Hashtbl.create 100 ;;
let get_call_count name = try Hashtbl.find call_count name 
  with Not_found -> 0 ;;
let rec eval env = function
    | Name name -> get name env
    | Call(object_, method_name) -> 
        Hashtbl.replace call_count method_name
          (get_call_count method_name + 1) ;
        let object_ = eval env object_
        in apply (objectget method_name object_) object_
    | Literal(self, methods) -> bind ProtoObject env self methods
    | Derivation(object_, self, methods) -> let object_ = eval env object_
        in bind object_ env self methods
    | Integer num ->
        wrap_userdata env (UserInteger num) "native_integer"
    | Float num ->
        wrap_userdata env (UserFloat num) "native_float"
    | StringConstant string ->
        wrap_userdata env (UserString string) "native_string"
    | NativeMethod f -> f env
  and wrap_userdata env datum typename = 
    let userdata = UserData datum
    in eval env (Derivation(Call(sys_expr, typename), None,
        Definition("userdata", NativeMethod (fun _ -> userdata), NoDefs)))
  and bind base env self = function
    | NoDefs -> base
    | Definition(name, body, next) -> 
        bind (derive name self body env base) env self next
  and objectget key = function
    | ProtoObject -> (Error("method not found", key))
    | Derive(name, _, _, _, next, _) as m -> 
        if key = name then m else objectget key next
    | Error(_, _) as obj -> obj
    | UserData _ -> (Error("userdata has no methods", key))
  and derive name self body env o = 
    (Derive(name, self, body, env, o, ref None))
  and apply method_ self = match method_ with
    | Derive(name, methodself, body, env, _, _) -> 
        callmethod name (methodself, body, env) self
    | Error(_, _) as err -> err (* OK for now... *)
    | _ -> (Error("trying to apply non-method", ""))
  and callmethod name (selfname, body, env) self =
  match self with
      Derive(_, _, _, _, _, maybecache) -> 
        (match !maybecache with None -> maybecache := Some (Hashtbl.create 7)
          | Some _ -> ());
        (match !maybecache with None -> raise CantCreateCache
          | Some cache ->
              try Hashtbl.find cache name
                with Not_found -> let result = 
                  callmethodfull (selfname, body, env) self
                  in Hashtbl.replace cache name result;
                  result)
    | _ -> callmethodfull (selfname, body, env) self
  and callmethodfull (selfname, body, env) self =
    (match selfname with 
        Some name -> eval (Add(name, self, env)) body 
      | None -> eval env body)
;;

(* for debugging: deparse an expression. Could be considered a
   specification of a subset of the grammar, but does not (yet)
   exercise the following:
   - commas
   - the x{...}.'()' as x(...) syntactic sugar
   - the x.'*'(y) as x * y syntactic sugar
   - the x{arg1=a, arg2=b, arg3=c} as x{a, b, c} syntactic sugar
*)

let is_identifier string =
    let is_id_start_char ch = let code = Char.code ch in
        (code >= Char.code 'A' && code <= Char.code 'Z') ||
        (code >= Char.code 'a' && code <= Char.code 'z') || ch = '_'
    in let is_id_char ch = is_id_start_char ch || let code = Char.code ch in
        (code >= Char.code '0' && code <= Char.code '9')
    in
    let rec test string ii =
        if ii = String.length string then true
        else is_id_char string.[ii] && test string (ii + 1)
    in String.length string > 0 && is_id_start_char string.[0] &&
        test string 1 ;;
let escstr quote string =
  let rec esclist quote = function
      [] -> []
    | '\\' :: cs -> '\\' :: '\\' :: esclist quote cs
    | c :: cs when c = quote -> '\\' :: c :: esclist quote cs
    | c :: cs -> c :: esclist quote cs
  in Bicicleta_lexer.string_of_list 
    (esclist quote (Bicicleta_lexer.list_of_string string));;
let escname name = 
  if is_identifier name then name
  else "'" ^ escstr '\'' name ^ "'" ;;
let rec show_bicexpr_i indent = function
  | Name name -> escname name
  | StringConstant string -> "\"" ^ escstr '"' string ^ "\""
  | Derivation(object_, self, methods) -> 
      show_bicexpr_i indent object_ 
      ^ show_bicexpr_i indent (Literal(self, methods))
  | Literal(Some self, methods) ->
      "{" ^ escname self ^ ": " ^ outer_show_methods indent methods ^ "}"
  | Literal(None, methods) ->
      "{" ^ outer_show_methods indent methods ^ "}"
  | Call(object_, method_name) -> 
      show_bicexpr_i indent object_ ^ "." ^ escname method_name 
  | Integer n -> string_of_int n
  | Float n -> string_of_float n
  | NativeMethod _ -> "(;native method;)"
and outer_show_methods indent = function
  | Definition(_, _, Definition(_, _, _)) as m -> 
      let newindent = indent ^ "  " in
        "\n" ^ newindent ^ show_methods newindent m ^ "\n" ^ indent
  | m -> show_methods indent m
and show_methods indent = function
  | NoDefs -> ""
  | Definition(name, body, NoDefs) -> show_method indent name body
  | Definition(name, body, next) ->
      show_method indent name body ^ "\n" ^ indent ^ show_methods indent next
and show_method indent name body = 
  escname name ^ " = " ^ (show_bicexpr_i indent body) ;;
let show_bicexpr = show_bicexpr_i "" ;;

(* show_bicobj: mostly for error reporting, also used in the REPL.
   Doesn't work well, but that's not surprising if you look at the
   code.  The idea is that the "labels" list is supposed to keep us
   from printing out the same thing more than once. *)
exception ShowBicobjIsBroken;;
let rec _show_bicenv level labels = function 
    Phi -> ""
  | Add(name, obj, rest) ->
      String.make level ' ' ^ name ^ ": " 
      ^ __show_bicobj (level+2) labels obj ^ ";\n" 
      ^ _show_bicenv level labels rest
and show_selfname = function None -> "" | Some selfname -> selfname ^ ": "
and _show_bicobj level labels = function
    UserData (UserInteger x) -> string_of_int x
  | UserData (UserString x) -> "\"" ^ x ^ "\""
  | UserData (UserFloat x) -> string_of_float x
  | Error (a, b) -> "Error(\"" ^ a ^ "\", \"" ^ b ^ "\")"
  | Derive (name, selfname, body, env, next, _) ->
      __show_bicobj level labels next ^ " {\n" 
      ^ String.make level ' ' ^ show_selfname selfname ^ name 
      ^ " = " ^ show_bicexpr body ^ " (in\n" 
      ^ _show_bicenv level labels env ^ ")}" 
  | ProtoObject -> "{}" 
and findq obj = function
  | [] -> None
  | x :: y -> if obj == x then Some (List.length y) else findq obj y
and labelfor labels obj = 
  match findq obj !labels with 
      None -> raise ShowBicobjIsBroken 
    | Some label -> "#" ^ string_of_int label ^ "="
and __show_bicobj level labels obj = 
  match findq obj !labels with
      None -> 
        labels := obj :: !labels; 
        labelfor labels obj ^ _show_bicobj level labels obj
    | Some label -> "#" ^ string_of_int label ^ "#"
;;
let show_bicobj x = _show_bicobj 0 (ref []) x ;;

(* for unit tests and interactive testing *)
let tokenize_ next string = let buf = Lexing.from_string string 
  in let rec get_toks toks = 
    let tok = next buf in
      match tok with 
          Bicicleta_parser.Eof -> List.rev toks
        | _ -> get_toks (tok :: toks)
  in get_toks [] ;;
let tokenize = tokenize_ Bicicleta_lexer.next ;;  (* for unit tests *)

(* This ugly, hairy piece of rot is supposed to simplify the parser by
   keeping the poor widdle parser from having to cope with extra
   newlines in random places.  See, we want to use newlines to
   separate method definitions, but we want to ignore them the rest of
   the time.  So the obvious thing to do is to remove newlines that
   aren't strictly needed to support method definitions in a
   post-processing stage for the lexer.  So in between the lexer (32
   lines of ocamllex at present) and the parser (50 lines of
   ocamlyacc) we interpose this 42-line monstrosity.  It seemed like a
   good idea when Meredith Patterson suggested it, but she didn't know
   I was using an ocamllex-generated parser and would therefore have
   to do this in an intermediate step... *)

exception EmptyWindowCantHappen;;
let drop_unnecessary_newlines lexer_next =
  let (window : Bicicleta_parser.token list ref) = ref [] 
  and last_token = ref None 
  in let next lexbuf =
    let rec really_read_token tok = 
      window := !window @ [tok] ; last_token := Some tok; main_switch ()
    and read_token () = 
      let tok = lexer_next lexbuf
      in match !last_token with 
          None -> really_read_token tok
        | Some Bicicleta_parser.Newline -> (match tok with
              Bicicleta_parser.Newline -> read_token ()
            | _ -> really_read_token tok)
        | Some _ -> really_read_token tok
    and output_token () = match !window with
        [] -> raise EmptyWindowCantHappen
      | x :: y -> window := y; x
    and discard_newline () = match !window with 
        [] -> raise EmptyWindowCantHappen
      | x :: y -> window := y ; main_switch ()
    and main_switch () = match !window with
        [] -> read_token ()
      | [Bicicleta_parser.Newline] -> read_token ()
      | [_] -> output_token ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _] ->
          read_token ()
      | [Bicicleta_parser.Newline; _] -> discard_newline ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _;
         Bicicleta_parser.Equals] -> output_token ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _;
         Bicicleta_parser.Newline] -> read_token ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _;
         _] -> discard_newline ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _;
         Bicicleta_parser.Newline; Bicicleta_parser.Equals] ->
          output_token ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _;
         Bicicleta_parser.Newline; _] -> discard_newline ()
      | _ :: _ -> output_token ()
    in main_switch () 
  in next;;

let parse string = Bicicleta_parser.main 
  (drop_unnecessary_newlines Bicicleta_lexer.next)
  (Lexing.from_string string) ;;
let unquote = Bicicleta_lexer.unquote ;;

(* free variable computation, for error detection. *)
module StringSet = Set.Make(String) ;;
let rec stringset = function
    [] -> StringSet.empty
  | s :: rest -> StringSet.add s (stringset rest) ;;

let rec freevars = function
    Name n -> stringset [n]
  | Integer _ | StringConstant _ | Float _ -> stringset ["prog"]
  | NativeMethod _ -> stringset []
  | Literal (Some selfname, methods) -> 
        StringSet.diff (freevars_methods methods) (stringset [selfname])
  | Literal (None, methods) -> freevars_methods methods
  | Derivation(object_, self, methods) ->
        StringSet.union (freevars object_) (freevars (Literal(self, methods)))
  | Call(object_, _) -> freevars object_
 and freevars_methods = function
    NoDefs -> stringset []
  | Definition (name, body, rest) ->
        StringSet.union (freevars body) (freevars_methods rest) ;;
let freevars_list expr = StringSet.elements (freevars expr) ;;

(* What method names are implemented in some piece of source? *)
let rec implemented = function
    Name _ | NativeMethod _ -> stringset []
  | Integer _ | StringConstant _ | Float _ -> stringset ["userdata"]
  | Derivation(object_, _, methods) ->
      StringSet.union (implemented object_) (implemented_methods methods)
  | Literal(_, methods) -> implemented_methods methods
  | Call(object_, _) -> implemented object_
and implemented_methods = function
    NoDefs -> stringset []
  | Definition(name, body, rest) -> 
      StringSet.union (stringset [name])
        (StringSet.union (implemented body) (implemented_methods rest))
;;
let implemented_list expr = StringSet.elements (implemented expr) ;;

(* What method names are called in some piece of source? *)
let rec called = function
    Name _ | NativeMethod _ -> stringset []
  | Integer _ -> stringset ["sys"; "native_integer"]
  | StringConstant _ -> stringset ["sys"; "native_string"]
  | Float _ -> stringset ["sys"; "native_float"]
  | Derivation(object_, _, methods) ->
      StringSet.union (called object_) (called_methods methods)
  | Literal(_, methods) -> called_methods methods
  | Call(object_, method_name) ->
      StringSet.union (called object_) (stringset [method_name])
and called_methods = function
    NoDefs -> stringset []
  | Definition(_, body, rest) ->
      StringSet.union (called body) (called_methods rest)
;;
let called_list expr = StringSet.elements (called expr) ;;

(* Looking at the output from this routine has caught a number of bugs. *)
let lint expr =
  let callees = called expr and implementees = implemented expr
  in ["only called", 
         StringSet.elements (StringSet.diff callees implementees);
     "only implemented", 
         StringSet.elements (StringSet.diff implementees callees);
     "free variables", freevars_list expr] ;;

let unit_tests () =
    assert (is_identifier "foo") ;
    assert (is_identifier "x") ;
    assert (is_identifier "x3") ;
    assert (not (is_identifier "3")) ;
    assert (is_identifier "x_y") ;
    assert (is_identifier "_x") ;
    assert (not (is_identifier "()")) ;
    assert (not (is_identifier "(3")) ;
    assert (not (is_identifier "")) ;

    assert ((escname "hips") = "hips") ;
    assert ((escname "()") = "'()'") ;
    assert ((escname "don't") = "'don\\'t'") ;
    assert ((escname "\\") = "'\\\\'") ;

    (* environment lookup *)
    let ustr x = UserData (UserString x)
    in 
    assert (get "foo" (Add("foo", ustr "bar", Phi)) = ustr "bar") ;
    assert ((get "foo" Phi) = Error("self-name not found", "foo")) ;
    assert ((get "foo" (Add("bar", ustr "baz", Phi))) = 
        Error("self-name not found", "foo")) ;

    (* evaluation tests without parsing *)
    let string_value_is expr str = 
      (eval Phi (Call(expr, "userdata"))) = ustr str
    in
    assert (string_value_is (StringConstant "foo") "foo") ;
    assert (string_value_is (Call(Literal(Some "self", 
               Definition("foo", StringConstant("quux"), NoDefs)), "foo"))
           "quux") ;
    assert ((eval Phi (Call(Literal(Some "self", 
           Definition("foo", StringConstant("quux"), NoDefs)), "baz"))) 
           = Error("method not found", "baz")) ;
    (* note that this expression is very similar to the definition of
       prog.sys.bool in bicicleta_lib.ml.  That's because it's an earlier
       version. *)
    let booleans = Literal(Some "booleans", 
        Definition("true", Literal(Some "boolean", 
            Definition("if_true", Literal(Some "self", 
                Definition("()", Call(Name "self", "then"),
                Definition("then", StringConstant("no consequent given"),
                Definition("else", StringConstant("no alternate given"),
                NoDefs)))),
            Definition("negated", Call(Name "booleans", "false"),
            Definition("if_false", Call(Call(Name "boolean", "negated"), 
                                                              "if_true"),
            NoDefs)))),
        Definition("false", Derivation(Call(Name "booleans", "true"), 
            Some "boolean",
            Definition("if_true", Derivation(Call(Call(Name "booleans", "true"),
                "if_true"), Some "self",
                Definition("()", Call(Name "self", "else"), 
                NoDefs)),
            Definition("negated", Call(Name "booleans", "true"),
            NoDefs))),
        NoDefs)))
    in

    (* print_endline (show_bicexpr booleans) ; *)
    assert (string_value_is(Call(Call(Call(booleans, "true"), 
                                     "if_true"), "()"))
                           "no consequent given") ;
    assert (string_value_is(Call(Derivation(Call(Call(booleans, "true"), 
                                                "if_true"), None,
             Definition("then", StringConstant("is true"),
             Definition("else", StringConstant("is false"),
             NoDefs))),
           "()")) 
           "is true") ;
    assert (string_value_is (Call(Derivation(Call(Call(booleans, "false"),
                                                "if_true"), None,
             Definition("then", StringConstant("is true"),
             Definition("else", StringConstant("is false"),
             NoDefs))),
           "()")) 
           "is false") ;

    (* unquoting *)
    assert((unquote "'foo'") = "foo");
    assert((unquote "\"foo\"") = "foo");
    assert((unquote "'hasn\\'t'") = "hasn't");
    assert((unquote "\"\\\"no\\\" \"") = "\"no\" ");
    assert((unquote "\"\\\"no\\\"\"") = "\"no\"");

    (* tokenizing *)
    (* wish I could say 
       'from Bicicleta_parser import Lbrace, Rbrace, NameTok, Colon, Equals',
       but I really don't want to open Bicicleta_parser;; here. *)

    let lbrace = Bicicleta_parser.Lbrace and rbrace = Bicicleta_parser.Rbrace
        and nametok n = Bicicleta_parser.NameTok n 
        and colon = Bicicleta_parser.Colon and equals = Bicicleta_parser.Equals
        and optok o = Bicicleta_parser.OpTok o 
        and period = Bicicleta_parser.Period 
        and inttok n = Bicicleta_parser.IntTok n 
        and lparen = Bicicleta_parser.Lparen
        and rparen = Bicicleta_parser.Rparen and comma = Bicicleta_parser.Comma
        and newline = Bicicleta_parser.Newline
        and stringtok x = Bicicleta_parser.StringTok x
    in
    assert((tokenize "a") = [nametok "a"]);
    assert((tokenize "a.b") = [nametok "a"; period; nametok "b"]);
    assert((tokenize "aa+bb") = [nametok "aa"; optok "+"; nametok "bb"]);
    assert((tokenize "{xx: }") = [lbrace; nametok "xx"; colon; rbrace]);
    assert((tokenize "'()' * (2**32)") = [nametok "()"; optok "*"; lparen;
                                          inttok "2"; optok "**"; inttok "32";
                                          rparen]);
    assert((tokenize "\"foo\".length") = [stringtok "foo"; period; 
                                          nametok "length"]);
    assert((tokenize "(){}:.,") = [lparen; rparen; lbrace; rbrace; colon; 
                                   period; comma]);
    assert((tokenize "~`!@$%^&*-+=<>?/|\\") = [optok "~`!@$%^&*-+=<>?/|\\"]);
    assert((tokenize "{x:y=x}") = [lbrace; nametok "x"; colon; nametok "y";
                                   equals; nametok "x"; rbrace]);
    assert((tokenize "foo { foo : \n x = foo. \tbletch\ny=foo.x\n}\n")
           = [nametok "foo"; lbrace; nametok "foo"; colon; newline; 
              nametok "x"; equals; nametok "foo"; period; nametok "bletch";
              newline;
              nametok "y"; equals; nametok "foo"; period; nametok "x";
              newline; rbrace; newline]);
    assert((tokenize "x[3]") = [nametok "x"; Bicicleta_parser.Lsquare; 
                                inttok "3"; Bicicleta_parser.Rsquare]);

    (* parsing *)
    assert((parse "x") = Name "x");
    assert((parse "'()'") = Name "()");
    assert((parse "foo.bar.baz") = Call(Call(Name "foo", "bar"), "baz"));
    assert((parse "\"foo\".length") = Call(StringConstant "foo", "length"));
    assert((parse "{x:y=x}") 
          = Literal(Some "x", Definition("y", Name "x", NoDefs)));
    assert((parse "{x:}") = Literal(Some "x", NoDefs));
    assert((parse "{xx:y=xx,z=xx}") 
          = Literal(Some "xx", Definition("y", Name "xx",
                  Definition("z", Name "xx", NoDefs))));
    assert((parse "{x: x = \"x\", y = \"y\", z = \"z\"}") = Literal(Some "x",
                  Definition("x", StringConstant("x"),
                  Definition("y", StringConstant("y"),
                  Definition("z", StringConstant("z"), NoDefs)))));
    assert((parse "point {p: x = \"x\", y = \"y\"}") = Derivation(Name "point",
        Some "p", Definition("x", StringConstant("x"),
                  Definition("y", StringConstant("y"), NoDefs))));
    assert((parse "(a.x)") = Call(Name "a", "x"));             
    assert((parse "(((a).x))") = Call(Name "a", "x"));
    assert((parse "{x: y=x,\n  z=x}") = Literal(Some "x", 
        Definition("y", Name "x",
        Definition("z", Name "x", NoDefs))));
    assert((parse "{x: y=x,\n\n  z=x}") = Literal(Some "x", 
        Definition("y", Name "x",
        Definition("z", Name "x", NoDefs))));
    assert((parse "x\n") = Name "x");
    (* support end-line comments at EOF *)
    assert((parse "x # a variable!") = (parse "x"));
    (* separate methods with newlines: *)
    assert((parse "{x:y=x\nz=x}") = Literal(Some "x",        
        Definition("y", Name "x",
        Definition("z", Name "x", NoDefs))));
    (* allow newlines around period *)
    assert((parse "x\n.\ny \n . \n z") = Call(Call(Name "x", "y"), "z"));
    (* allow newlines in parens *)
    assert((parse "(\nx\n)") = Name "x");
    (* allow leading newlines *)
    assert((parse "\nx") = Name "x");
    (* allow newlines between prototype and overrides *)
    assert((parse "x{y: z = w}") = (parse "x\n{y: z = w}"));
    (* allow newlines around self-name *)
    assert((parse "x {y: z = w}") = (parse "x {\ny\n:z=w}"));
    (* allow newlines around equals sign *)
    assert((parse "x{y:z=w}") = (parse "x{y:z\n=\nw}"));
    assert((parse "x{y:z=w,v=u}") = (parse "x{y:z\n=\nw\nv\n=\nu}"));
    (* treat end-line comments as newlines *)
    assert((parse "{a=b # b!\n c=d}") = parse("{a=b, c=d}"));

    (* various kinds of syntactic sugar: *)
    (* omitting self-names *)
    assert((parse "x { z = w }") 
           = Derivation(Name "x", None, Definition("z", Name "w", NoDefs)));
    assert((parse "{x = 1, y = 2}") = Literal(None,
        Definition("x", Integer 1, Definition("y", Integer 2, NoDefs))));
    (* parenthesized arguments *)
    assert((parse "x(verbose = 1)") = (parse "x { verbose = 1 }.'()'"));
    assert((parse "{env: fac = {fac: x = 3, 
            '()' = fac.x.'<'(arg1=2).if_true(then=1, 
                else=fac.x.'*'(arg1=env.fac(x=fac.x.'-'(arg1=1))))}}.fac(x = 4)
        ") = (parse "{env: fac = {fac: x = 3,
            '()' = fac.x.'<'{arg1=2}.'()'.if_true{then=1, 
                 else=fac.x.'*'{arg1=env.fac{
                     x=fac.x.'-'{arg1=1}.'()'}.'()'}.'()'}.'()'}
            }.fac{x = 4}.'()'")) ;

    (* positional arguments *)
    assert((parse "x(3)") = (parse "x(arg1 = 3)"));
    assert((parse "x{3}") = (parse "x{arg1 = 3}"));
    assert((parse "x{y: 3}") = (parse "x{y: arg1 = 3}"));
    assert((parse "x(37, quiet=true)") = (parse "x(arg1 = 37, quiet = true)"));
    assert((parse "x(37, 38)") = (parse "x(arg1 = 37, arg2 = 38)"));

    (* infix *)
    assert((parse "3 + 4") = (parse "3.'+'(4)"));
    assert((parse "1 + 2 + 3") = (parse "1.'+'(2).'+'(3)"));

    (* indexing notation *)
    assert((parse "x[3]") = (parse "x.'[]'(3)"));

    (* combination of infix with other things *)
    assert((parse "1 + x(3)") = (parse "1 + (x(3))"));
    assert((parse "1 + x[3]") = (parse "1 + (x[3])"));
    assert((parse "1 + x{3}") = (parse "1 + (x{3})"));

    (* just a check to see if our parser doesn't crash. *)
    ignore(parse "prog.sys.normal_commutative_number {self:
        clientdata = 5.clientdata
        show = self.clientdata.show
        as_integer = self
        as_float = self.clientdata.as_float
        coerce = {op: arg1 = 2, '()' = op.arg1.as_integer}
        add = {op: arg1 = 3, '()' = self.clientdata.add(op.arg1.clientdata)}
        negated = self.clientdata.negated
        multiply = self.add {op:
            '()' = self.clientdata.multiply(op.arg1.clientdata)
        }
        modulo = {op: arg1 = 2
            '()' = self.clientdata.divmod(op.arg1.clientdata).mod
        }
        # As Jamie McCarthy points out, rationals made out of machine
        # integers are really pretty limited... but for now I'm using them
        # anyway.
        divide = {op: arg1 = 2, '()' = prog.sys.rational.new(self, op.arg1)}
        reciprocal = 1 / self
        gcd = self.binop {op: arg1 = 5
            '()' = (arg1 == 0).if_true(then = self, 
                else = op.arg1.gcd(self % op.arg1))
        }
        less_than = self.add {op:
            '()' = self.clientdata.less_than(op.arg1.clientdata)
        }
    }");

    (* ensure newlines don't separate positional parameters *)
    assert((parse "x{{}\n{}}") = (parse "x{{}{}}"));

    (* more evaluation tests now that we know parsing works *)
    assert((eval Phi (parse "3.userdata")) = UserData (UserInteger 3));
    assert((eval Phi (parse "3.5.userdata")) = UserData (UserFloat 3.5));
    assert(string_value_is (parse "{}{\n\tx = \"foo\"\n\ty=\"bar\"}.x") "foo");
    assert(string_value_is (parse "{\n\tx = \"foo\"\n\ty=\"bar\"}.y") "bar");

    (* evaluation with native method *)
    let native_add = NativeMethod (fun env -> 
      let x = eval env (Call(Name "method", "arg1"))
      and y = eval env (Call(Name "method", "arg2"))
      in match (x, y) with
          (UserData (UserInteger xs), UserData (UserInteger ys)) -> 
            UserData (UserInteger (xs + ys))
              (* saving the invalid args is a pain, so we don't bother *)
        | _ -> Error ("invalid addition args", "??"))
    in let intrinsics = derive "integer_add" (Some "method") native_add Phi
                              ProtoObject
    in let addition = parse "intrinsics { 3.userdata, 4.userdata }.integer_add"
    in let seven = eval (Add("intrinsics", intrinsics, Phi)) addition
    in 
    assert(seven = UserData (UserInteger 7));
      (* That the above works is a little strange.  If we just eval
         "3" in an environment without "prog", we get an evaluation
         error saying it needs "prog"; and likewise if we eval
         "3.beanstalk"; but if we eval "3.userdata", we get the right
         thing.  That's because the "3" is actually inheriting from an
         error object --- but that works!  As long as you don't try to
         call any methods you were hoping the error object would be
         defining, but only methods you put there. *)

    (* showing expressions *)
    assert((parse (show_bicexpr booleans)) = booleans);
    assert((show_bicexpr (parse "{x = 1}")) = "{x = 1}");
    assert((show_bicexpr (parse "{'doesn\\'t' = 1}")) = "{'doesn\\'t' = 1}");
    assert((show_bicexpr (parse "{foo = \"bar\"}")) = "{foo = \"bar\"}");
    assert((show_bicexpr (parse "{foo = \"ba\\\"r\"}")) = 
        "{foo = \"ba\\\"r\"}");
    assert((show_bicexpr (parse "3.5")) = "3.5");

    (* computing free variables *)
    let freevars_are expr vars = 
        StringSet.equal (freevars(parse(expr))) (stringset vars)
    in
    assert(freevars_are "x" ["x"]) ;
    assert(freevars_are "1" ["prog"]) ;
    assert(freevars_are "1.2" ["prog"]) ;
    assert(freevars_are "\"foo\"" ["prog"]) ;
    assert(freevars_are "{x: y=z, w=x, t=uv}" ["z"; "uv"]) ;
    assert(freevars_are "x{y: z=w, zz=x, zzz=y}" ["w"; "x"]) ;
    assert(freevars_are "x.y" ["x"]) ;
    (* this code contains no free variables, ... *)
    assert(freevars_are "{env: fac = {fac: x = 3,
    '()' = fac.x.'<'{lt: arg1=2}.'()'.if_true{i: then=1, 
         else=fac.x.'*'{mu: arg1=env.fac{f:
             x=fac.x.'-'{m: arg1=1}.'()'}.'()'}.'()'}.'()'}
    }.fac{f: x = 4}.'()'" ["prog"]) ;
    (* but this buggy version did: *)
    assert(freevars_are "{env: fac = {fac: x = 3,
    '()' = x.'<'{lt: arg1=2}.'()'.if_true{i: then=1, 
         else=fac.x.'*'{mu: arg1=env.fac{f:
             x=fac.x.'-'{m: arg1=1}.'()'}.'()'}.'()'}.'()'}
    }.fac{f: x = 4}.'()'" ["x"; "prog"]) ;

    assert(freevars_are "{}" []);
    assert(freevars_are "{x=y}" ["y"]);
    assert(freevars_are "x {y = z}" ["x"; "z"]);

    (* computing lists of method names for linting *)
    let implemented_are expr vars =
      StringSet.equal (implemented(parse(expr))) (stringset vars)
    in
    assert(implemented_are "x" []);
    assert(implemented_are "3" ["userdata"]);
    assert(implemented_are "3.5" ["userdata"]);
    assert(implemented_are "\"foo\"" ["userdata"]);
    (* not really a good way to tell here... *)
    assert(StringSet.equal (implemented (NativeMethod (fun _ -> ProtoObject)))
            (stringset []));
    assert(implemented_are "{}" []);
    assert(implemented_are "{a=b}" ["a"]);
    assert(implemented_are "{a=b, c=d}" ["a"; "c"]);
    assert(implemented_are "{a=b}{c=d}" ["a"; "c"]);
    assert(implemented_are "{a=b}.c.d" ["a"]);
    assert(implemented_are "{a={b={c=d}.e}.f}" ["a"; "b"; "c"]);

    let called_are expr vars =
      StringSet.equal (called(parse(expr))) (stringset vars)
    in
    assert(called_are "x" []);
    assert(called_are "3" ["sys"; "native_integer"]);
    assert(called_are "3.5" ["sys"; "native_float"]);
    assert(called_are "\"foo\"" ["sys"; "native_string"]);
    (* there is no way to tell here: *)
    assert(StringSet.equal (called (NativeMethod (fun _ -> ProtoObject)))
            (stringset []));
    assert(called_are "b.c" ["c"]);
    assert(called_are "b.c.d.e.f" ["c"; "d"; "e"; "f"]);
    assert(called_are "{}" []);
    assert(called_are "{a=b.c.d}" ["c"; "d"]);
    assert(called_are "{a=b.c.d, e=f.g.h}" ["c"; "d"; "g"; "h"]);
    assert(called_are "{a=b.c.d, e=f.g.h}" ["c"; "d"; "g"; "h"]);
    assert(called_are "{a=b.c.d, e=f.g.h}{i=j.k}.mnop"
            ["c"; "d"; "g"; "h"; "k"; "mnop"]);
    assert(called_are "{a={b={c=d}.e}.f}" ["e"; "f"]);
;;
unit_tests();;



This file, bicicleta_lib.ml, contains some native functions and the
beginnings of a standard library.

(* -*- mode: tuareg; compile-command: "./build" -*- *)
(* The beginnings of a standard library, with some basic native methods. *)
(* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA *)
(* for interactive use:
#load "bicicleta_lexer.cmo" ;;
#load "bicicleta_parser.cmo" ;;
#load "bicicleta.cmo" ;;
#load "bicicleta_lib.cmo" ;;
open Bicicleta ;;
open Bicicleta_syntax ;;
open Bicicleta_lib ;;
*)
open Bicicleta_syntax ;;
open Bicicleta ;;

let prog_sys_source = "{prog: sys = {
object = {self:
        # One day, things will inherit from this implicitly if not 
        # otherwise specified.
        # These two methods are intended to be defined differently on errors.
        '!!' = {op: '()' = self}
        is_ok = prog.sys.bool.true
        # But not this one.
        '->' = {op: arg1 = self, key = self, value = op.arg1}
    }
bool =
    {bool: 
        true = prog.sys.object {self: 
            if_true = {if:
                '()' = if.then
                then = prog.error(\"no consequent given\")
                else = prog.error(\"no alternate given\")
            }
            not = bool.false
            if_false = self.not.if_true
            '||' = {or: '()' = self}
            '&&' = {and: arg1 = self, '()' = and.arg1}
            show = \"true\"
        }
        false = bool.true {self:
            if_true = bool.true.if_true{if: '()' = if.else}
            not = bool.true
            '||' = {or: arg1 = self, '()' = or.arg1}
            '&&' = {and: '()' = self}
            show = \"false\"
        }
    }

# prog.sys.normal_commutative_number:

# This is a base class for number-like things that support the usual
# commutative numerical operations, such as native integers, integers
# in Z_n, floating-point numbers, rational numbers, vectors,
# polynomials, rational expressions, continued fractions, complex
# numbers, sampled audio signals, dates, time intervals, images,
# multidimensional arrays, and so on.

# Here's a table of the levels of support you can provide to the
# methods herein.

# If you implement:     You get proper support for:
# add                   +
# add, negated          +, negated, -
# multiply              *
# multiply, reciprocal  *, reciprocal, /
# power                 **
# modulo                %
# less_than             <, ==, >, <=, >=, !=, between
# less_than, coerce(0)  <, ==, >, <=, >=, !=, between, abs, negative, positive

# You can also override 'subtract', 'divide', 'equals', and
# 'greaterthan' to get the expected effects; there are default
# implementations built from the above six methods.  If you do that,
# you may want to override 'reverse -' and 'reverse /' as well.

# If you override the 'coerce' method to convert objects to some type
# your arithmetic methods know how to support, it will convert them to
# the right type ahead of time.

# The 'coerce' method need not return objects supporting the same
# protocol as your own, although they need to support 'negated' if you
# want to subtract them, and they need to support 'reciprocal' if you
# want to divide by them.  So the 'coerce' method for a date object
# might coerce things to time intervals and then support + and - with
# them.  By the same token, your arithmetic methods might return
# objects of some other type --- but remember that 'negated' might get
# fed to 'add', and 'reciprocal' might get fed to 'multiply'.

# In order to support arithmetic on either side of the operator, the
# operator methods, such as '+', delegate to the 'reverse ...' method
# on the other object if the 'add' method fails --- and they provide a
# 'reverse +' method that does the right thing, in case the other
# object wants to delegate to it.  However, the 'reverse ...' methods
# depend on the operations being commutative; this is in exchange for
# the relatively loose requirements on 'coerce' described above.

# No implementations are provided for 'reverse **' and 'reverse %'.

normal_commutative_number = 
    prog.sys.object {self:
        coerce = {op: arg1 = 1, '()' = op.arg1}
        binop = {op: arg1 = 1, other = self.coerce(op.arg1)}

        normal_binop = self.binop {op:
            op = self.add
            revop = op.arg1.'reverse +'
            result = op.op(op.other)
            commuted = op {op: '()' = op.result}
            '()' = op.result !! op.revop(self)
        }
        '+' = self.normal_binop
        'reverse +' = self.'+'.commuted
        subtract = {op: arg1=2, '()' = self.add(op.arg1.negated)}
        '-' = self.normal_binop {op: op = self.subtract, 
            revop = op.arg1.'reverse -'}
        'reverse -' = self.binop {op: '()' = self.negated.add(op.other)}

        '*' = self.normal_binop {op: op = self.multiply, 
            revop = op.arg1.'reverse *' }
        'reverse *' = self.'*'.commuted
        divide = {op: arg1=2, '()' = self.multiply(op.arg1.reciprocal)}
        '/' = self.normal_binop {op: op = self.divide, 
            revop = op.arg1.'reverse /'}
        'reverse /' = self.binop {op: '()' = self.reciprocal.multiply(op.other)}

        '%' = self.normal_binop {op: op = self.modulo, 
            revop = op.arg1.'reverse %' }
        '**' = self.normal_binop {op: op = self.power, 
            revop = op.arg1.'reverse **'}

        '<' = self.normal_binop {op: op = self.less_than, 
            revop = op.arg1.'reverse <'}
        greaterthan = {op: '()' = op.arg1 < self}
        '>' = self.normal_binop {op: op = self.greaterthan, 
            revop = op.arg1.'reverse >'}
        'reverse <' = self.'>'.commuted
        'reverse >' = self.'<'.commuted
        # BEWARE! As the Common Lisp HyperSpec says about EQUALP:
        # Object equality is not a concept for which there is a uniquely
        # determined correct algorithm. The appropriateness of an equality
        # predicate can be judged only in the context of the needs of some
        # particular program. Although these functions take any type of
        # argument and their names sound very generic, equal and equalp
        # are not appropriate for every application.
        equals = {op: arg1 = 2
            '()' = (self < op.arg1).not && (self > op.arg1).not}
        '==' = self.normal_binop {op: op = self.equals, 
            revop = op.arg1.'reverse =='}
        'reverse ==' = self.'=='.commuted
        inverse_comparator = self.normal_binop {op:
            baseop = self.less_than
            revop = op.arg1.'reverse >='
            op = {ge: '()' = op.baseop(ge.arg1).not}
        }
        '>=' = self.inverse_comparator
        '<=' = self.inverse_comparator {op: baseop = self.greaterthan, 
            revop = op.arg1.'reverse <='}
        'reverse >=' = self.'<='.commuted
        'reverse <=' = self.'>='.commuted
        '!=' = self.inverse_comparator {op: baseop = self.'==',
            revop = op.arg1.'reverse !='}
        'reverse !=' = self.'!='.commuted
        between = {op: arg1 = 2.negated, arg2 = 2
            '()' = (self >= op.arg1) && (self < op.arg2)}
        negative = self < 0
        positive = self.negative.not
        abs = self.negative.if_true(then=self.negated, else=self)
    }

# XXX Doesn't work because prog.if doesn't exist yet because we don't 
# have variadic functions because we don't have lists.
rational = 
    prog.sys.normal_commutative_number {self:
        numer = 1
        denom = 2
        new = {op: arg1 = 6, arg2 = 9,
            g = op.arg1.gcd(op.arg2)
            numer = op.arg1 / op.g
            denom = op.arg2 / op.g
            denom_is_1 = op.denom.is_ok && (op.denom == 1)
            ok_test = (op.numer * op.denom)
            # XXX note that prog.if doesn't exist yet!
            '()' = prog.if(
                op.denom_is_1 -> op.numer,
                op.ok_test.is_ok -> self { numer = op.numer, denom = op.denom },
                else = op.ok_test)
        }
        show = \"{numer} <hr> {denom}\" % self
        coerce = {op: arg1 = 2
            '()' = prog.if(
                op.arg1.denom.is_ok -> op.arg1,
                op.arg1.as_integer.is_ok -> self.new(op.arg1.as_integer, 1),
                else = prog.error(\"could not coerce {arg1} to rational\" % op)
            )
        }
        add = {op: arg1 = self.new(2, 3),
            '()' = self.new(
                (self.numer * op.arg1.denom) + (self.denom * op.arg1.numer),
                self.denom * op.arg1.denom
            )
        }
        negated = self.new(self.numer.negated, self.denom)
        multiply = self.add {op:
            '()' = self.new(self.numer * op.arg1.numer, 
                            self.denom * op.arg1.denom)
        }
        reciprocal = self.new(self.denom, self.numer)
        less_than = {op: arg1 = self.new(2, 4),
            '()' = (self.numer * op.arg1.denom) < (self.denom * op.arg1.numer)
        }
        as_float = self.numer.as_float / self.denom.as_float
        as_integer = prog.if(self.denom == 1, 
            then=self.numer,
            else=prog.error(\"{frac} is not an integer\" % {frac=self}))
    }

# prog.sys.native_integer:

# The idea is that the interpreter will derive things from this object
# by overriding their \"userdata\" to point to an opaque integer
# object that can be passed to the native methods that do show,
# as_float, add, negated, multiply, divmod, and less_than.

# I'm still undecided about what those operations should return ---
# just the userdata result (making this code a little uglier) or the
# native_integer object (in which case the primitive arithmetic
# operations as well as the parser have to know about the
# prog.sys.native_integer name)?  For now I'm going with the former.

native_integer =
    prog.sys.normal_commutative_number {self:
        userdata = 5.userdata # as an example
        new = {new: arg1 = 6.userdata
            '()' = prog.sys.native_integer { userdata = new.arg1 }}
        intrinsics = intrinsics{self.userdata}
        show = prog.sys.native_string.new(self.intrinsics.integer_show)
        as_integer = self
        as_float = prog.sys.native_float.new(self.intrinsics.integer_as_float)
        coerce = {op: arg1 = 2, '()' = op.arg1.as_integer}
        intrinsic_op = {op: arg1 = 3,
            intrinsics = self.intrinsics {
                arg2 = op.arg1.userdata
                # these two are for less_than:
                true = prog.sys.bool.true
                false = prog.sys.bool.false
            }
            userdata = op.intrinsics.integer_add
            '()' = self.new(op.userdata)   # by default
        }
        add = self.intrinsic_op
        negated = self.new(self.intrinsics.integer_negated)
        multiply = self.intrinsic_op {op:
            userdata = op.intrinsics.integer_multiply
        }
        modulo = self.intrinsic_op {op:
            userdata = op.intrinsics.integer_divmod.mod
        }
        # As Jamie McCarthy points out, rationals made out of machine
        # integers are really pretty limited... but for now I'm using them
        # anyway.
        divide = {op: arg1 = 2, '()' = prog.sys.rational.new(self, op.arg1)}
        reciprocal = 1 / self
        gcd = self.binop {op: arg1 = 5
            '()' = (op.arg1 == 0).if_true(then = self, 
                else = op.arg1.gcd(self % op.arg1))
        }
        less_than = self.intrinsic_op {op:
            '()' = op.intrinsics.integer_less_than
        }

        # Here are some faster versions of ops with less indirection, 
        # but which fail badly in mixed-mode arithmetic.
        #'+' = {op: '()' = self.new(
        #    intrinsics{self.userdata, op.arg1.userdata}.integer_add)}
        #'-' = {op: '()' = self.new(
        #    intrinsics{self.userdata, op.arg1.userdata}.integer_subtract)}
        #'<' = {op:
        #    '()' = intrinsics{
        #        self.userdata, op.arg1.userdata,
        #        true=self.true,
        #        false=self.false
        #    }.integer_less_than
        #}
    }

# The float userdata has divide instead of modulo; otherwise it's
# quite similar to the integer.  My original design let it inherit
# pretty much all the methods, because the primitives were properties
# of the userdata, so they were named things like 'show' instead of
# 'integer_show', but I decided sticking all the operations in an
# 'intrinsics' namespace was simpler.

native_float = prog.sys.native_integer {self:
        userdata = 3.2.userdata
        new = {new: arg1 = 3.5.userdata
            '()' = prog.sys.native_float { userdata = new.arg1 }}
        show = prog.sys.native_string.new(self.intrinsics.float_show)
        as_float = self
        as_integer = prog.error(\"Can't coerce floats to integers\")
        modulo = prog.error(\"Can't take modulo of floats yet\")
        coerce = {op: arg1 = 2, '()' = op.arg1.as_float}
        add = self.intrinsic_op {op:
            userdata = op.intrinsics.float_add
        }
        negated = self.new(self.intrinsics.float_negated)
        multiply = self.intrinsic_op {op:
            userdata = op.intrinsics.float_multiply
        }
        divide = self.intrinsic_op {op:
            userdata = op.intrinsics.float_divide
        }
        less_than = self.intrinsic_op {op:
            '()' = op.intrinsics.float_less_than
        }
    }

native_string = prog.sys.object {self:
        userdata = \"bethmolnar\".userdata
        new = {new: arg1 = \"matthew\".userdata
            '()' = prog.sys.native_string { userdata = new.arg1 }}
        show = self
    }

}}" ;;

let getarg arg env = eval env (Call(Name "method", arg)) ;;
let arg1 = getarg "arg1" ;;
let arg2 = getarg "arg2" ;;
let primitive_binary myfun = 
  NativeMethod(fun env -> myfun (arg1 env, arg2 env) env) ;;
let integer_binary op = primitive_binary(fun args env ->
  match args with
      (UserData (UserInteger xi), UserData (UserInteger yi)) ->
        op xi yi env
    | (x, y) -> Error ("invalid binary operation args", 
                      show_bicobj x ^ ", " ^ show_bicobj y)) ;;
let integer_unary op = primitive_binary(fun args env ->
  match args with (UserData (UserInteger xi), _) -> op xi env
    | (x, _) -> Error ("invalid unary operation arg", show_bicobj x)) ;;
let intret op a b env = UserData (UserInteger (op a b)) ;;
let true_expr = parse "method.true" ;;
let false_expr = parse "method.false" ;;
let boolret op a b env = eval env (if op a b then true_expr else false_expr) ;;
let defintrinsic name contents rest = 
  derive name (Some "method") contents Phi rest ;;

let integer_intrinsics = 
  (defintrinsic "integer_multiply"  (integer_binary (intret ( * )))
  (defintrinsic "integer_add"       (integer_binary (intret (+)))
  (* integer_subtract: not currently used *)
  (defintrinsic "integer_subtract"  (integer_binary (intret (-)))
  (defintrinsic "integer_less_than" (integer_binary (boolret (<)))
  (defintrinsic "integer_negated"   (integer_unary (fun x env ->
    UserData (UserInteger ~-x)))
  (defintrinsic "integer_show"      (integer_unary (fun x env ->
    UserData (UserString (string_of_int x))))
        (* XXX also need divmod *)
  ProtoObject)))))) ;;

let float_binary op = primitive_binary(fun args env ->
  match args with
      (UserData (UserFloat xi), UserData (UserFloat yi)) ->
        op xi yi env
    | (x, y) -> Error ("invalid binary operation args", 
                      show_bicobj x ^ ", " ^ show_bicobj y)) ;;
let float_unary op = primitive_binary(fun args env ->
  match args with (UserData (UserFloat xi), _) -> op xi env
    | (x, _) -> Error ("invalid unary operation arg", show_bicobj x)) ;;
let floatret op a b env = UserData (UserFloat (op a b)) ;;

let basic_intrinsics = 
  (defintrinsic "float_show"       (float_unary (fun x env ->
    UserData (UserString (string_of_float x))))
  (defintrinsic "float_add"        (float_binary (floatret (+.)))
  (defintrinsic "float_negated"    (float_unary (fun x env -> 
    UserData (UserFloat ~-.x)))
  (defintrinsic "float_multiply"   (float_binary (floatret ( *.)))
  (defintrinsic "float_divide"     (float_binary (floatret (/.)))
  (defintrinsic "float_less_than"  (float_binary (boolret (<)))
  (defintrinsic "integer_as_float" (integer_unary (fun x env ->
    UserData (UserFloat (float_of_int x))))
  integer_intrinsics))))))) ;;

let basic_prog = eval (Add("intrinsics", basic_intrinsics, Phi)) 
  (parse prog_sys_source) ;;
let eval_with_lib expr = eval (Add("intrinsics", basic_intrinsics, 
                               Add("prog", basic_prog, Phi)))
  (Call(Derivation(Name "prog", Some "prog",
    Definition("()", expr, NoDefs)), "()")) ;;

let unit_tests () =
    (* really complicated evaluation *)
    let lib_eval_is str userdata =
      let actual_result = (eval_with_lib (parse ("(" ^ str ^ ").userdata")))
      in let rv = actual_result = UserData(userdata)
      in (if not rv then print_endline (show_bicobj actual_result));
        rv
    in
    (* basic boolean behavior *)
    assert(lib_eval_is "prog.sys.bool.true.show" (UserString "true"));
    assert(lib_eval_is "prog.sys.bool.false.show" (UserString "false"));
    assert(lib_eval_is "prog.sys.bool.true.not.show" (UserString "false"));
    assert(lib_eval_is "prog.sys.bool.false.not.show" (UserString "true"));
    assert(lib_eval_is "(prog.sys.bool.true && prog.sys.bool.true).show"
            (UserString "true"));
    assert(lib_eval_is "(prog.sys.bool.true && prog.sys.bool.false).show"
            (UserString "false"));
    assert(lib_eval_is "(prog.sys.bool.false && prog.sys.bool.false).show"
            (UserString "false"));
    assert(lib_eval_is "(prog.sys.bool.false && prog.sys.bool.true).show"
            (UserString "false"));
    assert(lib_eval_is "(prog.sys.bool.true || prog.sys.bool.true).show"
            (UserString "true"));
    assert(lib_eval_is "(prog.sys.bool.true || prog.sys.bool.false).show"
            (UserString "true"));
    assert(lib_eval_is "(prog.sys.bool.false || prog.sys.bool.false).show"
            (UserString "false"));
    assert(lib_eval_is "(prog.sys.bool.false || prog.sys.bool.true).show"
            (UserString "true"));
    (* basic integer arithmetic *)
    assert(lib_eval_is "3.negated" (UserInteger (-3)));
    assert(lib_eval_is "3 + 4" (UserInteger (7)));
    assert(lib_eval_is "(3 + 4).show" (UserString "7"));
    assert(lib_eval_is "\"foo\".show" (UserString "foo"));
    assert(lib_eval_is "prog.sys.bool.true.if_true(then=3, else=4)"
            (UserInteger 3));
    assert(lib_eval_is "(3 > 4).show" (UserString "false"));
    assert(lib_eval_is "(3 < 4).show" (UserString "true"));
    assert(lib_eval_is "(3 < 4).if_true(then=5, else=6)" (UserInteger 5));
    assert(lib_eval_is "(3 > 4).if_true(then=5, else=6)" (UserInteger 6));
    assert(lib_eval_is "(3 > 4).if_false(then=5, else=6)" (UserInteger 5));
    assert(lib_eval_is "{env:
        fac = {fac:
            arg1 = 3
            '()' = (fac.arg1 < 2).if_true(
                then = 1
                else = fac.arg1 * env.fac(fac.arg1 - 1)
            )
        }
    }.fac(5)" (UserInteger 120));
    assert(lib_eval_is "3 * 4" (UserInteger 12));
    (* basic integer comparisons *)
    assert(lib_eval_is "(3 == 3).show" (UserString "true"));
    assert(lib_eval_is "(3 == 4).show" (UserString "false"));

    (* floats *)
    assert(lib_eval_is "3.5" (UserFloat 3.5));
    assert(lib_eval_is "3.5.show" (UserString "3.5"));
    assert(lib_eval_is "3.5 + 4.0" (UserFloat 7.5));
    assert(lib_eval_is "4.5 - 3.0" (UserFloat 1.5));
    assert(lib_eval_is "0.5 * 0.25" (UserFloat 0.125));
    assert(lib_eval_is "3.5 / 2.0" (UserFloat 1.75));
    assert(lib_eval_is "1.0 + 1.0 + 1.0" (UserFloat 3.0));
    assert(lib_eval_is "(3.5 < 4.0).show" (UserString "true"));
    assert(lib_eval_is "(3.5 > 4.0).show" (UserString "false"));
    assert(lib_eval_is "(4.0 > 3.5).show" (UserString "true"));
    assert(lib_eval_is "(3.5 == 3.5).show" (UserString "true"));

    (* a little bit of mixed-mode arithmetic *)
    assert(lib_eval_is "3.5 / 2" (UserFloat 1.75));
    (* This one doesn't work yet because it depends on prog.error 
       existing and having the right '!!' behavior *)
(*     assert(lib_eval_is "2 * 1.5" (UserFloat 3.0)); *)
;;
unit_tests();;


And then, so you can use it interactively, there's this
read-eval-print loop.  It prints some minimal profiling information on
exit.

(* -*- mode: tuareg; compile-command: "./build" -*- *)
(* something that lets you try out expressions *)
(* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA *)
open Bicicleta ;;
open Bicicleta_syntax ;;
open Bicicleta_lib ;;

let call_count_report () =
  let f key value tail = (value, key) :: tail
  in let calls_list = Hashtbl.fold f call_count []
  in (List.fold_left (fun t (v, k) -> t + v) 0 calls_list,
      List.sort (Pervasives.compare) calls_list) ;;

let rec main inp outp = 
  try
    output_string outp "'()' = "; flush outp;
    (try 
        let inval = input_line inp in
        let inexpr = parse inval 
        in let rv = eval_with_lib inexpr
        in let show = eval (Add("rv", rv, Phi)) (parse "rv.show.userdata")
        in 
             output_string outp (show_bicobj show ^ "\n");
          flush outp;
      with
          Parsing.Parse_error -> output_string outp "parse error\n"
    );
    main inp outp
  with End_of_file -> 
    let total, per_method = call_count_report ()
    in List.iter (fun (v, k) -> 
      output_string outp (k ^ ": " ^ string_of_int v ^ "\n"))
      (List.rev per_method);
      output_string outp ("total method calls: " ^ string_of_int total ^ "\n")
;;
Hashtbl.clear call_count ;;
print_endline "Bicicleta version 3, Copyright (C) 2007  Kragen Javier Sitaker
Bicicleta comes with ABSOLUTELY NO WARRANTY; for details see the file
\"COPYING\".
This is free software, and you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
" ;;
main stdin stdout ;;

Thu, 22 Mar 2007

Contents
--------

Introduction
Extensions
Scoping
* Syntactic sugar for list comprehensions, using "higher-order" patterns
* Explicit declarations
* Implicit pattern augmentation
Global Scope
SIMD
Prefix syntax: An Enlightening Syntactic Digression
A More Extended Example: A Recipes File
Efficiency
Connection To SnikiSniki

Introduction
------------

I was reading Wouter van Oortmerssen's brilliant thesis again, and I
had an idea.  His Aardappel language is a linear eager
(innermost-first) tree-rewriting system, and he points out that
although it doesn't support any kind of inheritance, you can
substitute new kinds of objects for old ones, simply by adding new
rewrite rules for the functions that need to be successfully
applicable to the new objects.

So I got to thinking.  What if we take that approach to the extreme?
Suppose you could define properties on records in terms of rewrite
rules on those records' properties?

    { x: X, y: Y }.r = (X*X + Y*Y).sqrt

That would define an "r" method, or property, on any record that had X
and Y properties.  But you could nest the pattern more deeply, use
constants, and just use the property names to name the property values
in cases where it wasn't ambiguous; here's an example taken from my
toy APL in OCaml (recently posted to kragen-hacks, I think) that shows
how to render APL expressions:

    { unop, value }.show = Unop + " " + Value.show
    { atom_value }.show = Atom_value.show_atom
    { parenthesized_value }.show = "(" + Parenthesized_value + ")"
    { left_op: { atom_value } as Left_op, bin_op, right_op }.show
        = (Left_op, Bin_op, Right_op).show_binop
    { left_op: { parenthesized_value } as Left_op, bin_op, right_op}.show
        = (Left_op, Bin_op, Right_op).show_binop
    (L, O, R).show_binop = L + " " + O + " " + R
    { left_op, bin_op, right_op}.show 
        = { parenthesized_value: Left_op }.show + " " + Bin_op + Right_op.show

    [].show_atom = "()"
    [N, M, ...Lst].show_atom = N.show_num + " " + [M, ...Lst].show_atom
    [N].show_atom = N.show_num

Here I'm using [N, M, ...Lst] to mean a list whose first two items are
N and M and whose remainder is Lst, a la Lisp (N . (M . Lst)), Prolog
[N|[M|Lst]], or OCaml N :: M :: Lst.  Without any syntactic sugar, you
could still write it as { car: N, cdr: { car: M, cdr: Lst } }.

Here's a fragment from my prototype Bicicleta implementation showing
the use of a constant ("nodefs") in a pattern:

    { method_name, method_body, rest: nodefs }.definition 
        = (Method_name, Method_body).show_method
    { method_name, method_body, rest }.definition 
        = (Method_name, Method_body).show_method + ", " + Rest.show_methods

(The real code is in OCaml; that's just a translation.)

Presumably you'd want to namespace the property names by some other
mechanism to reduce spurious pattern matches.

This pattern-matching syntax is less concise than the Caml or
Aardappel equivalent, but it seems that it should make it easier to
define new kinds of data that implement multiple protocols.  It is
probably harder to implement efficiently, though.

However, you could imagine that many pattern matches will be for the
simplest cases, where we're only interested in the properties of a
single record and have no particular requirements for them.  In this
case, you could even omit the entire record!  So, for some of the
previous examples, you could write:

    r = (X*X + Y*Y).sqrt
    show = Unop + " " + Value.show
    show = Atom_value.show_atom
    show = "(" + Parenthesized_value + ")"
    { left_op: { atom_value } as Left_op, bin_op, right_op }.show
        = (Left_op, Bin_op, Right_op).show_binop
    { left_op: { parenthesized_value } as Left_op, bin_op, right_op}.show
        = (Left_op, Bin_op, Right_op).show_binop
    (L, O, R).show_binop = L + " " + O + " " + R
    show = { parenthesized_value: Left_op }.show + " " + Bin_op + Right_op.show

Perhaps you could shorten it further by inferring extra required property
names even in cases where the pattern record is not entirely omitted:

    { left_op: { atom_value } as Left_op }.show
        = (Left_op, Bin_op, Right_op).show_binop
    { left_op: { parenthesized_value } as Left_op }.show
        = (Left_op, Bin_op, Right_op).show_binop

By itself, the ability to define these rewrite rules, create records
with properties whose names are known at compile time, and read properties
whose names are known at compile time, suffices for a Turing-complete
higher-order functional programming language; the rest of the above
(with infix operators, tuples, lists, and so on) can be viewed as
syntactic sugar.  (X + Y might rewrite as {left: X, right: Y}.sum, for
example.)

(The "as Left_op" requires some explanation; it means that the pattern
on the left side of the "as" should be bound to the name on the right
side, in the above cases the objects that matched {atom_value} and
{parenthesized_value}, which might have arbitrary other data in them.
I'm not sure if this feature is more than just syntactic sugar, but I
suspect so.)

In a way, it has a very APLish feel to it --- r = (X*X + Y*Y).sqrt,
thought of as a statement, simultaneously creates a new "column"
called "r" for all the values known as "x" and "y" (when they are on
the same records).  In Bicicleta, you can occasionally reach that same
level of brevity, but only inside the context of the object.

Extensions
----------

There are other obvious extensions --- unification, properties with
property names not known at compile-time (in patterns, property
accesses, and even property definitions) and inheritance, but these
are not necessary --- not even inheritance!  Defining a new property
will automatically create that property on any record having the
necessary attributes, which gives you a sort of multiple inheritance
already.

I was thinking that this might be an interesting basis for an
end-user-oriented database.  Doing the equivalent of SQL's 
SELECT ... WHERE would require a way to name attributes, preferably
anonymous ones, so that you could say e.g. "(R < 4).where" or (with an
infix operation "where") "Mypoints where (R < 4)", with "(R < 4)"
evaluating to an anonymous function or anonymous property that is True
for records whose R is less than 4, and False otherwise.

Scoping
-------

But the database idea in the "extensions" section introduces multiple
name scopes in the same expression: R presumably comes from each
point, and Mypoints presumably comes from some other context --- it
clearly can't come from each point.  And we probably don't want to
restrict where-expressions to contain only constants and properties of
the queried items --- and if we're using the set of properties
mentioned in each subexpression to determine when they are even
applicable, we have an untenable situation.  Here are the options I
have come up with.

* Syntactic sugar for list comprehensions, using "higher-order" patterns

The language as described earlier is already powerful enough to handle
this sort of query without the ability to do things like apply
anonymous functions.  Here the first four lines define a "filter"
function, the fifth line defines an rlt4 "function", and the sixth
line is the translation of "Mypoints where (R < 4)".

    ([], _).filter = []
    ([A, ...As], F).filter = ((F, A).apply, As, F).filternext
    (true, As, F).filternext = [A, ...(As, F).filter]
    (false, As, F).filternext = (As, F).filter
    (rlt4, A).apply = A.r < 4
    (Mypoints, rlt4).filter

You could add syntactic sugar so that you could write 
(Mypoints, (A -> A.r < 4)).filter or 
or [A in Mypoints if A.r < 4] that translated into the above.  A more
general list-comprehension would also support 
[A for A in Mypoints if A.r < 4], and perhaps also multiple sequences to
loop over.

Still, it would probably be better to be able to write that with
get-property-by-name, as follows:

    ([], _).filter = []
    ([A, ...As], F).filter = (A.F, As, F).filternext
    (true, As, F).filternext = [A, ...(As, F).filter]
    (false, As, F).filternext = (As, F).filter
    A.rlt4 = A.r < 4
    (Mypoints, rlt4).filter

Because then you could say (Mypoints, is_visible).filter.  Maybe that
doesn't matter if you can say [A in Mypoints if A.is_visible].

In the language as I've discussed it so far, (rlt4, A).apply (or
A.rlt4) is still defined for records that don't have an r --- the
left-hand-side pattern doesn't mention r, and the right-hand side
doesn't mention R (which would cause an inferred r on the
left-hand-side).  But presumably "A.r < 4" would evaluate as 
"error < 4" or "nil < 4" or something, and that should presumably not
get ignored by default.  But maybe you could write a different kind of
filter that did ignore it.

Maybe you could also write ({r} -> R < 4), in which case 
(rlt4, {}).apply would fail to match (rlt4, {r}).apply, and your
evaluation would get stuck in the middle when 
((rlt4, {}).apply, As, F).filternext failed to reach normal form.
Probably that should also cause it to return an error.  Unlike
Aardappel, this language family does distinguish between data
structures and functions at a basic level, which would give it an
excuse to return an error.

The anonymous-property syntactic sugar would probably benefit from
multiple pattern-match cases, so you could write 
({r} -> R < 4 | {nonpolar} -> false) or some such.

Anyway, with this kind of filtering, you could add attributes to
master records from joins --- using Avi Bryant's example from
DabbleDB, where you have a table of talks with presenter names, and a
table of presenters with presenter names and biographies:

    presenter_rec = [P in Presenters if P.name == Presenter].first
    presenter_bio = Presenter_rec.bio

* Explicit declarations

Here's a second way out.  In Python and Tcl, you can only assign to
variables in the innermost syntactic scope.  You could take an
analogous approach and only implicitly introduce new pattern elements
when a previously undeclared variable was mentioned, and then
introduce it automatically in the innermost scope, in which case you
could write "Mypoints where (R < 4)" but you wouldn't get the right
effect from "Presenters where (Name == Presenter)" --- that would
return records that had two properties, name and presenter, with the same
value.  But you could write

    {Presenter}.presenter_rec = (Presenters where (Name == Presenter)).first
    presenter_bio = Presenter_rec.bio

* Implicit pattern augmentation

You could, also, just declare that property accesses implicitly
augment the pattern match, and then you could write

    Talk.presenter_rec = (Presenters where (Name == Talk.presenter)).first

It would be somewhat undesirable to ignore all properties whose
computation asked for an undefined property, which I think would be a
result of this approach.

Global scope
------------

I didn't mention where the variable Presenters comes from in the
above.  I assumed it came from some global scope, not from every
record.  Maybe it should have some Ruby-style sigil to indicate that.

SIMD
----

Having properties that might have values of vectors of other records
could simplify the process, by making query screens, tables of detail
records, and the like, just ordinary records.  The ().first thing in
the example above is ugly; it would be nicer to say this instead:

    presenter_rec = [P in Presenters if P.name == Presenter]
    presenter_bio = Presenter_rec.bio

For that to work, though, the .bio property access has to
automatically map over whatever "where" returns --- which probably
means that all properties should be potentially multivalued, as in
Prolog, Icon, Pick, Lotus Agenda, or perhaps APL.  This suggests the
need for either a small and well-defined set of properties that apply
to the entire collection rather than each item, or some special syntax
for applying any property to the entire collection rather than each
item.  I'm going to ignore that problem for now and just let some
methods apply to the whole thing, while others apply only to part of
it, the same mess most of those languages have.

This could also provide a neat solution to clashing rewrite rules,
although only time will tell if the neat solution is also a useful
solution --- it might be preferable to be able to do the equivalent of
overriding a method in a subclass so that the base class method is
ignored, by providing a more specific rewrite rule.

Prefix syntax: An Enlightening Syntactic Digression
---------------------------------------------------

Suppose I rewrite some of my previous examples with prefix syntax.

    r { x: X, y: Y } = sqrt(X*X + Y*Y)
    r = sqrt(X*X + Y*Y)

    show { left_op: { atom_value } } = show_binop (Left_op, Bin_op, Right_op)
    show_binop (L, O, R) = L + " " + O + " " + R
    show = show { parenthesized_value: Left_op } + " " + Bin_op + show Right_op

    show_atom [] = "()"
    show_atom [N, M, ...Lst] = show_num N + " " + show_atom [M, ...Lst]
    show_atom [N] = show_num N
    definition { rest: nodefs } = show_method (Method_name, Method_body)
    definition 
        = show_method (Method_name, Method_body) + ", " + show_methods Rest

    filter (_, []) = []
    filter (F, [A, ...As]) = filternext ((F, A).apply, F, As)
    filternext (true, F, As) = [A, ...filter(F, As)]
    filternext (false, F, As) = filter (As, F)
    apply (rlt4, A) = A.r < 4
    filter (Mypoints, rlt4)

    presenter_rec = [P in Presenters where name P == Presenter].first
    presenter_bio = Presenter_rec.bio

This doesn't change the semantics at all, but it ought to look
familiar to users of Haskell or OCaml; now the "methods" look like
functions.  There are only a few important differences:

1. The patterns are defined, not on the structural representation of
   the objects, but on the set of functions applicable to those
   objects.  Remember, the list, tuple, and infix notation is just
   syntactic sugar --- in the list case, [N, M, ...Lst] means { car:
   N, { car: M, cdr: Lst } }, in the tuple case, (true, F, As) means {
   n: 3, arg1: true, arg2: F, arg3: As }; and in the infix case, (X*X
   + Y*Y) means ((X, X).'*', (Y, Y).'*'),'+'.  So you can always
   define new objects that implement whatever protocol is desired for
   some existing operation, unlike in OCaml.

   This probably implies that the process of figuring out how and
   whether a function can be applied will resemble some kind of
   deduction system.  The simplest solution is probably to maintain
   the set of functions applicable to any particular object.
   (Fortunately, in the language so far presented, this doesn't
   require actually running any of the functions --- just examining
   their dependencies.  See "Efficiency".)

2. The objects' "contents" are really point-wise overrides of
   functions perhaps not otherwise applicable to those objects.  You
   can view a definition like f = { x: 1, y: 2 } as syntactic sugar for
   the following:

   x g23132 = 1
   y g23132 = 2
   f = g23132

   Except that { x: 1, y: 2 } is potentially garbage-collectable,
   while g23132, taken literally, would not be.

3. You can define a new pattern-action rule for an existing function
   anywhere.  This probably implies some kind of specificity ordering,
   as in Aardappel, and some kind of feedback about when it's
   applicable.

A More Extended Example: A Recipes File
---------------------------------------

Suppose you have a recipe file of the following form:
{recipes: [ { 
    instructions: "This is how we do it..."
    ingredients: [ {ingredient: "celery", quantity: 3, unit: "stalk"}
                   ...],
    servings: 4,
  } ...]
}

You can imagine a bunch of queries that might not be too hard to write:

    is_celery = Ingredient == "celery"
    celery_ingredients = [I in Ingredients if I.is_celery]
    {celery_ingredients: [N, ...Lst]}.has_celery = true
    celery_quantity = Celery_ingredients.quantity.sum  # assuming SIMD auto-map
    celery_per_serving = Celery_quantity / Servings
    # It would be cool to be able to define quantity_per_serving as a
    # property of each ingredient, but that requires access to the
    # surrounding context.
    ingredients_per_serving = (Ingredients, Servings).ing_divide
    (List, Divisor).ing_divide = [{
           ingredient: Item.ingredient, 
           quantity: Item.quantity / Divisor,
           unit: Item.unit
    } for Item in List]  # assuming list comprehensions
    # This next item only applies when you add a Desired_servings field to a
    # particular recipe.
    ingredients_for_desired_servings = 
        (Ingredients, Servings / Desired_servings).ing_divide
    calories_per_unit = [Nut.calories for Nut in Nutrition_database if
        Nut.ingredient_name == Ingredient && Nut.ingredient_unit == Unit]
    calories = Calories_per_unit * Quantity
    calories = Ingredients.calories.sum
    calories_per_serving = Calories / Servings

    cost_per_gram = [Item.cost for Item in Latest_shopping_price if
        Item.ingredient_name = Ingredient]
    cost = Cost_per_gram * Grams
    grams = [[Conversion.factor * Quantity for Conversion in Conversions if
        Conversion.from == Unit && Conversion.to == "grams"],
        [Density.factor * Quantity for Density in Food_densities if
        Density.per_what == Unit && Density.units == "grams"]].coalesce
    cost = Ingredients.cost.sum
    cost_per_serving = Cost / Servings

    # Search for a recipe to use up the leftovers in the fridge:

    ({ingredients}, ingredient).contains_ingredient = 
        [I in Ingredients if I.ingredient == Ingredient].any
    [Term, ...Terms].search_results = 
        [Recipe for Recipe in Recipes if 
        [(Recipe, Ingredient).contains_ingredient 
            for Ingredient in [Term,...Terms]].all]

Most of these should be no harder to write in Bicicleta, but being
able to define methods "out-of-line" like this might have its
advantages.

Efficiency
----------

For a given set of method definitions, the set of methods that apply
to a particular object can be mostly statically computed from the set
of methods that are assigned values for that object; we can call that
set the "base object shape".  A finite program contains a finite set
of base object shapes, at most one for each object literal in the
program, and probably usually a lot less, so you can precompute most
of the applicable method definitions for each object shape.

It is possible to define methods that exist on some objects of a
particular shape, but not others.  For example, if we have only this
definition:

    [N, M, ...Lst].show_atom = N.show_num + " " + [M, ...Lst].show_atom

then some objects of shape {car, cdr} will have show_atom defined,
while others will not, depending on what their cdr is; and some of
those that have it defined will have an error when they try to call
show_atom on their cdr.  (If we take the suggestion from earlier that
we augment the pattern on the left-hand side so that the call on the
right-hand-side cannot fail, then we end up with a pattern that
matches only infinite lists.)

Adding this definition helps matters a bit:

    [N].show_atom = N.show_num

Now any object of {car, cdr} whose cdr is either of shape [] or {car,
cdr} has a match.  It seems plausible that some kind of type inference
might be able to move this kind of pattern-matching to compile-time,
leaving only a single conditional branch to run-time.

Effectively, this proposal suggests inferring a set of classes in a
system from a set of objects and method inference rules, in order to
be able to use efficient lookup methods.

Some kind of "cut", as in Prolog, is probably also necessary.  In much
of the above, I've assumed that only the most specific method
definition ever applies, which is a kind of "cut".

Connection To SnikiSniki
------------------------

In Darius Bacon's SnikiSniki, you can make little tables out of little
conjunctions of clauses, which Prolog-style look for solutions in the
triple database; if you say

    [[Person parent Parent, Parent parent Grandparent]]

then you get a table of people with their parents and grandparents.
In SnikiSniki, this doesn't create a new "grandparent" relationship,
but you could imagine that it could.

You could express something like the above in this approach as
follows:

    {parent: {parent: Grandparent}}.grandparent = Grandparent

Most pattern-matching languages (OCaml, Haskell, etc.) have some kind
of "as" feature that lets you give a name an intermediate part of the
pattern-matching tree, so that you don't have to write

    [N, M, ...Lst].show_atom = N.show_num + " " + [M, ...Lst].show_atom

and can instead write

    [N, ...[M, ...Lst] as Rest].show_atom = N.show_num + " " + Rest.show_atom

(the point of the M here is to ensure that there are at least two
elements in the list --- we don't want a space at the end of the
list's show_atom.)

With this feature, you could write

    {parent: {parent: Grandparent} as Parent}.grandparent 
        = (Parent, Grandparent)

and get a closer match to the SnikiSniki semantics.

I have often thought that SnikiSniki would be dramatically more
powerful if you could define object property inference rules like
this, and more convenient if you could leave some columns out.
However, SnikiSniki's expressions are strictly more powerful, because
they allow pattern-matching of arbitrary graphs, not just DAGs:

    [[Selfdealer owns Foundation, Foundation givesGrantsTo Selfdealer]]
    [[Doctor caresFor Patient, Nurse caresFor Patient, 
        Doctor is doctor, Nurse is nurse]]
    [[Person parent Mother, female isSexOf Mother]]

I suspect that for cases where the QBE-like equational rewriting
approach applies, it is generally terser and easier to understand.

You could gain back the expressive power of SnikiSniki where it's
needed, at some additional cost to terseness and readability in these
cases, by defining the patterns with unification (as SnikiSniki does)
and allowing multiple object-tree fragments on the left side:

    {owns: Foundation} as Selfdealer,
        ({givesGrantsTo: Selfdealer} as Foundation).self_dealer = Selfdealer
    {caresFor: Patient, is: doctor} as Doctor,
        {caresFor: Patient, is: nurse}.works_with_doctor = Doctor
    female.isSexOf = Mother, {parent: Mother}.mother = Mother

These last examples rather imply that the attributes are all
multivalued, which is clearly part of Sniki but only potentially part
of an equational rewrite system on properties.

Tue, 20 Mar 2007

Dave Long writes:
> > Bytecode interpreters can offer unsurpassed code compactness, ...[This  
> > box has] only 2MiB of L2 cache, and presumably something like
> > 16-64KiB of L1 cache.  Thrashing the cache is soundly punished.
> 
> One problem, as you point out earlier, is that the question is not
> size of native code loop vs. size of bytecoded loop, but size of
> native loop vs.  size of (bytecoded loop + its working path through
> the bytecode interpreter).
> 
> I haven't used MSVC in ages, but their compiler used to have the
> option to compile to a bytecode -- still, I think this was only ever
> useful for space, not for speed.  (for that, one must try to arrange
> the functionality to ensure only the outermost loops are being
> interpreted)

The way I remember Joel Spolsky telling the story, when he was on the
Excel team, they maintained their own C compiler specifically in order
to be able to compile to bytecode --- which was crucial to their
ability to compete on the memory-constrained machines of the early
1990s.  (I think this period may have included the RAM bubble, where
memory stayed at $40 a meg for four years.)  Maybe the feature was
integrated into MSVC so the Excel team wouldn't have to maintain their
own C compiler any more.

I have a vague memory that some of the Symbolics folks said that when
they reimplemented their 36-bit CPU as an Alpha AXP program
("OpenGenera"), it fit in the L1 cache of the Alpha, and it ran Lisp
programs about as fast as you'd expect microcode to run on a machine
with that kind of clock speed.

> > I know of no other reason to implement an interpreter using bytecode.
> >
> > So I'm surprised it's such a popular thing to do!  I think the reason
> > is probably that code space and compilation time used to be quite
> > precious resources (not to mention portability), and programmers just
> > haven't adjusted to the new realities.
> 
> Architecture Neutral Debugging Format.

Good point!

> Generating native code is not that difficult -- at least it doesn't
> take much to generate code that will beat an interpreter.  Debugging
> native code, however, can be a real pain (one either has to
> understand the debugging facilities of both the processor and the
> OS[0], or one has to understand the debugger's favorite symbol
> format and provide a decent map to it -- for bass, I was able to
> provide some simple gdb macros as long as the only register used was
> a TOS cache, but that broke immediately after providing register
> allocation)

Maybe this accounts, in part, for the great interest in static
correctness proofs in the functional programming community?  OCaml has
a reportedly very nice debugger that works only with its bytecode, not
with its native code, and they simply try very hard to ensure that
they have the same behavior so you never have to debug the native
code.  Having a static type system that permits type erasure without
loss of safety helps with this.

It's too bad gdb hasn't acquired a decent scripting language, all
these years after the Plan9 paper (was it "Acid: a debugger built from
a language"?)

If I remember correctly, bass started out using ColorForth as a basis?
I'm afraid it's been a number of years since I looked at it.  Is there
a place to see it these days?

> [0] it is somewhat instructive to look at DEBUG.COM (an artifact of
> 1980's era PCs still shipped with Redmond OS's to this day) for an
> example of how spartan an IDE can be.

There's the apocryphal story of Leonard Zubkoff implementing a
debugger in a weekend for a new platform, rather than waiting weeks
for someone else to do the job; I wonder how much more or less his
debugger contained than DEBUG.COM.

> Usually one wants an assembler to remember labels, and a debugger to
> remember breakpoints.  DEBUG.COM will turn opcodes into machine
> language, but you're on your own for labels (leading to a style in
> which one tries to minimize entry points, turning code into a
> collection of "loops with tails" and requiring utilities similar to
> BASIC's RENUM)

You can avoid needing RENUM by patching in your additions with jump
instructions, exacerbating the label problem.

What do you mean, "loops with tails"?  You mean that you don't tend to
factor the code into a lot of small functions?

I wonder why DEBUG couldn't rememember labels or breakpoints.
FIG-Forth ran on the same hardware and remembered the names of every
routine in the system.  Maybe DEBUG had to fit into a tiny static
corner of a 64K memory segment, to avoid limiting its applicability to
small programs?  But it isn't restricted to disassembling or debugging
tiny-model programs.

> Similarly, DEBUG.COM will handle all the machine level hassle of
> setting breakpoints in code, taking the interrupt, and then
> restoring the original code -- but you type in the list of
> breakpoints at each step.  Once upon a time, people actually
> developed apps with these bear skins and stone knives.

An acquaintance entered an ACM programming competition, back in the
early 1990s; the programming tools available on the competition
machine included Pascal compilers, but no compiler for any language he
knew well.  So he tried creating his entries in DEBUG, and he seemed
to be doing OK for a while, but there was a problem when the judges
asked to see the source code.  They were not impressed when he
disassembled the object code.

> [1] bytecode also makes "eval" almost as trivial as in assembly.  Too much  
> dynamism can get one in trouble, though.  Consider the "house of cards"  
> cartoon on p.112 of the Smalltalk green book: "here, let me modify Array  
> At:"
> http://www.iam.unibe.ch/~ducasse/FreeBooks/BitsOfHistory/BitsOfHistory.pdf

Added to my list of things to download next time I get online.

> In that situation, bytecode can be the difference between hosing a
> process and triple-faulting a box.

Have you been following Ian Piumarta's Pepsi/Coke/Jolt/Albert work?
He's very much focused on questions like how to allow you to modify
Array>>#at: at runtime without breaking everything.  (Of course, if
your modification is broken, you will still have problems...)

    http://piumarta.com/pepsi/objmodel.pdf  (the most comprehensible)
    http://piumarta.com/papers/colas-whitepaper.pdf
    http://stanford-online.stanford.edu/courses/ee380/070214-ee380-300.asx
    (I haven't seen that one yet)

There are some more details and some broader context in

    http://www.vpri.org/html/work/NSFproposal.pdf
    http://www.peerbox.com:8668/space/Albert
    http://irbseminars.intel-research.net/AlanKay.wmv

I'm not sure that the difference between hosing a process and crashing
a box (triple-faulting?) necessarily lies in bytecode versus native
code; if the Array>>#at: implementation is shared among a lot of
different processes, the machine is still going to be fairly useless
if you break it, while if each process has its own memory space and
(at least potentially) its own Array>>#at:, recovery may be easier.

The Art of the Metaobject Protocol, which Andrew Cooke was kind enough
to lend me when we visited him last month, talks a bit about the
metastability issues involved in implementing CLOS within CLOS ---
mostly ensuring that various recursions have a base case.  I think Ian
Piumarta's approach to the problem looks simpler, though.

> > The OCaml version is 24 instructions, 8 of which have immediate
> > constants.  I don't know very much about PowerPC assembly, but let's
> > suppose that every instruction is 32 bits, including any immediate
> > constants; that means the whole function weighs 96 bytes.
> 
> Using gas or "objdump -D -b binary" would help if you want more accurate  
> numbers.  If I remember properly, you're correct about instructions, but  
> not necessarily immediates.

I don't know where to get gas or objdump for this Mac without four
gigabytes of other development environment cruft, which is part of why
I've been using OCaml.

> > [the MuP21] executes a stream of 5-bit zero-operand two-stack
> > operations packed into 20-bit words.
> 
> For more along these lines, see http://www.jwdt.com/~paysan/b16.html

That's a fascinating page!  In the b16-small sources listed there,
there's a LyX paper for EuroForth 2004, which seems to mention a new
Forth core from Chuck Moore called the "c18", described in a paper
cited as "c18 ColorForth Compiler, Chuck Moore, 17th EuroForth
Conference Proceedings, 2001".

I also think that's the first 8-page conference paper I've ever seen
that includes a complete and working (?) synthesizable CPU design (in
noweb and Verilog) in the text of the paper.  (It is synthesizable,
isn't it?  My Verilog is not up to snuff.  The actual Verilog seems to
be about 240 lines.)

It's too bad that he doesn't report on how much FPGA space it needed
or how fast it ran.

> But these sort of games are better played in hardware than in
> software -- hardware is great at parallel things (instruction
> decoding) and pretty weak at serial things, while software is the
> opposite.  (this was a traditional reason for bytecode -- to
> minimize work for dispatch) FSMs seem to be where the two meet: to
> add behavior to hardware (think microcode) or to add
> parallel-pattern-matching to software (think parsers), one tends to
> wind up synthesizing state machines.

That sounds like a pretty fundamental insight.

I wonder if the Symbolics microcode was horizontal?  If they used
vertical microcode while the Dorado was horizontal, maybe that might
explain the difference between their perception of Moore's Law and
Alan Kay's --- Kay consistently complains that modern CPU
architectures run Smalltalk and other late-bound things slowly, which
is sort of the opposite of the apocryphal Symbolics comment I cited
earlier.

Aristotle Pagaltzis writes:
> * Kragen Javier Sitaker <kragen at pobox.com> [2006-11-11 09:37]:
> > # Generate a really big page of small inline JPEG images from a
> > [fugly code omitted]
> A cleaner version that crawls the filesystem without external aid:
> [most of cleaner version omitted too]
>         open my $fh, '<', $filename or die "Can't open $filename: $!";

You don't like use Fatal?  :)

> [rest of cleaner version omitted]

I agree --- that is considerably easier to read!

> This code uses a minor trick: the <> operator will open and read
> all files listed in @ARGV sequentially, so the code stuffs the
> names of the data files of interest into that array, then turns
> them into URL file names, then uses the diamond operator to read
> them.

Yeah, I think that part would have taken me a while to figure out
without the explanation --- although I've used while (<>) a thousand
times, I've never used it to read a bunch of filenames the Perl
program itself had stuffed into @ARGV.  I suppose it would eventually
have dawned on me that you intended the stuff going into @ARGV to get
used for something eventually, and that the "while <>" wouldn't do any
good if it was trying to open and read the directory
/var/cache/wwwoffle/http.

> But the punchline is the implementation of this same thing in
> terms of the File::Find::Rule module from the CPAN:
>     #!/usr/bin/perl
>     use strict;
>     use warnings;
>     use File::Find::Rule;
>     # perl this-script /var/cache/wwwoffle/http > tmp.html
>     @ARGV = File::Find::Rule
>         ->name( 'D*' )
>         ->file
>         ->grep( qr/ \A Content-Type: [ ]* image/jpeg \r\n /ix )
>         ->in( @ARGV );
>     s[ \A (.*) /D ]{ $1 . '/U' }ex for @ARGV;
>     print qq(<img src="$_" width="128" height="128" />\n) while <>;
> You can see that F::F::R is the bee’s knees.

Now that, that is quite remarkable.

* Richard Maxwell Underwood <ru at river.org> [2007-03-11 22:00]:
> What perl program does not use tricks?
> 
> Perl programmers are tricksy :)

Hehe. :-) Actually I use such tricks only when they really buy a
significant amount of conciseness, which usually requires that
the rest of the code be pretty short in relation to the tricksy
code. Such tricks generally reduce code length by increasing
coupling along some axis, so in modules or larger programs, it is
more sensible to make the part in question more verbose but also
amenable to abstraction and decoupling.

-- 
*AUTOLOAD=*_;sub _{s/(.*)::(.*)/print$2,(",$\/"," ")[defined wantarray]/e;$1}
&Just->another->Perl->hack;
#Aristotle

Mon, 19 Mar 2007

* A. Pagaltzis <pagaltzis at gmx.de> [2007-03-12 14:50]:
> Unfortunately, I find it runs significantly slower than the
> Perl version regardless of which Parrot “runcore” I use and
> whether or not I enable optimisations. The CGP core is the
> fastest (faster than the JIT core) at ~8.6s user time on my
> system whereas the Perl version runs in ~6.3s (per bash’s
> `time` builtin).

I forgot (for the sake of future reference): this comparison is
between Perl 5.8.8 and Parrot 0.49.

The perl is from the Slackware package, ie a stock compile
without patches, configured non-threaded and with conservative
optimisation, but no other chanegs to the defaults.

Parrot is a stock compile configured with optimisations but no
other changes to the defaults.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>

Kragen Javier Sitaker writes:
>                                                            It seems
>to me that lazy evaluation only requires whole-program analysis when
>you want to know how fast something is or how much memory it needs,
>while mutable variables potentially require whole-program analysis to
>know what value some function computes.  (There are various ways to
>limit the scope of analysis required, of course, so whole-program
>analysis is not always needed.)
>
>To the kind of functional programmer who "knows the value of
>everything and the cost of nothing," the above paragraph should make
>it clear that lazy evaluation is superior.  But, myself, I'm often
>just as interested in the speed of my code as in whether it's correct,
>partly because I've found that it's often easier to make incorrect
>code correct than to make slow code fast.

What Kragen writes here is a good two-paragraph summary of the 
advantages and disadvantages of the functional programming paradigm.

I, too, am frustrated by the lack of control over characteristics
like execution time and have engaged in (amateur) research to try
to preserve some of the benefits -- particularly, referential
transparency -- of functional programming while allowing more of
the control over execution time available to imperative
programmers.  (The research has arrived at a style in which
programs use the IO monad very heavily.)

Richard

Benjamin Sergeant writes:
> Follwing this post:
> http://lists.canonical.org/pipermail/kragen-hacks/2006-November/000441.html
> 
> I just gave a try to the emailer and am very interested in it. I am a
> pine user that would like to hook some more features, and hacking old
> C code from pine does not look like a cool option.

That's great!  I'm glad it's turning out to be useful to more than
just me.

I think pine additionally has the problem of the dispute about its
legal status.

I haven't been using it for the last couple of months because somebody
privatized my laptop while I was sitting in a café, so I've been using
my wife's --- but it doesn't have space for my current mailbox.
That's one reason I'm so far behind on my mail :)

> I created a project on google hosting about that:
> http://code.google.com/p/bine/

Cool!

> With some features or thought on an email client I'd like:
>  * LDAP
>  * easy gpg signing
>  * using an external editor
>  * better bogofilter integration
>  * send mail from command line (like sendmail)
>  * integrated smtp
>  * unicode handling
>  * contact completion (a la bash)

Those would all be awesome; the external editor and contact completion
are probably things I would use myself.  I did write some unicode
handling for it, which I think was probably in the version you looked
at, but I guess you had something additional in mind?

> Also I wasn't able to start the emacs composer that you are talking
> about.

Oh, the "r" command dumps draft replies into the file "tmp.replies",
delimited with ~m and ~e sequences, for use with some other
very-low-rent mail infrastructure I use (basically to avoid the hassle
that is SMTP these days).  I just edit the outgoing mail batch in
fundamental-mode in Emacs.

I'd really like to have a better way of doing replies, but I also
really like that all my outgoing mail is in a single Emacs buffer
(within a single batch, I mean).

> And my terminal is an iso-latin so some characters have a bad look
> (convert to unicode ?)

Well, I think non-UTF-8 encodings are evil, since they create the
necessity for encoding conversions, but I recognize that not everybody
agrees with me :)

It should probably be fairly straightforward to change the recoding
from utf-8 to iso-latin-1 or whatever you use.  Maybe it could be set
from some environment variables?

> A cool feature would also be to span a program associated with the
> attachment mime type when parsing attachments.
> For HTML mails, one solution may be to use lynx -dump.

Yeah, for my own use I mostly just want to save the files, but maybe
that wouldn't be such a bad idea in general.

> And I'd also wonder if your program could switch from one mailbox to
> another, by hitting the left arrow one more time from the thread mode.

I'd like to be able to display mail from multiple mailboxes all
together in the same screen, actually, so I can search through all of
it at once, and see mail I sent and mail I received all together ---
no doubt you know the value of this feature from using Gmail.

> In his main
>    signal.signal(signal.SIGWINCH, self.handler_resize)
> 
> With this handler:
>    def handler_resize(self, sig, frame):
>         # curses trickery
>         while 1:
>             try: curses.endwin(); break
>             except: time.sleep(1)
>         self.w.refresh()
>         self.win_root.resize()
>         self.win_root.update()
> 
> Maybe it can help.

Awesome!  Thanks!  I think it will.

> I see that you are using darcs, is there a way to clone your
> repository if I want to try to work on your code.

Um, I'm not sure if I've put it on the web yet!  Stupid oversight.  As
you probably already know, "darcs get URL" will clone a repository
that's on the web.  I guess I will put it at
http://pobox.com/~kragen/sw/repos/cursmail/ --- any suggestions for a
cleverer name?

Shae Erisson writes:
> "Richard Underwood/Uhtenwoldt" <ru at river.org> writes:
> > [quoting Kragen Javier Sitaker]
> >>   A simpler syntax with fewer levels of precedence.  Maybe
> >>   writing OCaml in S-expressions would be going too far, or maybe
> >>   not.  Clearly this would make it effectively a different
> >>   language.
> >
> > Do you know enough about Haskell to have a definite opinion about its
> > syntax?

Not really.  My vague opinion about its syntax is that it appears to
be a simpler syntax with fewer levels of precedence, and the
indentation thing sounds good.

> In my opinion, Haskell has advantages over OCaml in syntax,
> typeclasses, and community. I'd be happy to elaborate on any of
> these points if there's any interest.

I'm interested in your opinion especially on the syntax; I think I
know about the other two.

> OCaml has advantages over Haskell in ability to decrease execution
> time without much effort, but at the expense of abstraction.
> 
> In short, if you know C, you can make a roughly equivalent OCaml
> program that's anywhere from not quite as fast to faster than any
> given C program.

>From my limited OCaml experience so far, this seems to be true, the
OCaml code tends to be a great deal shorter (because more abstract)
and less likely to crash than the C.

> Haskell is, in my opinion, better at abstractions than OCaml, but at
> the expense of requiring mental whole program analysis in order to
> understand non-strict evaluation. Also in my opinion, mutable
> variables require more whole program analysis than non-strict
> evaluation, but I don't know how many people agree with me.

Maybe it depends on what you want to know about the program?  It seems
to me that lazy evaluation only requires whole-program analysis when
you want to know how fast something is or how much memory it needs,
while mutable variables potentially require whole-program analysis to
know what value some function computes.  (There are various ways to
limit the scope of analysis required, of course, so whole-program
analysis is not always needed.)

To the kind of functional programmer who "knows the value of
everything and the cost of nothing," the above paragraph should make
it clear that lazy evaluation is superior.  But, myself, I'm often
just as interested in the speed of my code as in whether it's correct,
partly because I've found that it's often easier to make incorrect
code correct than to make slow code fast.  It depends on the
circumstances.

Richard Underwood writes:
> So, write your unit test as
>     assert ([45; 50] = eval (atom [45; 50]));
>     atom x = Atom x
> Or am I missing something basic?

No, that's clearly what I should do, and it's just a
search-and-replace on the existing code.  I guess I was just annoyed
at how I inadvertently ended up committing to more than I knew when I
wrote the original unit tests, but a search-and-replace isn't that big
a cost to cancel that commitment.

Aardappel is a very inspiring language that has some important things
in common with Bicicleta:

- Both languages use a simple tree structure.
- There is no fundamental distinction between functions and data
  structures.
- Function arguments have example values.
- Consequently, the program is normally never unrunnable while
  editing, so you can evaluate any subexpression of your program
  in-place.
- Editing normally takes place by dragging bits of the program around
  with your mouse to make references to or copies of them.

But there are some differences:

- I'm not entirely satisfied that I know what Bicicleta's imperative
  side is going to look like.  Aardappel's approach to state (and
  concurrency!) preceded the rest of the language design, and is the
  most interesting part of the language.
- Subexpressions of an Aardappel program are not evaluated
  automatically while it is being edited; you have to ask for an
  expression to be evaluated explicitly.
- Aardappel uses positional parameters for many things; Bicicleta
  generally uses named parameters instead.
- You can't walk through the evaluation of an Aardappel expression
  step by step.
- Aardappel entirely rejects names for variables, while Bicicleta uses
  them extensively.
- Aardappel uses unification in place of conditionals and consequently
  can use eager evaluation; Bicicleta uses lazy evaluation and
  polymorphism in place of conditionals, and does not support
  unification or even pattern-matching.  It's possible that this is a
  mistake on Bicicleta's part.
- In Aardappel, while there isn't a fundamental distinction between
  functions and data structures, it generally isn't possible for a
  single object to be both.  In Bicicleta, it is.  This is less of a
  difference than it sounds like --- if you have an object X that's a
  data structure and you want to use it as a function, in Bicicleta
  you call a method on it, such as X() or X.width or something, and in
  Aardappel you would say value X or width X.  The difference is that
  Bicicleta lets you do this to any object, even one originally
  intended for use as a function.
- Aardappel doesn't have much in the way of separate namespaces.
  Bicicleta has so many namespaces you wouldn't believe it; a two-line
  dumb recursive factorial pogram creates five namespaces.
- Aardappel has an existing implementation that produces good code.

Sat, 17 Mar 2007

A few years ago I reinvented Naor and Shamir's visual one-time pad
system, although I didn't get all the way to their k-of-n secret sharing
scheme.  The interesting thing about this scheme is that it allows you
to decrypt the image by placing the key on top of the ciphertext and
looking at it --- no further computation is needed.

It occurred to me that it was possible to extend the scheme to grayscale
and thus make it easier to coregister the two images properly --- by
making the pixels bigger --- without losing perfect security.  I think a
bunch of other people also have discovered this, but I don't know which
ones yet.

Here's an insecure proof-of-concept.

It's necessary for perfect security that the random numbers have enough
resolution to erase any trace of the original numbers from the encrypted
image.  Quantizing the original numbers to 8 bits would allow you to get
by with 8 bits of randomness per pixel.

It's also necessary to use real randomness, not a PRNG like
random.random().

I wrote this on Bea's laptop, which doesn't have PIL, so I coded an
ad-hoc BMP file reader for the front end.

#!/usr/bin/python
# Read a BMP file and turn it into PostScript of multiple pages that
# contain the image encrypted.
import sys, random, struct
def decode_bmp_header(header):
    "Decode the header of a 24-bit BMP file."
    # here's the entry from /etc/magic I mean /usr/share/file/magic
    # # PC bitmaps (OS/2, Windoze BMP files)  (Greg Roelofs, newt at uchicago.edu)
    # 0	string		BM		PC bitmap data
    # >14	leshort		12		\b, OS/2 1.x format
    # >>18	leshort		x		\b, %d x
    # >>20	leshort		x		%d
    # >14	leshort		64		\b, OS/2 2.x format
    # >>18	leshort		x		\b, %d x
    # >>20	leshort		x		%d
    # >14	leshort		40		\b, Windows 3.x format
    # >>18	lelong		x		\b, %d x
    # >>22	lelong		x		%d x
    # >>28	leshort		x		%d

    # I think that means that at byte 14 there is a little-endian
    # short 40, at byte 18 there is a little-endian long for width, at
    # byte 22 there is a little-endian long for height, and at byte 28
    # there is a little-endian short for depth.
    # On a particular 333-pixel-wide, 500-pixel-high, 24-bit BMP, I get:
    # PC bitmap data, Windows 3.x format, 333 x 500 x 24
    (version,) = struct.unpack('<h', header[14:16])
    assert version == 40, version
    (width, height) = struct.unpack('<ll', header[18:26])
    (depth,) = struct.unpack('<h', header[28:30])
    assert depth == 24, depth
    return width, height

def read_bmp(fname):
    """Return the contents of a 24-bit BMP file as nested Python lists.
    
    Actually takes the average of the R, G, and B components, rather
    than returning them separately.
    """
    infile = file(fname)
    # my sample image has a 56-byte header; I hope that's true of all BMPsxs
    header = infile.read(56)
    width, height = decode_bmp_header(header)
    rv = []
    for ii in range(height):
        row = []
        for jj in range(width):
            pix = infile.read(3)  # assume 24-bit BMP
            rgb = [ord(c) for c in pix]
            avg = float(sum(rgb)) / (255*3)
            row.append(avg)
        rowbytes = width * 3
        padbytes = 4 - (rowbytes % 4)
        if padbytes == 4: padbytes = 0
        infile.read(padbytes)
        rv.append(row)
    return rv
    return [[float(ii)/width for ii in range(width)]] * height

def print_postscript(image):
    "Obsolete routine for testing --- prints a graycale PostScript image."
    print """%!
    % total dumb-ass PostScript to draw an image pixel by pixel
    /pix { setgray gsave 0 1 rlineto 1 0 rlineto 0 -1 rlineto closepath fill
           grestore 1 0 rmoveto } bind def
    /row { grestore 0 1 rmoveto gsave } bind def
    10 10 translate 1.5 dup scale 0 0 moveto gsave
    """
    for row in image:
        for pixel in row:
            print "%f pix" % pixel
        print "row"
    print "grestore showpage"

def make_pad(image, randomsource):
    # XXX note that this is not cryptographically secure without using
    # a cryptographically secure source of random numbers
    return [[randomsource() for pixel in row] for row in image]

def pad_image(image, pad):
    return [[(0.5 - image[ii][jj] * 0.5 + pad[ii][jj]) % 1 
             for jj in range(len(image[0]))]
            for ii in range(len(image))]
def print_postscript_boxy(images):
    """Prints the images given on top of each other in an obscured format.

    Each number gives the horizontal offset to provide to the empty
    part of a half-filled square.  Thus overlaying two images will
    show the absolute difference of their pixels mod 1, as long as it
    does not exceed 0.5.

    """
    width = len(images[0][0])
    height = len(images[0])
    maxscalex = 8.0 * 72 / width
    maxscaley = 10.5 * 72 / height
    scale = max(maxscalex, maxscaley)
    print """%!
    % Draw half-filled boxes for pixels with the fill being a vertical
    % half of the box, but rotated to the right by a specified amount.
    /pbox {
            dup 0.5 lt {  % draw two black boxes
                gsave dup 0 rlineto 0 1 rlineto
                      dup neg 0 rlineto closepath fill grestore
                gsave dup 0 rmoveto 0.5 0 rmoveto dup 0.5 exch sub dup 0 rlineto
                      0 1 rlineto neg 0 rlineto closepath fill grestore
            } {  % else draw one black box
                gsave dup 0.5 sub 0 rmoveto 0.5 0 rlineto 0 1 rlineto
                      -0.5 0 rlineto closepath fill grestore
            } ifelse pop
            1 0 rmoveto
    } bind def
    /row { grestore 0 1 rmoveto gsave } bind def
    18 dup translate
    """
    print "%f dup scale" % scale
    for image in images:
        print "0 0 moveto gsave"
        for row in image:
            for pixel in row:
                print "%f pbox" % pixel
            print "row"
        print "grestore"
    print "showpage"

if __name__ == '__main__':
    image = read_bmp(sys.argv[1])
    pad = make_pad(image, random.random)
    padded = pad_image(image, pad)
    print_postscript_boxy([pad])
    print_postscript_boxy([padded])
    #print_postscript_boxy([pad, padded])

Thu, 15 Mar 2007

* Kragen Javier Sitaker <kragen at pobox.com> [2007-02-26 09:40]:
> The advantage of mandatory information hiding is that this task
> becomes simpler. If I'm changing the semantics of a private
> method in C++, I need only look within that method's class for
> calls to the method; I don't need to look anywhere else.
> 
> I have occasionally wished for this feature in Perl on a
> 100 000-line project.

This is indeed a weakness. In Perl circles, a common slogan is
that Perl prefers that you stay out of someone else’s living room
because it’s the polite thing, not because they’re standing there
with a loaded shotgun. When I first heard this, I eagerly agreed
with it and scoffed at the notion that mandatory controls were
necessary. Just use a convention, they said, that lets people
know which things are private; if they insist on going there
anyway they’ll eventually get what they asked for.

I still agree with the basic stance. It falls broadly in the
scope of “freedom vs. safety” (as Kevin Barnes places it). Stated
in the absolute the way Perl folks like to put it, however, is
naïve. The problem isn’t how to keep away the people that won’t
stay away; it’s how to keep yourself from spilling into all of
your surroundings. F.ex., because it’s not possible to have a
private method in Perl, only private-by-convention methods, the
author of any subclass must be aware of the names of the private
methods in all of its superclasses, so as not to inadvertently
redefine one of them. This is not just a theoretical scenario,
since subclasses are closely related to their superclasses. So
they may well have private methods for related things, which may
then well suggest the same name.

Of course, Perl being Perl, you can actually get semantics out of
it that it doesn’t initially seem to provide, if you work for
them. This is much like getting useful OOP out of the language,
only in this case, the solution is straightforward: use anonymous
functions stored in file-scoped lexicals for private stuff.

    # canonicalises the parameters of public methods
    # based on some object-internal state
    my $check_foo_flag = sub {
        my $self = shift;
        # ...
    };

    sub do_some_public_operation {
        my $self = shift;
        my ( $foo, $bar, $baz ) = $self->$check_foo_flag( @_ );
        # ...
    }

Note that using the `$invocant->$subref()` syntax disables
dynamic lookup – which happens to be exactly what we want for
private methods.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>

Summary
-------

My OCaml-hosted Bicicleta interpreter is now to the point that it can
run programs with integers and booleans, but it's about 8000 times
slower than the OCaml interpreter at doing simple arithmetic, and it
leaks huge quantities of space, apparently about a byte per method
call in my microbenchmark.  I need to come up with a solution to the
excessive space usage if the system is going to be usable, even for
experimentation, and I may need to make it faster.

The interpreter is under 400 lines of OCaml, not counting unit tests
and some 250 lines of code in the Bicicleta language itself.

Without Attribute Caching
-------------------------

So I thought I'd try the dumb fib microbenchmark to get a vague idea
of how slow the current, tree-reduction-based OCaml implementation of
the Bicicleta language is.  Here's the OCaml version:

    let rec fib n = if n < 2 then 1 else fib (n-1) + fib (n-2) in
    print_int (fib 32) ; print_newline() ;;

Byte-compiled, in PowerPC emulation, that runs at about 5 million
base-case recursions per second.  (Since it adds together all the
values produced by the base-case recursions, and all those values are
1, the return value is the number of base-case recursions.  The total
number of fib calls is one less than twice the number of base-case
recursions.)

Here's a version in Bicicleta's language:

    {env: fib = {fib: arg1 = 3
        '()' = (fib.arg1 < 2).if_true(then = 1,
            else = env.fib(fib.arg1-1) + env.fib(fib.arg1-2))}
    }.fib(16)

That runs about about 120-140 base-case recursions per second, running
on top of the OCaml implementation mentioned earlier, and it seems
to take time roughly linear in the number of base-case recursions.

That's slower by about a factor of 44000 than its host interpreter,
which is a lot.  It's still fast enough that you could use it for
small experiments.  There's some headroom (about a factor of 10 or 20)
above the OCaml implementation that I can probably take advantage of
by using ocamlopt and not doing CPU emulation.

The basic plan was as follows:

- spend as little effort as possible to write a very minimal
  implementation in some language with an implementation that already
  runs, providing as few primitives as possible.  The current
  implementation is 372 non-blank non-comment non-unit-test lines of
  OCaml, ocamllex, and ocamlyacc, and it can now run "hello world"
  style programs like the one above.  It's still missing some
  features, especially introspective and imperative ones.  This has
  taken me two weeks, so I'm glad I didn't start out with a bigger
  piece of the project.

- write an IDE for the language in itself.  This doesn't have to be
  anything fancy, but it needs to be enough that I start to experience
  the benefits supposedly accruing to the spreadsheet-style UI enabled
  by the language semantics.

- write a compiler in Bicicleta for itself, so that we can generate C
  or JavaScript from some version of the Bicicleta metacircular
  interpreter.

- compile a native-code version of the IDE and compiler using the
  metacompiler.  Ideally this version will support dynamic compilation
  to machine code without restarts, reducing the performance problems
  inherent in the language's difficult semantics to a tolerable level,
  allowing focus on the next task:

- polishing the IDE and tailoring it to particular application areas.

But it looks like maybe things are slow enough that I'd better put in
a little more work on the first step before proceeding any further.  I
tossed in a few lines of code to count method calls by name, and
here's what I found, on exactly the code above, which returns 1597:

    sys: 326595
    arg1: 313819
    (): 252596
    userdata: 140008
    object: 110994
    normal_commutative_number: 106203
    native_integer: 106203
    intrinsics: 106203
    new: 66013
    result: 36997
    other: 36997
    op: 36997
    coerce: 36997
    binop: 36997
    as_integer: 36997
    arg2: 36997
    add: 36997
    +: 36997
    !!: 36997
    integer_add: 33804
    subtract: 32208
    negated: 32208
    integer_negated: 32208
    -: 32208
    true: 6386
    if_true: 4789
    less_than: 3193
    integer_less_than: 3193
    fib: 3193
    bool: 3193
    <: 3193
    false: 3192
    then: 1597
    else: 1596
    native_string: 2
    show: 1
    integer_show: 1
    total: 2094769

So we're only calling fib 3193 times, which seems about right.  But we
end up calling prog.sys 326595 times and various arg1 things 313819
times, and so on, and doing about 2.1 million calls in all.  It's a
little strange that every call to 'fib' involves almost 12 calls to
'+'.

With Attribute Caching
----------------------

So this suggests that we can get a substantial speedup already by
caching object attributes, converting tree reduction into graph
reduction: maybe a factor of 100?

Well, I added the code to cache object attributes, and here were the
results:

    arg1: 52675
    (): 49484
    userdata: 23944
    sys: 19162
    intrinsics: 19155
    native_integer: 15963
    result: 7981
    other: 7981
    op: 7981
    new: 7981
    coerce: 7981
    as_integer: 7981
    arg2: 7981
    add: 7981
    !!: 7981
    +: 6385
    binop: 4789
    integer_add: 4788
    true: 3196
    if_true: 3194
    less_than: 3193
    integer_less_than: 3193
    fib: 3193
    bool: 3193
    <: 3193
    subtract: 3192
    negated: 3192
    integer_negated: 3192
    false: 3192
    -: 3192
    then: 1597
    else: 1596
    object: 3
    native_string: 2
    show: 1
    normal_commutative_number: 1
    integer_show: 1
    total: 309690

That's about a factor of 6.8 improvement on this microbenchmark, which
is noticeable but still not that great.  This adds up to:
- 16.5 accesses to arg1;
- 15.5 calls to '()';
- 2.5 calls to '!!', add, as_integer, coerce, new, op, other, and
  result;
- 2 calls to '+';
- and 1 call to '-'
per call to fib.

And it runs considerably faster, at 650 base-case recursions per
second, about 5 times faster.  A lot less than the factor of 100 I
hoped for!  But not bad for adding the 12 lines of code to cache the
results.

Some form of this optimization is absolutely necessary for any larger
program.

Memory Usage With Attribute Caching
-----------------------------------

However, it also took 100 MB of RAM to calculate fib(20).  Another six
lines of code to only allocate caches only for objects that use them
cuts it down to 70MB (and speeds it up very slightly), but 70MB is
still way too much --- that's for about 2.1 million calls.  It runs in
more or less constant space (hovering around 3.3MB for four minutes)
if I clear the cache immediately after putting each item into it,
which of course makes it run very slowly again, but it makes it clear
that it's the caches that are the problem and not, say, a lack of
tail-recursion.

The size of the problem suggests that it's hanging on to some amount
of stuff from every one of those 2.1 million calls --- not just the 20
that might be on the call stack at any one time.  This surprised me,
because I would expect expressions like the env.fib{fib.arg1-2} in
env.fib{fib.arg1-2}.'()' to become garbage fairly quickly --- it's an
anonymous class produced by inheriting from env.fib, and the only
thing we hold on to from it is its '()'.

Since this is just a cache-management problem, coming up with a
version that doesn't break stuff is easy, but performance
characteristics will be potentially very tricky.

Slowness Is Partly Due To Indirection
-------------------------------------

It's still about 8000 times slower than OCaml.

Looking at it a slightly different way, it's executing about 92000
method calls per second in order to perform the 650 base-case
recursions or 1300 calls to "fib" per second, while the OCaml and
other versions need only execute one function-call and return per call
to "fib".  Considered that way, it's only about 110 times slower than
its host OCaml, which is pretty much what you'd expect for an
interpreter.  It's just that building each call to "fib" out of 70
lower-level method calls ends up costing an additional factor of 70 in
performance.

So if, for example, you took out some of the library magic to support
multimode arithmetic and double dispatch and the very small primitive
set, it could get a lot faster.  Compare --- fib(20) on the normal
version:

    arg1: 361192
    (): 339303
    userdata: 164179
    sys: 131350
    intrinsics: 131343
    native_integer: 109453
    result: 54726
    other: 54726
    op: 54726
    new: 54726
    coerce: 54726
    as_integer: 54726
    arg2: 54726
    add: 54726
    !!: 54726
    +: 43781
    binop: 32836
    integer_add: 32835
    true: 21894
    if_true: 21892
    less_than: 21891
    integer_less_than: 21891
    fib: 21891
    bool: 21891
    <: 21891
    subtract: 21890
    negated: 21890
    integer_negated: 21890
    false: 21890
    -: 21890
    then: 10946
    else: 10945
    object: 3
    native_string: 2
    show: 1
    normal_commutative_number: 1
    integer_show: 1
    total: 2123396

    real	0m23.360s
    user	0m23.084s
    sys	0m0.252s

And a version where '+', '-', and '<' just go straight to a primitive
without any further layers of indirection, and we have a '-'
primitive:

    '()' = arg1: 186070
    (): 131345
    sys: 109460
    userdata: 109454
    native_integer: 87563
    arg2: 54726
    true: 32840
    new: 32836
    false: 32835
    if_true: 21892
    integer_less_than: 21891
    fib: 21891
    bool: 21891
    <: 21891
    integer_subtract: 21890
    -: 21890
    then: 10946
    integer_add: 10945
    else: 10945
    +: 10945
    object: 3
    native_string: 2
    show: 1
    normal_commutative_number: 1
    intrinsics: 1
    integer_show: 1
    total: 974155

    real	0m8.908s
    user	0m8.764s
    sys	0m0.131s

The normal version has 2.18 times as many calls and takes 2.62 times
as much time; that makes this version only about 3000 times as slow as
OCaml.  So this suggests that more symbolic work should suffer
somewhat less of a speed penalty than the fib microbenchmark
suggests.  Also, although I feel a little silly suggesting it at this
point given the gross ineffiency of the interpreter in general, that's
the kind of cost that polymorphic inline caches and the like can help
a lot with.

Could It Be Fast Enough Already?
--------------------------------

I'm kind of thinking that 1300 calls per second, plus a hypothetical
10x speedup from compiling OCaml to native code, makes it about the
same speed as the Bourne shell, far slower than Tcl.  I wouldn't
really want to develop a compiler written in the Bourne shell, but
that has more to do with the semantics of the language than with its
performance.

Here's a dumb fib in bash, coded to avoid forking:

    #!/bin/bash
    fib () {
        if [[ $1 -lt 2 ]] ; then rv=1 ; else
            fib $(( $1 - 1 )); set $1 $rv
            fib $(( $1 - 2 )); rv=$(( $rv + $2 ))
       fi
    }
    fib 25
    echo $rv

The bash version runs at about 7800 base cases per second; we can
expect the Bicicleta one to run at 6500 if the speedup is exactly 10x.

Vector Operations?
------------------

If I implement some APL-style vector operations as primitives, which
would probably be another 50-100 lines of OCaml, then Bicicleta
programs will get essentially native speed on vectorizable operations.
That would be great for interactive image processing and 3-D graphics,
but I'm not sure it would help with writing a Bicicleta metacompiler.

It also has the drawback that it requires that many more primitives
from future implementations of the language.  Right now, I'm not even
using an integer subtraction primitive --- I'm using the negation and
addition primitives instead.  In theory, I should be able to get all of
integer comparison and arithmetic in exchange for just add, negated,
multiply, divmod, power, and less-than.

Tue, 13 Mar 2007

"Richard Underwood/Uhtenwoldt" <ru at river.org> writes:

>>   A simpler syntax with fewer levels of precedence.  Maybe writing OCaml in
>>   S-expressions would be going too far, or maybe not.  Clearly this would
>>   make it effectively a different language.
>
> Do you know enough about Haskell to have a definite opinion about its
> syntax?
>
> (I think it is basically good though I would reduce the number of
> levels of operator precedence a lot.)

Haskell has been my primary fun language for about seven years, 
and my primary employment language for about half a year.

In my opinion, Haskell has advantages over OCaml in syntax, typeclasses, and
community. I'd be happy to elaborate on any of these points if there's any
interest.

OCaml has advantages over Haskell in ability to decrease execution time without
much effort, but at the expense of abstraction.

In short, if you know C, you can make a roughly equivalent OCaml program that's
anywhere from not quite as fast to faster than any given C program.

Another difference is that OCaml allows mutable variables, and Haskell forbids
them.

OCaml is a strict language, Haskell is non-strict according to the standard,
and most implementations use full laziness.

Haskell is, in my opinion, better at abstractions than OCaml, but at the
expense of requiring mental whole program analysis in order to understand
non-strict evaluation. Also in my opinion, mutable variables require more whole
program analysis than non-strict evaluation, but I don't know how many people
agree with me.

I'd roughly say that OCaml is the vi, and Haskell the emacs of FP :-)

I do not think that either language is *better*. I instead think that people
should try both and decide what fits their own approach to life.
-- 
I've tried to teach people autodidactism,                | ScannedInAvian.com
but it seems they always have to learn it for themselves.| Shae Matijs Erisson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 188 bytes
Desc: not available
Url : http://lists.canonical.org/pipermail/kragen-discuss/attachments/20070313/cadbaf5d/attachment.pgp

Mon, 12 Mar 2007

> Bytecode interpreters can offer unsurpassed code compactness, ...[This  
> box has] only 2MiB of L2 cache, and presumably something like
> 16-64KiB of L1 cache.  Thrashing the cache is soundly punished.

One problem, as you point out earlier, is that the question is not size of  
native code loop vs. size of bytecoded loop, but size of native loop vs.  
size of (bytecoded loop + its working path through the bytecode  
interpreter).

I haven't used MSVC in ages, but their compiler used to have the option to  
compile to a bytecode -- still, I think this was only ever useful for  
space, not for speed.  (for that, one must try to arrange the  
functionality to ensure only the outermost loops are being interpreted)

> I know of no other reason to implement an interpreter using bytecode.
>
> So I'm surprised it's such a popular thing to do!  I think the reason
> is probably that code space and compilation time used to be quite
> precious resources (not to mention portability), and programmers just
> haven't adjusted to the new realities.

Architecture Neutral Debugging Format.

Generating native code is not that difficult -- at least it doesn't take  
much to generate code that will beat an interpreter.  Debugging native  
code, however, can be a real pain (one either has to understand the  
debugging facilities of both the processor and the OS[0], or one has to  
understand the debugger's favorite symbol format and provide a decent map  
to it -- for bass, I was able to provide some simple gdb macros as long as  
the only register used was a TOS cache, but that broke immediately after  
providing register allocation)

Bytecode allows the debugger to control execution[1], and while bytecode  
constructs still have to be mapped back to source, at least the language  
developer (having done the mapping in the other direction) has an easier,  
single, job than when going all the way down to the iron, where a more  
difficult job must be repeated for each platform.

-Dave

:: :: ::

[0] it is somewhat instructive to look at DEBUG.COM (an artifact of 1980's  
era PCs still shipped with Redmond OS's to this day) for an example of how  
spartan an IDE can be.  Usually one wants an assembler to remember labels,  
and a debugger to remember breakpoints.  DEBUG.COM will turn opcodes into  
machine language, but you're on your own for labels (leading to a style in  
which one tries to minimize entry points, turning code into a collection  
of "loops with tails" and requiring utilities similar to BASIC's RENUM)   
Similarly, DEBUG.COM will handle all the machine level hassle of setting  
breakpoints in code, taking the interrupt, and then restoring the original  
code -- but you type in the list of breakpoints at each step.  Once upon a  
time, people actually developed apps with these bear skins and stone  
knives.

[1] bytecode also makes "eval" almost as trivial as in assembly.  Too much  
dynamism can get one in trouble, though.  Consider the "house of cards"  
cartoon on p.112 of the Smalltalk green book: "here, let me modify Array  
At:"
http://www.iam.unibe.ch/~ducasse/FreeBooks/BitsOfHistory/BitsOfHistory.pdf
In that situation, bytecode can be the difference between hosing a process  
and triple-faulting a box.

> The OCaml version is 24 instructions, 8 of which have immediate
> constants.  I don't know very much about PowerPC assembly, but let's
> suppose that every instruction is 32 bits, including any immediate
> constants; that means the whole function weighs 96 bytes.

Using gas or "objdump -D -b binary" would help if you want more accurate  
numbers.  If I remember properly, you're correct about instructions, but  
not necessarily immediates.

> [the MuP21] executes a stream of 5-bit zero-operand two-stack
> operations packed into 20-bit words.

For more along these lines, see http://www.jwdt.com/~paysan/b16.html

But these sort of games are better played in hardware than in software --  
hardware is great at parallel things (instruction decoding) and pretty  
weak at serial things, while software is the opposite.  (this was a  
traditional reason for bytecode -- to minimize work for dispatch) FSMs  
seem to be where the two meet: to add behavior to hardware (think  
microcode) or to add parallel-pattern-matching to software (think  
parsers), one tends to wind up synthesizing state machines.

* Kragen Javier Sitaker <kragen at pobox.com> [2007-03-12 08:40]:
> The Microbenchmark (OCaml, Python, Perl, Ruby, Elisp, Tcl)

I thought it would be interesting to know how Parrot fares as a
target.

What I gather is that targetting Parrot is quite nice, with a
bunch of tools to make compiler-writing nice that aren’t exactly
finished, but are shaping up. F.ex you can write the parser in
a large subset of Perl 6 grammars, though it is not finished and
as yet only implements a subset of the subset.

> There's a microbenchmark I like to run on language
> implementations; here's the OCaml version:
> 
>     let rec fib n = if n < 2 then 1 else fib (n-1) + fib (n-2) in
>     print_int (fib 32) ; print_newline() ;;

This is a version written in PIR (Parrot Intermediate
Representation):

    .sub fib
        .param int n
        if n < 2 goto BASE
        .local int f2
        .local int ret
        n -= 1
        ret = fib(n)
        n -= 1
        f2 = fib(n)
        ret += f2
        .return(ret)
    BASE:
        .return(1)
    .end

    .sub main :main
        .local int result
        result = fib(32)
        print result
        print "\n"
    .end

There may be better ways to write this; it’s the first time I
wrote anything in PIR at all.

Unfortunately, I find it runs significantly slower than the Perl
version regardless of which Parrot “runcore” I use and whether or
not I enable optimisations. The CGP core is the fastest (faster
than the JIT core) at ~8.6s user time on my system whereas the
Perl version runs in ~6.3s (per bash’s `time` builtin).

For completeness, here is the disassembled code of the fib
function:

    get_params_pc PMC_CONST(3)
    lt_i_ic_ic I2,2,L1
    sub_i_ic I2,1
    set_args_pc PMC_CONST(3)
    set_p_pc P0,PMC_CONST(6)
    get_results_pc PMC_CONST(3)
    invokecc_p P0
    sub_i_ic I2,1
    set_args_pc PMC_CONST(3)
    set_p_pc P1,PMC_CONST(6)
    get_results_pc PMC_CONST(3)
    invokecc_p P1
    add_i_i I0,I1
    set_returns_pc PMC_CONST(3)
    returncc 
    L1: set_returns_pc PMC_CONST(1)
    returncc 

Of course, since the code was PIR to begin with, the difference
is not that interesting – it’s the same thing, except with the
convenient PIR shortcuts expanded to bare-metal PASM.

The resulting bytecode file is 1088 bytes in size, but looking at
it with a hex viewer, I see there’s a lot of extra file format
glop in there. So I have no idea how much space the actual
bytecode is taking up. However, a file that also contains the
`main` function comes out at 1504 bytes. Even if the presence of
an extra function adds a lot more file format glop, it seems
evident that Parrot bytecode is pretty bulky.

Of course, Parrot is far from finished and the stated performance
goal for Perl 6 is to be at least twice as fast as Perl 5, which
means Parrot is supposed to get a lot faster yet. In particular
I’m pretty sure that little work has gone into the JIT runcore.

I’m not sure what to take away from this.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>

So I've been prototyping a Bicicleta interpreter in OCaml, and
generally the experience has been great.  I'd never used lex or yacc,
but I know most of the theory, so given the reference-manual
documentation for ocamllex and ocamlyacc, I was able to put together
useful parsers pretty quickly.

And OCaml is nice and fast.  For most things, fast doesn't matter all
that much any more; but a language interpreter, by its nature, imposes
some significant slowdown on the language it's interpreting, usually
from one to three orders of magnitude.  So implementing your language
interpreter in, say, Python, may not be a good idea --- the two orders
of magnitude slowdown imposed by Python are two orders of magnitude
that get imposed on every program running in your interpreter, on top
of the 1-3 your interpreter imposes.  Multiply it all out, and you can
end up iterating through loops thousands of times per second, instead
of hundreds of millions.

OCaml really seems to be optimized for language implementations:
pattern-matching, the facility it calls "variants", ocamllex and
ocamlyacc, garbage collection, and static type-checking all make a lot
of sense for compilers.

And you can compile and deliver a binary.

But I'm being tempted by SBCL!

The Microbenchmark (OCaml, Python, Perl, Ruby, Elisp, Tcl)
----------------------------------------------------------

There's a microbenchmark I like to run on language implementations;
here's the OCaml version:

    let rec fib n = if n < 2 then 1 else fib (n-1) + fib (n-2) in
    print_int (fib 32) ; print_newline() ;;

This computes a Fibonacci number in a pretty slow way, recursing into
the base case N times in order to compute the number N.  On Bea's
laptop, one of the new Intel MacBooks, it runs at about 3.1 million
base-case recursions per second in the OCaml interpreter, or 5.2
million per second if I ocamlc it first.  This may be a little unfair
to OCaml, since I'm running a PowerPC version on this machine, through
transparent binary translation.

But the Python version runs at 0.73 million base-case recursions per
second:

    def fib(n):
        if n < 2: return 1
        return fib(n-1) + fib(n-2)
    print fib(32)

That's Python 2.3, compiled for Intel.  The Perl version runs at the
same speed:

    #!/usr/bin/perl -w
    use strict;
    sub fib { $_[0] < 2 ? 1 : fib($_[0] - 1) + fib($_[0] - 2) }
    print fib(32), "\n";

So does Ruby (well, actually, 0.75 million):

    #!/usr/bin/ruby
    def fib(n)
      if n < 2 then 1 else fib(n-1) + fib(n-2) end
    end
    print fib(32), "\n"

Byte-compiled elisp is a bit slower, at 0.46 million per second, timed
with this:

    (let ((start (current-time))) (setq fib30 (fib 30)) (time-since start))

(see below in the SBCL section for the Lisp code for fib).  Tcl is
even worse, at 0.083 million per second:

    proc fib {x} { if {$x < 2} { return 1 } { 
        return [expr [fib [expr $x - 1]] + [fib [expr $x - 2]]] 
    } }
    puts [fib 28]

So OCaml really isn't doing that badly!  Especially considering it's
running on an emulated CPU.

I probably should have included JavaScript here (since it is, after
all, The Next Mainstream Programming Language, after all) but I
didn't.  My experience is that, in both Firefox 1.5 and Safari, it's
usually in about the same speed range as Python, Perl, Ruby, and
Elisp.

SBCL
----

But then I tried it in (Intel) SBCL:

    (defun fib (n) (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2)))))
    (print (fib 43))

And it ran, in the interactive interpreter, at 19 million per second,
four times as fast as OCaml.  SBCL, it turns out, immediately compiles
everything to native code, even in the interactive interpreter.
Compiling the above function takes 12 milliseconds.

SBCL also lets you do (disassemble 'fib), which I did, and I saw that
this version was still calling out to do generic arithmetic and
comparison operations, even though SBCL's compiler does some type
inference.  So I whacked at it a bit until it didn't do that any more:

    (defun fib (n) (if (< (the fixnum n) 2) 
                       1 
                     (+ (the fixnum (fib (- n 1))) 
                        (the fixnum (fib (- n 2))))))
    (print (fib 43))

This doesn't omit the type-checking entirely, but it omits the dynamic
dispatch of the numeric operations, and consequently it ran more than
twice as fast, at 47 million iterations per second, 9 times as fast as
OCaml.  I haven't figured out how to turn off the run-time
type-checking on each call entirely, which I think might trim it from
91 instructions to 20 or so.

The main body of those instructions, with added comments, is as follows:

115B0F9A:       8B55F0           MOV EDX, [EBP-16]  ; no-arg-parsing entry point
    0F9D:       F6C203           TEST DL, 3   ; examine low two bits (tag)
    0FA0:       0F858E000000     JNE L4       ; should be 00 for fixnum
    0FA6:       83FA08           CMP EDX, 8   ; 8 is fixnum 2 (<<2)
    0FA9:       7C70             JL L3        ; so if it's less, we go to L3
    ; This next instruction is unnecessary because none of the previous four
    ; instructions modify the registers!
    0FAB:       8B55F0           MOV EDX, [EBP-16]  ; fetch the argument again
    0FAE:       83EA04           SUB EDX, 4   ; subtract fixnum 1
    ; Here's the call sequence for FIB:
    0FB1:       8BDC             MOV EBX, ESP ; save old stack pointer
    0FB3:       83EC0C           SUB ESP, 12  ; allocate three words
    ; Load function pointer from #<FDEFINITION object for FIB>
    0FB6:       8B05340F5B11     MOV EAX, [#x115B0F34]
    0FBC:       B904000000       MOV ECX, 4   ; load fixnum 1
    0FC1:       896BFC           MOV [EBX-4], EBP  ; save old base pointer
    0FC4:       8BEB             MOV EBP, EBX ; stick old stack pointer in EBP
    0FC6:       FF5005           CALL DWORD PTR [EAX+5] ; recursive call
    0FC9:       7302             JNB L0       ; skip next instruction ...?
    0FCB:       8BE3             MOV ESP, EBX ; restore old stack pointer
    0FCD: L0:   F6C203           TEST DL, 3   
    0FD0:       7568             JNE L5       ; type-tag test on returned value
    0FD2:       8955F4           MOV [EBP-12], EDX  ; save returned value
    0FD5:       8B55F0           MOV EDX, [EBP-16]  ; fetch the argument again
    0FD8:       83EA08           SUB EDX, 8         ; subtract fixnum 2
    ; same call/return sequence again:
    0FDB:       8BDC             MOV EBX, ESP
    0FDD:       83EC0C           SUB ESP, 12
    0FE0:       8B05340F5B11     MOV EAX, [#x115B0F34]
                ; #<FDEFINITION object for FIB>
    0FE6:       B904000000       MOV ECX, 4
    0FEB:       896BFC           MOV [EBX-4], EBP
    0FEE:       8BEB             MOV EBP, EBX
    0FF0:       FF5005           CALL DWORD PTR [EAX+5]
    0FF3:       7302             JNB L1
    0FF5:       8BE3             MOV ESP, EBX
    ; End of call-return sequence.  Whew!
    0FF7: L1:   F6C203           TEST DL, 3 ; type-tag test, again
    0FFA:       7544             JNE L6
    0FFC:       8B45F4           MOV EAX, [EBP-12] ; load other returned value
    ; This next sequence does the fixnum arithmetic in full 32-bit
    ; registers instead of the 30-bit fixnum form for some reason.
    0FFF:       C1F802           SAR EAX, 2    ; shift arithmetic right 2 bits
    1002:       C1FA02           SAR EDX, 2    ; on both
    1005:       01D0             ADD EAX, EDX  ; DO THE ADDITION!
    1007:       8BD0             MOV EDX, EAX  ; make useless copy of result
    1009:       D1E2             SHL EDX, 1    ; shift left 1 bit
    100B:       7039             JO L7         ; jump if overflow
    100D:       D1E2             SHL EDX, 1
    100F:       7035             JO L7

    ; End of addition sequence.  For positive fixnums, this would have
    ; been better and exactly equivalent (except that it doesn't
    ; clobber EAX):
    ; ADD EDX, EAX
    ; JC L7
    ; but I think that won't work for negative fixnums.

    ; At this point, we have the return value in EDX, and we're ready to return.
    1011: L2:   8D65F8           LEA ESP, [EBP-8] ; pop stack frame
    1014:       F8               CLC              ; clear carry flag (for JNB?)
    1015:       8B6DFC           MOV EBP, [EBP-4] ; restore old frame pointer?
    1018:       C20400           RET 4
    101B: L3:   BA04000000       MOV EDX, 4  ; fixnum 1 if n<2
    1020:       EBEF             JMP L2      ; ret val fixnum 1 is in EDX...

There's another 44 instructions having to do mostly with exception and
error handling: if the argument n wasn't fixnum (L4), if the first
recursion returned a non-fixnum (L5), if the second recursion returned
a non-fixnum (L6), if the addition overflowed to a bignum (L7); and
there are some NOPs and dead code for handling invalid arg counts.

Presumably OCaml programs would run at that speed too, or faster, if I
compiled them to native code, but the native-code compiler needs an
assembler, and installing an assembler under MacOS X apparently
requires 3+ gigabytes of disk space.  ocamlopt, for its part, compiles
the fib routine in this microbenchmark to 24 PowerPC instructions,
which, if I had an assembler, I could benchmark.  (It might be more to
the point to test the Intel OCaml.)

Here's the ocamlopt PowerPC assembly; notice that it is much simpler:

    _camlFib__fib_57:
            mflr        r0
            addi        r1, r1, -16
            stw r0, 12(r1)
    L101:
            cmpwi       r3, 5
            bge L100
            lwz r11, 12(r1)
            mtlr        r11
            li  r3, 3
            addi        r1, r1, 16
            blr
    L100:
            stw r3, 0(r1)
            addi        r3, r3, -4
    L102:
            bl  _camlFib__fib_57
            lwz r4, 0(r1)
            stw r3, 4(r1)
            addi        r3, r4, -2
    L103:
            bl  _camlFib__fib_57
            lwz r11, 12(r1)
            mtlr        r11
            lwz r4, 4(r1)
            add r7, r3, r4
            addi        r3, r7, -1
            addi        r1, r1, 16
            blr

And I got to thinking.  

Dynamic Code Generation
-----------------------

If I write an implementation in OCaml, I get all of OCaml's shiny
machinery for interpreting or compiling, but as far as I can tell,
there is no "eval".  There's the Dynlink module, which lets bytecode
programs dynamically load bytecode files, so if I compiled stuff into
OCaml, I could dynamically load the result (if I forgo native-code
compilation); but the OCaml compiler isn't actually all that fast, so
it might cause long pauses.

But if I write an implementation in SBCL, I can dynamically compile
code into a running SBCL process very easily, like with two lines of
code:

    This is SBCL 0.9.15, an implementation of ANSI Common Lisp.
    ...
    * (defun return-const-fun (name const) `(defun ,name () ,const))
    RETURN-CONST-FUN
    * (eval (return-const-fun 'foobar 40))
    FOOBAR
    * (disassemble 'foobar)
    ; 11608A1A:       BAA0000000       MOV EDX, 160 ; no-arg-parsing entry point
    ;       1F:       8D65F8           LEA ESP, [EBP-8]
    ;       22:       F8               CLC
    ;       23:       8B6DFC           MOV EBP, [EBP-4]
    ;       26:       C20400           RET 4
    ... a bunch of nops and boilerplate omitted ...

The first line defines a function that generates code; the second one
invokes it and compiles the result, into, as it turns out, five
machine instructions.  (160 is 40, shifted left by two bits.)

And I have the impression that generating code in Common Lisp will be
easier than generating code in OCaml, C, or assembler.

Java
----

Oh, yeah, Java is even faster:

    class Fib {
        public static int fib(int n) {
            if (n < 2) return 1;
            return fib(n-1) + fib(n-2);
        }
        public static void main(String[] argv) {
            System.out.println(new Integer(fib(40)).toString());
        }
    }

That returns 165580141 in 2.4 seconds, about 69 million base-case
recursions per second, or 14 nanoseconds each.

But I'm not feeling tempted by Java.  Sorry.

Squeak
------

I'm also a little tempted by Smalltalk.

The Smalltalk version is 

    fib: n
            n < 2 ifTrue: [^1] 
                  ifFalse: [^(self fib: n - 1) + (self fib: n - 2)]

In Squeak 3.8-6665full on an Intel VM, that takes 4410 ms to run 
fib: 35, which is 14930352; that's 0.295 μs per base-case recursion,
or 3.4 million per second, slightly slower than OCaml.  Which is
fairly impressive, considering that it's doing the same level of
dynamic dispatch as SBCL, but in a bytecode engine like OCaml's.

The bytecode looks like this:

    9 <10> pushTemp: 0
    10 <77> pushConstant: 2
    11 <B2> send: <
    12 <99> jumpFalse: 15
    13 <76> pushConstant: 1
    14 <7C> returnTop
    15 <70> self
    16 <10> pushTemp: 0
    17 <76> pushConstant: 1
    18 <B1> send: -
    19 <E0> send: fib:
    20 <70> self
    21 <10> pushTemp: 0
    22 <77> pushConstant: 2
    23 <B1> send: -
    24 <E0> send: fib:
    25 <B0> send: +
    26 <7C> returnTop
 
Or, as I read it in pseudo-FORTH, "n 2 < if 1 return then self n 1 -
recurse self n 2 - recurse + return".  The <E0> bytecode that calls
#fib: seems to be implemented by a method called
sendLiteralSelectorBytecode, which occupies all of the bytecodes <D0>
through <FF>, with the last four bits indexing into a "literal table"
associated with the method.  It's impressive to me that Squeak got
this into 18 bytes of bytecode (plus 8 bytes of other stuff),
considering that the original is 27 tokens.  (Subtracting all the
delimiters and considering #ifTrue:ifFalse: as one "token", we get
down to 18 source tokens.)

Python's bytecode
-----------------

Compare the Python bytecode, from dis.dis(fib.fib):

  3           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (2)
              6 COMPARE_OP               0 (<)
              9 JUMP_IF_FALSE            8 (to 20)
             12 POP_TOP             
             13 LOAD_CONST               2 (1)
             16 RETURN_VALUE        
             17 JUMP_FORWARD             1 (to 21)
        >>   20 POP_TOP             

  4     >>   21 LOAD_GLOBAL              1 (fib)
             24 LOAD_FAST                0 (n)
             27 LOAD_CONST               2 (1)
             30 BINARY_SUBTRACT     
             31 CALL_FUNCTION            1
             34 LOAD_GLOBAL              1 (fib)
             37 LOAD_FAST                0 (n)
             40 LOAD_CONST               1 (2)
             43 BINARY_SUBTRACT     
             44 CALL_FUNCTION            1
             47 BINARY_ADD          
             48 RETURN_VALUE        
             49 LOAD_CONST               0 (None)
             52 RETURN_VALUE        

Python's bytecode, very similar in design, uses 53 bytes for the same
method!

Much of the difference comes from Python using three bytes for
anything that takes a parameter, so it only uses a single bytecode
(called LOAD_FAST) for pushing parameters and the like, where Squeak
uses all the bytes from <10> to <1F> for pushing the first 16 local
variables; then there's <80> for up to 32 local variables.  I don't
know what it does when you have more than 32 local variables.

Similarly, CALL_FUNCTION takes two bytes to tell it how many arguments
the function has, I guess in case you have a function with 16383
arguments, and additionally it's separated from the LOAD_GLOBAL
bytecode.

Then Python's JUMP_IF_FALSE doesn't pop the boolean value it uses, so
we need a separate bytecode for that, and then there are dead
bytecodes at 17, 49, and 50, immediately after RETURN_VALUE bytecodes,
and a dead bytecode at 20, immediately after an unconditional jump.

The consequence of all of this is that Python is only currently using
109 of the 256 possible bytecodes, that Python bytecode is bloated and
slow, and that the Python compiler and bytecode interpreter are fairly
simple.

Other Bytecodes
---------------

As far as I can tell, OCaml doesn't come with anything analogous to
SBCL's (disassemble 'fib), Python's dis module, or Squeak's browser
option "What to show: bytecodes", so all I know about the OCaml
bytecode is that the whole fib.cmo bytecode file produced by ocamlc
was 320 bytes, and that .cmo files seem to largely consist of 32-bit
big-endian binary two's-complement integers, with a big blob of stuff
at the end containing strings and other data.

I don't know anything about the implementation of the Ruby
interpreter.

perl -MO=Concise,fib fib.pl (using the B::Concise module) tells me the
following:

main::fib:
7  <1> leavesub[1 ref] K/REFC,1 ->(end)
-     <@> lineseq KP ->7
1        <;> nextstate(main 2 fib.pl:3) v/2 ->2
-        <1> null K/1 ->-
5           <|> cond_expr(other->6) K/1 ->8
4              <2> lt sK/2 ->5
-                 <1> ex-aelem sK/2 ->3
-                    <1> ex-rv2av sKR/3 ->-
2                       <#> aelemfast[*_] s ->3
-                    <0> ex-const s ->-
3                 <$> const[IV 2] s ->4
6              <$> const[IV 1] s ->7
k              <2> add[t10] sK/2 ->7
d                 <1> entersub[t5] sKS/TARG,3 ->e
-                    <1> ex-list sK ->d
8                       <0> pushmark s ->9
b                       <2> subtract[t4] sKM/2 ->c
-                          <1> ex-aelem sK/2 ->a
-                             <1> ex-rv2av sKR/3 ->-
9                                <#> aelemfast[*_] s ->a
-                             <0> ex-const s ->-
a                          <$> const[IV 1] s ->b
-                       <1> ex-rv2cv sK/3 ->-
c                          <#> gv[*fib] s/EARLYCV ->d
j                 <1> entersub[t9] sKS/TARG,3 ->k
-                    <1> ex-list sK ->j
e                       <0> pushmark s ->f
h                       <2> subtract[t8] sKM/2 ->i
-                          <1> ex-aelem sK/2 ->g
-                             <1> ex-rv2av sKR/3 ->-
f                                <#> aelemfast[*_] s ->g
-                             <0> ex-const s ->-
g                          <$> const[IV 2] s ->h
-                       <1> ex-rv2cv sK/3 ->-
i                          <#> gv[*fib] s/EARLYCV ->j

This is pretty confusing; it helps to know the following operations
occur before and to the left of their operands; the two "entersub"
nodes are the recursive calls and the const[IV 2] nodes are integers
such as 2; Perl's treecode runs with a stack as its working memory; it
builds lists on the stacks PostScript-style, starting with a
"pushmark"; every procedure call involves constructing a list of
parameters.

There's a '-exec' option to B::Concise that puts things in execution
order, which makes it look a lot like Python, but leaves out all the
ops labeled above with a "-".

Java is yet another stack-based bytecode virtual machine.  javap -c
Fib says:

    public static int fib(int);
      Code:
       0:	iload_0
       1:	iconst_2
       2:	if_icmpge	7
       5:	iconst_1
       6:	ireturn
       7:	iload_0
       8:	iconst_1
       9:	isub
       10:	invokestatic	#2; //Method fib:(I)I
       13:	iload_0
       14:	iconst_2
       15:	isub
       16:	invokestatic	#2; //Method fib:(I)I
       19:	iadd
       20:	ireturn

I suspect that the numbers in the left column are byte offsets, which
would make this method 21 bytes, but I don't know enough about Java
bytecode to be sure.

Elisp compiles the same Lisp I fed to SBCL into the following
stack-based bytecode:

    byte code for fib:
      args: (n)
    0       varref    n
    1       constant  2
    2       lss       
    3       goto-if-nil 1
    6       constant  1
    7       return    
    8:1     constant  fib
    9       varref    n
    10      sub1      
    11      call      1
    12      constant  fib
    13      varref    n
    14      constant  2
    15      diff      
    16      call      1
    17      plus      
    18      return    

Or, more briefly, asciified with cat -vt:

    (defalias 'fib #[(n) 
        "^H\301W\203^H^@\302\207\303^HS!\303^H\301Z!\\\207" [n 2 1 fib] 4])

That is 19 bytes of bytecode, although much of it is represented with
octal backslash-escapes.

The bytecode is not documented in section 16.8 of the Elisp Reference
Manual, "Disassembled Byte-Code".

Bytecode as an Implementation Strategy
--------------------------------------

Bytecode has several claimed advantages over dynamic compilation as a
strategy for implementing high-level languages: it's smaller than
machine code, interactive compilation is faster because the compiler
does less work, it's an architecture-neutral distribution format, and
porting to new platforms is easier.  There are, however, some
disadvantages.

* It's Smaller Than Machine Code

The compiled optimized SBCL version totaled 247 bytes of machine code.
The unoptimized version, with calls to GENERIC-< and the like, was 162
bytes.  And this is on the x86, which has historically been pretty
good for code density, although as you can see in the above, the
32-bit immediate constants diminish that advantage significantly; 60
of the 247 bytes are in 32-bit constants.  

The OCaml version is 24 instructions, 8 of which have immediate
constants.  I don't know very much about PowerPC assembly, but let's
suppose that every instruction is 32 bits, including any immediate
constants; that means the whole function weighs 96 bytes.

Compare this to the 26 bytes of the Squeak version, and you can see
that bytecode systems have a compelling advantage in situations where
code size is critical.  Even indirect-threaded code would probably
have been at least 32 bytes, assuming 16-bit addresses.

So at least this claimed advantage is well-founded.  If you write the
same program in the same way in SBCL and in Squeak, it is likely to
take considerably more code space in SBCL.

* Smaller Than How Much Machine Code?

Speaking of code compactness, though, for small programs, the
machine-code version of your program may be smaller than the bytecode
interpreter.  I tried to build the C version of Squeak with "VMMaker
new generateEntire" to see how big it was, but apparently there are
some .h files and things that live outside of the Squeak image; the
best I can say is that Interpreter.st is 337404 bytes and
ObjectMemory.st is 96646 bytes, for a total of 434050 bytes, about
7000-10000 lines of Smalltalk, gzipping to 98274 bytes.  I am guessing
that the machine-code version would be around 100KiB.

Presumably you can build an almost arbitrarily-small bytecode
interpreter (say, one that implements just the S and K combinators)
but the question is how big the bytecode interpreter has to be to make
the bytecode version of your program smaller than the corresponding
machine-code version, while still running at an acceptable speed.

Chuck Moore's Novix NC-4000 and MuP21 work suggests that a useful
bytecode interpreter could be very small indeed; the MuP21 is a
6000-transistor chip in which the CPU (there's also some other stuff
on the chip) executes a stream of 5-bit zero-operand two-stack
operations packed into 20-bit words.  (I forget how LITERAL, aka
load-immediate, is implemented.)  The code for the machine is claimed
to be quite compact.  Unfortunately, the follow-on chips (the F21, the
iTV i21, and the 25x) have not been successfully fabricated.

* Interactive Compilation is Faster

This advantage is probably irrelevant now, except for very small
machines, say a 70MHz LPC2103 microcontroller.  The SBCL compiler
compiled the unoptimized Lisp version in 12ms on a 1.8GHz dual-core
CPU; a cleverer implementation, like Apple ][ Integer Basic, could
parse and compile as you typed, keeping pauses to a minimum.

* Architecture-Neutral Distribution Format

That is to say, you don't lose your software when you change CPU
architectures.

Source code is also an architecture-neutral distribution format; I am
going to ignore the political reasons for not distributing source code
and focus on the technical reason, which is that bytecode is smaller.
In effect, the claimed advantage is that bytecode is a good
compression algorithm for source code.

If this is the case, we should perhaps compare it against generic
compression algorithms like gzip.  Here are some results from gzip -9:

file                 size  gzipped  ratio  language
bicicleta_lexer.mll  2380      965    2.5  ocamllex
readbmp.py           5022     1910    2.6  Python
js-calc.js          15520     5767    2.7  JavaScript
webrick/cgi.rb       6779     2251    3.0  Ruby
cursmail.py         11627     3938    3.0  Python
bicicleta.ml        13426     3461    3.9  OCaml

I think these are fairly typical compression ratios for gzip on source
code.

The Perl source code version of the fib function is 60 bytes; OCaml is
61, Ruby is 62, unoptimized Lisp is 63, Python is 66, Squeak is 75 if
you put it all on one line, Java is 87 if you put it all one line, and
Tcl is 108.  So if 2.7 is a typical compression ratio and 63 bytes is
a typical size for implementations of this function, we'd expect gzip
to produce a result of about 23.3 bytes; if you apply the 2.7 ratio to
the Smalltalk function, you get 27.8 bytes.  These compare favorably
with the Squeak bytecode version at 27 bytes.

For larger functions, bytecode seems to work better as compression.
Here's SkipList>>add:ifPresent:, which was the first largish method I
found while looking around Squeak at random:

    add: element ifPresent: aBlock
            | node lvl s |
            node := self search: element updating: splice.
            node ifNotNil: [aBlock ifNotNil: [^ aBlock value: node]].
            lvl := self randomLevel.
            node := SkipListNode on: element level: lvl.
            level + 1 to: lvl do: [:i | splice at: i put: self].
            1 to: lvl do: [:i |
                                    s := splice at: i.
                                    node atForward: i put: (s forward: i).
                                    s atForward: i put: node].
            numElements := numElements + 1.
            splice atAllPut: nil.
            ^ element

That's 619 bytes of source if tabified, or 465 bytes if untabified.
The bytecode representation is 118 bytes, including 32 bytes of
non-bytecode (literal and constant tables, I guess?) and 86 bytes of
bytecode.  That's a compression ratio of 3.9 to 5.2, which is
substantially better than gzip would do.

(Maybe I should try 7-zip, but it's not installed on this Mac.  bzip2
does worse than gzip on such small files.)

The names of the method selectors and classes referenced herein are
not included in those 32 bytes --- rather, there are seven four-byte
pointers: #search:updating: #randomLevel, #on:level:, SkipListNode,
#atForward:put:, #forward:, and #atAllPut:.  (The other method
selectors, like #+, #at:, #value: and #at:put:, have special bytecodes
for them.)  These 69 bytes have to be included somewhere in a bytecode
file intended for use in an architecture-neutral distribution format
that supports dynamic linking, but they only have to be included once,
and if you don't need to link dynamically or modify the software after
delivery, they don't need to be included at all.

So it looks like bytecode is a somewhat more compact architecture-
neutral distribution format than gzipped source code; the latter has
some additional technical advantages, such as being easier to modify,
being easier to optimize, containing comments, and not requiring a
special-purpose compressor.

However, this advantage is less compelling than the code-size
advantage; while Squeak needed a factor of 3 to 10 less code space
than native-code compilation systems, it looks like bytecode uses more
like a factor of 1 or 2 less space than gzipped source code --- in a
couple of trivial examples, which might not be completely realistic.

* Porting to new platforms is easier

I'm not sure if this is true, but it seems plausible: a native-code
generator is more tightly coupled to the CPU architecture than a
bytecode interpreter, because if the bytecode interpreter is written
in a language available on the new target platform, you may be able to
port it with a simple recompile.  This may seem irrelevant today when
all "personal computers" and most servers use the Intel 80386
instruction set, but there is a lot of uncertainty on the horizon:

- ARM is still very popular on mobile phones, which are computers, are
  personal, and outnumber "PCs" perhaps ten to one;
- AVR and PIC are very popular in the deep-embedded space;
- the smoothness of Apple's recent move to Intel processors
  demonstrates that the barrier to entry for new CPU architectures is
  at an all-time low;
- most networked appliances (from companies like Cisco, D-Link, Axis,
  and Broadcom) use non-386 CPUs that run Linux, and there are a huge
  number of them;
- the Cell chip in the Playstation 3 has two different kinds of
  non-386 CPUs, and like the above, game consoles outnumber "PCs", but
  unlike the above, have enormous computational power;
- after the coming transition to highly concurrent software takes
  place, the metric by which CPUs are valued may change from serial
  processing throughput to aggregate processing throughput, which may
  create an opportunity for disruptive CPU innovation.

But, for the moment, portability matters much less than it used to.

* Other advantages?

It could be argued that bytecode compiler/interpreters are simpler
than dynamic native-code compilers.  I don't have enough experience to
know whether this is true in general, but I'm pretty sure that if
you're running on top of an environment like SBCL, it isn't.  Even
spawning an instance of gcc and dlopening the resulting object is
likely to be an acceptable speed, and then you only have to generate C
code.  (You have to be careful not to #include any large .h files,
though.)

Parse-tree walkers, such as the Perl interpreter and the kind of Lisp
interpreter found in books like SICP and EOPL, are simpler than either
one, but they tend to be slower too.

* Disadvantages

Bytecode interpreters are usually pretty slow; the bytecode
interpreter used by Squeak probably accounts for most of the
difference between it (295ns) and unoptimized SBCL (52ns).

JIT bytecode interpreters, such as recent Java virtual machines, can
be less slow, but they clearly lose the code-size advantage normally
enjoyed by bytecode interpreters, the postulated simplicity advantage,
and probably the compilation-time advantage.

* Conclusion

Bytecode interpreters can offer unsurpassed code compactness, although
Python's and OCaml's bytecode formats completely fail to do so.  This
matters a lot on microcontrollers, but I don't know whether it matters
a lot or a little on modern personal computer CPUs --- this Mac has
2GiB of RAM, but only 2MiB of L2 cache, and presumably something like
16-64KiB of L1 cache.  Thrashing the cache is soundly punished.

They also offer a size advantage over gzipped source as a software
distribution format, but this advantage is less compelling.  This
suggests that they ought to be popular in the demoscene, where there's
apparently a lot of action in the "coolest audiovisual demo under N
bytes" categories these days, for values of N such as 128, 256, 4096,
and 65536; if they aren't, it casts doubt on this proposed advantage.

I know of no other reason to implement an interpreter using bytecode.

So I'm surprised it's such a popular thing to do!  I think the reason
is probably that code space and compilation time used to be quite
precious resources (not to mention portability), and programmers just
haven't adjusted to the new realities.

Sun, 11 Mar 2007

A. Pagaltzis writes:
>This code uses a minor trick: 

What perl program does not use tricks?

Perl programmers are tricksy :)

* Kragen Javier Sitaker <kragen at pobox.com> [2006-11-11 09:37]:
> #!/usr/bin/perl -w
> 
> # find /var/cache/wwwoffle/http -type f | perl this-script > tmp.html
> 
> # Generate a really big page of small inline JPEG images from a
> # listing of files in the WWWOFFLE cache.  The WWWOFFLE cache
> # is in /var/cache/wwwoffle/http on my system, and it's not the
> # most convenient possible structure for doing this kind of
> # thing (it has files containing HTTP responses, not the actual
> # resources), but it doesn't take all that much work to grab
> # stuff out of it.  You still have to be running WWWOFFLE for
> # the resulting page not to take an absurdly long time to load,
> # and it still might hork your browser.
> 
> while (<>) {
>   chomp;
>   my $fname = $_;
>   my ($path, $type, $id) = /^(.*)\/([DU])([^\/]*)$/;  # data or URL
>   print STDERR "$_ path $path type $type id $id\n" if not defined $type;
>   if ($type eq 'D') {
>     open FILE, "<$fname" or die "Can't open $fname: $!";
>     while (<FILE>) {  # grab content-type out of HTTP header, hope it's there
>       if (/^Content-Type: (.*)\r\n/) {  # specify \r so . doesn't match it
> 	my $ctype = $1;
> 	#print STDERR "content-type: <$ctype>\n";
> 	if ($ctype eq 'image/jpeg') {
> 	  open URL, "<$path/U$id" or die "Can't open $id: $!"; # crappy msg
> 	  my $url = <URL>;
> 	  print qq(<img src="$url" width="128" height="128" />\n);
> 	}
>         last;  # don't bother reading the rest of the file
>       }
>     }
>   }
> }
> __END__

A cleaner version that crawls the filesystem without external aid:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    # perl this-script /var/cache/wwwoffle/http > tmp.html

    sub file_matches {
        my ( $rx, filename ) = @_;
        open my $fh, '<', $filename or die "Can't open $filename: $!";
        my $line;
        $line =~ $rx and return 1 while defined $line = <$fh>;
        return;
    }

    sub find_response_where (&;@) {
        my $is_interesting = shift;
        my @path;
        find sub {
            my ( $type, $id ) = m[ \A ([DU]) (.*) ]x;
            push @path, $File::Find::name
                if defined $type
                and $type eq 'D'   # D = data, U = URL
                and -f
                and $is_interesting->( $_ )
            }
        }, @_;
        return @path;
    }

    my $criterion = qr[ \A Content-Type: [ ]* image/jpeg \r\n ]ix;

    @ARGV = find_response_where { file_matches $criterion, shift } @ARGV;

    s[ \A (.*) /D ]{ $1 . '/U' }ex for @ARGV;

    print qq(<img src="$_" width="128" height="128" />\n) while <>;


This code uses a minor trick: the <> operator will open and read
all files listed in @ARGV sequentially, so the code stuffs the
names of the data files of interest into that array, then turns
them into URL file names, then uses the diamond operator to read
them.

But the punchline is the implementation of this same thing in
terms of the File::Find::Rule module from the CPAN:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find::Rule;

    # perl this-script /var/cache/wwwoffle/http > tmp.html

    @ARGV = File::Find::Rule
        ->name( 'D*' )
        ->file
        ->grep( qr/ \A Content-Type: [ ]* image/jpeg \r\n /ix )
        ->in( @ARGV );

    s[ \A (.*) /D ]{ $1 . '/U' }ex for @ARGV;

    print qq(<img src="$_" width="128" height="128" />\n) while <>;


You can see that F::F::R is the bee’s knees.

-- 
*AUTOLOAD=*_;sub _{s/(.*)::(.*)/print$2,(",$\/"," ")[defined wantarray]/e;$1}
&Just->another->Perl->hack;
#Aristotle

Sat, 10 Mar 2007

> (* I wonder if there's an efficient way to randomly shuffle a list
>    without mutation? *)

What primitives do we have?  Each permutation of the list is a random  
shuffle, so we should be able to pick one by taking a random number mod  
the factorial of the length of the list.  This code:

>     for i = Array.length result - 1 downto 0 do
>         let other = Random.int (i + 1) and tmp = result.(i) in
>         result.(i) <- result.(other) ; result.(other) <- tmp
>     done; Array.to_list result

is essentially generating a list of swaps, where the first swap can be any  
of Array.length, the next of Array.length - 1, usw.  (substitute "location  
of maximum" for Random, and one has a selection sort[0] instead of a  
shuffle[1])  This list of swaps can also be viewed as the digits of a  
number written in a varying-base system.

In APL[2], the "base encode" (drawn here as tau, should be a tack)  
primitive produces the radix representation without mutation:

  RR:1+(Φιω)τ_1+ι!ω

in python[3] (if one isn't worried about lexicographical order, and wishes  
to select a single permutation), a simple recursive equivalent is:

  rr = lambda n,k: (n > 1) and [k%n] + rr(n-1,k/n) or [0]

The trick here seems to be getting to the direct representation (list of  
indices into the original array instead of list of swaps in the swapped  
subarrays, which requires folding over all the swaps), efficiently and  
without mutation -- although if one is using the shuffle to get mp3s from  
disk, generating n lists using O(n**2) integer increments (see "dfr" in  
[1,2]) is unlikely to be expensive.

-Dave

[0] http://en.literateprograms.org/Selection_sort_(Python%2C_arrays)
[1] a sort is one given shuffle, but there are also others which are not  
so random -- for instance, a common application seems to be allowing users  
to shuffle columns in a database without any physical rearrangement by  
keeping track of which logical shuffle of the physical data they want.  In  
the same domain, one might wish to reshuffle a naive sort so that APR AUG  
DEC FEB JAN JUL JUN MAR MAY NOV OCT SEP come out in calendar order.
[2] Iverson, "Notation as a Tool of Thought", 1979 Turing award lecture
     http://elliscave.com/APL_J/tool.pdf
[3] http://en.literateprograms.org/Kth_permutation_(Python%2C_functional)

:: :: ::

> if memory serves it involved a multiplication deriving from the current  
> index somehow so that the fractional probabilities of picking an element  
> would sum up to 1.0 for all elements by the end of the shuffle(?).

I think that's the "pick a random element in a single pass" -- the  
equivalent of Random.int (i+1) when one doesn't know i.  It's also  
interesting to try without mutation.

* Kragen Javier Sitaker <kragen at pobox.com> [2007-03-10 09:40]:
> (* I wonder if there's an efficient way to randomly shuffle a
>    list without mutation? *)

I think I remember seeing one somewhere, but I’m not sure whether
it was a shuffle or another similar operation. It was so simple
it seemed like it shouldn’t work, but did; if memory serves it
involved a multiplication deriving from the current index somehow
so that the fractional probabilities of picking an element would
sum up to 1.0 for all elements by the end of the shuffle(?).
Unfortunately I completely fail to recall where I saw it so I
can’t look up whether it was a shuffle and how it worked. :-(

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>

(* I'd previously written this in Perl, but I didn't have the Perl
   version handy when I wanted to shuffle Bea's MP3 library through
   mplayer:
     find "$1" -name "*.mp3" | /home/kragen/bin/unsort | \
     while read song ; do /home/kragen/bin/mplayer "$song" < /dev/tty ; done
 *)

(* randomly reorder the lines in a short file given on stdin *)
(* my first useful OCaml program *)
(* I wonder if there's an efficient way to randomly shuffle a list 
   without mutation? *)

let randomly_reordered lines = 
    let result = Array.of_list lines in
    for i = Array.length result - 1 downto 0 do
        let other = Random.int (i + 1) and tmp = result.(i) in
        result.(i) <- result.(other) ; result.(other) <- tmp
    done; Array.to_list result
let rec input_lines () = 
    try let line = read_line ()
        in line :: (input_lines ())
    with End_of_file -> []
;;
Random.self_init () ;
List.iter print_endline (randomly_reordered (input_lines())) ;;

Thu, 08 Mar 2007

So now I am at the point where Bicicleta needs only native functions
in order to run any interesting programs --- things like fac:

    {env: fac = {fac: x = 3,
        '()' = fac.x.'<'{lt: arg1=2}.'()'.if_true{i: then=1, 
             else=fac.x.'*'{mu: arg1=env.fac{f:
                 x=fac.x.'-'{m: arg1=1}.'()'}.'()'}.'()'}.'()'}
    }.fac{f: x = 4}.'()'

or fib:

    {env: fib = {fib: x = 3,
        '()' = fib.x.'<'{lt: arg1=2}.'()'.if_true{i: then=1,
            else = env.fib{f: x=fib.x.'-'{m: arg1=1}.'()'}.'()'.'+'{p:
                arg1 = env.fib{f: x=fib.x.'-'{m: arg1=2}.'()'}.'()'
            }
        }.'()'
    }}.fib{f: x = 5}.'()'

These parse, but they return errors, because numbers don't yet have
any methods.  Fac depends on integers having '*' and '-' methods that
return integers, and a '<' method that returns a boolean
(i.e. something with an if_true method that returns its then or else).
Fib is similar, but wants '+' instead of '*'.

So what's the best way to implement this?  Here are some approaches
I've seen before.

The C++/Perl/CLOS/OCaml Approach: Primitive Objects Aren't Objects
------------------------------------------------------------------

In C++, Perl, CLOS, and OCaml, primitive objects don't have any
methods, and you can't inherit from them.  You access them in other
non-method non-inheritance ways that exist in each language.  In CLOS,
you can at least define methods on them, but they don't come with any
to start with.

In all four cases, I think this is the result of retrofitting an
object system to an existing non-object-oriented language.

This is not an approach I am considering.

The Python Approach: Native Objects And Native Functions
--------------------------------------------------------

In Python, primitive objects like strings and numbers are different
kinds of objects from user-defined objects.  (This is less true now
than before the "class-type unification".)  They use different
mechanisms for looking up their properties (such as methods), and you
cannot add properties to them the way you can add properties to normal
objects.

Before the class-type unification, you couldn't inherit from them,
either.  Now you can, but the process has some pitfalls.

This approach discourages the introduction of methods like Squeak's
asWords method:

        3252523 asWords
    'three million, two hundred fifty-two thousand, five hundred twenty-three'

Because it really wouldn't be worth it to write that in C.

Another kind of primitive object is the "built-in function", meaning a
function written in C.

The Wheat Approach: Invisible Native Methods
--------------------------------------------

In Wheat, primitive objects like strings belong to some class whose
path is hardcoded into the language interpreter.  If you put ordinary
user-defined methods in that class, they start working on all objects
of that primitive type.  However, there are also non-user-defined
methods in these classes, stuck there by the interpreter at startup.

This means that there are two places to look to see whether, say,
strings override some particular method, or inherit the version in the
standard object.

The Squeak Approach: Objects With Some Hidden State, And Primitive Methods
--------------------------------------------------------------------------

Squeak has primitive methods, which are executed directly by the
interpreter, and some objects like SmallInteger that don't have any
instance variables and thus whose contents can only be accessed
through primitives.

If you browse ByteString>>at:, you will see a thing at the beginning
that says <primitive: 63>.  The comment above it says, "See Object
documentation whatIsAPrimitive," and it turns out Object has a class
method called "whatIsAPrimitive", consisting mostly of a long comment,
part of which reads as follows:

	When the Smalltalk interpreter begins to execute a method
	which specifies a primitive response, it tries to perform the
	primitive action and to return a result. If the routine in the
	interpreter for this primitive is successful, it will return a
	value and the expressions in the method will not be evaluated.
	If the primitive routine is not successful, the primitive
	'fails', and the Smalltalk expressions in the method are
	executed instead. These expressions are evaluated as though
	the primitive routine had not been called.

There are also certain class-selector pairs for native methods that
are not looked up through the normal method lookup mechanism, which
the comments claim is for efficiency; but I suspect that this is also
necessary to provide a base case for the recursion involved in things
like method lookup.

There are about 149 methods in Squeak3.8-6665full.image that specify
primitive responses.

The comment about "if the primitive routine is not successful" above
is interesting.  The most apparent reason the primitive routine would
not be successful is that it may not be implemented in the Squeak
virtual machine running your image at the moment, but there are other
possibilities as well.  For example, SmallInteger>>* has a primitive
implementation that fails on overflow, allowing the Smalltalk version
to fall back to arbitrary-precision, and
BlockContext>>valueWithArguments: fails, for example, if the number of
arguments is wrong, and the error reporting is all handled by this
fallback.

The Lua Approach: I Don't Know What It Is
-----------------------------------------

Apparently Lua has a "clientdata" type which is basically an opaque
pointer into C-land.  It also has, I think, native functions which
call into C when you call them, just as in Python.  But you can define
a "metatable" on a piece of "clientdata" to arrange for table lookup
on the clientdata to return things, such as functions, in Lua or C, to
be invoked as methods.

This seems very similar to the Wheat approach.  But I should finish
reading the Lua manual before passing judgment.

My Approach For the First Bicicleta Prototype
---------------------------------------------

On one hand, at the bottom level, I am taking a Python-like approach:
I have primitive objects, with a fixed set of primitive methods, from
which you cannot inherit.  However, integer literals and string
literals do not evaluate to these objects; integers, for example,
evaluate the expression "prog.sys.machine_integer" and derive from its
result by overriding the method 'clientdata' to return the primitive
object representing that particular integer.

This level of indirection allows the program to wrap more methods
around the primitive object, and even allows integer literals in
certain parts of the program to evaluate to something different.

Wed, 07 Mar 2007

Being able to predict what the _user_ will be thinking is also
an important skill for a programmer to have.

Kragen Sitaker [kragen at pobox.com] said:
> Previously, I hadn't understood the psychological aspect of
> code-reading --- you have to understand not just what the code does,
> but what the previous programmer or programmers were thinking when
> they wrote it.

This echoes of some of the things I've read in The Art of Software
Security Assessment. It's been a few months, so I'm paraphrasing, but
they're basically saying "To really get good at this code auditing
thing, you've got to start thinking the programmer that wrote the code
being audited". Then you'll be able to know what other classes of
vulnerabilities (or corner cases in classes of vulnerabilities) the
authors are likely to have over looked.

--paulv

Mon, 05 Mar 2007

a similar point is made here -
http://www.transhumanist.com/volume1/moravec.htm

scroll down to the third diagram, which has the caption:

  The big freeze. From 1960 to 1990 the cost of computers used in AI
  research declined, as their numbers dilution absorbed computer-
  efficiency gains during the period, and the power available to
  individual AI programs remained almost unchanged at 1 MIPS, barely
  insect power. AI computer cost bottomed in 1990, and since then power
  has doubled yearly, to several hundred MIPS by 1998.

andrew

[...]
> But Moore's Law is about price-performance, not absolute performance;
> here I estimate that the actual loss of price-performance attributable
> to bad CPU architectures is perhaps a factor of 10 to 50, and it is
> plausible that better compilers can remedy this.
[...]

Previous version posted at
http://lambda-the-ultimate.org/node/531#comment-23457 on 2006-12-25.

This is a partial rebuttal to Alan Kay's occasional assertion that
computers aren't nearly as much faster at executing late-bound things
like Smalltalk as you would expect from Moore's Law.

In an interview with ACM Queue, Kay writes [7]:

    Just as an aside, to give you an interesting benchmark --- on
    roughly the same system, roughly optimized the same way, a
    benchmark from 1979 at Xerox PARC runs only 50 times faster
    today. Moore’s law has given us somewhere between 40,000 and
    60,000 times improvement in that time. So there’s approximately a
    factor of 1,000 in efficiency that has been lost by bad CPU
    architectures.

But Moore's Law is about price-performance, not absolute performance;
here I estimate that the actual loss of price-performance attributable
to bad CPU architectures is perhaps a factor of 10 to 50, and it is
plausible that better compilers can remedy this.

Guesswork
=========

"Resuna" writes [6]:

    The [VAX] 11/780 was 3.6 MHz, 32-bit words. I don't know how fast
    the Alto or Dorado were, but with the Dorado being the
    archetypical "3M" machine I assume its performance was comparable
    to a nominally 1-MIPS 11/780.

According to Wikipedia [0], the Dorado was an all-ECL machine.  The
abstract to Lampson and Pier's paper on the Dorado [1], which I
haven't read, says it ran at 20MHz, had 16 hardware threads to provide
zero-context task switching, and was built out of "approximately 3000
MSI [ECL] components".  So it was considerably faster than a VAX.
Maybe one of the older D-machines is "the archetypal 3M-machine".

Apparently it could run 200k-400k Smalltalk bytecodes per second [2].
I'm guessing that the Dorado is the particular machine Kay was
alluding to benchmarking, since it was introduced in 1979, and the
context of the conversation is how machines designed to be efficient
at high-level language execution were worthwhile.

I don't think it was ever sold commercially (or even mass-produced
in-house), which makes per-unit costs difficult to calculate.
However, if we assume that each of the 3000 chips in the thing cost
$20 each (unfortunately I have no real idea how much ECL chips cost in
1980), that's a $60 000 bill-of-materials cost.  So it might have cost
$100 000 per machine if it had been mass-produced, and since it was
ECL, the electrical power cost of running it would likely be higher
per chip as well.

According to the squeak-dev thread on the subject [3], modern 600MHz
uniprocessors are about 20x the speed of the Dorado when running
Squeak, or 35 million bytecodes per second (which sounds more like
100x the speed of the Dorado, actually).

However, the uniprocessors in question cost US$150 or so, which is
inflation-equivalent to maybe US$75 in 1980 dollars.  (They also
include hundreds of megabytes of RAM, instead of the 8MB on the
Dorado.)

If you were going to spend $100 000 today (or when Kay gave this
interview) on a computer to run Smalltalk on, you would probably get a
Beowulf of 50 nodes, each node of which could run bytecodes at 50 to
200 times the speed of a Dorado, and that's running Squeak, which is
not designed to be a particularly high-performance Smalltalk.  But
Moore's Law has still given us, by my rough estimates, a factor of
2500 to 10 000 in price/performance in this case.  (That's not
counting the difference between 8 megs of RAM and 50 000 megs of RAM,
or the advantage of having 10TB of disk, etc.)  A factor of 2500 is
still noticeably less than the 131072x improvement that you might
predict from a naive application of Moore's law, but the remaining
factor of 10-50 is probably explicable in terms of Kay's explanation:
the architecture is not optimized for Smalltalk bytecode execution, so
you get a 10-50x slowdown when you use it as if it were a Dorado.

(You might be able to get a Beowulf of 300 nodes at that price,
depending on other circumstances.)

How much faster are other Smalltalk implementations than Squeak?
Various microbenchmarks seem to peg Strongtalk at 3x-10x faster than
Squeak (Avi Bryant's [4], David Griswold/Klaus Witzel's [5]), which
would nicely compensate for the remainder of Kay's complaint.

References
==========

[0] Wikipedia article "Xerox Alto", section "Diffusion and Evolution",
as of 2006-12-25
> http://en.wikipedia.org/wiki/Xerox_Alto#Diffusion_and_evolution

[1] "A Processor for a High-Performance Personal Computer", from
Butler W. Lampson and Kenneth A. Pier, Xerox PARC, 1980, IEEE
"CH1494-4/80/0000-0146" (whatever that means), 15 pp.; mentions, among
other things, that the first machine "came up in the spring of 1979".
> http://research.microsoft.com/Lampson/24-DoradoProcessor/Acrobat.pdf

[2] Squeak-dev post "Dorado bytecodes per second", from Bruce ONeel
(edoneel at sdf.lonestar.org), 2005-05-28T16:41:49 CEST, quoting
previous post from Jecel Assumpcao Jr (jecel at merlintec.com):

    By running the benchmarks for the "green book" and doing a lot of rough
    extrapolations, my guess is that the Dorado would get between 200K and
    400K bytecodes/sec.

And followup from Tim Rowledge (tim at rowledge.org):

    That is pretty much what I remember as the claim for Dorados.

> http://lists.squeakfoundation.org/pipermail/squeak-dev/2005-April/091211.html

[3] Squeak-dev post "Dorado bytecodes per second", from Jecel
Assumpcao Jr (jecel at merlintec.com), 2005-05-28T22:38:19 CEST ---
he's talking about 600MHz ARMs.
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2005-April/091215.html

[4] Blog post "Ruby and Strongtalk II", by Avi Bryant, on his blog
"HREF Considered Harmful"; the microbenchmark in question did a
billion accesses of a thousand-element array of small integers, took
0.7 seconds in Java, 7 seconds in Strongtalk, 70 seconds in Squeak, or
16 if you use Array instead of ByteArray.
> http://smallthought.com/avi/?p=17

[5] Squeak-dev thread "Thue-Morse and performance: Squeak
v.s. Strongtalk v.s. VisualWorks", started by Klaus D. Witzel
2006-12-17; several people, including David Griswold, point out flaws
in Witzel's initial benchmark, and the results are interesting.
> http://www.nabble.com/Thue-Morse-and-performance:-Squeak-v.s.-Strongtalk-v.s.-VisualWorks-t2834773.html

[6] Comment "I still want to see Kay's benchmark...", from "Resuna",
2005-07-22
> http://lambda-the-ultimate.org/node/531#comment-7895

[7] ACM Queue article "A Conversation with Alan Kay: Big Talk with the
creator of Smalltalk --- and much more.", by Stuart Feldman and Alan
Kay, vol. 2, no. 9, Dec/Jan 2004-2005, is the origin of this quote.
> http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=273&page=3

Sat, 03 Mar 2007

This will also be online at http://pobox.com/~kragen/sw/lame.html for
easier viewing.  Like everything else posted here without a notice
to the contrary, I dedicate this to the public domain.

<html><head><title>Curves of Lamé</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<!-- about an hour of work, 2007-02-08 -->
<script type="text/javascript">//<![CDATA[
function $(id) { return document.getElementById(id) }
function drawCurve(ctx, points, x, y) {
  ctx.beginPath()
  ctx.moveTo(0, y)
  for (var ii = 0; ii < points.length; ii++) {
    ctx.lineTo(points[ii][0] * x, points[ii][1] * y)
  }
  ctx.stroke()
}

function redraw(canvas, exponent) {
  var ctx = canvas.getContext('2d')
  var w = canvas.width
  var h = canvas.height
  ctx.strokeStyle = 'grey'

  // set coordinate system
  ctx.save()
  ctx.translate(w/2, h/2)
  // the -1s are so that antialiased lines don't become half-width
  // when they reach the edge:
  ctx.scale(w/2-1, h/2-1)  
  ctx.lineWidth = 1/(w/2-1)

  // draw axes
  if ($('draw_axes').checked) {
    ctx.moveTo(-1, 0)
    ctx.lineTo(1, 0)
    ctx.moveTo(0, -1)
    ctx.lineTo(0, 1)
    ctx.stroke()
  }
  
  // compute a point per pixel
  var points = []
  for (var xx = 0; xx <= 1; xx += 1/(w/2-1)) {
    // x^n + y^n = 1; y = (1 - x^n)^(1/n)
    var yy = Math.pow(1 - Math.pow(xx, exponent), 1/exponent)
    points.push([xx, yy])
  }
  points.push([1, 0])  // ensure it's closed to take care of numerical error

  // draw them
  drawCurve(ctx, points, 1, 1)
  drawCurve(ctx, points, 1, -1)
  drawCurve(ctx, points, -1, 1)
  drawCurve(ctx, points, -1, -1)

  // restore coordinate system
  ctx.restore()
}
function cls(canvas) {
  canvas.getContext('2d').clearRect(0, 0, canvas.width, canvas.height)
}
function draw() {
  var ss = $('exponent')
  redraw($('canvas'), ss.options[ss.selectedIndex].value)
}
function erase_old() {
  cls($('canvas'))
  draw()
}
// ]]>
</script>
</head>
<body onload="draw()">
<h1>Curves of Lamé</h1>
<div style="float:right; margin-left: 0.5em">
<div>
<label for="exponent">Exponent: 
</label><select onchange="draw()" id="exponent">
<option>0.125</option>
<option>0.25</option>
<option>0.368</option>
<option>0.5</option>
<option>0.707</option>
<option>1</option>
<option>1.414</option>
<option>2</option>
<option selected="selected">2.718</option>
<option>3</option>
<option>4</option>
<option>8</option>
<option>50</option>
<option>500</option>
<option>5000</option>
</select>
<button onclick="erase_old()">Erase Old</button>
<input type="checkbox" id="draw_axes" checked="checked" 
  onchange="erase_old()" /><label for="draw_axes"> Draw Axes</label>
</div>
<canvas id="canvas" width="400" height="400" style="margin: 0.5em">
</canvas>
</div>

<p>These are curves of Lamé, generalizations of the circle given by
the formula <nobr><i>x<sup>n</sup> + y<sup>n</sup></i> = 1</nobr>.
When <i>n</i> = 2, you get a circle; when <i>n</i> is a bit greater
than 2, you get Piet Hein’s “supercircles”.  He liked the effect you
got when <i>n</i> was roughly <i>e</i>, which is the default here, but
you can change the exponent in the drop-down box to see the different
kinds of curves you can get.</p>

</body></html>

Thu, 01 Mar 2007

> "real programming", which the author defines as "to increase the
> computational capacity, to begin with a set of operations, and develop
> them into new operations that were not obviously implicit in the
> original set."

A bit like parallel parking -- a priori, cars don't seem to have anything  
in their API for moving sideways, towards the passenger side by a little  
over a car width, but the right combination of inputs produces the desired  
behavior.  I suppose the difference is that the possibility of parallel  
parking is, through experience and example, far more obvious to the  
debutant than the possibility of powerful programming.

> I'm not going to claim that everyone goes through the same phases, or
> goes through them in the same order, but no doubt any programmer will
> recognize some of their own history in what lies below.  In each
> phase, I failed to understand that it was merely a phase, and that my
> knowledge was still limited --- that a few years later, I would be
> able to do things I could only dream of at the moment.

Not just any programmer.  When Scott McCloud talks about his evolution as  
a comic book writer, it follows a similar pattern: the naive artist is  
fixated on the surface details and graphic style and tends to dismiss what  
has come before on those grounds, where the experienced artists, having  
had to wrestle on his own (I presume the gendered pronoun is about as  
applicable to comics as to code) with the tradeoffs involved in putting  
together a piece is able to recognize the mastery in earlier work, despite  
the drift in fashion.

> I had the idea that "knowing how to program" was knowing the details
> about all the programming interfaces I could use --- how to use the
> PLAY statement to make music, how to use the RND function to generate
> random numbers, and so on.  I didn't understand that there was a kind
> of programming knowledge that wasn't specific to a particular
> language...

The McCloud parallel is that one moves from "how do I draw characters  
*just like* the ones in Foo" to "how do I tell a *story* with my  
characters?" (from "how can I use the Foo package to make something that  
works *just like* Bar" to "how should this computation be structured",  
respectively?)

> I didn't really understand that the PLAY statement was really
> implemented by other code, not so different from the BASIC code I
> could read myself, rather than some kind of magic --- I didn't know
> how to take it apart and see what was inside, I didn't know the
> language it was written in, and it ran a lot faster than code I could
> write myself.

Just as an armillary sphere was a small model[0] of the visible universe,  
metacircular interpreters help demonstrate that the next turtle down isn't  
any more magical -- usually faster, often more hassle[1], but no more  
magical -- than the turtle one currently knows. "What one fool can learn  
to do, another can"

[0] I, too, succumb to the temptation of model railroading:  
http://en.literateprograms.org/Literate_Programming_(Python)

[1] there is a certain symmetry in programming languages -- algol 60 was  
chock full of features, and people spent the next couple of decades after  
that tossing them out, trying to find the sweet spot where the language  
gave the ability to say what you wanted to say, but not so much power that  
saying it safely took forever and a half.  With the passing of the  
generation of programmers who remembered how well machine code allowed  
themselves to laboriously shoot themselves in the feet, we've spent the  
last couple of decades adding complexity and dynamism.  (maybe the  
hardware people are just now catching up to where software people wished  
they were in the late 50s -- to which the hardware people might reply that  
we might all be a lot better off if we didn't make their machines slower  
almost as fast they make them faster)

cf. Landin, Histories of Discoveries of Continuations: Belles-Lettres with  
Equivocal Tenses

> Today, I don't need [a language]; even if I were programming in
> 1980s MBASIC, I could use functional programming techniques to
> structure the program.  At the time, this was still beyond my
> capacity.

   "The withering away of the statement" is a phenomenon we may live to see.
   --Peter J. Landin, Getting rid of labels, 1965.

As far as I have yet learned, to know that programming is more a matter of  
the workman than the tools is the crux.  One technical problem here is  
that the best applications of programming techniques don't leave obvious  
traces -- even removing code instead of adding it -- because the heavy  
work is done in the theory, to avoid any work in the practice.

> In retrospect, I didn't need teachers to teach me things; I needed
> teachers to force me to get out of my comfort zone, and partners to
> get me unstuck.

In sport, the temptation to "step up one's game" is an easy spur to  
stretching the comfort zone.  But usually, there the next challenges to  
tackle are clearer than they are in programming.  In laying out an  
equestrian cross-country course, the good designer sets it so that there  
are two approaches to the obstacles -- the quick, but challenging line,  
and the safer, but slower, line.  It might be interesting to come up with  
some sample programming problems that offered a similar series of  
decisions in the implementation, and then anyone could measure their  
comfort zone against the height of the fences on offer.  (Also, just as it  
is better to save one's horse and only take the tricky options where they  
save the most time, such a course would implicitly demonstrate that it is  
better to save one's brain and only take the tricky options where they  
count most)

(the closest I can think of, but related more to puissance than endurance,  
is "evolution of a Haskell programmer"[2] -- it parallels the quasizen  
"when I first learned to program, code was code and data was data; when I  
was studying programming, code was no longer code" etc., all the way up to  
the punch line)

Unfortunately, I have no idea what range of programming abilities might be  
an appropriate target.  To explain the FizzBuzz buzz[3] to my wife, I took  
the analogy of asking the new guy to tack up a horse and to walk, trot,  
and canter.  At that point, having observed their style, you still don't  
necessarily know if they can do any work, but at least you can have some  
confidence that they're not going to get in a big wreck learning to do it.

[2] http://www.willamette.edu/~fruehr/haskell/evolution.html

[3] it is interesting that the top scoring declared majors on the LSAT are  
phys/maths and philosophy/religion.  Given that programmers are among the  
few people who make inferences and immediately get several million chances  
a second to discover they've made an error, it seems odd that they would,  
as a group, do so poorly on the LSAT.  My interpretation is that CS  
students who are hackers don't take the LSAT, and it is instead the CS  
students who originally declared their major on the basis of expected  
earnings who take it.  (perhaps they now believe that wheedling into law  
practice will be easier than coaxing an unforgiving machine[4] into doing  
its job?)

[4] "They say Princes learn no art truly, but the art of horsemanship. The  
reason is, the brave beast is no flatterer. He will throw a prince as soon  
as his groom."

> Unsurprisingly (in retrospect), reading code made me better at reading
> code.

I attribute my early successes in computing to have misspent much of my  
youth downloading programs from net.sources (eventually  
net.sources.games), and porting them to work on the mini.  I doubt I wrote  
much notable on my own at this point -- verb/noun adventure game engines  
and wireframe flight simulators -- but I did get very good at reading  
other people's code, and figuring out why it was not working as intended  
and how to make the necessary patch.

In retrospect, I had some amazing advantages.  In the early 80's, my  
father was working at a company that was building (not cloning) personal  
computers.  So, yeah, the chips themselves were magic, but for everything  
else, you knew, or at least knew of, the guys who were making it work (and  
would tell you of practical jokes, like the time the h/w guys reversed the  
leads on the monitors one day to see if the s/w people would notice  
instead of flipping all the signs in the driver code) -- and if you poked  
around in the filesystem, they had not only sources for what they were  
working on, but (depending upon whose schedule was crunched at what time)  
all sorts of side programs.  Failing that, the company library was well  
stocked with ACM titles.

I think the biggest advantage was not the availability of all that source  
code (between open source and citeseer, I think kids now *should* win big  
over both of us -- do they?), but how it set what was "normal" and  
"aspirational" for me.  At the time, I'd thought everyone else had the  
fancy SIGGRAPH displays and Xerox workstations and optimizing Scheme  
compilers, and I was deprived because my father worked for a company that  
was merely producing a little micro with a 512x256 bitmapped display and  
no mouse whose most compelling app was a graphical rogue-like game.   
(written by a couple of the engineers as a quick hack, who also provided a  
small bitmap editor which one of the secretaries used to draw the  
artwork)  Never mind that I could log on to the VAX and read the code and  
even hack my own small changes (using an editor, and a compiler, written  
by others at the company) into a private copy.  I became convinced that  
there must be more to programming that what I'd seen thus far.  (which,  
after all, was something even my old man could do)  Note that, like any  
n00b, I was still fixated on flash gear, instead of how well these guys  
were making things work with what they had, but at least I got the idea  
that what one fool can engineer, another can, and that the frontiers of  
the subject were a long way from what I knew.

> Of course, I don't know what my next phase will look like.  Presumably
> certain things I currently struggle with --- to the point of not
> knowing how they're even possible --- will get easy, and it will seem
> strange that they were ever difficult.  But I don't know what those
> things will be.  But I'm pretty sure I'm not done learning yet...

   "You learn something every day ... unless you're careful" --Tom Van Vleck

At this point, I'm taking a detour, as backfill, into the algebra I never  
learned. (mathematicians' modules are a bit more modular than ours)  To  
see the connection between "real computing" and Lie Algebras, refer to the  
example at the top of this mail.

-Dave

I was reading an article on "Lambda the Ultimate" about Bruce Mills's
book "A Theoretical Introduction to Programming," and in particular
about the difference between "menu-lookup" writing of glue code, and
"real programming", which the author defines as "to increase the
computational capacity, to begin with a set of operations, and develop
them into new operations that were not obviously implicit in the
original set."

This brought me to thinking about my evolution so far as a programmer,
which I can divide into the following phases.

- BASIC, Logo, arrogance, and incompetence: 1980
- Pascal and C: beginning to learn the depth of my ignorance: 1988
- Grokking the Basics of Functional Programming: 1993
- Learning How Magic and Beauty are Possible: 1993
- Getting Practical and Eclectic: 1995
- Getting a Job: 1996
- Joining a Startup: 2000
- Learning to Read: 2002
- Today: 2007

I'm not going to claim that everyone goes through the same phases, or
goes through them in the same order, but no doubt any programmer will
recognize some of their own history in what lies below.  In each
phase, I failed to understand that it was merely a phase, and that my
knowledge was still limited --- that a few years later, I would be
able to do things I could only dream of at the moment.

BASIC, Logo, arrogance, and incompetence: about 8 years
-------------------------------------------------------

This stage lasted from the time I got access my first computer (about
1980) until my Arthur Sittler explained recursion to me,
approximately, around 1988.

I learned Logo during this time, which was interesting, but I never
really made the connection with "real programming" in BASIC; I figured
that since my Logo procedures didn't have line numbers, they weren't
programs.  I used recursion, but only tail-recursion; I remember one
of my early Logo programs looked like this:

    TO EAT:
      PRINT [OM]
      EAT2
    TO EAT2:
      PRINT [NOM]
      EAT2

I thought it was cool that I could define new commands in the
language, but I never made the connection that I could use GOSUB in
BASIC to do something similar, probably because the equivalent of
Logo's SQUARE 200 (calling a user-defined routine to draw a 200-pixel
square) would look like this:

    1180 L = 200
    1190 GOSUB 3200

and not like this:

    1180 SQUARE 200

So I never made the connection, and I never wrote any interesting
BASIC programs during this time.  I learned how to use the graphics
functions in TRS-80 Color Computer BASIC, so I wrote programs that did
all kinds of cool random graphics, but basically their control
structure was just loops within loops, so they couldn't do anything
interesting.

* "Is that a real program or is that something somebody wrote?"

During this period, I was very interested in the outward appearance of
things.  A hacker friend told me this story:

    I had written a starfield screensaver, much like many other
    screensavers of the time, and it was running on my Mac.  A
    co-worker walked by and saw the screensaver, and he asked me, "Is
    that a real program, or is that something somebody wrote?"

This imaginary difference was something I was very concerned with ---
between a "real program" that I could invoke by typing "INVADERS" on
the command line, and "something somebody wrote" in BASIC, that I had
to invoke indirectly through MBASIC, which ran slowly.

I had the idea that "knowing how to program" was knowing the details
about all the programming interfaces I could use --- how to use the
PLAY statement to make music, how to use the RND function to generate
random numbers, and so on.  I didn't understand that there was a kind
of programming knowledge that wasn't specific to a particular
language, although for a period of time, all the computers I came in
contact with ran BASIC, so I could usually figure out how to use them.

Looking back, I was pretty self-assured about knowing "a lot" about
computer programming, but also pretty insecure.  Part of this was that
certain family members thought that I knew "a lot", and this
introduced an external-approval complication into the mix.  This was
to the point that I failed to seek out people who knew more than I
did, and failed to understand why I hadn't, say, written any
interesting games.  I did spend endless time reading books about
various microcomputer dialects of BASIC, and writing the same programs
over and over again.

I didn't really understand that the PLAY statement was really
implemented by other code, not so different from the BASIC code I
could read myself, rather than some kind of magic --- I didn't know
how to take it apart and see what was inside, I didn't know the
language it was written in, and it ran a lot faster than code I could
write myself.

I still meet a lot of people who think that knowing how to program is
about knowing programming languages or the APIs of libraries or
"frameworks", and who are much more impressed with the quality of the
gradients an a program's UI than with its underlying functionality.
In effect, they think programming is just writing "glue code".  These
beliefs amount to cognitive handicaps that prevent them from having
much success as "programmers", finding "programming" interesting, or
(so I hear) having successful work interactions with actual
programmers.  Once, I thought that I had believed these things myself
because I was 6-12 years old, but apparently many adults have these
cognitive limitations too.

It isn't completely wrong, since of course learning APIs and languages
is a necessity for getting certain things done, and UI chrome affects
your mood every second you're interacting with the program.  But
knowledge of APIs and language details is neither necessary nor
sufficient for "real programming"; nor is writing in a "fast
language".  And it only takes a few hours to weeks (of constant work)
to become productive with a new language or a new API, while learning
to program takes decades.

The other thing that characterized this phase, for me, was a
fascination with things like graphics and sound.  I observe that
generally these, together with continuous low-latency feedback from
gradually enhancing a working program, seem to be good "hooks" to get
people to start to learn to program.

This phase began to came to a close as Arthur Sittler introduced me to
Pascal, and showed me a Towers of Hanoi program with no line numbers
that fit in half a page and didn't need an interpreter to run --- but
didn't move any discs on the screen.

Pascal and C: beginning to learn the depth of my ignorance: about 4 years
-------------------------------------------------------------------------

This phase lasted from the time Arthur introduced me to the concept of
recursion, around 1988, until I read Robert Wilensky's "Lispcraft",
probably around 1993 or so.

I had earlier seen the Towers of Hanoi problem as a game, written in
BASIC, for an H89.  You could play it interactively (trying to move
the stack of discs from one peg to another) or you could make the
computer solve it for you.

Arthur showed me recursive programs to solve the Towers of Hanoi
problem in Pascal and some other languages.  They fit on one 80x24
screen.  I had a hard time accepting them as programs at first, partly
due to the lack of line numbers, but he was persuasive.  ("We don't
need line numbers any more.  Line numbers were for when we dropped our
stack of stones on the floor on our way from the stone punch to the
stone reader, but we don't have that problem any more.")  Also, the
resulting program could run without an interpreter.

I wonder what I would have thought of this OCaml version?

    (* move n discs from a to z (using third peg named x) *)
    let rec hanoi move n a z x = if n = 0 then () else 
      (hanoi move (n-1) a x z ; move n a z ; hanoi move (n-1) x z a)
    let printmove disc a b = print_endline 
	("Move disc " ^ (string_of_int disc) ^ " from " ^ a ^ " to " ^ b) ;;
    hanoi printmove 4 "peg A" "peg C" "peg B"

This was my first encounter with recursion as anything other than a
way to express iteration through tail-recursion, and also my first
encounter with compilers, with programs being stored in text files and
edited with a separate program, and with "real programming", in the
sense above.

This began to show me that there was a big world of programming out
there that I didn't know about, and that it was accessible to me.
Unfortunately, I didn't have a copy of the Turbo Pascal Arthur was
using on my Z-100, and I didn't know how I would go about learning
Pascal anyway, so this was still a fairly theoretical realization.

A couple of years later, I took an independent-study computer
programming class at the Career Enrichment Center, where David Mains
and Greg Gurule began to confront the depths of my ignorance and
intransigence, and challenge me to do more --- to learn "real
programming".  I struggled with Pascal, and with my ignorance and
insecurity, and with the level of self-direction needed in the
independent-study environment.  (Perhaps many twelve-year-olds would
have the same problem.)

Still, I made substantial progress because of that class.  I learned
Pascal, DEC VMS DCL, and some C, and I wrote my first interesting
programs --- a Pascal's Triangle generator and a one-dimensional
cellular automaton that generated a Sierpinski triangle --- and my
first useful programs, command-line utility programs written in DCL.

In retrospect, I didn't need teachers to teach me things; I needed
teachers to force me to get out of my comfort zone, and partners to
get me unstuck.

I still had far too much respect for my own skill at programming, and
it still impeded my learning.  Although I could solve "fizzbuzz" level
problems, even toward the end of this time, I hardly ever wrote an
interesting program, and I would struggle for days with problems I can
solve now in an hour.

During this time, I also learned x86 assembler, although the only
program I wrote in it was a tiny Towers of Hanoi program.

Grokking the Basics of Functional Programming
---------------------------------------------

Sometime around 1993, Arthur Sittler lent me "Lispcraft", which taught
me a completely different way of looking at programming problems --- I
began to understand recursion at last.  But I didn't have a Lisp
system available to me, so I couldn't actually use the insights I got
from the book.  Today, I don't need one; even if I were programming in
1980s MBASIC, I could use functional programming techniques to
structure the program.  At the time, this was still beyond my
capacity.

I still had the idea that skill at programming consisted primarily of
competence with a particular set of tools (languages and libraries),
rather than anything intrinsic.

Learning How Magic and Beauty are Possible
------------------------------------------

Also, sometime around 1993, I read Robert Sedgewick's textbook,
"Algorithms in C", which I borrowed from my father, Greg Sittler.  It
opened a whole new world to me.  The programs in the book are all
concise crystals of beauty, showing how a few lines of code can
transform a pile of structs of integers and pointers into a binary
search tree, or a hash table, or the minimal spanning tree of a set of
points.  It used only the basics of C, and did not rely on any
libraries.

This was a revelation to me; there was beauty and magic in these
programs, and it wasn't because they were calling on powerful
libraries hidden, Wizard-of-Oz-style, behind curtains.  They were just
plain C code, and not very much of it, that "developed a set of
operations that were not obviously implicit in the original set," to
borrow Bruce Mills's phrase.

So I began learning about algorithms and data structures and big-O
notation and so on.  It was a mathematical side to programming that I
hadn't even suspected of existing.

"Algorithms in C" taught me by example about the miracle of Turing,
which is sort of a computational analogue of Archimedes' boast, "Give
me a fixed point on which to stand, and I will move the Earth."
Turing's miracle is that given a computational substrate, even one
without much in the way of built-in features, you can still build a
program that does anything.  (If you have enough space.)

I didn't write a lot of useful software during this time period,
because I became obsessed with writing code that was as beautiful and
spare as Sedgewick's example code in the book.  Sadly, code that
touches the real world of existing libraries and hardware is never
that beautiful, so I didn't get a lot written.

I can think of a couple of things I pondered for days during this
time, trying to get the object models right, that I hacked out in a
few minutes to a few hours more recently --- a program for plotting
curves of Lamé and a program for 3D modeling of toruses.

Getting Practical and Eclectic
------------------------------

Around 1995, two revelations befell me, both from Matt Hudson.

First, he wrote a Perl program called "eyefun.pl" (unless that was
Brendan Conoboy, which is possible), which opened a bunch of "xeyes"
windows in random places on your screen.  Reading this program (and
then the perl4 man page) was how I learned Perl, and thereafter I
wrote a lot of little programs in Perl --- much quicker instant
gratification than in C, and I didn't feel the same urge to equal
Sedgewick's C in beauty --- obviously impossible in Perl4.

Second, he recommended that I read Steve McConnell's "Code Complete",
which he said changed the way he thought about programming.  It
changed the way I thought about programing too --- it's a book two
inches thick of explanations about the practical aspects of
programming, from the point of view of someone who spent a lot of time
programming at Microsoft.  Previously, all the books I'd read and all
the (few) classes I'd taken took a fairly doctrinaire attitude about
what you should and shouldn't do, essentially because they were aimed
at complete novices, and in many cases were written by novices as
well; "Code Complete" talks about each of the various doctrinaire
points of view with a fair summary of the pluses and minuses of each.

Essentially, "Code Complete" gave me a dose of practicality and
just-get-down-to-it-ism that helped to counteract the worse effects of
Sedgewick and Wilensky's books --- without leading me to dismiss their
value.

I also talked with Arthur some more, and he pointed out that I still
needed some more theoretical grounding.  I asked him for
recommendations, and as a result, I ended up reading maybe five
thousand pages of computer-science textbooks of various kinds during
late 1995 --- the Dragon Book being the one I remember best, although
various books about APL were pretty interesting too.  I learned a lot,
but none of them really changed my thinking as much as "Code
Complete."

During this later part of the time, I didn't actually have a computer
at all; I would telnet from the local university computer center to my
ISP's Linux box in another state to write and compile my C++ programs.

Getting a Job
-------------

In 1996, I moved to Silicon Valley to get a job, at a company making
telecom network management software.  I was still pretty incompetent;
while I had written a fair number of toy scripts, I was terrible at
debugging.

There I met Jay Lark.  Jay didn't have a lot of time (he was an
executive at the company) but he explained a little bit about Lisp to
me --- told me what a closure was, which Lispcraft had omitted since
it was about dynamically-scoped Franz Lisp --- and lent me Abelson and
Sussman's "Structure and Interpretation of Computer Programs."  This
book taught me more about programming than all the books I had read up
to this point --- and I didn't even read the whole thing!

What Sedgewick had started, Abelson and Sussman completed.  SICP
demonstrated not just the construction of algorithms and data
structures on a substrate that lacked them; they demonstrated the
construction of entire computational paradigms from almost nothing:
arithmetic, object orientation, nondeterministic backtracking
evaluation, and functions.

The job also put me in contact with many other programmers more adept
than I, and gave me the opportunity to learn a lot from them.  And it
forced me out of my comfort zone again.  But programming with them
wasn't my job there; I just ran the build system, the source-control
system, and the bug tracker, all of which probably needed complete
replacement rather than merely nursing along as I was doing.  But I
wasn't yet up to the task.

After a year, I left for Ohio to start a marriage.  But I began
working on programming projects with other programmers, which could
have given me many more opportunities to enhance my programming
skills.  This part is pretty depressing to think about; for several
years, I didn't really make a lot of progress, partly because there
wasn't a lot to force me outside of my comfort zone.

Toward the end of this time, I became active on comp.lang.perl.misc,
which was extremely helpful at improving my Perl and my
programming-in- the-small skills.  It was extremely helpful having
other people look at my code (which, sadly, was lacking in my day job)
and comment on how it could be improved --- and doing the same for
others.  This was extremely valuable for deepening my knowledge of the
Perl language, but as I've said before, knowledge of languages is much
less important than you might suspect at first.

Sadly, I can't recommend comp.lang.perl.misc today; it was already
somewhat dysfunctional at the time, and it has gotten much worse.

During this time, I read "The Practice of Programming", which is a lot
like "Code Complete", but shorter and much higher in quality.  I had
read the same authors' "The Elements of Programming Style" back in
1995, on much the same subjects, but that book is nearly unreadable
today --- it's written in PL/1 and FORTRAN IV.  TPoP, aside from being
written with modern programming languages, also contains insights from
several decades more of the authors' experience.

Joining a Startup
-----------------

In 2000, I moved across the country to join Rohit Khare and Adam
Rifkin at KnowNow, a startup they had just begun.  For the next year,
I was constantly forced out of my comfort zone, forced to learn new
programming languages and APIs (including a bunch that didn't work
very well), and given the opportunity to work with several highly
competent technical people, including Rohit himself, Ben Sittler,
Strata Chalup, Mike Dierken, Scott Andrew, Lane Becker, Derek
Robinson, Adam Zell, Greg Burd, Matt Haughey, and Meg Hourihan.

This was my first time working with people who had such a high level
of competence, and it taught me a lot about programming very quickly.
Unfortunately, KnowNow made some bad business decisions (such as
taking funding from Kleiner Perkins and hiring some really third-rate
management), so the company kind of got stuck, and I got myself fired;
very few of the highly competent technical people I listed above
remained long after me.

Learning to Read
----------------

In 2002, I joined a startup company named AirWave, which practiced
Extreme Programming with a team of six; I stayed there for three
years.  The practical consequences of this included constant social
interaction, high-bandwidth skill sharing, and constantly maintaining
code I didn't write.

Two of the XP core practices are "collective ownership" and "pair
programming".  The XP reasons for these are that they "produce
business value," meaning they are valuable to the people who are using
the software being developed, but they were also very beneficial to me
as a member of the team.

Collective ownership means that nobody owns any part of the code, so
when you're making some modification, you're not any more likely to be
modifying code you wrote than code any other team member wrote.  In a
team of six, that means that five sixths of the time you're
programming, you're editing code somebody else wrote.  This alone
gives you a lot of practice reading code.

"Pair programming" means that half the time you're working on a task,
you're not even at the keyboard --- you're watching somebody else
type.  It's expected that you're paying attention, too, so that you
can spot bugs, departures from the standard coding conventions, and
excessive complexity in the code that's being written.

Due to these two practices, I spent a heck of a lot of time reading
code.  This improved my skills in three unanticipated areas: reading,
maintainability, and debugging.

Unsurprisingly (in retrospect), reading code made me better at reading
code.  This is handy when I start any new programming project,
especially one that someone else has been working on.  I thought I
could read code before this experience, but I got much, much quicker
at it.  Previously, I hadn't understood the psychological aspect of
code-reading --- you have to understand not just what the code does,
but what the previous programmer or programmers were thinking when
they wrote it.

Reading code also made me better at writing maintainable code.
"Maintaining" code means changing it, usually to add features, and
most of the time spent in a typical "maintenance" task is spent
understanding the existing code.  Consequently, the most important
aspect of maintainability is readability --- avoiding unnecessarily
complex constructs, clear naming, helpful commenting.  Of course,
"Code Complete" and "The Practice of Programming" have advice about
how to do these things, but spending three years reading other
people's code in order to figure out how to modify it is much more
educational than reading a book or two.

The biggest surprise was how much it helped in debugging.  Debugging
is what you do when the original programmer thought the program would
do one thing, and it actually does something different.  You can
perform experiments of various kinds to figure out what it's actually
doing, and you often must; but in the end, it comes down to finding
the piece of wrong code, reading what it is actually doing, figuring
out what its author wanted it to do, and understanding how to make it
do that.  It turns out this is the same process as reading code in
general.

I think I learned more about programming in three years from the
AirWave team (Matt Albright, Joe Arnold, Darrell Bishop, Dan DiPasquo,
Jason Luther, Blake Mills, Dave W. Smith, and Sujatha Mandava) than I
had in my previous 22 years of programming.  Unlike most of the
previous teachers I listed, this was a relationship of partnership,
where I think I taught as much as I learned.

Today
-----

Of course, I don't know what my next phase will look like.  Presumably
certain things I currently struggle with --- to the point of not
knowing how they're even possible --- will get easy, and it will seem
strange that they were ever difficult.  But I don't know what those
things will be.  But I'm pretty sure I'm not done learning yet, and
surely there are folks out there who look at the stuff I have trouble
with now and scoff.  And I'm pretty sure that having good partners or
mentors will be crucial in getting to that next stage; I'm wandering
around South America, in part, to find them.