Sat, 31 Mar 2007

Great stuff, as usual. I wouldn't make all the initial IDs random.
Reason: you can e.g. set the MAC from a WGS84 GPS position fix
(about /m^2 Earth surface resolution, IIRC). Ideally, the ID
should be a real 3d coordinate, of course.

By prepopulating the ID landscape (there's probably some
critical density) you'd get lots of domains which will speed up
your annealing.

Refining is basically: one bit for the hemisphere, another bit
for a yet another bisection, iterate.

There's more to the ID encoding, you have to minimize
the amount of bits flipping for nodes in circular orbits
(anything moving on Earth surface also applies).

I wouldn't also limit this to just direct neighbours,
a node can sometimes see quite far -- but you'd
get other distance metrics, such as signal strenght
or a relativistic pingpong measurement.

On Sat, Mar 31, 2007 at 03:37:02AM -0400, Kragen Javier Sitaker wrote:
> Quick summary: it might work but I haven't figured out how to make it
> work.
> 
> <html><head><title>Random bit-vector network addressing</title>
> <!-- see end of file for explanation, where it's put so that it's readable in the browser -->
> <script src="../MochiKit-1.3.1/lib/MochiKit/MochiKit.js"></script>
> <script type="text/javascript">
> var bits_per_cell, bits_per_row, cells_per_row, cells_per_col, table

-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820            http://www.ativel.com
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE

Quick summary: it might work but I haven't figured out how to make it
work.

<html><head><title>Random bit-vector network addressing</title>
<!-- see end of file for explanation, where it's put so that it's readable in the browser -->
<script src="../MochiKit-1.3.1/lib/MochiKit/MochiKit.js"></script>
<script type="text/javascript">
var bits_per_cell, bits_per_row, cells_per_row, cells_per_col, table

var randbits = []
function replenish_bits() {
   var num = Math.random()
   for (;;) {
      num *= 2
      if (num > 1) {
	 randbits.push(1)
	 num -= 1
      } else if (num == 1) {
	 if (randbits.length) return
	 else num = Math.random()
      } else {
	 randbits.push(0)
      }
   }
}
function random_bit() {
   if (!randbits.length) replenish_bits()
   return randbits.pop()
}
function random_bitvector(length) {
   return map(random_bit, range(length)).join('')
}

function how_many_bits_per_row(total_bits) {
   /* make roughly square */
   var max = Math.ceil(Math.sqrt(bits_per_cell * 4))
   var min = Math.ceil(max/2)
   var ii = max
   for (var ii = max; ; ii--) {
      if (ii == min || total_bits % ii == 0) return ii
   }
}

function reset() {
   bits_per_cell = $('bits_per_cell').value
   cells_per_row = $('cells_per_row').value
   cells_per_col = $('cells_per_col').value
   bits_per_row = how_many_bits_per_row(bits_per_cell)
   table = map(function() { 
      return map(function() { 
	 return random_bitvector(bits_per_cell) 
      }, range(cells_per_row))
   }, range(cells_per_col))
   refresh()
}

function display_cell(cellvalue) {
   var rv = []
   while (cellvalue.length) {
      rv.push(cellvalue.substr(0, bits_per_row))
      rv.push(BR())
      cellvalue = cellvalue.substr(bits_per_row)
   }
   return rv
}

function refresh() {
   replaceChildNodes('dest', TABLE({cellPadding: 2, cellSpacing: 0},
      map(function(ii) {
	    return TR(null, map(function(jj) {
	       var tcc = function(ev) { turn_cells_colors(ii, jj) }
	       var rv = TD(null, display_cell(table[ii][jj]))
	       connect(rv, 'onclick', tcc)
	       return rv
	    }, range(table[0].length)))
      }, range(table.length))))
}

function similarity(r0, c0, r1, c1) {
   var b0 = table[r0][c0]
   var b1 = table[r1][c1]
   var count = 0
   forEach(range(b0.length), function(ii) {
      if (b0[ii] == b1[ii]) count++
   })
   return count / b0.length
}

function similarity_color(r0, c0, r1, c1) {
   var sim = similarity(r0, c0, r1, c1)
   if (sim == 1) return '#77ff77'
   /* Map similarities of one-third and up into colors ranging from
    * white, through pink, into red. Bizarrely fromHSV(1, 0, 1) is black,
    * though fromHSV(1, 0.001, 1) or fromHSV(1, -0.001, 1) are white */
   return Color.fromHSV(1, 1.5*sim - 0.499, 1).toHexString()
}

function turn_cells_colors(r0, c0) {
   set_cell_backgrounds(function(ii, jj) {
      return similarity_color(r0, c0, ii, jj)
   })
}

function colors_by_bit(n) {
   set_cell_backgrounds(function(ii, jj) {
      return table[ii][jj][n] == '1' ? 'white' : 'gray'
   })
}

function set_cell_backgrounds(cell_color_func) {
   var domtable = $('dest').firstChild
   var ii = 0
   var jj = 0
   function turn_a_cell() {
      domtable.childNodes[ii].childNodes[jj].style.background = 
         cell_color_func(ii, jj)
      jj++
      if (jj == table[0].length) {
	 ii++
	 jj = 0
      }
      if (ii < table.length) setTimeout(turn_a_cell, 0)
   }
   turn_a_cell()
}

function neighbors(ii, jj) {
   function in_bounds(arg) {
      var ii = arg[0], jj = arg[1]
      return ii >= 0 && jj >= 0 && ii < table.length && jj < table[0].length
   }
   function get_cell(arg) { return table[arg[0]][arg[1]] }
   return map(get_cell, filter(in_bounds, 
			       [[ii, jj], [ii-1, jj], [ii, jj-1], 
				[ii+1, jj], [ii, jj+1]]))
}

function majority_rule_digit(digits) {
   var counts = [0, 0]
   forEach(digits, function(digit) { counts[digit] += 1 })
   if (counts[0] > counts[1]) return 0
   else if (counts[1] > counts[0]) return 1
   else return random_bit()
}

function random_rule_digit(digits) {
   var counts = [0, 0]
   forEach(digits, function(digit) { counts[digit]++ })
   return (Math.random() < counts[0] / (counts[0] + counts[1])) ? 0 : 1
}

function map_digits(digit_rule, neighbors) {
   var rv = map(digit_rule, map(function(ii) { 
      return map(function(neighbor) { return neighbor[ii] },
		 neighbors)
   }, range(neighbors[0].length)))
   return rv.join('')
}
current_digit_rule = random_rule_digit

function change_rule(select_object) {
   var o = select_object.options[select_object.selectedIndex].value
   if (o == 'majority_rule_digit') current_digit_rule = majority_rule_digit
   else current_digit_rule = random_rule_digit
}

function mix_cell(ii, jj) {
   return map_digits(current_digit_rule, neighbors(ii, jj))
}

function propagate(continuation) {
   var newtable = []
   var ii = 0
   var jj = 0
   var newrow = []
   /* explicit CPS loop to avoid "script has stopped responding" message */
   function do_cell() {
      newrow.push(mix_cell(ii, jj))
      jj++
      if (jj == table[0].length) {
	 jj = 0
	 newtable.push(newrow)
	 newrow = []
	 ii++
      }
      if (ii < table.length) setTimeout(do_cell, 0)
      else {
	 table = newtable
	 refresh()
	 if (continuation) continuation()
      }
   }
   do_cell()
}

function propagate_n(n) {
   if (n > 0) propagate(function(){propagate_n(n-1)})
}

function init() {
   reset()
}

</script>
</head>
<body onload="init()">
<h1>Random bit-vector network addressing</h1>
<p style="font-size: 2em">
<button onclick="propagate_n($('n').value)">propagate</button> <input id="n" 
    value="1" size="3"> times using a
    <select onchange="change_rule(this)" id='current_digit_rule'>
        <option selected="selected" value="random_rule_digit">stochastic</option>
        <option value="majority_rule_digit">mostly deterministic</option>
    </select> rule, click on a cell to color other cells by similarity, 
    <button onclick="colors_by_bit($('bit').value)">color</button> by bit
    <input id="bit" value="0" size="3" /> (
      <button onclick="$('bit').value++; colors_by_bit($('bit').value)"
        >up</button>,
      <button onclick="$('bit').value--; colors_by_bit($('bit').value)"
        >down</button>),
    or <button onclick="reset()">reset</button> with 
    <input id="cells_per_col" value="15" size="3"> rows of 
    <input id="cells_per_row" value="15" size="3"> cells containing
    <input id="bits_per_cell" value="32" size="4"> bits each.
</p>
<div id="dest"></div>

<h2>Explanation</h2>

<p>So I had this very simple idea for doing geographical addressing
and routing in ad-hoc mesh networks.  Every node starts by generating
a random ID, and then for some number of generations, executes the
following procedure:</p>

<ol>
<li>Send all your neighbors your current ID.</li>
<li>Receive all your neighbors' IDs.</li>
<li>Given your own ID and your neighbors' IDs, calculate a new ID for
yourself that is somehow more similar to your neighbors' IDs than your
old ID was.</li>
</ol>

<p>The idea is that, after some number of generations, much of the
local variation will be smoothed out of the ID space, so that you can
route a message to a node by forwarding it to whichever one of your
neighbors has the ID most similar to the destination node's ID, or
possibly choose one of your neighbors at random weighted by that
function.</p>

<p>There's a bit of tension in this idea --- if the smoothing process
is totally effective, every node will end up with the same ID, making
them useless for addressing, and if it's not that effective but still
rather too effective, there won't be much local variation in ID, so it
will be hard to figure out which direction a message should go in.
But if it's not smooth enough, the message won't make progress towards
its destination.</p>

<p>So this is a little experiment to try this idea out.  Each node has
a 32-bit randomly-generated address to start out; it's topologically
connected to the cells above and below it and to its left and right;
"similarity" is Hamming distance, new IDs are calculated bit by bit,
and there are two algorithms available for making an ID bit "more
similar" to those of a node's neighbors':</p>

<dl>

<dt>stochastic</dt>

<dd>The node's and neighbors' 0 bits (in a particular position,
e.g. bit #11) are counted; the new bit is 0 with probability N/M,
where N is the number of 0 bits found, and M is the total number of
bits (5 for most nodes, ranging down to 3 in the corners).  So if all
the bits examined are the same, the new bit is guaranteed to have the
same value; but if e.g. a single bit out of 5 is 0, then the new bit
is 0 with 20% probability and 1 with 80% probability.</dd>

<dt>mostly deterministic</dt>

<dd>The same set of bits are examined, but the new bit is the value
taken by the majority of the old bits (or uniformly randomly selected
if there was a tie, which can happen at the edges)</dd>

</dl>

<h2>Results</h2>

<p>It doesn't work as well as I would like.  Neither of the algorithms
is very effective at eliminating "local optima" on 32-bit IDs in a
15x15 grid, even after a large number of generations.  The
mostly-deterministic algorithm does eliminate the majority of local
bumpiness in the closeness function, so messages started nearby would
propagate successfully.</p>

<p>Neither algorithm is "effective" enough to produce noticeable
numbers of identical IDs.</p>

<p>Unsurprisingly, both algorithms seem to work if the number of bits
is larger than the number of nodes, as long as you run them long
enough.  But maybe that's just because I haven't been able to wait
very long to run largish simulations with a lot of bits.</p>

<p>The deterministic algorithm seems to mostly settle down after a few
generations, maybe 10 or 20, which means that no new information is
being transmitted at that point.  This is bad news, since in 10 or 20
generations, information can only flow (optimistically) 10 or 20 hops.
It gets a little bit better with more bits, but not a lot.</p>

<h2>Other directions</h2>

<p>It might be possible to rescue this approach in one or more of the
following ways:</p>

<ul>
<li>Randomly flip bits from time to time as a modification to the
mostly-deterministic algorithm.</li>

<li>Rather than using vectors of individual bits, use vectors of small
integers, perhaps of 2 or 3 bits each; then "more similar" IDs might
be a median, or stochastically chosen from near the median.</li>

<li>Change some bits more slowly than others, or change some bits
according to the stochastic algorithm, and the others according to the
majority-rule algorithm.</li>

<li>Gradually "anneal" the IDs: start with the stochastic algorithm
(or an algorithm where the bits are just uniformly selected each
step), then gradually increase the probability weight from the local
majority.</li>

</ul>

</body>
</html>

Mon, 26 Mar 2007

* Kragen Javier Sitaker <kragen at pobox.com> [2007-03-20 19:15]:
> Aristotle Pagaltzis writes:
> > * Kragen Javier Sitaker <kragen at pobox.com> [2006-11-11 09:37]:
> > > # Generate a really big page of small inline JPEG images from a
> > > [fugly code omitted]
> > A cleaner version that crawls the filesystem without external aid:
> > [most of cleaner version omitted too]
> >         open my $fh, '<', $filename or die "Can't open $filename: $!";
> 
> You don't like use Fatal?  :)

So-so. I think its output format is pretty awful (and generating
that doubles the otherwise small module’s code!), though it is
tolerable. So I do use Fatal when it saves me substantial work,
but not for just one or two `open`s.

> > This code uses a minor trick: the <> operator will open and
> > read all files listed in @ARGV sequentially, so the code
> > stuffs the names of the data files of interest into that
> > array, then turns them into URL file names, then uses the
> > diamond operator to read them.
> 
> Yeah, I think that part would have taken me a while to figure
> out without the explanation --- although I've used while (<>) a
> thousand times, I've never used it to read a bunch of filenames
> the Perl program itself had stuffed into @ARGV.

Interesting; I picked it up from the community-accepted canonical
idiomatic way to write a `slurp` function:

    sub slurp {
        my ( $filename ) = @_;
        local ( @ARGV, $/ ) = $filename;
        return <>;
    }

Here you end up with a single-entry `@ARGV` and an undefined `$/`
for the duration of the function, which causes `<>` to gobble up
the file whose name was just put into `@ARGV`. (Sidenote since we
mentioned Fatal previously: when using `<>`, Perl will supply its
own (though non-fatal) error messages.)

(Actually, though, that is unsafe: the list of variables to
localise should  read `@ARGV, $ARGV, ARGV, $/`. Alternatively, in
Perl 5.8 and newer, you can localise the entire ARGV glob:

    local ( *ARGV, $/ ) = [ $filename ];

But globs are an ugly wart in Perl 5, and I like to pretend they
don’t exist wherever possible; not to mention how much subtlety
is involved in how/why that line work on top of the idioms in the
rest of the snippet. (I’m not going to explain it here.) So I
prefer the longer version, which has the bonus that it will run
on old perls.)

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>

Considered as a programming language, Bicicleta has some drawbacks.

1. It tends to err silently.  If you use it in the way I expect people
   normally will, with every argument to every function having a
   sensible default, the consequence is that forgetting or misspelling
   the argument in the function call will produce a wrong result
   rather than an error message.  And, by design, there's no way to
   detect and complain about extra arguments --- for example, if you
   misspell the argument name.

   You can reduce the first part of this problem by overriding all the
   arguments with errors in the public version of a function, or
   deleting them entirely from the definition, but I expect this to be
   uncommon.

2. There's no way to override the behavior on derivation.  "No way to
   detect extra arguments" is a special case of this.

3. There's no multiple inheritance.

4. There's no static typing.

5. Class hierarchies are likely to be relatively deep.  This has been
   an obstacle to understanding in other languages in the past, such
   as Smalltalk, but I am hoping that Bicicleta's user interface
   improvements will compensate.

6. You can define new methods on objects inherited from existing
   classes, and you can evaluate code in a context where it will use
   your modified objects instead of the basic ones, but it might be to
   add new methods to standard library objects, just as in Java or
   Python.

I expect that these drawbacks will be less important than its advantages.

Sun, 25 Mar 2007

> I wonder if the Symbolics microcode was horizontal?  If they used
> vertical microcode while the Dorado was horizontal, maybe that might
> explain the difference between their perception of Moore's Law and
> Alan Kay's --- Kay consistently complains that modern CPU
> architectures run Smalltalk and other late-bound things slowly, which
> is sort of the opposite of the apocryphal Symbolics comment I cited
> earlier.

I don't know Kay's argument well enough to address it directly, but I  
believe a modern CPU architect would say the critical factor is to run  
fast stuff fast, so early-bound things go as quickly as possible, and  
late-bound things wind up going as slowly as necessary -- if the only way  
to make late-bound things appear to run in the same number of clocks as  
early-bound is to slow down the machine, that's a tradeoff they've thus  
far been unlikely to make.

(this implies there ought to be a hierarchy of binding: late-binding for  
code whose data lives on the net or on disk, early-binding for code whose  
data lives in L1 or in registers, and a suitable mix for everything in  
between)

But I don't think this is the opposite of the Symbolics comment:

> I have a vague memory that some of the Symbolics folks said that when
> they reimplemented their 36-bit CPU as an Alpha AXP program
> ("OpenGenera"), it fit in the L1 cache of the Alpha, and it ran Lisp
> programs about as fast as you'd expect microcode to run on a machine
> with that kind of clock speed.

The two closest references I was able to track down were:
http://www.unlambda.com/lispm/memo528.html
http://pt.withy.org/publications/VLM.html

My understanding is the first, AI Memo 528, was roughly the architecture  
for all the lisp machines with the possible exception of LMI's K-machine.   
(After reading Stallman's inspiration for GNU in the demise of the AI lab,  
I have to wonder to what extent the "AI winter" came about because lisp  
machines were sold largely to projects flush with Star Wars money?   
Afterwards, with no raygun, they appeared more exotic than practical -- I  
am personally glad to have picked the web over Lucid in the mid-90's.   
This hypothesis implies there should now be an "AI spring" with the modern  
DOD, which, at least in robotics and speech tagging (consider automated  
wiretaps), seems to be true)

The second was an abstract for a paper that was never written.  They  
appear to have coded in a style similar to microcode, making use of  
multiple dispatch, paying attention to cache conflicts, and requiring 64  
bit data paths.  Given that simulating hardware is an early-bound problem  
(and that they had a wider and faster machine than the one being  
emulated), it doesn't sound surprising they got decent performance from a  
modern architecture.

Squint at emulation just right, and it looks like programming in general.   
One has an abstract problem, for which one wants to compute an abstract  
solution:

     A ---> A'

which (if one is not EWD, who saw no need to suffer from the diseases  
which he studied :-) tends to involve using a concrete machine to  
calculate a concrete solution:

     C ---> C'

then coding the abstract into the concrete (A->C), and decoding the result  
(C->A)

     A ...> A'
     |      ^
     v      |
     C ---> C'

That is, we want to emulate the abstract computation with our concrete  
calculation.

I think I now understand why Dijkstra was interested in adjunctions: when  
F:A->C and G:C->A form an adjunction, then GF is a nice endofunctor, and  
GFGF => G(FG)F => GF reduces to another nice endofunctor, (and GFGFGF is  
indeed associative) so when we string a bunch of these together we get a  
monad.

> Maybe [debugging] accounts, in part, for the great interest in static
> correctness proofs in the functional programming community?  OCaml has
> a reportedly very nice debugger that works only with its bytecode, not
> with its native code, and they simply try very hard to ensure that
> they have the same behavior so you never have to debug the native
> code.  Having a static type system that permits type erasure without
> loss of safety helps with this.

Yes, exactly.  The way to debug easily is to behave exactly as above, and  
map between the concrete and abstract at each step (GF...GF).  An  
interpreter does this "for free", that is, ignoring execution time.  The  
way to run quickly is to erase the intermediate (normally identity)  
C->A->C steps, and run the problem strictly in the concrete domain  
(Gccc...ccccF).  Compilers do this "for free", but then if we want to look  
at "the middle of" the computation, we must then have a debugger to  
reconstruct the equivalent A' from any given C' (and possibly synthesize a  
new C'', should we alter A' to A'').

Horowitz and Hill point out that similar games are played in hardware with  
sequential PLDs -- if one has the gates, it's easier to make a strictly  
synchronous design which keeps its (abstract) outputs in the registers.   
If one doesn't have the gates, it's more efficient to make a design which  
keeps the bare minimum of (concrete) state in the registers.

-Dave

:: :: ::

> It's too bad gdb hasn't acquired a decent scripting language, all
> these years after the Plan9 paper (was it "Acid: a debugger built from
> a language"?)

Sounds like the right paper.  I wonder to what extent Forthers,  
Smalltalkers, and Lispers use the multilevel nature of their languages to  
script debugging?

> If I remember correctly, bass started out using ColorForth as a basis?
> I'm afraid it's been a number of years since I looked at it.  Is there
> a place to see it these days?

It was on a laptop that didn't survive the move, so now it's on a bare  
2.5" disk and a few backup CDs packed somewhere in the "cave".  (a  
basement area that supposedly contains supplies in case of nuke attack,  
but in the case of swiss tends to be used for wine, prosecco, and skis;  
and in the case of at least some americans is stuffed full of things  
which, had we had half a mind when moving, would never have been shipped  
across the atlantic in the first place.  What's the news on you two?  Is  
it BA?)

I should dig it out sometime and GPL it, but it never progressed much  
beyond what you've seen.  The register allocation wasn't for general use,  
but was to allow full bandwidth when doing sequences of loads and stores  
-- somewhat like how the concatenative people are experimenting with  
interspersing linear code with stack-shuffling.  I was happy fairly with  
the code, but never came up with an equivalently structured approach to  
data literals.  (or rather, never came up with an approach that didn't  
seem to eventually lead to a traditional language)  I was interested in  
YAML for this purpose before it grew; now it'd be tempting to just put in  
something JSON-like (which would require changing the code syntax to  
preserve duality) and call it done.

The fellow who provides my alpine elevation data uses a homebrew language  
(which he actually ported from moto to intel at some point) that wound up  
in a similar place, but as his interests are array processing and  
raytracing, he's never bothered implementing much in the way of data types  
either.

Someday, I'd like to look at the initialization stuff in Virgil to see if  
it'd solve one of the big problems: literals, like code, live in readonly  
("text") and are expressed abstractly; we'd like for the intermediate  
variable values which live in data to be expressed concretely.  I had a  
decorator kludge in bass for literals that was a half-assed approach,  
something more elegant would not only handle specializing data  
representations but would also be able to specialize code fragments so the  
same source might have several machine-code translations to run in  
different monads -- and at this point the problem is to avoid simply  
reinventing macros.

The effort to stay 1-1 (and to prioritize sequences over graphs) may just  
be too restrictive; the approach I would try now would involve an algebra  
whose top level values would still be hardware-oriented bytestrings, but  
whose intermediates could be more symbolic.  It still wouldn't be powerful  
enough to turn a declarative text into an FSM, but at least it'd be able  
to transpose a COND form into a decision table or do peephole tail call  
optimization.

> There's the apocryphal story of Leonard Zubkoff implementing a
> debugger in a weekend for a new platform, rather than waiting weeks
> for someone else to do the job; I wonder how much more or less his
> debugger contained than DEBUG.COM.

Probably much more -- RPG's story is that it was a "symbolic UNIX  
debugger": when one controls the compiler symbols are simple, and  
ptrace(2) breaks things up into a debugging process and a debugged  
process, which also simplifies things.  So I'd imagine it at least had  
symbols and breakpoints {go, stop, step into}, and probably had expression  
evaluation and possibly step over.  (step out requires parsing opcodes,  
but the RT was a RISC, which would have helped)

> What do you mean, "loops with tails"?  You mean that you don't tend to
> factor the code into a lot of small functions?

Calling a function involves a label, so lots of small functions would mean  
lots of labels, which is what we're trying to avoid.  The head of a loop  
requires a label, but any straight-line code we can tack onto the end  
doesn't.  So we tend to wind up with chunks that consist of reducing  
something to a canonical form (the loop), followed by addition of more  
data which will be reduced in another chunk (the tail).  I guess it's  
effectively an apply(Y, eval(X)) -- somewhat reminiscent of writing in sed  
and fairly reminiscent of the algebraists' habit of canonically splitting  
maps into projection followed by injection.

> I wonder why DEBUG couldn't rememember labels or breakpoints.
> FIG-Forth ran on the same hardware and remembered the names of every
> routine in the system.  Maybe DEBUG had to fit into a tiny static
> corner of a 64K memory segment, to avoid limiting its applicability to
> small programs?  But it isn't restricted to disassembling or debugging
> tiny-model programs.

Of course if I was going to write more than 512 bytes of code that way,  
and wanted an interactive environment, I'd probably wind up with something  
like a forth.  The history seems to be that DEBUG.COM, like EDIT.COM, was  
part of the "quick and dirty OS" that Microsoft bought, and had been  
written figuring that someone else would soon replace them.

> Have you been following Ian Piumarta's Pepsi/Coke/Jolt/Albert work?
> He's very much focused on questions like how to allow you to modify
> Array>>#at: at runtime without breaking everything.  (Of course, if
> your modification is broken, you will still have problems...)

No, thanks for the links!  I think the reason for the ST-80 spec in ST at  
the back of the blue book was not only because metacirculars are cool, but  
also because they did cross development/upgrades from time to time by  
building up a foreign image and a foreign VM inside the usual image, and  
then bootstrapping them to standalone.

> I also think that's the first 8-page conference paper I've ever seen
> that includes a complete and working (?) synthesizable CPU design (in
> noweb and Verilog) in the text of the paper.  (It is synthesizable,
> isn't it?  My Verilog is not up to snuff.  The actual Verilog seems to
> be about 240 lines.)
>
> It's too bad that he doesn't report on how much FPGA space it needed
> or how fast it ran.

True.  I think FPGA boards are pretty big and cheap these days (so it  
should be possible to "try it and see"), but haven't had the excuse to  
play with one.  Any recommendations from kragen-discuss?

Sat, 24 Mar 2007

The experience of running the REPL included here is a far cry from the
spreadsheet-like exploration/debugging UI I have in mind, but it at
least allows interactive testing of simple programs.  It should get a
lot better in the coming days.

Here's a sample interactive session:

    Beauty:~/devel/bicicleta kragen$ ./bicicleta_repl
    Bicicleta version 3, Copyright (C) 2007  Kragen Javier Sitaker
    Bicicleta comes with ABSOLUTELY NO WARRANTY; for details see the file
    "COPYING".
    This is free software, and you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    '()' = 2 + 3 * 4    # note: no operator precedence, rather deliberately
    "20"
    '()' = {x=1, y=2}.y  # define an object literal, call a method/read a property
    "2"
    '()' = {'yo mama' = "so fat", 'my mama' = "don't you talk bout my mama!"}.'yo mama'  # method names contain arbitrary characters if you quote them
    "so fat"
    '()' = -3
    parse error
    '()' = 3.negated  # no prefix operators in the language
    "-3"
    '()' = 3 < 4
    "true"
    '()' = 3 < 3
    "false"
    '()' = 3 <= 3
    "true"
    '()' = {self: x=1, y=self.x+2}.y   # you can define one property in terms of another
    "3"
    '()' = {self: x=1, y=self.x+2}{x=5}.y  # overriding the value of x in a derived object
    "7"
    '()' = {'()' = "hi"}.'()'  # method names with weird characters again, but
    "hi"
    '()' = {'()' = "hi"}()  # this one has syntactic sugar to call it
    "hi"
    '()' = {self: x=1, '()' = self.x+2}(x=5)  # including with overrides
    "7"
    '()' = {self: arg1=1, '()' = self.arg1+2}(5)  # positional args are arg1, arg2...
    "7"
    '()' = {fac: arg1 = 3, n=fac.arg1, '()' = (fac.n < 2).if_true(then = 1, else = fac.n * fac(fac.n - 1))}(5) # here's a recursive factorial.
    "120"
    '()' = {self: '[]' = {idx: 4, '()' = idx.arg1 * 2 + 1}}[45]  # '[]' is called by the foo[] syntax
    "91"
    '()' = {this: x=43, y=44, show={userdata=this}}  # try to display an object instead of a string or a number, and ...
    #16=#17=#14# {
    this: x = 43 (in
    (omitted 330 lines where it dumps the entire standard library)
    '()' = 3.5 / 3.1  # there is floating-point math; mixed-mode is still broken
    "1.12903225806"

There is a tar.gz file of all this source code, plus a little more, at
http://pobox.com/~kragen/sw/bicicleta-3.tar.gz for easier downloading
--- about 33 kilobytes.  There are already an earlier version at
http://pobox.com/~kragen/sw/bicicleta-1.tar.gz (and 0).  I'm sorry
it's such a huge email.

Unlike everything posted to kragen-hacks without a notice to the
contrary, this code is not in the public domain; I retain copyright,
but you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any
later version.

First, here's the build script that compiles the REPL and other
things.  I wrote it in sh because I'm on a Mac without make, and I
don't know where to download make.


#!/bin/sh
# Build script for prototype Bicicleta interpreter
set -ve
: ${OCAMLC=ocamlc} ${EXTRAS=}
$OCAMLC -c bicicleta_syntax.ml
ocamlyacc bicicleta_parser.mly
$OCAMLC -c bicicleta_parser.mli
$OCAMLC -c bicicleta_parser.ml
ocamllex bicicleta_lexer.mll
$OCAMLC -c bicicleta_lexer.ml
ocaml $EXTRAS bicicleta_syntax.cmo bicicleta_parser.cmo bicicleta_lexer.cmo \
    bicicleta.ml  # for regression tests
$OCAMLC -c bicicleta.ml
ocaml $EXTRAS bicicleta_syntax.cmo bicicleta_parser.cmo bicicleta_lexer.cmo \
    bicicleta.cmo bicicleta_lib.ml  # more regression tests
$OCAMLC -c bicicleta_lib.ml
$OCAMLC -c bicicleta_repl.ml
$OCAMLC bicicleta_syntax.cmo bicicleta_lexer.cmo bicicleta_parser.cmo \
    bicicleta.cmo bicicleta_lib.cmo bicicleta_repl.cmo -o bicicleta_repl
ocaml $EXTRAS bicicleta_syntax.cmo bicicleta_lexer.cmo bicicleta_parser.cmo \
    bicicleta.cmo bicicleta_lib.cmo bicicleta_dump.ml  # to see show_bicexpr go
$OCAMLC -c bicicleta_run_script.ml
$OCAMLC bicicleta_syntax.cmo bicicleta_lexer.cmo bicicleta_parser.cmo \
    bicicleta.cmo bicicleta_lib.cmo bicicleta_run_script.cmo \
    -o bicicleta_run_script


Here's bicicleta_syntax.ml, which defines the structure of Bicicleta
expressions and values:


(* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA *)
type methods = NoDefs
               | Definition of string * bicexpr * methods (* name, body, ... *)
and bicexpr = Name of string
              | Call of bicexpr * string
              | Literal of string option * methods
              | Derivation of bicexpr * string option * methods
              | StringConstant of string
              | Integer of int
              | Float of float
              | NativeMethod of (lookup -> bicobj)
and userdata = UserString of string
              | UserInteger of int 
              | UserFloat of float
and bicobj = ProtoObject 
    (* Five positional parameters is bad style; six is worse.
       Derive of name, selfname, body, env, next, cache *)
             | Derive of string * string option * bicexpr * 
                 lookup * bicobj * (string, bicobj) Hashtbl.t option ref
             | UserData of userdata
             | Error of string * string
and lookup = Phi | Add of string * bicobj * lookup ;;
(* other potential definition of lookup:
  type lookup ;;
  type add = { name: string; value: string; next: lookup; } ;;
  type lookup = Phi | Add of lookup ;; *)


It is used in the usual way by bicicleta_parser.mly, an ocamlyacc
file, to produce Bicicleta expressions from token streams:

/* -*- mode: tuareg; compile-command: "./build" -*- */
/* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA */
%{
open Bicicleta_syntax;;

(* Reorder the definitions so the earliest ones are outermost; there's
no semantic reason for this, but changing it would require changing
the existing parse regression tests, and also show_bicexpr.  It's also
a convenient place to add names for positional arguments. *)
let mk_defs deflist = 
  let rec fd_defs argcount = function
      [] -> NoDefs
    | (Some name, expr) :: defs -> 
        Definition(name, expr, fd_defs argcount defs)
    | (None, expr) :: defs ->
        Definition("arg" ^ string_of_int argcount, expr,
                  fd_defs (argcount + 1) defs)
  in fd_defs 1 (List.rev deflist) ;;
%}
%token <string> NameTok
%token <string> StringTok
%token <string> OpTok
%token <string> IntTok
%token <string> FloatTok
%token Lparen Rparen Lbrace Rbrace Colon Period Newline Comma Equals Eof
%token Lsquare Rsquare
/* As written, this grammar has five ambiguities: is 1 + 2 + 3 supposed
   to be (1 + 2) + 3 or 1 + (2 + 3), and is 1 + x(3) supposed to be 
   1 + (x(3)) or (1 + x)(3) (with equivalent cases for 1+x[3], 1+x{3},
   and 1+x.y)?  The first one is resolved with %left OpTok below, and
   the others are resolved with the %nonassoc declaration, which gives
   those tokens higher precedence than OpTok by virtue of coming later
   in the source file. ocamlyacc -v is helpful for diagnosing this! */
%left OpTok
%nonassoc Period Lparen Lbrace Lsquare
%start main
%type <Bicicleta_syntax.bicexpr> main
%%

/* 22 productions --- a fairly small grammar.  If you try to write a
   grammar for Common Lisp, you end up with 16 productions to handle
   lists, dotted pairs, symbols, strings, integers, floats, ratios,
   chars, quote, quasiquote, unquote, unquote-splicing, and vectors,
   but that leaves out #=, ##, #*, #:, #|...|#, #+, #-, #., #a, #c,
   #b, #o, #x, #r, $p, #s, ",.", and user-defined read macros.  Also,
   some macros are pretty hairy; the CLHS entry for LOOP has a BNF
   grammar with 34 nonterminals. */

main:
    expr Eof { $1 }
expr:
    expr Period NameTok   { Call($1, $3) }
  | NameTok               { Name $1 }
  | StringTok             { StringConstant $1 }
  | literal               { Literal(fst $1, snd $1) }
  | expr literal          { Derivation($1, fst $2, snd $2) }
  | expr Lparen definitions Rparen { Call(Derivation($1, None, mk_defs $3),
                                         "()") }
  | expr Lsquare definitions Rsquare { Call(Derivation(Call($1, "[]"), None,
                                                          mk_defs $3), 
                                           "()") }
  | Lparen expr Rparen    { $2 }
  | IntTok                { Integer (int_of_string $1) }
  | FloatTok              { Float (float_of_string $1) }
  | expr OpTok expr       { Call(Derivation(Call($1, $2), None,
                                               mk_defs [None, $3]), "()") }
literal:
    Lbrace NameTok Colon definitions Rbrace { ((Some $2), mk_defs $4) }
  | Lbrace definitions Rbrace { (None, mk_defs $2) }
definitions:
    NameTok Equals expr   { [Some $1, $3] }
  | definitions def_separator NameTok Equals expr { (Some $3, $5) :: $1 }
  | expr { [None, $1] }
  | definitions def_separator expr { (None, $3) :: $1 }
  | /* empty */ { [] }
def_separator:
    Comma { () } | Comma Newline { () } | Newline { () }


The token streams, for their part, are produced by the lexer,
bicicleta_lexer.mll, which is written in ocamllex:

{ (* -*- mode: tuareg; compile-command: "./build" -*- *)
(* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA *)
    open Bicicleta_parser ;;
    let show_bictoken = function
        | NameTok name -> "NameTok " ^ name
        | StringTok string -> "StringTok " ^ string
        | OpTok op -> "OpTok " ^ op
        | IntTok num -> "IntTok " ^ num
        | FloatTok num -> "FloatTok " ^ num
        | Newline -> "Newline"
        | Lparen -> "(" | Rparen -> ")" | Lbrace -> "{" | Rbrace -> "}"
        | Colon -> ":"  | Period -> "." | Comma -> ","  | Equals -> "="
        | Lsquare -> "[" | Rsquare -> "]"
        | Eof -> "Eof" ;;
    (* this has to be in the standard library somewhere, right? *)
    (* the previous version of this code was arithmetic-heavy and
       bug-prone, but probably a lot faster.  Shorter too if you count
       list_of_string and string_of_list. *)
    let list_of_string string = 
      let rv = ref [] in 
        for i = String.length string - 1 downto 0 do 
          rv := string.[i] :: !rv 
        done ; 
        !rv ;;
    let rec string_of_list = function
        [] -> "" 
      | x :: y -> String.make 1 x ^ string_of_list y ;;
    exception ScrewedUpUnquotableList ;;
    let unquote_list = function x :: tail ->
      let rec unquote_tail accum = function
          [_] -> List.rev accum
        | '\\' :: x :: tail | x :: tail -> unquote_tail (x :: accum) tail
        | [] -> raise ScrewedUpUnquotableList
      in unquote_tail [] tail
      | [] -> raise ScrewedUpUnquotableList ;;
    let unquote s = string_of_list (unquote_list (list_of_string s)) ;;
}

let alus = ['A'-'Z' 'a'-'z' '_']
let alnumus = alus | ['0'-'9']
let ident = alus alnumus*

let quoted_name = "'" ([^ '\'' '\\'] | "\\'" | "\\\\" )* "'"

let string = '"' ([^ '"' '\\'] | "\\\"" | "\\\\")* '"'

(* see my rationals post: any sequence of these chars, except a single '=' *)
let non_eq_op_char = ['~' '`' '!' '@' '$' '%' '^' '&' '*' 
                      '-' '+' '<' '>' '?' '/' '|' '\\'] 
let op_char = non_eq_op_char | '='
let operator = non_eq_op_char | op_char op_char+

let integer = ['0'-'9']+
let float = ['0'-'9']* '.' ['0'-'9']+
(* let special = ['(' ')' '{' '}' ':' '.' '\n' ',' '='] *)
let lwsp = [' ' '\t']

let comment = '#' [^ '\n']* '\n'?
rule next =
   parse 
     lwsp { next lexbuf } (* skip whitespace; trick stolen from manual *)
   | ident as name { NameTok name }
   | quoted_name as name { NameTok (unquote name) }
   | string as str { StringTok (unquote str) }
   | operator as op { OpTok op }
   | integer as num { IntTok num }
   | float as num { FloatTok num }
   | '\n' { Newline } | comment { Newline }
   | '(' { Lparen } | ')' { Rparen } | '{' { Lbrace } | '}' { Rbrace } 
   | ':' { Colon }  | '.' { Period } | ',' { Comma }  | '=' { Equals }
   | '[' { Lsquare } | ']' { Rsquare }
   | eof { Eof }


The actual evaluation (and unit testing) is taken care of by this
file, bicicleta.ml:


(* -*- mode: tuareg; compile-command: "./build" -*- *)
(* translation of metacircular_bicicleta_interpreter into OCaml *)
(* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA *)
(* Because the types and the functions are all mutually recursive, I'm
   declaring them all (in bicicleta_syntax.ml), then putting pretty
   much all the code in a single let rec. *)
(* for interactive use:
#load "bicicleta_lexer.cmo" ;;
#load "bicicleta_parser.cmo" ;;
#load "bicicleta.cmo" ;;
open Bicicleta ;;
open Bicicleta_syntax ;;
*)
open Bicicleta_syntax ;;
let rec get gkey = function
    | Phi -> (Error("self-name not found", gkey))
    | Add(key, value, next) -> if gkey = key then value else get gkey next ;;

exception CantCreateCache ;;  (* can't happen *)
let sys_expr = Call(Name "prog", "sys") ;;
let (call_count: (string, int) Hashtbl.t) = Hashtbl.create 100 ;;
let get_call_count name = try Hashtbl.find call_count name 
  with Not_found -> 0 ;;
let rec eval env = function
    | Name name -> get name env
    | Call(object_, method_name) -> 
        Hashtbl.replace call_count method_name
          (get_call_count method_name + 1) ;
        let object_ = eval env object_
        in apply (objectget method_name object_) object_
    | Literal(self, methods) -> bind ProtoObject env self methods
    | Derivation(object_, self, methods) -> let object_ = eval env object_
        in bind object_ env self methods
    | Integer num ->
        wrap_userdata env (UserInteger num) "native_integer"
    | Float num ->
        wrap_userdata env (UserFloat num) "native_float"
    | StringConstant string ->
        wrap_userdata env (UserString string) "native_string"
    | NativeMethod f -> f env
  and wrap_userdata env datum typename = 
    let userdata = UserData datum
    in eval env (Derivation(Call(sys_expr, typename), None,
        Definition("userdata", NativeMethod (fun _ -> userdata), NoDefs)))
  and bind base env self = function
    | NoDefs -> base
    | Definition(name, body, next) -> 
        bind (derive name self body env base) env self next
  and objectget key = function
    | ProtoObject -> (Error("method not found", key))
    | Derive(name, _, _, _, next, _) as m -> 
        if key = name then m else objectget key next
    | Error(_, _) as obj -> obj
    | UserData _ -> (Error("userdata has no methods", key))
  and derive name self body env o = 
    (Derive(name, self, body, env, o, ref None))
  and apply method_ self = match method_ with
    | Derive(name, methodself, body, env, _, _) -> 
        callmethod name (methodself, body, env) self
    | Error(_, _) as err -> err (* OK for now... *)
    | _ -> (Error("trying to apply non-method", ""))
  and callmethod name (selfname, body, env) self =
  match self with
      Derive(_, _, _, _, _, maybecache) -> 
        (match !maybecache with None -> maybecache := Some (Hashtbl.create 7)
          | Some _ -> ());
        (match !maybecache with None -> raise CantCreateCache
          | Some cache ->
              try Hashtbl.find cache name
                with Not_found -> let result = 
                  callmethodfull (selfname, body, env) self
                  in Hashtbl.replace cache name result;
                  result)
    | _ -> callmethodfull (selfname, body, env) self
  and callmethodfull (selfname, body, env) self =
    (match selfname with 
        Some name -> eval (Add(name, self, env)) body 
      | None -> eval env body)
;;

(* for debugging: deparse an expression. Could be considered a
   specification of a subset of the grammar, but does not (yet)
   exercise the following:
   - commas
   - the x{...}.'()' as x(...) syntactic sugar
   - the x.'*'(y) as x * y syntactic sugar
   - the x{arg1=a, arg2=b, arg3=c} as x{a, b, c} syntactic sugar
*)

let is_identifier string =
    let is_id_start_char ch = let code = Char.code ch in
        (code >= Char.code 'A' && code <= Char.code 'Z') ||
        (code >= Char.code 'a' && code <= Char.code 'z') || ch = '_'
    in let is_id_char ch = is_id_start_char ch || let code = Char.code ch in
        (code >= Char.code '0' && code <= Char.code '9')
    in
    let rec test string ii =
        if ii = String.length string then true
        else is_id_char string.[ii] && test string (ii + 1)
    in String.length string > 0 && is_id_start_char string.[0] &&
        test string 1 ;;
let escstr quote string =
  let rec esclist quote = function
      [] -> []
    | '\\' :: cs -> '\\' :: '\\' :: esclist quote cs
    | c :: cs when c = quote -> '\\' :: c :: esclist quote cs
    | c :: cs -> c :: esclist quote cs
  in Bicicleta_lexer.string_of_list 
    (esclist quote (Bicicleta_lexer.list_of_string string));;
let escname name = 
  if is_identifier name then name
  else "'" ^ escstr '\'' name ^ "'" ;;
let rec show_bicexpr_i indent = function
  | Name name -> escname name
  | StringConstant string -> "\"" ^ escstr '"' string ^ "\""
  | Derivation(object_, self, methods) -> 
      show_bicexpr_i indent object_ 
      ^ show_bicexpr_i indent (Literal(self, methods))
  | Literal(Some self, methods) ->
      "{" ^ escname self ^ ": " ^ outer_show_methods indent methods ^ "}"
  | Literal(None, methods) ->
      "{" ^ outer_show_methods indent methods ^ "}"
  | Call(object_, method_name) -> 
      show_bicexpr_i indent object_ ^ "." ^ escname method_name 
  | Integer n -> string_of_int n
  | Float n -> string_of_float n
  | NativeMethod _ -> "(;native method;)"
and outer_show_methods indent = function
  | Definition(_, _, Definition(_, _, _)) as m -> 
      let newindent = indent ^ "  " in
        "\n" ^ newindent ^ show_methods newindent m ^ "\n" ^ indent
  | m -> show_methods indent m
and show_methods indent = function
  | NoDefs -> ""
  | Definition(name, body, NoDefs) -> show_method indent name body
  | Definition(name, body, next) ->
      show_method indent name body ^ "\n" ^ indent ^ show_methods indent next
and show_method indent name body = 
  escname name ^ " = " ^ (show_bicexpr_i indent body) ;;
let show_bicexpr = show_bicexpr_i "" ;;

(* show_bicobj: mostly for error reporting, also used in the REPL.
   Doesn't work well, but that's not surprising if you look at the
   code.  The idea is that the "labels" list is supposed to keep us
   from printing out the same thing more than once. *)
exception ShowBicobjIsBroken;;
let rec _show_bicenv level labels = function 
    Phi -> ""
  | Add(name, obj, rest) ->
      String.make level ' ' ^ name ^ ": " 
      ^ __show_bicobj (level+2) labels obj ^ ";\n" 
      ^ _show_bicenv level labels rest
and show_selfname = function None -> "" | Some selfname -> selfname ^ ": "
and _show_bicobj level labels = function
    UserData (UserInteger x) -> string_of_int x
  | UserData (UserString x) -> "\"" ^ x ^ "\""
  | UserData (UserFloat x) -> string_of_float x
  | Error (a, b) -> "Error(\"" ^ a ^ "\", \"" ^ b ^ "\")"
  | Derive (name, selfname, body, env, next, _) ->
      __show_bicobj level labels next ^ " {\n" 
      ^ String.make level ' ' ^ show_selfname selfname ^ name 
      ^ " = " ^ show_bicexpr body ^ " (in\n" 
      ^ _show_bicenv level labels env ^ ")}" 
  | ProtoObject -> "{}" 
and findq obj = function
  | [] -> None
  | x :: y -> if obj == x then Some (List.length y) else findq obj y
and labelfor labels obj = 
  match findq obj !labels with 
      None -> raise ShowBicobjIsBroken 
    | Some label -> "#" ^ string_of_int label ^ "="
and __show_bicobj level labels obj = 
  match findq obj !labels with
      None -> 
        labels := obj :: !labels; 
        labelfor labels obj ^ _show_bicobj level labels obj
    | Some label -> "#" ^ string_of_int label ^ "#"
;;
let show_bicobj x = _show_bicobj 0 (ref []) x ;;

(* for unit tests and interactive testing *)
let tokenize_ next string = let buf = Lexing.from_string string 
  in let rec get_toks toks = 
    let tok = next buf in
      match tok with 
          Bicicleta_parser.Eof -> List.rev toks
        | _ -> get_toks (tok :: toks)
  in get_toks [] ;;
let tokenize = tokenize_ Bicicleta_lexer.next ;;  (* for unit tests *)

(* This ugly, hairy piece of rot is supposed to simplify the parser by
   keeping the poor widdle parser from having to cope with extra
   newlines in random places.  See, we want to use newlines to
   separate method definitions, but we want to ignore them the rest of
   the time.  So the obvious thing to do is to remove newlines that
   aren't strictly needed to support method definitions in a
   post-processing stage for the lexer.  So in between the lexer (32
   lines of ocamllex at present) and the parser (50 lines of
   ocamlyacc) we interpose this 42-line monstrosity.  It seemed like a
   good idea when Meredith Patterson suggested it, but she didn't know
   I was using an ocamllex-generated parser and would therefore have
   to do this in an intermediate step... *)

exception EmptyWindowCantHappen;;
let drop_unnecessary_newlines lexer_next =
  let (window : Bicicleta_parser.token list ref) = ref [] 
  and last_token = ref None 
  in let next lexbuf =
    let rec really_read_token tok = 
      window := !window @ [tok] ; last_token := Some tok; main_switch ()
    and read_token () = 
      let tok = lexer_next lexbuf
      in match !last_token with 
          None -> really_read_token tok
        | Some Bicicleta_parser.Newline -> (match tok with
              Bicicleta_parser.Newline -> read_token ()
            | _ -> really_read_token tok)
        | Some _ -> really_read_token tok
    and output_token () = match !window with
        [] -> raise EmptyWindowCantHappen
      | x :: y -> window := y; x
    and discard_newline () = match !window with 
        [] -> raise EmptyWindowCantHappen
      | x :: y -> window := y ; main_switch ()
    and main_switch () = match !window with
        [] -> read_token ()
      | [Bicicleta_parser.Newline] -> read_token ()
      | [_] -> output_token ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _] ->
          read_token ()
      | [Bicicleta_parser.Newline; _] -> discard_newline ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _;
         Bicicleta_parser.Equals] -> output_token ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _;
         Bicicleta_parser.Newline] -> read_token ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _;
         _] -> discard_newline ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _;
         Bicicleta_parser.Newline; Bicicleta_parser.Equals] ->
          output_token ()
      | [Bicicleta_parser.Newline; Bicicleta_parser.NameTok _;
         Bicicleta_parser.Newline; _] -> discard_newline ()
      | _ :: _ -> output_token ()
    in main_switch () 
  in next;;

let parse string = Bicicleta_parser.main 
  (drop_unnecessary_newlines Bicicleta_lexer.next)
  (Lexing.from_string string) ;;
let unquote = Bicicleta_lexer.unquote ;;

(* free variable computation, for error detection. *)
module StringSet = Set.Make(String) ;;
let rec stringset = function
    [] -> StringSet.empty
  | s :: rest -> StringSet.add s (stringset rest) ;;

let rec freevars = function
    Name n -> stringset [n]
  | Integer _ | StringConstant _ | Float _ -> stringset ["prog"]
  | NativeMethod _ -> stringset []
  | Literal (Some selfname, methods) -> 
        StringSet.diff (freevars_methods methods) (stringset [selfname])
  | Literal (None, methods) -> freevars_methods methods
  | Derivation(object_, self, methods) ->
        StringSet.union (freevars object_) (freevars (Literal(self, methods)))
  | Call(object_, _) -> freevars object_
 and freevars_methods = function
    NoDefs -> stringset []
  | Definition (name, body, rest) ->
        StringSet.union (freevars body) (freevars_methods rest) ;;
let freevars_list expr = StringSet.elements (freevars expr) ;;

(* What method names are implemented in some piece of source? *)
let rec implemented = function
    Name _ | NativeMethod _ -> stringset []
  | Integer _ | StringConstant _ | Float _ -> stringset ["userdata"]
  | Derivation(object_, _, methods) ->
      StringSet.union (implemented object_) (implemented_methods methods)
  | Literal(_, methods) -> implemented_methods methods
  | Call(object_, _) -> implemented object_
and implemented_methods = function
    NoDefs -> stringset []
  | Definition(name, body, rest) -> 
      StringSet.union (stringset [name])
        (StringSet.union (implemented body) (implemented_methods rest))
;;
let implemented_list expr = StringSet.elements (implemented expr) ;;

(* What method names are called in some piece of source? *)
let rec called = function
    Name _ | NativeMethod _ -> stringset []
  | Integer _ -> stringset ["sys"; "native_integer"]
  | StringConstant _ -> stringset ["sys"; "native_string"]
  | Float _ -> stringset ["sys"; "native_float"]
  | Derivation(object_, _, methods) ->
      StringSet.union (called object_) (called_methods methods)
  | Literal(_, methods) -> called_methods methods
  | Call(object_, method_name) ->
      StringSet.union (called object_) (stringset [method_name])
and called_methods = function
    NoDefs -> stringset []
  | Definition(_, body, rest) ->
      StringSet.union (called body) (called_methods rest)
;;
let called_list expr = StringSet.elements (called expr) ;;

(* Looking at the output from this routine has caught a number of bugs. *)
let lint expr =
  let callees = called expr and implementees = implemented expr
  in ["only called", 
         StringSet.elements (StringSet.diff callees implementees);
     "only implemented", 
         StringSet.elements (StringSet.diff implementees callees);
     "free variables", freevars_list expr] ;;

let unit_tests () =
    assert (is_identifier "foo") ;
    assert (is_identifier "x") ;
    assert (is_identifier "x3") ;
    assert (not (is_identifier "3")) ;
    assert (is_identifier "x_y") ;
    assert (is_identifier "_x") ;
    assert (not (is_identifier "()")) ;
    assert (not (is_identifier "(3")) ;
    assert (not (is_identifier "")) ;

    assert ((escname "hips") = "hips") ;
    assert ((escname "()") = "'()'") ;
    assert ((escname "don't") = "'don\\'t'") ;
    assert ((escname "\\") = "'\\\\'") ;

    (* environment lookup *)
    let ustr x = UserData (UserString x)
    in 
    assert (get "foo" (Add("foo", ustr "bar", Phi)) = ustr "bar") ;
    assert ((get "foo" Phi) = Error("self-name not found", "foo")) ;
    assert ((get "foo" (Add("bar", ustr "baz", Phi))) = 
        Error("self-name not found", "foo")) ;

    (* evaluation tests without parsing *)
    let string_value_is expr str = 
      (eval Phi (Call(expr, "userdata"))) = ustr str
    in
    assert (string_value_is (StringConstant "foo") "foo") ;
    assert (string_value_is (Call(Literal(Some "self", 
               Definition("foo", StringConstant("quux"), NoDefs)), "foo"))
           "quux") ;
    assert ((eval Phi (Call(Literal(Some "self", 
           Definition("foo", StringConstant("quux"), NoDefs)), "baz"))) 
           = Error("method not found", "baz")) ;
    (* note that this expression is very similar to the definition of
       prog.sys.bool in bicicleta_lib.ml.  That's because it's an earlier
       version. *)
    let booleans = Literal(Some "booleans", 
        Definition("true", Literal(Some "boolean", 
            Definition("if_true", Literal(Some "self", 
                Definition("()", Call(Name "self", "then"),
                Definition("then", StringConstant("no consequent given"),
                Definition("else", StringConstant("no alternate given"),
                NoDefs)))),
            Definition("negated", Call(Name "booleans", "false"),
            Definition("if_false", Call(Call(Name "boolean", "negated"), 
                                                              "if_true"),
            NoDefs)))),
        Definition("false", Derivation(Call(Name "booleans", "true"), 
            Some "boolean",
            Definition("if_true", Derivation(Call(Call(Name "booleans", "true"),
                "if_true"), Some "self",
                Definition("()", Call(Name "self", "else"), 
                NoDefs)),
            Definition("negated", Call(Name "booleans", "true"),
            NoDefs))),
        NoDefs)))
    in

    (* print_endline (show_bicexpr booleans) ; *)
    assert (string_value_is(Call(Call(Call(booleans, "true"), 
                                     "if_true"), "()"))
                           "no consequent given") ;
    assert (string_value_is(Call(Derivation(Call(Call(booleans, "true"), 
                                                "if_true"), None,
             Definition("then", StringConstant("is true"),
             Definition("else", StringConstant("is false"),
             NoDefs))),
           "()")) 
           "is true") ;
    assert (string_value_is (Call(Derivation(Call(Call(booleans, "false"),
                                                "if_true"), None,
             Definition("then", StringConstant("is true"),
             Definition("else", StringConstant("is false"),
             NoDefs))),
           "()")) 
           "is false") ;

    (* unquoting *)
    assert((unquote "'foo'") = "foo");
    assert((unquote "\"foo\"") = "foo");
    assert((unquote "'hasn\\'t'") = "hasn't");
    assert((unquote "\"\\\"no\\\" \"") = "\"no\" ");
    assert((unquote "\"\\\"no\\\"\"") = "\"no\"");

    (* tokenizing *)
    (* wish I could say 
       'from Bicicleta_parser import Lbrace, Rbrace, NameTok, Colon, Equals',
       but I really don't want to open Bicicleta_parser;; here. *)

    let lbrace = Bicicleta_parser.Lbrace and rbrace = Bicicleta_parser.Rbrace
        and nametok n = Bicicleta_parser.NameTok n 
        and colon = Bicicleta_parser.Colon and equals = Bicicleta_parser.Equals
        and optok o = Bicicleta_parser.OpTok o 
        and period = Bicicleta_parser.Period 
        and inttok n = Bicicleta_parser.IntTok n 
        and lparen = Bicicleta_parser.Lparen
        and rparen = Bicicleta_parser.Rparen and comma = Bicicleta_parser.Comma
        and newline = Bicicleta_parser.Newline
        and stringtok x = Bicicleta_parser.StringTok x
    in
    assert((tokenize "a") = [nametok "a"]);
    assert((tokenize "a.b") = [nametok "a"; period; nametok "b"]);
    assert((tokenize "aa+bb") = [nametok "aa"; optok "+"; nametok "bb"]);
    assert((tokenize "{xx: }") = [lbrace; nametok "xx"; colon; rbrace]);
    assert((tokenize "'()' * (2**32)") = [nametok "()"; optok "*"; lparen;
                                          inttok "2"; optok "**"; inttok "32";
                                          rparen]);
    assert((tokenize "\"foo\".length") = [stringtok "foo"; period; 
                                          nametok "length"]);
    assert((tokenize "(){}:.,") = [lparen; rparen; lbrace; rbrace; colon; 
                                   period; comma]);
    assert((tokenize "~`!@$%^&*-+=<>?/|\\") = [optok "~`!@$%^&*-+=<>?/|\\"]);
    assert((tokenize "{x:y=x}") = [lbrace; nametok "x"; colon; nametok "y";
                                   equals; nametok "x"; rbrace]);
    assert((tokenize "foo { foo : \n x = foo. \tbletch\ny=foo.x\n}\n")
           = [nametok "foo"; lbrace; nametok "foo"; colon; newline; 
              nametok "x"; equals; nametok "foo"; period; nametok "bletch";
              newline;
              nametok "y"; equals; nametok "foo"; period; nametok "x";
              newline; rbrace; newline]);
    assert((tokenize "x[3]") = [nametok "x"; Bicicleta_parser.Lsquare; 
                                inttok "3"; Bicicleta_parser.Rsquare]);

    (* parsing *)
    assert((parse "x") = Name "x");
    assert((parse "'()'") = Name "()");
    assert((parse "foo.bar.baz") = Call(Call(Name "foo", "bar"), "baz"));
    assert((parse "\"foo\".length") = Call(StringConstant "foo", "length"));
    assert((parse "{x:y=x}") 
          = Literal(Some "x", Definition("y", Name "x", NoDefs)));
    assert((parse "{x:}") = Literal(Some "x", NoDefs));
    assert((parse "{xx:y=xx,z=xx}") 
          = Literal(Some "xx", Definition("y", Name "xx",
                  Definition("z", Name "xx", NoDefs))));
    assert((parse "{x: x = \"x\", y = \"y\", z = \"z\"}") = Literal(Some "x",
                  Definition("x", StringConstant("x"),
                  Definition("y", StringConstant("y"),
                  Definition("z", StringConstant("z"), NoDefs)))));
    assert((parse "point {p: x = \"x\", y = \"y\"}") = Derivation(Name "point",
        Some "p", Definition("x", StringConstant("x"),
                  Definition("y", StringConstant("y"), NoDefs))));
    assert((parse "(a.x)") = Call(Name "a", "x"));             
    assert((parse "(((a).x))") = Call(Name "a", "x"));
    assert((parse "{x: y=x,\n  z=x}") = Literal(Some "x", 
        Definition("y", Name "x",
        Definition("z", Name "x", NoDefs))));
    assert((parse "{x: y=x,\n\n  z=x}") = Literal(Some "x", 
        Definition("y", Name "x",
        Definition("z", Name "x", NoDefs))));
    assert((parse "x\n") = Name "x");
    (* support end-line comments at EOF *)
    assert((parse "x # a variable!") = (parse "x"));
    (* separate methods with newlines: *)
    assert((parse "{x:y=x\nz=x}") = Literal(Some "x",        
        Definition("y", Name "x",
        Definition("z", Name "x", NoDefs))));
    (* allow newlines around period *)
    assert((parse "x\n.\ny \n . \n z") = Call(Call(Name "x", "y"), "z"));
    (* allow newlines in parens *)
    assert((parse "(\nx\n)") = Name "x");
    (* allow leading newlines *)
    assert((parse "\nx") = Name "x");
    (* allow newlines between prototype and overrides *)
    assert((parse "x{y: z = w}") = (parse "x\n{y: z = w}"));
    (* allow newlines around self-name *)
    assert((parse "x {y: z = w}") = (parse "x {\ny\n:z=w}"));
    (* allow newlines around equals sign *)
    assert((parse "x{y:z=w}") = (parse "x{y:z\n=\nw}"));
    assert((parse "x{y:z=w,v=u}") = (parse "x{y:z\n=\nw\nv\n=\nu}"));
    (* treat end-line comments as newlines *)
    assert((parse "{a=b # b!\n c=d}") = parse("{a=b, c=d}"));

    (* various kinds of syntactic sugar: *)
    (* omitting self-names *)
    assert((parse "x { z = w }") 
           = Derivation(Name "x", None, Definition("z", Name "w", NoDefs)));
    assert((parse "{x = 1, y = 2}") = Literal(None,
        Definition("x", Integer 1, Definition("y", Integer 2, NoDefs))));
    (* parenthesized arguments *)
    assert((parse "x(verbose = 1)") = (parse "x { verbose = 1 }.'()'"));
    assert((parse "{env: fac = {fac: x = 3, 
            '()' = fac.x.'<'(arg1=2).if_true(then=1, 
                else=fac.x.'*'(arg1=env.fac(x=fac.x.'-'(arg1=1))))}}.fac(x = 4)
        ") = (parse "{env: fac = {fac: x = 3,
            '()' = fac.x.'<'{arg1=2}.'()'.if_true{then=1, 
                 else=fac.x.'*'{arg1=env.fac{
                     x=fac.x.'-'{arg1=1}.'()'}.'()'}.'()'}.'()'}
            }.fac{x = 4}.'()'")) ;

    (* positional arguments *)
    assert((parse "x(3)") = (parse "x(arg1 = 3)"));
    assert((parse "x{3}") = (parse "x{arg1 = 3}"));
    assert((parse "x{y: 3}") = (parse "x{y: arg1 = 3}"));
    assert((parse "x(37, quiet=true)") = (parse "x(arg1 = 37, quiet = true)"));
    assert((parse "x(37, 38)") = (parse "x(arg1 = 37, arg2 = 38)"));

    (* infix *)
    assert((parse "3 + 4") = (parse "3.'+'(4)"));
    assert((parse "1 + 2 + 3") = (parse "1.'+'(2).'+'(3)"));

    (* indexing notation *)
    assert((parse "x[3]") = (parse "x.'[]'(3)"));

    (* combination of infix with other things *)
    assert((parse "1 + x(3)") = (parse "1 + (x(3))"));
    assert((parse "1 + x[3]") = (parse "1 + (x[3])"));
    assert((parse "1 + x{3}") = (parse "1 + (x{3})"));

    (* just a check to see if our parser doesn't crash. *)
    ignore(parse "prog.sys.normal_commutative_number {self:
        clientdata = 5.clientdata
        show = self.clientdata.show
        as_integer = self
        as_float = self.clientdata.as_float
        coerce = {op: arg1 = 2, '()' = op.arg1.as_integer}
        add = {op: arg1 = 3, '()' = self.clientdata.add(op.arg1.clientdata)}
        negated = self.clientdata.negated
        multiply = self.add {op:
            '()' = self.clientdata.multiply(op.arg1.clientdata)
        }
        modulo = {op: arg1 = 2
            '()' = self.clientdata.divmod(op.arg1.clientdata).mod
        }
        # As Jamie McCarthy points out, rationals made out of machine
        # integers are really pretty limited... but for now I'm using them
        # anyway.
        divide = {op: arg1 = 2, '()' = prog.sys.rational.new(self, op.arg1)}
        reciprocal = 1 / self
        gcd = self.binop {op: arg1 = 5
            '()' = (arg1 == 0).if_true(then = self, 
                else = op.arg1.gcd(self % op.arg1))
        }
        less_than = self.add {op:
            '()' = self.clientdata.less_than(op.arg1.clientdata)
        }
    }");

    (* ensure newlines don't separate positional parameters *)
    assert((parse "x{{}\n{}}") = (parse "x{{}{}}"));

    (* more evaluation tests now that we know parsing works *)
    assert((eval Phi (parse "3.userdata")) = UserData (UserInteger 3));
    assert((eval Phi (parse "3.5.userdata")) = UserData (UserFloat 3.5));
    assert(string_value_is (parse "{}{\n\tx = \"foo\"\n\ty=\"bar\"}.x") "foo");
    assert(string_value_is (parse "{\n\tx = \"foo\"\n\ty=\"bar\"}.y") "bar");

    (* evaluation with native method *)
    let native_add = NativeMethod (fun env -> 
      let x = eval env (Call(Name "method", "arg1"))
      and y = eval env (Call(Name "method", "arg2"))
      in match (x, y) with
          (UserData (UserInteger xs), UserData (UserInteger ys)) -> 
            UserData (UserInteger (xs + ys))
              (* saving the invalid args is a pain, so we don't bother *)
        | _ -> Error ("invalid addition args", "??"))
    in let intrinsics = derive "integer_add" (Some "method") native_add Phi
                              ProtoObject
    in let addition = parse "intrinsics { 3.userdata, 4.userdata }.integer_add"
    in let seven = eval (Add("intrinsics", intrinsics, Phi)) addition
    in 
    assert(seven = UserData (UserInteger 7));
      (* That the above works is a little strange.  If we just eval
         "3" in an environment without "prog", we get an evaluation
         error saying it needs "prog"; and likewise if we eval
         "3.beanstalk"; but if we eval "3.userdata", we get the right
         thing.  That's because the "3" is actually inheriting from an
         error object --- but that works!  As long as you don't try to
         call any methods you were hoping the error object would be
         defining, but only methods you put there. *)

    (* showing expressions *)
    assert((parse (show_bicexpr booleans)) = booleans);
    assert((show_bicexpr (parse "{x = 1}")) = "{x = 1}");
    assert((show_bicexpr (parse "{'doesn\\'t' = 1}")) = "{'doesn\\'t' = 1}");
    assert((show_bicexpr (parse "{foo = \"bar\"}")) = "{foo = \"bar\"}");
    assert((show_bicexpr (parse "{foo = \"ba\\\"r\"}")) = 
        "{foo = \"ba\\\"r\"}");
    assert((show_bicexpr (parse "3.5")) = "3.5");

    (* computing free variables *)
    let freevars_are expr vars = 
        StringSet.equal (freevars(parse(expr))) (stringset vars)
    in
    assert(freevars_are "x" ["x"]) ;
    assert(freevars_are "1" ["prog"]) ;
    assert(freevars_are "1.2" ["prog"]) ;
    assert(freevars_are "\"foo\"" ["prog"]) ;
    assert(freevars_are "{x: y=z, w=x, t=uv}" ["z"; "uv"]) ;
    assert(freevars_are "x{y: z=w, zz=x, zzz=y}" ["w"; "x"]) ;
    assert(freevars_are "x.y" ["x"]) ;
    (* this code contains no free variables, ... *)
    assert(freevars_are "{env: fac = {fac: x = 3,
    '()' = fac.x.'<'{lt: arg1=2}.'()'.if_true{i: then=1, 
         else=fac.x.'*'{mu: arg1=env.fac{f:
             x=fac.x.'-'{m: arg1=1}.'()'}.'()'}.'()'}.'()'}
    }.fac{f: x = 4}.'()'" ["prog"]) ;
    (* but this buggy version did: *)
    assert(freevars_are "{env: fac = {fac: x = 3,
    '()' = x.'<'{lt: arg1=2}.'()'.if_true{i: then=1, 
         else=fac.x.'*'{mu: arg1=env.fac{f:
             x=fac.x.'-'{m: arg1=1}.'()'}.'()'}.'()'}.'()'}
    }.fac{f: x = 4}.'()'" ["x"; "prog"]) ;

    assert(freevars_are "{}" []);
    assert(freevars_are "{x=y}" ["y"]);
    assert(freevars_are "x {y = z}" ["x"; "z"]);

    (* computing lists of method names for linting *)
    let implemented_are expr vars =
      StringSet.equal (implemented(parse(expr))) (stringset vars)
    in
    assert(implemented_are "x" []);
    assert(implemented_are "3" ["userdata"]);
    assert(implemented_are "3.5" ["userdata"]);
    assert(implemented_are "\"foo\"" ["userdata"]);
    (* not really a good way to tell here... *)
    assert(StringSet.equal (implemented (NativeMethod (fun _ -> ProtoObject)))
            (stringset []));
    assert(implemented_are "{}" []);
    assert(implemented_are "{a=b}" ["a"]);
    assert(implemented_are "{a=b, c=d}" ["a"; "c"]);
    assert(implemented_are "{a=b}{c=d}" ["a"; "c"]);
    assert(implemented_are "{a=b}.c.d" ["a"]);
    assert(implemented_are "{a={b={c=d}.e}.f}" ["a"; "b"; "c"]);

    let called_are expr vars =
      StringSet.equal (called(parse(expr))) (stringset vars)
    in
    assert(called_are "x" []);
    assert(called_are "3" ["sys"; "native_integer"]);
    assert(called_are "3.5" ["sys"; "native_float"]);
    assert(called_are "\"foo\"" ["sys"; "native_string"]);
    (* there is no way to tell here: *)
    assert(StringSet.equal (called (NativeMethod (fun _ -> ProtoObject)))
            (stringset []));
    assert(called_are "b.c" ["c"]);
    assert(called_are "b.c.d.e.f" ["c"; "d"; "e"; "f"]);
    assert(called_are "{}" []);
    assert(called_are "{a=b.c.d}" ["c"; "d"]);
    assert(called_are "{a=b.c.d, e=f.g.h}" ["c"; "d"; "g"; "h"]);
    assert(called_are "{a=b.c.d, e=f.g.h}" ["c"; "d"; "g"; "h"]);
    assert(called_are "{a=b.c.d, e=f.g.h}{i=j.k}.mnop"
            ["c"; "d"; "g"; "h"; "k"; "mnop"]);
    assert(called_are "{a={b={c=d}.e}.f}" ["e"; "f"]);
;;
unit_tests();;



This file, bicicleta_lib.ml, contains some native functions and the
beginnings of a standard library.

(* -*- mode: tuareg; compile-command: "./build" -*- *)
(* The beginnings of a standard library, with some basic native methods. *)
(* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA *)
(* for interactive use:
#load "bicicleta_lexer.cmo" ;;
#load "bicicleta_parser.cmo" ;;
#load "bicicleta.cmo" ;;
#load "bicicleta_lib.cmo" ;;
open Bicicleta ;;
open Bicicleta_syntax ;;
open Bicicleta_lib ;;
*)
open Bicicleta_syntax ;;
open Bicicleta ;;

let prog_sys_source = "{prog: sys = {
object = {self:
        # One day, things will inherit from this implicitly if not 
        # otherwise specified.
        # These two methods are intended to be defined differently on errors.
        '!!' = {op: '()' = self}
        is_ok = prog.sys.bool.true
        # But not this one.
        '->' = {op: arg1 = self, key = self, value = op.arg1}
    }
bool =
    {bool: 
        true = prog.sys.object {self: 
            if_true = {if:
                '()' = if.then
                then = prog.error(\"no consequent given\")
                else = prog.error(\"no alternate given\")
            }
            not = bool.false
            if_false = self.not.if_true
            '||' = {or: '()' = self}
            '&&' = {and: arg1 = self, '()' = and.arg1}
            show = \"true\"
        }
        false = bool.true {self:
            if_true = bool.true.if_true{if: '()' = if.else}
            not = bool.true
            '||' = {or: arg1 = self, '()' = or.arg1}
            '&&' = {and: '()' = self}
            show = \"false\"
        }
    }

# prog.sys.normal_commutative_number:

# This is a base class for number-like things that support the usual
# commutative numerical operations, such as native integers, integers
# in Z_n, floating-point numbers, rational numbers, vectors,
# polynomials, rational expressions, continued fractions, complex
# numbers, sampled audio signals, dates, time intervals, images,
# multidimensional arrays, and so on.

# Here's a table of the levels of support you can provide to the
# methods herein.

# If you implement:     You get proper support for:
# add                   +
# add, negated          +, negated, -
# multiply              *
# multiply, reciprocal  *, reciprocal, /
# power                 **
# modulo                %
# less_than             <, ==, >, <=, >=, !=, between
# less_than, coerce(0)  <, ==, >, <=, >=, !=, between, abs, negative, positive

# You can also override 'subtract', 'divide', 'equals', and
# 'greaterthan' to get the expected effects; there are default
# implementations built from the above six methods.  If you do that,
# you may want to override 'reverse -' and 'reverse /' as well.

# If you override the 'coerce' method to convert objects to some type
# your arithmetic methods know how to support, it will convert them to
# the right type ahead of time.

# The 'coerce' method need not return objects supporting the same
# protocol as your own, although they need to support 'negated' if you
# want to subtract them, and they need to support 'reciprocal' if you
# want to divide by them.  So the 'coerce' method for a date object
# might coerce things to time intervals and then support + and - with
# them.  By the same token, your arithmetic methods might return
# objects of some other type --- but remember that 'negated' might get
# fed to 'add', and 'reciprocal' might get fed to 'multiply'.

# In order to support arithmetic on either side of the operator, the
# operator methods, such as '+', delegate to the 'reverse ...' method
# on the other object if the 'add' method fails --- and they provide a
# 'reverse +' method that does the right thing, in case the other
# object wants to delegate to it.  However, the 'reverse ...' methods
# depend on the operations being commutative; this is in exchange for
# the relatively loose requirements on 'coerce' described above.

# No implementations are provided for 'reverse **' and 'reverse %'.

normal_commutative_number = 
    prog.sys.object {self:
        coerce = {op: arg1 = 1, '()' = op.arg1}
        binop = {op: arg1 = 1, other = self.coerce(op.arg1)}

        normal_binop = self.binop {op:
            op = self.add
            revop = op.arg1.'reverse +'
            result = op.op(op.other)
            commuted = op {op: '()' = op.result}
            '()' = op.result !! op.revop(self)
        }
        '+' = self.normal_binop
        'reverse +' = self.'+'.commuted
        subtract = {op: arg1=2, '()' = self.add(op.arg1.negated)}
        '-' = self.normal_binop {op: op = self.subtract, 
            revop = op.arg1.'reverse -'}
        'reverse -' = self.binop {op: '()' = self.negated.add(op.other)}

        '*' = self.normal_binop {op: op = self.multiply, 
            revop = op.arg1.'reverse *' }
        'reverse *' = self.'*'.commuted
        divide = {op: arg1=2, '()' = self.multiply(op.arg1.reciprocal)}
        '/' = self.normal_binop {op: op = self.divide, 
            revop = op.arg1.'reverse /'}
        'reverse /' = self.binop {op: '()' = self.reciprocal.multiply(op.other)}

        '%' = self.normal_binop {op: op = self.modulo, 
            revop = op.arg1.'reverse %' }
        '**' = self.normal_binop {op: op = self.power, 
            revop = op.arg1.'reverse **'}

        '<' = self.normal_binop {op: op = self.less_than, 
            revop = op.arg1.'reverse <'}
        greaterthan = {op: '()' = op.arg1 < self}
        '>' = self.normal_binop {op: op = self.greaterthan, 
            revop = op.arg1.'reverse >'}
        'reverse <' = self.'>'.commuted
        'reverse >' = self.'<'.commuted
        # BEWARE! As the Common Lisp HyperSpec says about EQUALP:
        # Object equality is not a concept for which there is a uniquely
        # determined correct algorithm. The appropriateness of an equality
        # predicate can be judged only in the context of the needs of some
        # particular program. Although these functions take any type of
        # argument and their names sound very generic, equal and equalp
        # are not appropriate for every application.
        equals = {op: arg1 = 2
            '()' = (self < op.arg1).not && (self > op.arg1).not}
        '==' = self.normal_binop {op: op = self.equals, 
            revop = op.arg1.'reverse =='}
        'reverse ==' = self.'=='.commuted
        inverse_comparator = self.normal_binop {op:
            baseop = self.less_than
            revop = op.arg1.'reverse >='
            op = {ge: '()' = op.baseop(ge.arg1).not}
        }
        '>=' = self.inverse_comparator
        '<=' = self.inverse_comparator {op: baseop = self.greaterthan, 
            revop = op.arg1.'reverse <='}
        'reverse >=' = self.'<='.commuted
        'reverse <=' = self.'>='.commuted
        '!=' = self.inverse_comparator {op: baseop = self.'==',
            revop = op.arg1.'reverse !='}
        'reverse !=' = self.'!='.commuted
        between = {op: arg1 = 2.negated, arg2 = 2
            '()' = (self >= op.arg1) && (self < op.arg2)}
        negative = self < 0
        positive = self.negative.not
        abs = self.negative.if_true(then=self.negated, else=self)
    }

# XXX Doesn't work because prog.if doesn't exist yet because we don't 
# have variadic functions because we don't have lists.
rational = 
    prog.sys.normal_commutative_number {self:
        numer = 1
        denom = 2
        new = {op: arg1 = 6, arg2 = 9,
            g = op.arg1.gcd(op.arg2)
            numer = op.arg1 / op.g
            denom = op.arg2 / op.g
            denom_is_1 = op.denom.is_ok && (op.denom == 1)
            ok_test = (op.numer * op.denom)
            # XXX note that prog.if doesn't exist yet!
            '()' = prog.if(
                op.denom_is_1 -> op.numer,
                op.ok_test.is_ok -> self { numer = op.numer, denom = op.denom },
                else = op.ok_test)
        }
        show = \"{numer} <hr> {denom}\" % self
        coerce = {op: arg1 = 2
            '()' = prog.if(
                op.arg1.denom.is_ok -> op.arg1,
                op.arg1.as_integer.is_ok -> self.new(op.arg1.as_integer, 1),
                else = prog.error(\"could not coerce {arg1} to rational\" % op)
            )
        }
        add = {op: arg1 = self.new(2, 3),
            '()' = self.new(
                (self.numer * op.arg1.denom) + (self.denom * op.arg1.numer),
                self.denom * op.arg1.denom
            )
        }
        negated = self.new(self.numer.negated, self.denom)
        multiply = self.add {op:
            '()' = self.new(self.numer * op.arg1.numer, 
                            self.denom * op.arg1.denom)
        }
        reciprocal = self.new(self.denom, self.numer)
        less_than = {op: arg1 = self.new(2, 4),
            '()' = (self.numer * op.arg1.denom) < (self.denom * op.arg1.numer)
        }
        as_float = self.numer.as_float / self.denom.as_float
        as_integer = prog.if(self.denom == 1, 
            then=self.numer,
            else=prog.error(\"{frac} is not an integer\" % {frac=self}))
    }

# prog.sys.native_integer:

# The idea is that the interpreter will derive things from this object
# by overriding their \"userdata\" to point to an opaque integer
# object that can be passed to the native methods that do show,
# as_float, add, negated, multiply, divmod, and less_than.

# I'm still undecided about what those operations should return ---
# just the userdata result (making this code a little uglier) or the
# native_integer object (in which case the primitive arithmetic
# operations as well as the parser have to know about the
# prog.sys.native_integer name)?  For now I'm going with the former.

native_integer =
    prog.sys.normal_commutative_number {self:
        userdata = 5.userdata # as an example
        new = {new: arg1 = 6.userdata
            '()' = prog.sys.native_integer { userdata = new.arg1 }}
        intrinsics = intrinsics{self.userdata}
        show = prog.sys.native_string.new(self.intrinsics.integer_show)
        as_integer = self
        as_float = prog.sys.native_float.new(self.intrinsics.integer_as_float)
        coerce = {op: arg1 = 2, '()' = op.arg1.as_integer}
        intrinsic_op = {op: arg1 = 3,
            intrinsics = self.intrinsics {
                arg2 = op.arg1.userdata
                # these two are for less_than:
                true = prog.sys.bool.true
                false = prog.sys.bool.false
            }
            userdata = op.intrinsics.integer_add
            '()' = self.new(op.userdata)   # by default
        }
        add = self.intrinsic_op
        negated = self.new(self.intrinsics.integer_negated)
        multiply = self.intrinsic_op {op:
            userdata = op.intrinsics.integer_multiply
        }
        modulo = self.intrinsic_op {op:
            userdata = op.intrinsics.integer_divmod.mod
        }
        # As Jamie McCarthy points out, rationals made out of machine
        # integers are really pretty limited... but for now I'm using them
        # anyway.
        divide = {op: arg1 = 2, '()' = prog.sys.rational.new(self, op.arg1)}
        reciprocal = 1 / self
        gcd = self.binop {op: arg1 = 5
            '()' = (op.arg1 == 0).if_true(then = self, 
                else = op.arg1.gcd(self % op.arg1))
        }
        less_than = self.intrinsic_op {op:
            '()' = op.intrinsics.integer_less_than
        }

        # Here are some faster versions of ops with less indirection, 
        # but which fail badly in mixed-mode arithmetic.
        #'+' = {op: '()' = self.new(
        #    intrinsics{self.userdata, op.arg1.userdata}.integer_add)}
        #'-' = {op: '()' = self.new(
        #    intrinsics{self.userdata, op.arg1.userdata}.integer_subtract)}
        #'<' = {op:
        #    '()' = intrinsics{
        #        self.userdata, op.arg1.userdata,
        #        true=self.true,
        #        false=self.false
        #    }.integer_less_than
        #}
    }

# The float userdata has divide instead of modulo; otherwise it's
# quite similar to the integer.  My original design let it inherit
# pretty much all the methods, because the primitives were properties
# of the userdata, so they were named things like 'show' instead of
# 'integer_show', but I decided sticking all the operations in an
# 'intrinsics' namespace was simpler.

native_float = prog.sys.native_integer {self:
        userdata = 3.2.userdata
        new = {new: arg1 = 3.5.userdata
            '()' = prog.sys.native_float { userdata = new.arg1 }}
        show = prog.sys.native_string.new(self.intrinsics.float_show)
        as_float = self
        as_integer = prog.error(\"Can't coerce floats to integers\")
        modulo = prog.error(\"Can't take modulo of floats yet\")
        coerce = {op: arg1 = 2, '()' = op.arg1.as_float}
        add = self.intrinsic_op {op:
            userdata = op.intrinsics.float_add
        }
        negated = self.new(self.intrinsics.float_negated)
        multiply = self.intrinsic_op {op:
            userdata = op.intrinsics.float_multiply
        }
        divide = self.intrinsic_op {op:
            userdata = op.intrinsics.float_divide
        }
        less_than = self.intrinsic_op {op:
            '()' = op.intrinsics.float_less_than
        }
    }

native_string = prog.sys.object {self:
        userdata = \"bethmolnar\".userdata
        new = {new: arg1 = \"matthew\".userdata
            '()' = prog.sys.native_string { userdata = new.arg1 }}
        show = self
    }

}}" ;;

let getarg arg env = eval env (Call(Name "method", arg)) ;;
let arg1 = getarg "arg1" ;;
let arg2 = getarg "arg2" ;;
let primitive_binary myfun = 
  NativeMethod(fun env -> myfun (arg1 env, arg2 env) env) ;;
let integer_binary op = primitive_binary(fun args env ->
  match args with
      (UserData (UserInteger xi), UserData (UserInteger yi)) ->
        op xi yi env
    | (x, y) -> Error ("invalid binary operation args", 
                      show_bicobj x ^ ", " ^ show_bicobj y)) ;;
let integer_unary op = primitive_binary(fun args env ->
  match args with (UserData (UserInteger xi), _) -> op xi env
    | (x, _) -> Error ("invalid unary operation arg", show_bicobj x)) ;;
let intret op a b env = UserData (UserInteger (op a b)) ;;
let true_expr = parse "method.true" ;;
let false_expr = parse "method.false" ;;
let boolret op a b env = eval env (if op a b then true_expr else false_expr) ;;
let defintrinsic name contents rest = 
  derive name (Some "method") contents Phi rest ;;

let integer_intrinsics = 
  (defintrinsic "integer_multiply"  (integer_binary (intret ( * )))
  (defintrinsic "integer_add"       (integer_binary (intret (+)))
  (* integer_subtract: not currently used *)
  (defintrinsic "integer_subtract"  (integer_binary (intret (-)))
  (defintrinsic "integer_less_than" (integer_binary (boolret (<)))
  (defintrinsic "integer_negated"   (integer_unary (fun x env ->
    UserData (UserInteger ~-x)))
  (defintrinsic "integer_show"      (integer_unary (fun x env ->
    UserData (UserString (string_of_int x))))
        (* XXX also need divmod *)
  ProtoObject)))))) ;;

let float_binary op = primitive_binary(fun args env ->
  match args with
      (UserData (UserFloat xi), UserData (UserFloat yi)) ->
        op xi yi env
    | (x, y) -> Error ("invalid binary operation args", 
                      show_bicobj x ^ ", " ^ show_bicobj y)) ;;
let float_unary op = primitive_binary(fun args env ->
  match args with (UserData (UserFloat xi), _) -> op xi env
    | (x, _) -> Error ("invalid unary operation arg", show_bicobj x)) ;;
let floatret op a b env = UserData (UserFloat (op a b)) ;;

let basic_intrinsics = 
  (defintrinsic "float_show"       (float_unary (fun x env ->
    UserData (UserString (string_of_float x))))
  (defintrinsic "float_add"        (float_binary (floatret (+.)))
  (defintrinsic "float_negated"    (float_unary (fun x env -> 
    UserData (UserFloat ~-.x)))
  (defintrinsic "float_multiply"   (float_binary (floatret ( *.)))
  (defintrinsic "float_divide"     (float_binary (floatret (/.)))
  (defintrinsic "float_less_than"  (float_binary (boolret (<)))
  (defintrinsic "integer_as_float" (integer_unary (fun x env ->
    UserData (UserFloat (float_of_int x))))
  integer_intrinsics))))))) ;;

let basic_prog = eval (Add("intrinsics", basic_intrinsics, Phi)) 
  (parse prog_sys_source) ;;
let eval_with_lib expr = eval (Add("intrinsics", basic_intrinsics, 
                               Add("prog", basic_prog, Phi)))
  (Call(Derivation(Name "prog", Some "prog",
    Definition("()", expr, NoDefs)), "()")) ;;

let unit_tests () =
    (* really complicated evaluation *)
    let lib_eval_is str userdata =
      let actual_result = (eval_with_lib (parse ("(" ^ str ^ ").userdata")))
      in let rv = actual_result = UserData(userdata)
      in (if not rv then print_endline (show_bicobj actual_result));
        rv
    in
    (* basic boolean behavior *)
    assert(lib_eval_is "prog.sys.bool.true.show" (UserString "true"));
    assert(lib_eval_is "prog.sys.bool.false.show" (UserString "false"));
    assert(lib_eval_is "prog.sys.bool.true.not.show" (UserString "false"));
    assert(lib_eval_is "prog.sys.bool.false.not.show" (UserString "true"));
    assert(lib_eval_is "(prog.sys.bool.true && prog.sys.bool.true).show"
            (UserString "true"));
    assert(lib_eval_is "(prog.sys.bool.true && prog.sys.bool.false).show"
            (UserString "false"));
    assert(lib_eval_is "(prog.sys.bool.false && prog.sys.bool.false).show"
            (UserString "false"));
    assert(lib_eval_is "(prog.sys.bool.false && prog.sys.bool.true).show"
            (UserString "false"));
    assert(lib_eval_is "(prog.sys.bool.true || prog.sys.bool.true).show"
            (UserString "true"));
    assert(lib_eval_is "(prog.sys.bool.true || prog.sys.bool.false).show"
            (UserString "true"));
    assert(lib_eval_is "(prog.sys.bool.false || prog.sys.bool.false).show"
            (UserString "false"));
    assert(lib_eval_is "(prog.sys.bool.false || prog.sys.bool.true).show"
            (UserString "true"));
    (* basic integer arithmetic *)
    assert(lib_eval_is "3.negated" (UserInteger (-3)));
    assert(lib_eval_is "3 + 4" (UserInteger (7)));
    assert(lib_eval_is "(3 + 4).show" (UserString "7"));
    assert(lib_eval_is "\"foo\".show" (UserString "foo"));
    assert(lib_eval_is "prog.sys.bool.true.if_true(then=3, else=4)"
            (UserInteger 3));
    assert(lib_eval_is "(3 > 4).show" (UserString "false"));
    assert(lib_eval_is "(3 < 4).show" (UserString "true"));
    assert(lib_eval_is "(3 < 4).if_true(then=5, else=6)" (UserInteger 5));
    assert(lib_eval_is "(3 > 4).if_true(then=5, else=6)" (UserInteger 6));
    assert(lib_eval_is "(3 > 4).if_false(then=5, else=6)" (UserInteger 5));
    assert(lib_eval_is "{env:
        fac = {fac:
            arg1 = 3
            '()' = (fac.arg1 < 2).if_true(
                then = 1
                else = fac.arg1 * env.fac(fac.arg1 - 1)
            )
        }
    }.fac(5)" (UserInteger 120));
    assert(lib_eval_is "3 * 4" (UserInteger 12));
    (* basic integer comparisons *)
    assert(lib_eval_is "(3 == 3).show" (UserString "true"));
    assert(lib_eval_is "(3 == 4).show" (UserString "false"));

    (* floats *)
    assert(lib_eval_is "3.5" (UserFloat 3.5));
    assert(lib_eval_is "3.5.show" (UserString "3.5"));
    assert(lib_eval_is "3.5 + 4.0" (UserFloat 7.5));
    assert(lib_eval_is "4.5 - 3.0" (UserFloat 1.5));
    assert(lib_eval_is "0.5 * 0.25" (UserFloat 0.125));
    assert(lib_eval_is "3.5 / 2.0" (UserFloat 1.75));
    assert(lib_eval_is "1.0 + 1.0 + 1.0" (UserFloat 3.0));
    assert(lib_eval_is "(3.5 < 4.0).show" (UserString "true"));
    assert(lib_eval_is "(3.5 > 4.0).show" (UserString "false"));
    assert(lib_eval_is "(4.0 > 3.5).show" (UserString "true"));
    assert(lib_eval_is "(3.5 == 3.5).show" (UserString "true"));

    (* a little bit of mixed-mode arithmetic *)
    assert(lib_eval_is "3.5 / 2" (UserFloat 1.75));
    (* This one doesn't work yet because it depends on prog.error 
       existing and having the right '!!' behavior *)
(*     assert(lib_eval_is "2 * 1.5" (UserFloat 3.0)); *)
;;
unit_tests();;


And then, so you can use it interactively, there's this
read-eval-print loop.  It prints some minimal profiling information on
exit.

(* -*- mode: tuareg; compile-command: "./build" -*- *)
(* something that lets you try out expressions *)
(* Bicicleta language interpreter 
 Copyright (C) 2007  Kragen Javier Sitaker 

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU General Public License as published by 
 the Free Software Foundation; either version 2 of the License, or 
 (at your option) any later version. 

 This program is distributed in the hope that it will be useful, 
 but WITHOUT ANY WARRANTY; without even the implied warranty of 
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 GNU General Public License for more details. 

 You should have received a copy of the GNU General Public License 
 along with this program; if not, write to the Free Software 
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA *)
open Bicicleta ;;
open Bicicleta_syntax ;;
open Bicicleta_lib ;;

let call_count_report () =
  let f key value tail = (value, key) :: tail
  in let calls_list = Hashtbl.fold f call_count []
  in (List.fold_left (fun t (v, k) -> t + v) 0 calls_list,
      List.sort (Pervasives.compare) calls_list) ;;

let rec main inp outp = 
  try
    output_string outp "'()' = "; flush outp;
    (try 
        let inval = input_line inp in
        let inexpr = parse inval 
        in let rv = eval_with_lib inexpr
        in let show = eval (Add("rv", rv, Phi)) (parse "rv.show.userdata")
        in 
             output_string outp (show_bicobj show ^ "\n");
          flush outp;
      with
          Parsing.Parse_error -> output_string outp "parse error\n"
    );
    main inp outp
  with End_of_file -> 
    let total, per_method = call_count_report ()
    in List.iter (fun (v, k) -> 
      output_string outp (k ^ ": " ^ string_of_int v ^ "\n"))
      (List.rev per_method);
      output_string outp ("total method calls: " ^ string_of_int total ^ "\n")
;;
Hashtbl.clear call_count ;;
print_endline "Bicicleta version 3, Copyright (C) 2007  Kragen Javier Sitaker
Bicicleta comes with ABSOLUTELY NO WARRANTY; for details see the file
\"COPYING\".
This is free software, and you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
" ;;
main stdin stdout ;;

Thu, 22 Mar 2007

Contents
--------

Introduction
Extensions
Scoping
* Syntactic sugar for list comprehensions, using "higher-order" patterns
* Explicit declarations
* Implicit pattern augmentation
Global Scope
SIMD
Prefix syntax: An Enlightening Syntactic Digression
A More Extended Example: A Recipes File
Efficiency
Connection To SnikiSniki

Introduction
------------

I was reading Wouter van Oortmerssen's brilliant thesis again, and I
had an idea.  His Aardappel language is a linear eager
(innermost-first) tree-rewriting system, and he points out that
although it doesn't support any kind of inheritance, you can
substitute new kinds of objects for old ones, simply by adding new
rewrite rules for the functions that need to be successfully
applicable to the new objects.

So I got to thinking.  What if we take that approach to the extreme?
Suppose you could define properties on records in terms of rewrite
rules on those records' properties?

    { x: X, y: Y }.r = (X*X + Y*Y).sqrt

That would define an "r" method, or property, on any record that had X
and Y properties.  But you could nest the pattern more deeply, use
constants, and just use the property names to name the property values
in cases where it wasn't ambiguous; here's an example taken from my
toy APL in OCaml (recently posted to kragen-hacks, I think) that shows
how to render APL expressions:

    { unop, value }.show = Unop + " " + Value.show
    { atom_value }.show = Atom_value.show_atom
    { parenthesized_value }.show = "(" + Parenthesized_value + ")"
    { left_op: { atom_value } as Left_op, bin_op, right_op }.show
        = (Left_op, Bin_op, Right_op).show_binop
    { left_op: { parenthesized_value } as Left_op, bin_op, right_op}.show
        = (Left_op, Bin_op, Right_op).show_binop
    (L, O, R).show_binop = L + " " + O + " " + R
    { left_op, bin_op, right_op}.show 
        = { parenthesized_value: Left_op }.show + " " + Bin_op + Right_op.show

    [].show_atom = "()"
    [N, M, ...Lst].show_atom = N.show_num + " " + [M, ...Lst].show_atom
    [N].show_atom = N.show_num

Here I'm using [N, M, ...Lst] to mean a list whose first two items are
N and M and whose remainder is Lst, a la Lisp (N . (M . Lst)), Prolog
[N|[M|Lst]], or OCaml N :: M :: Lst.  Without any syntactic sugar, you
could still write it as { car: N, cdr: { car: M, cdr: Lst } }.

Here's a fragment from my prototype Bicicleta implementation showing
the use of a constant ("nodefs") in a pattern:

    { method_name, method_body, rest: nodefs }.definition 
        = (Method_name, Method_body).show_method
    { method_name, method_body, rest }.definition 
        = (Method_name, Method_body).show_method + ", " + Rest.show_methods

(The real code is in OCaml; that's just a translation.)

Presumably you'd want to namespace the property names by some other
mechanism to reduce spurious pattern matches.

This pattern-matching syntax is less concise than the Caml or
Aardappel equivalent, but it seems that it should make it easier to
define new kinds of data that implement multiple protocols.  It is
probably harder to implement efficiently, though.

However, you could imagine that many pattern matches will be for the
simplest cases, where we're only interested in the properties of a
single record and have no particular requirements for them.  In this
case, you could even omit the entire record!  So, for some of the
previous examples, you could write:

    r = (X*X + Y*Y).sqrt
    show = Unop + " " + Value.show
    show = Atom_value.show_atom
    show = "(" + Parenthesized_value + ")"
    { left_op: { atom_value } as Left_op, bin_op, right_op }.show
        = (Left_op, Bin_op, Right_op).show_binop
    { left_op: { parenthesized_value } as Left_op, bin_op, right_op}.show
        = (Left_op, Bin_op, Right_op).show_binop
    (L, O, R).show_binop = L + " " + O + " " + R
    show = { parenthesized_value: Left_op }.show + " " + Bin_op + Right_op.show

Perhaps you could shorten it further by inferring extra required property
names even in cases where the pattern record is not entirely omitted:

    { left_op: { atom_value } as Left_op }.show
        = (Left_op, Bin_op, Right_op).show_binop
    { left_op: { parenthesized_value } as Left_op }.show
        = (Left_op, Bin_op, Right_op).show_binop

By itself, the ability to define these rewrite rules, create records
with properties whose names are known at compile time, and read properties
whose names are known at compile time, suffices for a Turing-complete
higher-order functional programming language; the rest of the above
(with infix operators, tuples, lists, and so on) can be viewed as
syntactic sugar.  (X + Y might rewrite as {left: X, right: Y}.sum, for
example.)

(The "as Left_op" requires some explanation; it means that the pattern
on the left side of the "as" should be bound to the name on the right
side, in the above cases the objects that matched {atom_value} and
{parenthesized_value}, which might have arbitrary other data in them.
I'm not sure if this feature is more than just syntactic sugar, but I
suspect so.)

In a way, it has a very APLish feel to it --- r = (X*X + Y*Y).sqrt,
thought of as a statement, simultaneously creates a new "column"
called "r" for all the values known as "x" and "y" (when they are on
the same records).  In Bicicleta, you can occasionally reach that same
level of brevity, but only inside the context of the object.

Extensions
----------

There are other obvious extensions --- unification, properties with
property names not known at compile-time (in patterns, property
accesses, and even property definitions) and inheritance, but these
are not necessary --- not even inheritance!  Defining a new property
will automatically create that property on any record having the
necessary attributes, which gives you a sort of multiple inheritance
already.

I was thinking that this might be an interesting basis for an
end-user-oriented database.  Doing the equivalent of SQL's 
SELECT ... WHERE would require a way to name attributes, preferably
anonymous ones, so that you could say e.g. "(R < 4).where" or (with an
infix operation "where") "Mypoints where (R < 4)", with "(R < 4)"
evaluating to an anonymous function or anonymous property that is True
for records whose R is less than 4, and False otherwise.

Scoping
-------

But the database idea in the "extensions" section introduces multiple
name scopes in the same expression: R presumably comes from each
point, and Mypoints presumably comes from some other context --- it
clearly can't come from each point.  And we probably don't want to
restrict where-expressions to contain only constants and properties of
the queried items --- and if we're using the set of properties
mentioned in each subexpression to determine when they are even
applicable, we have an untenable situation.  Here are the options I
have come up with.

* Syntactic sugar for list comprehensions, using "higher-order" patterns

The language as described earlier is already powerful enough to handle
this sort of query without the ability to do things like apply
anonymous functions.  Here the first four lines define a "filter"
function, the fifth line defines an rlt4 "function", and the sixth
line is the translation of "Mypoints where (R < 4)".

    ([], _).filter = []
    ([A, ...As], F).filter = ((F, A).apply, As, F).filternext
    (true, As, F).filternext = [A, ...(As, F).filter]
    (false, As, F).filternext = (As, F).filter
    (rlt4, A).apply = A.r < 4
    (Mypoints, rlt4).filter

You could add syntactic sugar so that you could write 
(Mypoints, (A -> A.r < 4)).filter or 
or [A in Mypoints if A.r < 4] that translated into the above.  A more
general list-comprehension would also support 
[A for A in Mypoints if A.r < 4], and perhaps also multiple sequences to
loop over.

Still, it would probably be better to be able to write that with
get-property-by-name, as follows:

    ([], _).filter = []
    ([A, ...As], F).filter = (A.F, As, F).filternext
    (true, As, F).filternext = [A, ...(As, F).filter]
    (false, As, F).filternext = (As, F).filter
    A.rlt4 = A.r < 4
    (Mypoints, rlt4).filter

Because then you could say (Mypoints, is_visible).filter.  Maybe that
doesn't matter if you can say [A in Mypoints if A.is_visible].

In the language as I've discussed it so far, (rlt4, A).apply (or
A.rlt4) is still defined for records that don't have an r --- the
left-hand-side pattern doesn't mention r, and the right-hand side
doesn't mention R (which would cause an inferred r on the
left-hand-side).  But presumably "A.r < 4" would evaluate as 
"error < 4" or "nil < 4" or something, and that should presumably not
get ignored by default.  But maybe you could write a different kind of
filter that did ignore it.

Maybe you could also write ({r} -> R < 4), in which case 
(rlt4, {}).apply would fail to match (rlt4, {r}).apply, and your
evaluation would get stuck in the middle when 
((rlt4, {}).apply, As, F).filternext failed to reach normal form.
Probably that should also cause it to return an error.  Unlike
Aardappel, this language family does distinguish between data
structures and functions at a basic level, which would give it an
excuse to return an error.

The anonymous-property syntactic sugar would probably benefit from
multiple pattern-match cases, so you could write 
({r} -> R < 4 | {nonpolar} -> false) or some such.

Anyway, with this kind of filtering, you could add attributes to
master records from joins --- using Avi Bryant's example from
DabbleDB, where you have a table of talks with presenter names, and a
table of presenters with presenter names and biographies:

    presenter_rec = [P in Presenters if P.name == Presenter].first
    presenter_bio = Presenter_rec.bio

* Explicit declarations

Here's a second way out.  In Python and Tcl, you can only assign to
variables in the innermost syntactic scope.  You could take an
analogous approach and only implicitly introduce new pattern elements
when a previously undeclared variable was mentioned, and then
introduce it automatically in the innermost scope, in which case you
could write "Mypoints where (R < 4)" but you wouldn't get the right
effect from "Presenters where (Name == Presenter)" --- that would
return records that had two properties, name and presenter, with the same
value.  But you could write

    {Presenter}.presenter_rec = (Presenters where (Name == Presenter)).first
    presenter_bio = Presenter_rec.bio

* Implicit pattern augmentation

You could, also, just declare that property accesses implicitly
augment the pattern match, and then you could write

    Talk.presenter_rec = (Presenters where (Name == Talk.presenter)).first

It would be somewhat undesirable to ignore all properties whose
computation asked for an undefined property, which I think would be a
result of this approach.

Global scope
------------

I didn't mention where the variable Presenters comes from in the
above.  I assumed it came from some global scope, not from every
record.  Maybe it should have some Ruby-style sigil to indicate that.

SIMD
----

Having properties that might have values of vectors of other records
could simplify the process, by making query screens, tables of detail
records, and the like, just ordinary records.  The ().first thing in
the example above is ugly; it would be nicer to say this instead:

    presenter_rec = [P in Presenters if P.name == Presenter]
    presenter_bio = Presenter_rec.bio

For that to work, though, the .bio property access has to
automatically map over whatever "where" returns --- which probably
means that all properties should be potentially multivalued, as in
Prolog, Icon, Pick, Lotus Agenda, or perhaps APL.  This suggests the
need for either a small and well-defined set of properties that apply
to the entire collection rather than each item, or some special syntax
for applying any property to the entire collection rather than each
item.  I'm going to ignore that problem for now and just let some
methods apply to the whole thing, while others apply only to part of
it, the same mess most of those languages have.

This could also provide a neat solution to clashing rewrite rules,
although only time will tell if the neat solution is also a useful
solution --- it might be preferable to be able to do the equivalent of
overriding a method in a subclass so that the base class method is
ignored, by providing a more specific rewrite rule.

Prefix syntax: An Enlightening Syntactic Digression
---------------------------------------------------

Suppose I rewrite some of my previous examples with prefix syntax.

    r { x: X, y: Y } = sqrt(X*X + Y*Y)
    r = sqrt(X*X + Y*Y)

    show { left_op: { atom_value } } = show_binop (Left_op, Bin_op, Right_op)
    show_binop (L, O, R) = L + " " + O + " " + R
    show = show { parenthesized_value: Left_op } + " " + Bin_op + show Right_op

    show_atom [] = "()"
    show_atom [N, M, ...Lst] = show_num N + " " + show_atom [M, ...Lst]
    show_atom [N] = show_num N
    definition { rest: nodefs } = show_method (Method_name, Method_body)
    definition 
        = show_method (Method_name, Method_body) + ", " + show_methods Rest

    filter (_, []) = []
    filter (F, [A, ...As]) = filternext ((F, A).apply, F, As)
    filternext (true, F, As) = [A, ...filter(F, As)]
    filternext (false, F, As) = filter (As, F)
    apply (rlt4, A) = A.r < 4
    filter (Mypoints, rlt4)

    presenter_rec = [P in Presenters where name P == Presenter].first
    presenter_bio = Presenter_rec.bio

This doesn't change the semantics at all, but it ought to look
familiar to users of Haskell or OCaml; now the "methods" look like
functions.  There are only a few important differences:

1. The patterns are defined, not on the structural representation of
   the objects, but on the set of functions applicable to those
   objects.  Remember, the list, tuple, and infix notation is just
   syntactic sugar --- in the list case, [N, M, ...Lst] means { car:
   N, { car: M, cdr: Lst } }, in the tuple case, (true, F, As) means {
   n: 3, arg1: true, arg2: F, arg3: As }; and in the infix case, (X*X
   + Y*Y) means ((X, X).'*', (Y, Y).'*'),'+'.  So you can always
   define new objects that implement whatever protocol is desired for
   some existing operation, unlike in OCaml.

   This probably implies that the process of figuring out how and
   whether a function can be applied will resemble some kind of
   deduction system.  The simplest solution is probably to maintain
   the set of functions applicable to any particular object.
   (Fortunately, in the language so far presented, this doesn't
   require actually running any of the functions --- just examining
   their dependencies.  See "Efficiency".)

2. The objects' "contents" are really point-wise overrides of
   functions perhaps not otherwise applicable to those objects.  You
   can view a definition like f = { x: 1, y: 2 } as syntactic sugar for
   the following:

   x g23132 = 1
   y g23132 = 2
   f = g23132

   Except that { x: 1, y: 2 } is potentially garbage-collectable,
   while g23132, taken literally, would not be.

3. You can define a new pattern-action rule for an existing function
   anywhere.  This probably implies some kind of specificity ordering,
   as in Aardappel, and some kind of feedback about when it's
   applicable.

A More Extended Example: A Recipes File
---------------------------------------

Suppose you have a recipe file of the following form:
{recipes: [ { 
    instructions: "This is how we do it..."
    ingredients: [ {ingredient: "celery", quantity: 3, unit: "stalk"}
                   ...],
    servings: 4,
  } ...]
}

You can imagine a bunch of queries that might not be too hard to write:

    is_celery = Ingredient == "celery"
    celery_ingredients = [I in Ingredients if I.is_celery]
    {celery_ingredients: [N, ...Lst]}.has_celery = true
    celery_quantity = Celery_ingredients.quantity.sum  # assuming SIMD auto-map
    celery_per_serving = Celery_quantity / Servings
    # It would be cool to be able to define quantity_per_serving as a
    # property of each ingredient, but that requires access to the
    # surrounding context.
    ingredients_per_serving = (Ingredients, Servings).ing_divide
    (List, Divisor).ing_divide = [{
           ingredient: Item.ingredient, 
           quantity: Item.quantity / Divisor,
           unit: Item.unit
    } for Item in List]  # assuming list comprehensions
    # This next item only applies when you add a Desired_servings field to a
    # particular recipe.
    ingredients_for_desired_servings = 
        (Ingredients, Servings / Desired_servings).ing_divide
    calories_per_unit = [Nut.calories for Nut in Nutrition_database if
        Nut.ingredient_name == Ingredient && Nut.ingredient_unit == Unit]
    calories = Calories_per_unit * Quantity
    calories = Ingredients.calories.sum
    calories_per_serving = Calories / Servings

    cost_per_gram = [Item.cost for Item in Latest_shopping_price if
        Item.ingredient_name = Ingredient]
    cost = Cost_per_gram * Grams
    grams = [[Conversion.factor * Quantity for Conversion in Conversions if
        Conversion.from == Unit && Conversion.to == "grams"],
        [Density.factor * Quantity for Density in Food_densities if
        Density.per_what == Unit && Density.units == "grams"]].coalesce
    cost = Ingredients.cost.sum
    cost_per_serving = Cost / Servings

    # Search for a recipe to use up the leftovers in the fridge:

    ({ingredients}, ingredient).contains_ingredient = 
        [I in Ingredients if I.ingredient == Ingredient].any
    [Term, ...Terms].search_results = 
        [Recipe for Recipe in Recipes if 
        [(Recipe, Ingredient).contains_ingredient 
            for Ingredient in [Term,...Terms]].all]

Most of these should be no harder to write in Bicicleta, but being
able to define methods "out-of-line" like this might have its
advantages.

Efficiency
----------

For a given set of method definitions, the set of methods that apply
to a particular object can be mostly statically computed from the set
of methods that are assigned values for that object; we can call that
set the "base object shape".  A finite program contains a finite set
of base object shapes, at most one for each object literal in the
program, and probably usually a lot less, so you can precompute most
of the applicable method definitions for each object shape.

It is possible to define methods that exist on some objects of a
particular shape, but not others.  For example, if we have only this
definition:

    [N, M, ...Lst].show_atom = N.show_num + " " + [M, ...Lst].show_atom

then some objects of shape {car, cdr} will have show_atom defined,
while others will not, depending on what their cdr is; and some of
those that have it defined will have an error when they try to call
show_atom on their cdr.  (If we take the suggestion from earlier that
we augment the pattern on the left-hand side so that the call on the
right-hand-side cannot fail, then we end up with a pattern that
matches only infinite lists.)

Adding this definition helps matters a bit:

    [N].show_atom = N.show_num

Now any object of {car, cdr} whose cdr is either of shape [] or {car,
cdr} has a match.  It seems plausible that some kind of type inference
might be able to move this kind of pattern-matching to compile-time,
leaving only a single conditional branch to run-time.

Effectively, this proposal suggests inferring a set of classes in a
system from a set of objects and method inference rules, in order to
be able to use efficient lookup methods.

Some kind of "cut", as in Prolog, is probably also necessary.  In much
of the above, I've assumed that only the most specific method
definition ever applies, which is a kind of "cut".

Connection To SnikiSniki
------------------------

In Darius Bacon's SnikiSniki, you can make little tables out of little
conjunctions of clauses, which Prolog-style look for solutions in the
triple database; if you say

    [[Person parent Parent, Parent parent Grandparent]]

then you get a table of people with their parents and grandparents.
In SnikiSniki, this doesn't create a new "grandparent" relationship,
but you could imagine that it could.

You could express something like the above in this approach as
follows:

    {parent: {parent: Grandparent}}.grandparent = Grandparent

Most pattern-matching languages (OCaml, Haskell, etc.) have some kind
of "as" feature that lets you give a name an interm