Mon, 09 Apr 2007

> random_shuffle looks pretty hairy.

random_shuffle, as usually implemented, is based on swaps (establishing a  
random permutation) -- but one can also think of it as sorting with a  
"broken" comparison function which returns random results, so there may be  
a clever alternative to doing a real sort with an appended random key?

> There's a non-clever way to make any sort stable, which is to add a
> secondary sort key that is the original position in the list.  There
> may be a clever way to do a stable mergesort without knowing the list
> lengths in advance, but I don't know it.

The idea of radix-sort is that one can reduce the problem of sorting on a  
wide key to several sorts on narrow keys -- adding a secondary sort key is  
a bit self-defeating for this use.

The brute-force way to preserve stability is to reverse the reversed list  
in between passes, as used here with integers providing lists (and a pair  
of integers providing a tape):
http://en.literateprograms.org/Merge_sort_(dc)

The clever way (as used with actual tape drives) was to reverse the sense  
of comparison on each pass.

> That leaves only the heap and permutation operations.  I really don't
> know about them.

I'd guess heaps and permutations are essentially swap-based, and that  
swaps imply random access.
(permutations are closely related to insertion/selection-sort, which are  
not so far away from heap sort)

Knuth Vol 3, "Sorting and Searching", might be a good reference.  Anything  
that was good for external sorting with tapes ought to be good with  
lists.  (indeed, being good with tapes implies that one should be able to  
find more cache-friendly implementations than the standard  
singly-linked-list)

> A remarkably large share of the uses of array indexing by numerical
> variables I found in both my Python code and my JavaScript code were
> for some variant of the string.join problem --- given an array of N
> strings, connect them with N-1 commas, except when N=0, in which case
> we should connect them with N=0 commas instead of N-1 = -1 commas.

Squint at it properly, and tail-call optimization becomes an instance of  
string.join: instead of "a,b,c," (where x is a jump and ',' the equivalent  
return) we want to generate "a,b,c".  The dual (context preservation when  
building argument lists) follows the same pattern: we only need to  
save/restore contexts if there are at least two overlapping evaluations.

This may be a practical reason for the last couple of decades' interest in  
monads: we prefer to program in an algebraic, tree-like form (code as well  
as data), while the iron prefers to handle linear sequences of code  
operating on contiguous chunks of data (op x state -> state') -- and  
monads provide machinery to go from tree inputs to flat results in a  
theoretically nice manner.

-Dave