Mon, 31 Mar 2008

On Mon, Mar 31, 2008 at 02:45:11AM +0200, Dave Long wrote:
> [quoting Kragen]
> >At some time in their lives, all eccentrics who spend a lot of time
> >reading must take on the doomed project of the orthographic reform of
> >their language.  Occasionally this project is not doomed; for example,
> >if their scheme is backed by a king or revolutionary government, it
> >may have some chance of success.
> 
> The eccentricity may lie in the top-down assumption of orthographic  
> reform, as opposed to the bottom-up processes of orthographic change.

You are clearly correct.  I wonder why that wasn't clear to me before
you wrote.

I'm not sure that it makes the project any less doomed.

> >   Of course, we would have to pick a standard pronunciation to use
> >   for the phonetic spelling.
> 
> But why?  A reform of an orthography certainly requires a standard,  
> but by dropping the "ortho" requirement, people could simply spell as  
> they pronounce.

That might not be such a bad idea; it would certainly make literacy
more widespread.  But perhaps people can already do this, and as you
point out later, many already do.  The same forces that impeded
Hangul's adoption in Korea for 500 years are at work today: language
use as a group membership marker rather than a means of communication.

That's part of it.

But a standardized, or at least slowly changing, orthography makes it
quite easy to communicate among, say, Brazil, Argentina, and Mexico.
Even though the orthography differs between Portuguese and Spanish,
it's close enough that a reader of one can read the other fairly
easily.

English spelling reform might widen that gap between English and the
Romance languages, though.  "Frequently" is much closer to
"frecuentemente" than "freekwuntlee".

> Consider the IM style displayed in these two versions of the same  
> commercial:
> http://www.youtube.com/watch?v=Knb6I9s8Wk8 (gsw)
> http://www.youtube.com/watch?v=UTLyRbZ55hw (fr)

Do you suppose the orthography ("heterography"?) in the commercial
creates a communication barrier?  Consider this quote from "Donna
Babee" on the Carbrain Young Fleeto page at
http://www.bebo.com/Profile.jsp?MemberId=5362005867

        here drew ya wee cum stain hoo dae u hink u r fck ma scheme al
        cum up n wreck ur scheme ya wee wank ur the wan thts no go a
        life if aw u dae is pelter wee boiz on bebo u need tae get a
        grip ya poof never mind a life FCK UR SCHEME

        CYF on top non stop run a mock full stop

in reply to "Drew Mcskimmin":

        fuck yer schemes fae cumbernauld yez are aw wee fuckin daft
        bois wae nae life ya bunch a chavie bastards btyre all the wiy
        ya fuckin tossers cumbernauld yez are haein a laff mare like
        scummnauld haha poofs

Maybe the communications obstacle there is not the presumably already
mostly phonetic spelling ("dae", "hae", and "no go" aren't that hard
to figure out) but the vocabulary I don't share.  (I guess "pelter" is
"pester"?  "fae"?  "btyre"?)

Even if the intended effect is creating a barrier to communication,
that barrier will exist even when it's not wanted.  Argentine IRC is
considerably more difficult than formal written Argentine Spanish in
part due to the absence of dictionaries.  (For spoken Argentine
Spanish, now at least we have a national academy of lunfardo...)

> One problem with popular orthographies of this sort is that they may  
> be too ephemeral; by being too faithful to the speech patterns of a  
> particular time and place, they lose the universality that we'd like  
> to see in a language and a literature.  Shakespeare, for instance,  
> seems to be more accessible for the novice when printed on the page  
> rather than presented on the stage.

I've always found it more accessible on the stage.  Watching it on
stage seems to be a more popular activity.  Maybe that's an illusion
because watching a stage play, or a movie in a theater, is more
observable than reading a Dover or Penguin Classics paperback?

But http://en.wikipedia.org/wiki/Hamlet_(1996_film) says Branagh's
Hamlet was a failure, earning only US$5 million at the US box office
that year.  Meanwhile, Dover Publications sells Shakespeare in printed
form (see
e.g. http://ocw.mit.edu/OcwWeb/Literature/21L-420Spring-2006/Readings/)
and
http://query.nytimes.com/gst/fullpage.html?res=9C01E2D81F3FF935A2575BC0A9669C8B63
says (in 2000):

        The Courier Corporation said yesterday that it had agreed to
        buy Dover Publications Inc. for $39 million, adding
        special-interest books to its list of titles as it seeks
        buyers on the Internet. Dover, which had revenue of $32
        million last year, sells more than 7,000 publications at
        bookstores and by mail order. Based in Mineola, N.Y., it
        reprints books on subjects from architecture to math. Courier
        identifies buyers on the Internet with special interests and
        tells them where they can find a particular work.

So, if Dover's books averaged something like $2 each, Dover at the
time sold maybe 16 million copies a year of all of the Shakespeare
plays (and everything else); and, at something like $5 a ticket, maybe
a million people went to see Branagh's movie in the few weeks or
months that it was in theaters.

I was hoping to dig up something more definitive, but I think that's
pretty indecisive.

> [0] To be fair, since early composition was primarily oral, the  
> ancients took greater care to clearly signpost and articulate their  
> thoughts than we do in contemporary written text.  Rhetorical figures  
> are much more important when one is asking an audience to reconstruct  
> a parse tree, not from a punctuated text, but from a strictly linear  
> sequence of phonemes.

Interesting.  I wonder if there are rhetorical figures in modern
writing that would similarly disappear with comic-book bolding and
better layout.  Would it be a net positive?  Tonight I had a
discussion with a taxi driver about whether the use of seatbelts
discourages caution in driving.

> I've seen many TROFF sources that seem to have been written in a  
> ventilated style.  At the time, I had thought it was just a  
> reflection of the early line editors: by keeping phrases and clauses  
> on distinct lines, editing at 300 -- or even 110 -- baud on a  
> teletype becomes less painful.  But if ventilation were a meme of the  
> 60's, it may have even been the result of conscious choice.

Perhaps it was intended to reduce the size of linewise diffs.  I've
certainly done that in TeX, especially when collaborating with other
people by emailed patches.  (On second thought, nothing like vi's
!}fmt or Emacs's M-q is particularly convenient in ed; perhaps the
lines remained unfilled after edits because refilling the paragraphs
was too inconvenient.)
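The diff-size hypothesis is easy to check.  Here is a minimal sketch
using only Python's standard difflib and textwrap modules; the
30-column fill width and the sample sentence (adapted from Dave's
teletype remark) are arbitrary choices.  It makes the same one-word
edit in two layouts, filled to a fixed width and one phrase per line,
and counts the changed lines in each diff.  In the filled layout the
inserted word pushes every later word along, so the refilled paragraph
differs on every line; in the ventilated layout only the edited line
changes.

```python
import difflib
import textwrap

def changed_lines(old: str, new: str) -> int:
    """Count removed ('-') and added ('+') lines in a unified diff."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(),
                                lineterm="", n=0)
    return sum(1 for line in diff
               if line.startswith(("-", "+"))
               and not line.startswith(("---", "+++")))

# The same sentence with one phrase per line, as in ventilated source.
before = ["By keeping phrases and clauses",
          "on distinct lines,",
          "editing at 300 baud",
          "on a teletype",
          "becomes less painful."]
after = ["By keeping short phrases and clauses"] + before[1:]  # a one-word edit

ventilated = changed_lines("\n".join(before), "\n".join(after))
filled = changed_lines(textwrap.fill(" ".join(before), width=30),
                       textwrap.fill(" ".join(after), width=30))
print(f"ventilated layout: {ventilated} changed lines; filled: {filled}")
```

The gap grows with paragraph length: an edit near the start of a long
filled paragraph can rewrite every line after it, which is what makes
such patches noisy.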

> (indeed, this may be a reasonable halfway step: although we'd like
> for a reader to quickly grasp the structure of a text, it's even
> more important for an editor to have done so)

Indeed.

> Finally, what about format=flowed email?

Stallman wants Emacs to do general word processing, with fonts and
stylesheets and so on.  Perhaps separating the formatting codes from
the text (put them in a comment block at the end?  in a separate
file?) would provide a less disharmonious format than the usual
memory-dump word-processor formats; and in that case you could provide
several alternative sets of formatting for the same text.
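Keeping the formatting outside the text in this way is roughly what is
sometimes called stand-off markup.  Here is a minimal sketch, assuming
non-overlapping spans; the Span type, the mark() helper, the "stress"
role, and both stylesheets are hypothetical illustrations, not anything
Emacs provides.  The text stays plain, the formatting is a separate
list of character ranges tagged with roles, and each stylesheet maps
roles to concrete markup, so the same annotations can be rendered
several ways.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # offset of the first formatted character
    end: int     # offset just past the last formatted character
    role: str    # semantic role, e.g. "stress"; looked up in a stylesheet

def mark(text: str, phrase: str, role: str) -> Span:
    """Hypothetical helper: annotate the first occurrence of phrase."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), role)

def render(text: str, spans: list[Span],
           stylesheet: dict[str, tuple[str, str]]) -> str:
    """Apply one stylesheet to plain text plus separately stored spans.

    Assumes spans do not overlap; roles missing from the stylesheet
    are rendered as plain text.
    """
    pieces, pos = [], 0
    for span in sorted(spans, key=lambda s: s.start):
        opener, closer = stylesheet.get(span.role, ("", ""))
        pieces.append(text[pos:span.start])
        pieces.append(opener + text[span.start:span.end] + closer)
        pos = span.end
    pieces.append(text[pos:])
    return "".join(pieces)

TEXT = "I suspect, but have no proof, that it helps."
SPANS = [mark(TEXT, "suspect", "stress"), mark(TEXT, "proof", "stress")]
PLAIN = {"stress": ("*", "*")}        # comic-book style emphasis
HTML = {"stress": ("<b>", "</b>")}    # the same spans as HTML bold

print(render(TEXT, SPANS, PLAIN))
print(render(TEXT, SPANS, HTML))
```

The cost of the separation is bookkeeping: because spans refer to
character offsets, every insertion or deletion in the text has to
shift the offsets of all the spans after it.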

Sun, 30 Mar 2008

Forwarded because (a) it's awesome and (b) I think Cc: kragen-tol meant
Cc: kragen-discuss.

----- Forwarded message from Dave Long <dave.long at bluewin.ch> -----

To: Kragen Javier Sitaker <kragen at pobox.com>
Resent-Date: Mon, 31 Mar 2008 02:55:37 +0200
Resent-To: kragen-discuss at canonical.org
From: Dave Long <dave.long at bluewin.ch>
Cc: kragen-tol at lists.canonical.org
Subject: Re: orthographic reform of English

>At some time in their lives, all eccentrics who spend a lot of time
>reading must take on the doomed project of the orthographic reform of
>their language.  Occasionally this project is not doomed; for example,
>if their scheme is backed by a king or revolutionary government, it
>may have some chance of success.

The eccentricity may lie in the top-down assumption of orthographic  
reform, as opposed to the bottom-up processes of orthographic change.

>   Of course, we would have to pick a standard pronunciation to use
>   for the phonetic spelling.

But why?  A reform of an orthography certainly requires a standard,  
but by dropping the "ortho" requirement, people could simply spell as  
they pronounce.  WWII influenced orthography in at least the
Netherlands and Switzerland; in the former, orthographic reform, once
viewed as the province of eccentric hypermodernists, became reality
as the postwar Dutch sought to distance their language from German;
in the latter, the Swiss dialects, once disparaged as the Teutonic
equivalents of Ebonics and expected to disappear with improved
education, became languages of pride and patriotism.

Of course, attempting to distance one's culture from a historically  
politically unappealing one may result in successful government  
backing of orthographic reform, but one must note that youth culture  
distances itself from straight adult culture without requiring  
governmental decrees.

Consider the IM style displayed in these two versions of the same  
commercial:
http://www.youtube.com/watch?v=Knb6I9s8Wk8 (gsw)
http://www.youtube.com/watch?v=UTLyRbZ55hw (fr)

The first, in Swiss German, uses “standard” spelling rules to
phonetically convey dialect speech.
The second, in Swiss French, uses “langage SMS” (SMS-speak) digits and
letters to phonetically convey standard speech.
(http://fr.wikipedia.org/wiki/Nouvelles_formes_de_communication_écrites)

Neither is orthographically "correct"; adult German Swiss may speak
Swiss dialects, but they are supposed to write in Standard German,
and the situation is even simpler for the French Swiss.  However, a
quick survey of internet bulletin boards reveals that -- much like
1337 -- at the younger and more informal end there have evolved
different, less-strict-if-more-hip, orthographies.

One problem with popular orthographies of this sort is that they may  
be too ephemeral; by being too faithful to the speech patterns of a  
particular time and place, they lose the universality that we'd like  
to see in a language and a literature.  Shakespeare, for instance,  
seems to be more accessible for the novice when printed on the page  
rather than presented on the stage.

At the opposite end of the range of timescales for change in the  
conventions of written communications, we have the display of text  
itself.  Perhaps the question of adoption of "ventilated prose"  
should be looked at, not as a matter of decades or centuries, but in  
the context of the gradual -- nearly glacial -- addition of hints for  
the reader in written material.

>   There is considerable room for debate about the best layout for
>   English text; even for simpler languages like OCaml that are
>   traditionally written indented in this fashion, there is often some
>   ambiguity about the best way to format code.  The basic principle,
>   though, is that the hierarchical structure of the sentences should
>   be reflected in a layout with the smaller parts of the sentence
>   indented further to the right.


Alphabets came in around, say, 1500 BCE.  But just as the input and  
length limits of SMS have driven abbreviations in modern  
communications, early writers were willing to displace much of the  
work of reconstruction upon the reader:

THRSCNSDRBLRMFRDBTABTTHBSTLYTFRNGLSHTXT

At least the Greeks, who didn't have the regularities of Semitic
languages to fall back upon, figured they'd have enough pity on the
reader to regularly notate the vowels as well:  (ca. 1000 BCE?)

THEREISCONSIDERABLEROOMFORDEBATEABOUTTHEBESTLAYOUTFORENGLISHTEXT

From there it took a few millennia (to roughly 1000 CE) for the
concept of "ventilated sentences" to catch on; there were a few  
eccentrics who added spaces and other punctuation[0] to their  
sentences for a few centuries before, and the process wasn't  
completed until a few centuries after, but in general we now consider  
it an unspoken duty of a writer to clearly separate words in phrases:

There is considerable room for debate about the best layout for
English text

So perhaps, over another period of centuries, around the year 3000 if  
not before[1], we will expect that writers will articulate and  
subordinate their thoughts in two dimensions, and anyone who flows  
text together in dense rectangular blocks will be considered as  
eccentric as someone of our times who has chosen  
tojamallthetexttogetherwithnoregardforarticulatingindividualwords.

There is
  considerable room for debate
    about the best layout
      for English text

-Dave

:: :: ::

[0] To be fair, since early composition was primarily oral, the  
ancients took greater care to clearly signpost and articulate their  
thoughts than we do in contemporary written text.  Rhetorical figures  
are much more important when one is asking an audience to reconstruct  
a parse tree, not from a punctuated text, but from a strictly linear  
sequence of phonemes.

[1] There are many opportunities in current communications where  
presentation is sufficiently separated from content that one might  
easily experiment with ventilated prose without offending the  
sensibilities of naive end readers.

I've seen many TROFF sources that seem to have been written in a  
ventilated style.  At the time, I had thought it was just a  
reflection of the early line editors: by keeping phrases and clauses  
on distinct lines, editing at 300 -- or even 110 -- baud on a  
teletype becomes less painful.  But if ventilation were a meme of the  
60's, it may have even been the result of conscious choice.

HTML, Wikis, and the varied message board markup languages, in  
reflowing their output, also give the opportunity to write in a style  
of which Bucky would approve, yet passing unnoticed by the average  
reader.  (indeed, this may be a reasonable halfway step: although  
we'd like for a reader to quickly grasp the structure of a text, it's  
even more important for an editor to have done so)

Finally, what about format=flowed email?


----- End forwarded message -----
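Format=flowed (RFC 3676) already separates presentation from content
in the way footnote [1] describes: a line that ends in a space is a
soft break that a flowed-aware reader rejoins, and a line without one
is a hard break.  So a writer could put each phrase on its own line,
ending all but the last with a space; a plain-text reader would see
the ventilated layout and a flowed-aware reader an ordinary paragraph.
Here is a minimal decoding sketch, assuming DelSp=no and an unquoted
body; it ignores '>' quote depth, so it is not a complete RFC 3676
implementation.  It also undoes space-stuffing, the leading space a
sender adds to lines beginning with a space, '>', or "From ".

```python
def unflow(body: str) -> list[str]:
    """Rejoin the soft line breaks of a format=flowed body (RFC 3676).

    Minimal sketch: assumes DelSp=no and an unquoted body, so it
    ignores '>' quote depth entirely.
    """
    paragraphs: list[str] = []
    current = ""
    for line in body.splitlines():
        if line == "-- ":              # signature separator: never flowed
            if current:
                paragraphs.append(current)
                current = ""
            paragraphs.append(line)
            continue
        if line.startswith(" "):       # undo space-stuffing
            line = line[1:]
        current += line
        if line.endswith(" "):         # soft break: join with the next line
            continue
        paragraphs.append(current)     # hard break: this paragraph is done
        current = ""
    if current:                        # body ended on a soft break
        paragraphs.append(current)
    return paragraphs

ventilated = ("There is \r\n"
              "considerable room for debate \r\n"
              "about the best layout \r\n"
              "for English text\r\n")
print(unflow(ventilated))
```

One caveat for ventilated prose: indentation on a soft-broken line
survives decoding only as extra spaces inside the rejoined paragraph,
so indented layouts don't round-trip cleanly.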

Sat, 29 Mar 2008

At some time in their lives, all eccentrics who spend a lot of time
reading must take on the doomed project of the orthographic reform of
their language.  Occasionally this project is not doomed; for example,
if their scheme is backed by a king or revolutionary government, it
may have some chance of success.

There is a history of some of these successful attempts in
http://en.wikipedia.org/wiki/Spelling_reform and a catalogue of
fourteen unsuccessful attempts in English at
http://en.wikipedia.org/wiki/English_reform.

So I am offering these suggestions for the orthographic reform of
English without any real hope that they have any chance of widespread
adoption, except perhaps through automated translation software.
Briefly, I advocate phonetic spelling, syllable blocks, boldface for
sentence stress, and syntactic layout.

1. Phonetic spelling.  There's an existing, widely-understood phonetic
   alphabet, used in almost all the dictionaries of the world except
   for English ones; it's called the International Phonetic Alphabet,
   or IPA.  Continuing to write English in the impoverished Latin
   alphabet, without even using accents as most other languages do,
   wastes the time of countless generations of youngsters, who could
   be spending their elementary-school days on algebra, music,
   literature, art, or vocabulary, rather than spelling.  So we should
   write English with the IPA.

   Of course, we would have to pick a standard pronunciation to use
   for the phonetic spelling.  I propose using the dialect of English
   with the largest number of speakers: Indian English, with 350
   million users.  It may have the disadvantage that its phonology is
   somewhat less complex than that of most American, English, and
   Australian dialects, which may make it difficult to infer the
   English (etc.) pronunciations for words from their spelling.  But
   this should be much less of a problem than at present.

   George Bernard Shaw famously willed much of his estate to a failed
   attempt to promulgate a phonetic spelling system for English.  See
   http://en.wikipedia.org/wiki/Shavian for details.  Other famous
   would-be English-spelling reformers include Benjamin Franklin,
   Melvil Dewey, Theodore Roosevelt, Mark Twain, and Noah Webster.

2. Syllable blocks.  Korea's Hangul is the only script to successfully
   combine the easy skimmability of Chinese logograms with the easy
   learning of phonetic writing systems.  So the letters used in
   writing English should be similarly arranged into syllable blocks.
   I have the impression that Korean has very little inflection and
   consequently fewer inflection-related vowel changes, so this may
   not work as well for English as for Korean, but most words in
   English do not have any inflection-related vowel changes either.
   For example, I think the previous sentence contains none, and this
   sentence contains only "think".

   Note that, according to Wikipedia, although Hangul was created in
   the 1400s and promoted by the king, it didn't displace the
   Chinese-character system until the 20th century; from
   http://en.wikipedia.org/wiki/Hangul#Other_names:

        Until the early twentieth century, hangul was denigrated as
        vulgar by the literate elite who preferred the traditional
        hanja writing system[citation needed]. They gave it such names
        as:

            * Eonmun ("vernacular script").
            * Amkeul ("women's script").
            * Ahaekkeul or ahaegeul ("children's script").
            * Achimgeul ("writing you can learn within a morning").

3. Boldface for sentence stress.  This *convention* is already widely
   used in *comic books*, in order to facilitate *comprehension*.  I
   *suspect*, but have no *proof*, that it could convey *much* of the
   emotional *content* that is so often *misread* in *email* today.
   Conveying emotions *clearly* with only *word choice* is a very
   difficult *discipline*, the discipline of *poetry*. While poetry is
   a *priceless part* of our cultural *heritage*, it is a *serious
   problem* that communicating emotions *clearly* through email
   requires *writing poetry*.

4. Syntactic layout.  Rather than being divided into paragraph blocks,
   text should be divided into lines according to phrasal divisions,
   and indented to show the hierarchical structure of the phrases.
   This is essentially universal practice for writing computer
   programs, with the partial exception of assembly language, and has
   been for decades, for the excellent reason that it makes the
   programs dramatically easier to understand.  Buckminster Fuller
   called it "ventilated prose", and used it for the same reason, but
   the unfortunate effect of his writing in this format was that his
   work was often dismissed as "poetry":

        Though the preparation for that mid-nineteen-thirties
        presentation had been developed under the close observation of
        the corporation's Director of Research, my final written
        presentation of it was declared by the Director to be
        incomprehensible. Disgruntled, I re-read it carefully and
        returned to the Director saying, "Please listen to this," and
        proceeded to read in spontaneously metered "doses" from my
        manuscript. As I read I also watched for expressions of
        comprehension on the Director's face. The Director pondered
        each verbal dose, and when his face signalled "that is clear"
        I would intuitively measure out the next portion. Finally, the
        Director said, "Why don't you write it that way?" I said, "I
        am reading directly and without skipping from my original
        text"; so the Director said, "It just doesn't read that way."
        The explanation was that the intuitive doses did not
        correspond to conventional syntax.

        When the re-written report was submitted, the Director said,
        "This is lucid, but it is poetry, and I cannot possibly hand
        it to the President of the Corporation for submission to the
        Board of Directors." I insisted that it was obviously not
        poetry, since both he and I knew how I had chopped up a
        conventional prose report. The Director said, "I am having two
        poets for dinner tonight and I will take this to them and see
        what they say." He returned the next day and said, "It's too
        bad --- it's poetry."

   (That's according to
   http://webhome.cs.uvic.ca/~vanemden/zzVentProse.html, which has no
   visible authorship information, but it is on Maarten van Emden's
   home page, and it is supposedly a quote "from the preface of No
   More Second-Hand God by Buckminster Fuller, Southern Illinois
   University Press, 1963.")

   Here's an example, supposedly from "Intuition", via
   http://listserv.acsu.buffalo.edu/cgi-bin/wa?A2=ind9411&L=geodesic&T=0&P=5919

        And wherever they came from,
        The thoughts arranged in this book
        Are discoveries
        Of its author
        Since he first came in 1913
        To think
        That nature did not have
        Separate departments of
        Mathematics, physics,
        Chemistry, biology,
        History, and languages,
        Which would require
        Department head meetings
        To decide what to do
        Whenever a boy threw
        A stone in the water,
        With the complex of consequences
        Crossing all departmental lines.
        Ergo, I came to think that nature
        Has only one department --
        And I set out to discover its
        Obviously
        Omnirational
        Comprehensively co-ordinate system,
        And thankfully found it.

   Fuller's "ventilated prose" fails to take advantage of indentation.

   More recently, a group of researchers have written software to
   parse and automatically reformat text in this format, under the
   name "Visual-Syntactic Text Formatting" or "Live Ink", and
   conducted numerous experiments to measure its effect on
   readability.  They found that it improved readability
   substantially.  For more details, see
   http://www.readingonline.org/articles/r_walker/ "Visual-Syntactic
   Text Formatting", by Stan Walker, P. Schloss, C. R. Fletcher,
   C. A. Vogel, & R. C. Walker, 2005-05, via Reading Online 8(6), ISSN
   1096-1232; their software online at
   http://phil.red-castle.com/cgi-bin/HtmlClipRead80.exe rendered
   Fuller's text above as follows:

    And wherever
        they came from,
       The thoughts arranged
          in this book are discoveries
             of its author since he first
           came in 1913
              to think that nature
                 did not have
                separate departments
                  of mathematics,
       physics,
          chemistry,
       biology,
          history,
       and languages,
          Which
        would require department
            head meetings
        to decide what
            to do whenever
                a boy
                    threw a stone in the water,
       With the complex
          of consequences
             crossing all departmental lines.

   The parsing contains some errors; this would be more accurate:

    And wherever they came from,
        The thoughts 
                arranged in this book
            are discoveries of its author 
                since he first came 
                    in 1913
                to think 
                    that nature did not have
                        separate departments
                            of mathematics,
                               physics,
                               chemistry,
                               biology,
                               history, and
                               languages,
                        which would require 
                                department head meetings
                            to decide what to do 
                                whenever 
                                    a boy threw a stone 
                                        in the water,
                            with the complex
                                of consequences
                                    crossing all departmental lines.

   There is considerable room for debate about the best layout for
   English text; even for simpler languages like OCaml that are
   traditionally written indented in this fashion, there is often some
   ambiguity about the best way to format code.  The basic principle,
   though, is that the hierarchical structure of the sentences should
   be reflected in a layout with the smaller parts of the sentence
   indented further to the right.

These changes to English orthography would make English much easier to
learn, read, write, and even speak.  But there is no chance that they
will ever be adopted, even if people came to believe that I was some
kind of super-genius; the obstacles to orthography changes are simply
too great.
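The indentation principle above (smaller parts of a sentence indented further right) becomes mechanical once a parse exists. Below is a minimal Python sketch. It assumes the sentence has already been parsed into (text, children) pairs. The tree here is built by hand and is hypothetical, not the output of any real parser.

```python
def layout(node, depth=0, indent="    "):
    """Return the lines of a phrase tree, one phrase per line, with each
    subordinate phrase indented one step further than its parent.

    A node is a (text, children) pair; children is a list of nodes."""
    text, children = node
    lines = [indent * depth + text]
    for child in children:
        lines.extend(layout(child, depth + 1, indent))
    return lines


# A hand-built (hypothetical) parse of part of Fuller's sentence.
sentence = ("to think", [
    ("that nature did not have", [
        ("separate departments", [
            ("of mathematics,", []),
            ("physics,", []),
            ("and languages,", []),
        ]),
    ]),
])

print("\n".join(layout(sentence)))
```

The sketch skips the hard part entirely, which is producing a correct parse. That is where the software quoted above went wrong.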

Sun, 16 Mar 2008

So I took a walk tonight.  I was in search of an ATM that wasn't out
of money, so that we could get cash to pay rent.  I guess the banks
here don't restock their ATMs on Sundays, so ATMs are often out of
cash, receipt paper, or both.

I took the bus down to Cabildo, in the area where we stayed when we
first came to Buenos Aires in November 2006.  We stayed with Mariana
Ponzi and her family; she was organizing a huge pillow fight (the
Lucha de Almohadas de Buenos Aires, perhaps history's largest) on my
birthday, and was excited to hear our stories of the famous San
Francisco Valentine's Day pillow fight.

The first two ATMs I tried were out of money, and I didn't know where
to look for more, so I strolled along Avenida Cabildo in the hot
summer night.  A young couple were weighing themselves on a digital
scale in front of a pharmacy, so I weighed myself as well; alarmingly,
I apparently weigh 109 kilograms, about 25 more than I ought to.
Maybe that's why my knees and ankles hurt so often.

I saw a huge RENT, EL MUSICAL banner strung over a nearby park,
illuminated from behind by the park's streetlights.  I walked over to
check it out, and was ambushed by live music from a crowded street
along one side of the park.

RENT is apparently playing at the KONEX Cultural Center in a few
weeks.  I have fond memories of seeing it in the previous millennium
in Cincinnati with a group of close friends.  I plan to see it here as
well, but I may have more difficulty understanding it in Spanish.

Thousands of people crowded the street, many of them carrying laurel
branches, and after a while the music stopped and a priest started
speaking.  I stood and listened for a bit.  He was saying Mass in the
street, maybe because it was the beginning of Semana Santa and the
thousands and thousands of people in the street wouldn't have fit into
the church.

Across the street, the Mass crowd faded into the park, which was full
of its usual Sunday evening merchants; it was only 21:00, an hour
after dark, so they hadn't closed up their shops yet.  Fortunately, I
didn't spot any moneychangers, so I was saved the temptation of
overturning their tables.

Around the corner, I finally found an ATM that was only out of small
bills --- it still had enough AR$100 bills to allow us to pay the
rent.

I listened to the Mass for a while.  I'd never heard the Lord's Prayer
in Spanish before; the priest left out the bit about the kingdom, the
power, and the glory.  Even though I didn't grow up in the Catholic
Church, the emotions of the crowd moved me nearly to tears.

I thought maybe I'd walk by the apartment where we'd stayed when we
first arrived here, where Mariana lived before she moved to Spain; it
was only a couple of blocks away from the Mass, across some granite
crosswalks and past a cinema.

Halfway there, a drunk guy carrying a yellow washcloth and missing
some teeth beat his chest at me and said he was "¡loco!"  I grinned
and said I was too, and he shook my hand.  He asked if I was from
Germany; I explained that I was born in the US, but now I live in
Argentina.  He welcomed me to Argentina with great enthusiasm, 17
months late, gave me a big hug and a kiss on the cheek, and sent me on
my way.  I wished him luck.  I was happy to find that my wallet was
still in my pocket, and sad that I felt the need to check.

I walked a bit past Mariana's old apartment building without noticing,
but I thought I recognized the restaurant on the corner, so I turned
around the corner to see if the other restaurant I remembered was
still there.  This was an instance of the empanadas chain "1810", the
first place our tongues were ever blessed with the flavor of Argentine
empanadas horneadas.  Next door was a kiosco, the first place I'd
tasted an alfajor, although I didn't know that was what it was called
at the time.

The "1810" instance had tripled in size, devouring the restaurant next
door, and it was full of customers, which I suppose is a sign that the
Argentine economy is more or less functioning.

I unbuttoned my shirt a bit to cool off and went over to Cabildo to
wait for the bus home.  About ten or fifteen buses to the wrong place
passed me before I finally found a stop for the bus route that goes to
our house; six minutes later, there was my bus.

I love Argentina.

Wed, 12 Mar 2008

Socialtext provides a relatively sane REST API to their Wiki service,
and so I implemented a CVS-style interface that lets you edit Wiki pages
at your leisure and merge them back up to the Wiki server when you feel
like it.

Aside from the usual Web stuff (LWP, URI) this depends on the JSON
module.  I've tested it with version 1.00 of the JSON module from
Debian.

I'll put this up under a free-software license on
http://pobox.com/~kragen/sw/stclient shortly, but I have to sleep first.
For the time being, just note that it carries a notice saying that it
is not in the public domain.
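The bookkeeping behind st update and st commit in the program below comes down to two comparisons:

- A page is outdated if its revision_id in the server's JSON page index differs from the ETag saved when it was last downloaded. Checkout insists that the two are equal, so one index request replaces a HEAD request per page.
- A page is locally modified if the working file differs from its pristine copy.

Here is a rough Python sketch of just those two comparisons. Plain dicts stand in for the .st/etags and .st/pristine directories and for the HTTP fetch. The page ids and revision ids are made up for illustration.

```python
def outdated_pages(index, saved_etags):
    """Page ids whose server revision differs from the ETag saved at the
    last download, or that were never downloaded.  `index` is a list of
    page dicts shaped like entries in the server's JSON page list."""
    current = {page["page_id"]: page["revision_id"] for page in index}
    return sorted(pid for pid, rev in current.items()
                  if saved_etags.get(pid) != rev)


def modified_pages(working, pristine):
    """Page ids whose working-copy text differs from the pristine copy."""
    return sorted(pid for pid, text in pristine.items()
                  if working.get(pid) != text)


# Hypothetical data: someone edited meeting_agendas on the server.
index = [{"page_id": "homepage", "revision_id": "20080311203035"},
         {"page_id": "meeting_agendas", "revision_id": "20080312101500"}]
saved = {"homepage": "20080311203035",
         "meeting_agendas": "20080311203035"}
print(outdated_pages(index, saved))   # ['meeting_agendas']
```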

#!/usr/bin/perl -w
# -*- mode: cperl; encoding: utf-8 -*-
use strict;
use LWP::UserAgent;
use HTTP::Request;
use URI;

# XXX there are a couple of inline modules here that probably should
# be split out into separate files for maintainability, but having
# them inline here makes this program easier to distribute.

=head1 NAME

st - CVS-style interaction with a Socialtext WikiWikiWeb

=head1 SYNOPSIS

 st checkout http://www.socialtext.net/your-wiki-name
 cd your-wiki-name
 vi homepage
 st diff -u
 st update
 st diff -u
 st commit

=head1 DESCRIPTION

C<st> is an example application of Socialtext's REST API.  It checks
out a copy of your Socialtext Wiki into your local filesystem and lets
you edit the text of your Wiki pages with your text editor of choice,
and then upload them again later, possibly merging in changes.

C<st checkout> or C<st co> prompts you for your email address and
Socialtext password, creates the work directory, and downloads all of
the pages in your Wiki.  This takes a while, and won't be practical on
a sufficiently large Wiki, but you only need to do it once.

C<st diff> or C<st di> shows the edits you currently have waiting to
be committed.  It produces unified diff output by default;
C<st diff -c> will give you context diffs instead.  This command works
even when you have no network access, such as when you're on a
commercial flight.

C<st update> or C<st up> updates the pages in your filesystem to the
current version on the Wiki server, trying to automatically merge in
any changes you have made locally.

C<st commit> or C<st ci> uploads any edits you have made to the Wiki
server.

=head1 FILESYSTEM LAYOUT

C<st> creates a directory named after your Socialtext workspace as a
child of the current directory.  Inside that directory, it creates a
.st directory containing the following files:

=over 4

=item C<user>

Your email address.

=item C<pass>

Your Socialtext password.

=item C<url>

The URL of the Wiki.

=item C<pristine>

A directory containing the last versions of all of the pages
downloaded from the server, one per file.

=item C<etags>

A directory containing the entity tags ("ETags") of all of the pages
downloaded from the server, one per file.

=back

=head1 BUGS

=over 4

=item *

Asks you for your username and password even for unparsable URLs.

=item *

Has a lost-update race condition when saving.
L<http://www.socialtext.net/st-rest-docs/index.cgi?http_status_codes>
doesn't list C<412 Precondition Failed> (see
L<http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.13>),
so there probably isn't any way to avoid it.  C<st> sends If-Match
(L<http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.24>)
anyway, in case the server honors it; If-Unmodified-Since
(L<http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.28>)
would be an alternative.

=item *

Will fail if someone puts a newline in the user or password file using
C<vi>.

=item *

Won't be able to merge if C<diff3> isn't installed.  C<diff3> is an
"essential" package on Debian and has been part of Unix since at least
the 1980s, so this probably won't be a problem on any Unix system.

=item *

Won't be able to show diffs if C<diff> isn't installed.

=item *

No support for tags.

=item *

No support for attachments.

=item *

No support for comments.

=item *

No support for users, breadcrumbs, backlinks.

=item *

Very inefficient at figuring out whether files have been locally
modified.

=item *

Emits a useless message, apparently when there's a conflict:
C<diff3 exited with 256. at ../st line 442.>

=item *

Doesn't keep you from committing with conflicts.

=back

=head1 COPYRIGHT

Copyright 2008 Kragen Javier Sitaker.

=cut

my $sample_url = 'http://www.socialtext.net/st-rest-docs';

{
  package Dir;
  # "Dir" package to simplify reading and writing a bit.  Does this
  # maybe exist in the standard library yet?
  sub new {
    my ($class, $dir) = @_;
    return bless { dir => $dir }, $class;
  }

  sub create {
    my ($self) = @_;
    mkdir $self->{dir} or die "Can't create directory $self->{dir}: $!";
  }

  # XXX surely this is in the standard Perl library, right?  This
  # version won't work on old MacOS or VMS --- does anyone still care?
  sub _join {
    my ($dir, $filename) = @_;
    return "$dir/$filename";
  }

  sub child {
    my ($self, $filename) = @_;
    return _join($self->{dir}, $filename);
  }

  sub _read_file {
    my ($fn) = @_;
    open my $fh, '<', $fn or die "Couldn't open $fn: $!";
    return do { local $/; <$fh> }
  }

  sub _write_file {
    my ($fn, $content) = @_;
    my $fn_new = "$fn.new.$$";
    eval {
      open my $fh, '>', $fn_new or die "Couldn't open $fn_new: $!";
      print $fh $content or die "Couldn't write to $fn_new: $!";
      close $fh or die "Couldn't write to $fn_new: $!";
      rename $fn_new, $fn or die "Couldn't rename $fn_new to $fn: $!";
    };
    unlink $fn_new;
    die $@ if $@;
  }

  sub get {
    my ($self, $filename) = @_;
    return _read_file($self->child($filename));
  }

  sub set {
    my ($self, $filename, $content) = @_;
    return _write_file($self->child($filename), $content);
  }

  sub has {
    my ($self, $filename) = @_;
    return -e $self->child($filename);
  }

  sub subdir {
    my ($self, $subdirname) = @_;
    return Dir->new($self->child($subdirname));
  }

  sub filenames {
    my ($self) = @_;
    opendir my $dh, $self->{dir} or die "Can't read directory $self->{dir}: $!";
    return grep { $_ ne '.' and $_ ne '..' } readdir $dh;
  }

  sub pathnames {
    my ($self) = @_;
    return map { $self->child($_) } $self->filenames;
  }
}


{
  # An HTTP response for an x.socialtext-wiki page in the workspace.
  package Page;
  sub new {
    my ($class, $response) = @_;
    return bless { resp => $response }, $class;
  }

  sub content {
    my ($self) = @_;
    return $self->{resp}->content;
  }

  sub etag {
    my ($self) = @_;
    my $etag = $self->{resp}->header('ETag');
    die "No ETag for page: " . $self->content if not defined $etag;
    return $etag;
  }
}


{
  # Encapsulates both the filesystem directory and the HTTP user-agent
  # to talk to the server.
  package Client;
  use JSON qw();

  sub new {
    my ($class, $url, $user, $pass) = @_;
    my $uo = URI->new($url);
    my $local = $uo->path;
    unless ($local =~ m@\A/([^/]+)\Z@) {
      die "Couldn't parse URL $local --- should be something like $sample_url\n";
    }
    my $workspace_name = $1;

    my $ua = LWP::UserAgent->new;
    $ua->credentials($uo->host_port, "Socialtext", $user, $pass);

    return bless { 
        url_obj => $uo,
        workspace => $workspace_name,
        user => $user,
        pass => $pass,
        ua => $ua,
    }, $class;
  }

  sub new_from_dir {
    my ($class, $dirname) = @_;
    my $dir = Dir->new($dirname);
    my $conf = $dir->subdir('.st'); # XXX duplication
    my $self = $class->new($conf->get('url'),
                           $conf->get('user'),
                           $conf->get('pass'));
    $self->{dir} = $dir;
    return $self;
  }

  sub conf {
    my ($self) = @_;
    return $self->{dir}->subdir('.st');
  }

  sub pristine {
    my ($self) = @_;
    return $self->conf->subdir('pristine');
  }

  sub etags {
    my ($self) = @_;
    return $self->conf->subdir('etags');
  }

  sub data_url {
    my ($self, $trailing_path) = @_;
    my $uo = $self->{url_obj}->clone;
    $uo->path("/data/workspaces/$self->{workspace}/$trailing_path");
    return $uo;
  }

  sub http_get {
    my ($self, $url, $content_type) = @_;
    my $req = HTTP::Request->new(GET => $url->as_string);
    $req->header(Accept => $content_type);
    my $resp = $self->{ua}->request($req);
    die "HTTP request failure: " . $resp->as_string if $resp->is_error;
    return $resp;
  }

  sub get_json_data {
    my ($self, $trailing_path) = @_;
    my $resp = $self->http_get($self->data_url($trailing_path),
                               'application/json');
    return JSON::jsonToObj($resp->content);
  }

  sub get_page {
    my ($self, $page_id) = @_;
    return Page->new($self->http_get($self->data_url("pages/$page_id"),
                                     'text/x.socialtext-wiki'));
  }


  # Sample page datum from JSON index:
  #           {
  #             'page_uri' => 'https://www.socialtext.net/kregan-test-space/index.cgi?meeting_agendas',
  #             'page_id' => 'meeting_agendas',
  #             'name' => 'Meeting agendas',
  #             'modified_time' => '1205267435',
  #             'tags' => [
  #                         'Welcome',
  #                         'Recent Changes'
  #                       ],
  #             'uri' => 'meeting_agendas',
  #             'revision_id' => '20080311203035',
  #             'workspace_name' => 'kregan-test-space',
  # # (yeah, somebody misspelled my name)
  #             'last_edit_time' => '2008-03-11 20:30:35 GMT',
  #             'last_editor' => 'system-user at socialtext.net',
  #             'revision_count' => '1'
  #           },

  sub list_pages {
    my ($self) = @_;

    return @{$self->get_json_data('pages')};
  }

  sub outdated_files {
    my ($self) = @_;
    my @filenames = $self->etags->filenames;
    my %revision_ids = ();
    foreach my $page ($self->list_pages) {
      $revision_ids{$page->{page_id}} = $page->{revision_id};
    }
    return grep { not $self->etags->has($_) or
                    $self->etags->get($_) ne $revision_ids{$_} }
      keys %revision_ids;
  }

  # Called when a page has been updated from the server.
  sub updated {
    my ($self, $id, $resp) = @_;
    $self->pristine->set($id => $resp->content);
    $self->etags->set($id => $resp->etag);
  }

  # Download and save a page from the server; assumes no local changes
  sub update_page {
    my ($self, $id) = @_;
    print "U $id\n"; # XXX shouldn't this be in "command-line processing"?

    my $resp = $self->get_page($id);
    $self->{dir}->set($id => $resp->content);
    $self->updated($id => $resp);
    return $resp;
  }

  # Sets up a new workspace directory.
  sub create_dir {
    my ($self) = @_;

    $self->{dir} = Dir->new($self->{workspace});
    if (-e $self->{workspace}) {
      die "The subdirectory '$self->{workspace}' already exists here.\n";
    }

    $self->{dir}->create();
    $self->conf->create();
    $self->conf->set(user => $self->{user});
    $self->conf->set(pass => $self->{pass});
    $self->conf->set(url => $self->{url_obj}->as_string);
    $self->pristine->create();
    $self->etags->create();

    for my $page ($self->list_pages) {
      my $resp = $self->update_page($page->{page_id});

      # We insist on the following because it allows us to find out
      # which pages have been updated with a single collection GET
      # request, rather than a HEAD request for each page:
      my $etag = $resp->etag;
      die "ETag for page $page->{page_id} is $etag " .
        "but should be $page->{revision_id}"
        if $etag ne $page->{revision_id};
    }
  }

  # Merges updates from a single file.
  sub merge_updates {
    my ($self, $filename) = @_;
    my $tmpname = ".tmp.$$";
    my $newfile = ".tmp2.$$";
    eval {
      my $page = $self->get_page($filename);
      $self->{dir}->set($tmpname, $page->content); # XXX assumes {dir} means '.'
      open(my $pipe, '-|', 
        'diff3', '-m', $filename,
                       $self->pristine->child($filename),
                       $tmpname) or die "Can't open pipe: $!";
      open my $newfh, '>', $newfile or die "Can't open $newfile: $!";
      while (<$pipe>) {
        print $newfh $_;
      }
      if (not close $pipe) {
        die "popen of diff3 failed: $!" if $! != 0;
        warn "diff3 exited with $?."
      }
      close $newfh or die "Couldn't write to $newfile: $!";
      rename $newfile, $filename or die "Couldn't rename $newfile to $filename: $!";
      $self->updated($filename => $page);
    };
    unlink $tmpname;
    unlink $newfile;
    die $@ if $@;
  }

  # Return a list of locally modified files.
  sub modified_files {
    my ($self) = @_;
    return grep { $self->{dir}->get($_) ne $self->pristine->get($_) } 
      $self->pristine->filenames;
  }

  # Upload a (hopefully-up-to-date) file to the server.
  sub commit_file {
    my ($self, $filename) = @_;

    my $data = $self->{dir}->get($filename); # XXX hope this is a byte string

    my $req = HTTP::Request->new(PUT => $self->data_url("pages/$filename"));
    # XXX the documentation doesn't say the server supports If-Match,
    # but if it gets correct support for it, then we'll get
    # concurrency-safe updates without any further changes to this
    # code.
    $req->header("If-Match" => $self->etags->get($filename));
    $req->header("Content-Type" => "text/x.socialtext-wiki");
    $req->header("Content-Length" => length $data);
    $req->content($data);

    my $response = $self->{ua}->request($req);

    if ($response->is_error) {
      die "Error uploading new version of $filename: " . $response->as_string;
    }

    $self->update_page($filename);
  }
}

# Ask the user for a line of input.
sub ask {
  my ($string) = @_;
  print $string;
  my $result = <STDIN>;
  chomp $result;
  return $result;
}

### User command-line processing.

sub checkout {
  my ($url) = @_;

  die_with_usage() if not $url;

  my $user = ask('Email address you use for your Socialtext account: ');
  my $pass = ask('Socialtext password (will be echoed): ');

  my $client = Client->new($url, $user, $pass);
  $client->create_dir();
}

sub diff {
  my (@flags) = @_;
  @flags = qw(-u) if not @flags; # specify --normal if you're a masochist
  foreach my $filename (Client->new_from_dir('.')->pristine->pathnames) {
    system 'diff', @flags, $filename, '.';
  }
}

sub update {
  my $client = Client->new_from_dir('.');
  foreach my $filename ($client->outdated_files) {
    $client->merge_updates($filename);
    print "M $filename\n";
  }
}

sub commit {
  my $client = Client->new_from_dir('.');
  # XXX need to check for unresolved conflicts
  my @files = $client->outdated_files;
  if (@files) {
    for my $filename (@files) {
      print "M $filename\n";
    }
    die "Bring the above files up-to-date with $0 update before committing.\n";
  }
  foreach my $filename ($client->modified_files) {
    $client->commit_file($filename);
    print "! $filename\n";
  }
}

sub die_with_usage {
  die <<EOF;
$0: usage: one of the following:
  $0 checkout $sample_url
  $0 diff
  $0 update
  $0 commit
EOF
}

my %cmds = (
  co => \&checkout, checkout => \&checkout,
  di => \&diff, diff => \&diff,
  up => \&update, update => \&update,
  ci => \&commit, commit => \&commit,
);

my $cmd = shift @ARGV;
die_with_usage if not $cmd or not $cmds{$cmd};
$cmds{$cmd}->(@ARGV);
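HTTP's If-Match header exists to prevent the lost-update race listed under BUGS. The client sends back the ETag it last saw, and the server applies the PUT only if that ETag still matches the current version. Otherwise the server answers 412 Precondition Failed (RFC 2616 sections 14.24 and 10.4.13). Socialtext's documentation doesn't say its server does this. The following is therefore only a Python sketch of the server-side check, using a hypothetical in-memory PageStore class rather than any real Socialtext code. It also ignores the "If-Match: *" form.

```python
import hashlib


class PageStore:
    """Hypothetical in-memory wiki store that honors If-Match on PUT."""

    def __init__(self):
        self.pages = {}  # page id -> (etag, text)

    @staticmethod
    def _etag(text):
        # Any value that changes whenever the content changes will do;
        # a quoted content hash is one simple choice.
        return '"' + hashlib.sha1(text.encode("utf-8")).hexdigest() + '"'

    def get(self, page_id):
        """Return (etag, text), like a GET that returns an ETag header."""
        return self.pages[page_id]

    def put(self, page_id, text, if_match=None):
        """Store text and return an HTTP-style status code."""
        current = self.pages.get(page_id)
        if if_match is not None:
            # Proceed only if the client's ETag is still the current one.
            if current is None or current[0] != if_match:
                return 412  # Precondition Failed: someone saved first
        self.pages[page_id] = (self._etag(text), text)
        return 201 if current is None else 204


store = PageStore()
store.put("homepage", "v1")
etag, _ = store.get("homepage")
print(store.put("homepage", "alice's edit", if_match=etag))  # 204
print(store.put("homepage", "bob's edit", if_match=etag))    # 412
```

Bob's stale ETag is rejected instead of silently overwriting Alice's edit. He would then need to run st update and merge before committing again.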

Tue, 11 Mar 2008

Excellent points by Zooko.

I think it is illuminating to compare the internet to another
innovation in communication: there was an increase in armed
conflict in Europe lasting for centuries after Gutenberg's
invention of the printing press.

The printing press encouraged the dissemination of new ideas on
morality: new "ethical arguments".  Of course most of the ethical
arguments were Christian in nature.

I don't remember the author of the following hypothesis, but I
think he is a professional historian: disagreements over these
new ethical arguments caused an increase in war and violence.

Some significant dates:

1450.     Gutenberg invents the printing press.
1517.     Martin Luther posts his 95 Theses.
1568.     Start of the Eighty Years' War between
          Catholic Spain and the Protestant Low Countries.
1597.     Francis Bacon publishes 3 books.
1618.     Start of the Thirty Years' War.
1632.     Galileo publishes _Dialogs_.
1637.     Descartes publishes _Discourse_.

From La Wik's article on the Eighty Years' War:

>The Dutch Protestants compared their humble values favorably
>against the luxurious habits of the ecclesiastical nobility. The
>Protestant movement emphasized Christian virtues of modesty,
>cleanliness, frugality, and hard work. Symbolic stories from the
>New Testament, featuring fishermen, shipbuilders, and other
>simple occupations, resonated among the Dutch. The moral elements
>of the rebellion represented a challenge to the Spanish Empire.

Since the internet is now 37 years old, it might prove worthwhile
to try to identify an ethical argument that was disseminated
through the internet, then had a large impact on the world.  I
can think of one: Stallman's promotion of free software relied
heavily on an ethical argument (combined with Stallman's
willingness and ability to write good code).  Specifically, he
argued that the ownership of software should be regarded in the
same way that ownership of human beings is regarded, and the
latter is surely an ethical argument.  Although Perens and
Raymond's "open-source movement" contradicted Stallman's ethical
argument (it is not unethical to own software, they said, but
there are excellent practical reasons to elect not to own it or,
more precisely, not to enforce all your rights as an owner),
Perens and Raymond's practical argument would never have appeared
credible to, e.g., businessmen had not Stallman's ethical
argument motivated thousands of programmers to elect to contribute
to free software in the 14 years between when Stallman started
publishing on free software and when Perens and Raymond started
publishing on open-source software.

Of course Stallman's ethical argument is not capable of causing
an increase in violence, but I see no reason that the internet
could not be used to disseminate ethical arguments that are.

>Needless, I hope, to say: I think of this hypothesis as likely, not  
>because I want it to be true or find such prospects appealing

Same here.

I am thinking of a mechanism by which the internet will probably
decrease violence, but I expect it to exert its effect more
slowly than the mechanisms Zooko and I have described.

Mon, 10 Mar 2008

Several of the URLs in the previous message got broken across lines by
Zooko's mail client.  Here they are with a little context.

See John Robb for many ideas along these lines, richly bedecked with  
current examples, e.g.:

http://globalguerrillas.typepad.com/globalguerrillas/2008/02/henry-okah.html

I just read your note "freedom of communication will stop wars" [1].

[1] http://lists.canonical.org/pipermail/kragen-tol/2008-February/000880.html

(e.g. the famous study which showed
that people who watched Fox News were much more likely to believe
several untrue things about 9/11 and the Iraqi war [2])

[2] http://tahoebs1.allmydata.com:8123/uri/URI%3ACHK%3Az76adrcsxcud72oqjw62dzbuyy%3A57plpxz3skec4qnbhe43pzdfe2hjg5lh44wsfqxzg7y7klub2syq%3A3%3A10%3A319262?save=on&filename=IraqMedia_Oct03_rpt.pdf

Sun, 09 Mar 2008

A comment on a recent kragen-tol post, which I hope to be able to
respond to soon.

----- Forwarded message from zooko <zooko at zooko.com> -----

Cc: Josh Wilcox, Nathan Wilcox,
	Sebastian Kuzminsky,
	Amber O'Whielacronx,
	Olene Harris, Ron
From: zooko <zooko at zooko.com>
Subject: freedom of communication will cause more wars
To: Kragen Javier Sitaker <kragen at pobox.com>

Kragen:

I just read your note "freedom of communication will stop wars" [1].   
I remember being very excited about the Internet (and BBSes) in 1992  
+/- 1, and telling a bunch of people on a Greyhound bus all about  
this new freedom of communication and how it would stop all wars.   
However, subsequently there have been several wars among people who  
had some of those networks, and I've watched the development of this  
freedom of communication and the use of it closely, and I'm more  
skeptical of that hypothesis.

A related meme is the "McDonald's theory of world peace".  There was  
a period of several years after the fall of the Soviet Union when  
anti-war capitalists would state that there has never been a war  
between two countries, both of which hosted a McDonald's fast food  
restaurant.  The argument was that the spread of capitalism into a  
country is inevitably tied with the spread of freedom of speech,  
wealth, other Western ideas, etc., and that the people of such  
countries were too well-read, fat and happy to support going to war  
against each other.  This theory, too, has been disproven by example  
(I think the first counter-example was NATO's bombing of Serbia in  
1999) as well as undermined (in my own head) by further analysis.

Three countervailing considerations undermine my belief in such  
theories nowadays:

1. The power of filtering helps people to consume an information diet  
which only reinforces their current beliefs.  You may think that Fox  
News is divisive and inaccurate (e.g. the famous study which showed  
that people who watched Fox News were much more likely to believe  
several untrue things about 9/11 and the Iraqi war [2]).  But the  
bias and filtering effects of Fox News are probably much smaller than  
that of web sites avidly consumed by people who think that Fox News  
is too liberal and therefore don't watch it.  Then there is even more  
powerful filtering in use nowadays: consider the set of pages read by  
one individual, which set gets filtered through his friends and  
people he respects (by blog, e-mail, IM, texting), and by automated  
filtering tools customized for him.

2. The power of social networking helps smaller and more  
dispersed sets of people bond and form a unified, loyal-to-one-another  
group.

3. The power of modern secure, robust, cheap, efficient  
communications is useful for actual war operations.

These are all more true with Internet-style decentralized  
communication and with automation in the hands of the end-user than  
with 20th-century style mass communication.  By this argument,  
Internet-style communications should lead to less empathy between  
opposing groups compared to mass communication, rather than more, and  
to more and smaller groups which are antipathic towards others, and  
to easier and safer attacks on others.

By the way, this line of thought argues that the Internet encourages  
war among sub-national groups -- it doesn't necessarily follow that  
it encourages war among nations.  But I'm currently more interested  
in the prospect of war among sub-national groups.

See John Robb for many ideas along these lines, richly bedecked with  
current examples, e.g.:

http://globalguerrillas.typepad.com/globalguerrillas/2008/02/henry- 
okah.html

Needless, I hope, to say: I think of this hypothesis as likely, not  
because I want it to be true or find such prospects appealing, but  
simply because I find it plausible.

Regards,

Zooko

[1] http://lists.canonical.org/pipermail/kragen-tol/2008-February/ 
000880.html
[2] http://tahoebs1.allmydata.com:8123/uri/URI%3ACHK% 
3Az76adrcsxcud72oqjw62dzbuyy% 
3A57plpxz3skec4qnbhe43pzdfe2hjg5lh44wsfqxzg7y7klub2syq%3A3%3A10% 
3A319262?save=on&filename=IraqMedia_Oct03_rpt.pdf

----- End forwarded message -----

Sat, 08 Mar 2008

I tested this in SBCL.  See the end for instructions for running it.  On
a randomly generated file of 3000 points, it finds 2712 segments on my
machine in a little under 2 CPU minutes, which is about 23 lines or
points per second.  Not that fast, but hey, it's O(N²).
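The header comment below summarizes the trick. From each point p, sort the other points by their slope relative to p. Any run of three or more equal slopes, plus p itself, is a set of at least four collinear points.

For comparison, here is a rough Python sketch of that idea. It is not a translation of the Lisp, and its choices are my own:

- It uses exact Fractions for slopes.
- It gives vertical lines a special key that sorts after every finite slope.
- It skips exact duplicates of p.
- It deduplicates with a set of frozensets, a hash-based approach in the spirit of the one the Bugs comments suggest, instead of an O(S²) pass.

```python
from fractions import Fraction
from itertools import groupby


def slope_key(p, q):
    """Exact sort key for the slope from p to q; vertical lines
    (equal x) get a key that sorts after every finite slope."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    if dx == 0:
        return (1, Fraction(0))
    return (0, Fraction(dy, dx))


def collinear_sets(points, min_points=4):
    """Return the distinct sets of at least min_points collinear points."""
    found = set()
    for p in points:
        # Duplicates of p are skipped: their slope from p is undefined.
        others = sorted((q for q in points if q != p),
                        key=lambda q: slope_key(p, q))
        for _, run in groupby(others, key=lambda q: slope_key(p, q)):
            line = list(run) + [p]
            if len(line) >= min_points:
                found.add(frozenset(line))  # seen from another p: no-op
    return found


# The 13-point sample file from the comments below, as (x, y) tuples.
sample = [(16384, 19200), (16384, 18666), (16384, 32000), (16384, 21761),
          (10000, 10000), (20000, 10000), (30000, 10000), (40000, 10000),
          (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
print(sorted(len(s) for s in collinear_sets(sample)))  # [4, 4, 6]
```

On that sample the sketch finds three sets: the vertical and horizontal lines of four points each, and the y=x line with six points. The diagonal has six because (10000, 10000) lies on both the horizontal line and the diagonal.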

;;; Solution to fun pattern-recognition exercise
;http://www.cs.princeton.edu/courses/archive/spring08/cos226/assignments/lines.html
;; Briefly, the idea is that you can find sets of (exactly) collinear
;; points in O(N² lg N) time instead of O(N³) or worse, by sorting
;; the points by their angle from some starting point.

;; I think you can do O(N²) in most cases if you use a base-256 radix
;; sort.


;;; Notable features

;; I'm just using cons cells ("pairs") to hold points --- X in the
;; car, Y in the cdr.

;; The code is written in a purely functional style, with no side
;; effects except for input, output, and a couple of calls to "sort".


;;; Bugs

;; This version has an additional step at the end for removing
;; duplicate line segments.  This step is actually O(S²) where S is
;; the number of segments, multiplied by an additional O(M²) step
;; (where M is the number of points in a segment) of looking for
;; common points between the line segments if they are parallel.  In
;; the normal case, each point participates in zero, one, or a few
;; line segments, and the line segments are short and not parallel,
;; so this part is quite fast.

;; However, in a bad case, all of the points might be collinear.  This
;; will provoke O(N³) behavior.  I'm also not sure what the maximum
;; number of distinct 4-or-more-point line segments for N points is;
;; I'm sure it's not less than O(N) or more than O(N²), but if it's
;; more than O(N sqrt N), that could actually be even worse than the
;; all-collinear case.  That case would be easier to fix than the
;; all-collinear case.

;; It would be straightforward to maintain a hash of already-recorded
;; (slope, x, y) tuples, and query that before adding a new line
;; segment to the list, inside of "fast".  This would avoid the need
;; for the seven lines of code that currently eliminate duplicates in
;; O(S²) time; it would add only a constant amount of expected work
;; per segment found, leaving the overall asymptotic cost unchanged.
;; But I haven't done this.

;; A second bug is that multiple copies of the same point will be
;; considered to be collinear with some third point under some
;; circumstances (if the third point comes earlier in the file, or if
;; the line with the third point is vertical) but not under other
;; circumstances.

;; And it's possible that "brute" is actually worse than the O(N⁴)
;; performance you'd naïvely expect, due to appending a bunch of
;; possibly long lists together.


;;; Input.

;; file format: one int (on first line) giving number of points N,
;; followed by N lines each containing two ints that are the
;; coordinates of a point.  We don't bother to pay attention to the
;; line boundaries.

;; Here's a sample data file (commented out):
;; 13
;; 16384  19200
;; 16384  18666
;; 16384  32000
;; 16384  21761
;; 10000  10000
;; 20000  10000
;; 30000  10000
;; 40000  10000
;; 1 1
;; 2 2
;; 3 3
;; 4 4
;; 5 5
;; It contains a horizontal line at y=10000, a vertical line at
;; x=16384, and a diagonal line at y=x, which also passes through
;; (10000, 10000).

(defun get-point (stream)
  (cons (read stream) (read stream)))
(defun get-points (stream)
  (let ((npoints (read stream)))
    (loop for i from 1 to npoints
          collect (get-point stream))))
(defun get-points-from-fname (fname)
  (with-open-file (input fname :direction :input)
    (get-points input)))


;;; Brute-force version.

;; Return the slope of the line between points a and b, or i (the
;; imaginary number) for vertical lines.  Positive infinity would be
;; better, but I don't think Common Lisp has a way to express it.
;; Points are (x . y) conses, so the slope is the exact rational dy/dx,
;; and a line is vertical when the two x coordinates are equal.
(defun slope (a b)
  (if (= (car a) (car b)) #C(0 1) ; return i for vertical lines
    (/ (- (cdr b) (cdr a))
       (- (car b) (car a)))))

;; Return true if three points are collinear.
(defun collinear-p (a b c) (= (slope a b) (slope a c)))

;; Return true if all points passed as arguments are collinear.
(defun all-collinear-p (a b &rest points)
  (loop for c in points
        always (collinear-p a b c)))

;; Brute-force main entry point: test all sets of four points.
(defun brute (points)
  (loop for p1s on points
        append (loop with p1 = (car p1s)
                     for p2s on (cdr p1s)
                     append (loop with p2 = (car p2s)
                                  for p3s on (cdr p2s)
                                  ;; Try to be slightly efficient by
                                  ;; not looping over fourth points
                                  ;; when the first three points are
                                  ;; not collinear.
                                  when (collinear-p p1 p2 (car p3s))
                                  append (loop with p3 = (car p3s)
                                               for p4 in (cdr p3s)
                                               when
                                                 (all-collinear-p p1 p2 p3 p4)
                                               collect (list p1 p2 p3 p4))))))


;;; "Fast" O(N² lg N) version: sort points by slope from a starting point

;; Return true if the slope of segment AB is less than the slope of segment AC.
(defun slope-<-p (a b c)
  (let ((slope1 (slope a b)) (slope2 (slope a c)))
    (and (not (complexp slope1)) ; return false if both or just first vertical
         (or (complexp slope2)          ; true if just second vertical
             (< slope1 slope2)))))

;; Partition "others", a list of points already sorted by slope from
;; "start", into a list of (variable-length) lines starting from
;; "start" (appending "start" to each one).
(defun fast-helper (start others)
  (if (null others) nil
    ;; "iterate" loops and adds points onto "currentline" until it
    ;; finds a point with a different slope, at which point it
    ;; recurses back to the top level of "fast-helper" to handle the
    ;; rest of the points.
    (labels ((iterate (a currentline others)
                      (cond ((null others)
                             (list currentline))
                            ((= (slope a (car currentline))
                                (slope a (car others)))
                             (iterate a
                                      (cons (car others) currentline)
                                      (cdr others)))
                            (t
                             (cons currentline (fast-helper a others))))))
      (iterate start
               (list (car others) start)
               (cdr others)))))

;; Sort a list of points by their slope from a certain starting point.
(defun sort-by-slope (start points)
  ;; copy-list because "sort" is destructive.  That was a frustrating
  ;; bug.
  (sort (copy-list points) (lambda (b c) (slope-<-p start b c))))

;; "Fast" main entry point.
;; For each point, finds all the line segments starting from it, and
;; discards the ones whose length is less than 4.
(defun fast (points)
  (loop for p1s on points
        append (let* ((p1 (car p1s))
                      (p2s (sort-by-slope p1 (cdr p1s))))
                 (remove-if (lambda (line) (< (length line) 4))
                            (fast-helper p1 p2s)))))


;;; Eliminating duplicate segments.

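;; Two segments are considered equal when they have the same slope and
;; share at least one point, which means they lie on the same line.
;; That is enough for "fast", because every duplicate segment for a
;; line includes the last point on that line in the input list.
;; ("intersection" compares points with EQL, which works because all
;; the segments share the cons cells read from the input.)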
(defun segments-equal (a b)
  (and (= (slope (car a) (cadr a))
          (slope (car b) (cadr b)))
       (intersection a b)))

;; Return the first member of each equivalence class of segments.
;; This does the right thing with "fast" because "fast" always
;; generates the longest segment first.  It doesn't do the correct
;; thing with "brute".
(defun distinct-segments (lines)
  (remove-duplicates lines :test #'segments-equal :from-end t))

(defun distinct-fast (points) (distinct-segments (fast points)))

;;; Output.

;; Comparator to provide a sorting order for points.
;; The only correctness constraint here is that the endpoints of a
;; line segment must sort outside of points inside that segment.
(defun point-<-p (a b)
  (or (< (car a) (car b))
      (and (= (car a) (car b)) (< (cdr a) (cdr b)))))

;; Destructively sort the points within lines to bring their endpoints
;; to the ends.
(defun normalize-lines (lines)
  (mapcar (lambda (points) (sort points #'point-<-p)) lines))

;; List the line segments in a nice format, according to their endpoints.
(defun print-lines (lines)
  (loop for line in (normalize-lines lines)
        do (let ((a (car line)) (b (car (last line))))
             (format t "(~S, ~S) -> (~S, ~S)~%"
                     (car a) (cdr a) (car b) (cdr b)))))


;;; Main program

(defun print-lines-from-fname (linefinder fname)
  (print-lines
   (funcall linefinder
            (get-points-from-fname fname))))

;;; Main program (SBCL version)

(defun sbcl-main ()
  (print-lines-from-fname #'distinct-fast (second *posix-argv*))
  (quit))
;; Run as:
;; sbcl --noinform --load collinear-points.lisp --eval '(sbcl-main)' pointsfile3
;; Alternatively, compile as:
;; (load "collinear-points.lisp")
;; (sb-ext:save-lisp-and-die "collinear-points" :toplevel #'sbcl-main
;;  :executable t)

Thu, 06 Mar 2008

In http://blog.plover.com/calendar/leapday.html Mark-Jason Dominus
suggests this algorithm for calculating leap years, as a proposed
replacement for the Gregorian system:

   1. Divide the year by 33. If the result [remainder?] is 0, it is not
      a leap year.  Otherwise,
   2. If the result is divisible by 4, it is a leap year. 

This Dominus Calendar has an average leap-day correction of
0.24242424... leap-days per year, against the Gregorian calendar's
0.2425, the tropical year's 0.24219 or so leap-days per year, and the
vernal equinox year's 0.2424 or so leap-days per year.
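
Both averages are easy to check by counting leap years over one full cycle. The sketch below is in Common Lisp, like the collinear-points program. DOMINUS-LEAP-P, GREGORIAN-LEAP-P and LEAP-FRACTION are names made up for this illustration, and the sketch reads Dominus's "result" in step 1 as the remainder. Integer division in Lisp yields exact rationals, so the averages come out as fractions rather than rounded decimals.

```lisp
;; Dominus's rule, reading "result" in step 1 as the remainder.
(defun dominus-leap-p (year)
  (let ((r (mod year 33)))
    (and (/= r 0) (zerop (mod r 4)))))

;; The Gregorian rule, for comparison.
(defun gregorian-leap-p (year)
  (and (zerop (mod year 4))
       (or (/= (mod year 100) 0)
           (zerop (mod year 400)))))

;; Exact fraction of leap years in one full cycle of CYCLE years,
;; returned as a rational.
(defun leap-fraction (leap-p cycle)
  (/ (loop for year from 0 below cycle count (funcall leap-p year))
     cycle))

;; (leap-fraction #'dominus-leap-p 33)     => 8/33
;; (leap-fraction #'gregorian-leap-p 400)  => 97/400
```

As a decimal, (float 8/33) is about 0.2424.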

Perhaps it would be slightly simpler, and equally accurate, to do the
following:

   1. Divide the year by 33.
   2. Divide the remainder by 4.
   3. If the remainder is 1, it is a leap year.

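Other values that would work in place of 1 are 2 and 3, but not 0.
The remainders 0 through 32 include nine multiples of 4, but only eight
numbers leaving each of the other remainders.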
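
Counting leap years per 33-year cycle for each possible offset checks this claim. This Common Lisp sketch uses made-up names (VARIANT-LEAP-P, LEAP-YEARS-PER-CYCLE).

```lisp
;; The variant rule: a leap year when ((year mod 33) mod 4) = OFFSET.
;; The text uses OFFSET = 1.
(defun variant-leap-p (year &optional (offset 1))
  (= (mod (mod year 33) 4) offset))

;; Number of leap years in one 33-year cycle for a given OFFSET.
(defun leap-years-per-cycle (offset)
  (loop for year from 0 below 33
        count (variant-leap-p year offset)))

;; (mapcar #'leap-years-per-cycle '(0 1 2 3))  => (9 8 8 8)
```

Offsets 1, 2 and 3 give the same eight leap years per cycle as Dominus's rule. Offset 0 gives nine.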

I suspect there is some simpler algorithm (in the sense of not requiring
division by a large number such as 33) that is also more accurate.  To
search for one, it helps to write the rules as small programs.  You
could write my version as \ y . ((y % 33) % 4) == 1 and Dominus's as
\ y . (\ r . (r != 0) && ((r % 4) == 0)) (y % 33), or if you're
into flat representations,
	r = y % 33
	r2 = r % 4
	result = r2 == 1
or
	r = y % 33
	r2 = r != 0
	r3 = r % 4
	r4 = r3 == 0
	result = r2 & r4
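
The flat listings translate directly into LET* forms, with one binding per step. This is only an illustration of the straight-line style, and the function names are made up. Both versions mark eight years out of every 33 as leap years, though not the same eight.

```lisp
;; My variant, one LET* binding per step of the flat listing.
(defun variant-flat (y)
  (let* ((r (mod y 33))
         (r2 (mod r 4)))
    (= r2 1)))

;; Dominus's rule in the same style; AND plays the role of &.
(defun dominus-flat (y)
  (let* ((r (mod y 33))
         (r2 (/= r 0))
         (r3 (mod r 4))
         (r4 (= r3 0)))
    (and r2 r4)))
```

For example, (loop for y from 0 below 33 count (variant-flat y)) counts the leap years in one cycle.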

Suppose our repertoire of primitives is %, ==, !=, <, &, and |, and our
repertoire of operands is limited to integers in [0, 33], the year, and
previously produced results.  Then the formula in each step has at most
39 * 6 * 39 = 9126 possibilities.  If our number of steps is limited to
5, then there are only 9126^5 programs to search to guarantee that we
find the above two, which is unfortunately 63 299 797 718 449 889 376
programs.  That's on the edge of being practical to search by brute
force; I think it would be less than a thousand years on a
thousand-CPU cluster.
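
The per-step bound can be recomputed mechanically. This sketch assumes 39 operand choices: the 34 constants 0 through 33, the year itself, and up to four earlier results. It also assumes the six operators listed. FORMULAS-PER-STEP is an illustrative name.

```lisp
;; Upper bound on the number of formulas for one step: any of OPERANDS
;; values for each argument and any of OPERATORS binary operators.
(defun formulas-per-step (operands operators)
  (* operands operators operands))

;; 34 constants (0 through 33) + the year + up to 4 earlier results.
(defparameter *operand-choices* (+ 34 1 4))
(defparameter *per-step* (formulas-per-step *operand-choices* 6))
(defparameter *five-step-bound* (expt *per-step* 5))
```

Evaluating these gives 9126 formulas per step and 63299797718449889376 (about 6.3 x 10^19) five-step programs.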

The four-step programs should be practical to exhaustively search ---
they should take a few days at most on such a cluster.  A language
including + but only integers up to 17 would also include a form of my
short program, and would have at most (18+4) * 7 * (18+4) = 3388
possible formulas for a step, which gives less than 3388^4 programs to
search, which is only 131 756 972 359 936.  More exactly, there are 2527
possible first formulas, 2800 possible second ones, and then 3087 and
3388.  The actual product is 74 001 973 953 600, a little over half as
many.
Type-compatibility (the result has to be boolean; & and | can only apply
to two booleans; other operators can only apply to two numbers)
restricts the number of formulas further, but that number is difficult
to calculate exactly.
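
The "more exactly" count follows the same pattern. Step k can use the constants 0 through 17, the year, and the k - 1 earlier results, so it has 18 + k operand choices and 7(18 + k)^2 formulas. Here is a sketch of that calculation, which ignores type compatibility just as these numbers do. FOUR-STEP-COUNT is an illustrative name.

```lisp
;; Returns the total number of four-step programs and, as a second
;; value, the list of per-step formula counts.
(defun four-step-count ()
  (loop for k from 1 to 4
        for operands = (+ 18 k)
        collect (* 7 operands operands) into per-step
        finally (return (values (reduce #'* per-step) per-step))))

;; (four-step-count) => 74001973953600, (2527 2800 3087 3388)
```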

One difficulty is that measuring the accuracy of such a leap-day
calculation program presumably requires running it a number of times on
different year numbers, which adds an order of magnitude or two to the
search time.  If you have to run each one on average 30 times to find
out that it's unacceptable, you could end up running 2 x 10^15 steps or
so, which is maybe 10^16 instructions.  A modern quad-core CPU can
probably run around 5 x 10^9 instructions per second.  That works out
to about 2 x 10^6 seconds of testing (three weeks or so) if each
primitive instruction above is a CPU instruction.
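
The time estimate is simple enough to parameterize. In this sketch, 30 runs per program comes from the text. Five instructions per run is inferred from the text's step from 2 x 10^15 steps to 10^16 instructions. The figure of 5 x 10^9 instructions per second is the text's guess for a quad-core CPU. SEARCH-SECONDS is a hypothetical name.

```lisp
;; Rough seconds needed to test PROGRAMS candidate programs.  The
;; defaults are the text's guesses, not measurements.
(defun search-seconds (programs &key (runs 30)
                                     (instructions-per-run 5)
                                     (instructions-per-second 5d9))
  (/ (* programs runs instructions-per-run) instructions-per-second))
```

With these defaults, (search-seconds 74001973953600) comes to a little over 2.2 x 10^6 seconds.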

A better approach may be to run the instructions SIMD-fashion, APL-like,
on a bunch of years at once.  This should avoid the need for dynamic
code generation to get reasonable performance, although you probably
still want to have the inner loops written in C or assembly.  (I don't
think you can use MMX or SSE for the division, but you can probably use
it for comparison and population count.)
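
Here is a rough illustration of the APL-like idea: a tiny interpreter in which every value is a vector holding one entry per year, so that each primitive is applied to all the years in a single call. It uses ordinary Lisp vectors rather than MMX or SSE instructions. It represents booleans as 0 and 1, so that & and | become LOGAND and LOGIOR, and it defines x % 0 as 0. The program representation and all names are assumptions made for this sketch.

```lisp
;; A program is a list of steps (OP A B).  Each argument is an integer
;; constant, :Y for the year, or (:R K) for the result of step K
;; (0-based).
(defun arg-vector (arg years results)
  (cond ((integerp arg) (make-array (length years) :initial-element arg))
        ((eq arg :y) years)
        (t (nth (second arg) results))))

;; Apply one binary primitive elementwise across two vectors.
(defun apply-op (op a b)
  (map 'vector
       (ecase op
         (% (lambda (x y) (if (zerop y) 0 (mod x y)))) ; x % 0 taken as 0
         (== (lambda (x y) (if (= x y) 1 0)))
         (!= (lambda (x y) (if (/= x y) 1 0)))
         (< (lambda (x y) (if (< x y) 1 0)))
         (& (lambda (x y) (logand x y)))
         (\| (lambda (x y) (logior x y))))
       a b))

;; Run the steps in order; the last step's vector is the result.
(defun run-program (program years)
  (let ((results '()))
    (dolist (instr program (car (last results)))
      (destructuring-bind (op a b) instr
        (setf results
              (append results
                      (list (apply-op op
                                      (arg-vector a years results)
                                      (arg-vector b years results)))))))))
```

For example, (count 1 (run-program '((% :y 33) (% (:r 0) 4) (== (:r 1) 1)) years)) counts the leap years that my variant finds in the vector YEARS. Here COUNT plays the part of the population count. A searcher would generate such step lists and compare the resulting rates against the target.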