Sat, 10 Dec 2005

Was pondering drag-and-drop, so I thought I'd put together a magnetic
poetry kit in DHTML --- the simplest possible use of drag-and-drop.
I took some time selecting words --- by hand, from the most common words
in the British National Corpus, and from actual poetry --- and removing
words I thought unsuitable.  I'm interested to hear feedback.

This is also online at http://pobox.com/~kragen/sw/magpoetry.html

Oh, and of course, I've only tried to make this work in Mozilla Firefox.


<html><head><title>Magnetic poetry in DHTML</title>
<script type="text/javascript">
function equals(a) {
    return function(b) { return a == b }
}
function has_class(classname) {
    return function(elem) {
        return elem.className.split(/\s+/).some(equals(classname))
    }
}

function move(node) {
    return function(ev) {
        ev.preventDefault()
        // CSS lengths need a unit; a bare number is ignored in standards mode.
        node.style.left = (node.orig_offset_left + ev.screenX - node.origScreenX) + 'px'
        node.style.top = (node.orig_offset_top + ev.screenY - node.origScreenY) + 'px'
    }
}

function kill(elem) { elem.parentNode.removeChild(elem) }

function release(node) {
    return function(ev) {
        document.removeEventListener('mousemove', node.mover_listener, true)
        document.removeEventListener('mouseup', node.releaser_listener, true)
        node.style.color = ''
        kill(node.origin)
    }
}

function start_drag(parent) {
    return function (ev) {
        ev.preventDefault()
        var node = ev.target
        if (node == parent) return  // mousedown fell between words, not on one
        while (node.parentNode != parent) node = node.parentNode
        var newnode = node.cloneNode(true)  // deep copy, so the word's text comes along
        newnode.orig_offset_left = node.offsetLeft
        newnode.orig_offset_top = node.offsetTop
        newnode.style.position = 'absolute'
        newnode.style.left = newnode.orig_offset_left + 'px'
        newnode.style.top = newnode.orig_offset_top + 'px'
        newnode.style.display = 'block'
        newnode.style.color = '#7f7f7f'
        newnode.origScreenX = ev.screenX
        newnode.origScreenY = ev.screenY
        newnode.mover_listener = move(newnode)
        newnode.releaser_listener = release(newnode)
        newnode.origin = node
        document.addEventListener('mousemove', newnode.mover_listener, true)
        document.addEventListener('mouseup', newnode.releaser_listener, true)
        document.body.appendChild(newnode)
        make_draggable(newnode)
    }
}
function make_draggable(elem) {
    elem.addEventListener('mousedown', start_drag(elem.parentNode), true)
}
function make_contents_draggable(elem) {
    elem.addEventListener('mousedown', start_drag(elem), true)
}
function init() {
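    // Standard equivalents of Firefox's Array.filter generic and IE's document.all.
    var all = document.getElementsByTagName('*')
    Array.prototype.filter.call(all, has_class('dnd')).forEach(make_contents_draggable)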
}
</script>
</head><body onload="init()">
<h1>Magnetic poetry DHTML</h1>
<p style="height: 20em"></p>
<p class="dnd">

<span>this</span> <span>we</span> <span>that</span> <span>are</span>
<span>back</span> <span>madonna</span> <span>sun</span>
<span>good</span> <span>possible</span> <span>to</span>
<span>boy</span> <span>piss</span> <span>within</span>
<span>might</span> <span>try</span> <span>compute</span>
<span>period</span> <span>it</span> <span>trade</span> <span>us</span>
<span>knowledge</span> <span>has</span> <span>but</span>
<span>school</span> <span>had</span> <span>language</span>
<span>is</span> <span>piece</span> <span>love</span> <span>to</span>
<span>leaves</span> <span>they</span> <span>door</span>
<span>my</span> <span>ly</span> <span>endless</span>
<span>right</span> <span>oil</span> <span>has</span> <span>shit</span>
<span>give</span> <span>heat</span> <span>do</span> <span>not</span>
<span>a</span> <span>major</span> <span>city</span> <span>or</span>
<span>his</span> <span>ocean</span> <span>land</span>
<span>radiant</span> <span>,</span> <span>s</span> <span>given</span>
<span>dawdle</span> <span>has</span> <span>s</span> <span>best</span>
<span>be</span> <span>one</span> <span>,</span> <span>like</span>
<span>and</span> <span>higher</span> <span>will</span>
<span>are</span> <span>in</span> <span>broken</span> <span>want</span>
<span>reach</span> <span>yellow</span> <span>do</span> <span>:</span>
<span>group</span> <span>last</span> <span>down</span> <span>it</span>
<span>I</span> <span>blue</span> <span>by</span> <span>well</span>
<span>hands</span> <span>it</span> <span>heart</span> <span>to</span>
<span>deal</span> <span>bid</span> <span>as</span> <span>every</span>
<span>ed</span> <span>food</span> <span>possible</span> <span>I</span>
<span>sea</span> <span>leg</span> <span>different</span>
<span>group</span> <span>bring</span> <span>some</span>
<span>home</span> <span>his</span> <span>outside</span>
<span>were</span> <span>close</span> <span>stem</span>
<span>who</span> <span>round</span> <span>always</span>
<span>spring</span> <span>able</span> <span>street</span>
<span>big</span> <span>main</span> <span>hair</span> <span>yet</span>
<span>wear</span> <span>.</span> <span>,</span> <span>wrap</span>
<span>your</span> <span>'s</span> <span>by</span> <span>about</span>
<span>November</span> <span>this</span> <span>her</span>
<span>I</span> <span>beneath</span> <span>er</span> <span>s</span>
<span>here</span> <span>like</span> <span>they</span> <span>I</span>
<span>idea</span> <span>gave</span> <span>d</span> <span>you</span>
<span>good</span> <span>year</span> <span>still</span>
<span>must</span> <span>America</span> <span>which</span>
<span>so</span> <span>water</span> <span>by</span> <span>y</span>
<span>seem</span> <span>s</span> <span>boy</span> <span>the</span>
<span>a</span> <span>fill</span> <span>were</span> <span>by</span>
<span>up</span> <span>do</span> <span>it</span> <span>the</span>
<span>this</span> <span>s</span> <span>curd</span> <span>blue</span>
<span>more</span> <span>I</span> <span>boat</span> <span>was</span>
<span>four</span> <span>became</span> <span>be</span>
<span>just</span> <span>over</span> <span>I</span> <span>will</span>
<span>life</span> <span>looking</span> <span>penetrate</span>
<span>pay</span> <span>him</span> <span>est</span> <span>dirt</span>
<span>was</span> <span>let</span> <span>such</span> <span>fall</span>
<span>sleep</span> <span>season</span> <span>;</span> <span>er</span>
<span>seem</span> <span>s</span> <span>strong</span> <span>what</span>
<span>likely</span> <span>it</span> <span>going</span>
<span>and</span> <span>hair</span> <span>walk</span> <span>un</span>
<span>er</span> <span>cup</span> <span>month</span> <span>s</span>
<span>what</span> <span>just</span> <span>run</span> <span>so</span>
<span>right</span> <span>we</span> <span>is</span> <span>there</span>
<span>or</span> <span>has</span> <span>s</span> <span>end</span>
<span>any</span> <span>back</span> <span>him</span> <span>ing</span>
<span>could</span> <span>is</span> <span>wrote</span>
<span>much</span> <span>roll</span> <span>turn</span> <span>s</span>
<span>use</span> <span>morning</span> <span>do</span>
<span>stood</span> <span>while</span> <span>soul</span>
<span>im</span> <span>y</span> <span>in</span> <span>a</span>
<span>sister</span> <span>women</span> <span>commune</span>
<span>,</span> <span>does</span> <span>and</span> <span>day</span>
<span>the</span> <span>I</span> <span>line</span> <span>it</span>
<span>some</span> <span>trade</span> <span>when</span>
<span>both</span> <span>black</span> <span>call</span> <span>ed</span>
<span>did</span> <span>anoint</span> <span>.</span> <span>ed</span>
<span>how</span> <span>in</span> <span>plans</span> <span>sky</span>
<span>life</span> <span>night</span> <span>would</span> <span>I</span>
<span>there</span> <span>paper</span> <span>,</span> <span>open</span>
<span>is</span> <span>do</span> <span>es</span> <span>I</span>
<span>blue</span> <span>yeah</span> <span>different</span>
<span>es</span> <span>man</span> <span>blossom</span>
<span>potatoes</span> <span>now</span> <span>list</span>
<span>I</span> <span>has</span> <span>summer</span> <span>fact</span>
<span>other</span> <span>land</span> <span>ice</span> <span>thy</span>
<span>peace</span> <span>his</span> <span>is</span> <span>years</span>
<span>cup</span> <span>American</span> <span>wet</span>
<span>him</span> <span>her</span> <span>quite</span>
<span>think</span> <span>give</span> <span>work</span> <span>or</span>
<span>minister</span> <span>at</span> <span>has</span>
<span>thought</span> <span>was</span> <span>has</span>
<span>much</span> <span>s</span> <span>success</span>
<span>mouth</span> <span>loss</span> <span>paper</span>
<span>sudden</span> <span>help</span> <span>move</span>
<span>ment</span> <span>cold</span> <span>then</span> <span>is</span>
<span>out</span> <span>,</span> <span>though</span> <span>wish</span>
<span>with</span> <span>claw</span> <span>of</span> <span>be</span>
<span>his</span> <span>y</span> <span>west</span> <span>is</span>
<span>come</span> <span>talk</span> <span>but</span> <span>may</span>
<span>,</span> <span>at</span> <span>am</span> <span>nobody</span>
<span>child</span> <span>they</span> <span>big</span> <span>s</span>
<span>only</span> <span>was</span> <span>work</span> <span>s</span>
<span>and</span> <span>wench</span> <span>house</span>
<span>what</span> <span>is</span> <span>January</span>
<span>you</span> <span>but</span> <span>real</span>
<span>during</span> <span>thee</span> <span>it</span>
<span>never</span> <span>gray</span> <span>red</span> <span>was</span>
<span>which</span> <span>large</span> <span>,</span> <span>him</span>
<span>above</span> <span>sir</span> <span>who</span> <span>blue</span>
<span>start</span> <span>dog</span> <span>y</span> <span>end</span>
<span>about</span> <span>so</span> <span>they</span> <span>same</span>
<span>to</span> <span>flower</span> <span>simple</span>
<span>were</span> <span>center</span> <span>stand</span>
<span>France</span> <span>by</span> <span>bring</span>
<span>last</span> <span>about</span> <span>we</span> <span>the</span>
<span>ed</span> <span>which</span> <span>men</span> <span>you</span>
<span>have</span> <span>I</span> <span>ly</span> <span>by</span>
<span>pig</span> <span>by</span> <span>d</span> <span>less</span>
<span>by</span> <span>we</span> <span>to</span> <span>me</span>
<span>brother</span> <span>first</span> <span>hold</span>
<span>in</span> <span>full</span> <span>up</span> <span>est</span>
<span>es</span> <span>winter</span> <span>made</span> <span>.</span>
<span>that</span> <span>part</span> <span>be</span> <span>no</span>
<span>from</span> <span>say</span> <span>she</span> <span>more</span>
<span>further</span> <span>eat</span> <span>y</span> <span>with</span>
<span>she</span> <span>rumor</span> <span>old</span> <span>if</span>
<span>and</span> <span>figure</span> <span>the</span>
<span>part</span> <span>British</span> <span>boy</span> <span>s</span>
<span>built</span> <span>could</span> <span>office</span>
<span>room</span> <span>die</span> <span>old</span>
<span>changes</span> <span>bless</span> <span>company</span>
<span>do</span> <span>apple</span> <span>title</span>
<span>aware</span> <span>I</span> <span>where</span> <span>ed</span>
<span>empty</span> <span>I</span> <span>last</span> <span>under</span>
<span>but</span> <span>quite</span> <span>let</span>
<span>three</span> <span>post</span> <span>:</span>
<span>working</span> <span>into</span> <span>Britain</span>
<span>flower</span> <span>could</span> <span>leave</span>
<span>some</span> <span>second</span> <span>such</span>
<span>tried</span> <span>free</span> <span>be</span> <span>as</span>
<span>s</span> <span>strike</span> <span>are</span> <span>too</span>
<span>?</span> <span>first</span> <span>first</span> <span>and</span>
<span>long</span> <span>are</span> <span>water</span>
<span>general</span> <span>each</span> <span>do</span>
<span>much</span> <span>behind</span> <span>as</span>
<span>love</span> <span>in</span> <span>new</span>
<span>imagine</span> <span>fall</span> <span>lead</span>
<span>,</span> <span>in</span> <span>s</span> <span>let</span>
<span>or</span> <span>coal</span> <span>like</span> <span>was</span>
<span>car</span> <span>with</span> <span>ability</span>
<span>on</span> <span>es</span> <span>of</span> <span>know</span>
<span>be</span> <span>by</span> <span>st</span> <span>in</span>
<span>.</span> <span>is</span> <span>can</span> <span>lose</span>
<span>st</span> <span>y</span> <span>we</span> <span>the</span>
<span>paint</span> <span>was</span> <span>green</span> <span>an</span>
<span>would</span> <span>time</span> <span>it</span> <span>her</span>
<span>window</span> <span>region</span> <span>I</span>
<span>more</span> <span>northern</span> <span>there</span>
<span>into</span> <span>ocean</span> <span>was</span>
<span>seem</span> <span>s</span> <span>make</span> <span>being</span>
<span>decision</span> <span>days</span> <span>cover</span>
<span>about</span> <span>through</span> <span>which</span>
<span>drug</span> <span>stop</span> <span>or</span> <span>Brit</span>
<span>work</span> <span>this</span> <span>since</span>
<span>third</span> <span>view</span> <span>air</span>
<span>whip</span> <span>another</span> <span>early</span>
<span>any</span> <span>one</span> <span>almost</span> <span>saw</span>
<span>even</span> <span>before</span> <span>as</span>
<span>them</span> <span>be</span> <span>cold</span> <span>went</span>
<span>so</span> <span>nothing</span> <span>one</span> <span>her</span>
<span>was</span> <span>y</span> <span>good</span> <span>couple</span>
<span>moon</span> <span>most</span> <span>I</span> <span>y</span>
<span>god</span> <span>but</span> <span>was</span> <span>is</span>
<span>will</span> <span>each</span> <span>may</span> <span>it</span>
<span>beyond</span> <span>it</span> <span>are</span> <span>fact</span>
<span>es</span> <span>candle</span> <span>these</span>
<span>should</span> <span>made</span> <span>it</span> <span>was</span>
<span>I</span> <span>upon</span> <span>father</span> <span>most</span>
<span>young</span> <span>public</span> <span>would</span>
<span>paper</span> <span>oak</span> <span>priest</span>
<span>ly</span> <span>ly</span> <span>ear</span> <span>if</span>
<span>into</span> <span>lose</span> <span>muscular</span>
<span>die</span> <span>main</span> <span>school</span>
<span>high</span> <span>rape</span> <span>first</span> <span>s</span>
<span>looking</span> <span>can</span> <span>last</span>
<span>be</span> <span>mountain</span> <span>likely</span>
<span>other</span> <span>million</span> <span>I</span> <span>er</span>
<span>pendant</span> <span>be</span> <span>swell</span>
<span>whole</span> <span>six</span> <span>all</span> <span>to</span>
<span>:</span> <span>remember</span> <span>evoke</span>
<span>the</span> <span>years</span> <span>end</span> <span>our</span>
<span>round</span> <span>pure</span> <span>but</span>
<span>around</span> <span>role</span> <span>through</span>
<span>from</span> <span>;</span> <span>like</span> <span>.</span>
<span>grip</span> <span>desert</span> <span>if</span> <span>s</span>
<span>y</span> <span>anoint</span> <span>now</span> <span>kept</span>
<span>news</span> <span>those</span> <span>political</span>
<span>will</span> <span>party</span> <span>so</span> <span>know</span>
<span>wretched</span> <span>it</span> <span>family</span>
<span>.</span> <span>finale</span> <span>two</span> <span>do</span>
<span>coming</span> <span>upon</span> <span>a</span>
<span>throb</span> <span>taken</span> <span>es</span>
<span>fuck</span> <span>energy</span> <span>look</span>
<span>no</span> <span>a</span> <span>think</span> <span>went</span>
<span>we</span> <span>sorry</span> <span>be</span> <span>looked</span>
<span>should</span> <span>says</span> <span>ing</span>
<span>hand</span> <span>it</span> <span>same</span> <span>image</span>
<span>earth</span> <span>life</span> <span>an</span> <span>wood</span>
<span>s</span> <span>terror</span> <span>no</span> <span>dark</span>
<span>at</span> <span>own</span> <span>stone</span>
<span>friend</span> <span>see</span> <span>walk</span>
<span>they</span> <span>s</span> <span>what</span> <span>the</span>
<span>limpid</span> <span>est</span> <span>hospital</span>
<span>his</span> <span>might</span> <span>ed</span> <span>win</span>
<span>he</span> <span>effect</span> <span>able</span> <span>was</span>
<span>his</span> <span>dim</span> <span>case</span> <span>dress</span>
<span>two</span> <span>know</span> <span>tree</span> <span>eye</span>
<span>soul</span> <span>even</span> <span>new</span>
<span>against</span> <span>'s</span> <span>more</span>
<span>ish</span> <span>,</span> <span>at</span> <span>milk</span>
<span>are</span> <span>er</span> <span>my</span> <span>rain</span>
<span>sparrow</span> <span>are</span> <span>got</span> <span>it</span>
<span>'s</span> <span>is</span> <span>this</span> <span>as</span>
<span>god</span> <span>despair</span> <span>?</span> <span>are</span>
<span>got</span> <span>of</span> <span>as</span> <span>poor</span>
<span>for</span> <span>those</span> <span>unto</span>
<span>reasons</span> <span>er</span> <span>yes</span>
<span>salt</span> <span>been</span> <span>cerulean</span>
<span>s</span> <span>between</span> <span>,</span> <span>rip</span>
<span>again</span> <span>better</span> <span>s</span>
<span>better</span> <span>his</span> <span>.</span> <span>him</span>
<span>fall</span> <span>many</span> <span>is</span> <span>are</span>
<span>can</span> <span>bad</span> <span>two</span> <span>and</span>
<span>by</span> <span>kitchen</span> <span>over</span>
<span>sleep</span> <span>there</span> <span>knife</span>
<span>would</span> <span>years</span> <span>er</span> <span>in</span>
<span>hot</span> <span>carry</span> <span>was</span>
<span>local</span> <span>last</span> <span>like</span> <span>he</span>
<span>something</span> <span>ed</span> <span>by</span>
<span>said</span> <span>yet</span> <span>is</span> <span>me</span>
<span>white</span> <span>news</span> <span>sense</span>
<span>any</span> <span>do</span> <span>this</span> <span>by</span>
<span>staff</span> <span>which</span> <span>to</span> <span>it</span>
<span>it</span> <span>rather</span> <span>what</span> <span>war</span>
<span>mountain</span> <span>use</span> <span>mother</span>
<span>each</span> <span>s</span> <span>bring</span> <span>great</span>
<span>it</span> <span>hand</span> <span>of</span>
<span>everything</span> <span>will</span> <span>game</span>
<span>win</span> <span>it</span> <span>we</span> <span>live</span>
<span>by</span> <span>as</span> <span>would</span> <span>.</span>
<span>head</span> <span>it</span> <span>this</span> <span>ample</span>
<span>are</span> <span>went</span> <span>record</span> <span>s</span>
<span>husband</span> <span>just</span> <span>thou</span>
<span>direct</span> <span>mind</span> <span>mold</span>
<span>always</span> <span>by</span> <span>half</span>
<span>visit</span> <span>out</span> <span>girl</span>
<span>past</span> <span>such</span> <span>I</span> <span>at</span>
<span>I</span> <span>.</span> <span>about</span> <span>December</span>
<span>diamond</span> <span>get</span> <span>ready</span>
<span>their</span> <span>off</span> <span>village</span>
<span>is</span> <span>decided</span> <span>not</span> <span>I</span>
<span>by</span> <span>ice</span> <span>go</span> <span>one</span>
<span>it</span> <span>which</span> <span>create</span>
<span>could</span> <span>made</span> <span>like</span>
<span>meat</span> <span>this</span> <span>cigar</span> <span>s</span>
<span>in</span> <span>four</span> <span>carry</span>
<span>algorithm</span> <span>we</span> <span>more</span>
<span>fire</span> <span>think</span> <span>are</span>
<span>springtime</span> <span>cat</span> <span>let</span>
<span>there</span> <span>there</span> <span>from</span>
<span>without</span> <span>cherry</span> <span>sea</span>
<span>cream</span> <span>as</span> <span>September</span>
<span>along</span> <span>number</span> <span>shine</span>
<span>along</span> <span>only</span> <span>foreign</span>
<span>back</span> <span>forest</span> <span>go</span>
<span>heart</span> <span>now</span> <span>but</span>
<span>enough</span> <span>of</span> <span>;</span> <span>but</span>
<span>I</span> <span>death</span> <span>price</span> <span>on</span>
<span>ass</span> <span>hard</span> <span>cream</span>
<span>middle</span> <span>the</span> <span>ed</span> <span>by</span>
<span>I</span> <span>kind</span> <span>it</span> <span>social</span>
<span>as</span> <span>we</span> <span>of</span> <span>puke</span>
<span>delight</span> <span>under</span> <span>only</span>
<span>tit</span> <span>in</span> <span>ing</span> <span>for</span>
<span>to</span> <span>sent</span> <span>icon</span> <span>there</span>
<span>is</span> <span>surface</span> <span>very</span>
<span>there</span> <span>what</span> <span>has</span> <span>s</span>
<span>light</span> <span>to</span> <span>as</span> <span>by</span>
<span>until</span>

</p>
</body></html>

Thu, 08 Dec 2005

Semistructured data and QBE in XML
==================================

One of the nice things about spreadsheets is that everything is in one
place: your input data, your program, and the output of your program.
Because of this, you have the closure property: the output of your
program can be used as the input for another program.  It's nice to
have the closure property along other axes as well (that your program
itself can be the input or output of another program) but that's a
higher level.

Compare to an ordinary RDBMS: your data is in the database, your
queries are in text files or your client GUI, and your query results
are spewing out at you.  Creating a view is quite different from, and
quite a bit more trouble than, running a query.

Suppose you have a big XML file that has all your data, and you'd like
some of the data replicated around the doc in a controlled way (a
summary here, a detail there), so you can edit it wherever (and
presumably follow hyperlinks between the views).

You could select a bunch of subtrees by providing a QBE node, which
would have some common subset of the subtrees you wanted, identified
by text contents, tag and attribute names, and ancestor relationships.
For example, <person/> would match all the <person> elements in the
document, <person><firstname/></person> would match all the person
elements with firstname subelements, <person firstname=""/> would
match all the person elements with firstname attributes, <person
firstname="Bob"/> would match all the person elements with firstname
attributes with content exactly "Bob",
<person><firstname>Bob</firstname></person> would match all the person
elements containing firstname children with content exactly "Bob",
etc.
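
The matching rules in that paragraph can be sketched in a few lines of
JavaScript.  This is only a sketch under my own assumptions: elements
are plain objects built by a hypothetical el() helper, text nodes are
plain strings, an empty attribute value in the pattern means "any
value", and text written in a pattern element must equal the matched
element's whole text.  None of these names come from an existing
library.

```javascript
// Elements are plain objects; text nodes are plain strings.
function el(tag, attrs, children) {
    return { tag: tag, attrs: attrs || {}, children: children || [] }
}
function isText(node) { return typeof node === 'string' }

// All the text inside a node, descendants included.
function textOf(node) {
    return isText(node) ? node : node.children.map(textOf).join('')
}

// Does `node` satisfy the example `pattern`?
function matches(pattern, node) {
    if (isText(node) || node.tag !== pattern.tag) return false
    // Each attribute in the pattern must exist on the node; a non-empty
    // pattern value must also be equal.
    for (var name in pattern.attrs) {
        if (!Object.prototype.hasOwnProperty.call(node.attrs, name)) return false
        var wantedValue = pattern.attrs[name]
        if (wantedValue !== '' && wantedValue !== node.attrs[name]) return false
    }
    // Text written directly in the pattern must equal the node's whole text.
    var wantedText = pattern.children.filter(isText).join('').trim()
    if (wantedText !== '' && wantedText !== textOf(node).trim()) return false
    // Each element child of the pattern must match some child of the node.
    return pattern.children.every(function (pchild) {
        if (isText(pchild)) return true
        return node.children.some(function (nchild) { return matches(pchild, nchild) })
    })
}

// Every element under `root` (root included) that matches the pattern.
function select(pattern, root) {
    var found = []
    function walk(node) {
        if (isText(node)) return
        if (matches(pattern, node)) found.push(node)
        node.children.forEach(walk)
    }
    walk(root)
    return found
}

var doc = el('doc', {}, [
    el('person', {}, ['I know ', el('firstname', {}, ['Bob']), ' ',
                      el('lastname', {}, ['Smith']), '.']),
    el('person', {}, ['My cat ', el('firstname', {}, ['Slink']), ' likes mice.']),
    el('person', { firstname: 'Bob' }, []),
    el('place', {}, ['Paris'])
])

var anyPerson = select(el('person'), doc)
var withFirstnameChild = select(el('person', {}, [el('firstname')]), doc)
var withFirstnameAttr = select(el('person', { firstname: '' }), doc)
var childNamedBob = select(el('person', {}, [el('firstname', {}, ['Bob'])]), doc)
```

Here anyPerson has three entries, withFirstnameChild two, and the last
two queries one each.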

(Maybe to start with you'd want this to only pick children of some
particular node, like the root.)

You'd want to be able to elaborate on these selection queries in
various ways: case-insensitivity, substring queries ("person contains
text string Bob"), descendant queries, order relationships, negation,
predicates on ancestors, etc.  Maybe you'd like full XQuery and/or
XPath.  But I'll ignore all that for now.

First you need to be able to get results!  You could start with a
special attribute, "like", that points to the ID of a query.  So if
you put the following text in your document:

    <person id="firstnameguy"><firstname/></person>
    <wibble like="firstnameguy" />

then your editor would instantly expand out your wibble element to
have all kinds of crazy contents:

    <person id="firstnameguy"><firstname/></person>
    <wibble like="firstnameguy">
      <person>I know <firstname>Bob</firstname> <lastname>Smith</lastname>.
      </person>
      <person>My cat <firstname>Slink</firstname> likes mice.</person>
    </wibble>

Which would be linked back to their original sources in the document,
maybe with magically recomputed XPaths in some attribute, or maybe
with some kind of IDREF, but anyway you could hit some key to go to
the original, or you could edit it there.
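
As a sketch of the "recomputed XPaths" option: assuming the same kind
of plain-object tree (a hypothetical el() helper, strings for text),
a positional path back to the source node can be recomputed by walking
down from the root.  The function name pathTo and the exact path
syntax are my choices for illustration.

```javascript
// Elements are plain objects; text nodes are plain strings.
function el(tag, attrs, children) {
    return { tag: tag, attrs: attrs || {}, children: children || [] }
}

// Return a positional path such as "/doc[1]/person[2]" that leads from
// `root` to `target`, or null if `target` is not in the tree.
function pathTo(root, target) {
    function search(node, path) {
        if (node === target) return path
        var counts = Object.create(null)   // how many siblings of each tag so far
        for (var i = 0; i < node.children.length; i++) {
            var child = node.children[i]
            if (typeof child === 'string') continue
            counts[child.tag] = (counts[child.tag] || 0) + 1
            var step = '/' + child.tag + '[' + counts[child.tag] + ']'
            var found = search(child, path + step)
            if (found !== null) return found
        }
        return null
    }
    return search(root, '/' + root.tag + '[1]')
}

var slink = el('firstname', {}, ['Slink'])
var cat = el('person', {}, ['My cat ', slink, ' likes mice.'])
var doc = el('doc', {}, [
    el('person', {}, ['I know ', el('firstname', {}, ['Bob']), '.']),
    cat
])

var catPath = pathTo(doc, cat)       // "/doc[1]/person[2]"
var slinkPath = pathTo(doc, slink)   // "/doc[1]/person[2]/firstname[1]"
```

A copy of a query result could carry that string in some attribute, and
the editor would have to recompute it whenever the source moved.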

Now suppose you want to restructure the tree in your query --- maybe
you just want a list of firstnames.  You could give an example of the
result you want and reference it, too, by ID:

    <person id="firstnameguy"><firstname id="hisfirstname"/></person>
    <p id="firstname_table">Name is <n from="hisfirstname" />.</p>
    <div like="firstnameguy" format="firstname_table" />

And your div would expand out into

    <div like="firstnameguy" format="firstname_table">
      <p>Name is <n>Bob</n>.</p>
      <p>Name is <n>Slink</n>.</p>
    </div>
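
The formatting step could look something like the following sketch.
It assumes the match step (not shown) has already produced a bindings
object mapping the IDs in the query, such as "hisfirstname", to the
text found there; instantiate(), serialize() and the el() helper are
hypothetical names of mine.

```javascript
// Elements are plain objects; text nodes are plain strings.
function el(tag, attrs, children) {
    return { tag: tag, attrs: attrs || {}, children: children || [] }
}

// Copy `template`.  An element with a "from" attribute gets the text bound
// to that ID as its content; the bookkeeping attributes "id" and "from"
// are left out of the copy.
function instantiate(template, bindings) {
    if (typeof template === 'string') return template
    var attrs = {}
    for (var name in template.attrs) {
        if (name !== 'id' && name !== 'from') attrs[name] = template.attrs[name]
    }
    var from = template.attrs.from
    var children
    if (from === undefined) {
        children = template.children.map(function (c) { return instantiate(c, bindings) })
    } else {
        var bound = Object.prototype.hasOwnProperty.call(bindings, from)
        children = bound ? [bindings[from]] : []
    }
    return el(template.tag, attrs, children)
}

function escapeXml(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
            .replace(/>/g, '&gt;').replace(/"/g, '&quot;')
}

function serialize(node) {
    if (typeof node === 'string') return escapeXml(node)
    var attrs = Object.keys(node.attrs).map(function (k) {
        return ' ' + k + '="' + escapeXml(node.attrs[k]) + '"'
    }).join('')
    var inner = node.children.map(serialize).join('')
    return '<' + node.tag + attrs + '>' + inner + '</' + node.tag + '>'
}

// <p id="firstname_table">Name is <n from="hisfirstname" />.</p>
var template = el('p', { id: 'firstname_table' },
                  ['Name is ', el('n', { from: 'hisfirstname' }), '.'])

var rows = ['Bob', 'Slink'].map(function (name) {
    return serialize(instantiate(template, { hisfirstname: name }))
})
// rows[0] is '<p>Name is <n>Bob</n>.</p>'
```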

Still with the links back to the source, and still with the ability to
edit.  Inserting more elements into the query results is a pretty
straightforward thing to handle --- you just copy the QBE tree into
someplace.  Deleting them isn't so obvious; does the person want the
original node to stop existing, or just to stop satisfying the query,
perhaps by one of its subtrees being renamed, deleted, or something?
You can always jump to the original item itself to delete it.

So already, we have basic CRUD here in our hypothetical
XML-editor/semistructured-data-store, without too much syntax or
hassle, with enough smarts to do some vaguely useful HTML parsing and
reporting.  (We have a kind of 'select', 'project' into some simple
templates, but no join, intersect, difference, or union.)

You could add some glue to transclude arbitrary HTML documents in your
semistructured data store (which would be polled from time to time);
you could export parts of your document into URL-space with some other
magic attribute.  You could even map POSTs to some part of URL-space
into some element of your document, so HTML form posts to that URL
would get encoded in XML and added to your document.
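
For the form-post idea, the encoding itself is small.  A sketch,
assuming one child element per form field, field names that are
already valid XML names, and hypothetical function names parseForm()
and formToXml():

```javascript
function escapeXml(s) {
    return String(s).replace(/&/g, '&amp;').replace(/</g, '&lt;')
                    .replace(/>/g, '&gt;').replace(/"/g, '&quot;')
}

// Decode one piece of an application/x-www-form-urlencoded body.
function decode(s) {
    return decodeURIComponent(s.replace(/\+/g, ' '))
}

// "a=1&b=two+words" -> { a: "1", b: "two words" }
function parseForm(body) {
    var fields = {}
    body.split('&').forEach(function (pair) {
        if (pair === '') return
        var eq = pair.indexOf('=')
        var key = eq < 0 ? pair : pair.slice(0, eq)
        var value = eq < 0 ? '' : pair.slice(eq + 1)
        fields[decode(key)] = decode(value)
    })
    return fields
}

// Encode one submission as an element with one child per field.
function formToXml(tag, fields) {
    var body = Object.keys(fields).map(function (name) {
        return '<' + name + '>' + escapeXml(fields[name]) + '</' + name + '>'
    }).join('')
    return '<' + tag + '>' + body + '</' + tag + '>'
}

var posted = formToXml('person', parseForm('firstname=Bob&lastname=Smith+%26+Sons'))
// '<person><firstname>Bob</firstname><lastname>Smith &amp; Sons</lastname></person>'
```

The resulting element would then be appended under whichever element
the URL was mapped to.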

Probably you'd want at some point to be able to control which node was
the "context node" for a query --- that is, to use a query to look for
results in just one particular subtree of the document.

Maybe use a special tag, like <query>Bob</query>, instead of just an
attribute?

What about parameters?  Linking?  Detail records?

Maybe it would be better to attach semantics to tag names instead of
using an IDREF attribute.  For example, you could say something like
(sorry for the confused example):

    <person prototype="true">
      <name id="personsname"><firstname/> <lastname/></name>
      <friendof>
        <query><person><friend equalto="personsname"/></person></query>
      </friendof>
    </person>

to specify that whenever there's a <person>, it should have a <name>
containing <firstname> and <lastname> elements, and a <friendof>
element containing the results of a query over other <person> objects.

Maybe some elements or queries could reach out into their environment
to get query parameters, in the same relative place that they got them
from in the place where they were defined.

Anyway, I'm getting bogged down in details of the infinite
possibilities of how to design this system so that it could possibly
make sense, without really focusing on the main point, which is that
embedding some kind of specification of queries into your giant XML
document and including editable results of those queries immediately
inline could give you a relatively useful end-user database with little
hassle.

This relates to my feeling that I don't have a good semistructured data
store (see "semistructured data: summary of six years of wishes" <insert
ref>), and to my desire to produce an end-user-programmable web-page
compositing system (see "Lossless HTML template expansion"
<http://lists.canonical.org/pipermail/kragen-tol/2005-April/000771.html>).
This does not relate to Meredith Patterson's AI project in Postgres
called "QBE".

Mon, 05 Dec 2005

(This is a recapitulation of
<http://lists.canonical.org/pipermail/kragen-tol/1999-August/000466.html>,
although that contains more implementation notes, and this contains more
principles.  There are a few other differences of opinion.)

I have in mind an XML editor constructed on the following principles:

1. Each keystroke should have a visible effect on the screen, and
should have no effects not visible on the screen.

2. The document should be close to well-formed at all times, at least
in the sense that it should have no syntactically invalid tags,
unmatched tags, or mis-nested tags.

3. The effect of a keystroke should be predictable from the contents
of the screen; it should not depend on any invisible state.

4. The screen should display only characters from the XML document,
the cursor, and possibly a selection.

5. If, starting from an empty file, you were to type the characters of
a well-formed XML document, one after the other, without making any
mistakes, the result should be that XML document, or at least one
semantically equivalent to it.

6. If you type one wrong printable character, you should always be
able to undo your mistake by hitting the backspace key once.

(I think these constraints may already be a nearly complete
specification for the editing behavior of the editor; I may have gone
too far and specified constraints that are impossible to meet.)

Why?  Well, XML contains a lot of redundancy.  I wonder if you could
construct an XML editor that added and removed the redundant
information as you typed, so that your document was always at least
mostly well-formed XML, but that at the same time stayed out of your
way whenever you typed that information yourself.

Autocompletion of tag and attribute names, of course, is another way to do
this.

The following example interactions, with the | representing the
cursor, illustrate what I mean.

Tags and attributes:

hi |         + "<"     -> hi <|></>
<|></>       + "a"     -> <a|></a>
<a|></a>     + space   -> <a |=""></a>
<a |=""></a> + "href=" -> <a href="|"></a>
<a |=""></a> + ">"     -> <a>|</a>
<a href="|"></a>        + "x.png"      -> <a href="x.png|"></a>
<a href="x.png|"></a>   + '"'          -> <a href="x.png" |=""></a>
<a name="foo" |=""></a> + "/"          -> <a name="foo" /|>
<a name="foo" /|>       + "f"          -> <a name="foo" />f|
<b>ex|cellent</b>       + "<"          -> <b>ex<|></>cellent</b>
<b>ex<|></>cellent</b>  + "/"          -> <b>ex</|b>cellent<b></b>
<b>ex</|b>cellent       + "-"          -> <b>ex</b>-|cellent
<b>ex</|b>cellent       + "b"          -> <b>ex</b|>cellent
<b>ex</b|>cellent       + "-"          -> <b>ex</b>-|cellent
<b>ex</|b>cellent       + ">"          -> <b>ex</b>|cellent
<|i>Now.</i>            + "l"          -> <l|i>Now.</li>
<l|i>Now.</li>          + ">"          -> <li>|Now.</li>
<i>Now.</|i>            + "l"          -> <li>Now.</l|i>
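
Rows like the last three, where typing in one tag changes the other,
fall out for free if the editor keeps a tree and stores each element's
name only once.  That is an assumption on my part (nothing above
requires a tree), and the names el(), typeInName() and render() are
made up for this sketch, which handles tag names only and ignores
attributes.

```javascript
// An element stores its name once.  Text children are plain strings,
// kept exactly as they appear in the document source.
function el(name, children) {
    return { name: name, children: children || [] }
}

// Insert `ch` into the element's name at character offset `pos`.
function typeInName(elem, pos, ch) {
    elem.name = elem.name.slice(0, pos) + ch + elem.name.slice(pos)
}

// Delete the name character just before offset `pos`, if there is one.
function backspaceInName(elem, pos) {
    if (pos > 0) elem.name = elem.name.slice(0, pos - 1) + elem.name.slice(pos)
}

// Both tags are generated from the one stored name, so they cannot disagree.
function render(node) {
    if (typeof node === 'string') return node
    var inner = node.children.map(render).join('')
    return '<' + node.name + '>' + inner + '</' + node.name + '>'
}

var item = el('i', ['Now.'])
typeInName(item, 0, 'l')           // the cursor was at <|i>
var afterTyping = render(item)     // '<li>Now.</li>'

backspaceInName(item, 1)           // the cursor is now at <l|i>
var afterUndo = render(item)       // '<i>Now.</i>'

var fresh = render(el('', []))     // what typing "<" creates: '<></>'
```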

Entities:

C|         + "&"        -> C&|;
C&|;       + "oacute"   -> C&oacute|;
C&oacute|; + ";"        -> C&oacute;|
A &amp|;   + space      -> A &amp; |
A &|;      + space      -> A &amp; |

Deletion:

<b>ex</|b>cellent<b></b> + backspace   -> <b>ex<|></>cellent</b>
<b>ex</b|>cellent<b></b> + backspace   -> <b>ex<|></>cellent</b>
<b>ex</b>|cellent<b></b> + backspace   -> <b>ex|cellent</b>
<i><b>ex</b>|cellent</i> + backspace   -> <i><b></b>ex|cellent</i>
<i><b>|ex</b>cellent</i> + backspace   -> <i><|>ex</>cellent</i>
<i><|>ex</>cellent</i>   + backspace   -> <i>excellent</i>
<p width="90">Y|es</p>   + backspace   -> <p width="90">|es</p>
<p width="90">|es</p>    + backspace   -> <p width="90|">es</p>
<p width="90|">es</p>    + backspace   -> <p width="9|">es</p>
<p width="9|">es</p>     + backspace   -> <p width="|">es</p>
<p width="|">es</p>      + backspace   -> <p widt|="">es</p>
<p |="">es</p>           + backspace   -> <p|>es</p>
<p|>es</p>               + backspace   -> <|>es</>
<|>es</>                 + backspace   -> |es
&quot;|                  + backspace   -> &quot|;
&quot;Yes&|quot;         + backspace   -> &quot;Yes|&quot;
&quot;Yes&|;             + backspace   -> &quot;Yes|

(Other cases I haven't worked out, but which I think the above six
principles will clearly guide: deletion of tag name characters,
attribute value characters, attribute name characters, entity
characters, empty tag slash, close tag without a nearby matching open
tag.)

To satisfy principle 3, the idea that you should be able to predict the
effects of your actions from the screen contents, we need to display
closing tags, even if they're far away.

The nondestructive backspaces displayed above --- which merely move the
cursor inside of some construct without deleting anything --- are
necessary because typing a " or > might move you outside of those same
constructs.  To satisfy principle 6, hitting backspace must move you
back inside those delimiters, returning you to the previous state.  I
think this is encapsulated in this heuristic: don't consider deleting
something that might not be the last keystroke.
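
The entity rows in the tables above are small enough to sketch as code.
What follows is a minimal sketch, not a worked-out editor: it covers
named entities only, the function names (parse, show, entity_span,
type_key, backspace) are made up for illustration, and deleting a
character in the middle of an entity name just falls through to a plain
delete, which is my assumption for a case the tables leave open.

```python
def parse(marked):
    """Split 'C&|;' into ('C&;', 2): the text and the cursor index."""
    cur = marked.index('|')
    return marked[:cur] + marked[cur + 1:], cur

def show(text, cur):
    """Inverse of parse: draw the cursor back in as '|'."""
    return text[:cur] + '|' + text[cur:]

def entity_span(text, cur):
    """Return (amp, semi) if the cursor is inside '&name;', else None."""
    amp = text.rfind('&', 0, cur)
    if amp < 0:
        return None
    semi = text.find(';', amp)
    if semi < cur:                      # also catches semi == -1
        return None
    name = text[amp + 1:semi]
    if name and not name.isalnum():     # named entities only (assumption)
        return None
    return amp, semi

def type_key(text, cur, key):
    """Apply one typed key (or a run of name characters)."""
    span = entity_span(text, cur)
    if span is None:
        if key == '&':                  # open an entity, cursor inside it
            return text[:cur] + '&;' + text[cur:], cur + 1
        return text[:cur] + key + text[cur:], cur + len(key)
    amp, semi = span
    if key == ';':                      # step over the existing ';'
        return text, semi + 1
    if key == ' ':
        if semi == amp + 1:             # '&;' was meant as a literal ampersand
            text = text[:amp + 1] + 'amp' + text[semi:]
            semi += 3
        return text[:semi + 1] + ' ' + text[semi + 1:], semi + 2
    return text[:cur] + key + text[cur:], cur + len(key)

def backspace(text, cur):
    """Undo the last keystroke where possible; otherwise only move."""
    if cur == 0:
        return text, cur
    if text[cur - 1] == ';' and entity_span(text, cur - 1):
        return text, cur - 1            # nondestructive: step back inside
    span = entity_span(text, cur)
    if span and cur == span[0] + 1:
        amp, semi = span
        if semi == amp + 1:             # empty '&;': remove it whole
            return text[:amp] + text[semi + 1:], amp
        return text, amp                # nondestructive: step out before '&'
    return text[:cur - 1] + text[cur:], cur - 1

print(show(*type_key(*parse('C|'), '&')))
print(show(*backspace(*parse('&quot;|'))))
```

Keeping the state as a (text, cursor) pair makes each row of the table a
one-line check, which seems like a convenient way to pin down the cases
that are still undecided.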

One event the previously-mentioned rules don't cover is the insertion of
an element outside the root element ("hodie natus est radici frater").
You could handle this by wrapping a new root element around the outside;
I think this may violate principle 6, though.

The examples shown all go through ill-formed states with tag names and
attribute names that are zero characters long.  This violates principle
2.  One possible solution to this problem is to insert a tag or
attribute name in the appropriate place, but highlight it --- and have
the traditional MacOS rule that typing a printable character when there
is highlighted text replaces the highlighted text with the character.

If this UI idiom is supported in general, it's possible to highlight and
try to delete all sorts of funny things: partial elements, partial tags,
partial attributes.  I'm not sure exactly how to handle this.  (I'm able
to consider not supporting this UI idiom because I don't like it that
much, and I commonly use Emacs and vim, neither of which uses it by
default.)

(existing popular XML editors: Oxygen (oxygenxml.com), XMLSpy, Stylus
Studio.  XMLSpy touts the following feature list for XML editing:
- Well-formedness checking
- Validation (DTD & schema-based)
- Intelligent Editing (DTD/Schema based entry-help)
- Text View with syntax-coloring
- Advanced context-sensitive entry-helpers
- Line Number Margin
- Text-folding Margin (i.e. click in the margin to fold/unfold)
- Bookmarks & Bookmark Margin
- Visual indentation guides (vertical dotted lines)
- Find & Replace with enhanced XML capabilities (include and exclude
  various syntactic categories)
- Find & Replace based on Regular Expressions
- Code-completion & syntax-help
- Pretty-printing of XML files
- Enhanced Grid & Table View (successive occurrences of the
  same element become numbered rows; differing attributes and
  subelements of the rows become columns)
- Browser View (HTML/XHTML Preview)
- "Authentic Document View" (forms-based data entry into XML)
- Dynamic Forms for context-sensitive document editing
- CALS/HTML Table Support
- Spell-Checking
- OASIS Catalog Support (subset)

"Text View remains the most popular editing view."
<http://www.altova.com/support_freexmlspyhome.asp>
)

Sat, 03 Dec 2005

#!/usr/bin/python
"""Yet another curses-based directory tree browser, in Python.

I thought I could use something like this for filename entry, kind of
like the old 4DOS 'select' command --- cd $(cursoutline.py).  So you
navigate and hit Enter, and it exits and spits out the file you're on.

"""
# There are several general approaches to the drawing-an-outline
# problem.  This program supports the following operations:
# - move cursor to previous item (in preorder traversal)
# - move cursor to next item (likewise)
# - hide descendants
# - reveal children
# And because it runs over the filesystem, it must be at least somewhat lazy
# about expanding children.
# And it doesn't really bother to worry about someone else changing the outline
# behind its back.
# So the strategy is to store our current linear position in the
# preorder traversal, and defer operations on the current node until the next
# time we're traversing.


import curses, cgitb, os, sys
cgitb.enable(format="text")
ESC = 27
result = ''
start = '.'

def pad(data, width):
    # XXX this won't work with UTF-8
    return data + ' ' * (width - len(data))

class File:
    def __init__(self, name):
        self.name = name
    def render(self, depth, width):
        return pad('%s%s %s' % (' ' * 4 * depth, self.icon(),
                                os.path.basename(self.name)), width)
    def icon(self): return '   '
    def traverse(self): yield self, 0
    def expand(self): pass
    def collapse(self): pass

class Dir(File):
    def __init__(self, name):
        File.__init__(self, name)
        try: self.kidnames = os.listdir(name)
        except OSError: self.kidnames = None  # probably permission denied
        self.kids = None
        self.expanded = False
    def children(self):
        if self.kidnames is None: return []
        if self.kids is None:
            self.kids = [factory(os.path.join(self.name, kid))
                         for kid in self.kidnames]
        return self.kids
    def icon(self):
        if self.expanded: return '[-]'
        elif self.kidnames is None: return '[?]'
        elif self.children(): return '[+]'
        else: return '[ ]'
    def expand(self): self.expanded = True
    def collapse(self): self.expanded = False
    def traverse(self):
        yield self, 0
        if not self.expanded: return
        for child in self.children():
            for kid, depth in child.traverse():
                yield kid, depth + 1
    
def factory(name):
    if os.path.isdir(name): return Dir(name)
    else: return File(name)

def main(stdscr):
    cargo_cult_routine(stdscr)
    stdscr.nodelay(0)
    mydir = factory(start)
    mydir.expand()
    curidx = 3
    pending_action = None
    pending_save = False

    while 1:
        stdscr.clear()
        curses.init_pair(1, curses.COLOR_WHITE, curses.COLOR_BLUE)
        line = 0
        offset = max(0, curidx - curses.LINES + 3)
        for data, depth in mydir.traverse():
            if line == curidx:
                stdscr.attrset(curses.color_pair(1) | curses.A_BOLD)
                if pending_action:
                    getattr(data, pending_action)()
                    pending_action = None
                elif pending_save:
                    global result
                    result = data.name
                    return
            else:
                stdscr.attrset(curses.color_pair(0))
            if 0 <= line - offset < curses.LINES - 1:
                stdscr.addstr(line - offset, 0,
                              data.render(depth, curses.COLS))
            line += 1
        stdscr.refresh()
        ch = stdscr.getch()
        if ch == curses.KEY_UP: curidx -= 1
        elif ch == curses.KEY_DOWN: curidx += 1
        elif ch == curses.KEY_PPAGE:
            curidx -= curses.LINES
            if curidx < 0: curidx = 0
        elif ch == curses.KEY_NPAGE:
            curidx += curses.LINES
            if curidx >= line: curidx = line - 1
        elif ch == curses.KEY_RIGHT: pending_action = 'expand'
        elif ch == curses.KEY_LEFT: pending_action = 'collapse'
        elif ch == ESC: return
        elif ch == ord('\n'): pending_save = True
        curidx %= line

def cargo_cult_routine(win):
    win.clear()
    win.refresh()
    curses.nl()
    curses.noecho()
    win.timeout(0)

def open_tty():
    saved_stdin = os.dup(0)
    saved_stdout = os.dup(1)
    os.close(0)
    os.close(1)
    stdin = os.open('/dev/tty', os.O_RDONLY)
    stdout = os.open('/dev/tty', os.O_RDWR)
    return saved_stdin, saved_stdout

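def restore_stdio(saved_fds):
    # Put the original stdin and stdout back on fds 0 and 1, then
    # drop the saved duplicates so they don't leak.
    saved_stdin, saved_stdout = saved_fds
    os.dup2(saved_stdin, 0)
    os.dup2(saved_stdout, 1)
    os.close(saved_stdin)
    os.close(saved_stdout)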

if __name__ == '__main__':
    # No 'global' needed here: this is already module scope.
    if len(sys.argv) > 1:
        start = sys.argv[1]
    saved_fds = open_tty()
    try: curses.wrapper(main)
    finally: restore_stdio(saved_fds)
    print(result)

Thu, 01 Dec 2005

Now I am feeling the need for a personal semistructured store more than
ever.  This is a searchable database that supports "data first,
structure later" <http://www.betaversion.org/~stefano/linotype/news/93/>
(i.e. you can add a schema incrementally to existing data entered
without one, or with a very primitive one), but supports enough
structure to render web pages and things like that.  Here are some
options:

- RDF/Sniki-style: label edges in a directed graph with nodes, which
  have names.  Define tables with Sniki-style queries.  Add field
  metadata with edges from the field label node.  Queries are different
  from edges; perhaps stored in nodes, perhaps defined in some manner by
  a local graph structure.  Adding data to a query result is the most
  convenient way to add it.
- UnQL-style: label edges in a directed graph with strings; nodes are
  anonymous; define queries by replacing regular expressions of edges
  with edges; display tables perhaps by going two levels deep from a
  particular node.  (OEM, object exchange model, is essentially
  identical to this.)
- Python/Perl/JavaScript-style: nodes are anonymous, but edges are
  uniquely labeled within a node with strings, and some nodes are
  special primitive types, like strings.  Some edges are 'virtual' and
  computed on demand by arbitrary code that walks the neighborhood.
  There are nodes that are lists.
- askSam-style: nodes are small text files, with a syntactic convention
  to specify fields.  Queries are full-text searches, perhaps limited to
  certain fields; reports can extract values of certain fields to get
  tabular output.
- XML-style: data is stored in a single hierarchy in which the children
  of each node are ordered, and each node is labeled with an element
  type and possibly some attributes; XPath or XQuery is the query
  language.  Maybe you have hyperlinks by id to other nodes.
- Wiki-style: a node is a small text file with a name; text formatted
  in a certain way inside the file represents a reference to another
  file.  Very similar to askSam-style, but existing Wikis either don't
  have a facility for fields with values or don't use it much.
- wowbarbts-style: just like Wiki-style, but the node is a set of
  name-value pairs, and each value is potentially a small text file
  (which may contain references to other nodes).  Past versions are
  logged.
- Prolog-style: data is viewed in a set of relations of fixed arity; any
  relation may contain both data items directly entered there by the
  user as well as data items "inferred", i.e. produced as query results,
  from other relations, and there might actually be an infinite number
  of such other data items.  The data items may be atomic (i.e. strings)
  or node-labeled trees of atoms.
- Lotus-Agenda-style: there's an implication hierarchy of categories
  (with the universe at the top), and any data item (a short string of
  text) may belong to any subset of the hierarchy, subject to a few
  validity constraints.  Sets of categories within the tree form column
  values in reporting screens, categories form section headers in
  reporting screens, and the data items themselves are always displayed
  on reporting screens, which are also where you add new data items.  In
  a few cases, belonging to a category may have associated with it some
  numeric or date value.  Most category memberships are inferred from
  the text content of the data item, according to rules attached to the
  categories --- e.g. categorize anything containing the word "car" or
  "automotive" in the "automotive" category.
- Excel-style: you have a big rectangular grid of data values; you can
  automatically hide rows in some subset that don't meet your criteria,
  and you have "quick filters" that give you menus of possible data
  values.  Also you can hide whatever columns you want.  Really simple
  and not very powerful but apparently sufficient for many people's
  uses.
- mutated-Agenda-style: you have a big hierarchy of data items, each
  labeled either with a string or a hyperlink to some other data item.
  Category membership is denoted by a child labeled with a hyperlink to
  the category.  Reports/queries can optionally be run over just the
  descendants or children of a single node, minimizing the
  global-data-store-interference problem observed with several of these
  models.
- rumor-oriented style: data items are sets of name-value pairs; you
  can't modify old items, but only add new ones, which closely resemble
  input events; queries use the standard relational operations on some
  set of relations, one of which is the entire set of data items, and
  are evaluated in some namespace where you probably have defined most
  of the other names with standard queries.  By evaluating a query or a
  set of queries in a different namespace, you can get effects like
  undo, deletion, update, or the elimination of all actions from a
  particular IP address.  Using relational operations provides
  sequence-independence; the rumorset is not inherently sequenced.  Must
  use lazily-updated materialized views for most queries in order to
  achieve reasonable performance.
- zzstructure-style: each data item is a short string or other similar
  typed object; they are connected along 'axes', which have a linearity
  constraint --- if X follows Y along axis A, then nothing else can
  follow Y along that axis, and Y must also precede X along A.  Tabular
  display is achieved by mapping two of the axes onto the screen's
  dimensions.  You could probably do reasonable queries with regular
  expressions over axis directions and node tests, sort of like XPath or
  XQL.  However, Ted Nelson is a patent pirate who wishes that such
  projects would stop, and he may have the legal means to kill them
  <http://lists.gnu.org/archive/html/gzz-dev/2003-02/msg00042.html>.
- del.icio.us-style: similar to Agenda, but without a hierarchy among
  categories and without any automatic categorization.
- unix-filesystem-style: nodes are either strings, string-indexed
  dictionaries of nodes, or transparent hyperlinks to other nodes.
  Queries (tree regular expressions on path traversals with full-text
  search, or XPath, would probably suffice) produce lists of node
  pathnames, perhaps with associated excerpts.  (It would be interesting
  to see full-fledged tools for viewing real Unix filesystems this way.)
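
To make one of these options concrete, here is a minimal sketch of the
zzstructure-style linearity constraint and the two-axis tabular display.
It is only an illustration under my own assumptions: the class name ZZ,
its methods, and the axis names 'right' and 'down' are invented here and
are not taken from any real zzstructure implementation.  The constraint
is enforced as "at most one follower and at most one predecessor per
cell per axis".

```python
class ZZ:
    """Cells connected along named axes, one neighbor each way per axis."""

    def __init__(self):
        self.next = {}   # (axis, cell) -> the cell that follows it
        self.prev = {}   # (axis, cell) -> the cell that precedes it

    def connect(self, axis, first, second):
        """Make `second` follow `first` along `axis`."""
        if (axis, first) in self.next:
            raise ValueError('%r already has a follower on %r' % (first, axis))
        if (axis, second) in self.prev:
            raise ValueError('%r already has a predecessor on %r'
                             % (second, axis))
        self.next[axis, first] = second
        self.prev[axis, second] = first

    def rank(self, axis, start):
        """List the cells reached by walking forward from `start`."""
        cells, seen = [], set()
        cell = start
        while cell is not None and cell not in seen:   # stop on loops too
            cells.append(cell)
            seen.add(cell)
            cell = self.next.get((axis, cell))
        return cells

    def table(self, origin, across, down):
        """Map two axes onto rows and columns, starting at `origin`."""
        return [self.rank(across, row) for row in self.rank(down, origin)]


zz = ZZ()
zz.connect('down', 'alice', 'bob')
zz.connect('right', 'alice', '30')
zz.connect('right', 'bob', '25')
for row in zz.table('alice', 'right', 'down'):
    print(row)
```

Storing both directions means the "precedes" side of the constraint
costs nothing extra to check, and a different pair of axes gives a
different table over the same cells.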

To essentially any of these styles, you could add any of the following
features:
- full-text search
- incremental, interactive query result display
  (note that this probably implies that the query syntax should put
  field names after the field values being searched for; this seems
  obvious once stated, but does not seem to be a property of any
  existing query language)
- arithmetic on fields (for queries and reports)
- computed attributes (similar to arithmetic on fields, but perhaps more
  general, including query results)
- imperative programming
- Agenda-style automatic categorization
- RDF-style namespacing
- optimistically synchronized transactions
  - or at least locking
    - or at least some minimal atomic update
- access to old versions of the data
- indexing with Nilesh Dalvi's partial match index
  <ftp://ftp.cs.washington.edu/tr/2004/01/UW-CSE-04-01-01.pdf>
and probably some other features I haven't thought of yet.  These
features may actually matter as much as the underlying data
representation, and Excel shows that a nice UI for the simplest tasks
compensates for many weaknesses.
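
As an illustration of one of the add-on features listed above, here is a
rough sketch of Agenda-style automatic categorization over an
implication hierarchy.  The category names other than "automotive", the
keyword sets, and the names ROOT, PARENT, RULES, and categorize are all
hypothetical; real Agenda rules were surely richer than whole-word
matching.

```python
import re

ROOT = 'universe'

# Hypothetical implication hierarchy: child category -> parent category.
PARENT = {
    'automotive': 'transport',
    'transport': ROOT,
    'groceries': ROOT,
}

# Hypothetical rules: an item containing any of these words belongs
# to the category.
RULES = {
    'automotive': {'car', 'automotive'},
    'groceries': {'milk', 'eggs'},
}

def categorize(item):
    """Return every category the item belongs to, ancestors included."""
    words = set(re.findall(r'\w+', item.lower()))
    found = {ROOT}                       # everything is in the universe
    for category, keywords in RULES.items():
        if words & keywords:
            # Membership in a category implies membership in its parents.
            while category not in found:
                found.add(category)
                category = PARENT[category]
    return found

print(sorted(categorize('Take the car in for an oil change')))
print(sorted(categorize('Buy carpet')))
```

Matching on whole words keeps "carpet" out of the automotive category;
a substring match would be simpler but noisier.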

Into most of them, you could probably hack in some variant of the old Notes
synchronization approach, recently codified by Ray Ozzie at Microsoft as
Simple Sharing Extensions <http://msdn.microsoft.com/xml/rss/sse>; the
rumor-oriented style
<http://lists.canonical.org/pipermail/kragen-tol/2004-January/000749.html>
gives you stronger guarantees about synchronization, at the cost of
unbounded storage and unpredictable performance.
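
A minimal sketch of why the rumor-oriented style synchronizes so easily,
assuming that merging two replicas is nothing more than the union of
their rumor sets (that is my reading of the description above, not a
quote from the linked post).  The field names time, ip, page, and text
and the helper names rumor, select, and page_text are made up for the
example.

```python
def rumor(**fields):
    """An immutable, hashable set of name-value pairs."""
    return frozenset(fields.items())

def select(rumors, **criteria):
    """Relational selection: the rumors matching every criterion."""
    wanted = set(criteria.items())
    return {r for r in rumors if wanted <= r}

def page_text(rumors, page):
    """A query: the text of the newest rumor about `page`, or None."""
    hits = select(rumors, page=page)
    if not hits:
        return None
    newest = max(hits, key=lambda r: dict(r)['time'])
    return dict(newest)['text']

laptop = {rumor(time=1, ip='10.0.0.1', page='Home', text='hello')}
server = {rumor(time=2, ip='10.0.0.2', page='Home', text='SPAM'),
          rumor(time=3, ip='10.0.0.1', page='About', text='about us')}

# Synchronization is set union, so the order of merging doesn't matter.
merged = laptop | server

# "Deleting" a spammer means running the same query over a smaller set;
# nothing in the store itself is modified.
cleaned = merged - select(merged, ip='10.0.0.2')

print(page_text(merged, 'Home'))
print(page_text(cleaned, 'Home'))
```

The cost mentioned above shows up directly: the sets only ever grow, and
every query rescans them unless something like a materialized view sits
in front.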

Thoughts I've had on this problem over the years
------------------------------------------------

<http://del.icio.us/kragen/semistructured-data>
38 relevant web pages and my brief comments on them over the last year

<http://lists.canonical.org/pipermail/kragen-tol/1999-August/000466.html>
XML editors
(how to make XML editing convenient so you can stand to take notes in it)

<http://lists.canonical.org/pipermail/kragen-tol/1999-November/000486.html>
XML query languages
(abortive notes on a better path language for XML)

<http://lists.canonical.org/pipermail/kragen-tol/2000-May/000580.html>
patricia extensions
(groping toward the idea I finally understood in the NFA-DFA post below)

<http://lists.canonical.org/pipermail/kragen-tol/2000-September/000633.html>
AskSam & outliners
(sorry about the terrible formatting; will fix it sometime)

<http://lists.canonical.org/pipermail/kragen-tol/2002-February/000689.html>
grep on RFC-822 headers and stuff
(a previous post expressing a wish for a semistructured data store, with
six things to store in it)

<http://lists.canonical.org/pipermail/kragen-discuss/2002-February/000750.html>
patricia indexes on directed graphs
(groping more toward the idea I finally understood in the NFA-DFA post)

<http://lists.canonical.org/pipermail/kragen-tol/2003-January/000737.html>
mvfs --- multivalued functions --- and query languages
(me groping toward an understanding of semistructured query optimization)

<http://lists.canonical.org/pipermail/kragen-hacks/2004-January/000384.html>
viewing filesystems as filterable outlines
(very rough initial prototype of an incremental full-text search UI)

<http://lists.canonical.org/pipermail/kragen-tol/2004-February/000752.html>
Patricia, UnQL, and NFA to DFA conversion
(how to build a really large index of a graph to make path regexes fast)

<http://lists.canonical.org/pipermail/kragen-tol/2005-October/000795.html>
optimizing SQL in web apps by waiting until the last minute
(proposing how to make web apps on a triple store fast)

<http://lists.canonical.org/pipermail/kragen-hacks/2005-November/000420.html>
semistructured data land: sorting paragraphs by aisle number

<http://lists.canonical.org/pipermail/kragen-tol/2005-November/000809.html>
folkschemas and semistructured data
(how to make it easy to select field names and values)

Other related stuff
-------------------

wowbarbts embodies some of these ideas as well; it would be nice if it
could end up getting published too, but I wouldn't be surprised if
circumstances prevented that.

The KB editor we used at Alpiri also embodied some of these ideas, but
I didn't write it.

Last year, a prominent XQuery researcher recommended Galax as the best
available open-source XQuery implementation.