If Charles Goldfarb, Edward Mosher, and Raymond Lorie, the creators of GML (get it?) and the grandfathers of modern-day HTML, had been tasked with making the world’s poetry machine-readable (instead of government, legal, and scientific documents), we would have at our disposal the <poem>, <stanza>, and <line> tags and I wouldn’t be writing this post.
But here we are in 2013, and we are still mired in the “div-class-tag-soup-nonsense” bemoaned by Dr. Olaf Hoffmann in his 2007 email to the W3C regarding this very matter. In fact, he proposed the very same elements in that email, except for <stanza>, which he thought should be <strophe>. I think that name is a little too specific and esoteric for most people.
View Source on most any site devoted to poetry and you’ll encounter one of three approaches:
- The poem in a <pre> tag. Here is how poets.org marks up “Syringa” by John Ashbery:
First off, the <table> tag is not a good look. As for the <pre> tag, it’s understandable to think a poem needs to be “preformatted,” since we want to treat a poem’s whitespace and line breaks literally. But the by-product of this treatment is that the structure of the poem is completely lost. The poem becomes one big blob of “preformatted” text and whitespace. In effect, they are treating the poem as an image and punting on marking it up semantically.
The critical failure is highlighted in red above. Why is the line break preserved here? Is this what the poet intended? Or is it the artifact of the page boundary in a printed edition? This is completely arbitrary and unsustainable.
Shouldn’t we treat each line as its own entity so it can stand up to the changing widths of the various displays it will encounter in the future?
- Each line in a <p> tag. This method seems fairly widespread, but it suffers from cruft and is semantically questionable. Joshua Tallent’s oft-cited post details the virtues of using CSS to make long lines indent correctly (and has some clever tricks for making things work on older Kindles), but I can’t get past the fundamental wrongness of calling each line of a poem a paragraph. The negative indent technique is the right one, though, and it doesn’t have to come at the expense of semantics. (See below.)
- A <div> for each line. Another bloated, semantically empty solution that is ultimately unsustainable.
The above example is from the Poetry Foundation. The inline styling and un-semantic use of the div tag make this the digital equivalent of printing your poem on newsprint. How long before this representation of poetry yellows and crumbles?
So, what, then?
Over the past two years, I’ve had the privilege of working on a project involving John Ashbery’s poetry, and I’ve arrived at what I believe to be the best approach to setting poetry using standard HTML (in the absence of poetry-specific tags) and CSS. Caveat: I haven’t tested this technique on older Kindles (though I have tested it successfully on KF8).
Semantically, what is a poem?
Well, it’s a series of lines sometimes grouped into stanzas. And since line numbers are significant in verse, why don’t we mark up each stanza as an ordered list (<ol>) and each line as a list item (<li>)?
For example,

    <div class="poem">
      <ol>
        <li>Do not go gentle into that good night,</li>
        <li>Old age should burn and rave at close of day;</li>
        <li>Rage, rage against the dying of the light.</li>
      </ol>
      <ol>
        <li>Though wise men at their end know dark is right,</li>
        <li>Because their words had forked no lightning they</li>
        <li>Do not go gentle into that good night.</li>
      </ol>
    </div>

which on first pass yields this:
- Do not go gentle into that good night,
- Old age should burn and rave at close of day;
- Rage, rage against the dying of the light.
- Though wise men at their end know dark is right,
- Because their words had forked no lightning they
- Do not go gentle into that good night.
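Writing this markup by hand gets tedious for longer poems. As a rough sketch only, here is one way the stanza-as-`<ol>`, line-as-`<li>` structure could be generated from plain text. The helper names `escapeHtml` and `poemToHtml` are made up for illustration, and the code assumes stanzas are separated by blank lines and that leading whitespace on a line can be dropped (which will not be true for every poem).

```javascript
// Hypothetical helpers, not part of the original technique's markup.

// Escape the characters that would otherwise be read as HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Turn plain text into <div class="poem"> with one <ol> per stanza
// and one <li> per line.
// Assumption: one or more blank lines mark a stanza break.
// Assumption: leading/trailing whitespace on each line is not significant.
function poemToHtml(plainText) {
  const stanzas = plainText
    .trim()
    .split(/\n\s*\n/)
    .map((stanza) => stanza.split("\n").map((line) => line.trim()));

  const lists = stanzas.map((lines) => {
    const items = lines
      .map((line) => `    <li>${escapeHtml(line)}</li>`)
      .join("\n");
    return `  <ol>\n${items}\n  </ol>`;
  });

  return `<div class="poem">\n${lists.join("\n")}\n</div>`;
}
```

Calling `poemToHtml` on the six lines above, with a blank line between the two tercets, should produce the same two-list structure as the hand-written example.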
Finally, making it look like poetry requires only a few lines of CSS:

    .poem ol {
      list-style-type: none;
      margin: 0 0 10px 0;
      padding: 0;
    }
    .poem ol li {
      text-indent: -2em;
      padding-left: 2em;
      margin: 0;
    }

And you get this:
- Do not go gentle into that good night,
- Old age should burn and rave at close of day;
- Rage, rage against the dying of the light.
- Though wise men at their end know dark is right,
- Because their words had forked no lightning they
- Do not go gentle into that good night.
Did you notice that the numbers start over with each <ol>? No problem. Just use the “start” attribute on the <ol>. You can set it manually or programmatically using JavaScript. Add another line of CSS to show the numbers on :hover and now you can easily refer to line numbers!
- Do not go gentle into that good night,
- Old age should burn and rave at close of day;
- Rage, rage against the dying of the light.
- Though wise men at their end know dark is right,
- Because their words had forked no lightning they
- Do not go gentle into that good night.
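Setting “start” programmatically could look something like the sketch below. It assumes the `.poem` / `<ol>` / `<li>` markup shown earlier; the function names `stanzaStarts` and `numberPoem` are invented for this example. The arithmetic is kept in a pure function so it can be checked without a browser.

```javascript
// Given the number of lines in each stanza, return the value each
// <ol>'s "start" attribute needs so numbering continues across stanzas.
function stanzaStarts(lineCounts) {
  const starts = [];
  let next = 1; // the first line of a poem is line 1
  for (const count of lineCounts) {
    starts.push(next);
    next += count;
  }
  return starts;
}

// Browser-only: apply the computed values to every stanza in one poem.
// Assumes each stanza is an <ol> whose children are the <li> lines.
function numberPoem(poemElement) {
  const stanzas = Array.from(poemElement.querySelectorAll("ol"));
  const starts = stanzaStarts(stanzas.map((ol) => ol.children.length));
  stanzas.forEach((ol, i) => {
    ol.start = starts[i];
  });
}

// Only touch the DOM when one exists (for example, not under Node).
if (typeof document !== "undefined") {
  document.querySelectorAll(".poem").forEach(numberPoem);
}
```

For the two tercets above, `stanzaStarts([3, 3])` returns `[1, 4]`, so the second `<ol>` begins at line 4.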
Here’s a live version you can fiddle with:
UPDATE (Oct 25, 2013): I recently found the Text Encoding Initiative, which has a pretty smart XML schema for poetry that is in line with my thinking above.