miscoranda: by Sean B. Palmer

On Useful Redundancy

I always thought that having three different character escaping methods in HTML was a bit excessive. Being able to write &mdash; or &#8212; or &#x2014; means that you have to think about which one you're going to use, and the choice is usually pretty arbitrary. Most people prefer named entities when they can, but there aren't named entities for everything.
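All three spellings name the same character, U+2014 EM DASH. Python's standard `html.unescape` makes this easy to check. The snippet only illustrates the redundancy and is not taken from pwyky:

```python
from html import unescape

# The three ways to write an em-dash (U+2014) in HTML:
# a named entity, a decimal reference, and a hexadecimal reference.
forms = ["&mdash;", "&#8212;", "&#x2014;"]

for form in forms:
    # Each spelling decodes to the very same code point.
    print(form, hex(ord(unescape(form))))
```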

But the redundancy actually came in useful whilst I was developing pwyky, and it resulted in a neat hack which I thought I'd mention here.

In plain ASCII text, such as email and IRC, I tend to use a double hyphen with no surrounding spaces as an em-dash. In pwyky, I wanted to use the same thing and have it displayed as a proper em-dash in HTML. Easy enough. But I also wanted to allow any Unicode character to be entered, so I added a {U+HHHH} syntax for that. This means that there are two ways in pwyky to enter the em-dash: either "foo--bar" or "foo{U+2014}bar". Guess which one is more popular? The latter syntax still earns its keep, though, because the former only applies when the double hyphen is surrounded by word characters.
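Here is a minimal sketch of how the two syntaxes might be compiled to HTML. `compile_text`, the regular expressions, and the output forms are all hypothetical and are not pwyky's actual code. The sketch assumes that `--` becomes the decimal reference `&#8212;` and that `{U+HHHH}`, with four to six hex digits, becomes a hexadecimal reference. Because it already emits a different form for each syntax, it anticipates the fix the post arrives at below:

```python
import html
import re

# Hypothetical rules modelled on the syntax described above (not pwyky's code).
# "--" counts as an em-dash only when a word character sits on each side.
EMDASH = re.compile(r"(?<=\w)--(?=\w)")
# {U+HHHH}: assumed to take four to six hex digits naming a code point.
UNICODE_ESCAPE = re.compile(r"\{U\+([0-9A-Fa-f]{4,6})\}")


def compile_text(text: str) -> str:
    """Convert wiki-style text to HTML (a sketch, not pwyky's compiler)."""
    # Escape &, < and > first so the references added below survive intact.
    out = html.escape(text, quote=False)
    # Assumption: a double hyphen between word characters becomes decimal...
    out = EMDASH.sub("&#8212;", out)
    # ...and {U+HHHH} becomes a hexadecimal reference, so the two stay distinct.
    out = UNICODE_ESCAPE.sub(lambda m: "&#x%s;" % m.group(1).upper(), out)
    return out


print(compile_text("foo--bar, foo{U+2014}bar, foo -- bar"))
```

Escaping `&`, `<` and `>` before the substitutions means a literal `&#8212;` typed by a user comes out as `&amp;#8212;`, so it cannot be mistaken for a compiled em-dash.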

A problem arose because both -- and {U+2014} were converted to the same bit of HTML, a single em-dash character reference. Pwyky works by compiling your text input to HTML when you save it, and then converting it back to text again when you go to edit it. But when converting back to text, it couldn't tell which syntax had been used for a given em-dash, since both had been compiled to the same HTML. Argh.

The obvious answer: use one escaping form for "{U+2014}" and a different one for "--". Now I had to hope that the standard HTML parser module in Python could distinguish between the two, and thankfully it did.
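In today's Python 3 `html.parser.HTMLParser`, passing `convert_charrefs=False` makes the parser send each character reference to `handle_charref` without decoding it. As a result, `&#8212;` arrives as `"8212"` and `&#x2014;` arrives as `"x2014"`. The sketch below shows a possible reverse conversion. It again assumes that the decimal form came from `--` and the hex form from `{U+HHHH}`. `TextRecoverer` and `recover_text` are hypothetical names, and tags are simply ignored:

```python
import html
from html.parser import HTMLParser


class TextRecoverer(HTMLParser):
    """Rebuild wiki text from compiled HTML (hypothetical sketch, not pwyky)."""

    def __init__(self):
        # convert_charrefs=False: report &#...; via handle_charref
        # instead of silently decoding it into handle_data.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_charref(self, name):
        # name is "8212" for &#8212; and "x2014" for &#x2014;.
        if name == "8212":
            # Assumed mapping: the decimal em-dash came from "--".
            self.parts.append("--")
        elif name[0] in "xX":
            # Assumed mapping: hex references came from {U+HHHH}.
            self.parts.append("{U+%s}" % name[1:].upper())
        else:
            # Any other decimal reference: emit the character itself.
            self.parts.append(chr(int(name)))

    def handle_entityref(self, name):
        # Named entities such as &lt; and &amp; map back to plain characters.
        self.parts.append(html.unescape("&%s;" % name))


def recover_text(markup: str) -> str:
    parser = TextRecoverer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(recover_text("foo&#8212;bar and foo&#x2014;bar"))
```

The parser itself makes no decision here. It only preserves the spelling of each reference, and the mapping back to wiki syntax rests entirely on which form the compiler chose to emit for each input.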

I wonder what I would've done had HTML not had such redundancy. I probably would've had to assume that any em-dash in a position where a double hyphen would work should be converted back to a double hyphen, but that's not an ideal solution.

by Sean B. Palmer, at 2004-03-09 08:05:37.
