I've been experimenting with various means of validating my site easily, and the other day I struck upon a good idea for at-a-glance validation: returning images based on the validation status of a referer. When you load an HTML document with images in it, most browsers will use the URI of the HTML document as a referer when fetching the image. So it follows that it would be easy to write a service that validates the containing HTML document of an image and delivers either a smiley face or a frowning face depending upon the status.
So I did just that: Validate With Logos. The documentation explains a bit more about how it works and how you can use it on your own site (it's just a small bash script and associated images). It's a bit slow because it uses the W3C's Validation Service: if you ask for your results in XML, the nsgmls output, it'll add an HTTP header giving the validation status. Long before XML-RPC web services were hip and in vogue, the Validator was providing a faster and more robust solution.
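The decision step at the heart of such a service might be sketched roughly as follows. This is not the actual script: the header name X-W3C-Validator-Status, the value Valid, the function name pick_logo and the two image file names are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: choose a logo from the validator's response headers.
# The header name, its "Valid" value and the image names are assumptions.

pick_logo() {
  # Reads HTTP response headers on stdin and prints an image file name.
  local line status=""
  while IFS= read -r line; do
    line=${line%$'\r'}            # drop the CR of a CRLF line ending
    case "$line" in
      [Xx]-[Ww]3[Cc]-[Vv]alidator-[Ss]tatus:*)
        status=${line#*:}         # keep everything after the first colon
        status=${status// /}      # strip spaces around the value
        ;;
    esac
  done
  if [[ $status == Valid ]]; then
    echo "smile.png"
  else
    echo "frown.png"              # anything else, including no header at all
  fi
}
```

In a CGI setting the page to check would come from the HTTP_REFERER environment variable, and the headers could be piped in from whatever HTTP client is used to query the validator.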
I have two test files for this service, validtest and invalidtest. One problem with the approach that I noticed from the tests is that when I load validtest, it waits and then fills in the image with a little smiling face, but when I go to invalidtest, it puts a smiling face in there too (from the cache) before re-fetching the image and realising that it should be a frown. If I refresh them back and forth, they both do this, displaying the opposite status before the real one. I suspect that I could mess with some HTTP caching headers to change this behaviour.
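If caching is indeed the cause, the image response could carry headers along these lines. This is an untested sketch: the function name emit_image_headers is hypothetical, and whether these headers actually stop a given browser from flashing the stale image is an assumption that would need checking.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: CGI response headers that discourage reuse of a
# cached image when the same image URI is requested from a different page.

emit_image_headers() {
  # $1: MIME type of the image about to be sent, e.g. image/png
  printf 'Content-Type: %s\r\n' "$1"
  printf 'Cache-Control: no-cache, must-revalidate\r\n'  # revalidate before reuse
  printf 'Vary: Referer\r\n'     # the response depends on the referring page
  printf 'Expires: 0\r\n'        # treated as already expired by older caches
  printf '\r\n'                  # blank line ends the header section
}
```

The image bytes would follow straight after the blank line, for instance with cat on the chosen file.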
Though the service itself is trivial, one of the things I most enjoyed in the writing of it was being pedantic with bash. I went through quite a few iterations of the twenty or so line bash script trying to make it as clear and as readable as it could possibly be. One of the biggest problems that I faced was getting the GET query string to be formatted clearly. I had been doing it like this:
QUERY="\
uri = ${URI//;/%3B};\
doctype = ${DOCTYPE// /+};\
output = xml\
"
And then joining it up in the URI with ${QUERY// /} (bash's neat replacement syntax for variables). The thing about this, though, is that I didn't like the trailing reverse soliduses, \, sprinkled all over the place. It hardly aids readability. So I asked the Swhackers if there was a way to remove all line breaks from a string in bash using only builtins and on a single line. I'd already managed to work out that this would do the trick:
VARIABLE=$(echo $(echo " ... "))
But it's not really more readable. The nicest approach would have been if bash supported line breaks in its substitutions. Now, technically it does since if you do:
${VARIABLE//
/}
The line breaks disappear, but hard-coding a line break into the substitution is certainly not optimal. It doesn't understand either the quoted or unquoted "\n" character escape syntax, and nothing I tried would convince it to recognize a line break that wasn't hard-coded. I noticed that bash is actually quite inconsistent in its pattern syntax for this style of substitution, in fact: character classes and negative character classes work, and character ranges work, but negative character ranges don't. The documentation also leads me to believe that POSIX named character classes (e.g. [:lower:]) should work, too, but they don't. Anyway, eventually I just went with a call to tr:
QUERY="?
uri = ${URI//;/%3B};
doctype = ${DOCTYPE// /+};
output = xml
"
...
$(tr -d " \n" <<<"$QUERY")
Which is a little more verbose than I would have liked and makes a call to an external program, but it's readable, even clear, and it doesn't have those annoying reverse soliduses all over the place. If you have a better idea, though, please let me know!
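One builtin-only possibility, offered as a sketch rather than a tested replacement for the script: bash's ANSI-C quoting, $'...', expands \n to a real newline, and it can be used inside a bracket expression in the substitution. The URI and DOCTYPE values below are made-up stand-ins for the script's real inputs.

```bash
#!/usr/bin/env bash
# Hypothetical values standing in for the script's real inputs.
URI="http://example.org/a;b"
DOCTYPE="XHTML 1.0 Strict"

QUERY="?
uri = ${URI//;/%3B};
doctype = ${DOCTYPE// /+};
output = xml
"

# $'\n ' is ANSI-C quoting for a newline followed by a space; wrapped in
# [...] it matches either character, so both are deleted in one pass.
QUERY=${QUERY//[$'\n ']/}

echo "$QUERY"
# → ?uri=http://example.org/a%3Bb;doctype=XHTML+1.0+Strict;output=xml
```

The substitution is left unquoted on purpose; it sits on the right-hand side of an assignment, where no word splitting takes place.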
Terje Bless is hitting up Jim Ley to produce a client-side JavaScript version of the validation by logo service, and if he gets round to it, that's sure to be good. I am a little concerned, however, that the idea doesn't scale up all that well if you consider images embedded in pages that get millions of hits per day. It might be possible to cache replies from the validator and only ask for new validation results if the page has been modified since it was last validated. But either way, I think that the overall concept is solid and useful.
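The caching idea could be sketched along these lines, assuming the page's Last-Modified value is available to the script. Everything here is hypothetical: the function names, the CACHE_DIR location and the one-file-per-URI layout, which makes no attempt to avoid name collisions or concurrent writes.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: remember a validation status per URI and reuse it
# for as long as the page's Last-Modified value has not changed.

CACHE_DIR=${CACHE_DIR:-/tmp/validator-cache}

cache_key() {
  # Turn a URI into a filesystem-safe name using only builtins.
  local uri=$1
  echo "${uri//[!A-Za-z0-9]/_}"
}

cache_put() {
  # $1: URI, $2: Last-Modified value, $3: validation status
  mkdir -p "$CACHE_DIR"
  printf '%s\n%s\n' "$2" "$3" > "$CACHE_DIR/$(cache_key "$1")"
}

cache_get() {
  # $1: URI, $2: current Last-Modified value
  # Prints the cached status and returns 0 only when the stamps match.
  local file stamp status
  file="$CACHE_DIR/$(cache_key "$1")"
  [[ -r $file ]] || return 1
  { IFS= read -r stamp; IFS= read -r status; } < "$file"
  [[ $stamp == "$2" ]] || return 1
  echo "$status"
}
```

A caller would try cache_get first and only go to the validator, followed by cache_put, when it returns non-zero.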