Reply to “HTML as TeX replacement”
I recently published a two-part article in LWN about recent developments in the TeX typesetting universe. The first part was about TeX itself, and some of the powers it has gained since the olden days: Unicode input, Lua scripting, OpenType fonts, and more. The second part was about the intersection of the TeX and Web worlds, and the challenges facing authors, especially of technical material, who want to combine the advantages of both.
Kevin Marks recently wrote an article replying to some bits of this second part. He raises some interesting points that are worth discussing. In this response I’ll attempt to deal with each issue that Marks brings up.
First, Marks says that my “idea of HTML is not quite up to date”. His first example of how far behind the times I am is to note that I failed to mention that hyphenation is already a CSS standard, and that some browsers support ligatures, too.
He’s right about this. I should have found the space to mention the fact that some browsers do have hyphenation and ligature support. My paragraph setting example was taken from Chrome, and I’ve known for several years that Chrome’s typography was particularly poor. A comment on the article mentioned the CSS3 hyphenation and ligature support in Firefox, so I’ve had a chance to acknowledge the issue already. Firefox is a prominent example of a browser that implements CSS3 hyphenation; the “many browsers” referred to by Marks, however, turns out to be only half of the browsers listed in the table that he links to.
Marks takes the time to scold me here for my “odd” use of spaces around em-dashes. I would like to refer him to the AP style manual, used by most newspapers and wire services in the U.S., where these spaces are recommended. Other style authorities disagree. The upshot is that this is a matter of taste; Marks is entitled to consider my preference “odd,” but not to imply that it is incorrect.
But there is something else odd about Marks’ typesetting examples. His second example, demonstrating live rendering in the reader’s browser, looks very nearly as bad to me as the example of bad browser rendering in my article. The ligatures are there, but there is no hyphenation, and the same gappy text and uneven spacing. This is what I would expect when viewing the example in Chrome, but it looks just as bad in Firefox, which is supposed to support hyphenation. What’s going on?
A glance at Marks’ markup shows exactly what’s going on. He’s failed to include the language declaration that Firefox needs to know how to hyphenate. This declaration is required by the CSS3 spec that Marks links to.
While we’re admiring the source, let us also note that something even more fundamental is missing. There is no doctype, which means that this is not a legal HTML document at all. It will probably work more or less as intended in most browsers, but rendering will be unpredictable, as they will have to use heuristics, or default assumptions, to decide whether to treat the document as HTML, and, if so, what version. Scanning through the source, I notice a few additional minor syntax errors that are relatively unimportant, but amusing nevertheless in a document intended to school others in “up to date” HTML knowledge. [Marks has now added a correct doctype, but other (minor) syntax errors remain.]
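For concreteness, here is a minimal sketch of a page that supplies both of the missing pieces: a doctype and a language declaration, along with the CSS hyphenation property. It is not Marks’ markup; the class name and the placeholder text are made up for illustration, and whether hyphenation actually happens still depends on the browser having a hyphenation dictionary for the declared language.

```html
<!DOCTYPE html>
<!-- The doctype asks the browser to treat this as standards-mode HTML. -->
<html lang="en">
<!-- The lang attribute tells the browser which language's hyphenation rules to apply. -->
<head>
  <meta charset="utf-8">
  <title>Hyphenation sketch</title>
  <style>
    /* "justified" is a hypothetical class name used only in this sketch. */
    p.justified {
      text-align: justify;
      hyphens: auto; /* some browsers may need a vendor-prefixed form as well */
    }
  </style>
</head>
<body>
  <p class="justified">Paragraph text goes here.</p>
</body>
</html>
```

Without the `lang` attribute, a browser that supports `hyphens: auto` has no way to know which hyphenation patterns to use, so it typically does not hyphenate at all.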
As I’ve learned from reading such authors as Jeffrey Zeldman, following standards and using correct markup is important, as it frees us from having to test our pages in dozens of browsers in order to ferret out their quirks. Or at least it will when all major browsers become standards compliant, a day that actually seems to be approaching.
Returning to Marks’ three examples of [there is a fourth one now, of Firefox on the Macintosh] the Melville paragraph: the final one is an image of Safari’s rendering. There is hyphenation, and ligatures, and this looks much better than the examples that were set without the benefit of either; but to my eye, at least, this is a less than ideal example of the typographic art. Notice how the interword spacing is distractingly uneven; for example, notice how loose the spacing is in the line containing the word “coffin,” compared with the preceding and following lines. Now look again at the TeX example, and notice how the paragraph has a more uniform “color.” This is not just aesthetic nitpicking: these things make text easier and more pleasant to read. [The new Firefox/Mac example is even worse: look at the gappy line containing the word “funeral.” Hyphenation is not enough.]
This is the point I touched on briefly in my article, where I said that “until something equivalent to TeX’s whole-paragraph optimization is implemented in browsers, their rendering of text will always be — at least subtly — inferior.” Hyphenation helps a great deal, but doesn’t get you all the way there. In this connection, I was excited to read that the latest update to Android includes the TeX linebreaking algorithm at the system level. I don’t know what this means for browsers on Android, nor when this will move to the desktop, but this may be the turning point that puts us on the road to high quality typography on the Web.
Turning now to mathematics, Marks sets my version of Euler’s equation in HTML, and says “Note how that was displayed fine inline, just by using <sup>”. It wasn’t fine, though — not really. Note how Marks’ version uses Roman (upright) letters where math italics would be correct. You can get close to the correct result by using italic tags; but really, my point was not that this equation would be at all difficult to mark up using HTML. The context was an introduction to MathML, and the purpose was to show how a very simple equation requires a lot of verbose markup using that system. My recommendation, for now, was to use MathJax. I’m not sure what point Marks is trying to make here.
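To make the comparison concrete, here is a rough sketch of the same equation marked up four ways. The first line approximates the upright version described above, the second adds italic tags, the third is a MathML rendering of the kind the original passage was about, and the last is TeX source for MathJax. The MathJax line assumes that MathJax is already loaded on the page with its default `\(...\)` inline delimiters; none of this is copied from Marks’ article or mine.

```html
<!-- Upright letters: quick to type, but not conventional math typography. -->
<p>e<sup>iπ</sup> + 1 = 0</p>

<!-- Italic tags get closer to the conventional appearance. -->
<p><i>e</i><sup><i>iπ</i></sup> + 1 = 0</p>

<!-- MathML: semantically explicit, but verbose even for a tiny equation. -->
<p>
  <math>
    <msup>
      <mi>e</mi>
      <mrow><mi>i</mi><mi>π</mi></mrow>
    </msup>
    <mo>+</mo><mn>1</mn>
    <mo>=</mo><mn>0</mn>
  </math>
</p>

<!-- TeX source, to be typeset by MathJax if it is loaded on the page. -->
<p>\(e^{i\pi} + 1 = 0\)</p>
```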
Marks: “Writing in utf8 means I don’t need a special sequence like \pi for π.” As I demonstrate in part one, you can also use Unicode input in TeX equations.
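As a small illustration of Unicode input in a TeX equation, here is a sketch that assumes a Unicode-aware engine (LuaLaTeX or XeLaTeX) and the `unicode-math` package; it is one possible setup, not necessarily the one shown in part one.

```latex
% Compile with lualatex or xelatex; pdflatex will not handle this setup.
\documentclass{article}
\usepackage{unicode-math} % lets literal Unicode math characters appear in the source
\begin{document}
Euler's equation with a literal π in the source: $e^{iπ} + 1 = 0$.
\end{document}
```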
Next, Marks turns to my more complicated example of Stokes’ Theorem. He displays an SVG image of the equation, generated from the TeX markup, that looks perfect. This is a good reminder that, now that enough of SVG is supported in most browsers, it should probably be employed in preference to resolution-dependent formats when images are used for equations.
After this, Marks displays the SVG markup that you can embed in your webpage to get a selectable, text-like SVG version of the equation. Even in the updated version of Marks’ article, where he has included the required STIX fonts, the results are pretty terrible. Marks’ article was discussed in Hacker News (as was mine); see the dozens of comments there about the various ways in which the equations failed to display properly for various users. Some of these comments refer to an earlier version of the article, before Marks included the STIX fonts, and mention missing glyphs; but the problems of incorrect positioning, spacing, and size remain even when the fonts are included. If Marks’ intention was to show how verbose, naive SVG markup leads to ugly, wrong, and unpredictable results, his example is a spectacular success. Other than that, I don’t see the point. His article purports to be about HTML, but SVG is not HTML; and his demonstration is the worst way I’ve yet seen to put equations in your web pages.
Finally: “many of the CSS specs I have linked to are still being edited, so this is a good time to try out authoring your mathematical papers that way”. Which way? I have no idea what he is suggesting at this point. If Marks knows a better way than using MathJax, I’d be interested to know what it is — but all his examples are worse, both for authoring and for reading the result.
I realize all the above sounds rather contentious and probably somewhat peevish, but that’s the nature of one of these “response to my critics” kinds of things. So I want to make it clear that I’m actually flattered that Kevin Marks thought it was worth taking the time to write a response to my little article, and gave me some food for thought. It seems as if we share an affection for the Web and at least agree that these issues are worth discussing, for they’ll have a large effect on the future of communication online.
[Marks, in an update at the bottom of his article, says something about a “clash of worldviews.” I don’t know what this is about; I thought we were both interested in figuring out how to put (possibly technical) material on the web.]
[Marks: “I’d just like TeX fans to stop making only 2-column fixed-size PDF papers that I can’t read on my phone or tablet without a lot of zooming, panning and squinting. I’m sure it is a powerful enough toolbox to do better at that.” PDFs of technical or scientific papers that one encounters in the wild are often set using a journal’s required LaTeX style file or template, and the author might not have the knowledge or time to reformat it for convenient online use. That said, it’s futile to try to read scientific papers on your phone. I’ve read novels on my phone, and survived. But some kinds of material just need space. You can try to read this on your phone if you insist, or this (not to mention this beauty, or this one), but I would suggest a larger screen, or paper.]