Author Information

Brian Kardell
  • Developer Advocate at Igalia
  • Original Co-author/Co-signer of The Extensible Web Manifesto
  • Co-Founder/Chair, W3C Extensible Web CG
  • Member, W3C (OpenJS Foundation)
  • Co-author of HitchJS
  • Blogger
  • Art, Science & History Lover
  • Standards Geek
Posted on 12/01/2023

The Effects of Nuclear Maths

Carrying on Eric's work on “The Effects of Nuclear Weapons” - with native MathML.

As I have explained a few times since first writing about it in Harold Crick and the Web Platform in Jan 2019, I'm interested in MathML (and SVG). I'm interested in them because they are historically special: integrated into the HTML specification/parser even. Additionally, their lineage makes them unnecessarily weird and "other". Worse, they're dramatically under-invested in/de-prioritized. I believe all that really needs to change. Note that none of that says that I need MathML, personally, for the stuff that I write - but rather that I understand why it is that way and the importance for the web at large (and society). This is also the first time I've helped advocate for a thing I didn't have a bunch of first-hand experience with.

Since then, I've worked with the Math Community Group to create MathML-Core to resolve those historical issues, get a W3C TAG review, land an implementation in Chromium (also done by Igalians), and recharter the MathML Working Group (which I now co-chair).

For authors, the status of things is way better. Practically speaking, you can use MathML and render some pretty darn nice looking maths for a whole lot of stuff. Some of the integrations with CSS specifically aren't aligned, but they didn't exist at all before, so maybe that’s ok.

Despite all of this, it seems lots of people and systems still use MathJax to render as SVG, sometimes also rendering MathML for accessibility reasons. But, why? Doing all of that on the client isn't great for performance if you don't have to.

I really wanted to better understand. So I decided to tackle some test projects myself and to take you all along with me by documenting what I do, as I do it.

Nuking all the MathJax (references)

Last year, my colleague Eric Meyer wrote a piece, Recreating “The Effects of Nuclear Weapons” for the Web, about some work he'd been doing on a passion project in which they bring a famous book to the web as faithfully as they can.

This piece includes a lot of math - hundreds of equations.

You can see the result of their work on atomicarchive.com. The first two chapters aren't so math heavy, but Chapter 3 alone contains 179 equations!

It looks good!

Luckily, it's on github.

Oh hey! These are just static HTML files!

Inside, I see that each file contains a block like this:

<script id="MathJax-script" async src="mathjax/tex-chtml.js"></script>
<script>
MathJax = {
  tex: {
    inlineMath: [['$', '$'], ['\\(', '\\)']]
  },
  svg: {
    fontCache: 'global'
  }
};
MathJax.Hub.Config({
  CommonHTML: {
    linebreaks: {automatic: true}
  }
});
</script>

These lines configure how MathJax works, including which delimiters identify math in the page. If you scan through the source of, say, chapter 3, you will see lots of things like this:

<p>where $c_0$ is the ambient speed of sound (ahead of the shock front), $p$ is the peak overpressure (behind the shock front), $P_0$ is the ambient pressure (ahead of the shock), and $\gamma$ is the ratio of the specific heats of the medium, i.e., air. If $\gamma$ is taken as 1.4, which is the value at moderate temperatures, the equation for the shock velocity becomes</p>
    
\[
U = c_0 \left( 1 + \frac{6p}{7P_0} \right)^{1/2},
\]
Code sample from chapter 3

What you can see here is that the text is peppered with those delimiters from the configuration, and inside is TeX.

People don't like writing *ML

I just want to take a minute and address the elephant in the room: Most people don't want to write MathML, and that's not as damning as it sounds. Case in point, as I am writing this post, I am not writing <p> and <h1> and <section>s and so on… I'm writing markdown, as the gods intended.

TimBL’s original idea for HTML wasn't that people would write it - it was that machines would write it. His first browser was an editor too - but a relatively basic rich text editor, a lot like the ones you use every day on github issues or a million other places. The very first project Tim did with HTML was the CERN phone book which did all kinds of complicated stuff to dynamically write HTML - the same way we do today. In fact, it feels like 90% of engineering is transforming strings into other strings, and it's nothing new. TeX has been around almost as long as I've been alive and it's a fine thing to process.

This format, with text tokens scattered anywhere in the document, is far better suited to processing at build time or on a server than on the client. On the client we'd ideally work with the tree and manage re-flow and so on. Here the tree is irrelevant - in the way, even - because there are some conflicting escapes or encodings… But, it's just a string... And there are lots of things that can process that string and turn it into MathML.
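To make the "it's just a string" point concrete, here is a minimal sketch of pulling the TeX out of a page with nothing but a regular expression. The function name findTexFragments is made up for illustration, and it assumes only the $...$ and \[...\] delimiters seen in the chapter 3 sample, with inline math that never spans a line break. It is not how MathJax or TeXZilla actually tokenize.

```javascript
// Hypothetical helper (not part of MathJax or TeXZilla): list the TeX
// fragments found between $...$ and \[...\] delimiters in a plain string.
// Assumption: inline math never contains a newline or a nested $.
function findTexFragments(source) {
  // First alternative: display math \[ ... \]; second: inline math $ ... $
  const pattern = /\\\[([\s\S]*?)\\\]|\$([^$\n]+)\$/g;
  const fragments = [];
  for (const match of source.matchAll(pattern)) {
    const display = match[1] !== undefined;
    fragments.push({
      display,
      tex: (display ? match[1] : match[2]).trim(),
    });
  }
  return fragments;
}

// Usage with a shortened version of the chapter 3 sample
const sampleText = [
  "<p>where $c_0$ is the ambient speed of sound and $p$ is the peak overpressure</p>",
  "\\[",
  "U = c_0 \\left( 1 + \\frac{6p}{7P_0} \\right)^{1/2},",
  "\\]",
].join("\n");
console.log(findTexFragments(sampleText));
```

Note that the HTML tree never comes into it: the markup around the delimiters is just more characters to skip over.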

My colleague Fred Wang has an open source project called TeXZilla, so I'm going to use that.

Nuking the scripts

First things first, let's get rid of those script tags. Since they contain the tokens MathJax would be searching the body for, but we'll be processing the whole document, they'll just cause problems anyway.

Ok. Done.
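I did this by hand, but if you'd rather script it, a sketch along these lines should work. The helper name stripMathJaxScripts is hypothetical, and it assumes that any script element mentioning "MathJax" (in its attributes or its body) is one you want gone, which may be too blunt for other pages.

```javascript
// Hypothetical helper: drop every <script> element that mentions MathJax,
// leaving any other scripts in the page alone.
function stripMathJaxScripts(html) {
  // Matches one whole script element plus the whitespace that follows it
  const scriptElement = /<script\b[^>]*>[\s\S]*?<\/script>\s*/gi;
  return html.replace(scriptElement, (element) =>
    /mathjax/i.test(element) ? "" : element
  );
}

// Usage with a cut-down version of the block found in each chapter
const page = [
  '<script id="MathJax-script" async src="mathjax/tex-chtml.js"></script>',
  "<script>",
  "MathJax = { tex: { inlineMath: [['$', '$']] } };",
  "</script>",
  "<p>where $c_0$ is the ambient speed of sound</p>",
].join("\n");
console.log(stripMathJaxScripts(page));
```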

Next, I check out the TeXZilla project parallel to my book project and try to use the ‘streamfilter’ which lets me just cat and | (pipe) a file to process it…

 >cat chapter1.html | node "../../texzilla/node_modules/texzilla/TeXZilla.js" streamfilter

Hmm… It fails with an error that looks something like this (if you're following along, yours won't match exactly; this one is from a different file):

>cat chapter1.html | node "../../texzilla/node_modules/texzilla/TeXZilla.js" streamfilter

Error: Parse error on line 145:
...}\text{Neutron + }&\!\left\{\enspace
---------------------^
Expecting 'ENDMATH1', got 'COLSEP'
.....

I don’t know TeX, so this is kind of a mystery to me. I search for the quoted line and have a look. I find some code surrounded by \begin{align} and \end{align}. I turn to the interwebs and find some examples where it says \begin{aligned} and \end{aligned}, so I try just grep/changing it across the files and, sure enough, it processes further. Which is right? Who knows - let's move on...
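For what it's worth, the change itself is a one-line string replacement. Here is a sketch of it as a function, where alignToAligned is a name I made up and the open question of whether aligned is really the right substitute for align still stands:

```javascript
// Hypothetical helper: rewrite \begin{align} / \end{align} as
// \begin{aligned} / \end{aligned}. Environments that are already
// "aligned" do not match the pattern and are left alone.
function alignToAligned(source) {
  return source.replace(/\\(begin|end)\{align\}/g, "\\$1{aligned}");
}

// Usage
console.log(alignToAligned("\\begin{align} a &= b \\\\ c &= d \\end{align}"));
```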

I find a similar error like...

Error: Parse error on line 759:
...xt{ rads. } \; \textit{Answer}\]</asi
-----------------------^
Expecting '{', got 'TEXTARG'

Once again on the interwebs I find some code like Textit{...} (uppercase T) - so, I try that. Sure enough, it works fine. Or wait... Does it? No, it just leaves that code untransformed now.

But… Now we have MathML! At least in some chapters!

With a little more looking, I find that \textit is a pretty ordinary TeX thing and that not handling it is just a quirk of TeXZilla, so I turn those \textit{...} into \mathit{...} and now we're good. Finally. Burned some time on that, but not too bad.
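The swap that finally worked can be sketched the same way. The name textitToMathit is hypothetical, and the sketch assumes every italic-text command in the source should become its math-italic counterpart, which is the blanket choice I made here rather than a general rule:

```javascript
// Hypothetical helper: replace the italic-text command with the
// math-italic one. Requiring the opening brace keeps \text{...}
// and other commands that merely start with "text" untouched.
function textitToMathit(source) {
  return source.replace(/\\textit\{/g, "\\mathit{");
}

// Usage with the fragment from the error message
console.log(textitToMathit("\\text{ rads. } \\; \\textit{Answer}"));
```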

I encounter two other problems along the way. First, escaping: the source contains ampersand escapes for a few Unicode characters, which aren't a problem when you're accessing it from body.innerText or something maybe, but are a problem here. Still, it takes only a couple of minutes more to replace them with their actual Unicode counterparts (&phi; ⇒ φ, &lambda; ⇒ λ, &times; ⇒ ×).
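A small lookup table is enough for this. The sketch below covers only the three entities mentioned above, mapped to their standard HTML meanings, and leaves every other entity (like &amp;amp;) exactly as it was. The names ENTITY_TO_CHAR and decodeKnownEntities are made up for illustration.

```javascript
// Hypothetical table: just the entities mentioned in this post,
// mapped to the characters they stand for in HTML.
const ENTITY_TO_CHAR = new Map([
  ["&phi;", "φ"],
  ["&lambda;", "λ"],
  ["&times;", "×"],
]);

// Replace known entities with literal characters; anything not in
// the table is returned unchanged.
function decodeKnownEntities(source) {
  return source.replace(/&[A-Za-z]+;/g, (entity) =>
    ENTITY_TO_CHAR.has(entity) ? ENTITY_TO_CHAR.get(entity) : entity
  );
}

// Usage
console.log(decodeKnownEntities("$&lambda;$ &times; 2 &amp; &phi;"));
```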

When I’m done I can send that to a file, something like this…

>cat chapter3.html | node "../../texzilla/node_modules/texzilla/TeXZilla.js" streamfilter > output.html

And then just open it in a browser.

There are some differences. MathJax adds a margin to math that's on its own line (block) that isn't there natively. The math formulas that are ‘tabular’ aren't aligning their text (the =), and the font is maybe a little small. After a little fooling around, I add this snip to the stylesheet:

math[display="block" i] {
  margin: 1rem 0;
}

math {
  font-size: 1rem;
}

mfrac {
  padding-inline-start: 0.275rem;
  padding-inline-end: 0.275rem;
}

mtd {
  text-align: left;
}

And the last problem I hit? An error in the actual inline TeX in the source, I think, where an expression was missing its closing $. MathJax simply left that un-transformed, just like my previous example, but TeXZilla was all errory about it. I sent a pull request, and I guess we'll see.
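A quick way to hunt for this kind of mistake before running the conversion is to look for lines with an odd number of $ delimiters. This is only a rough heuristic sketch: findUnbalancedDollarLines is a hypothetical name, and it assumes inline math never wraps across a line break and that a literal dollar sign is written as \$.

```javascript
// Hypothetical lint: report lines containing an odd number of
// unescaped $ characters, which suggests a missing delimiter.
function findUnbalancedDollarLines(source) {
  const problems = [];
  source.split("\n").forEach((line, index) => {
    // Count $ characters that are not escaped as \$
    const dollars = line.match(/(?<!\\)\$/g) || [];
    if (dollars.length % 2 !== 0) {
      problems.push({ line: index + 1, text: line.trim() });
    }
  });
  return problems;
}

// Usage: the first line is missing the $ that should close "$p"
const suspect = [
  "<p>where $c_0$ is the ambient speed of sound and $p is the peak overpressure</p>",
  "<p>and $\\gamma$ is the ratio of the specific heats</p>",
].join("\n");
console.log(findUnbalancedDollarLines(suspect));
```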

And like… that's it.

All in all, the processing time here including learning the tools and figuring out the quirks in a language I'm unfamiliar with is maybe half a day - but that's scattered across short bits of free time here and there.

Is it done? Is it good enough? It definitely needs some proofing to see what I've missed (I'm sure there's some!), and good scrutiny generally, but... It looks pretty good at first glance, right (below)?

Native MathML on the left, MathJax on the right

You can browse it at:

https://bkardell.com/effects-of-nuclear-math/html/.

And, send me an issue if you find anything that should be improved before I eventually send a pull request, or if you see something that looks bad enough that you wouldn't switch to native MathML for this.

But, all in all: I feel like you can make pretty good use of native MathML at this point.