Author Information

Brian Kardell
  • Developer Advocate at Igalia
  • Original Co-author/Co-signer of The Extensible Web Manifesto
  • Co-Founder/Chair, W3C Extensible Web CG
  • Member, W3C (OpenJS Foundation)
  • Co-author of HitchJS
  • Blogger
  • Art, Science & History Lover
  • Standards Geek
Posted on 5/04/2024

Known Knowns

Fun with the DOM, the parser, illogical trees and "unknowns"...

HTML has this very tricky parser that does 'corrections' and things on the fly. So if you create a page like:

<table>
   Hello
   <td>
      Look below...
   </td>
</table>

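And load it in your browser, what you'll actually get parsed as a tree will be different: the "Hello" text gets moved out in front of the <table>, and the <td> gets wrapped in an implied <tbody> and <tr>.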

Things can literally be moved around, implied elements added and so on.
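One way to see what the parser actually built is to walk the resulting DOM and print it as an outline. Here's a minimal sketch of that; `dumpTree` is a made-up helper name (not something from the platform), and it only looks at `nodeType`, `nodeName`, `data` and `childNodes`.

```javascript
// Hypothetical helper (not a platform API): returns an indented outline
// of a DOM subtree, one line per element or non-empty text node.
function dumpTree(node, depth = 0) {
  const TEXT_NODE = 3;
  const indent = '  '.repeat(depth);
  if (node.nodeType === TEXT_NODE) {
    const text = node.data.trim();
    // Skip whitespace-only text nodes to keep the outline readable.
    return text ? [`${indent}"${text}"`] : [];
  }
  const lines = [`${indent}<${node.nodeName.toLowerCase()}>`];
  for (const child of node.childNodes) {
    lines.push(...dumpTree(child, depth + 1));
  }
  return lines;
}

// In a browser, print the outline of whatever the parser produced.
if (typeof document !== 'undefined' && document.body) {
  console.log(dumpTree(document.body).join('\n'));
}
```

Run against the table example, the outline should make the moved text and the implied elements easy to spot.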

Illogical trees

But you can create trees that the parser itself could never produce, if you do it dynamically. With the DOM API, you can create whatever wild constructs you want: Paragraphs inside of paragraphs? Sure, why not.

Or, text that is a direct child of a table. Note this still renders the text in every browser engine.

You can even add children to 'void' elements that way too. Here's an interesting one: An <hr> with a text node child. Again, it renders the text in every browser engine (the color varies).
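The three constructs just described (a paragraph in a paragraph, text directly in a table, and a text child of an <hr>) can be sketched with plain `createElement`, `createTextNode` and `appendChild` calls. The function name and the sample strings are made up for illustration; passing the document in as `doc` is just a choice that keeps the sketch easy to try outside a page.

```javascript
// Sketch only: builds three trees the HTML parser would never produce.
// `buildIllogicalTrees` and the text strings are illustrative names.
function buildIllogicalTrees(doc) {
  // A paragraph inside a paragraph.
  const outerP = doc.createElement('p');
  const innerP = doc.createElement('p');
  innerP.appendChild(doc.createTextNode('Nested paragraph'));
  outerP.appendChild(innerP);

  // Text as a direct child of a table.
  const table = doc.createElement('table');
  table.appendChild(doc.createTextNode('Text directly in a table'));

  // A text node inside a void element.
  const hr = doc.createElement('hr');
  hr.appendChild(doc.createTextNode('Text inside an hr'));

  return [outerP, table, hr];
}

// In a browser, drop all three into the page to see how they render.
if (typeof document !== 'undefined' && document.body) {
  document.body.append(...buildIllogicalTrees(document));
}
```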

You can also put unknown elements into your markup and the text content is shown... By default, it is basically a <span>.

In most cases, HTML wants to show something... or at least leave CSS in control. For example, you can dynamically add children to a <script>. While that won't be shown by default, it's simply because the UA style sheet has script set to display: none. If we change that, we can totally see it.
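Here's a rough sketch of the <script> case. `revealScriptChildren` is a hypothetical helper, and using a <b> element as the child is my own choice: an element child doesn't count as script source text, so nothing gets executed when the script is inserted.

```javascript
// Hypothetical helper: gives a <script> an element child, then overrides
// the UA style sheet's `display: none` with an inline style.
function revealScriptChildren(script, doc) {
  const note = doc.createElement('b');
  note.textContent = 'I am a child of a script element';
  script.appendChild(note);
  script.style.display = 'block';
  return script;
}

// In a browser: create an empty script, reveal it, and add it to the page.
if (typeof document !== 'undefined' && document.body) {
  const script = document.createElement('script');
  document.body.append(revealScriptChildren(script, document));
}
```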

But this isn't universal: In some cases there are other renderers that are in control - mainly when it comes to form controls. But also, for example, if you switch the <hr> in the example above to a <br> it won't render the text. As far as I can tell, it doesn't generate a box that CSS can do anything with, except make it display: none (useful if they’re in white-space: pre blocks to keep them from forcing open extra lines).

SVG

The HTML parser has fix-ups for embedding SVG too; there are integration points of a sort. But, in SVG you can have an unknown element too... For example:

<svg>
    <unknown>Test</unknown>
    <rect width="150" height="150" x="10" y="10" style="fill:blue;stroke:pink;stroke-width:5;opacity:0.5" />
</svg>

The unknown element won't render the text, nor additional SVG children inside it. For example:

<svg id=one width="300" height="170">
  <unknown><ellipse cx="120" cy="80" rx="100" ry="50" style="fill:yellow;stroke:green;stroke-width:3" /></unknown>
  <rect width="150" height="150" x="10" y="10" style="fill:blue;stroke:pink;stroke-width:5;opacity:0.5" />
</svg>

MathML

As you might expect, there are parser integrations for MathML too, and you can have unknown elements in MathML too. In MathML, all elements (including unknown ones) generate a "math content box", but only token elements (like <mi>, <mo>, <mn>) render text. For example, the <math> element itself - if you try to put text in it, the text won't render, but it will still generate a box and other content.

<math>
   Not a token. Doesn't render.
   <mi>X</mi>
</math>

MathML has other elements too like <mrow> and <mspace> and <mphantom> which are just containers. Same story there, if you try to put text in them, the text won't render...

<math>
   <mrow>Not a token. Doesn't render.</mrow>
   <mi>X</mi>
</math>

But if you put the text inside a token element (like <mi>) inside that same <mrow>, then the text will render.

<math>
   <mrow><mi>ok</mi></mrow>
   <mi>X</mi>
</math>

In MathML, unknown elements are basically treated just like <mrow>. In the above examples, you could replace <mrow> with <unknown> and it'd be the same.

Unknown Unknowns

Ok, here's something you don't think about every day: Given this markup:

<unknown id=one>One</unknown>
<math>
  <unknown id=two>Two</unknown>
</math>
<svg>
  <unknown id=three>Three</unknown>
</svg>

One, Two and Three are three different kinds of unknowns!

console.log(one.namespaceURI, one.constructor.name)
// logs 'http://www.w3.org/1999/xhtml HTMLUnknownElement'

console.log(two.namespaceURI, two.constructor.name)
// logs 'http://www.w3.org/1998/Math/MathML MathMLElement'

console.log(three.namespaceURI, three.constructor.name)
// logs 'http://www.w3.org/2000/svg SVGElement'
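If you'd rather not lean on ids being exposed as globals, the same check can be wrapped in a small function. This is just a sketch: `describeElement` and the `NAMESPACES` table are names I made up, and the labels are arbitrary.

```javascript
// Illustrative lookup table: namespace URI -> short label.
const NAMESPACES = {
  'http://www.w3.org/1999/xhtml': 'HTML',
  'http://www.w3.org/1998/Math/MathML': 'MathML',
  'http://www.w3.org/2000/svg': 'SVG',
};

// Hypothetical helper: summarizes which kind of element this is,
// based on its namespace and the interface it was created with.
function describeElement(el) {
  const kind = NAMESPACES[el.namespaceURI] ?? 'other';
  return `${kind} <${el.localName}> (${el.constructor.name})`;
}

// In a browser, describe the three unknowns from the markup above.
if (typeof document !== 'undefined') {
  for (const id of ['one', 'two', 'three']) {
    const el = document.getElementById(id);
    if (el) console.log(describeElement(el));
  }
}
```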

In CSS, these can also (theoretically) be styled via namespaces. The following will only style the first of those:

@namespace html url(http://www.w3.org/1999/xhtml);
html|unknown { color: blue; }

Under-defined Unknowns

Remember how in the beginning we created nonsensical constructs dynamically? Well, we can do that here too. We can move an unknown MathMLElement right into HTML, or an unknown HTMLElement right into MathML, and so on - and it's not currently well-defined, universal, or consistent what actually happens here.
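A minimal sketch of one such move is below. The thing to notice is that moving an element never changes its namespace, so it ends up with a parent from a different namespace. `moveInto` is a hypothetical helper, and the sample text is made up.

```javascript
// Hypothetical helper: moves `el` under `newParent` and reports whether
// the element's namespace differs from its new parent's namespace.
function moveInto(el, newParent) {
  newParent.appendChild(el);
  return {
    elementNamespace: el.namespaceURI,
    parentNamespace: newParent.namespaceURI,
    mismatched: el.namespaceURI !== newParent.namespaceURI,
  };
}

// In a browser: put an unknown MathML element directly into the HTML body.
if (typeof document !== 'undefined' && document.body) {
  const MATHML = 'http://www.w3.org/1998/Math/MathML';
  const mathUnknown = document.createElementNS(MATHML, 'unknown');
  mathUnknown.textContent = 'MathML unknown inside HTML';
  console.log(moveInto(mathUnknown, document.body));
}
```

How the moved element then renders is exactly the part that isn't consistent across engines.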

Here's an interesting example that moves an unknown MathMLElement and an SVGElement into HTML, an HTMLElement into MathML, and so on:

See the Pen Untitled by вкαя∂εℓℓ (@briankardell) on CodePen.

Here's how that renders in the various browsers today:

Left to right: Chrome, Firefox, Safari all render differently

So, I guess we should probably fix that. I'll have to start creating some issues and tentative tests (feel free to beat me to it 😉)

Semi-related Rabbit Holes

Not specifically these issues, but related namespace stuff has caused a lot of problems that we're remedying as shown by a recent flurry of activity started by my colleague Luke Warlow to fix MathML-Core support in various libraries/frameworks:

Who's next? :)