Known Knowns
Fun with the DOM, the parser, illogical trees and "unknowns"...
HTML has a very tricky parser that makes 'corrections' on the fly. So if you create a page like:
<table>
Hello
<td>
Look below...
</td>
</table>
And load it in your browser, the tree the parser actually builds will be:
HTML
  HEAD
  BODY
    #text: Hello
    TABLE
      TBODY
        TR
          TD
            #text: Look below...
          #text:
Things can literally be moved around, implied elements added and so on.
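If you want to inspect trees like this yourself, a small helper can print an indented outline. This is a minimal sketch: `outline` is a hypothetical name, and it only assumes each node exposes `nodeType`, `nodeName`, `childNodes` and (for text) `data`, as DOM nodes do.

```javascript
// Returns an indented outline of a node and its descendants.
// Works on DOM nodes, or any object with the same few properties.
function outline(node, depth = 0) {
  const indent = '  '.repeat(depth);
  // nodeType 3 is a Text node; trim so whitespace-only text shows as "#text:".
  const label = node.nodeType === 3
    ? `#text: ${node.data.trim()}`.trimEnd()
    : node.nodeName;
  const lines = [indent + label];
  for (const child of node.childNodes || []) {
    lines.push(outline(child, depth + 1));
  }
  return lines.join('\n');
}

// Only runs where a DOM exists (a browser page, for instance).
if (typeof document !== 'undefined') {
  console.log(outline(document.documentElement));
}
```

Running it in the console of a page containing the markup above should show the text moved out in front of the table and the implied TBODY and TR.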
Illogical trees
But if you build the tree dynamically, you can create trees that the parser itself could never produce. With the DOM API, you can create whatever wild constructs you want: Paragraphs inside of paragraphs? Sure, why not.
Or text that is a direct child of a table. Note that the text still renders in every browser engine.
You can even add children to 'void' elements that way too. Here's an interesting one: an <hr> with a text node child. Again, it renders the text in every browser engine (the color varies).
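The three constructs just described can be sketched with a handful of DOM calls. This is an illustration rather than the code behind the original demos: `buildIllogicalTrees` is a hypothetical name, and it takes the document as a parameter purely so it can be exercised outside a page.

```javascript
// Builds three trees the HTML parser would not produce from markup.
// `doc` is any Document; in a page, pass the global `document`.
function buildIllogicalTrees(doc) {
  // A paragraph nested inside a paragraph.
  const outer = doc.createElement('p');
  const inner = doc.createElement('p');
  inner.appendChild(doc.createTextNode('Inner paragraph'));
  outer.appendChild(inner);

  // Text as a direct child of a table (the parser would move it out).
  const table = doc.createElement('table');
  table.appendChild(doc.createTextNode('Text directly in a table'));

  // A 'void' element with a text node child.
  const hr = doc.createElement('hr');
  hr.appendChild(doc.createTextNode('Text inside an hr'));

  return [outer, table, hr];
}

// Only runs where a DOM exists (a browser page, for instance).
if (typeof document !== 'undefined') {
  document.body.append(...buildIllogicalTrees(document));
}
```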
You can also put unknown elements into your markup and the text content is shown... By default, an unknown element is basically a <span>.
In most cases, HTML wants to show something... or at least leave CSS in control. For example, you can dynamically add children to a <script>. While that won't be shown by default, it's simply because the UA style sheet has script set to display: none;. If we change that, we can totally see it.
But this isn't universal: In some cases there are other renderers that are in control, mainly when it comes to form controls. But also, for example, if you switch the <hr> in the example above to a <br>, it won't render the text. As far as I can tell, it doesn't generate a box that you can do anything with in CSS, except make it display: none (useful if they’re in white-space: pre blocks to keep them from forcing open extra lines).
SVG
The HTML parser has fix-ups for embedding SVG too; there are integration points of a sort. But in SVG you can have an unknown element too... For example:
<svg>
<unknown>Test</unknown>
<rect width="150" height="150" x="10" y="10" style="fill:blue;stroke:pink;stroke-width:5;opacity:0.5" />
</svg>
The unknown element won't render the text, nor additional SVG children inside it. For example:
<svg id=one width="300" height="170">
<unknown><ellipse cx="120" cy="80" rx="100" ry="50" style="fill:yellow;stroke:green;stroke-width:3" /></unknown>
<rect width="150" height="150" x="10" y="10" style="fill:blue;stroke:pink;stroke-width:5;opacity:0.5" />
</svg>
MathML
As you might expect, there are parser integrations for MathML too, and you can have unknown elements in MathML as well. In MathML, all elements (including unknown ones) generate a "math content box", but only token elements (like <mi>, <mo>, <mn>) render text. For example, take the <math> element itself: if you try to put text in it, the text won't render, but the element still generates a box and its other content still renders.
<math>
Not a token. Doesn't render.
<mi>X</mi>
</math>
MathML has other elements too, like <mrow> and <mspace> and <mphantom>, which are just containers. Same story there: if you try to put text in them, the text won't render...
<math>
<mrow>Not a token. Doesn't render.</mrow>
<mi>X</mi>
</math>
But if you put the text inside a token element (like <mi>) inside that same <mrow>, then the text will render.
<math>
<mrow><mi>ok</mi></mrow>
<mi>X</mi>
</math>
In MathML, unknown elements are basically treated just like <mrow>. In the above examples, you could replace <mrow> with <unknown> and it'd be the same.
Unknown Unknowns
Ok, here's something you don't think about every day: Given this markup:
<unknown id=one>One</unknown>
<math>
<unknown id=two>Two</unknown>
</math>
<svg>
<unknown id=three>Three</unknown>
</svg>
One, Two and Three are three different kinds of unknowns!
// Elements with an id are reachable as globals (named access on window),
// so one, two and three refer to the elements in the markup above.
console.log(one.namespaceURI, one.constructor.name)
// logs 'http://www.w3.org/1999/xhtml HTMLUnknownElement'
console.log(two.namespaceURI, two.constructor.name)
// logs 'http://www.w3.org/1998/Math/MathML MathMLElement'
console.log(three.namespaceURI, three.constructor.name)
// logs 'http://www.w3.org/2000/svg SVGElement'
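The same three kinds can also be created without the parser, using `createElementNS`. The sketch below is only an illustration: `NS`, `namespaceKind` and `createUnknowns` are hypothetical names, and the helper simply compares `namespaceURI` against the three URIs logged above.

```javascript
// The three namespace URIs from the log output above.
const NS = {
  html: 'http://www.w3.org/1999/xhtml',
  mathml: 'http://www.w3.org/1998/Math/MathML',
  svg: 'http://www.w3.org/2000/svg',
};

// Returns 'html', 'mathml', 'svg', or 'other' for anything with a namespaceURI.
function namespaceKind(el) {
  const match = Object.entries(NS).find(([, uri]) => uri === el.namespaceURI);
  return match ? match[0] : 'other';
}

// Creates one <unknown> element per namespace; `doc` is any Document.
function createUnknowns(doc) {
  return Object.values(NS).map((uri) => doc.createElementNS(uri, 'unknown'));
}

// Only runs where a DOM exists (a browser page, for instance).
if (typeof document !== 'undefined') {
  for (const el of createUnknowns(document)) {
    console.log(namespaceKind(el), el.constructor.name);
  }
}
```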
In CSS, these can also (theoretically) be styled via namespaces. The following will only style the first of those:
@namespace html url(http://www.w3.org/1999/xhtml);
html|unknown { color: blue; }
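In principle the same pattern extends to the other two namespaces. The sketch below generates such a style sheet from script; the `math` and `svg` prefixes, the colors, and the `namespacedUnknownCss` name are all assumptions made for illustration. Whether a given engine shows any visible effect is a separate question, since, as noted earlier, unknown SVG elements don't render their text at all.

```javascript
// Namespace URIs keyed by the (arbitrary) CSS prefix used for each.
const NAMESPACE_URIS = {
  html: 'http://www.w3.org/1999/xhtml',
  math: 'http://www.w3.org/1998/Math/MathML',
  svg: 'http://www.w3.org/2000/svg',
};

// Builds CSS text that colors <unknown> differently per namespace.
// `colors` maps a prefix from NAMESPACE_URIS to a CSS color.
function namespacedUnknownCss(colors) {
  const prefixes = Object.keys(colors);
  // @namespace rules have to come before any style rules in the sheet.
  const declarations = prefixes.map(
    (prefix) => `@namespace ${prefix} url(${NAMESPACE_URIS[prefix]});`
  );
  const rules = prefixes.map(
    (prefix) => `${prefix}|unknown { color: ${colors[prefix]}; }`
  );
  return [...declarations, ...rules].join('\n');
}

const css = namespacedUnknownCss({ html: 'blue', math: 'green', svg: 'red' });

// Only runs where a DOM exists (a browser page, for instance).
if (typeof document !== 'undefined') {
  const style = document.createElement('style');
  style.textContent = css;
  document.head.appendChild(style);
}
```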
Under-defined Unknowns
Remember how in the beginning we created nonsensical constructs dynamically? Well, we can do that here too. We can move an unknown MathMLElement right into HTML, or an unknown HTMLElement right into MathML, and so on - and it's not currently well-defined, universal, or consistent what actually happens here.
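One such move can be sketched as follows. This is an illustration under stated assumptions, not the code from any demo: `moveMathUnknownIntoHtml` is a hypothetical name, and the `div` host is an arbitrary choice. The one thing that is well-defined is that re-parenting a node does not change its `namespaceURI`; how the result renders is the open question.

```javascript
const MATHML_NS = 'http://www.w3.org/1998/Math/MathML';

// Creates an unknown MathML element inside <math>, then moves it into an
// HTML <div>. `doc` is any Document; in a page, pass the global `document`.
function moveMathUnknownIntoHtml(doc) {
  const host = doc.createElement('div');
  const math = doc.createElementNS(MATHML_NS, 'math');
  const unknown = doc.createElementNS(MATHML_NS, 'unknown');
  unknown.textContent = 'Two';
  math.appendChild(unknown);

  // appendChild on a new parent moves the node; its namespace stays MathML.
  host.appendChild(unknown);
  return { host, math, unknown };
}

// Only runs where a DOM exists (a browser page, for instance).
if (typeof document !== 'undefined') {
  const { host, unknown } = moveMathUnknownIntoHtml(document);
  document.body.appendChild(host);
  console.log(unknown.namespaceURI, unknown.parentNode.namespaceURI);
}
```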
Here's an interesting example that moves an unknown MathMLElement and an SVGElement into HTML, an HTMLElement into MathML, and so on:
See the Pen Untitled by вкαя∂εℓℓ (@briankardell) on CodePen.
Here's how that renders in the various browsers today:
So, I guess we should probably fix that. I'll have to start creating some issues and tentative tests (feel free to beat me to it 😉)
Semi-related Rabbit Holes
Not specifically these issues, but related namespace stuff has caused a lot of problems that we're remedying as shown by a recent flurry of activity started by my colleague Luke Warlow to fix MathML-Core support in various libraries/frameworks:
Who's next? :)