looperhacks
HTML isn't XML. It's close, but it isn't. There's XHTML for that.
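
A quick illustration with Python's standard library (an illustrative sketch, not part of the original comment): `xml.etree.ElementTree` rejects a plain HTML `<br>`, because an XML parser has no way to know the element is empty and treats the following `</p>` as a mismatched end tag. The XHTML spelling `<br/>` parses fine. The helper name `is_well_formed_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Valid HTML: <br> is a void element and never has an end tag.
html_fragment = "<p>line one<br>line two</p>"
# The XHTML spelling: the empty element is self-closed, so it is well-formed XML.
xhtml_fragment = "<p>line one<br/>line two</p>"


def is_well_formed_xml(text: str) -> bool:
    """Return True if an XML parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed_xml(html_fragment))   # False: </p> closes while <br> is still open
print(is_well_formed_xml(xhtml_fragment))  # True
```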

eponeponepon
Just for the record - XML and HTML are both subsets of SGML, somewhat overlapping, but by no means coterminous with each other (at least until HTML 5 - I'm honestly not sure what its relationship to SGML is).

And, speaking from experience, the XML nay-sayers should largely be glad if they never had to deal with SGML :)

teddyh
HTML pretended to be a subset of SGML, but never really was, and the illusion quickly dispersed as time went on, since HTML was strictly pragmatic and ran in resource-constrained environments (the desktop), while SGML was academic, largely theoretical, and ran on servers, analyzing text.

XML, on the other hand, was more of a back-formation – a generalization of HTML; it was not, as I understand it, directly related to SGML in any way. The existence of XML was a reaction to SGML being impractical, so it would be strange if XML directly derived from SGML.

tannhaeuser
> XML [...] was not [...] directly related to SGML in any way

That's incorrect. XML is by definition a proper subset of WebSGML, the SGML revision specified in ISO 8879:1986 Annex K. These two specifications were published around the same time and authored by the same people.

In a nutshell, WebSGML added DTD-less SGML (so that every document can be parsed without markup declarations, unlike, e.g., HTML, which has `img` and other empty elements the parser needs to know about) and XML-style empty elements. The features removed from SGML to become XML were tag inference/omission (as used in HTML), short references (for things such as Wiki syntax, CSV, and even JSON parsing), uses of marked sections other than `CDATA`, more complex use cases for notations, and link process declarations ("stylesheets"), plus a couple of others.
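
To make the `img` point concrete, here is a toy sketch on top of Python's `html.parser`. The `NestingRecorder` class, the `record` helper, and the `void_elements` parameter are hypothetical; `void_elements` stands in for what markup declarations would tell a parser, and real SGML/HTML parsers are far more involved. Without knowing that `img` is empty, the recorder nests the following `p` inside it and ends with elements left open. With the XML-style `<img/>`, `html.parser` reports the tag through `handle_startendtag`, whose default implementation calls `handle_starttag` and then `handle_endtag`, so no outside knowledge is needed.

```python
from html.parser import HTMLParser


class NestingRecorder(HTMLParser):
    """Records (tag, depth) for every start tag.

    `void_elements` is a stand-in for the declarations (DTD) that tell a
    parser which elements never have an end tag.
    """

    def __init__(self, void_elements=()):
        super().__init__()
        self.void_elements = set(void_elements)
        self.stack = []   # currently open elements, innermost last
        self.events = []  # (tag, depth) in document order

    def handle_starttag(self, tag, attrs):
        self.events.append((tag, len(self.stack)))
        if tag not in self.void_elements:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Deliberately simple: pop only when the end tag matches the innermost element.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


def record(markup, void_elements=()):
    parser = NestingRecorder(void_elements)
    parser.feed(markup)
    parser.close()
    return parser.events, parser.stack


html = '<div><img src="a.png"><p>caption</p></div>'
xml_style = '<div><img src="a.png"/><p>caption</p></div>'

print(record(html, {"img"}))  # ([('div', 0), ('img', 1), ('p', 1)], [])
print(record(html))           # ([('div', 0), ('img', 1), ('p', 2)], ['div', 'img'])
print(record(xml_style))      # ([('div', 0), ('img', 1), ('p', 1)], [])
```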

XML was intended as a subset of SGML that can be meaningfully parsed without knowing the DTD of the document in question, which involves removing a lot of weird SGML features and constraining others. Formally, XML is not an SGML subset, as there are some unimportant and some quite critical incompatible details.

The main point of HTML5 is that it is not defined in terms of SGML but by its own grammar, which is in fact described by an imperative algorithm for parsing it (which also unambiguously specifies what should happen for notionally invalid inputs, AFAIK to the extent that for every byte stream there is exactly one resulting DOM tree).

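A rough sketch of the difference in error-handling philosophy, under loud assumptions: Python's standard library has no HTML5-conformant tree builder (the third-party html5lib implements the real algorithm), so `ForgivingTreeBuilder` and `build` below are hypothetical toys with one made-up recovery rule, loosely inspired by, and much simpler than, the spec's handling of unmatched end tags. The only point is that the builder always yields exactly one tree, while an XML parser treats the same input as a fatal error.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ForgivingTreeBuilder(HTMLParser):
    """Toy builder: every input produces exactly one tree of (tag, children).

    Made-up recovery rule (NOT the HTML5 algorithm): an end tag closes the
    nearest open element with that name, or is ignored if none is open.
    Text content is ignored to keep the sketch short.
    """

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]  # open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)  # attach to the current element
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Search open elements from innermost outward, never popping the root.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]  # implicitly close everything nested inside it
                return
        # No open element has this name: drop the stray end tag.


def build(markup):
    builder = ForgivingTreeBuilder()
    builder.feed(markup)
    builder.close()  # elements still open at EOF simply stay in the tree
    return builder.root


broken = "<div><p>one <b>two</p> three</span></div>"
print(build(broken))  # ('#root', [('div', [('p', [('b', [])])])])

try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print("XML parser stops:", err)
```

A real HTML5 parser applies many more rules (implied end tags, special elements, reconstruction of formatting elements), but the contract is the same: no input is rejected outright.
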
spiralx
http://sgmljs.net/docs/html5.html

HTML5 is almost a subset of SGML, barring some ambiguities in its table spec, HTML comments in script tags, and the `spellcheck` and `contenteditable` attributes.

3pt14159
I’m aware that HTML 5 isn’t XML, but I thought XHTML was XML and that browsers still have to support it because not everyone is on HTML 5.

Either way, my original draft included language around the distinction, but I felt I’d already written too much so I cut it.
