I’m halfway through writing an HTML parser and found that HTML5 explicitly defines the rules for parsing ill-formed HTML. (I used to infer them from DTDs, sigh.)
I love that fact, but I know well that HTML5 isn’t finalized yet (I also wonder if it ever will be) and that it is developed not by the W3C but by the WHATWG.
Searching for the spec I need I’m presented with:
- 8.2 section of the W3C TR
- 11.2 section of the WHATWG web-apps/current-work
If it weren’t for the section numbers I would assume the two are simply the same. But the different numbering makes me wonder. Which version is, supposedly, the most authoritative?
The WHATWG version seems to have more sections, and to have been extended since the W3C published its Candidate Recommendation.
Will W3C update to the WHATWG version?
Or will they stick to their current candidate until it gets to the official recommendation status?
Which html5 spec are we poor devils supposed to follow, when in doubt?
It depends on who you ask. Really. The politics of this are ugly. And to make matters worse, the specifications aren’t fully stable yet. I would have thought that the two specifications would be largely the same in their parsing sections, since section 1.1.1, which lists the differences, does not mention parsing. But then I ran a web diff and saw that there are subtle differences in the text. If you are actually implementing the specification, I would suggest raising any differences you see between the specs with the players involved, using the public mailing lists. Anyway, I am sorry I can’t give you a clear-cut answer.
Always choose WHATWG over W3C, no exceptions.
Anne van Kesteren, currently employed by Mozilla, describes the current situation between WHATWG and W3C as follows on his blog:
The W3C has forked the [WHATWG] HTML Standard for the nth time. As always, it is pretty disastrous:
- Erased all Git history of the document.
- Did not document how they transformed the document. Issues of mismatches have already been reported and it will likely be a long time, if ever, before all bugs due to this process are uncovered, since it was not open.
- Did not discuss plans with the wider community.
- Did not discuss plans with the folks they were forking from.
- Did not even discuss plans with the members of the W3C Web Platform Working Group.
- Erased the acknowledgments section.
- Erased the copyright and licensing information and replaced it with their own.
OK, I eventually came to my own conclusion and I’m going to share it.
I will follow the W3C version: blindly.
Politically speaking it’s not a simple decision. Let me explain.
I was extremely sceptical about the W3C, and I possibly even hated their guts during the whole XHTML debate/debacle. I saw the rise of the WHATWG as the arrival of our pragmatic saviours: people who openly admitted that HTML can’t be made into a stiff, rigorous XML-derived language while the whole internet cares next to nothing about that rigour.
So given this point of view I should go with the WHATWG spec, shouldn’t I?
WHATWG doesn’t establish official versions. I kind of wish they did, but they don’t.
They feel versions are too rigid for their…let’s say hip attitude.
Instead they have only a living standard
(and they track the implementation status of every single feature in the major browsers).
But I’m not a major browser, I’m a small implementer, and I cannot refer to a living standard.
Well, not unless I go crazy over it and release constantly, like there’s no tomorrow.
(that’s sort of what is happening with firefox and chrome)
So over never-ending frenetic madness, I have to choose sanity. And the W3C offers polished, numbered versions of the spec, so I can claim to conform to one of those versions.
Biased answer from an editor of WHATWG HTML here. Hopefully the facts can speak for themselves though.
The WHATWG Living Standard should be considered authoritative. It is constantly worked on by a large community of contributors, including all browser vendors. No browser vendors implement according to W3C HTML; for some such as Firefox and Chrome this is a matter of publicly stated policy.
The WHATWG Living Standard is constantly receiving bug fixes and new features. For more information on this model of spec development, which more closely matches modern software development practices, see What does “Living Standard” mean?.
Unfortunately, the W3C sometimes copies and pastes our work onto their own website, and puts their own logo on it, and changes the names of the editors, and such. They do this for a variety of reasons, one of the largest of which is face-saving for the sake of their paying member companies (example of them stating this). What’s worse, they like to release “versions” (like HTML “5.0”, “5.1”, etc.) which are just outdated versions missing modern bug fixes and features that clog up search result pages, causing confusion like this very question. We are currently tracking the confusion caused by these forks, of which HTML is only one.
You can track their progress on the copy-and-paste job in their issue tracker or in commits such as this one. It’s a fun game to spot the bugs they introduce while doing this copy-and-paste job, as they generally do not read or understand the content they are copying, leading to widespread errors and inconsistencies.
When in doubt, try to match the behavior of actual browsers. That’s all that actually matters.
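To make "match the behavior of actual browsers" a bit more concrete, here is a minimal sketch of one recovery rule on which the specs and browsers agree: a new `<p>` start tag implicitly closes a `<p>` that is still open. This is not the spec's tree-construction algorithm. It uses Python's standard `html.parser` purely as a tokenizer, and the `ParagraphRecovery` class and `normalize` function are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class ParagraphRecovery(HTMLParser):
    """Toy tree builder (illustrative only): it knows a single recovery
    rule, namely that a <p> start tag closes an already open <p>."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.out = []            # pieces of the re-serialized markup

    def handle_starttag(self, tag, attrs):
        # Simplified rule: an open <p> is ended before a new <p> starts.
        if tag == "p" and "p" in self.open_elements:
            self._close_through("p")
        self.open_elements.append(tag)
        self.out.append(f"<{tag}>")  # attributes are dropped in this sketch

    def handle_endtag(self, tag):
        # End tags with no matching open element are ignored.
        if tag in self.open_elements:
            self._close_through(tag)

    def handle_data(self, data):
        self.out.append(data)

    def _close_through(self, tag):
        # Pop and close open elements up to and including `tag`.
        while self.open_elements:
            name = self.open_elements.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def close(self):
        super().close()
        # At end of input, close whatever is still open.
        while self.open_elements:
            self.out.append(f"</{self.open_elements.pop()}>")


def normalize(markup):
    parser = ParagraphRecovery()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(normalize("<p>one<p>two"))  # <p>one</p><p>two</p>
```

A small implementer could keep a list of such input/output pairs, taken from what browsers actually produce, and run a parser against them regardless of which copy of the spec is being followed.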
In general, WHATWG is probably more current than W3C, though it may include more things that browsers don’t support (yet).
You can think of W3C as taking snapshots of WHATWG at given points in time, stabilizing them, and then hardening them, never to be changed.
- W3C HTML5 was finalized 28 October 2014.
- W3C HTML5.1 was finalized 1 November 2016.
- W3C HTML5.2 is currently in its “First Working Draft” and probably won’t be finalized until 2019.