Comment by Animats - Hacker Neue

Animats Jul 3, 2015

I've argued in the past for an intermediate position, especially for HTML. Browsers should be moderately tolerant of bad HTML. But rather than trying to handle errors invisibly, they should fall back to a simplified rendering mode intended to get the content across without the decorative effects. After the first error, a browser might stop processing further JavaScript, display a red band indicating defective HTML, and display all text in the default font. It might also report the error to the server in some way.

Read through the error-recovery specification for HTML5. It's many pages of defined tolerance for old bugs. Then read the charset-guessing specification for HTML5, which is wildly ambiguous. (Statistical analysis of the document to guess the charset is suggested.) The spec should have mandated a charset parameter in the header a decade ago. If there's no charset specification, documents should render in ASCII with hex for values > 127.
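
To make that last fallback concrete, here is a minimal sketch of what "ASCII with hex for values > 127" could look like. The function name `render_without_charset` and the `\xNN` escape style are assumptions chosen for illustration; no browser or spec defines them.

```python
def render_without_charset(raw: bytes) -> str:
    """Render bytes with no declared charset (hypothetical helper).

    Bytes 0-127 are shown as their ASCII characters; every byte above
    127 is shown as a visible hex escape instead of being guessed at.
    """
    return "".join(
        chr(b) if b < 128 else f"\\x{b:02X}"
        for b in raw
    )


# UTF-8 "café" without a charset declaration: the two bytes of "é" show up as hex.
print(render_without_charset("café".encode("utf-8")))
```

The text stays readable for mostly-ASCII documents, and the missing declaration is impossible to overlook.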

dagw Jul 3, 2015

You've got two browsers to choose from: one that handles every site you visit without a problem, and one that throws a bunch of obscure error messages on about 20% of the sites you visit.

Which do you think most people will choose?

femto113 Jul 3, 2015

I think this would have been great if done from the beginning, but even in early versions of Mosaic, malformed HTML would still appear "correct" visually. Since most people figured that once it looked ok it was ok, we've been buried under broken HTML from the beginning. The idea that the browser "handles every site without a problem" is slightly misleading, though: even if everything looks ok, the user is paying a price of lower performance and a slower pace of innovation as browser developers devote huge amounts of time, money, and attention to not puking on all that broken HTML.

comex Jul 3, 2015

To a large degree this is it. Nobody bats an eye if a misplaced quote somewhere in a Python program causes the whole program to fail to start, but XHTML breaking pages on syntax errors was considered a terrible idea because the old way worked fine(tm).

However, Python source code is not typically dynamically generated, while HTML is, increasing the probability of errors the site author could not trivially predict and the user can do nothing about.

tel Jul 3, 2015

Yeah, shouldn't apply the punishment to the user. It should do a best effort render and then DDOS the site. /s

Animats OP Jul 3, 2015

Good idea. Another version: if there are any errors in the HTML, the browser blocks all ads and trackers. Bad HTML would be fixed so fast...

bcoates Jul 3, 2015

Unfortunately, software of all sorts has a pathological enthusiasm for adding defaulted, wrong metadata to everything. (Look into medical charting and drug-dispensing software sometime if you're looking for a cheap scare.)

Character-set and language tags are useless in practice; even the dumbest heuristics defeat them. Statistical analysis is so effective that encoding metadata should be forbidden, not required.
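
As a rough illustration of how far a very dumb heuristic gets, here is a minimal sketch that ignores metadata entirely. It is not the statistical analysis mentioned above, just a validity check: the name `guess_charset` and the choice of windows-1252 as the fallback are assumptions for the example.

```python
def guess_charset(raw: bytes) -> str:
    """Guess an encoding from the bytes alone (hypothetical helper)."""
    # A UTF-8 byte order mark is an unambiguous signal.
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    try:
        # Text in a legacy 8-bit encoding rarely happens to be valid UTF-8,
        # so a clean strict decode is strong evidence. Pure ASCII lands here too.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback: treat everything else as windows-1252.
        return "windows-1252"


page = "naïve café".encode("cp1252")
print(guess_charset(page), page.decode(guess_charset(page)))
```

A real detector would go further and score byte frequencies to tell the legacy 8-bit encodings apart, which is where the statistical analysis comes in.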
