It looks like the code doesn't always check whether the expected errors in the test suite match the returned errors - which is rather important to ensure one isn't just incidentally getting the expected output.
So while JustHTML looks sort of right, it'll actually do things like emit errors on perfectly valid HTML.
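For anyone curious what that missing check amounts to, here's a minimal sketch assuming the html5lib-tests tokenizer fixture format (each case carries an optional "errors" list of {code, line, col} objects); the tokenize call and the error object's attributes are hypothetical stand-ins, not JustHTML's actual API:

    # Hypothetical harness helper -- parser.tokenize and the error
    # attributes are illustrative, not JustHTML's real interface.
    def check_case(parser, case):
        tokens, errors = parser.tokenize(case["input"])

        # Comparing the tokens alone is the easy half.
        assert tokens == case["output"]

        # The part that's easy to skip: reported errors must match the
        # expected ones, otherwise a parser that flags perfectly valid
        # input can still "pass" the fixture.
        expected = {(e["code"], e["line"], e["col"])
                    for e in case.get("errors", [])}
        actual = {(e.code, e.line, e.col) for e in errors}
        assert actual == expected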
Plus, the test suite isn't actually comprehensive, so if one only writes code to pass the tests, it can fail in the real world in cases where parsers that were actually written against the spec wouldn't have trouble.
For instance, html5lib-tests only exercises a small number of meta charsets, and as a result JustHTML can't handle a whole slew of valid HTML5 character encodings like windows-1250 or koi8-r - encodings that parsers like html5lib will happily handle. There's even a unit test added by the AI that ensures koi8-r doesn't work, for some reason.
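To make that concrete, here's a minimal check of the behaviour the spec asks for (it defers to the WHATWG Encoding Standard's label registry rather than a fixed allow-list); this assumes html5lib is installed and says nothing about JustHTML's internals:

    # Sketch: a conformant parser should honour encoding labels the
    # fixtures never exercise, e.g. koi8-r or windows-1250.
    import html5lib

    # "Привет" (Cyrillic) encoded as koi8-r, declared via <meta charset>.
    raw = '<meta charset="koi8-r"><p>Привет</p>'.encode("koi8-r")

    doc = html5lib.parse(raw, namespaceHTMLElements=False)
    assert doc.find(".//p").text == "Привет"  # label picked up by the prescan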
Behind that is a smaller number of larger integration tests, and the even longer-running regression tests that are run on every release but not on every commit.
Yes. They came from the existing project being ported, which was also AI-written.
Those human-written tests are why your browser properly renders all manner of messy HTML.
> They had 9000+ tests.
They were most probably also written by AI; there's no other (human) way. The way I see it, we're stacking turtles upon turtles and hoping that everything will somehow stick together.