Accepting invalid, ambiguous, or undefined JSON is not acceptable behavior. It means bugs get silently swallowed and you can't reliably round-trip data.
Just to make it explicit (and without inserting any personal judgement into the conversation myself): JSON parsers should reject things like trailing commas after final array elements, because accepting them would encourage people to emit trailing commas?
Having asked the question (and now explicitly freeing myself to talk values) it's new to me -- a solid and rare objection to the Robustness Principle. Maybe common enough in these sorts of discussions, though? Anyway, partial as I might be to trailing commas, I do quite like the "JSON's universality shall not be compromised" argument.
Accepting trailing commas in JSON isn't as big a deal as having two different opinions about what a valid document is. But you might think a trailing comma could indicate a hand-edited document that's missing an important field or array element.
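For what it's worth, strict parsers already behave this way. Python's standard `json` module (used here purely as an illustration of a spec-conformant parser) refuses the trailing comma outright rather than guessing what the author meant:

```python
import json

# A strict parser rejects the trailing comma with a parse error,
# instead of silently accepting a possibly hand-mangled document.
try:
    json.loads('[1, 2, 3,]')
except json.JSONDecodeError as e:
    print("rejected:", e)
```

The error points at the position after the comma, where a value was required, which is exactly the "missing array element" signal described above.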
If you put out JSON that gets misparsed, either you generated invalid JSON or the parser is faulty. No way around that.
This has nothing to do with whether parsers have flexibility to accept additional constructs, which is extremely common for a parser to do.
Sorry for the misquote, but does it get to the heart of your objection?
I'm torn here. On the one hand I want to say "Those are not languages one typically writes parsers in," but that's a really muddled argument:
1. People "parse" things often in bash/awk because they have to -- because bash etc deal in unstructured data.
2. Maybe "reasonable" languages should be trivially parseable so we can do it in Bash (etc).
I'm kinda torn. On the one hand bash is unreasonably effective, on the other I want data types to be opaque so people don't even try to parse them... would love to hear arguments though.
If you want to deal with JSON, I'd recommend jq as an easy manipulation tool.
It won't, because JSON is a standard. Imperfect like all standards but practically good enough. And "plain text" just means "an undefined syntax that I have to mostly guess". And nobody "programs" in bash or awk anymore. The "standard scripting languages" for all sane devs are Python or Ruby (and some Perl legacy) and parsing JSON in them is trivial.
The "UNIX philosophy" was a cancerous bad idea anyway and now it's thankfully being eaten alive by its own children, so time to... rejoice?!
EDIT+: Also, if I feel truly lazy/evil (as in "something I'll only use in this JS project"), I would use something much, much less standard than JSON, like JSON5 (https://github.com/json5/json5), which will practically force all consumers to use JS :P
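As a concrete illustration of "trivial": in Python (nothing beyond the standard-library `json` module) the whole parse-and-round-trip is a couple of calls:

```python
import json

doc = '{"name": "example", "tags": ["a", "b"], "count": 3}'

data = json.loads(doc)                        # parse: str -> dicts/lists/scalars
assert data["count"] == 3

round_tripped = json.loads(json.dumps(data))  # serialize, then parse again
assert round_tripped == data                  # lossless for standard JSON types
```

Ruby and Perl ship equivalents (`JSON.parse`, `JSON::XS`), which is the point: with a real standard, "parsing" stops being guesswork.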
I have, however, written a JSON parser in C++ at a previous company (they didn't want to pull in a library). It wasn't particularly hard.
And yes, like any other parser, it accepted additional non-JSON constructs. This was simply because it would have taken additional work to error out on those constructs, which would have been a waste of time.
It doesn't matter how sensible a format is, those tools are simply not appropriate to write a parser in.
AWK is a language designed for data parsing and processing. That is precisely what it is for.
How did you solve the parsing of arbitrarily nested structures?
Write a recursive descent parser, or use a parser generator.
Or, more realistically, use one of the many libraries available for parsing it (pretty much every language has one at this point).
In the large colored matrix, the following colors mean everything is fine: green, yellow, light blue, and deep blue.
Red are crashes (things like 10,000 nested arrays causing a stack overflow -- a non-JSON-specific parser bug), and dark brown are constructs that should have been supported but weren't (things like UTF-8 handling -- again, non-JSON-specific parser bugs).
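That stack-overflow class of bug is easy to reproduce: any recursive parser without a depth limit dies on deep nesting. Current CPython's `json` module guards against it by raising a catchable `RecursionError` instead of crashing the process:

```python
import json

deep = "[" * 10000 + "]" * 10000   # 10,000 nested arrays, matching the test case
try:
    json.loads(deep)
except RecursionError:
    print("hit the recursion limit instead of crashing the process")
```

The red cells in the matrix are parsers that segfault or abort on this input rather than reporting a recoverable error.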
Writing parsers can be tricky, but JSON is certainly not a hard format to parse.