For example, if you validate your json in your web front-end (EDIT: I used the wrong term. What I meant here is the server-side process that’s in front of your database) and then pass the string received to your json-aware database, you’re likely using two json implementations that may have different ideas about what constitutes valid json.
For example, a caller might pass in a dictionary with duplicate key names, and the two parsers might each drop a different one, or one might see json where the other sees a comment.
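The duplicate-key case is easy to demonstrate within a single language; the payload below is made up for illustration. Python's standard `json` module happens to keep the *last* duplicate, while other parsers keep the first or reject the document outright, and `object_pairs_hook` gives you a way to refuse duplicates yourself:

```python
import json

payload = '{"command": "feed", "command": "kill"}'

# Python's json module silently keeps the *last* duplicate key.
# A parser that keeps the first, or rejects duplicates, disagrees.
print(json.loads(payload))  # {'command': 'kill'}

# object_pairs_hook sees every key/value pair before the dict is
# built, so duplicates can be rejected explicitly:
def no_duplicates(pairs):
    d = {}
    for key, value in pairs:
        if key in d:
            raise ValueError(f"duplicate key: {key!r}")
        d[key] = value
    return d

try:
    json.loads(payload, object_pairs_hook=no_duplicates)
except ValueError as e:
    print(e)  # duplicate key: 'command'
```

Rejecting duplicates at the boundary is usually safer than picking a winner, since RFC 8259 only says object member names "SHOULD" be unique and leaves the behavior otherwise unspecified.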
By representing fields with enums or proper types you also get constraints on values, e.g. if a value is really an integer field, then your type can declare it as Int and deserialization will either coerce the value into that shape or throw an error, so you don't end up with indeterminate or nonsense values.
This can be even more important for UUIDs, Dates, and other extremely common types that have no native JSON representation, nor even any agreed-upon consensus around them.
You get less help from the language with dynamic languages like Python but you can certainly accomplish the same thing with some minimal extra work. Or perhaps it would be more accurate to say languages like Python offer easy shortcuts that you shouldn't take.
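As a sketch of what that "minimal extra work" might look like in Python: the `Event` schema and the `event_from_json` helper below are invented for illustration, but the technique is just what the comments above describe, declaring concrete types (Int, UUID, datetime) and forcing each field through its constructor so malformed values fail loudly at the boundary:

```python
import json
import uuid
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Event:
    id: uuid.UUID        # no native JSON type; carried as a string
    count: int
    created: datetime    # likewise, ISO 8601 by convention only

def event_from_json(raw: str) -> Event:
    obj = json.loads(raw)
    return Event(
        # Each constructor raises on malformed input instead of
        # letting a nonsense value flow into the rest of the program.
        id=uuid.UUID(obj["id"]),
        count=int(obj["count"]),
        created=datetime.fromisoformat(obj["created"]),
    )

event = event_from_json(
    '{"id": "12345678-1234-5678-1234-567812345678",'
    ' "count": "7", "created": "2016-10-26T12:00:00"}'
)
```

Note that Python won't enforce the dataclass annotations at runtime by itself; the explicit constructor calls in `event_from_json` are what do the actual smashing-into-shape.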
In any case, I highly recommend this technique for enforcing basic sanitization of data. Another is fuzzing (AFL or libFuzzer).
You didn’t grow up in the 1980’s, I guess :-)
Why spend cycles serializing again if you already have that string?
{"command": "feed", "command": "kill"}
Alice uses json parser #1. It keeps both "command" entries. Alice next checks the "command" value against a whitelist. Her json library reads the first value, returning the benign "feed".
Alice next serializes the parsed structure and sends it to Bob. The serializer she uses returns the exact string Eve sent.
Bob, using a different json parser, parses the json. That parser drops the first "command", so he gets the equivalent of
{"command": "kill"}
Since Bob trusts Alice, he executes that command. What would help here is if Alice generated a clean copy of what she thinks she received, and serialized that. For more complex APIs, that would mean she has to know the exact API Bob expects, though. That may mean extra work keeping Alice's knowledge of the ins and outs of the API up to date as Bob's API evolves.
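A minimal sketch of that "clean copy" idea, assuming a toy one-command API like the feed/kill example above: validate the parsed value, then serialize a fresh structure rather than forwarding Eve's original bytes.

```python
import json

ALLOWED_COMMANDS = {"feed"}

def sanitize(raw: str) -> str:
    # Parse, validate, and rebuild; never forward the raw string.
    command = json.loads(raw)["command"]
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {command!r}")
    # A freshly serialized dict cannot contain duplicate keys, so
    # Bob's parser sees exactly what Alice validated.
    return json.dumps({"command": command})

# Eve's duplicate-key payload is rejected outright here, because
# Python's parser happens to keep the last duplicate ("kill"). A
# first-wins parser would instead forward a clean, unambiguous
# {"command": "feed"} that Bob parses the same way Alice did.
```

Either way the ambiguity never reaches Bob, which is the point: the attack depends on Alice forwarding bytes she didn't fully interpret.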
If you don't do that... then multiple possible JSON parsers aren't a problem.
CardDAV and CalDAV are not JSON-based, but their specifications likewise require you to preserve the whole vCard if you ever want to send your changes back to the server. CardDAV data may be accessed by multiple apps, and they are allowed to add their own private properties; any app that deals with vCards must preserve all properties, including those it doesn't understand or use.
If you are creating a JSON response from your own API, you control the JSON output.
Unless you are crafting JSON from scratch I doubt anyone runs into the issues mentioned in the OP.
I thought JSON was specified quite clearly.
There are no limits on things like the number of consecutive digits in numbers, but I don't consider that a weakness of the standard.
Most of the tests that I see do pass completely invalid JSON.
The summary table suggests that a real bug was found in about half of the parsers tested, and even a few of those bugs belong to a category one might choose to ignore: almost any non-trivial function can be made to run out of memory, and many functions will crash ungracefully when that happens rather than correctly free all the memory they allocated and return an error code, a code the caller probably wouldn't handle correctly anyway, since that path was never tested.