Preferences

”If it’s parsed, might as well serialize again when saving to the DB”

You didn’t grow up in the 1980’s, I guess :-)

Why spend cycles serializing again if you already have that string?


toast0
Because experience has shown us that today's parsers don't detect tomorrow's 0-day parsing bugs; but serializing a clean version of what was parsed is more likely to be safe (see lots of jpeg, mpeg, etc exploits)
Someone OP
More likely, yes, but it need not help you here. Let’s say Chuck sends

  {“command”:”feed”, “command”:”kill”}
Alice uses json parser #1. It keeps both “command” entries.

Alice next checks the “command” value against a whitelist. Her json library reads the first value, returning the benign “feed”.

Alice next serializes the parsed structure and sends it to Bob. The serializer she uses returns the exact string Eve sent.

Bob, using a different json parser, parses the json. That parser drops the first “command”, so he gets the equivalent of

  {“command”:”kill”}
Since Bob trusts Alice, he executes that command.

What would help here is if Alice generated a clean copy of what she thinks she received, and serialized that. For more complex APIs, that would mean she has to know the exact API that Bob expects, though. That may mean extra work keeping Alice’s knowledge of the ins and outs of the API up to date als Bob’s API evolves.

hueving
>What would help here is if Alice generated a clean copy of what she thinks she received, and serialized that.

I think you just suggested the same thing the OP did.

Someone OP
Rereading it, you may be right. I read “what was parsed” as “as parsed by the json parser”, but chances are the OP meant “by the application layer”.

I didnkt read it that way because I don’t see that often in my job. Programs there typically know just enough about the format to do their job, and that job doesn’t include “watch out for external threats” (and they don’t all just use some common library that _does_ know the ins and outs of the format because they’re written in different languages. Also, we don’t generate libraries for each language (as would be common if the format were XML) because the json culture doesn’t think json schema is a good idea)

toast0
Actually, I hadn't meant that, although on reflection, it's the right thing to have meant. This stuff is hard!
mjevans
Input validation, security, and message construction are all hard topics; particularly when combined.

This item has no comments currently.