Comment by guntars - Hacker Neue

guntars Apr 22, 2018

I sure hope you don’t just put random user provided blobs in your database, even if they’re validated. Also, how do you validate without parsing? If it’s parsed, might as well serialize again when saving to the DB.

Someone Apr 22, 2018

“If it’s parsed, might as well serialize again when saving to the DB”

You didn’t grow up in the 1980s, I guess :-)

Why spend cycles serializing again if you already have that string?

toast0 Apr 22, 2018

Because experience has shown us that today's parsers don't detect tomorrow's 0-day parsing bugs; but serializing a clean version of what was parsed is more likely to be safe (see lots of jpeg, mpeg, etc exploits)

Someone Apr 22, 2018

More likely, yes, but it need not help you here. Let’s say Chuck sends

  {"command":"feed", "command":"kill"}

Alice uses json parser #1. It keeps both “command” entries.

Alice next checks the “command” value against a whitelist. Her json library reads the first value, returning the benign “feed”.

Alice next serializes the parsed structure and sends it to Bob. The serializer she uses returns the exact string Chuck sent.

Bob, using a different json parser, parses the json. That parser drops the first “command”, so he gets the equivalent of

  {"command":"kill"}

Since Bob trusts Alice, he executes that command.

What would help here is if Alice generated a clean copy of what she thinks she received, and serialized that. For more complex APIs, that would mean she has to know the exact API that Bob expects, though. That may mean extra work keeping Alice’s knowledge of the ins and outs of the API up to date as Bob’s API evolves.
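
A minimal Python sketch of that scenario, under stated assumptions: `first_wins` is a hypothetical stand-in for Alice's parser (first duplicate key wins), `ALLOWED_COMMANDS` is an assumed whitelist, and Bob is played by Python's standard `json.loads`, which keeps the last duplicate key. It is only meant to show why forwarding the original string is risky while forwarding a freshly built object is not.

```python
import json

# The message Chuck sends, with a duplicated "command" key.
RAW = '{"command":"feed", "command":"kill"}'

# Assumed whitelist for illustration.
ALLOWED_COMMANDS = {"feed"}


def first_wins(pairs):
    """Hypothetical stand-in for Alice's parser: the first duplicate key wins."""
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)
    return result


def alice_forward(raw):
    """Validate the command, then forward a clean copy instead of the raw string."""
    parsed = json.loads(raw, object_pairs_hook=first_wins)
    command = parsed.get("command")
    if command not in ALLOWED_COMMANDS:
        raise ValueError("command not allowed: %r" % (command,))
    # Build a new object from the validated value only, so no duplicate
    # or unknown keys from the input can reach Bob.
    return json.dumps({"command": command})


# Bob parsing the raw string directly (last duplicate wins in Python's json).
bob_if_raw_forwarded = json.loads(RAW)["command"]

# Bob parsing the clean copy Alice built.
bob_if_clean_forwarded = json.loads(alice_forward(RAW))["command"]
```

With the raw string, Bob ends up with the second value; with the clean copy, both sides agree on the value Alice validated.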

hueving Apr 23, 2018

>What would help here is if Alice generated a clean copy of what she thinks she received, and serialized that.

I think you just suggested the same thing the OP did.

Someone Apr 23, 2018

Rereading it, you may be right. I read “what was parsed” as “as parsed by the json parser”, but chances are the OP meant “by the application layer”.

I didn’t read it that way because I don’t see that often in my job. Programs there typically know just enough about the format to do their job, and that job doesn’t include “watch out for external threats” (and they don’t all just use some common library that _does_ know the ins and outs of the format because they’re written in different languages. Also, we don’t generate libraries for each language (as would be common if the format were XML) because the json culture doesn’t think json schema is a good idea)

toast0 Apr 23, 2018

Actually, I hadn't meant that, although on reflection, it's the right thing to have meant. This stuff is hard!

mjevans Apr 23, 2018

Input validation, security, and message construction are all hard topics; particularly when combined.

krapp Apr 22, 2018

If it's parsed, why even store it in a database as JSON at all?

If you don't do that... then multiple possible JSON parsers aren't a problem.

Mikhail_Edoshin Apr 23, 2018

Use case: I sync local data with web API. I do not use all the data I receive, only a few bits, but if I modify them, I have to send a complete object back to the server with all the other data. The simplest way to do this is to store the original JSON.

The CardDAV and CalDAV are not JSON, but their specification also requires you to preserve the whole vCard if you ever want to send your changes back to the server. CardDAV data may be accessed by multiple apps and they are allowed to add their private properties; any app that deals with vCards must preserve all properties, including those it doesn't understand or use.
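
A rough Python sketch of that sync pattern, assuming a hypothetical stored object whose field names (`id`, `name`, `x-other-app`) are made up for illustration. The client changes only the field it understands and sends everything else back untouched.

```python
import json


def update_fields(original_json, changes):
    """Apply changes to a stored JSON object while keeping all other fields."""
    obj = json.loads(original_json)
    obj.update(changes)  # only the keys in `changes` are overwritten
    return json.dumps(obj)


# Hypothetical object as received from the server and stored locally.
stored = '{"id": 7, "name": "Ann", "x-other-app": {"color": "red"}}'

# The client only knows about "name"; the other fields ride along unchanged.
updated = update_fields(stored, {"name": "Anna"})
```

Note that this still round-trips through a parser, so anything the parser normalizes (duplicate keys, for example) will not survive byte for byte.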

couchand Apr 23, 2018

One common reason is to provide a flexible table for things that may not have an identical schema. For instance, an event log might have details about each event that differ based on the event type.
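
As a sketch of that kind of table, here is a small Python example using the standard `sqlite3` module. The table name, column names, and event payloads are all assumptions chosen for illustration; the per-event details are stored as JSON text because their shape differs by event type.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_log ("
    " id INTEGER PRIMARY KEY,"
    " event_type TEXT NOT NULL,"
    " details TEXT NOT NULL)"  # JSON text; schema varies per event type
)


def log_event(event_type, details):
    """Serialize the details dict ourselves rather than storing a user-supplied string."""
    conn.execute(
        "INSERT INTO event_log (event_type, details) VALUES (?, ?)",
        (event_type, json.dumps(details)),
    )


# Two hypothetical events with different detail shapes.
log_event("login", {"user": "alice", "ip": "203.0.113.5"})
log_event("upload", {"user": "bob", "bytes": 1024})

rows = [
    (event_type, json.loads(details))
    for event_type, details in conn.execute(
        "SELECT event_type, details FROM event_log ORDER BY id"
    )
]
```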
