* The data does not map well to database tables, e.g. tree structures (of course a tree can be represented as many table rows too, but it's complicated and may be slower when you always need to operate on the whole tree anyway)
* Your programming language has better types and programming facilities than SQL offers; for example, in our Haskell+TypeScript code base we can conveniently serialise large nested data structures with 100s of types into JSON, without having to think about how to represent those trees as tables (see the sketch below).
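A minimal TypeScript sketch of the point, with invented type names:

```typescript
// A recursive type: natural in the language, awkward as table rows.
type Category = { name: string; children: Category[] };

const tree: Category = {
  name: "root",
  children: [
    { name: "clothing", children: [{ name: "dresses", children: [] }] },
    { name: "vehicles", children: [] },
  ],
};

// One line into a JSON column, one line back out -- versus an
// adjacency-list table plus recursive queries to rebuild the tree.
const stored: string = JSON.stringify(tree);
const restored: Category = JSON.parse(stored);
```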
I find this one of the hardest parts of using JSON, and the main reason why I'd rather put it in proper columns. Once I go JSON, I need a fair bit of code to deal with migrations (either running them as schema migrations, or some way to do them at read/write time).
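One common shape for the read/write-time variant is to version the blob and upgrade it lazily. A sketch with hypothetical names, not anyone's actual code:

```typescript
// Versioned JSON blobs, migrated at read time; all names are hypothetical.
type SettingsV1 = { version: 1; theme: string };
type SettingsV2 = { version: 2; theme: { name: string; dark: boolean } };
type AnySettings = SettingsV1 | SettingsV2;

// Each case lifts an old shape to the current one; writes always
// store the latest version, so old shapes die out over time.
function upgrade(s: AnySettings): SettingsV2 {
  switch (s.version) {
    case 1:
      return { version: 2, theme: { name: s.theme, dark: false } };
    case 2:
      return s;
  }
}

const raw = '{"version":1,"theme":"solarized"}'; // straight from the JSON column
const settings = upgrade(JSON.parse(raw) as AnySettings);
```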
Another example is a classifieds website, where your extra details for a Dress are going to be quite a bit different than the details for a Car or Watch. But, again, you don't necessarily want to inflate the table structure for a fully normalized flow.
If you're using a concretely typed service language it can help. C# does a decent job here. But even then, mixing in Zod with Hono and OpenAPI isn't exactly difficult on the JS/TS front.
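For instance, a Zod discriminated union can type the per-category details from the classifieds example above without a table per category; the field names here are invented:

```typescript
import { z } from "zod";

// One JSON "details" column, discriminated by listing kind.
// All field names are invented for illustration.
const Details = z.discriminatedUnion("kind", [
  z.object({ kind: z.literal("dress"), size: z.string(), fabric: z.string() }),
  z.object({ kind: z.literal("car"), mileageKm: z.number(), year: z.number() }),
  z.object({ kind: z.literal("watch"), movement: z.string() }),
]);

type Details = z.infer<typeof Details>;

// Validate at the API boundary before the blob hits the JSON column.
const details: Details = Details.parse({ kind: "car", mileageKm: 120000, year: 2014 });
```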
Trying to map everything in a relational way etc. - you're in a world of pain.
But the more complex it is, the more complex the relational representation becomes. JSON responses from some APIs could easily require 8 new tables to store the data, with lots of arbitrary new primary keys and lots of foreign key constraints, and your queries will be full of JOINs that need proper indexing set up...
Oftentimes it's just not worth it, especially if your queries are relatively simple, but you still need to store the full JSON in case you need the data in the future.
Obviously storing JSON in a relational database feels a bit like a Frankenstein monster. But at the end of the day, it's really just about what's simplest to maintain and provides the necessary performance.
And the whole point of the article is how easy it is to set up indexes on JSON.
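For a concrete picture (the article's own examples will differ), here is roughly what an expression index on a JSON path looks like in SQLite, driven from TypeScript via better-sqlite3; the table and JSON paths are invented:

```typescript
import Database from "better-sqlite3";

const db = new Database(":memory:");

// A plain table with a JSON payload column; schema invented for illustration.
db.exec(`
  CREATE TABLE responses (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL CHECK (json_valid(payload))
  );
  -- Expression index on one JSON path: filters on this path can use it.
  CREATE INDEX idx_responses_user
    ON responses (json_extract(payload, '$.user.id'));
`);

db.prepare("INSERT INTO responses (payload) VALUES (?)").run(
  JSON.stringify({ user: { id: 42 }, items: [1, 2, 3] })
);

// No JOINs: one indexed lookup straight into the blob.
const row = db
  .prepare("SELECT payload FROM responses WHERE json_extract(payload, '$.user.id') = ?")
  .get(42);
```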
Typical example is a price-setting product I work on: there are price ranges that are universal (and DB columns reflect that part), but customers all have weird custom requests for pricing, like rebates on the 3rd weekend after X-mas (but only if the customer is related to Uncle Rudolph who picks his nose).
There's no reason to put all those extra fields in the same table that contains the universal pricing information.
I'll give a comparison.
JSON
- We have some frontend logic/views (feature-flagged per customer) to manage updating the data, which otherwise mostly tags along as a dumb "blob" (auto-expanded into regular JSON objects/arrays at the API boundary, making frontend work easier: objects on the frontend, "blobs" in the backend/DB)
- Inspecting specific cases (most of the time it's just null data) is just copying out and formatting the special data.
- If push comes to shove, all modern databases support JSON queries, so you can pick out specifics IF needed (this has happened once or twice with larger customers over the years).
- We read and apply the rules when calculating prices with a "plugin system" (sketched in the code after this comparison)
DB Schema (extra tables)
- Now you have to wade through lots of customer-specific tables just to find the tables that take most of the work-time (customer specifics are seldom what needs work once set up). We already have some older customer-specific stuff from the early days (I'm happy it hasn't happened much lately).
- Those _very_ few times you actually need to inspect the specific data by query, you might win here (but as mentioned above, JSON queries have always solved it).
- Loading the universal info now requires querying X extra tables (even when 90-95% of the data has no special cases).
- Adding new operations on prices (copying, etc.) now needs logic for each customer-specific table to make the data tag along properly.
- "properly" modelled this reaches the API layer as well
- Frontend specialization is still needed
- Calculating prices still needs its customizations.
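To make the "plugin system" bullet above concrete, here's a hedged TypeScript sketch; every name in it (RulePlugin, percentRebate, and so on) is invented for illustration, not the actual product code:

```typescript
// Customer-specific pricing rules live in a JSON column; each rule's
// "type" field selects a plugin. All names are invented for illustration.
type PricingContext = { basePrice: number; date: Date; customerId: string };
type RuleJson = { type: string; [key: string]: unknown };
type RulePlugin = (price: number, rule: RuleJson, ctx: PricingContext) => number;

const plugins: Record<string, RulePlugin> = {
  // The "3rd weekend after X-mas" style special cases stay out of core code.
  percentRebate: (price, rule) => price * (1 - (rule.percent as number) / 100),
  flatSurcharge: (price, rule) => price + (rule.amount as number),
};

const identity: RulePlugin = (price) => price;

function applyRules(ctx: PricingContext, rulesBlob: string | null): number {
  if (!rulesBlob) return ctx.basePrice; // the 90-95% case: no special rules
  const rules: RuleJson[] = JSON.parse(rulesBlob);
  return rules.reduce(
    (price, rule) => (plugins[rule.type] ?? identity)(price, rule, ctx),
    ctx.basePrice
  );
}

// Universal pricing stays in plain columns; the blob only carries exceptions.
const price = applyRules(
  { basePrice: 100, date: new Date(), customerId: "acme" },
  '[{"type":"percentRebate","percent":10}]'
);
```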
I don't really see how my life would have been better managing all the extra side-effects of bending the code to suit these weird customer requests (some from clients that aren't customers anymore) when 90-95% of the time it isn't used, and it's seldom touched with mature customers.
I do believe in the rule of 3: if the same thing pops up three times, I consider whether it needs to be graduated to more "systematic" code. Often, when you abstract after seeing something only twice, it never appears again, leaving you with an abstraction to maintain.
JSON columns, like entity-attribute-value tables or goto statements, all have real downsides and shouldn't be plonked in without a reason, but hell if I'd want to work with overly complex schemas/models because people started putting special cases into core pieces of code just because they heard a technique was bad.
It's much harder to set up proper indexes and enforce constraints, and it adds overhead every time you actually want to use the data.