There is something very nice and expressive about the existing JSON types. Just 6 types (null, boolean, string, number, array, and dictionary) are enough to cover a ton of use cases, and as you suggest, one can always fall back to "stringly typed" alternatives by implementing one's own serialization and deserialization for extra types.
CBOR features are almost one-to-one with JSON, except that the encoding is more size-efficient, it supports a few additional types (e.g., integers and floats are separate), and it allows semantic tags.
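For a sense of the size difference: here's `{"a": 1}` hand-assembled as CBOR per RFC 8949 (no library needed), next to its compact JSON form. The byte values come straight from the spec's major-type encoding.

```python
import json

# CBOR encoding of {"a": 1}, hand-assembled per RFC 8949:
#   0xA1        map with 1 pair   (major type 5)
#   0x61 0x61   text string "a"   (major type 3, length 1)
#   0x01        unsigned int 1    (major type 0)
cbor = bytes([0xA1, 0x61, 0x61, 0x01])
as_json = json.dumps({"a": 1}, separators=(",", ":")).encode()  # b'{"a":1}'

len(cbor)     # 4 bytes
len(as_json)  # 7 bytes
```

Even on a trivial document CBOR saves a few bytes, and the gap widens with integers and binary payloads that JSON has to spell out in decimal digits or base64.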
However, some of the things I mentioned above do have benefits for interoperability with JSON, even though they aren't good for general-purpose use; I think it would generally be better to design a good format than to work only within the bad ideas of other specifications. (Fortunately, I think what I described above could be implemented using a subset of CBOR.)
However, using these formats (whether CBOR or JSON) is often more complicated than it needs to be for any specific use anyway.
For configuration formats, I 100% agree with you. I do not want any data type except a string and a hashmap (maybe an array if you're being luxurious). Not an int, not a float, not a boolean, not a datetime (looking at you, TOML). For configuration formats I am always immediately feeding those files into a language with a richer type system that will actually parse them; my program and its embedded types are the schema. (Users of dynamically-typed languages may reasonably disagree.)
However, for the serialization use case, I'm not so sure. There's an argument that having a schema against which to do lightweight validation at several points in the pipeline isn't the worst idea, and built-in primitives get you halfway to a half-decent schema. I'm ambivalent at worst.
They are not. Configuration is a very tiny subset of a more general problem that you also mention: serialization.
Your config file will be deserialized by your program and parsed into specific types, including numbers (tons of edge cases), dates (tons of edge cases), strings (tons of edge cases), etc.
It becomes worse when your program is used by more people than just you: which field is a date? In which format? Do you handle floats? What precision? What's the decimal separator? Do you do string normalization? What are valid and invalid characters, if any?
You can't pretend that your config is "just strings". They are not.
You can't build a generic schema validator that will accept exactly the valid configs for some program and nothing else anyway, so forget the half-assed type checking attempts and just provide the hierarchical structure. It's up to the application to define the valid grammar and semantics of each config option and parse it into an application-specific type.
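A minimal sketch of that division of labor, with hypothetical option names and parsers: every value crosses the boundary as a string, and the application parses each one into its own type under its own rules.

```python
# Sketch: the application, not the format, owns typing.
# All config values arrive as strings; each option declares its own parser.
from datetime import date

def parse_port(s: str) -> int:
    port = int(s, 10)          # base 10 only: "010" is ten, never octal
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {s}")
    return port

def parse_start(s: str) -> date:
    return date.fromisoformat(s)   # pin one date format instead of guessing

# What a strings-and-hashmaps config file hands us:
raw = {"port": "8080", "start": "2024-01-15"}

config = {
    "port": parse_port(raw["port"]),
    "start": parse_start(raw["start"]),
}
```

The format stays trivial; the grammar and semantics of each option live next to the code that actually uses it, where the edge cases can be handled deliberately.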
Human input is full of tradeoffs; that’s why it’s bash and not TypeScript in your shell path column. And you’ll meet great resistance from users if you make your config fully typed and require them to refer to a schema, DTD, namespace, or whatever other BS XML had.
Bash is there purely for historical reasons. And it sucks.
> And you’ll meet great resistance from users if you make your config fully typed and require them to refer to a schema, DTD, namespace, or whatever other BS XML had.
That schema can and will help editors validate and autocomplete things on the fly, and it can also serve as a reference for what data the config actually accepts.
But different languages interpret different strings in different ways by default.
This leads to major bugs.
One of the great strengths of JSON is that parsing a number is well-defined.
The way you're suggesting would lead to people emitting JSON with leading zeros sometimes, and then some languages end up interpreting certain numbers as octal.
No thank you.
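Python's stdlib illustrates the contrast: its `json` module enforces the JSON number grammar, while bare string-to-int parsing rules are a per-language, per-base choice.

```python
import json

json.loads("10")        # 10: a valid JSON number

try:
    json.loads("010")   # leading zero: rejected by the JSON grammar
except json.JSONDecodeError:
    pass                # every conforming parser refuses this

int("010", 10)  # 10 under Python's explicit base-10 parse...
# ...but a C parser using strtol(s, NULL, 0), or pre-ES5 JavaScript
# parseInt, would read "010" as octal 8. JSON's grammar sidesteps
# the ambiguity by forbidding leading zeros entirely.
```

With "everything is a string", each consumer picks its own rules for that string, and the octal trap comes back.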
JSON numbers are far more restrictive than strings and carry precisely defined meaning in a way that arbitrary strings don't. They're only "just certain strings" in the same way anything can be serialized to a string, which doesn't really mean anything.
What does jq do to them?
    echo 1.4e99999999999999 | jq
    1.7976931348623157e+308
While I agree that the meaning of JSON numbers exists, I'm not sure which JSON standard you're referring to that contains this meaning. json.org certainly does not contain it, and it links to ECMA-404, which just says "JSON is agnostic about the semantics of numbers."

I've never gone back to formalize the grammar or otherwise mature it. But it's served me well as-is, and it's been easy to convert "up" to JSON or YAML or XML or what-have-you once the case for an interface beyond plain text proves worthwhile.
It would be much simpler if all primitives were strings, and it'd probably save a few people from accidentally doing the wrong thing while dealing with prices.
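The price pitfall is concrete. In Python, for instance, JSON numbers land as binary floats unless you intercept them; a sketch using the stdlib's `parse_float` hook (a Python convenience, not something the format itself offers):

```python
import json
from decimal import Decimal

doc = '{"price": 0.1, "qty": 3}'

# Default: JSON numbers become binary floats, and money drifts.
as_float = json.loads(doc)
as_float["price"] * as_float["qty"]   # 0.30000000000000004, not 0.3

# Routing the number's source text into a decimal type keeps it exact:
as_dec = json.loads(doc, parse_float=Decimal)
as_dec["price"] * as_dec["qty"]       # Decimal('0.3')
```

If the price had been a plain string in the first place, the deserializer could never have silently handed you a lossy float; you'd have to parse it deliberately.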