Comment by kibwen - Hacker Neue

kibwen Jun 2, 2024 parent

Let's make a distinction here between serialization formats and configuration formats. Because JSON is often used for both, these two use cases often get conflated.

For configuration formats, I 100% agree with you. I do not want any data type except a string and a hashmap (maybe an array if you're being luxurious). Not an int, not a float, not a boolean, not a datetime (looking at you, TOML). For configuration formats I am always immediately feeding those files into a language with a richer type system that will actually parse them; my program and its embedded types are the schema. (Users of dynamically-typed languages may reasonably disagree.)
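As a rough illustration of "my program and its embedded types are the schema", here is a minimal Python sketch. The `ServerConfig` fields and the `load_config` helper are hypothetical names invented for this example, not taken from any real program: the config file yields only strings, and the program decides what each one must be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    # Hypothetical settings, invented for this illustration.
    host: str
    port: int
    debug: bool


def load_config(raw: dict[str, str]) -> ServerConfig:
    """Turn a string-only mapping into typed settings, failing loudly."""
    port = int(raw["port"])  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")

    # The program, not the file format, decides what counts as a boolean.
    debug_text = raw.get("debug", "false").strip().lower()
    if debug_text not in ("true", "false"):
        raise ValueError(f"debug must be true or false, got {debug_text!r}")

    return ServerConfig(host=raw["host"], port=port, debug=debug_text == "true")


config = load_config({"host": "localhost", "port": "8080"})
```

Here the file format never needs an int or a boolean type of its own; a bad value is rejected at load time by the program that owns the meaning.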

However, for the serialization use case, I'm not so sure. There's an argument that having a schema against which to do lightweight validation at several points in the pipeline isn't the worst idea, and built-in primitives get you halfway to a half-decent schema. I'm ambivalent at worst.

troupo Jun 2, 2024

> my program and its embedded types are the schema.

They are not. Configuration is a very tiny subset of a more general problem that you also mention: serialization.

Your config file will be de-serialized by your program and parsed into some specific types. Including numbers (tons of edge cases), dates (tons of edge cases), strings (tons of edge cases) etc.

It becomes worse when your program is used by more people than just you: which field is a date? In which format? Do you handle floats? What precision? What's the decimal separator? Do you do string normalization? What are valid and invalid characters, if any?

You can't pretend that your config is "just strings". It is not.
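To make a couple of those questions concrete, here is a small Python sketch. The sample values and the `parse_decimal` helper are made up for illustration. The same date string parses to two different days depending on the format the program assumes, and a comma decimal separator is rejected unless the program explicitly allows it.

```python
from datetime import datetime

# One string, two equally plausible readings.
raw_date = "03/04/2024"
as_day_first = datetime.strptime(raw_date, "%d/%m/%Y").date()
as_month_first = datetime.strptime(raw_date, "%m/%d/%Y").date()


def parse_decimal(text: str, decimal_separator: str = ".") -> float:
    """Parse a decimal number using an explicitly chosen separator."""
    if decimal_separator != ".":
        if "." in text:
            raise ValueError(f"unexpected '.' in {text!r}")
        text = text.replace(decimal_separator, ".")
    return float(text)  # raises ValueError on anything float() cannot read


print(as_day_first, as_month_first)
print(parse_decimal("1,5", decimal_separator=","))
```

Neither reading of the date is wrong in itself; the point is that someone has to document which one the program expects.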

mike_hock Jun 2, 2024

I kind of took away the opposite from the parent post. Of course, your config isn't just strings, but it also isn't just a limited set of primitive types that the inventor of some one-size-fits-all configuration language envisioned.

You can't build a generic schema validator that will accept exactly the valid configs for some program and nothing else anyway, so forget the half-assed type checking attempts and just provide the hierarchical structure. It's up to the application to define the valid grammar and semantics of each config option and parse it into an application-specific type.
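A hedged sketch of that idea in Python: a hypothetical `timeout` option (the name, the unit suffixes, and the `parse_timeout` helper are all assumptions made for this example) that a generic schema could only describe as "string", while the application defines its grammar and parses it into a type of its own choosing.

```python
import re
from datetime import timedelta

# Application-defined grammar: an integer followed by a unit suffix.
_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours"}
_PATTERN = re.compile(r"(\d+)(ms|s|m|h)")


def parse_timeout(text: str) -> timedelta:
    """Parse values such as '250ms', '30s', or '5m' into a timedelta."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid timeout {text!r}; expected e.g. '30s'")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


timeout = parse_timeout("30s")
```

The format only has to deliver the string; whether "30s" or "soon" is valid is a question only the application can answer.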

troupo Jun 3, 2024

That's why every time I run into a program-specific config I curse the developer because there's no way of knowing what exactly a particular program (or a framework) needs :)

wruza Jun 2, 2024

But most configs are just strings and it’s okay. How does it get so bad just in this thread?

Human input is full of tradeoffs, that’s why it’s bash and not typescript in your shell path column. And you’ll meet great resistance from users if you make your config fully typed and require them to refer to a schema, DTD, namespace, or whatever BS XML had.

troupo Jun 3, 2024

> that’s why it’s bash and not typescript in your shell path column

Bash is there purely for historical reasons. And it sucks.

> And you’ll meet great resistance from users if you make your config fully typed and require them to refer to a schema, DTD, namespace, or whatever BS XML had.

That schema can and will help editors to validate and autocomplete things on the fly, and can also serve as a reference for what actual data the config accepts.

hgyjnbdet Jun 2, 2024

I would say all configs should be treated as castable strings. That's why for config files I much prefer the INI format.
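For what it's worth, Python's standard `configparser` module works roughly this way: every value is stored as a string, and the caller casts it on read. A minimal sketch, with a made-up `[server]` section:

```python
import configparser

# Hypothetical INI content, invented for this example.
INI_TEXT = """
[server]
host = localhost
port = 8080
debug = yes
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
server = parser["server"]

host = server["host"]               # plain string, no cast needed
port = server.getint("port")        # cast on read; ValueError if not an int
debug = server.getboolean("debug")  # accepts yes/no, on/off, true/false, 1/0

print(host, port, debug)
```

The file itself carries no type information; the casts live in the code that reads it.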

nevermore24 Jun 2, 2024

The strings are strings. I don't care how people handle their dates, that's between them and their god.
