Comment by drawkbox - Hacker Neue

drawkbox Apr 22, 2018 parent

Before JSON, XML and standard binary formats, there were just CSV/TSV and random binary formats which was a bigger minefield. Simply exchanging data was a project in itself.

At least JSON and XML are text based when it comes to data exchange. Back in the day before APIs that needed to exchange data cleanly, without JSON/XML, exchanging data was not only a minefield but one with constant carpet bombing. The fact that edge cases that are rarely run into is all that is left of data exchange issues is a huge advancement.

What is great is in most cases JSON works fantastic and simplifies data exchange and APIs all the way to front ends and backends. XML is available if needed. So are binary standard formats now for really compact areas like messaging for performance that humans may never see or you may never have a third party that needs to parse it. The task of parsing xml in client side javascript is not fun especially, neither is binary parsing where adding a value can break the whole object, JSON keys can come and go.

The engineer can choose the tool for the job but there better be a good reason to use anything over simple JSON, almost any problem can be solved with it. Engineers should aim to take data complexity and make it as simple as possible, not take simple and make it complex for job security, real engineers always move more simple when possible and away from vogon ways.

For data that is exchanged between services and front-end/backend, JSON is the simplified format that makes things move faster. XML got tarred and devolved into vogon sludge with SOAP services and nested namespacing/schemas but still is needed in some areas. Binary standard formats when you control both sides or noone else needs to connect to it or you don't need it on the front end maybe or possibly you need performant real-time messaging. There is also YAML if you need more typing or BSON where binary needed but still simple. All formats have good uses and bad but using binary when JSON will suffice is not being as simple as possible.

JSON is easy to get around and is more lightweight, if you run into a problem you can just restructure your JSON to make it work where binary or XML take more work to change without breaking changes, especially downstream causing many more versions and conversions. JSON is a data messaging format meant to simplify. Most of the issues in the OP article could be solved storing the values in a string with a "type" or "info" key that allows conversion in the backend i.e. long numbers or hex etc or storing binary as base64 etc.

JSON is based on basic CS types in objects, lists, simple data types like string, number, bool, date, this makes for a simplifying of all systems that serialize and deserialize to it. JSON helps spread simplicity while being dynamic.

JSON works best with ever changing dynamic data/code/projects we build today and in seconds you can be consuming data from third party APIs faster than any other format and more simplistically, that is why it won.

dboreham Apr 22, 2018

You forgot BER ;)

dfox Apr 22, 2018

And XDR, although in it's case it is slightly unclear whether it is standardized or random binary format ;)

eadmund Apr 22, 2018

> Before JSON, XML and standard binary formats, there were just CSV/TSV and random binary formats which was a bigger minefield.

S-expressions predate both, are simpler to parse than either, are more legible than both and are cheaper than either.

Here's a JSON example from http://json.org/example.html:

    {
        "glossary": {
            "title": "example glossary",
    		"GlossDiv": {
                "title": "S",
    			"GlossList": {
                    "GlossEntry": {
                        "ID": "SGML",
    					"SortAs": "SGML",
    					"GlossTerm": "Standard Generalized Markup Language",
    					"Acronym": "SGML",
    					"Abbrev": "ISO 8879:1986",
    					"GlossDef": {
                            "para": "A meta-markup language, used to create markup languages such as DocBook.",
    						"GlossSeeAlso": ["GML", "XML"]
                        },
    					"GlossSee": "markup"
                    }
                }
            }
        }
    }

In XML it'd be:

    <!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
     <glossary><title>example glossary</title>
      <GlossDiv><title>S</title>
       <GlossList>
        <GlossEntry ID="SGML" SortAs="SGML">
         <GlossTerm>Standard Generalized Markup Language</GlossTerm>
         <Acronym>SGML</Acronym>
         <Abbrev>ISO 8879:1986</Abbrev>
         <GlossDef>
          <para>A meta-markup language, used to create markup
    languages such as DocBook.</para>
          <GlossSeeAlso OtherTerm="GML">
          <GlossSeeAlso OtherTerm="XML">
         </GlossDef>
         <GlossSee OtherTerm="markup">
        </GlossEntry>
       </GlossList>
      </GlossDiv>
     </glossary>

And as an S-expression it'd be:

    (glossary (title "example glossary")
              (div
               (title S)
               (list
                (entry (id SGML)
                       (sort-as SGML)
                       (term "Standard Generalized Markup Language")
                       (acronym SGML)
                       (def (para "A meta-markup language, use to create markup languages such as DocBook.")
                            (see-also GML XML))
                       (see markup)))))

Which is, I believe, a huge improvement.

djur Apr 23, 2018

The S-expression has cleaner whitespace and field names than the JSON, which makes it harder to make an apples-to-apples comparison.

But the biggest problem with that S-expression is that I don't know how to parse it. Is SGML a symbol, identifier, a quoteless string? How do I know when parsing the 'entry' field that what follows is going to be a list of key/value pairs without parsing the whole expression? Is 'see-also GML XML' parsed as a list? How do we distinguish between single element lists and scalars? Is it possible to express a list at the top level, like JSON allows? How do you express a boolean, or null?

Of the problems outlined in the OP, S-expressions solve one: there's no question of how to parse trailing commas because there are no trailing commas. They do not solve questions of maximum levels of nesting. They have the same potential pitfalls with whitespace. They have exactly the same problems with parsing strings and numbers. They have the same problem with duplicated keys.

My point here isn't that you can't represent JSON as S-expressions. Clearly you can. My point is that in order to match what JSON can do, you have to create rules for interpreting the S-expressions, and those rules are the hard part. Those rules, in essence, _are_ JSON; once you've written the logic to serialize the various types supported by JSON to and from S-expressions, you've implemented "JSON with parentheses and without commas".

eadmund Apr 23, 2018

> Is SGML a symbol, identifier, a quoteless string?

It's a sequence of bytes — a string, if you like.

> How do I know when parsing the 'entry' field that what follows is going to be a list of key/value pairs without parsing the whole expression?

You wouldn't, and as a parser you wouldn't need to. The thing which accepts the parsed lists of byte-sequences would need to know what to do with whatever it's given, but that's the same issue as is faced by something which accepts JSON.

> Is 'see-also GML XML' parsed as a list?

(see-also GML XML) is a list.

> How do we distinguish between single element lists and scalars?

'(single-element-list)' is a single-element list; 'scalar' is a scalar. Just like '["single-element-list"]' & '"scalar"' in JSON.

> Is it possible to express a list at the top level, like JSON allows?

That whole expression is a list at top level.

> How do you express a boolean, or null?

The same way that you represent a movie, a post or an integer: by applying some sort of meaning to a sequence of bytes.

> They do not solve questions of maximum levels of nesting.

They don't solve the problem of finite resources, no. It'll always be possible for someone to send one more data than one can possibly process.

> They have the same potential pitfalls with whitespace.

No, they don't, because Ron Rivest's canonical S-expression spec indicates exactly what is & is not whitespace.

> They have exactly the same problems with parsing strings and numbers.

No they don't, because they don't really have either strings or numbers: they have lists an byte-sequences. Anything else is up to the application which uses them — just like any higher meaning of JSON is up to the application which uses it.

> They have the same problem with duplicated keys.

No, they don't — because they don't have keys.

> My point is that in order to match what JSON can do, you have to create rules for interpreting the S-expressions, and those rules are the hard part.

My point is that JSON doesn't — and can't — create all the necessary rules, and that trying to do so is a mistake, because applications do not have mutually-compatible interpretations of data. One application may treat JSON numbers as 64-bit integers, another as 32-bit floats. One application may need to hash object cryptographically, and thus specify an ordering for object properties; another may not care. Every useful application will need to do more than just parse JSON into the equivalent data structure in memory: it needs to validate it & then work with it, which almost certainly means converting that JSON-like data structure into an application-specific data structure.

The key, IMHO, is to punt on specifying all of that for everyone for all time and instead to let each application specify its protocol as necessary. The reason to use S-expressions for that is that they are structured and capable of representing anything.

Ultimately, we can do more by doing less. JSON is seductive, but it'll ultimately leave one disappointed. It does a lot, but not enough. S-expressions do enough to let you do the rest.

djur Apr 23, 2018

I hope you understand that those questions were rhetorical -- they're questions that do not need to be asked about the equivalent JSON representation. Questions developers don't have to ask each other about the data they're sending each other.

The canonical S-expression representation solves some of the problems JSON has, true, but the example you provided is not a canonical S-expression. It wouldn't make sense for it to have been, because canonical S-expressions are a binary format and not comparable in this context to JSON or XML.

Application developers voted with their feet for serialization formats with native representations of common data types (strings, numbers, lists, maps, booleans, null). There's a lot of reasons that JSON has supplanted XML, but one of them is that JSON has these types built in and XML does not. A lot of real-world data interchange and storage can make good use of those primitives. Many problems boil down to "how do I pass around a list of key/value records". There is a lot to say for not having to renegotiate that kind of basic detail every time two applications need to communicate.

You can represent S-expressions as JSON strings and arrays. I've done it. It was the best way to represent the data I was trying to store, but that's because the data was already represented as S-expressions. I've never seen anyone else do it, and that doesn't surprise me. For most purposes JSON is used for, it is more useful than S-expressions -- not necessarily more powerful, but more useful.

tonyg Apr 23, 2018

Interpreted as a Rivest S-expression, the example given above conforms to the "advanced transport representation" [1], and so can automatically and straightforwardly be converted to the "canonical representation" [2].

In an important sense, then, I'd claim that it is a "canonical S-expression".

The reason this works is because SPKI S-expressions aren't just a grammar for a syntax, they also come with [3] a total /equivalence relation/, which is exactly what JSON lacks and which is what makes JSON such a pain to work with.

In other words, SPKI S-expressions have a semantics. JSON doesn't.

Lots of other "modern" data languages also lack equivalence relations, making them similarly difficult to use at scale.

[ETA: Of course, your point about lacking common data types is a good one! My fantasy-land ideal data language would be something drawing from both SPKI S-expressions and BitTorrent's "bencoding", which includes integers and hashes as well as binary blobs and lists.]

---

[1] Section 6.3 of http://people.csail.mit.edu/rivest/Sexp.txt

[2] Section 6.1 of http://people.csail.mit.edu/rivest/Sexp.txt

[3] The SPKI S-expression definition is still a draft and suffers a few obvious problems - ASCII-centrism and the notion of a "default MIME type" being two major deficits. Still, I'd love to see the document revived, updated, and completed. Simply having an equivalence relation already lifts it head and shoulders above many competing data languages.

drawkbox OP Apr 23, 2018

s-expressions are better than binary for sure but also you end up having to maintain/write parsers for front-end/back-end and more. s-expressions influenced HTML/XML creation. Anything not JSON/XML you end up with formats that don't have massive support on the client and server side and take more work to serialize/deserialize to/from, same with YAML, other formats that have more typing and rules are not as simple and do add some complexity.

The big reason that JSON and even XML were so successful is that parsing from front-end to back-end and the direct use in javascript and APIs for instance is such a simple step, JSON being easier than XML but XML being easier than binary and other formats with more requirements/rules/complexity.

The basic types, ease of nesting, readability used in both JSON and even XML led to systems that serialize/deserialize to it also influenced the systems to be more simple.

CSV/XLS/binary/BER/DER/ASN.1 etc data exchanging all have more mines in the field than JSON and XML has more than JSON. JSON's killer feature is simplicity and forces you into more simple input/output.

Simplicity is always good when it comes to exchanging data.

tannhaeuser Apr 23, 2018

Just wanted to share that you can technically parse S-expressions using SGML.

[1]: https://web.archive.org/web/19991008044801/http://www.blnz.c...

This item has no comments currently.