Comment by eadmund - Hacker Neue

eadmund Apr 22, 2018 parent

> Before JSON, XML and standard binary formats, there were just CSV/TSV and random binary formats which was a bigger minefield.

S-expressions predate both, are simpler to parse than either, are more legible than both and are cheaper than either.

Here's a JSON example from http://json.org/example.html:

    {
        "glossary": {
            "title": "example glossary",
    		"GlossDiv": {
                "title": "S",
    			"GlossList": {
                    "GlossEntry": {
                        "ID": "SGML",
    					"SortAs": "SGML",
    					"GlossTerm": "Standard Generalized Markup Language",
    					"Acronym": "SGML",
    					"Abbrev": "ISO 8879:1986",
    					"GlossDef": {
                            "para": "A meta-markup language, used to create markup languages such as DocBook.",
    						"GlossSeeAlso": ["GML", "XML"]
                        },
    					"GlossSee": "markup"
                    }
                }
            }
        }
    }

In XML it'd be:

    <!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
     <glossary><title>example glossary</title>
      <GlossDiv><title>S</title>
       <GlossList>
        <GlossEntry ID="SGML" SortAs="SGML">
         <GlossTerm>Standard Generalized Markup Language</GlossTerm>
         <Acronym>SGML</Acronym>
         <Abbrev>ISO 8879:1986</Abbrev>
         <GlossDef>
          <para>A meta-markup language, used to create markup
    languages such as DocBook.</para>
          <GlossSeeAlso OtherTerm="GML">
          <GlossSeeAlso OtherTerm="XML">
         </GlossDef>
         <GlossSee OtherTerm="markup">
        </GlossEntry>
       </GlossList>
      </GlossDiv>
     </glossary>

And as an S-expression it'd be:

    (glossary (title "example glossary")
              (div
               (title S)
               (list
                (entry (id SGML)
                       (sort-as SGML)
                       (term "Standard Generalized Markup Language")
                       (acronym SGML)
                       (def (para "A meta-markup language, used to create markup languages such as DocBook.")
                            (see-also GML XML))
                       (see markup)))))

Which is, I believe, a huge improvement.

djur Apr 23, 2018

The S-expression has cleaner whitespace and field names than the JSON, which makes it harder to make an apples-to-apples comparison.

But the biggest problem with that S-expression is that I don't know how to parse it. Is SGML a symbol, identifier, a quoteless string? How do I know when parsing the 'entry' field that what follows is going to be a list of key/value pairs without parsing the whole expression? Is 'see-also GML XML' parsed as a list? How do we distinguish between single element lists and scalars? Is it possible to express a list at the top level, like JSON allows? How do you express a boolean, or null?

Of the problems outlined in the OP, S-expressions solve one: there's no question of how to parse trailing commas because there are no trailing commas. They do not solve questions of maximum levels of nesting. They have the same potential pitfalls with whitespace. They have exactly the same problems with parsing strings and numbers. They have the same problem with duplicated keys.
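The duplicated-key pitfall is easy to see on the JSON side. This is a small sketch using Python's standard `json` module; the helper name `reject_duplicates` and the sample document are made up for illustration. An application that reads `(key value)` lists out of an S-expression and loads them into a map has to make the same decision.

```python
import json

doc = '{"id": "SGML", "id": "XML"}'

# By default Python's json module silently keeps the last value for a repeated key.
last_wins = json.loads(doc)
print(last_wins)  # {'id': 'XML'}


def reject_duplicates(pairs):
    """Hypothetical hook: build a dict but refuse repeated keys."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# object_pairs_hook receives every (key, value) pair in document order,
# so the application can pick its own policy instead of "last one wins".
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print("rejected:", err)
```

Other parsers may keep the first value or raise an error, so the result depends on which library reads the document.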

My point here isn't that you can't represent JSON as S-expressions. Clearly you can. My point is that in order to match what JSON can do, you have to create rules for interpreting the S-expressions, and those rules are the hard part. Those rules, in essence, _are_ JSON; once you've written the logic to serialize the various types supported by JSON to and from S-expressions, you've implemented "JSON with parentheses and without commas".

eadmund OP Apr 23, 2018

> Is SGML a symbol, identifier, a quoteless string?

It's a sequence of bytes — a string, if you like.

> How do I know when parsing the 'entry' field that what follows is going to be a list of key/value pairs without parsing the whole expression?

You wouldn't, and as a parser you wouldn't need to. The thing which accepts the parsed lists of byte-sequences would need to know what to do with whatever it's given, but that's the same issue as is faced by something which accepts JSON.
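A rough sketch of that split, in Python. The function name `parse_sexp` is invented here, and this is a simplification rather than Rivest's full syntax: atoms come back as Python strings instead of raw bytes, and quoted strings have no escape handling. The parser only produces nested lists of atoms and attaches no meaning to them.

```python
def parse_sexp(text):
    """Parse one S-expression into nested lists of string atoms (simplified sketch)."""
    pos = 0
    n = len(text)

    def skip_ws():
        nonlocal pos
        while pos < n and text[pos].isspace():
            pos += 1

    def read():
        nonlocal pos
        skip_ws()
        if pos >= n:
            raise ValueError("unexpected end of input")
        ch = text[pos]
        if ch == "(":
            pos += 1
            items = []
            while True:
                skip_ws()
                if pos >= n:
                    raise ValueError("missing ')'")
                if text[pos] == ")":
                    pos += 1
                    return items
                items.append(read())
        if ch == ")":
            raise ValueError("unexpected ')'")
        if ch == '"':
            # Quoted atom: everything up to the next double quote (no escapes here).
            pos += 1
            start = pos
            while pos < n and text[pos] != '"':
                pos += 1
            if pos >= n:
                raise ValueError("unterminated string")
            atom = text[start:pos]
            pos += 1
            return atom
        # Bare atom: runs until whitespace, a parenthesis or a quote.
        start = pos
        while pos < n and not text[pos].isspace() and text[pos] not in '()"':
            pos += 1
        return text[start:pos]

    result = read()
    skip_ws()
    if pos != n:
        raise ValueError("trailing data after expression")
    return result


tree = parse_sexp('(entry (id SGML) (see-also GML XML) (term "Standard Generalized Markup Language"))')
print(tree)
```

Whether `(id SGML)` is a key/value pair or just a two-element list is decided by whatever consumes `tree`, not by the parser.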

> Is 'see-also GML XML' parsed as a list?

(see-also GML XML) is a list.

> How do we distinguish between single element lists and scalars?

'(single-element-list)' is a single-element list; 'scalar' is a scalar. Just like '["single-element-list"]' & '"scalar"' in JSON.

> Is it possible to express a list at the top level, like JSON allows?

That whole expression is a list at top level.

> How do you express a boolean, or null?

The same way that you represent a movie, a post or an integer: by applying some sort of meaning to a sequence of bytes.

> They do not solve questions of maximum levels of nesting.

They don't solve the problem of finite resources, no. It'll always be possible for someone to send more data than one can possibly process.

> They have the same potential pitfalls with whitespace.

No, they don't, because Ron Rivest's canonical S-expression spec indicates exactly what is & is not whitespace.

> They have exactly the same problems with parsing strings and numbers.

No, they don't, because they don't really have either strings or numbers: they have lists and byte-sequences. Anything else is up to the application which uses them, just like any higher meaning of JSON is up to the application which uses it.

> They have the same problem with duplicated keys.

No, they don't — because they don't have keys.

> My point is that in order to match what JSON can do, you have to create rules for interpreting the S-expressions, and those rules are the hard part.

My point is that JSON doesn't, and can't, create all the necessary rules, and that trying to do so is a mistake, because applications do not have mutually-compatible interpretations of data. One application may treat JSON numbers as 64-bit integers, another as 32-bit floats. One application may need to hash objects cryptographically, and thus specify an ordering for object properties; another may not care. Every useful application will need to do more than just parse JSON into the equivalent data structure in memory: it needs to validate it & then work with it, which almost certainly means converting that JSON-like data structure into an application-specific data structure.
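A small Python illustration of both points, using only the standard library. The sample document and the `digest` helper are made up, and the example uses 64-bit doubles rather than 32-bit floats because that is what Python's `float` is. Sorting keys before hashing is one possible application-level convention, not something JSON itself specifies.

```python
import hashlib
import json
from decimal import Decimal

text = '{"b": 9007199254740993, "a": 0.1}'

# Three consumers, three readings of the same numbers.
as_default = json.loads(text)                       # Python keeps the integer exact
as_float = json.loads(text, parse_int=float)        # a consumer that treats every number as a double
as_decimal = json.loads(text, parse_float=Decimal)  # a consumer that wants exact decimals

print(as_default["b"])       # 9007199254740993
print(int(as_float["b"]))    # 9007199254740992 (2**53 + 1 is not representable as a double)
print(as_decimal["a"])       # 0.1


def digest(obj):
    """Hypothetical convention: hash a compact dump with sorted keys."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Property order no longer matters once the application fixes an ordering.
print(digest({"a": 1, "b": 2}) == digest({"b": 2, "a": 1}))  # True
```

Two applications only agree on the digest if they also agree on the ordering, the separators and the number formatting, which is exactly the kind of rule JSON leaves to them.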

The key, IMHO, is to punt on specifying all of that for everyone for all time and instead to let each application specify its protocol as necessary. The reason to use S-expressions for that is that they are structured and capable of representing anything.

Ultimately, we can do more by doing less. JSON is seductive, but it'll ultimately leave one disappointed. It does a lot, but not enough. S-expressions do enough to let you do the rest.

djur Apr 23, 2018

I hope you understand that those questions were rhetorical -- they're questions that do not need to be asked about the equivalent JSON representation. Questions developers don't have to ask each other about the data they're sending each other.

The canonical S-expression representation solves some of the problems JSON has, true, but the example you provided is not a canonical S-expression. It wouldn't make sense for it to have been, because canonical S-expressions are a binary format and not comparable in this context to JSON or XML.

Application developers voted with their feet for serialization formats with native representations of common data types (strings, numbers, lists, maps, booleans, null). There are a lot of reasons that JSON has supplanted XML, but one of them is that JSON has these types built in and XML does not. A lot of real-world data interchange and storage can make good use of those primitives. Many problems boil down to "how do I pass around a list of key/value records". There is a lot to be said for not having to renegotiate that kind of basic detail every time two applications need to communicate.

You can represent S-expressions as JSON strings and arrays. I've done it. It was the best way to represent the data I was trying to store, but that's because the data was already represented as S-expressions. I've never seen anyone else do it, and that doesn't surprise me. For most purposes JSON is used for, it is more useful than S-expressions -- not necessarily more powerful, but more useful.

tonyg Apr 23, 2018

Interpreted as a Rivest S-expression, the example given above conforms to the "advanced transport representation" [1], and so can automatically and straightforwardly be converted to the "canonical representation" [2].

In an important sense, then, I'd claim that it is a "canonical S-expression".

This works because SPKI S-expressions aren't just a grammar for a syntax: they also come with [3] a total /equivalence relation/, which is exactly what JSON lacks and which is what makes JSON such a pain to work with.

In other words, SPKI S-expressions have a semantics. JSON doesn't.
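As a rough sketch of what that buys you, here is the canonical encoding as I read it from the Rivest draft: every atom is written as its decimal length, a colon and the raw bytes, and lists are wrapped in parentheses with no whitespace. The function name `to_canonical` is made up, display hints are left out, and the input is assumed to be already parsed into nested Python lists of `bytes`.

```python
def to_canonical(node):
    """Encode nested lists of bytes in canonical S-expression form (sketch, no display hints)."""
    if isinstance(node, bytes):
        # Atom: <decimal length>:<raw bytes>
        return str(len(node)).encode("ascii") + b":" + node
    if isinstance(node, list):
        return b"(" + b"".join(to_canonical(child) for child in node) + b")"
    raise TypeError(f"cannot encode {type(node).__name__}")


def equivalent(a, b):
    """Two expressions are treated as equal exactly when their canonical bytes match."""
    return to_canonical(a) == to_canonical(b)


see_also = [b"see-also", b"GML", b"XML"]
print(to_canonical(see_also))  # b'(8:see-also3:GML3:XML)'
print(equivalent(see_also, [b"see-also", b"GML", b"XML"]))  # True
print(equivalent(see_also, [b"see-also", b"XML", b"GML"]))  # False, order is significant
```

Because every expression has exactly one canonical byte string, comparing or hashing those bytes gives a well-defined notion of equality without any per-application rules.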

Lots of other "modern" data languages also lack equivalence relations, making them similarly difficult to use at scale.

[ETA: Of course, your point about lacking common data types is a good one! My fantasy-land ideal data language would be something drawing from both SPKI S-expressions and BitTorrent's "bencoding", which includes integers and hashes as well as binary blobs and lists.]
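For comparison, a minimal bencoding encoder in Python. This is a sketch of the BitTorrent format as I understand it, not a complete library, and the function name `bencode` is just a label: integers are `i<n>e`, byte strings are `<length>:<bytes>`, lists are `l...e`, and dictionaries are `d...e` with byte-string keys in sorted order.

```python
def bencode(value):
    """Encode ints, bytes, lists and dicts with bytes keys as bencoded bytes (sketch)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but bencoding has no boolean type.
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        keys = list(value)
        if not all(isinstance(key, bytes) for key in keys):
            raise TypeError("dictionary keys must be bytes")
        parts = [b"d"]
        # Sorted keys give every dictionary a single encoding.
        for key in sorted(keys):
            parts.append(bencode(key))
            parts.append(bencode(value[key]))
        parts.append(b"e")
        return b"".join(parts)
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode({b"b": 1, b"a": [b"x"]}))  # b'd1:al1:xe1:bi1ee'
```

The sorted-key rule is what gives dictionaries a single encoding, which is the same property the canonical S-expression form provides for lists and byte strings.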

---

[1] Section 6.3 of http://people.csail.mit.edu/rivest/Sexp.txt

[2] Section 6.1 of http://people.csail.mit.edu/rivest/Sexp.txt

[3] The SPKI S-expression definition is still a draft and suffers a few obvious problems - ASCII-centrism and the notion of a "default MIME type" being two major deficits. Still, I'd love to see the document revived, updated, and completed. Simply having an equivalence relation already lifts it head and shoulders above many competing data languages.

drawkbox Apr 23, 2018

S-expressions are better than binary for sure, but you end up having to write and maintain parsers for the front end, the back end and more. S-expressions influenced the creation of HTML/XML. With anything that isn't JSON or XML, you end up with a format that lacks massive support on the client and server side and takes more work to serialize and deserialize. The same goes for YAML; other formats that have more typing and rules are not as simple and add some complexity.

The big reason that JSON and even XML were so successful is that parsing from front end to back end, and using the data directly in JavaScript and in APIs, is such a simple step. JSON is easier than XML, but XML is easier than binary and other formats with more requirements, rules and complexity.

The basic types, ease of nesting and readability of both JSON and even XML led to systems that serialize and deserialize to them, and also pushed those systems to be simpler.

CSV, XLS, binary, BER/DER, ASN.1 and similar data-exchange formats all have more mines in the field than JSON, and XML has more than JSON. JSON's killer feature is simplicity, and it forces you into simpler input and output.

Simplicity is always good when it comes to exchanging data.

tannhaeuser Apr 23, 2018

Just wanted to share that you can technically parse S-expressions using SGML.

[1]: https://web.archive.org/web/19991008044801/http://www.blnz.c...
