The main advantage of ASN.1 (specifically DER) in an HTTPS/PKI context is that it's a canonical encoding. To my understanding Protobuf isn't; I don't know about Thrift.
(A lot of hay is made about ASN.1 being bad, but it's really BER and other non-DER encodings of ASN.1 that make things painful. If you only read and write DER and limit yourself to the set of rules that occur in e.g. the Internet PKI RFCs, it's a relatively tractable and normal looking serialization format.)
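To make "canonical" concrete, here's a rough illustration with hand-written bytes (not output from any particular library): BER admits several encodings of the same value, while DER pins down exactly one.

    # A few of the many valid BER encodings of BOOLEAN TRUE, versus the single
    # form DER permits (illustrative, hand-written bytes):
    ber_variants = [
        bytes([0x01, 0x01, 0xFF]),        # short-form length, value 0xFF
        bytes([0x01, 0x01, 0x01]),        # BER: any non-zero content octet means TRUE
        bytes([0x01, 0x81, 0x01, 0xFF]),  # BER: long-form length for a 1-byte value
    ]
    der_form = bytes([0x01, 0x01, 0xFF])  # DER: minimal length form, TRUE is exactly 0xFF

    # Because DER allows exactly one encoding per value, hashing/signing and
    # byte-for-byte comparison of structures are well defined.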
You can parse DER perfectly well without a schema; it's a self-describing format. ASN.1 definitions give you shape enforcement, but any valid DER stream can be turned into an internal representation even if you don't know the intended structure ahead of time.
rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
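For a feel of how little you need in order to parse the framing, here's a rough schema-less TLV walker in Python -- a minimal sketch, not rust-asn1's actual implementation, and it assumes single-byte tags, which covers the usual PKI types:

    def read_tlv(buf, offset=0):
        """Read one DER TLV at `offset`; return (tag_byte, value_bytes, next_offset)."""
        tag = buf[offset]
        length = buf[offset + 1]
        offset += 2
        if length & 0x80:                        # long form: low 7 bits = number of length octets
            n = length & 0x7F
            length = int.from_bytes(buf[offset:offset + n], "big")
            offset += n
        return tag, buf[offset:offset + length], offset + length

    def walk(buf, depth=0):
        """Dump a DER stream with no schema at all."""
        offset = 0
        while offset < len(buf):
            tag, value, offset = read_tlv(buf, offset)
            print("  " * depth + f"tag=0x{tag:02x} len={len(value)}")
            if tag & 0x20:                       # constructed bit: SEQUENCEs/SETs nest cleanly
                walk(value, depth + 1)

    # e.g. walk(open("cert.der", "rb").read())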
> which is that this ends up being complex enough that basically every attempt to do so is full of memory safety issues.
Sort of -- DER gets a bad rap for two reasons:
1. OpenSSL had (has?) an exceptionally bad and permissive implementation of a DER parser/serializer.
2. Because of OpenSSL's dominance, a lot of "DER" in the wild was really a mixture of DER and BER. This has caused an absolutely obscene amount of pain in PKI standards, which is why just about every modern PKI standard that uses ASN.1 bends over backwards to emphasize that all encodings must be DER and not BER.
(2) in particular is pernicious: the public Web PKI has successfully extirpated BER, but it still skulks around in private PKIs and more neglected corners of the Internet (like RFC 3161 TSAs) because of a long tail of OpenSSL (and other misbehaving implementation) usage.
Overall, DER itself is a mostly normal looking TLV encoding; it's not meaningfully more complicated than Protobuf or any other serialization form. The problem is that it gets mashed together with BER, and it has a legacy of buggy implementations. The latter is IMO more of a byproduct of ASN.1's era -- if Protobuf were invented in 1984, I imagine we'd see the same long tail of buggy parsers regardless of the quality of the design itself.
If the schema uses IMPLICIT tags then - unless I'm missing something - this isn't (easily) possible.
The most you'd be able to tell is whether the TLV contains a primitive or constructed value.
This is a pretty good resource on custom tagging, and goes over how IMPLICIT works: https://www.oss.com/asn1/resources/asn1-made-simple/asn1-qui...
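To illustrate the point with bytes (hand-written, assuming a hypothetical field tagged [0]):

    plain_int    = bytes([0x02, 0x01, 0x05])               # INTEGER 5
    explicit_tag = bytes([0xA0, 0x03, 0x02, 0x01, 0x05])   # [0] EXPLICIT: the INTEGER tag survives inside the wrapper
    implicit_tag = bytes([0x80, 0x01, 0x05])                # [0] IMPLICIT: the 0x02 INTEGER tag is replaced

    # A schemaless decoder of `implicit_tag` sees only "context-specific tag 0,
    # primitive, length 1" -- whether that's an INTEGER, an ENUMERATED, or
    # something else is knowable only from the schema.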
> Because of OpenSSL's dominance, a lot of "DER" in the wild was really a mixture of DER and BER
:sweat: That might explain why some of the root certs on my machine appear to be BER encoded (barring decoder bugs, which is honestly more likely).
> rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
Almost. The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least), so while you can say "there is something of length 10 here", you can't say if it's an integer or a string or an array.
Could you explain what you mean? The tag does indeed encode this: for an integer you'd see `INTEGER`, for a string you'd see `UTF8String` or similar, for an array you'd see `SEQUENCE OF`, etc.
You can verify this for yourself by using a schemaless decoder like Google's der-ascii[1]. For example, here's a decoded certificate[2] -- you get fields and types, you just don't get the semantics (e.g. "this number is a public key") associated with them because there's no schema.
[1]: https://github.com/google/der-ascii
[2]: https://github.com/google/der-ascii/blob/main/samples/cert.t...
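For a rough sense of what such a decoder keys off (illustrative Python, not der-ascii's actual code), the universal tag byte maps directly to a type name:

    UNIVERSAL_TAGS = {
        0x01: "BOOLEAN",
        0x02: "INTEGER",
        0x03: "BIT STRING",
        0x04: "OCTET STRING",
        0x05: "NULL",
        0x06: "OBJECT IDENTIFIER",
        0x0C: "UTF8String",
        0x13: "PrintableString",
        0x17: "UTCTime",
        0x18: "GeneralizedTime",
        0x30: "SEQUENCE",   # tag number 0x10 with the constructed bit (0x20) set
        0x31: "SET",        # tag number 0x11 with the constructed bit set
    }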
PER lacks type information, making encoding much more efficient as long as both sides of the connection have access to the schema.
In my experience it does tell you the type, but it depends on the schema. If implicit types are used, then it won't tell you the type of the data, but if you use explicit, or if it is neither implicit nor explicit, then it does tell you the type of the data. (However, if the data type is a sequence, then you might not lose much by using an implicit type; the DER format still tells you that it is constructed rather than primitive.)
You need to populate a string? First look up whether it's a UTF8String, NumericString, PrintableString, TeletexString, VideotexString, IA5String, GraphicString, VisibleString, GeneralString, UniversalString, CHARACTER STRING, or BMPString. I'll note that three of those types have "Universal" / "General" in their name, and several more imply it.
How about a timestamp? Well, do you mean a TIME, UTCTime, GeneralizedTime, or DATE-TIME? Don't be fooled, all those types describe both a date _and_ time, if you just want a time then that's TIME-OF-DAY.
It's understandable how a standard with teletex roots got to this point, but it doesn't lead to good implementations when there is that much surface area to cover.
> You need to populate a string? First look up whether it's a UTF8String, NumericString, PrintableString, TeletexString, VideotexString, IA5String, GraphicString, VisibleString, GeneralString, UniversalString, CHARACTER STRING, or BMPString.
They fall into three groups: ASCII-based (IA5String, VisibleString, PrintableString, NumericString), Unicode-based (UTF8String, BMPString, UniversalString), and ISO-2022-based (TeletexString, VideotexString, GraphicString, GeneralString). (CHARACTER STRING allows arbitrary character sets and encodings, and does not fit into any of these groups. You are unlikely to need it, but it is there in case you do need it.)
IA5String is the most general ASCII-based type, and GeneralString is the most general ISO-2022-based type. For decoding, you can treat the other ASCII-based types as IA5String if you do not need to validate them, and you can treat GraphicString like GeneralString (for TeletexString and VideotexString, the initial state is different, so you will have to consider that). For the Unicode-based types, BMPString is UTF-16BE (although normally only BMP characters are allowed) and UniversalString is UTF-32BE.
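A rough sketch of what decoding by family looks like (tag numbers are from X.680; the ISO-2022 branch is shown as a latin-1 passthrough purely as a placeholder, real handling is more involved):

    def decode_string(tag_number, value):
        ascii_like   = {18, 19, 22, 26}   # NumericString, PrintableString, IA5String, VisibleString
        iso2022_like = {20, 21, 25, 27}   # TeletexString, VideotexString, GraphicString, GeneralString
        if tag_number in ascii_like:
            return value.decode("ascii")
        if tag_number == 12:              # UTF8String
            return value.decode("utf-8")
        if tag_number == 30:              # BMPString: UTF-16BE
            return value.decode("utf-16-be")
        if tag_number == 28:              # UniversalString: UTF-32BE
            return value.decode("utf-32-be")
        if tag_number in iso2022_like:
            return value.decode("latin-1")  # placeholder only
        raise ValueError(f"unhandled string tag {tag_number}")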
When making your own formats, you might just use the most general ones and specify your own constraints, although you might prefer to use the more restrictive types if they are known to be suitable; I usually do (for example, PrintableString is suitable for domain names (as well as ICAO airport codes, etc) and VisibleString is suitable for URLs (as well as many other things)).
> How about a timestamp? Well, do you mean a TIME, UTCTime, GeneralizedTime, or DATE-TIME?
UTCTime probably should not be used for newer formats, since it is not Y2K compliant (although it may be necessary when dealing with older formats that use it, such as X.509); GeneralizedTime is better.
In all of these cases, you only need to implement the types you are using in your program, not necessarily all of them.
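The two-digit year is the whole problem with UTCTime; here's a minimal sketch of the difference, assuming the 'YYMMDDHHMMSSZ' form and the RFC 5280 pivot (00-49 means 20xx, 50-99 means 19xx):

    from datetime import datetime, timezone

    def parse_utctime(s):
        """UTCTime like '260207120000Z': two-digit year, needs a pivot convention."""
        yy = int(s[0:2])
        year = 2000 + yy if yy < 50 else 1900 + yy
        return datetime(year, int(s[2:4]), int(s[4:6]),
                        int(s[6:8]), int(s[8:10]), int(s[10:12]), tzinfo=timezone.utc)

    def parse_generalizedtime(s):
        """GeneralizedTime like '20260207120000Z': four-digit year, no pivot needed."""
        return datetime(int(s[0:4]), int(s[4:6]), int(s[6:8]),
                        int(s[8:10]), int(s[10:12]), int(s[12:14]), tzinfo=timezone.utc)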
(If needed, you can also use the "ASN.1X" that I made up which adds some additional nonstandard types, such as: BCD string, TRON string, key/value list, etc. Again, you will only need to implement the types that you are actually using in your program, which is probably not all of them.)
Implementing GeneralString in all its horror is a real pain, but also you'll never ever need it.
This generality in ASN.1 is largely due to it being created before Unicode.
That's not really the problem. The problem is that DER is a tag-length-value encoding, which is quite redundant and inefficient -- a crutch that people who hadn't seen XDR first couldn't imagine doing without, but they really didn't need it. That crutch made it harder, not easier, to implement ASN.1/DER.
XML is no picnic either, by the way. JSON is much much simpler, and it's true you don't need a schema, but you end up wanting one anyways.
You can parse DER without using a schema (except for implicit types, and even then you can always parse the framing, even if the value cannot necessarily be interpreted; for this reason, in my own formats, I only use implicit types for sequences and octet strings, and only when an implicit type is actually needed, which it often isn't). (The meaning of many fields will not be known without the schema, but that is also true of XML and JSON.)
I wrote a DER parser without handling the schema at all.
However, to have a sane interface for actually working with the data, you do need a schema that can be compiled to a language-specific representation.
There should be no need for a canonical encoding. 40 years ago people thought you needed that so you could re-encode a TBSCertificate and then validate a signature, but in reality you should keep the as-received encoding of that part of the Certificate. And so on.
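A rough sketch of the "keep the bytes as received" approach (the `verify_signature` call at the end is a placeholder for whatever primitive your crypto library provides):

    def _read_header(buf, offset):
        """Return (header_len, value_len) for the DER TLV at `offset` (definite lengths only)."""
        length = buf[offset + 1]
        if length & 0x80:
            n = length & 0x7F
            return 2 + n, int.from_bytes(buf[offset + 2:offset + 2 + n], "big")
        return 2, length

    def tbs_bytes_as_received(cert_der):
        """Slice the TBSCertificate out of a DER Certificate without re-encoding it.

        Certificate ::= SEQUENCE { tbsCertificate, signatureAlgorithm, signatureValue },
        so the first TLV inside the outer SEQUENCE is the TBSCertificate; return
        those exact bytes, header included.
        """
        outer_hdr, _ = _read_header(cert_der, 0)
        tbs_hdr, tbs_len = _read_header(cert_der, outer_hdr)
        return cert_der[outer_hdr:outer_hdr + tbs_hdr + tbs_len]

    # verify_signature(issuer_public_key, signature, tbs_bytes_as_received(cert_der))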
1] the somewhat ironic part is that when it was discovered that using just passwords for authentication is not enough, the so-called "lightweight" LDAP got arguably more complex than X.500. The same thing happened, for similar reasons, to the "Simple" in SNMP (another IETF protocol using ASN.1).
Yes, JOSE is still infinitely better than XML Signatures and the canonical-XML madness needed to allow signatures _inside_ the document being signed.
- huge breaking change with the whole cert infrastructure
- this question was put to the people who chose ASN.1 for X.509, and AFAIK they said that today they would use Protobuf. But I don't remember where I have that from.
- JOSE/JWT etc. aren't exactly well regarded in the crypto community AFAIK, nor designed with modern insights about how best to do such things (too much header malleability, too much crypto flexibility, too little deterministic encoding of JSON, too many imprecisely defined corner cases related to JSON, too much encoding overhead for keys and the like (which for some PQ stuff can reach the 100KiB range)), and the argument that it's readable with a text editor falls apart once anything you care about is binary (keys, etc.) and often encrypted (producing binary). (IMHO the plain-text argument also falls apart for most non-crypto stuff: if you add a base64 encoding anyway, you already need tooling to read it, and whether your debug tooling does a base64 decode or a (maybe additional) data-decode step isn't really relevant; same for viewing in an IDE, which can handle binary formats just fine. But that's an off-topic discussion.)
- if we look at some modern protocols designed by security specialists/cryptographers that have been standardized, we often find other choices (e.g. Protobuf for some JWT alternatives, or CBOR for HSK/AuthN-related stuff).
That is true, but it's also true that JWT/JOSE is a market winner and "everywhere" today. Obviously, it's not a great one and not without flaws, and its "competition" is things like SAML which even more people hate, so it had a low bar to clear when it was first introduced.
> CBOR
CBOR is a good mention. I have met at least one person hoping a switch to CWT/COSE happens to help somewhat combat JWT bloat in the wild. With WebAuthN requiring CBOR, there's more of a chance to get an official browser CBOR API in JS. If browsers had an out-of-the-box CBOR.parse() and CBOR.stringify(), that would be interesting for a bunch of reasons (including maybe even making CWT more likely).
One of the fun things about CBOR though is that it shares the JSON data model and is intended to be a sibling encoding, so I'd also maybe argue that if CBOR ultimately wins, that's still somewhat indirectly a "JSON win".
CBOR is self-describing like JSON/XML, meaning you don't need a schema to parse it. It has a better set of specific types for integers and binary data, unlike JSON. It has an IANA database of tags and a canonical serialization form, unlike MsgPack.
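"Shares the JSON data model" is easy to see with a tiny example (CBOR bytes written out by hand here for illustration; a CBOR library would produce the same thing):

    import json

    json_form = json.dumps({"a": 1, "b": [2, 3]}, separators=(",", ":")).encode()

    cbor_form = bytes([
        0xA2,              # map with 2 pairs
        0x61, ord("a"),    # text string of length 1: "a"
        0x01,              # unsigned int 1
        0x61, ord("b"),    # text string "b"
        0x82, 0x02, 0x03,  # array of 2 items: [2, 3]
    ])

    print(len(json_form), len(cbor_form))   # 17 vs 9 bytes for the same data model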
With the top level encoding solved, we could then go back to arguing about all the specific lower level encodings such as compressed vs uncompressed curve points, etc.
- ASN.1 is a set of a dozen different binary encodings
- ASN.1's schema language is IMHO way better designed than Protobuf's, but also more complex, as it has more features
- ASN.1 can encode many more different data layouts (e.g. things where in Protobuf you have to use "tricks"), each being laid out differently in the output depending on the specific encoding format, annotations on the schema, and options during serialization
- ASN.1 has many ways to represent things more "compactly", which all come with their own complexity (like bit-mask-encoded boolean maps; see the sketch after this comment)
overall the problem with ASN.1 is that it's absurdly over-engineered, leading to you needing to know many hundreds of pages across multiple standards documents just to implement one single encoding of the dozen existing ones, and even then you might run into ambiguous or unclear definitions where you have to ask on the internet for clarification
if we ignore the schema languages for a moment, most senior devs could probably write a crappy Protobuf implementation over a weekend, but for ASN.1 you might not even be able to digest all the relevant standards in that time :/
Realistically, if ASN.1 weren't as badly over-engineered and had shipped with only some of its more modern encoding formats, we probably would all be using ASN.1 for many things, maybe including your web server responses, and this probably would cut non-image/video network bandwidth by a third or more. But then, the network is overloaded by image/video transmissions and the like, not other stuff, so I guess who cares???!???
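Here's the kind of "boolean map" encoding referred to in the list above: a named-bits BIT STRING (as in X.509 KeyUsage), where DER additionally requires trailing zero bits to be dropped. A minimal sketch:

    def encode_named_bits(bit_positions):
        """DER-encode a BIT STRING from a set of named-bit positions.

        Bit 0 is the most significant bit of the first content octet; DER requires
        trailing zero bits to be dropped (hence the unused-bits count).
        """
        if not bit_positions:
            return bytes([0x03, 0x01, 0x00])
        highest = max(bit_positions)
        nbytes = highest // 8 + 1
        value = bytearray(nbytes)
        for pos in bit_positions:
            value[pos // 8] |= 0x80 >> (pos % 8)
        unused = 7 - (highest % 8)
        return bytes([0x03, 1 + nbytes, unused]) + bytes(value)

    # digitalSignature(0) + keyCertSign(5) -> "03020284"
    print(encode_named_bits({0, 5}).hex())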
ASN.1 was not over-engineered in 1990. The things that kept it from ruling the world are:
- the ITU-T specs for it were _not_ free back then
- the syntax is context-dependent, so using a LALR(1) parser generator to parse ASN.1 is difficult -- though not really any more than parsing C with one -- but yeah, if it had had a LALR(1)-friendly syntax, ASN.1 would have been much easier to write tooling for
- competition from XDR, DCE/MS RPC, XML, JSON, Protocol Buffers, Flat Buffers, etc.
The over-engineering came later, as many lessons were learned from ASN.1's early years. Lessons that the rest of the pack mostly have not learned.
ASN.1 has many encoding standards, but you don't need to implement them all, only the specific one for your needs.
ASN.1 has a standard and an easy to follow spec, which Protobuf doesn't.
In sum: I could cobble together a working ASN.1 implementation over a weekend. In contrast, getting to a clean-room working Protobuf library is a month's work.
Caveat: I have not had to deal with PKI stuff. My experience with ASN.1 is from LDAP, one of the easiest protocols to implement ever, IMO.
> to follow spec, which Protobuf doesn't.
I can't say so personally, but from what I heard from the coworker I helped, the spec isn't always easy to follow, as there are many edge cases where you can "guess" what they probably mean but they aren't precise enough. Though they had a surprising amount of success getting clarification from authoritative sources (I think some author or maintainer of the standard, but I'm not fully sure; it was a few years ago).
In general there seems to be a huge gap between writing something which works for some specific usage(s) of ASN.1 and something which works "in general/for the whole standard" (for the relevant encodings -- mainly DER, but also at least part of non-DER BER as far as I remember).
> Protobuf doesn't.
yes, but its wire format is relatively simple and documented (not as a specification, but documented anyway), so getting something going which can (de-)serialize the wire format really isn't that hard, and the mapping from that to actual data types is also simpler (though it also won't work for arbitrary types due to dumb design limitations). I would be surprised if you need a month of work to get something going there. Though if you want to reproduce all the tooling ecosystem, or the special ways some libraries can interact with it (e.g. in-place edits etc.), it's a different project altogether. What I mean is just a (de-)serializer for the wire format with an appropriate mapping to data types (objects, structs, whatever the language you use prefers).
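To back that up, a minimal sketch of the wire-format layer (field keys are varints of `field_number << 3 | wire_type`, plus a handful of wire types); mapping the raw fields onto typed structs is where the .proto schema comes in:

    def read_varint(buf, offset):
        """Decode one base-128 varint; return (value, next_offset)."""
        result, shift = 0, 0
        while True:
            b = buf[offset]
            offset += 1
            result |= (b & 0x7F) << shift
            if not b & 0x80:
                return result, offset
            shift += 7

    def parse_fields(buf):
        """Yield (field_number, wire_type, raw_value) for one Protobuf message body."""
        offset = 0
        while offset < len(buf):
            key, offset = read_varint(buf, offset)
            field_number, wire_type = key >> 3, key & 0x07
            if wire_type == 0:                     # varint
                value, offset = read_varint(buf, offset)
            elif wire_type == 1:                   # fixed 64-bit
                value, offset = buf[offset:offset + 8], offset + 8
            elif wire_type == 2:                   # length-delimited (strings, bytes, sub-messages)
                length, offset = read_varint(buf, offset)
                value, offset = buf[offset:offset + length], offset + length
            elif wire_type == 5:                   # fixed 32-bit
                value, offset = buf[offset:offset + 4], offset + 4
            else:
                raise ValueError(f"unsupported wire type {wire_type}")
            yield field_number, wire_type, value

    # Classic example from the wire-format docs: field 1 = 150 encodes as 08 96 01.
    # list(parse_fields(bytes([0x08, 0x96, 0x01]))) -> [(1, 0, 150)]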
But for Protobuf you kinda have to. Needing to parse the .proto files and conform to Google's codegen ideas is implied.
For ASN.1 you just need to follow a spec, not a concrete implementation.
Yes, but you can use a subset of ASN.1. You don't have to implement all of X.680, let alone all of X.681, X.682, and X.683.
I had to look up https://www.merriam-webster.com/dictionary/docent
If it were created today it would look a lot like OAuth JSON Web Tokens (JWT) and would use JSON instead of ASN.1/DER.
no, not at all
they share some ideas, but that doesn't make it "pretty much ASN.1". It's only "pretty much the same" if you argue that all schema-based general-purpose binary encoding formats are "pretty much the same".
ASN.1 also isn't "file" specific at all; its main use case is, and always has been, message exchange protocols.
(Strictly speaking, ASN.1 is also not a single binary serialization format but 1. a schema language, 2. some rules for mapping things to some intermediate concepts, and 3. a _dozen_ different ways to "exactly" serialize things. And in the 3rd point the difference can be pretty huge, from something you can partially read even without a schema (like Protobuf) to more compact representations you can't read without a schema at all.)
At the implementation level they are different, but when integrating these protocols into applications, yeah, pretty much. Schema + data goes in, encoded data comes out, or the other way around. In the same way YAML and XML are pretty much the same, just different expressions of the same concepts. ASN.1 even comes with multiple expressions of exactly the same grammar, both in text form and binary form.
ASN.1 was one of the early standardised protocols in this space, though, and suffers from being used mostly in obscure or legacy protocols, often with proprietary libraries if you go beyond the PKI side of things.
ASN.1 isn't file specific, it was designed for use in telecoms after all, but encodings like DER work better inside of file formats than Protobuf and many protocols like it. Actually having a formal standard makes including it in file types a lot easier.