The main advantage of ASN.1 (specifically DER) in an HTTPS/PKI context is that it's a canonical encoding. To my understanding Protobuf isn't; I don't know about Thrift.
(A lot of hay is made about ASN.1 being bad, but it's really BER and other non-DER encodings of ASN.1 that make things painful. If you only read and write DER and limit yourself to the set of rules that occur in e.g. the Internet PKI RFCs, it's a relatively tractable and normal looking serialization format.)
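To make "canonical" concrete, here's a rough illustration with hand-written bytes (not output from any particular library): BER admits several encodings of the same value, while DER pins down exactly one.

    # A few of the many valid BER encodings of BOOLEAN TRUE, versus the single
    # form DER permits (illustrative, hand-written bytes):
    ber_variants = [
        bytes([0x01, 0x01, 0xFF]),        # short-form length, value 0xFF
        bytes([0x01, 0x01, 0x01]),        # BER: any non-zero content octet means TRUE
        bytes([0x01, 0x81, 0x01, 0xFF]),  # BER: long-form length for a 1-byte value
    ]
    der_form = bytes([0x01, 0x01, 0xFF])  # DER: minimal length form, TRUE is exactly 0xFF

    # Because DER allows exactly one encoding per value, hashing/signing and
    # byte-for-byte comparison of structures are well defined.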
You can parse DER perfectly well without a schema; it's a self-describing format. ASN.1 definitions give you shape enforcement, but any valid DER stream can be turned into an internal representation even if you don't know the intended structure ahead of time.
rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
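For a feel of how little you need in order to parse the framing, here's a rough schema-less TLV walker in Python -- a minimal sketch, not rust-asn1's actual implementation, and it assumes single-byte tags, which covers the usual PKI types:

    def read_tlv(buf, offset=0):
        """Read one DER TLV at `offset`; return (tag_byte, value_bytes, next_offset)."""
        tag = buf[offset]
        length = buf[offset + 1]
        offset += 2
        if length & 0x80:                        # long form: low 7 bits = number of length octets
            n = length & 0x7F
            length = int.from_bytes(buf[offset:offset + n], "big")
            offset += n
        return tag, buf[offset:offset + length], offset + length

    def walk(buf, depth=0):
        """Dump a DER stream with no schema at all."""
        offset = 0
        while offset < len(buf):
            tag, value, offset = read_tlv(buf, offset)
            print("  " * depth + f"tag=0x{tag:02x} len={len(value)}")
            if tag & 0x20:                       # constructed bit: SEQUENCEs/SETs nest cleanly
                walk(value, depth + 1)

    # e.g. walk(open("cert.der", "rb").read())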
> which is that this ends up being complex enough that basically every attempt to do so is full of memory safety issues.
Sort of -- DER gets a bad rap for two reasons:
1. OpenSSL had (has?) an exceptionally bad and permissive implementation of a DER parser/serializer.
2. Because of OpenSSL's dominance, a lot of "DER" in the wild was really a mixture of DER and BER. This has caused an absolutely obscene amount of pain in PKI standards, which is why just about every modern PKI standard that uses ASN.1 bends over backwards to emphasize that all encodings must be DER and not BER.
(2) in particular is pernicious: the public Web PKI has successfully extirpated BER, but it still skulks around in private PKIs and more neglected corners of the Internet (like RFC 3161 TSAs) because of a long tail of OpenSSL (and other misbehaving implementation) usage.
Overall, DER itself is a mostly normal looking TLV encoding; it's not meaningfully more complicated than Protobuf or any other serialization form. The problem is that it gets mashed together with BER, and it has a legacy of buggy implementations. The latter is IMO more of a byproduct of ASN.1's era -- if Protobuf were invented in 1984, I imagine we'd see the same long tail of buggy parsers regardless of the quality of the design itself.
If the schema uses IMPLICIT tags then - unless I'm missing something - this isn't (easily) possible.
The most you'd be able to tell is whether the TLV contains a primitive or constructed value.
This is a pretty good resource on custom tagging, and goes over how IMPLICIT works: https://www.oss.com/asn1/resources/asn1-made-simple/asn1-qui...
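To illustrate the point with bytes (hand-written, assuming a hypothetical field tagged [0]):

    plain_int    = bytes([0x02, 0x01, 0x05])               # INTEGER 5
    explicit_tag = bytes([0xA0, 0x03, 0x02, 0x01, 0x05])   # [0] EXPLICIT: the INTEGER tag survives inside the wrapper
    implicit_tag = bytes([0x80, 0x01, 0x05])                # [0] IMPLICIT: the 0x02 INTEGER tag is replaced

    # A schemaless decoder of `implicit_tag` sees only "context-specific tag 0,
    # primitive, length 1" -- whether that's an INTEGER, an ENUMERATED, or
    # something else is knowable only from the schema.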
> Because of OpenSSL's dominance, a lot of "DER" in the wild was really a mixture of DER and BER
:sweat: That might explain why some of the root certs on my machine appear to be BER encoded (barring decoder bugs, which is honestly more likely).
> rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
Almost. The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least), so while you can say "there is something of length 10 here", you can't say if it's an integer or a string or an array.
Could you explain what you mean? The tag does indeed encode this: for an integer you'd see `INTEGER`, for a string you'd see `UTF8String` or similar, for an array you'd see `SEQUENCE OF`, etc.
You can verify this for yourself by using a schemaless decoder like Google's der-ascii[1]. For example, here's a decoded certificate[2] -- you get fields and types, you just don't get the semantics (e.g. "this number is a public key") associated with them because there's no schema.
[1]: https://github.com/google/der-ascii
[2]: https://github.com/google/der-ascii/blob/main/samples/cert.t...
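For a rough sense of what such a decoder keys off (illustrative Python, not der-ascii's actual code), the universal tag byte maps directly to a type name:

    UNIVERSAL_TAGS = {
        0x01: "BOOLEAN",
        0x02: "INTEGER",
        0x03: "BIT STRING",
        0x04: "OCTET STRING",
        0x05: "NULL",
        0x06: "OBJECT IDENTIFIER",
        0x0C: "UTF8String",
        0x13: "PrintableString",
        0x17: "UTCTime",
        0x18: "GeneralizedTime",
        0x30: "SEQUENCE",   # tag number 0x10 with the constructed bit (0x20) set
        0x31: "SET",        # tag number 0x11 with the constructed bit set
    }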
PER lacks type information, making encoding much more efficient as long as both sides of the connection have access to the schema.
In my experience it does tell you the type, but it depends on the schema. If implicit types are used, then it won't tell you the type of the data, but if you use explicit, or if it is neither implicit nor explicit, then it does tell you the type of the data. (However, if the data type is a sequence, then you might not lose much by using an implicit type; the DER format still tells you that it is constructed rather than primitive.)
You need to populate a string? First look up whether it's a UTF8String, NumericString, PrintableString, TeletexString, VideotexString, IA5String, GraphicString, VisibleString, GeneralString, UniversalString, CHARACTER STRING, or BMPString. I'll note that three of those types have "Universal" / "General" in their name, and several more imply it.
How about a timestamp? Well, do you mean a TIME, UTCTime, GeneralizedTime, or DATE-TIME? Don't be fooled, all those types describe both a date _and_ time, if you just want a time then that's TIME-OF-DAY.
It's understandable how a standard with teletex roots got to this point, but it doesn't lead to good implementations when there is that much surface area to cover.
> You need to populate a string? First look up whether it's a UTF8String, NumericString, PrintableString, TeletexString, VideotexString, IA5String, GraphicString, VisibleString, GeneralString, UniversalString, CHARACTER STRING, or BMPString.
They fall into three groups: ASCII-based (IA5String, VisibleString, PrintableString, NumericString), Unicode-based (UTF8String, BMPString, UniversalString), and ISO-2022-based (TeletexString, VideotexString, GraphicString, GeneralString). (CHARACTER STRING allows arbitrary character sets and encodings, and does not fit into any of these groups. You are unlikely to need it, but it is there in case you do need it.)
IA5String is the most general ASCII-based type, and GeneralString is the most general ISO-2022-based type. For decoding, you can treat the other ASCII-based types as IA5String if you do not need to validate them, and you can treat GraphicString like GeneralString (for TeletexString and VideotexString, the initial state is different, so you will have to consider that). For the Unicode-based types, BMPString is UTF-16BE (although normally only BMP characters are allowed) and UniversalString is UTF-32BE.
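A rough sketch of what decoding by family looks like (tag numbers are from X.680; the ISO-2022 branch is shown as a latin-1 passthrough purely as a placeholder, real handling is more involved):

    def decode_string(tag_number, value):
        ascii_like   = {18, 19, 22, 26}   # NumericString, PrintableString, IA5String, VisibleString
        iso2022_like = {20, 21, 25, 27}   # TeletexString, VideotexString, GraphicString, GeneralString
        if tag_number in ascii_like:
            return value.decode("ascii")
        if tag_number == 12:              # UTF8String
            return value.decode("utf-8")
        if tag_number == 30:              # BMPString: UTF-16BE
            return value.decode("utf-16-be")
        if tag_number == 28:              # UniversalString: UTF-32BE
            return value.decode("utf-32-be")
        if tag_number in iso2022_like:
            return value.decode("latin-1")  # placeholder only
        raise ValueError(f"unhandled string tag {tag_number}")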
When making your own formats, you might just use the most general ones and specify your own constraints, although you might prefer to use the more restrictive types if they are known to be suitable; I usually do (for example, PrintableString is suitable for domain names (as well as ICAO airport codes, etc) and VisibleString is suitable for URLs (as well as many other things)).
> How about a timestamp? Well, do you mean a TIME, UTCTime, GeneralizedTime, or DATE-TIME?
UTCTime probably should not be used for newer formats, since it is not Y2K compliant (although it may be necessary when dealing with older formats that use it, such as X.509); GeneralizedTime is better.
In all of these cases, you only need to implement the types you are using in your program, not necessarily all of them.
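The two-digit year is the whole problem with UTCTime; here's a minimal sketch of the difference, assuming the 'YYMMDDHHMMSSZ' form and the RFC 5280 pivot (00-49 means 20xx, 50-99 means 19xx):

    from datetime import datetime, timezone

    def parse_utctime(s):
        """UTCTime like '260207120000Z': two-digit year, needs a pivot convention."""
        yy = int(s[0:2])
        year = 2000 + yy if yy < 50 else 1900 + yy
        return datetime(year, int(s[2:4]), int(s[4:6]),
                        int(s[6:8]), int(s[8:10]), int(s[10:12]), tzinfo=timezone.utc)

    def parse_generalizedtime(s):
        """GeneralizedTime like '20260207120000Z': four-digit year, no pivot needed."""
        return datetime(int(s[0:4]), int(s[4:6]), int(s[6:8]),
                        int(s[8:10]), int(s[10:12]), int(s[12:14]), tzinfo=timezone.utc)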
(If needed, you can also use the "ASN.1X" that I made up which adds some additional nonstandard types, such as: BCD string, TRON string, key/value list, etc. Again, you will only need to implement the types that you are actually using in your program, which is probably not all of them.)
Implementing GeneralString in all its horror is a real pain, but also you'll never ever need it.
This generality in ASN.1 is largely due to it being created before Unicode.
That's not really the problem. The problem is that DER is a tag-length-value encoding, which is quite redundant and inefficient -- a crutch that people who hadn't seen XDR first couldn't imagine doing without, but they really didn't need it. That crutch made it harder, not easier, to implement ASN.1/DER.
XML is no picnic either, by the way. JSON is much much simpler, and it's true you don't need a schema, but you end up wanting one anyways.
You can parse DER without using a schema (except for implicit types, and even then you can always parse the framing, even if the value cannot necessarily be interpreted; for this reason, in my own formats, I only use implicit types for sequences and octet strings, and only when an implicit type is actually needed, which it often isn't). (The meaning of many fields will not be known without the schema, but that is also true of XML and JSON.)
I wrote a DER parser without handling the schema at all.
However, to have a sane interface for actually working with the data, you do need a schema that can be compiled to a language-specific representation.
There should be no need for a canonical encoding. 40 years ago people thought you needed that so you could re-encode a TBSCertificate and then validate a signature, but in reality you should keep the as-received encoding of that part of the Certificate. And so on.
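A rough sketch of the "keep the bytes as received" approach (the `verify_signature` call at the end is a placeholder for whatever primitive your crypto library provides):

    def _read_header(buf, offset):
        """Return (header_len, value_len) for the DER TLV at `offset` (definite lengths only)."""
        length = buf[offset + 1]
        if length & 0x80:
            n = length & 0x7F
            return 2 + n, int.from_bytes(buf[offset + 2:offset + 2 + n], "big")
        return 2, length

    def tbs_bytes_as_received(cert_der):
        """Slice the TBSCertificate out of a DER Certificate without re-encoding it.

        Certificate ::= SEQUENCE { tbsCertificate, signatureAlgorithm, signatureValue },
        so the first TLV inside the outer SEQUENCE is the TBSCertificate; return
        those exact bytes, header included.
        """
        outer_hdr, _ = _read_header(cert_der, 0)
        tbs_hdr, tbs_len = _read_header(cert_der, outer_hdr)
        return cert_der[outer_hdr:outer_hdr + tbs_hdr + tbs_len]

    # verify_signature(issuer_public_key, signature, tbs_bytes_as_received(cert_der))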
1] the somewhat ironic part is that when it was discovered that using just passwords for authentication is not enough, the so-called "lightweight" LDAP got arguably more complex than X.500. The same thing happened, for similar reasons, to the "Simple" in SNMP (another IETF protocol using ASN.1).
Yes, JOSE is still infinitely better than XML Signatures and the canonical-XML madness needed to allow signatures _inside_ the document being signed.
- huge breaking change with the whole cert infrastructure
- this question was put to the people who chose ASN.1 for X.509, and AFAIK they said that today they would use Protobuf. But I don't remember where I have that from.
- JOSE/JWT etc. aren't exactly well regarded in the crypto community AFAIK, nor designed with modern insights about how best to do such things (too much header malleability, too much crypto flexibility, too little deterministic encoding of JSON, too many imprecisely defined corner cases related to JSON, too much encoding overhead for keys and the like (which for some PQ stuff can reach the 100KiB range)), and the argument that it's readable with a text editor falls apart once anything you care about is binary (keys, etc.) and often encrypted (producing binary). (IMHO the plain-text argument also falls apart for most non-crypto stuff: if you add a base64 encoding anyway, you already need tooling to read it, and whether your debug tooling does a base64 decode or a (maybe additional) data-decode step isn't really relevant; same for viewing in an IDE, which can handle binary formats just fine. But that's an off-topic discussion.)
- if we look at some modern protocols designed by security specialists/cryptographers that have been standardized, we often find other choices (e.g. Protobuf for some JWT alternatives, or CBOR for HSK/AuthN-related stuff).
That is true, but it's also true that JWT/JOSE is a market winner and "everywhere" today. Obviously, it's not a great one and not without flaws, and its "competition" is things like SAML which even more people hate, so it had a low bar to clear when it was first introduced.
> CBOR
CBOR is a good mention. I have met at least one person hoping a switch to CWT/COSE happens to help somewhat combat JWT bloat in the wild. With WebAuthN requiring CBOR, there's more of a chance to get an official browser CBOR API in JS. If browsers had an out-of-the-box CBOR.parse() and CBOR.stringify(), that would be interesting for a bunch of reasons (including maybe even making CWT more likely).
One of the fun things about CBOR though is that it shares the JSON data model and is intended to be a sibling encoding, so I'd also maybe argue that if CBOR ultimately wins, that's still somewhat indirectly a "JSON win".
CBOR is self-describing like JSON/XML, meaning you don't need a schema to parse it. It has a better set of specific types for integers and binary data, unlike JSON. It has an IANA database of tags and a canonical serialization form, unlike MsgPack.
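"Shares the JSON data model" is easy to see with a tiny example (CBOR bytes written out by hand here for illustration; a CBOR library would produce the same thing):

    import json

    json_form = json.dumps({"a": 1, "b": [2, 3]}, separators=(",", ":")).encode()

    cbor_form = bytes([
        0xA2,              # map with 2 pairs
        0x61, ord("a"),    # text string of length 1: "a"
        0x01,              # unsigned int 1
        0x61, ord("b"),    # text string "b"
        0x82, 0x02, 0x03,  # array of 2 items: [2, 3]
    ])

    print(len(json_form), len(cbor_form))   # 17 vs 9 bytes for the same data model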
With the top level encoding solved, we could then go back to arguing about all the specific lower level encodings such as compressed vs uncompressed curve points, etc.
- ASN.1 is a set of a dozen different binary encodings
- ASN.1's schema language is IMHO way better designed than Protobuf's, but also more complex, as it has more features
- ASN.1 can encode many more different data layouts (e.g. things where in Protobuf you have to use "tricks"), each being laid out differently in the output depending on the specific encoding format, annotations on the schema, and options during serialization
- ASN.1 has many ways to represent things more "compactly", which all come with their own complexity (like bit-mask-encoded boolean maps; see the sketch after this comment)
overall the problem with ASN.1 is that it's absurdly over-engineered, leading to you needing to know many hundreds of pages across multiple standards documents just to implement one single encoding of the dozen existing ones, and even then you might run into ambiguous or unclear definitions where you have to ask on the internet for clarification
if we ignore the schema languages for a moment, most senior devs could probably write a crappy Protobuf implementation over a weekend, but for ASN.1 you might not even be able to digest all the relevant standards in that time :/
Realistically, if ASN.1 weren't as badly over-engineered and had shipped with only some of its more modern encoding formats, we probably would all be using ASN.1 for many things, maybe including your web server responses, and this probably would cut non-image/video network bandwidth by a third or more. But then, the network is overloaded by image/video transmissions and the like, not other stuff, so I guess who cares???!???
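Here's the kind of "boolean map" encoding referred to in the list above: a named-bits BIT STRING (as in X.509 KeyUsage), where DER additionally requires trailing zero bits to be dropped. A minimal sketch:

    def encode_named_bits(bit_positions):
        """DER-encode a BIT STRING from a set of named-bit positions.

        Bit 0 is the most significant bit of the first content octet; DER requires
        trailing zero bits to be dropped (hence the unused-bits count).
        """
        if not bit_positions:
            return bytes([0x03, 0x01, 0x00])
        highest = max(bit_positions)
        nbytes = highest // 8 + 1
        value = bytearray(nbytes)
        for pos in bit_positions:
            value[pos // 8] |= 0x80 >> (pos % 8)
        unused = 7 - (highest % 8)
        return bytes([0x03, 1 + nbytes, unused]) + bytes(value)

    # digitalSignature(0) + keyCertSign(5) -> "03020284"
    print(encode_named_bits({0, 5}).hex())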
ASN.1 was not over-engineered in 1990. The things that kept it from ruling the world are:
- the ITU-T specs for it were _not_ free back then
- the syntax is context-dependent, so using a LALR(1) parser generator to parse ASN.1 is difficult -- though not really any more than parsing C with one -- but yeah, if it had had a LALR(1)-friendly syntax, ASN.1 would have been much easier to write tooling for
- competition from XDR, DCE/MS RPC, XML, JSON, Protocol Buffers, Flat Buffers, etc.
The over-engineering came later, as many lessons were learned from ASN.1's early years. Lessons that the rest of the pack mostly have not learned.
ASN.1 has many encoding standards, but you don't need to implement them all, only the specific one for your needs.
ASN.1 has a standard and an easy to follow spec, which Protobuf doesn't.
In sum: I could cobble together a working ASN.1 implementation over a weekend. In contrast, getting to a clean-room working Protobuf library is a month's work.
Caveat: I have not had to deal with PKI stuff. My experience with ASN.1 is from LDAP, one of the easiest protocols to implement ever, IMO.
> to follow spec, which Protobuf doesn't.
I can't say so personally, but from what I heard from the coworker I helped, the spec isn't always easy to follow, as there are many edge cases where you can "guess" what they probably mean but they aren't precise enough. Though they had a surprising amount of success getting clarification from authoritative sources (I think some author or maintainer of the standard, but I'm not fully sure; it was a few years ago).
In general there seems to be a huge gap between writing something which works for some specific usage(s) of ASN.1 and something which works "in general/for the whole standard" (for the relevant encodings -- mainly DER, but also at least part of non-DER BER as far as I remember).
> Protobuf doesn't.
yes, but its wire format is relatively simple and documented (not as a specification, but documented anyway), so getting something going which can (de-)serialize the wire format really isn't that hard, and the mapping from that to actual data types is also simpler (though it also won't work for arbitrary types due to dumb design limitations). I would be surprised if you need a month of work to get something going there. Though if you want to reproduce all the tooling ecosystem, or the special ways some libraries can interact with it (e.g. in-place edits etc.), it's a different project altogether. What I mean is just a (de-)serializer for the wire format with an appropriate mapping to data types (objects, structs, whatever the language you use prefers).
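To back that up, a minimal sketch of the wire-format layer (field keys are varints of `field_number << 3 | wire_type`, plus a handful of wire types); mapping the raw fields onto typed structs is where the .proto schema comes in:

    def read_varint(buf, offset):
        """Decode one base-128 varint; return (value, next_offset)."""
        result, shift = 0, 0
        while True:
            b = buf[offset]
            offset += 1
            result |= (b & 0x7F) << shift
            if not b & 0x80:
                return result, offset
            shift += 7

    def parse_fields(buf):
        """Yield (field_number, wire_type, raw_value) for one Protobuf message body."""
        offset = 0
        while offset < len(buf):
            key, offset = read_varint(buf, offset)
            field_number, wire_type = key >> 3, key & 0x07
            if wire_type == 0:                     # varint
                value, offset = read_varint(buf, offset)
            elif wire_type == 1:                   # fixed 64-bit
                value, offset = buf[offset:offset + 8], offset + 8
            elif wire_type == 2:                   # length-delimited (strings, bytes, sub-messages)
                length, offset = read_varint(buf, offset)
                value, offset = buf[offset:offset + length], offset + length
            elif wire_type == 5:                   # fixed 32-bit
                value, offset = buf[offset:offset + 4], offset + 4
            else:
                raise ValueError(f"unsupported wire type {wire_type}")
            yield field_number, wire_type, value

    # Classic example from the wire-format docs: field 1 = 150 encodes as 08 96 01.
    # list(parse_fields(bytes([0x08, 0x96, 0x01]))) -> [(1, 0, 150)]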
But for Protobuf you kinda have to. Needing to parse the .proto files and conform to Google's codegen ideas is implied.
For ASN.1 you just need to follow a spec, not a concrete implementation.
Yes, but you can use a subset of ASN.1. You don't have to implement all of X.680, let alone all of X.681, X.682, and X.683.
I had to look up https://www.merriam-webster.com/dictionary/docent
If it were created today it would look a lot like OAuth JSON Web Tokens (JWT) and would use JSON instead of ASN.1/DER.
no, not at all
they share some ideas, but that doesn't make it "pretty much ASN.1". It's only "pretty much the same" if you argue that all schema-based general-purpose binary encoding formats are "pretty much the same".
ASN.1 also isn't "file" specific at all; its main use case is, and always has been, message exchange protocols.
(Strictly speaking, ASN.1 is also not a single binary serialization format but 1. a schema language, 2. some rules for mapping things to some intermediate concepts, and 3. a _dozen_ different ways to "exactly" serialize things. And in the 3rd point the difference can be pretty huge, from something you can partially read even without a schema (like Protobuf) to more compact representations you can't read without a schema at all.)
At the implementation level they are different, but when integrating these protocols into applications, yeah, pretty much. Schema + data goes in, encoded data comes out, or the other way around. In the same way YAML and XML are pretty much the same, just different expressions of the same concepts. ASN.1 even comes with multiple expressions of exactly the same grammar, both in text form and binary form.
ASN.1 was one of the early standardised protocols in this space, though, and suffers from being used mostly in obscure or legacy protocols, often with proprietary libraries if you go beyond the PKI side of things.
ASN.1 isn't file specific, it was designed for use in telecoms after all, but encodings like DER work better inside of file formats than Protobuf and many protocols like it. Actually having a formal standard makes including it in file types a lot easier.