These days that seems increasingly like a solved problem for most end users; when it does come up, it's usually a client-side issue or a very old method of generating the PDF.
Modern PDF supports several kinds of font embedding (legality is left as an exercise for the PDF author), plus 14 standard font faces that can be referenced by name for compatibility, though in practice document authors more often either assume a system font is available or embed one.
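As a rough illustration, here is a minimal sketch using the reportlab library (just one of many PDF generators, and the filename is made up): one of the 14 standard faces, like Helvetica, can simply be named and left for the viewer to supply, whereas any other font would normally be registered and embedded by the generating tool.

```python
# Minimal sketch with reportlab: "Helvetica" is one of the 14 standard faces,
# so it is referenced by name rather than embedded. A TrueType font registered
# via reportlab's TTFont mechanism would typically be (subset-)embedded instead.
from reportlab.pdfgen import canvas

c = canvas.Canvas("standard-font.pdf")  # hypothetical output file
c.setFont("Helvetica", 12)              # standard-14 face: referenced, not embedded
c.drawString(72, 720, "Rendered with a viewer-supplied standard font.")
c.save()
```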
There are still problems with the format: it focuses first and foremost on document display rather than document structure or intent, and accessibility support in real-world documents is rare to non-existent outside of government use cases, or maybe documents exported from Word and the like.
A lot of the usability improvements come from clients that try to parse the PDF and make the format appear smarter than it is. macOS Preview can figure out where columns begin and end for natural text selection, and Acrobat routinely generates an accessible version of a document after opening it, including some table detection. Honestly, creative interpretation of PDF documents is possibly one of the best use cases for AI that I've ever heard of.
While a lot about PDF has changed over the years, the basic standard was created to optimize for printing. It's as if we started with GIF and added support for building interactive websites out of GIFs. At its core, a PDF is just a representation of shapes on a page, and we added metadata that would hopefully identify glyphs, accessible alternative content, and smarter text/line selection, but it can fall apart if the PDF author is careless, malicious, or didn't expect certain content. For example, it probably inherits all the weirdness of Unicode and then some.
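To see how thin that metadata layer is, here is a hedged sketch using the pypdf library (its extract_text() heuristic, not anything mandated by the PDF spec, and the input filename is made up): "extracting" text is really reconstructing it from drawing operators plus whatever glyph-to-Unicode tables the authoring tool happened to write, which is why copy/paste out of some PDFs yields gibberish rather than an error.

```python
# Sketch with pypdf: text extraction reconstructs strings from positioned glyphs
# plus any ToUnicode/encoding tables the author's tool emitted. If those are
# missing or wrong, the result is garbage, not a failure.
from pypdf import PdfReader

reader = PdfReader("example.pdf")  # hypothetical input file
for page in reader.pages:
    # extract_text() guesses reading order from glyph positions; multi-column
    # layouts, ligatures, and subsetted fonts without ToUnicode maps can all break it.
    print(page.extract_text())
```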