I made a design decision for a standard for dataset structure to explicitly ban characters beyond ascii [A-Za-z0-9.,-_ ] precisely because all the positivity around utf-8 often leads people to think that it comes with no additional complexity cost. There is an escape hatch with a way to indicate that a dataset uses unicode filenames but the standard states that any consumer may reject such datasets because unicode support is explicitly not required.
I got pushback from people who would not have to implement or maintain the systems for being a backward asciite so seeing this article is rather vindicating.
I got pushback from people who would not have to implement or maintain the systems for being a backward asciite so seeing this article is rather vindicating.