Comment by tgbugs - Hacker Neue

tgbugs Dec 6, 2024 parent

I made a design decision for a dataset structure standard to explicitly ban characters beyond ASCII [A-Za-z0-9.,-_ ], precisely because all the positivity around UTF-8 often leads people to think that it comes with no additional complexity cost. There is an escape hatch, a way to indicate that a dataset uses Unicode filenames, but the standard states that any consumer may reject such datasets because Unicode support is explicitly not required.
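For anyone wondering what a check like that might look like, here is a rough sketch in Python. It is not the standard's actual validator. It assumes the bracketed set above is a list of literal characters (letters, digits, period, comma, hyphen, underscore, space), and the names `is_allowed_name`, `rejected_names`, `declares_unicode` and `consumer_supports_unicode` are all made up for illustration.

```python
import re

# Assumption: the set lists literal characters. The hyphen is escaped because
# an unescaped ",-_" inside a regex class is a range from ',' to '_', which
# would also admit characters such as ':' and '/'.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9.,\-_ ]+")


def is_allowed_name(name: str) -> bool:
    """Return True if name is non-empty and uses only the restricted set."""
    return ALLOWED_NAME.fullmatch(name) is not None


def rejected_names(names, declares_unicode=False, consumer_supports_unicode=False):
    """Hypothetical consumer-side check; returns the names to reject.

    declares_unicode stands in for the escape hatch: the dataset says it uses
    Unicode filenames. A consumer without Unicode support may reject the whole
    dataset, modeled here by returning every name.
    """
    names = list(names)
    if declares_unicode:
        return [] if consumer_supports_unicode else names
    return [n for n in names if not is_allowed_name(n)]
```

With the defaults, `rejected_names(["data_01.csv", "résumé.txt"])` returns only the second name. The point of the escaped hyphen is that a naive copy of the set into a regex quietly accepts more than intended.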

I got pushback for being a backward asciite from people who would not have to implement or maintain the systems, so seeing this article is rather vindicating.
