
I made a design decision in a standard for dataset structure to explicitly ban characters beyond ASCII [A-Za-z0-9.,-_ ], precisely because all the positivity around UTF-8 often leads people to think that it comes with no additional complexity cost. There is an escape hatch, a way to indicate that a dataset uses Unicode filenames, but the standard states that any consumer may reject such datasets because Unicode support is explicitly not required.
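For illustration, here is a minimal sketch of what such a check might look like. It is not taken from the standard: the function name `check_filenames` and the two flags are hypothetical, and I am assuming the character set above is read as a literal list (letters, digits, period, comma, hyphen, underscore, space) rather than as a regex range.

```python
import re

# Assumption: the allowed set is letters, digits, period, comma, hyphen,
# underscore, and space. The hyphen is placed last so it is a literal.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9.,_ -]+")


def check_filenames(names, declares_unicode=False, consumer_supports_unicode=False):
    """Return the names this (hypothetical) consumer would reject.

    declares_unicode: the dataset used the escape hatch.
    consumer_supports_unicode: this consumer chose to implement Unicode support.
    """
    rejected = []
    for name in names:
        if ALLOWED_NAME.fullmatch(name):
            continue  # plain ASCII name, always acceptable
        if declares_unicode and consumer_supports_unicode:
            continue  # simplification: accept any name once both sides opt in
        rejected.append(name)
    return rejected
```

A consumer that never sets `consumer_supports_unicode` still conforms in this sketch; it just rejects any dataset whose names fall outside the ASCII set, whether or not the dataset declared Unicode.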

I got pushback for being a backward asciite from people who would not have to implement or maintain the systems, so seeing this article is rather vindicating.

