Preferences

Can you elaborate on what's going on here? Something like "fd" is assuming your filenames are UTF-8, but you're actually using some other encoding?

What's happening here is that fd's `-e/--extension` flag requires that its parameter value be valid UTF-8. The only case where this doesn't work is if you want to filter by a file path whose extension is not valid UTF-8. Needless to say, I can't ever think of a case where you'd really want to do this.

But if you do, fd still supports it. You just can't use the `-e/--extension` convenience:

    $ touch $'a.\xFF'
    $ fd '.*\.(?-u:\xFF)'
    a.�
That is, `fd` requires that the regex patterns you give it be valid UTF-8, but that doesn't mean the patterns themselves are limited to only matching valid UTF-8. You can disable Unicode mode and match raw bytes via escape sequences (that's what `(?-u:\xFF)` is doing).

So as a matter of fact, the sibling comments get the analysis wrong here. `fd` doesn't assume all of its file paths are valid UTF-8. As demonstrated above, it handles non-UTF-8 paths just fine. But some conveniences, like specifying a file extension, do require valid UTF-8.

Right because file names are not guaranteed to be UTF-8. That's the reason Rust has str and then OsStr. You may assume Rust str is valid UTF-8 (unless you have unsafe code tricking it) but you may not assume OsStr is valid UTF-8.

Here an invalid UTF-8 is passed via command line arguments. If it is desired to support this, the correct way is to use args_os https://doc.rust-lang.org/beta/std/env/fn.args_os.html which gives an iterator that yields OsString.

No, OsStr is for OSes that don't encode strings as UTF-8, e.g. Windows by default. Use Path or PathBuf for paths, they're not strings. Pretty much every OS has either some valid strings that aren't valid paths (e.g. the filename `CON` on Windows is a valid string but not valid in a path), some valid paths that aren't valid strings (e.g. UNIX allows any byte other than 0x00 in paths, and any byte other than 0x00 or `/` in filenames, with no restrictions to any text encoding), or both.
fd is assuming the argument to -e is valid UTF-8 while filenames can contain any byte but NUL and /
"I cook up impractical situations and then blame my tools for it"

Nobody cares that valid filenames are anything except the null byte and /. Tell me one valid usecase for a non-UTF8 filename.

UTF-8 is common now, but it hasn't always been. Wanting support for other encoding schemes is a valid ask (though, I think the OP was needlessly rude about it).
It's backwards compatible with ascii right?

But yeah I suppose you would need support for all the other foreign-language encodings that came in between -- UCS-2 for example.

But basically nobody does that. Glib (which drives all GTK apps' and various other apps file reading) doesn't support anything other than UTF8 filenames. At that point I'd consider the "migration" done and dusted.

The world is a lot more complicated & varied than you think :) I was digging around in some hard drives from 2004 just last weekend. At that time, lots of different encodings were common, especially internationally. Software archaelogy is a common hobby, it could be nice to be able to use a tool like this to search through old filesystems. "Not worth the effort" is definitely a valid response to the feature request, but that also doesn't mean there is absolutely no use for the feature.
You do not have (or write programs for) filesystems that contain loads of ancient mp3 and wma files.

It is the bane of my existence, but many programs support all the Latin-1 and other file name encodings that are incompatible with UTF-8, so users expect _your_ programs to work too.

Now if you want me to actually _display_ them all correctly, tough turds...

True. Btw curious, is there a defined encoding for text in mp3 metadata? Or is that a pain too.
Running a shell script went badly, generating a bunch of invalid files containing random data in their names, rather than one file containing random data.

You wish to find and delete them all, now that they've turned your home directory into a monstrosity.

nah, eff all that. Roll back the snapshot.

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal