Comment by jiggawatts

jiggawatts Sep 8, 2024

Crazy is the right term. File system APIs in general have too many sharp edges and need a ground-up rethink.

Consider S3-like protocols: these recognise that 99% of the time applications just want “create file with given contents” or “read back what they’ve previously written.”

The edge cases should be off the beaten path, not in your way, tripping you up when you want the simple scenario.

Cthulhu_ Sep 10, 2024

Aren't the edge cases features? An abstraction (or different API, sure) is in order to prevent footguns. However, this abstraction should not force fsyncs for example, due to the performance impact mentioned. It puts the choice of guaranteed writes vs performance to the developer.

josephg Sep 10, 2024

A better abstraction, designed from the ground up, wouldn’t force fsyncs to work.

For example, write groups or barriers (like memory barriers) would be wonderful. Or a transaction API, or I/O completion ports like on Windows.

In a database (and any other software designed for resiliency), you want the file contents to transition cleanly from state A to B to C, with no chance to end up in some intermediate state in the case of power loss. And you want to be notified when the data has been durably written. It’s unnecessarily difficult to write code that does that on top of POSIX in an efficient way. Most code that interacts with files is either slow, wrong or both. All because the API is bad.
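
For reference, the usual way to get "old contents or new contents, nothing in between" out of POSIX today is the write-temp-file, fsync, rename, fsync-the-directory dance. A minimal Python sketch, assuming a POSIX system and a filesystem where a rename within one directory is atomic; `atomic_write` is a made-up helper name, not an existing API:

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`; readers see the old or the new contents."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename below
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # flush the file's data to stable storage
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        # Best-effort cleanup of the temp file if anything failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a power loss.
    dir_fd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Even this only covers replacing one whole file. It gives no ordering across several files, which is where barriers or a transaction API would help.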

aseipp Sep 10, 2024

> Aren't the edge cases features?

What features do you have in mind?

> It puts the choice of guaranteed writes vs performance to the developer.

Yes, and it's a completely false choice. The entire point of this thread is that fsync is an incredibly difficult API to use in a way that gets you the guarantees you need ("don't lose the writes to this file"). And that the consistency guarantees of specific filesystems, VFS, POSIX, and their interactions are not easy to understand even for the experienced -- and it can be catastrophic to get wrong.

It isn't actually a choice between "Speed vs correctness". That's a nice fairy tale where people get to pretend they know what they're up against and everyone has full information. Most programmers aren't going to have the attention to get this right, even good ones. So then it's just "99.9% chance you fucked it up and it's wrong" and then your users are recovering data from backups.

lionkor Sep 10, 2024

It sounds more like you are asking for an abstraction

lmz Sep 10, 2024

The file system is already an abstraction. I think they are asking if it's the right one.

TickleSteve Sep 10, 2024

Is the filesystem the correct abstraction? For most applications, a database-like API is more appropriate, hence SQLite.

cryptonector Sep 10, 2024

The filesystem is very easy to use for simple things by comparison to a DB, and it's more accessible from the shell. But you're right, the filesystem is very difficult to use in a power failure safe way. SQLite3 has great power failure recovery testing, so the advice to use SQLite3 for any but the simplest things is pretty good.
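
To make the SQLite3 suggestion concrete, here is a rough sketch using Python's standard `sqlite3` module. The `settings` table and the `replace_settings` / `load_settings` helpers are hypothetical names chosen for illustration. The point is that the whole update either commits or rolls back, and SQLite handles the journaling and syncing under its default settings:

```python
import sqlite3


def replace_settings(db_path: str, settings: dict) -> None:
    """Replace every row in `settings` as one all-or-nothing transaction."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS settings "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute("DELETE FROM settings")
            conn.executemany(
                "INSERT INTO settings (key, value) VALUES (?, ?)",
                settings.items(),
            )
    finally:
        conn.close()


def load_settings(db_path: str) -> dict:
    """Read back whatever the last committed transaction left behind."""
    conn = sqlite3.connect(db_path)
    try:
        return dict(conn.execute("SELECT key, value FROM settings"))
    finally:
        conn.close()
```

If an insert fails halfway (for example a `None` value violating `NOT NULL`), the `DELETE` is rolled back too and the previous state is still what `load_settings` returns.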

It'd be very nice to get some sort of async filesystem write barrier API. Something like `int fbarrier(int fd)` such that all writes anywhere in the filesystem will be sync'ed later when you `fsync()` that fd.

It would also be very nice to have an async `sync()`/`fsync()`. That may sound oxymoronic, but it's not. An async `sync()`/`fsync()` would schedule and even start the sync and then provide a completion notice so the application can do other work while the sync happens in the background. One can do sync operations in worker threads and then report completion, but it'd be nice to have this be a first class operation. Really, every system call that does "I/O" or is or can be slow should be / have been designed to be async-capable.
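
The worker-thread workaround mentioned above might look roughly like this in Python. This is only a sketch of the "sync in a worker, then report completion" pattern, not a first-class async fsync; `write_durably_async` and the `on_durable` callback are invented names, and it does not fsync the containing directory:

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor

# Small pool dedicated to blocking fsync calls.
_sync_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="fsync")


def write_durably_async(path: str, data: bytes, on_durable) -> Future:
    """Write `data` now, then fsync on a worker thread.

    `on_durable(path)` runs on the worker after fsync returns. The returned
    Future completes after that, or carries the exception if the sync failed.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:  # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
    except BaseException:
        os.close(fd)
        raise

    def _sync_and_close() -> None:
        try:
            os.fsync(fd)  # the slow part, off the caller's thread
        finally:
            os.close(fd)
        on_durable(path)

    return _sync_pool.submit(_sync_and_close)
```

The caller gets control back right after the write and can either pass a callback or wait on the Future later. That is the shape an async `fsync()` would offer natively, minus the extra thread.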
