For example, write groups or barriers (like memory barriers) would be wonderful. Or a transaction API, or I/O completion ports like on Windows.
In a database (and any other software designed for resiliency), you want the file contents to transition cleanly from state A to B to C, with no chance of ending up in some intermediate state in the case of power loss. And you want to be notified when the data has been durably written. It’s unnecessarily difficult to write code that does that on top of POSIX in an efficient way. Most code that interacts with files is either slow, wrong or both. All because the API is bad.
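To make the difficulty concrete, here is a minimal sketch of the usual POSIX workaround for a clean A-to-B transition: write a temp file, fsync it, rename it over the target, then fsync the directory. The helper name `atomic_write` is hypothetical, and the sketch assumes a POSIX system where a directory can be opened and fsync'ed and where a rename within one filesystem is atomic. It does not attempt to handle the case where fsync itself reports an error.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a reader sees the old or the new contents, never a mix.

    Hypothetical helper for illustration; POSIX-only as written.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on one filesystem.
    # Note: mkstemp creates the file with mode 0600, which the final file inherits.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's buffered bytes into the kernel
            os.fsync(f.fileno())  # ask the kernel to put the file data on stable storage
        os.replace(tmp_path, path)  # rename over the destination
    except BaseException:
        # Something failed before the rename completed: remove the temp file if it is still there.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself is durable, not just the file data.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

That is four separate steps, two of them fsyncs, for what is conceptually a single "replace this file" operation, and skipping any one of them quietly weakens the guarantee.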
What features do you have in mind?
> It puts the choice of guaranteed writes vs performance to the developer.
Yes, and it's a completely false choice. The entire point of this thread is that fsync is an incredibly difficult API to use in a way that gets you the guarantees you need ("don't lose the writes to this file"). And that the consistency guarantees of specific filesystems, VFS, POSIX, and their interactions are not easy to understand even for the experienced -- and it can be catastrophic to get wrong.
It isn't actually a choice between "Speed vs correctness". That's a nice fairy tale where people get to pretend they know what they're up against and everyone has full information. Most programmers aren't going to have the attention to get this right, even good ones. So then it's just "99.9% chance you fucked it up and it's wrong" and then your users are recovering data from backups.
It'd be very nice to get some sort of async filesystem write barrier API. Something like `int fbarrier(int fd)` such that all writes anywhere in the filesystem will be sync'ed later when you `fsync()` that fd.
It would also be very nice to have an async `sync()`/`fsync()`. That may sound oxymoronic, but it's not. An async `sync()`/`fsync()` would schedule and even start the sync and then provide a completion notice so the application can do other work while the sync happens in the background. One can do sync operations in worker threads and then report completion, but it'd be nice to have this be a first-class operation. Really, every system call that does "I/O", or that is or can be slow, should have been designed to be async-capable.
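The worker-thread workaround mentioned above can be sketched in a few lines. This is only an illustration of that workaround, not the first-class operation being asked for; `fsync_async` and `_sync_pool` are hypothetical names, and the pool size of 2 is an arbitrary assumption.

```python
import concurrent.futures
import os

# A small pool dedicated to blocking sync calls (size chosen arbitrarily for the sketch).
_sync_pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def fsync_async(fd: int) -> concurrent.futures.Future:
    """Start os.fsync(fd) on a worker thread and return a Future for its completion.

    The caller must keep `fd` open until the Future is done. Calling .result()
    returns None on success and re-raises the OSError if fsync failed.
    """
    return _sync_pool.submit(os.fsync, fd)
```

A caller would write to the descriptor, call `pending = fsync_async(fd)`, carry on with other work, and later call `pending.result()` (or attach `pending.add_done_callback(...)`) to learn that the sync has returned. The cost is a thread blocked per outstanding sync, which is part of why a native async call would be nicer.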
Consider S3-like protocols: these recognise that 99% of the time applications just want “create file with given contents” or “read back what they’ve previously written.”
The edge cases should be off the beaten path, not in your way tripping you up when you want the simple scenario.