Comment by jcrites - Hacker Neue

jcrites Oct 5, 2023

Are there good data handling libraries that provide abstractions over pipes, sockets, files, and memory and implement optimizations like these? I'd be interested in knowing if there are such libraries in C, C++, Rust, or other systems languages.

I wasn't familiar with some of the APIs mentioned in the article like splice() and vmsplice(), so I wondered if there are libraries that I might use when building ~low-level applications that take advantage of these and related optimizations where possible automagically. (As another commenter mentioned: these APIs are hard to use and most programs don't take advantage of them)

Do libraries like libuv, tokio, Netty handle this automatically on Linux? (From some brief research, it seems like probably they do)

duped Oct 6, 2023

This may go against the grain but this isn't really worth abstracting over since it's not portable. You'll probably want to implement it by hand everywhere you need it.

Higher-level code only uses them rarely because they're pretty special purpose and they have to be specialized for Linux. If you're shuffling data around without looking at it, and only on Linux, splice is useful. There aren't that many applications that have that property (something like, say, TCP/UDP proxies definitely need it - but your bog-standard HTTP server? Not so much).

And if you are writing these apps then the buzzwords like "zero copy" come up often, and splice is one of the first results you'll see.

NavinF Oct 6, 2023

The main reason why people write abstractions over stuff like this is to make it portable. I'm sure there's something similar to vmsplice on every relevant OS. The library can also fall back to plain read/write if you're targeting some ancient platform.
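
A rough sketch of what such a wrapper might look like, in Python for brevity. It uses splice rather than vmsplice, because Python's os module only exposes the former (os.splice, Python 3.10+ on Linux). The names copy_fd, CHUNK and the helpers are made up for illustration and do not come from any library mentioned in this thread. The assumption is that EINVAL or ENOSYS means "splice is not supported for this descriptor", in which case the copy continues with ordinary read/write.

```python
import errno
import os

CHUNK = 64 * 1024  # hypothetical chunk size: 64 KiB, the default Linux pipe capacity
_UNSUPPORTED = (errno.EINVAL, errno.ENOSYS)  # assumed to mean "cannot splice this fd"


def _write_all(fd, data):
    """Write every byte of data, looping over short writes."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


def _copy_read_write(src_fd, dst_fd):
    """Portable slow path: copy through a user-space buffer until EOF."""
    total = 0
    while True:
        data = os.read(src_fd, CHUNK)
        if not data:
            return total
        _write_all(dst_fd, data)
        total += len(data)


def copy_fd(src_fd, dst_fd):
    """Copy src_fd to dst_fd until EOF and return the number of bytes copied.

    Tries splice (src -> pipe -> dst) first and switches to read/write
    when splice is missing or rejected.
    """
    if not hasattr(os, "splice"):  # non-Linux platform or Python < 3.10
        return _copy_read_write(src_fd, dst_fd)

    pipe_r, pipe_w = os.pipe()  # splice needs a pipe on one side of each call
    total = 0
    try:
        fast = True
        while fast:
            try:
                n = os.splice(src_fd, pipe_w, CHUNK)
            except OSError as exc:
                if exc.errno not in _UNSUPPORTED:
                    raise
                fast = False  # nothing is in flight yet, safe to switch
                continue
            if n == 0:  # end of input
                return total
            total += n
            while n:  # drain the pipe before refilling it
                try:
                    n -= os.splice(pipe_r, dst_fd, n)
                except OSError as exc:
                    if exc.errno not in _UNSUPPORTED:
                        raise
                    fast = False
                    # Rescue the bytes already parked in the pipe.
                    data = os.read(pipe_r, n)
                    _write_all(dst_fd, data)
                    n -= len(data)
        return total + _copy_read_write(src_fd, dst_fd)
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
```

Callers just do copy_fd(src.fileno(), dst.fileno()) and never see which path was taken. The awkward part is the middle case: once bytes have been spliced into the pipe they have to be rescued before falling back, which is a small taste of why these APIs are fiddly to wrap.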

duped Oct 6, 2023

> I'm sure there's something similar to vmsplice on every relevant OS.

There isn't.

gpderetta Oct 6, 2023

I think Linus generally considers splice a failed experiment. It works fine in some simple scenarios, but the generalized support needed to make it work everywhere failed to materialize.

Having said that, these days sendfile is implemented in terms of splice, so in a way many HTTP servers use it.
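
For reference, this is roughly the shape of the sendfile path a server takes when serving a static file, sketched in Python via os.sendfile. The function name send_file is hypothetical, and the sketch assumes a blocking, already-connected socket on Linux.

```python
import os
import socket


def send_file(sock: socket.socket, path: str) -> int:
    """Send the whole file at path over sock without copying it through user space."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # Arguments: out_fd, in_fd, offset into the file, max bytes to send.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # file shrank while we were sending it
                break
            sent += n
    return sent
```

Python's standard library also offers socket.sendfile(), which uses os.sendfile where it can and falls back to plain send() otherwise.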

jeromegn Oct 5, 2023

There’s a crate for tokio, so it’s not automatic but might still be interesting: https://lib.rs/crates/tokio-splice

vacuity Oct 7, 2023

You might want to look at Cosh[1]. I'm puzzling over the paper right now, actually! It's a model for providing a message-passing abstraction that still allows for optimizations. I don't think it's really known outside of the research setting, and writing an efficient Cosh implementation will probably require some time.

In short, it provides three modes of transfer: move, share, and copy. For instance, a move transfer takes data that the sender has R/W permissions to and wholly "gives" it to the receiver. This may be done with page table VM remappings. It also has a strong or weak property that indicates whether the sender and receiver can be trusted to cooperate or must be strictly corralled with VM permission remappings.

To be honest, I don't know if it can be optimized well enough to match ultra-optimized pipes or whatever reliably. That might be a "sufficiently smart compiler" issue. Still, I think it's worth a shot.

[1] https://barrelfish.org/publications/trios14-baumann-cosh.pdf
