- @graveland Which Linux interface was used for the userspace block driver (ublk, nbd, tcmu-runner, NVMe-over-TCP, etc)? Why did you choose it?
Also, were existing network or distributed file systems not suitable? This use case sounds like Ceph might fit, for example.
- qemu-img convert supports copy_file_range(2) too. Was the `--copy-range-offloading` option used in the benchmark?
It would be helpful to share the command-line and details of how benchmarks were run.
- I'm the presenter of the talk, but not an io_uring kernel developer or security expert.
The io_uring implementation is complex and the number of lines of code is non-trivial. On the other hand, as code matures and the number of bugs being reported falls, the trade-off between functionality gained and risk of security issues changes. More people will decide to use io_uring as time passes. People already rely on much larger and more complex subsystems like the network stack or file systems.
- Donations are possible through PayPal: https://www.qemu.org/donations/
QEMU is a Software Freedom Conservancy member project like Git, OpenWRT, and many others. You can donate through the Conservancy link you posted and mention which project you wish to support.
- It looks plausible: XFS's xfs_dio_write_end_io() updates the on-disk file size. Do you have a link to documentation that confirms this is true for Linux or POSIX filesystems?
Edit: POSIX 1003.1-2017 defines fdatasync(2) behavior in 3.384 Synchronized I/O Data Integrity Completion, where it says "For write, when the operation has been completed or diagnosed if unsuccessful. The write is complete only when the data specified in the write request is successfully transferred and all file system information required to retrieve the data is successfully transferred".
So I think POSIX does guarantee that a write at the end of the file with O_DSYNC (or followed by fdatasync(2)), and therefore Linux RWF_DSYNC, is sufficient. Thank you for pointing out that RWF_DSYNC is sufficient for appends, vlovich123!
- io_uring is available from RHEL 9.3 onward. The catch is that it's disabled by default and needs to be enabled at runtime via the "kernel.io_uring_disabled" sysctl.
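For example, a drop-in sysctl snippet like this enables it persistently (value meanings are from the kernel sysctl documentation; the file name is made up):

```
# /etc/sysctl.d/99-io_uring.conf
# 0 = io_uring enabled for all processes
# 1 = io_uring creation restricted to privileged processes
# 2 = io_uring creation disabled entirely
kernel.io_uring_disabled = 0
```

Apply it with `sysctl --system` or on the next boot.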
- Agreed: when metadata changes are involved, RWF_SYNC must be used.
RWF_DSYNC is sufficient and faster when data is overwritten without metadata changes to the file.
- The Linux RWF_DSYNC flag sets the Force Unit Access (FUA) bit in write requests. It can be used instead of fdatasync(2) in some cases: it syncs only the specific write request instead of flushing the entire disk write cache.
- There is ongoing discussion about this topic in the QEMU AI policy: https://lore.kernel.org/qemu-devel/20250625150941-mutt-send-...
- The article mentions getting SPEC CPU running but doesn't share performance results or scalability results (now the CPU can decode twice as many instructions, etc). Can someone who has been following the research in this area share some results?
- I'm not familiar with this stuff and don't have time to fully read the specs plus the required background reading. Here is my guess based on skimming the spec:
The application may embed the VAPID public key while the VAPID private key is kept secret by the app developer. That way only the app developer can send valid push notifications. This approach doesn't work when the app running on an untrusted device sends push notifications directly though?
I guess the trick is for the app to treat the push notification purely as a hint to go fetch the latest state from the app server. Do not trust anything in the push notification message. Then it doesn't matter whether the messages are spoofed.
You linked to some Android Intents code in the firebase-messaging code. I guess that is related to preventing Intent spoofing, but I'm not sure?
- In this direct to FCM model, does the application running on the untrusted user's device need to embed any sensitive credentials?
What prevents someone else from impersonating my app or other users on my app?
- Can you be specific about what cannot be configured on mounted embedded media?
systemd follows the drop-in config file model where configuration snippets are placed into directories (like .d/ directories in Debian). It should not be necessary to run utilities from the target system in order to configure it on mounted media.
Drop unit files for services, sockets, filesystem mounts, timers, etc onto the mounted media and they will be detected when the target system boots.
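For example, a hypothetical service dropped onto media mounted at /mnt/target (path and unit name made up):

```
# /mnt/target/etc/systemd/system/hello.service
[Unit]
Description=Example service configured offline

[Service]
ExecStart=/usr/bin/echo hello

[Install]
WantedBy=multi-user.target
```

Enabling it doesn't require the target's utilities either: `systemctl --root=/mnt/target enable hello.service` creates the [Install] symlinks from the host, or you can create the symlink under multi-user.target.wants/ by hand.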
- Keep in mind it doesn't magically wake up descheduled tasks. So it's necessary to go through the kernel if the destination task is not currently running on a CPU. The latency in that case will be similar to what we have today.
And in cases where you can guarantee that the destination task is running, you can already use shared memory for low-latency communication today (polling or mwait).
I'm not saying userspace interrupts are useless, but they are not as convincing as they seem at first glance. I think more proofs of concept (enabling real applications) and benchmarking are needed to demonstrate the advantages.
- Lots of cool hybrid container/VM ideas are being developed!
Bootable Container Images are a standard for launching VMs from OCI images: https://containers.github.io/bootable/
crun-vm (https://github.com/containers/crun-vm) is similar to RunCVM in that it can launch container images (or VM disk images) in VMs. It's an OCI runtime so it fits into the podman, Docker, or Kubernetes model.
krunvm (https://github.com/containers/krunvm) is a standalone tool with a similar workflow where you can launch a VM from an OCI image. It predates Bootable Container Images, so I think it injects its own kernel.
- > the sound from musical instruments
Pianoteq (https://en.wikipedia.org/wiki/Pianoteq) is a physically modeled collection of instruments (mostly pianos). Runs even on a Raspberry Pi and sounds like the real deal without gigabytes of prerecorded samples. Super impressive what physical modeling can achieve.
- There is even NVMe passthrough support via io_uring, so it's still possible to send custom NVMe commands when using io_uring instead of a userspace driver: https://www.usenix.org/system/files/fast24-joshi.pdf
Normal block I/O use cases don't really need NVMe io_uring passthrough, but it addresses the more exotic cases and is available in mainline Linux. And NVMe passthrough might eke out a little more performance.
- Thank you! This quick overview is helpful: https://player.vimeo.com/video/922512661
- I have a hard time getting a sense of what the display is really like from the website and the embedded video (it cuts too quickly, uses depth of field shots of the tablet, etc).
Given that this is all about a new display, it would be nice to show a more pragmatic demo video right upfront. A demo that gives a clear look at how the display behaves with respect to lighting, reflections, animation, touch input, etc. That's what I would look for when deciding whether to buy this.
Maybe YouTube reviewers will end up providing this information...
- This is called the Primary Selection and is separate from the Clipboard (Ctrl+C/Ctrl+V). IMO the Primary Selection is more convenient than the Clipboard.