I think there is too much cruft in Qemu, and a simple install can pull in hundreds of dependencies. For instance, if you just want to run x86_64 VMs there is no point in getting 20 different platforms.
For the standard run-of-the-mill VM there is also rarely any need for the multitude of devices. The more stuff you put in, the more surface area for exploits. The default Qemu install needs to be seriously redesigned and pruned down by package maintainers.
Intel Clear Linux tried running VMs without qemu altogether, using Kvmtool, but is now using 'Qemu lite'. There are no packages, instructions, or even a readme on what exactly they have done to make 'Qemu lite'. Kvmtool is likewise one of those projects with zero documentation.
Libvirt is needlessly convoluted, and its usage complexity has led to an inferior solution like Virtualbox gaining traction on Linux, in spite of kvm being in the kernel. For those looking for a simpler way to use kvm, I suggest trying something like Aqemu or even just resorting to the command line. I found it far more lightweight and reasonable than libvirt's heavy-handed way of taking a simple qemu-kvm command line and adding an XML mess to it.
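To give a flavour, a bare-bones KVM guest is a one-liner along these lines (disk image name hypothetical; virtio assumed for disk and network):

    qemu-system-x86_64 -enable-kvm -machine q35 -cpu host -m 2048 \
        -drive file=disk.qcow2,if=virtio \
        -netdev user,id=net0 -device virtio-net-pci,netdev=net0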
kashyapc
Please don't make suggestions like "libvirt is too convoluted, just resort to the [QEMU] command-line" -- FWIW, I'm saying this as an everyday QEMU command-line user -- without having the complete picture. Yes, there are some valid scenarios where people who know what they're doing can (and do) directly use the QEMU command-line. But for the vast majority, libvirt makes life far simpler, because of the following.
Daniel Berrangé (lead maintainer of libvirt) articulates it much more succinctly than I could, so allow me to quote him:
"It is a common attitude among people who look at QEMU and see a deceptively simple command-line syntax for launching a VM and don't realize that you'll enter a world of hurt when you go beyond the initial simple command-line syntax.
This[1] was written in 2011, but is pretty valid and probably more besides:
Recently, we had a very nice example of benefit of libvirt to security when the VENOM bug came out. Anyone using libvirt automatically had anti-VENOM available thanks to SELinux/AppArmour which would prevent exploitation in most cases."
More on it (from his very detailed response on this thread[+], specifically from the 4th paragraph on):
"Also note that this kind of bug [VENOM] in QEMU device emulation is the poster child example for the benefit of having sVirt (either SELinux or AppArmor backends) enabled on your compute hosts.. With sVirt, QEMU is restricted to only access resources that have been explicitly assigned to it. This makes it very difficult (likely/hopefully impossible[1]) for a compromised QEMU to be used to break out to compromise the host as a whole, likewise protect against compromising other QEMU processes on the same host. The common Linux distros like RHEL, Fedora, Debian, Ubuntu, etc all have sVirt feature available and enabled by default and OpenStack doesn't do anything to prevent it from working. Hopefully no one is actively disabling it themselves leaving themselves open to attack...
[1] I'll never claim anything is 100% foolproof, but it is intended to be impossible to escape sVirt, so any such viable escape routes would themselves be considered security bugs."

[+] http://lists.openstack.org/pipermail/openstack-operators/201...
I think you are the one presuming to have the 'complete picture' here. The point is not to encourage anyone; I am sure people can make their own decisions.
We need to highlight the issues with libvirt that leave a great technology like kvm unused. Security can be used to close down any discussion, but the result is that many people are using Virtualbox and not libvirt.
Seccomp, selinux or Apparmor can be used without libvirt as required, so I don't see this as a security issue. On the contrary, the needlessly complex XML config files likely put many users off from using it properly or using it at all.
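To make the complaint concrete, here is roughly the minimal XML libvirt wants for a single guest (name and disk path hypothetical), before you even get to graphics, consoles or channels:

    <domain type='kvm'>
      <name>demo</name>
      <memory unit='MiB'>2048</memory>
      <vcpu>2</vcpu>
      <os>
        <type arch='x86_64' machine='q35'>hvm</type>
      </os>
      <devices>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2'/>
          <source file='/var/lib/libvirt/images/demo.qcow2'/>
          <target dev='vda' bus='virtio'/>
        </disk>
        <interface type='network'>
          <source network='default'/>
          <model type='virtio'/>
        </interface>
      </devices>
    </domain>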
kashyapc
You are simply overstating the issue. I was not closing the discussion down with security -- that line of thought didn't even occur to me.
The `virtual-box` equivalent in the KVM stack is `virt-manager` (under the hood, it uses libvirt APIs) -- these are primarily for desktop virtualization, not for server virtualization. To manage a fleet of servers in a data center, people don't fire up a desktop application.
Can some aspects of the experience with `virt-manager` / libvirt stack be improved? Certainly yes. But saying things like it "leaves a great technology like kvm unused" is ridiculous hyperbole.
A clarification about why it is partly a "security issue": the sVirt guest confinement mechanism in libvirt builds on top of basic SELinux / AppArmor confinement. That is, with sVirt, each VM (the QEMU process that is managed by libvirt) and its associated disk gets a unique SELinux label -- this means that even if a guest (in a fleet of 1000 VMs) is compromised, the compromise is contained to just that specific guest.
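For the curious, the labels are visible in a running guest's XML (e.g. via 'virsh dumpxml'); a sketch with made-up category numbers, since libvirt assigns a random pair per VM:

    <seclabel type='dynamic' model='selinux' relabel='yes'>
      <label>system_u:system_r:svirt_t:s0:c392,c662</label>
      <imagelabel>system_u:object_r:svirt_image_t:s0:c392,c662</imagelabel>
    </seclabel>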
throw2016
I think you are conflating issues here.
Obviously virtualbox users do not need data-center-level security and should not have to deal with that complexity, so they naturally choose not to, and use virtualbox.
You have turned a simple observation about the complexity of libvirt's configuration, and the resulting traction of virtualbox on Linux, into a discussion of data centers, thousands of VMs, and security. This seems to be talking around the issue, so the discussion is moot.
Are you saying selinux, apparmor and seccomp cannot be used on qemu processes without libvirt?
Libvirt is a mess: where something could have been made simple, libvirt opted for extensibility instead, and it shows in the code.
bonzini
QEMU-lite is just a set of patches on top of QEMU, which are in the process of being merged.
j_s
If you are for some reason using QEMU with potentially hostile code (not recommended!), disable all the hardware emulation you can!
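A sketch of what that can look like (disk path hypothetical, and the right flag set depends on your guest): start from -nodefaults so QEMU adds nothing behind your back, then add back only what you need:

    qemu-system-x86_64 -enable-kvm -machine q35 -nodefaults \
        -cpu host -m 1024 -display none -serial stdio \
        -drive file=guest.qcow2,if=virtio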
rwmj
You should keep qemu up to date of course, and remove devices you don't need. But more importantly ensure you are using it through libvirt. Libvirt adds an SELinux or AppArmor layer, and a container, and enables seccomp for the qemu process. So even if the qemu process is compromised, it is very limited in what it can do to the host.
aseipp
FWIW I've never really been impressed with QEMU's seccomp sandboxing, because it's almost entirely bolted on, and in the grand scheme, I think, does little. Does libvirt do anything substantially different or just "turn it on"?
If you look at the list of syscalls whitelisted[1], it's almost a joke -- you could literally do just about anything given those syscalls if you actually had code execution... And every one of those syscalls is a direct interface to the kernel, which becomes the weakest link in a container/LSM environment where your code will presumably be running. How is seccomp going to stop my exploit when almost everything my exploit could ever need is whitelisted, including hundreds of syscalls that talk directly to the kernel?
I'm glad QEMU is improving its security footprint, but I honestly think its seccomp support is way overblown, even though people like to bring it up... When your seccomp policy looks like the one from QEMU -- one that just bans a few things they've never used, rather than only allowing things they should use -- it does little. Taking advantage of it to the level of something like Chrome does (which truly mitigates attack surface with process isolation) would require almost an entire redesign of the whole thing, I think.
[1] https://github.com/qemu/qemu/blob/master/qemu-seccomp.c
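For contrast, here is what an actual default-deny policy looks like with libseccomp -- a toy sketch, nowhere near covering what a real QEMU process would need:

    #include <seccomp.h>   /* link with -lseccomp */
    #include <unistd.h>

    int main(void)
    {
        /* Default-deny: any syscall not whitelisted below kills the program. */
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
        if (!ctx)
            return 1;

        /* Whitelist only what this toy actually needs. */
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);

        if (seccomp_load(ctx) < 0)
            return 1;

        /* write() is allowed; an open() or socket() here would kill us. */
        write(1, "still alive inside the sandbox\n", 31);
        return 0;
    }

The point is the default action: everything not explicitly allowed is fatal, instead of the other way around.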
Sure, I agree. It's the SELinux policy which really confines QEMU. For example the compromised QEMU will only be able to open exactly the files containing the guest's drives (not even the drives of other guests on the same host).
secur101
@rwmj can you please point to the list of white-listed QEMU-KVM devices used in RHEL?
Unfortunately on Ubuntu, seccomp is opt-in only. Works well when enabled though.
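If I remember right, libvirt can also force it on host-wide via /etc/libvirt/qemu.conf, which makes it launch QEMU with '-sandbox on':

    # /etc/libvirt/qemu.conf
    seccomp_sandbox = 1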
bonzini
Pretty much all cloud providers that use KVM (where "pretty much all" probably means all except Google) are using QEMU and can potentially be running hostile code, so the "not recommended" remark is perhaps a bit exaggerated.
j_s
Thanks for the tip! Too late to edit, bummer. [Edit: If you're right, I would have switched it to: "not for the faint of heart!"]
My understanding was from Qubes choosing Xen and also AWS (they both deal with Xen advisories instead). The Qubes Architecture Specification goes into detail starting on page 11: https://www.qubes-os.org/attachment/wiki/QubesArchitecture/a...
"KVM uses the open source qemu emulator for [I/O emulation]. [...] The I/O emulator is a complex piece of software, and thus it is reasonable to assume that it contains bugs and that it can be exploited by the attacker. In fact both Xen and KVM assume that the I/O emulator can be compromised and they both try to protect the rest of the system from a potentially compromised I/O emulator."
Also pointed out elsewhere on thread: Google skipping QEMU.
Edit: I am digging into it more, but I don't see KVM+QEMU on any top-tier provider (GCE, AWS, Azure [Hyper-V-ish])? My understanding was that the only time QEMU was required was to emulate processor architectures, e.g. x86 on ARM or vice versa. QEMU is also used by some reverse-engineering/anti-malware emulation stuff.
bonzini
In China KVM is probably more common than Xen; Aliyun and Tencent use it, and Huawei is transitioning from Xen to KVM (with QEMU). KVM is probably more common among whoever uses OpenStack.
Xen also requires a hardware emulator to run HVM guests (including, but not limited to, Windows VMs).
I don't know about now, but it definitely used to be QEMU for AWS.
QEMU can do emulation, but with KVM you use the hypervisor to run code at full speed until it has to interact with the emulated hardware.
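That split is baked into the KVM API itself: the KVM_RUN ioctl executes guest code natively and only returns to userspace when the guest touches something that has to be emulated. A stripped-down sketch of the canonical KVM API demo (error handling omitted; the "guest" is a few bytes of real-mode code that writes one character to a fake serial port and halts):

    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Guest code (16-bit real mode): mov dx,0x3f8; mov al,'K'; out dx,al; hlt */
        const uint8_t code[] = { 0xba, 0xf8, 0x03, 0xb0, 'K', 0xee, 0xf4 };

        int kvm = open("/dev/kvm", O_RDWR);
        int vm  = ioctl(kvm, KVM_CREATE_VM, 0);

        /* One page of guest RAM at guest-physical 0x1000, with the code copied in. */
        void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        memcpy(mem, code, sizeof(code));
        struct kvm_userspace_memory_region region = {
            .slot = 0, .guest_phys_addr = 0x1000,
            .memory_size = 0x1000, .userspace_addr = (uintptr_t)mem,
        };
        ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
        int size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
        struct kvm_run *run = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, vcpu, 0);

        /* Point the vcpu at the code: real mode, cs=0, ip=0x1000. */
        struct kvm_sregs sregs;
        ioctl(vcpu, KVM_GET_SREGS, &sregs);
        sregs.cs.base = 0; sregs.cs.selector = 0;
        ioctl(vcpu, KVM_SET_SREGS, &sregs);
        struct kvm_regs regs = { .rip = 0x1000, .rflags = 0x2 };
        ioctl(vcpu, KVM_SET_REGS, &regs);

        for (;;) {
            ioctl(vcpu, KVM_RUN, 0);   /* guest runs at native speed in here */
            if (run->exit_reason == KVM_EXIT_IO) {
                /* Our entire "device model": print the byte the guest wrote. */
                putchar(*((char *)run + run->io.data_offset));
            } else if (run->exit_reason == KVM_EXIT_HLT) {
                break;                 /* guest executed hlt */
            }
        }
        return 0;
    }

Everything QEMU does as a device model effectively hangs off that KVM_EXIT_IO/KVM_EXIT_MMIO branch -- which is exactly the code most of these CVEs live in.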
Oldhand2017
Aliyun and Tencent use Xen, too. They are both on Xen's pre-disclosure list. I don't know about now, but Aliyun was at least Xen-only at one point. Huawei offers products with both KVM and Xen. This doesn't necessarily invalidate your speculation though.
The OpenStack aspect is true. Xen lacks support there.
A new Xen guest mode called PVH will remove QEMU when running Linux -- it is basically HVM without QEMU. Windows still requires QEMU.
j_s
Thanks for the follow-up! I'll settle for "no tier-1 US provider publicly admits to using QEMU". :)
I didn't dig too far into the AWS vulnerability list to try to find QEMU; Xen shows up right away! OK: QEMU is last mentioned in July 2015, and in none of the mentions is AWS vulnerable.
https://www.google.com/?q=site:https://aws.amazon.com/securi...
Yep, that's because most bugs are found in legacy devices that are never used in production. The big exception was a buffer overflow in the floppy device emulation (the "VENOM" vulnerability).
A lot of AWS security bulletins say "AWS customers' data and instances are not affected by these issues". I read it as "we knew about it a couple weeks in advance and have done a rolling upgrade". :)
secur101
My logic is:
Without a hardened kernel, LSM can be trivially bypassed, and seccomp seems to whitelist everything under the sun. This only leaves us with QEMU code quality to rely on. Since Grsec is no longer available, this becomes even more urgent.
Xen relies on stubdoms to isolate QEMU from its TCB, which leaves bugs in the hypervisor itself as the only avenue of attack. There are far fewer Xen-only bugs than Linux ones. Please correct me if I'm wrong.
@bonzini I use virtio-9p for shared folders all the time; why did you dismiss that as a non-issue? https://www.hackerneue.com/item?id=13755021
If you are a KVM dev, please look seriously into using an advanced, intelligent fuzzer like the DARPA Grand Challenge winner Shellphish. It can find security bugs and propose patches for them:
https://github.com/shellphish http://angr.io/
Security aside, I find Libvirt more wanting in UX. The single biggest roadblock is the lack of a virtual appliance implementation that newcomers can point to and import from Virt-Manager. I hope this gets resolved down the line.
secur101
I'm still rooting for KVM because it has the best hardware compatibility, it is in mainline, and its performance is the best. However, the security situation with QEMU is not as rosy as some of you portray it (and it's not as bad as it looks either). The average customer/user is going to look at the number of QEMU CVEs on cvedetails, compare it to Xen's, and go with the latter. Most of the QEMU bugs only affect legacy emulated hardware or components not enabled in a KVM guest, but most people won't think that far.
It would clear things up if you had a table on your site showing which QEMU vulnerabilities affect a specified default configuration of a RHEL/Debian guest out of the box in libvirt. See this for example: https://www.qubes-os.org/security/xsa/
What I want to see:
* Adoption of QEMU-lite as the default mode for Linux guests. There's no point in running Linux on almost any of the emulated hardware.
* A built-in monitoring solution like Google has that detects excessive DRAM bitflips [1] and cache misses [2] and terminates the guests to foil rowhammer and covert channel attacks.
* A re-design of KSM that's not prone to rowhammer abuse [3]
[1] https://cloudplatform.googleblog.com/2017/01/7-ways-we-harde...
[2] https://www.usenix.org/system/files/conference/usenixsecurit...
[3] http://www.cs.vu.nl/~kaveh/pubs/pdf/ffs-usenixsec16.pdf
* I am not aware of any attacks against legacy hardware except for VENOM. Intel's QEMU-lite patches are disabling these devices for speed rather than security reasons. In any case, no external patches are needed right now to disable most legacy devices: QEMU's Q35 machine type doesn't have a default floppy controller and you can already remove the HPET, PIT, SATA controller and SMBIOS controller. What is left is used, albeit sometimes rarely, by the firmware or the OS (e.g. IOAPIC, RTC, PCI host bridge or ACPI); any replacement would be more likely to have holes than the current well-tested code.
* Rowhammer detection is interesting, but not really related to virtualization. Thanks to KVM's design any such monitoring solution would apply equally to Linux containers. This is not the case for Xen, for example.
* Besides Rowhammer, memory dedup is highly subject to side channel attacks. I think this is a much more important issue, and it already pretty much forces you to disable KSM in multi-tenant applications.
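For reference, turning KSM off host-wide is a sysfs one-liner -- writing 2 instead of 0 also un-merges pages that are already shared:

    echo 2 > /sys/kernel/mm/ksm/run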
0xFFCOP
What do they use at Google?
rwmj
Some proprietary code replacing qemu (but still using KVM). Since they still have to emulate PC devices, this just means they have a different set of security holes and fewer people reviewing the code.
sitkack
Given QEMU's track record, I'd wager that the goog code gets more reviewers and more testing. QEMU is literally swiss cheese, or it emulates real swiss cheese, poorly.
With all the bugs in hardware emulation, wouldn't it make sense to emulate the Linux kernel a la Bash for Windows, instead of running the Linux kernel on emulated hardware?
btbuilder
Sounds like you are describing containers; while they are not emulation, neither is virtualization. There are many more opportunities for escape with Linux containers than with virtualization, due to the increased complexity of the interface.
While I'm impressed by the work Microsoft have done to support the Linux kernel interfaces I would imagine the complexity of the effort to implement correct behavior from Windows kernel primitives would lead to more potential security vulnerabilities.
Another comparison might be Linux syscall support within illumos[1], which AFAIK relies on mature Solaris Zones for isolation.
[1] https://www.slideshare.net/bcantrill/illumos-lx
"Sounds like you are describing containers; while they are not emulation, neither is virtualization."
Another possibility would be User-mode Linux (UML): in contrast to containers, it gives each 'virtual machine' its own Linux kernel, which runs as just another Linux program.
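A rough sketch, assuming a kernel tree and a pre-built root filesystem image (rootfs.img here is hypothetical): build with ARCH=um, and the resulting ./linux binary starts like any other program:

    make defconfig ARCH=um
    make ARCH=um -j$(nproc)
    ./linux ubd0=rootfs.img mem=256M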
pmiller2
Not always. Suppose you are writing a stand-alone kernel, for instance. That's much easier to debug in a VM than on real hardware.
overgryphon
This is a case of legacy code left in an important attack surface. I doubt many people need a virtual floppy drive today.
0x0
Nonsense. I use the virtual floppy drive in VMs all the time, because I'm virtualizing legacy systems. But I do agree it could probably be disabled (and thus unexploitable) by default.