Linux VM without VM software – User Mode Linux

Original link: https://popovicu.com/posts/linux-vm-without-vm-software-user-mode/

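## User Mode Linux: A Kernel Within a Kernel

Linux has a unique capability: it can be ported to *itself*. User Mode Linux (UML) lets you run a Linux kernel as a process inside an existing Linux kernel. This effectively creates a virtual machine without traditional virtualization software (such as QEMU) and without root privileges. UML builds on the kernel's hardware abstraction layer and uses host resources (files as block devices, sockets for I/O) to emulate the hardware seen by the "guest" kernel. This works through paravirtualization, in which drivers are aware of the virtualized environment and can optimize their communication accordingly. Building a UML kernel follows the standard configuration process (using `ARCH=um make menuconfig`), with specific "enlightened" drivers that make use of host facilities. A userspace, typically built with a tool such as Buildroot, is then needed to run inside the nested kernel. Although UML lets you run a separate kernel instance, it remains tightly coupled to the host and does not provide the same isolation as KVM. Its strengths lie in kernel debugging and experimentation: it offers a fun and insightful way to explore kernel behavior rather than a production virtualization solution. It occupies a distinctive space between full virtual machines and containers, providing a separate kernel that is directly connected to the host.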

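## User Mode Linux: A Lightweight Approach to Virtualization

A recent Hacker News discussion revisited User Mode Linux (UML), a way of running a Linux kernel as a process *inside* another Linux kernel. In effect, it is a virtual machine without traditional virtualization software. UML achieves this by using files and sockets to back the new kernel instance, allowing processes to run with restricted visibility of the system, similar to containers. Early versions relied on `ptrace()` to intercept system calls, while later versions explored "skas" (separate kernel address space). UML offered an attractive alternative to early virtualization options such as VMware, which were expensive or immature, and it even powered Linode's early offerings. However, performance limitations and the arrival of more efficient solutions such as KVM and advanced containers led to its decline among large hosting providers. Even so, UML remains valuable for development and testing, especially kernel-level work where debugging with tools like GDB is essential. It is also used for specialized tasks, such as speeding up unit tests with its "time-travel" mode. Though less mainstream, UML is still maintained and developed, and it offers a distinctive approach to virtualization.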

Original article

If you carefully read the Linux kernel docs, you will find an interesting statement:

Linux has also been ported to itself. You can now run the kernel as a userspace application - this is called UserMode Linux (UML).

Today, we’ll explore how you can start an unconventional VM by running a Linux kernel as a process within the Linux kernel itself. This approach doesn’t require installing virtualization software like QEMU, nor does it need root privileges, which opens up some intriguing possibilities.


Kernel’s Hardware Abstraction

A fundamental responsibility of the kernel is to abstract hardware and offer a consistent interface to userspace. This includes managing shared resources like the CPU and memory for multiple tasks. The kernel determines the underlying hardware (e.g., through a device tree on some platforms, which lists system components) and connects the appropriate drivers.

This hardware can also be entirely virtual. In a QEMU virtual machine, for instance, resources like memory and attached disks are virtualized by the QEMU userspace application, incurring a certain performance overhead. The CPU presents an interesting case, as it too can be virtualized in userspace, particularly when emulating a different architecture.

A fascinating aspect of drivers for virtualized hardware is that they can be enlightened — or, more formally, paravirtualized. This means the drivers are aware they’re running on virtualized hardware and can leverage this by communicating with the hardware in specialized ways. While the specifics are complex, one can imagine drivers interacting with virtual hardware in ways not feasible with physical counterparts. Online sources suggest that paravirtualization can achieve performance levels close to those of physical devices using traditional drivers.

UML - Kernel in a Userspace Process

Personally, I view UML as a paravirtualized kernel configuration. Instead of running directly on bare metal, the UML kernel operates atop an existing kernel instance, leveraging some of its userspace functionalities. For instance, rather than linking the console driver to a physical UART, it can utilize standard userspace input/output. Similarly, a block device driver can target a file on the host’s filesystem instead of a physical disk.
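To make the file-backed block device idea concrete, here is a minimal Python sketch of a toy device that turns sector reads and writes into positioned reads and writes on a host file. It assumes a fixed 512-byte sector size. The class `FileBackedBlockDevice` and its methods are hypothetical and only illustrate the concept; this is not how the UML `ubd` driver is actually implemented.

```python
import os

SECTOR_SIZE = 512  # assumption: fixed 512-byte sectors, as on a classic disk


class FileBackedBlockDevice:
    """Hypothetical toy block device whose 'disk' is a regular host file."""

    def __init__(self, path: str, num_sectors: int) -> None:
        # Open (or create) the backing file and size it to the device capacity.
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        os.ftruncate(self.fd, num_sectors * SECTOR_SIZE)
        self.num_sectors = num_sectors

    def _check(self, sector: int, count: int) -> None:
        # Reject requests that fall outside the device, like a real disk would.
        if sector < 0 or count < 0 or sector + count > self.num_sectors:
            raise ValueError("sector range out of bounds")

    def read_sectors(self, sector: int, count: int) -> bytes:
        self._check(sector, count)
        # A sector read becomes a positioned read at sector * SECTOR_SIZE.
        return os.pread(self.fd, count * SECTOR_SIZE, sector * SECTOR_SIZE)

    def write_sectors(self, sector: int, data: bytes) -> None:
        if len(data) % SECTOR_SIZE:
            raise ValueError("data must be a whole number of sectors")
        self._check(sector, len(data) // SECTOR_SIZE)
        # A sector write becomes a positioned write into the host file.
        os.pwrite(self.fd, data, sector * SECTOR_SIZE)

    def flush(self) -> None:
        # Flushing the "disk" maps naturally onto fsync() of the backing file.
        os.fsync(self.fd)

    def close(self) -> None:
        os.close(self.fd)
```

Because every guest-visible sector is just a byte range in an ordinary file, data written through such a device can later be inspected on the host with normal file tools. That is the property the experiment later in this post relies on.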

In this setup, UML is essentially a userspace process that cleverly employs concepts like files and sockets to launch a new Linux kernel instance capable of running its own processes. The exact mapping of these processes to the host — specifically, how the CPU is virtualized — is something I’m not entirely clear on, and I’d welcome insights in the comments. One could envision an implementation where guest threads and processes map to host counterparts but with restricted system visibility, akin to containers, yet still operating within a nested Linux kernel.

This page from the kernel’s documentation has a pretty good illustration of what this looks like:

            +----------------+
            | Process 2 | ...|
+-----------+----------------+
| Process 1 | User-Mode Linux|
+----------------------------+
|       Linux Kernel         |
+----------------------------+
|         Hardware           |
+----------------------------+

I highly recommend checking out that page for more detailed documentation, particularly for the compelling reasons listed for its usefulness. The final point is especially appealing:

It’s extremely fun.

And that’s precisely why we’re diving into it today!

Building a UML Kernel

First things first: it’s crucial to understand that a UML kernel can run only on x86 platforms. You can layer an x86 UML kernel on top of an existing x86 kernel; as far as I know, no other configurations are supported.
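Because of this restriction, a build script might check the host architecture before doing anything else. The following is a minimal sketch that assumes `platform.machine()` reports `x86_64` (or `amd64` on some systems) on 64-bit x86 hosts and one of the `i?86` names on 32-bit ones. The function name `is_x86_host` is hypothetical.

```python
import platform
import re
from typing import Optional


def is_x86_host(machine: Optional[str] = None) -> bool:
    """Return True if the given (or current) machine name looks like x86.

    Assumption: 64-bit hosts report 'x86_64' or 'amd64', and 32-bit hosts
    report i386/i486/i586/i686.
    """
    name = (machine if machine is not None else platform.machine()).lower()
    return name in ("x86_64", "amd64") or re.fullmatch(r"i[3-6]86", name) is not None
```

A wrapper script could call `is_x86_host()` and bail out early with a clear message, instead of failing somewhere in the middle of `ARCH=um make`.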

Next, we’ll build the UML binary. The configuration process starts with:

ARCH=um make menuconfig

You can configure the kernel much like you normally would. You’ll immediately notice several UML-specific options on the initial configuration page. I tend to think of these as “enlightened” drivers, designed to use the host’s userspace facilities as virtual hardware.

For this demonstration, I specifically enabled the BLK_DEV_UBD option. The documentation explains:

The User-Mode Linux port includes a driver called UBD which will let you access arbitrary files on the host computer as block devices. Unless you know that you do not need such virtual block devices, say Y here.

This option wasn’t enabled by default (which surprised me a bit), so I recommend setting it to Y. Once you’ve finalized your configuration, building is straightforward:

ARCH=um make -j16

And this produces a linux binary right there!

$ file linux
linux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=742d088d46f7c762b29257e4c44042f321dc4ad5, with debug_info, not stripped
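`file` gets most of this from the ELF header. Here is a minimal sketch of the relevant checks, reading a few fields directly: the `\x7fELF` magic, the class byte (2 means 64-bit), the data-encoding byte (1 means little-endian, "LSB"), `e_type` (2 means executable) and `e_machine` (62 is `EM_X86_64`). The function names are hypothetical, and this covers only a tiny subset of what `file` reports.

```python
import struct

EM_X86_64 = 62  # e_machine value for x86-64 in the ELF specification
ET_NAMES = {1: "relocatable", 2: "executable", 3: "shared object/PIE", 4: "core"}


def describe_elf(header: bytes) -> str:
    """Summarize a few ELF header fields in roughly the style of `file`."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4])  # EI_CLASS
    if bits is None:
        raise ValueError("unknown ELF class")
    endian = {1: "<", 2: ">"}.get(header[5])  # EI_DATA
    if endian is None:
        raise ValueError("unknown ELF data encoding")
    # e_type and e_machine are 16-bit fields at offsets 16 and 18.
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    arch = "x86-64" if e_machine == EM_X86_64 else f"machine {e_machine}"
    order = "LSB" if endian == "<" else "MSB"
    return f"ELF {bits}-bit {order} {ET_NAMES.get(e_type, 'unknown type')}, {arch}"


def describe_elf_file(path: str) -> str:
    # The 64-bit ELF header is 64 bytes; the fields above live in the first 20.
    with open(path, "rb") as f:
        return describe_elf(f.read(64))
```

Running `describe_elf_file("linux")` on the binary above should agree with the `file` output: a 64-bit LSB x86-64 executable.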

Interestingly, it’s dynamically linked to the C standard library:

$ ldd linux
        linux-vdso.so.1 (0x00007ffc0a3ce000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3490409000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3490601000)
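Since `BLK_DEV_UBD` was not on by default, it can be worth confirming that it actually landed in the generated `.config` before moving on. This minimal sketch assumes the usual kernel `.config` format: `CONFIG_FOO=y` (or `=m`, or some other value) when set, and `# CONFIG_FOO is not set` when disabled. The helper names are hypothetical.

```python
import re
from typing import Dict


def parse_kconfig(text: str) -> Dict[str, str]:
    """Map option names (without the CONFIG_ prefix) to their values.

    Options written as '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    options: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# CONFIG_(\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line[len("CONFIG_"):].split("=", 1)
            options[name] = value
    return options


def option_enabled(config_text: str, name: str) -> bool:
    # Built-in ('y') or module ('m') both count as enabled; absent means 'n'.
    return parse_kconfig(config_text).get(name, "n") in ("y", "m")
```

Run from the top of the kernel tree, `option_enabled(open(".config").read(), "BLK_DEV_UBD")` should return `True` once the option has been set to Y.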

Building Userspace

To do anything meaningful within our nested kernel, we need a userspace. For simplicity, I chose to download the latest Buildroot and build it for x86_64.

If you’re feeling adventurous and want to try building a minimal userspace from scratch but aren’t sure where to begin, pairing this with the micro Linux distro exercise could be a lot of fun.

Running the Nested Kernel

To make things interesting, I decided to provide a block device to the nested kernel, write some data to it, and then verify that data from the host system.

First, let’s create the disk image:

$ dd if=/dev/urandom of=./disk.ext4 bs=1M count=100

Next, we’ll format it with ext4:

$ sudo mkfs.ext4 ./disk.ext4
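To double-check that `mkfs.ext4` really wrote a filesystem over the random bytes, you can look for the ext2/3/4 superblock signature. The superblock starts 1024 bytes into the image. Its little-endian 16-bit `s_magic` field sits at offset 0x38 within it and holds `0xEF53`, and its 16-byte `s_uuid` field sits at offset 0x68. This is a minimal sketch, and the function name is hypothetical.

```python
import struct
import uuid

SUPERBLOCK_OFFSET = 1024  # ext2/3/4 superblock starts 1 KiB into the image
EXT_MAGIC = 0xEF53        # expected value of s_magic
MAGIC_OFFSET = 0x38       # s_magic: little-endian u16 within the superblock
UUID_OFFSET = 0x68        # s_uuid: 16 raw bytes within the superblock


def ext_superblock_info(image_path: str) -> dict:
    """Return the magic and filesystem UUID, or raise ValueError if absent."""
    with open(image_path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        sb = f.read(UUID_OFFSET + 16)
    if len(sb) < UUID_OFFSET + 16:
        raise ValueError("image too small to hold an ext superblock")
    (magic,) = struct.unpack_from("<H", sb, MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        raise ValueError(f"no ext2/3/4 signature (found 0x{magic:04x})")
    fs_uuid = uuid.UUID(bytes=sb[UUID_OFFSET:UUID_OFFSET + 16])
    return {"magic": magic, "uuid": str(fs_uuid)}
```

On the raw `/dev/urandom` image from the `dd` step, this check would almost certainly fail, because the two magic bytes are still random at that point. After formatting, it should succeed and report the filesystem's UUID.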

Now, it’s time to fire up the kernel in userspace. I’ll use the Buildroot image (an ext2 file provided by Buildroot) as the root filesystem:

./linux ubd0=/tmp/uml/rootfs.ext2 ubd1=/tmp/uml/disk.ext4 root=/dev/ubda

And just like that, we’re greeted by a very familiar kernel boot sequence!

Core dump limits :
        soft - 0
        hard - NONE
Checking that ptrace can change system call numbers...OK
Checking syscall emulation for ptrace...OK
Checking environment variables for a tempdir...none found
Checking if /dev/shm is on tmpfs...OK
Checking PROT_EXEC mmap in /dev/shm...OK
Linux version 6.14.7 (uros@debian-home) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #6 Mon May 19 16:27:13 PDT 2025
Zone ranges:
  Normal   [mem 0x0000000000000000-0x0000000063ffffff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000000000000-0x0000000003ffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x0000000003ffffff]
random: crng init done
Kernel command line: ubd0=/tmp/uml/rootfs.ext2 ubd1=/tmp/uml/disk.ext4 root=/dev/ubda console=tty0
printk: log buffer data + meta data: 16384 + 57344 = 73728 bytes
Dentry cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
Inode-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
Sorting __ex_table...
Built 1 zonelists, mobility grouping on.  Total pages: 16384
mem auto-init: stack:all(zero), heap alloc:off, heap free:off
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
NR_IRQS: 64
clocksource: timer: mask: 0xffffffffffffffff max_cycles: 0x1cd42e205, max_idle_ns: 881590404426 ns
Calibrating delay loop... 8931.73 BogoMIPS (lpj=44658688)
Checking that host ptys support output SIGIO...Yes
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
Memory: 57488K/65536K available (3562K kernel code, 944K rwdata, 1244K rodata, 165K init, 246K bss, 7348K reserved, 0K cma-reserved)
...

and at the end, we have the Buildroot login:

Run /sbin/init as init process
EXT4-fs (ubda): warning: mounting unchecked fs, running e2fsck is recommended
EXT4-fs (ubda): re-mounted 23cafb4d-e18f-4af4-829d-f0dc7303e6c4 r/w. Quota mode: none.
EXT4-fs error (device ubda): ext4_mb_generate_buddy:1217: group 1, block bitmap and bg descriptor inconsistent: 7466 vs 7467 free clusters
Seeding 256 bits and crediting
Saving 256 bits of creditable seed for next boot
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Starting network: OK
Starting crond: OK

Welcome to Buildroot
buildroot login:

The boot process was surprisingly quick.
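Because the guest kernel is just a host process started with a few arguments, the launch step is easy to script. The pattern is regular: each `ubdN=<file>` attaches a host file as a block device, and the guest names these devices by letter. So `ubd0` shows up as `/dev/ubda` (the root filesystem above) and `ubd1` as `/dev/ubdb`, the device mounted next. Here is a minimal sketch of such a launcher, assuming the `./linux` binary and image paths used earlier. The helper names are hypothetical, and it only generates the `ubdN=` and `root=` options used in this post.

```python
import string
import subprocess
from typing import List, Sequence


def ubd_device(index: int) -> str:
    """Guest device node for the file passed as ubd<index>= (0 -> /dev/ubda)."""
    if not 0 <= index < len(string.ascii_lowercase):
        raise ValueError("this sketch only maps single-letter device names")
    return "/dev/ubd" + string.ascii_lowercase[index]


def uml_command(kernel: str, images: Sequence[str], root_index: int = 0) -> List[str]:
    """Build argv for a UML kernel: one ubdN= per image, root on ubd<root_index>."""
    args = [kernel]
    args += [f"ubd{i}={path}" for i, path in enumerate(images)]
    args.append(f"root={ubd_device(root_index)}")
    return args


def run_uml(kernel: str, images: Sequence[str]) -> int:
    # subprocess.call inherits this process's stdin/stdout, so the guest
    # console appears in the current terminal, as in the boot log above.
    return subprocess.call(uml_command(kernel, images))
```

With the paths from this post, `uml_command("./linux", ["/tmp/uml/rootfs.ext2", "/tmp/uml/disk.ext4"])` reproduces the command line used above.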

Now, let’s create a mountpoint for our disk within the UML instance:

# mkdir /mnt/disk

Then, we mount the second UBD device (ubdb) to this mountpoint:

# mount /dev/ubdb /mnt/disk/

With the disk mounted, we can write a test file:

# echo "This is a UML test!" > /mnt/disk/foo.txt
# cat /mnt/disk/foo.txt
This is a UML test!

I can now shut down the UML VM:

# poweroff

which gives

# Stopping crond: stopped /usr/sbin/crond (pid 64)
OK
Stopping network: OK
Stopping klogd: OK
Stopping syslogd: stopped /sbin/syslogd (pid 40)
OK
Seeding 256 bits and crediting
Saving 256 bits of creditable seed for next boot
EXT4-fs (ubdb): unmounting filesystem e950822b-09f7-49c2-bb25-9755a249cfa1.
umount: devtmpfs busy - remounted read-only
EXT4-fs (ubda): re-mounted 23cafb4d-e18f-4af4-829d-f0dc7303e6c4 ro. Quota mode: none.
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system poweroff
reboot: Power down

On my host system:

$ sudo mount ./disk.ext4 ./img

$ cat ./img/foo.txt
This is a UML test!

This little experiment confirms that we successfully ran a VM using UML, wrote data to a block device within it, and those changes persisted, accessible from the host system.
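Mounting the image on the host needs root. As an alternative, if `e2fsprogs` is installed, its `debugfs` tool can read a file straight out of the image without mounting it; `debugfs -R "cat /foo.txt" ./disk.ext4` runs a single `cat` request against the image. Here is a minimal sketch wrapping that call. The wrapper names are hypothetical, and paths containing spaces would need extra quoting inside the request, which this sketch does not handle.

```python
import shutil
import subprocess
from typing import List


def debugfs_cat_command(image: str, path_in_fs: str) -> List[str]:
    """argv for running one 'cat' request with debugfs against an image."""
    return ["debugfs", "-R", f"cat {path_in_fs}", image]


def read_file_from_image(image: str, path_in_fs: str) -> bytes:
    if shutil.which("debugfs") is None:
        raise FileNotFoundError("debugfs (from e2fsprogs) is not on PATH")
    result = subprocess.run(
        debugfs_cat_command(image, path_in_fs),
        stdout=subprocess.PIPE,  # file contents
        stderr=subprocess.PIPE,  # keep debugfs diagnostics out of the result
        check=True,
    )
    return result.stdout
```

After the UML session above, `read_file_from_image("./disk.ext4", "/foo.txt")` should return the same `This is a UML test!` line that `cat` printed from the mounted image.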

Conclusion

Throughout this article, I’ve referred to UML as a VM, and you’d be right to raise an eyebrow. On one hand, it embodies the idea of hardware virtualization via host userspace facilities, and the environment gets its own distinct kernel. On the other hand, this guest kernel is intrinsically linked to the host’s kernel. While it aims for isolation, it doesn’t achieve the same level you’d expect from a QEMU VM powered by KVM.

What’s the real-world utility here? Is UML suitable for running isolated workloads? My educated guess is: probably not for most production scenarios. I believe UML’s primary strength lies in kernel debugging, rather than serving as a full-fledged, production-ready virtualization stack. For robust VM needs, KVM virtualization (operating at a different architectural layer) is far more battle-tested. Of course, containers offer another alternative if sharing the host kernel is acceptable for your workloads. UML carves out an interesting niche between these two: offering a separate kernel instance while still maintaining a unique connection to the host kernel. It’s a fascinating concept.

Perhaps in the future, this intriguing technology will garner more attention and see wider adoption. For now, though, it’s a fantastic tool for experimentation and, at the very least, a lot of fun to play with!

Happy hacking!
