I hate: Programming Wayland applications

Original link: https://www.p4m.dev/posts/29/index.html

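## Wayland development: developer frustration

Linux's move from the old X11 window system to Wayland was meant to bring a more modern architecture, better security, and higher performance. Although Wayland offers a smoother user experience and is increasingly becoming the standard, developing applications for it has proven extremely challenging. The author describes a frustrating experience in detail, contrasting Wayland's complexity with the relative simplicity of libraries like raylib, and even of the older X11/Win32 APIs. Wayland's core design, an asynchronous, object-oriented protocol that relies heavily on callbacks, creates complex and convoluted control flow, so even basic tasks such as opening a window or handling input require a large amount of boilerplate. The main problems include fragmented extension support, API code generated from XML files, and the lack of standardized functionality (for example, retrieving the desktop state). Even seemingly simple features such as clipboard access, screen sharing, and device hot-plugging require substantial workarounds. The author argues that Wayland prioritizes architectural purity over developer usability, making it hard for even experienced programmers to build applications of moderate complexity. Despite being regarded as "the future", the Wayland development experience feels like a step backwards.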


A quick introduction:

If you want to program a graphical application for Linux, your primary choices are using either X11 or Wayland. According to Wikipedia, X11 had its first release in 1984. X11 follows a client-server model. I assume that's because the whole computing environment was very different back then: if you have a central server which is doing all of the heavy lifting, while the user only connects to it via a slim client / dumb terminal / whatever, having the desktop system follow a client-server model makes sense. In contrast, Wayland (the protocol) had its first release in 2008. That's also quite a while ago, but in the 2000s the computing environment was probably much closer to what we have today: a PC usually comes with a desktop / graphical interface, and the machine doesn't need to rely on an external server to do most of the computing.

So, over the years there has been a push to switch from X11 to Wayland. And, at least on a surface level, this makes sense to me: Developers probably have learned a lot about the various requirements of desktops, so having a (mostly) clean cut for this new desktop environment seems promising. I have read claims stating that Wayland is inherently more secure than X11. Wayland isn't "outdated", we can design the desktop with performance and modern use-cases in mind.

I am typing this on a desktop machine running sway, which is a Wayland compositor. There definitely have been the common hurdles like desktop recording / sharing not working. But over time, these issues have been resolved - at least for my machine. Some years ago, I tried out both X11 and Wayland (I think back on Arch Linux). And honestly, the sway installation was far easier than the i3/X11 one. This ease of installation, combined with Wayland supposedly being "the future of Linux Desktops", and it supporting X11 applications via XWayland, made me stick to sway, even with its rough edges.

That was the story of me using Wayland. Now comes the developing part - which has been a fucking nightmare.

For libraries to be used by other developers, I'm a big fan of:

Make easy things easy. Make hard things doable.

If you just want to open a window and do some simple rendering with your GPU, raylib is a fantastic library. Here is an example application:


#include <raylib.h>

int
main(void)
{
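	InitWindow(1280, 720, "Test Window"); /* create the window and its OpenGL context */
	SetTargetFPS(144);
	while ( ! WindowShouldClose()) /* false until ESC or the close button is pressed */
	{
		BeginDrawing();
		ClearBackground(RAYWHITE);
		EndDrawing(); /* swaps buffers and polls input events */
	}
	CloseWindow(); /* release the window and its OpenGL context */
	return 0;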
}

Incredibly simple. Raylib is really good at covering the "make easy things easy" part. Ideally, you'd have an "upgrade path" where, using more complex code, you can handle more complicated edge-cases step-by-step [0].

When developing graphical Windows applications, you have to use Windows.h, do a few rather cryptic function calls to create and get window handles, and then you have your "main loop", where you work through a bunch of window events (mouse moves, keyboard input, window wants repaint, ...) and respond accordingly. It's considerably more complicated than raylib, so I see lots of potential for improvement.

This is the reason why I had high hopes for Wayland. Boy was I wrong. Wayland does not care for the simple use-case at all. Getting any example application to work is so incredibly ridiculous that every second I program on Wayland, I yearn for the times I did Win32 programming.

Don't get me wrong: I'm not expecting that e.g. DPI-aware multi-monitor applications using several input devices, mixed refresh rates and hot-plugging of devices "just work". It's just that every single thing which I would expect to be reasonably simple, or to have helper functions of some kind, is so incredibly painful at every step of the way.

I'm not posting code here, because my helper functions to open up an OpenGL Window and transform the whole Wayland insanity into a list of events are >1300 lines of code.

In Wayland, opening up a window roughly works as follows:

You connect to a Wayland socket (this is ok)
Then you connect to the "registry". The "registry" is responsible for returning "global objects". This includes stuff like your monitors, but also some core Wayland objects (WlShm, WlCompositor, ...) and protocol extensions (XdgShell, WlrShell, etc).
Almost everything is done via callbacks.
You trigger Wayland gathering events and calling callbacks using wl_display_roundtrip() & wl_display_dispatch()
You need to set up a surface in order to display anything. How exactly this works varies wildly depending on whether you are using e.g. OpenGL or not, and on the type of surface you use

And when I say everything, I do mean everything.

Want to retrieve the globals? Register registry callback functions.
Want to retrieve mouse/keyboard input? Register all callback functions, then you get them in a callback.
Need monitor information? Save the wl_output you got from the registry callback, then register all the callbacks on them.
Xdg needs you to regularly respond to PING functions. You can respond by registering a callback for the PINGs, and then calling the appropriate PONG in there.
Want to know whether your application just got focussed? Register all callbacks for surface events.
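
A minimal sketch of this callback pattern, and of flattening it back into a plain list of events, might look like the following. Everything in it is hypothetical: `demo_listener`, `event_queue` and the event types only mimic the shape of the generated `*_listener` structs (a table of function pointers plus a user-data pointer) and are not part of any Wayland header. The real callbacks additionally receive the protocol object they were registered on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record; not part of any Wayland header. */
enum event_type { EVENT_PING, EVENT_KEY, EVENT_FOCUS };

struct event {
	enum event_type type;
	uint32_t a; /* serial for PING, key code for KEY, 1/0 for FOCUS */
	uint32_t b; /* pressed state for KEY, unused otherwise */
};

#define EVENT_QUEUE_CAP 64

struct event_queue {
	struct event items[EVENT_QUEUE_CAP];
	size_t count;
};

/* Appends one event. Returns 1 on success, 0 if the queue is full. */
static int
event_queue_push(struct event_queue *q, enum event_type type, uint32_t a, uint32_t b)
{
	if (q->count >= EVENT_QUEUE_CAP)
		return 0;
	q->items[q->count++] = (struct event){ type, a, b };
	return 1;
}

/*
 * Hypothetical listener table, shaped like the generated *_listener structs:
 * one function pointer per event, each receiving the user-data pointer first.
 */
struct demo_listener {
	void (*ping)(void *data, uint32_t serial);
	void (*key)(void *data, uint32_t key, uint32_t pressed);
	void (*focus)(void *data, uint32_t focused);
};

/* Each callback does nothing but record what happened. */
static void
on_ping(void *data, uint32_t serial)
{
	event_queue_push(data, EVENT_PING, serial, 0);
}

static void
on_key(void *data, uint32_t key, uint32_t pressed)
{
	event_queue_push(data, EVENT_KEY, key, pressed);
}

static void
on_focus(void *data, uint32_t focused)
{
	event_queue_push(data, EVENT_FOCUS, focused, 0);
}

static const struct demo_listener demo_listener = {
	.ping = on_ping,
	.key = on_key,
	.focus = on_focus,
};
```

After dispatching, the main loop can walk `items[0]` to `items[count - 1]` with an ordinary `switch` and then reset `count` to 0. A real PING handler would still have to send the matching pong, either inside the callback or right after draining the list.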

I fully blame Wayland being an Object Oriented Protocol for this. The control flow is horrible. Have fun trying to predict what code executes in what order after your wl_display_roundtrip() & wl_display_dispatch() calls - and I still don't know what the difference between them is, or in what order to call them. Initialization code for OpenGL is rather fragile, as some of these callback functions are called several times during initialization.

If you fuck anything up during initialization, you'll likely just get no window while the program keeps running endlessly. Even if you do all of this callback bullshit, none of it is simple to use.

Keyboard input: You also need to initialize xkb and use it to translate the event info, otherwise you don't know what character the key press is supposed to represent.
You only get an event for key press and key release. If you want to support key repetition, you guessed it, get the key repetition info from some callback. Then manually set up a timer FD using said info, and keep track yourself of which key is supposed to repeat.
Want to know what the current refresh rate is?
- Track all global wl_output objects and register all the callbacks for each
- Store the refresh rate info from the callback alongside the wl_output object
- Register callbacks for the surface you are displaying.
- When the surface enters an output, you get a callback including the wl_output being used. Match that wl_output with your stored globals and the corresponding refresh rate.
By default on sway, you get window decorations if the program is tiled, but no decorations when you make the window floating. If you want to change this behavior:
- Get the zxdg_decoration_manager_v1 global from the registry
- Use it to create a zxdg_toplevel_decoration_v1 for your toplevel, and explicitly request what kind of decorations you want to have.
Is the window supposed to be a pop-up? Then set its min- and max-size to be equal.

The most valuable resource is Wayland.app. It has an overview of

the core protocol functions
accepted extensions
WIP extensions and where they are used
which versions of the extensions / protocols are used by which compositor.

You can see how fragmented this whole thing is. It's a complete mess. Oh and I haven't even mentioned how you can use these extensions. This is crucial, because you cannot open a window if you just use the core protocol. You used to be able to, but the "shell" which was used to display things, wl_shell, has been deprecated. I'm not kidding. You are supposed to use XdgShell instead [1]. But if you look up the extension, you will probably only find an XML file - because the interface code is generated from it using wayland-scanner as follows:


wayland-scanner private-code  < xdg-shell.xml > xdg_shell.c
wayland-scanner client-header < xdg-shell.xml > xdg_shell.h

The official Wayland Documentation does not mention wayland-scanner, only that the code is generated from the XML. I'm also still not sure where I'd get the XML files from. For Void Linux, there is a wayland-protocols package, which puts the XML files in /usr/share/wayland-protocols. This is sheer insanity. There are so many obstacles you have to get over in order to produce just a blank window, and even if you get past that, the control flow of the application is fucking garbage. I have no idea why they didn't stick to a "main event loop" instead, and just provide an easy way of dismissing events you don't care about.

To give you another glimpse: For some fucked up reason, opening up an OpenGL Window is easier than just drawing some pixels on the CPU. Yes, really. Here is, roughly, what you need to do for a software rendered application:

Get all the global objects
Create a WlSurface
Register all the callbacks for the WlSurface
Create an XdgSurface
Register all the callbacks for the XdgSurface
Create an XdgToplevel
Register all the callbacks for the XdgToplevel
Create a shared memory object which I'll call ShmFD using shm_open
Set its size using ftruncate and map it using mmap
Using the global WlShm and ShmFD, create a wl_shm_pool
Using the wl_shm_pool, create a WlBuffer
Call wl_surface_commit on your WlSurface
Call wl_display_dispatch() and wl_display_roundtrip()
- In the XdgSurface.Configure-callback, attach WlBuffer to the WlSurface. Damage and commit WlSurface to indicate that you have stuff to be displayed.
- In the XdgToplevel.Configure-callback, you need to keep track whether your resolution changed. If so, you probably want to create another WlBuffer with new dimensions, and draw to it
To know when you need to repaint:
- Request a wl_callback for the WlSurface (a frame callback)
- Register a listener on it. When it's called, you:
  - should do the rendering
  - destroy the current wl_callback
  - request a new wl_callback
  - register the listener on the new wl_callback
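
The shared-memory part of these steps (shm_open, ftruncate, mmap) needs no Wayland headers and can be sketched in isolation. This is a sketch under assumptions, not a complete client: it assumes 4-byte pixels such as the XRGB8888 format, and `shm_pixels` and its functions are hypothetical names. The resulting `fd` and `size` are what would go into `wl_shm_create_pool()`, and `stride` into `wl_shm_pool_create_buffer()`.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper struct; not part of any Wayland header. */
struct shm_pixels {
	int fd;         /* descriptor to hand to the compositor */
	int32_t stride; /* bytes per row */
	size_t size;    /* total bytes mapped */
	uint32_t *data; /* CPU-writable pixels */
};

/* Compute stride and size for 4-byte pixels. Returns 0 on bad input or overflow. */
static int
shm_pixels_layout(int32_t width, int32_t height, int32_t *stride, size_t *size)
{
	if (width <= 0 || height <= 0 || width > INT32_MAX / 4)
		return 0;
	*stride = width * 4;
	/* The pool size is passed as a signed 32-bit integer, so stay below that. */
	if ((int64_t)*stride * height > INT32_MAX)
		return 0;
	*size = (size_t)*stride * (size_t)height;
	return 1;
}

/* Returns 1 on success, 0 on failure. Older glibc versions need -lrt for shm_open. */
static int
shm_pixels_create(struct shm_pixels *p, int32_t width, int32_t height)
{
	static unsigned counter;
	char name[64];

	if (!shm_pixels_layout(width, height, &p->stride, &p->size))
		return 0;

	/* The name only has to be unique long enough to obtain a descriptor. */
	snprintf(name, sizeof(name), "/wl-shm-sketch-%ld-%u", (long)getpid(), counter++);
	p->fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
	if (p->fd < 0)
		return 0;
	shm_unlink(name);

	if (ftruncate(p->fd, (off_t)p->size) < 0) {
		close(p->fd);
		return 0;
	}
	p->data = mmap(NULL, p->size, PROT_READ | PROT_WRITE, MAP_SHARED, p->fd, 0);
	if (p->data == MAP_FAILED) {
		close(p->fd);
		return 0;
	}
	return 1;
}

/* Unmap the pixels and close the descriptor. */
static void
shm_pixels_destroy(struct shm_pixels *p)
{
	munmap(p->data, p->size);
	close(p->fd);
}
```

On a resolution change, the simplest approach is to destroy the old mapping and create a new one with the new dimensions, then build a fresh pool and buffer from it.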

I'm skipping over some details here, but you can see that this shit is fucking insane. Fuck anything up, and your application is less responsive, doesn't render at all or renders way too often, leaks memory, and so on and so forth.

Here's a list of stuff I randomly stumbled over, in no particular order.

If you use your GPU for rendering, you are probably calling eglSwapBuffers from EGL. If you activate VSync, eglSwapBuffers will automatically block until the frame is displayed, which is quite handy (no manual sleeping / refresh-rate querying required). But if you unplug / replug the monitor, eglSwapBuffers will block indefinitely.

Wayland, AFAIK, has no concept of primary monitors. For my personal application, I wanted to hardcode something using the right-most monitor. Wayland offers a geometry callback, which is supposed to return the x & y position within the global compositor space (and other info like physical dimensions, subpixel layouts, transforms, etc). Apparently several Wayland environments always return (0, 0) for the monitor position.
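
A sketch of the "right-most monitor" selection, including a guard for the broken case, could look like this. The `output_info` struct is hypothetical and stands in for whatever the application stores from the wl_output geometry and mode callbacks. The sketch ignores scale and transform, and it treats any two outputs with the same origin as untrustworthy, which also rejects legitimately mirrored monitors.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record filled in from the wl_output geometry and mode events. */
struct output_info {
	int32_t x, y;          /* position reported by the geometry event */
	int32_t width, height; /* current mode, in pixels */
};

/*
 * Returns the index of the output whose right edge is furthest to the right,
 * or -1 if the list is empty or the positions cannot be trusted because
 * several outputs report the same origin.
 */
static int
pick_rightmost_output(const struct output_info *outputs, size_t count)
{
	int best = -1;
	for (size_t i = 0; i < count; i++) {
		for (size_t j = i + 1; j < count; j++)
			if (outputs[i].x == outputs[j].x && outputs[i].y == outputs[j].y)
				return -1; /* e.g. every monitor claims (0, 0) */
		if (best < 0 ||
		    outputs[i].x + outputs[i].width > outputs[best].x + outputs[best].width)
			best = (int)i;
	}
	return best;
}
```

When the function returns -1, the application needs some other source of truth, such as a user setting or a compositor-specific interface.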

The Wayland Protocol development is so slow, that someone (from Valve?) started Frog Protocols, with the sole purpose of being able to iterate much more quickly.

As far as I know, there is no standardized way of retrieving the current "desktop state", meaning retrieving a list of open windows, their positions, etc. I'm aware that Wayland is supposed to be "secure", but simply not offering this functionality introduces fragmentation and is likely more insecure than offering an API with some sort of permission system. On Sway, you can retrieve the "desktop state" using the sway-ipc-socket, which communicates via JSON (kill me). However, there is no way to query for applications / surfaces using the Wlr Layer Shell extension, which are used for workspace independent windows (e.g. status bar) [2].

Getting the Wayland Clipboard to work is a nightmare. I just wanted to copy some text into the clipboard, but I gave up after a while. For text, the easiest hack I found was to start another process running wl-copy (from wl-clipboard) and passing the relevant arguments to that.

Hotplugging stuff doesn't necessarily work. The keys of my drawing tablet pad are not recognized by any application until I restart the programs. Judging from my experiments, Wayland doesn't seem to generate the relevant events to announce that the device has been plugged in, while unplugging does send the events to render the device invalid [3].

Xdg-Desktop-Portal is required to do any form of screensharing. Thanks to Wayland thinking that screensharing / recording is "out of scope", xdg-desktop-portal has been implemented by several compositors, leading to fragmentation. On Void Linux, there are the following implementations available:

GNOME
GTK
KDE
LXQT
WLR
XAPP

While I was able to make parts of a Wayland Window transparent, I wasn't able to make it "click-through" / forward the events to the window behind it.

Setting mouse cursors is a massive pain in the ass. While I now know how to hide the cursor or how to display the "normal" cursor, I'm not sure how to handle other cursor types: You need to pass a "CursorType" string, and I legit couldn't find a list of all the valid strings [4].

As a user, using Wayland is nice.

Compared to X11, the internals of Wayland Compositors might also be a great upgrade, I don't know.

As a developer, I want to kill myself when working with Wayland code. The API of this "asynchronous object oriented protocol" is a fucking disaster. It's a huge downgrade compared to Win32 or X11 with Xlib. You cannot write a simple application with Wayland. Every part of it, the different extensions, the fact that you generate the API code from XML, and the entire control flow when interacting with Wayland, is utterly horrible. This is the foundation all future Linux applications should supposedly build upon.
