A quick introduction:
If you want to program a graphical application for Linux, your primary choices are X11 and Wayland. According to Wikipedia, X11 had its first release in 1984. X11 follows a client-server model, presumably because the whole computational environment was very different back then: if you have a central server doing all of the heavy lifting, while the user only connects to it via a slim client / dumb terminal / whatever, having the desktop system follow a client-server model makes sense. In contrast, the Wayland protocol had its first release in 2008. That's also quite a while ago, but in the 2000s the computational environment was already much closer to what we have today: a PC usually comes with a desktop / graphical interface, and the machine doesn't need to rely on an external server to do most of the computing.
So, over the years there has been a push to switch from X11 to Wayland. And, at least on a surface level, this makes sense to me: developers have probably learned a lot about the various requirements of desktops, so having a (mostly) clean cut for this new desktop environment seems promising. I have read claims that Wayland is inherently more secure than X11. And since Wayland isn't "outdated", the desktop can be designed with performance and modern use-cases in mind.
I am typing this on a desktop machine running sway, which is a Wayland compositor. There definitely have been the common hurdles, like desktop recording / sharing not working. But over time these issues have been resolved - at least on my machine. Some years ago, I tried out both X11 and Wayland (back then on Arch Linux, I think). And honestly, the sway installation was far easier than the i3/X11 one. This ease of installation, combined with Wayland supposedly being "the future of Linux desktops" and it supporting X11 applications via XWayland, made me stick with sway, even with its rough edges.
That was the story of me using Wayland. Now comes the developing part - which has been a fucking nightmare.
For libraries to be used by other developers, I'm a big fan of the "make easy things easy, make hard things possible" principle.
If you just want to open a window and do some simple rendering with your GPU, raylib is a fantastic library. Here is an example application:
    #include <raylib.h>

    int
    main(void)
    {
        InitWindow(1280, 720, "Test Window");
        SetTargetFPS(144);
        while ( ! WindowShouldClose())
        {
            BeginDrawing();
            ClearBackground(RAYWHITE);
            EndDrawing();
        }
        return 0;
    }
Incredibly simple. Raylib is really good at covering the "make easy things easy" part. Ideally, you'd have an "upgrade path" where, using more complex code, you can handle more complicated edge-cases step-by-step [0].
When developing graphical Windows applications, you have to use Windows.h, do a few rather cryptic function calls to create and get window handles, and then you have your "main loop", where you work through a bunch of window events (mouse moves, keyboard input, window wants repaint, ...) and respond accordingly. It's considerably more complicated than raylib, so I see lots of potential for improvement.
This is the reason why I had high hopes for Wayland. Boy, was I wrong. Wayland does not care for the simple use-case at all. Getting any example application to work is so incredibly ridiculous that every second I program on Wayland, I yearn for the times I did Win32 programming.
Don't get me wrong: I'm not expecting that e.g. DPI-aware multi-monitor applications using several input devices, mixed refresh rates and hot-plugging of devices "just work". It's just that every single thing which I would expect to be reasonably simple, or to have helper functions of any kind, is incredibly painful at every step of the way.
I'm not posting code here, because my helper functions to open an OpenGL window and transform the whole Wayland insanity into a list of events are >1300 lines of code.
In Wayland, opening up a window roughly works as follows:
- You connect to a Wayland socket (this is ok)
- Then you connect to the "registry". The "registry" is responsible for returning "global objects". This includes stuff like your monitors, but also some core Wayland objects (WlShm, WlCompositor, ...) and protocol extensions (XdgShell, WlrShell, etc).
- Almost everything is done via callbacks.
- You trigger Wayland gathering events and calling callbacks using `wl_display_roundtrip()` & `wl_display_dispatch()`
- You need to set up a surface in order to display anything. How exactly this works varies wildly depending on whether you are using e.g. OpenGL or not, and on the type of surface you use
And when I say everything, I do mean everything.
- Want to retrieve the globals? Register registry callback functions.
- Want to retrieve mouse/keyboard input? Register all callback functions, then you get them in a callback.
- Need monitor information? Save the `wl_output` you got from the registry callback, then register all the callbacks on it.
- Xdg needs you to regularly respond to PING events. You respond by registering a callback for the PINGs, and then calling the appropriate PONG in there.
- Want to know whether your application just got focussed? Register all callbacks for surface events.
I fully blame Wayland being an Object Oriented Protocol for this.
The control flow is horrible. Have fun trying to predict what code executes in what order after your `wl_display_roundtrip()` & `wl_display_dispatch()` calls - I still don't know what the difference between them is, or in what order to call them. Initialization code for OpenGL is rather fragile, as some of these callback functions are called several times during initialization.
If you fuck anything up during initialization, you likely just get no window while the program keeps running endlessly. And even if you do all of this callback bullshit, none of it is simple to use.
- Keyboard input: You also need to initialize `xkb` and translate the event info with it; otherwise you don't know what character a key press is supposed to represent.
- You only get an event for key press and key release. If you want to support key repetition, you guessed it: get the key-repetition info from some callback. Then manually set up a timer FD using said info, and keep track yourself of which key is supposed to repeat.
- Want to know what the current refresh rate is?
  - Track all global `wl_output` objects and register all the callbacks for each
  - Store the refresh-rate info from the callback alongside the `wl_output` object
  - Register callbacks for the surface you are displaying.
  - When your surface enters an output, you get a callback including the `wl_output` being used. Match that `wl_output` with your stored globals and the corresponding refresh rate.
- By default on sway, you get window decorations if the program is tiled, but no decorations when you make the window floating. If you want to change this behavior:
  - Get the `zxdg_decoration_manager_v1` global from the registry
  - Use it to create a `zxdg_toplevel_decoration_v1` for your toplevel and explicitly request what kind of decorations you want to have.
- Is the window supposed to be a pop-up? Then set its min- and max-size to be equal.
The most valuable resource is Wayland.app. It has an overview of:
- the core protocol functions
- accepted extensions
- WIP extensions and where they are used
- which version of the extension / protocol is used by which compositor.
You can see how fragmented this whole thing is. It's a complete mess.
Oh and I haven't even mentioned how you can use these extensions.
This is crucial, because you cannot open a window if you just use the core protocol.
You used to be able to, but the "shell" which was used to display things, wl_shell,
has been deprecated.
I'm not kidding.
You are supposed to use XdgShell instead [1]. But if you look up the extension, you will probably only find an XML file - because the interface code is generated from it using wayland-scanner as follows:
    wayland-scanner private-code < xdg-shell.xml > xdg_shell.c
    wayland-scanner client-header < xdg-shell.xml > xdg_shell.h
The official Wayland Documentation does not mention wayland-scanner,
only that it's generated from the XML. I'm also still not sure where I'd get the XML files from.
For Void Linux, there is a wayland-protocols package, which puts the XML files in /usr/share/wayland-protocols.
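Putting that together, a build rule might look like the following - the XML path matches the Void package layout mentioned above, and the output file names are just my choice:

```make
XDG_SHELL_XML = /usr/share/wayland-protocols/stable/xdg-shell/xdg-shell.xml

xdg_shell.c: $(XDG_SHELL_XML)
	wayland-scanner private-code < $(XDG_SHELL_XML) > xdg_shell.c

xdg_shell.h: $(XDG_SHELL_XML)
	wayland-scanner client-header < $(XDG_SHELL_XML) > xdg_shell.h
```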
This is sheer insanity. There are so many obstacles you have to get over in order to produce
just a blank window, and even if you get past that, the control flow of the application is fucking garbage.
I have no idea why they didn't stick to a "main event loop" instead, and just provide an easy way of dismissing events you don't care about.
To give you another glimpse: for some fucked-up reason, opening an OpenGL window is easier than just drawing some pixels on the CPU. Yes, really. Here is, roughly, what you need to do for a software-rendered application:
- Get all the global objects
- Create a `WlSurface`
- Register all the callbacks for the `WlSurface`
- Create an `XdgSurface`
- Register all the callbacks for the `XdgSurface`
- Create an `XdgToplevel`
- Register all the callbacks for the `XdgToplevel`
- Create a shared memory object, which I'll call `ShmFD`, using `shm_open`
- Set its size using `ftruncate` and map it using `mmap`
- Using the global `WlShm` and `ShmFD`, create a `wl_shm_pool`
- Using the `wl_shm_pool`, create a `WlBuffer`
- Call `wl_surface_commit` on your `WlSurface`
- Call `wl_display_dispatch()` and `wl_display_roundtrip()`
- In the `XdgSurface.Configure` callback, attach the `WlBuffer` to the `WlSurface`. Damage and commit the `WlSurface` to indicate that you have stuff to be displayed.
- In the `XdgToplevel.Configure` callback, you need to keep track of whether your resolution changed. If so, you probably want to create another `WlBuffer` with new dimensions, and draw to it.
- To know when you need to repaint:
  - Create a `wl_callback` for the `WlSurface`
  - Register a callback for it. When it's called, you:
    - should do the rendering
    - destroy the current `wl_callback`
    - create a new `wl_callback`
    - add the callbacks to the new `wl_callback`
I'm skipping over some details here, but you can see that this shit is fucking insane. Fuck anything up, and your application is less responsive, doesn't render at all or renders way too often, leaks memory, and so on.
Here's a list of stuff I randomly stumbled over, in no particular order.
If you use your GPU for rendering, you are probably presenting frames via eglSwapBuffers from EGL. If you activate VSync, eglSwapBuffers will automatically block until the frame is displayed, which is quite handy (no manual sleeping / refresh-rate querying required). But if you unplug and replug the monitor, eglSwapBuffers will block indefinitely.
Wayland, AFAIK, has no concept of primary monitors. For my personal application, I wanted to hardcode something using the right-most monitor. Wayland offers a geometry callback, which is supposed to return the x & y position within the global compositor space (and other info like physical dimensions, subpixel layouts, transforms, etc). Apparently several Wayland environments always return (0, 0) for the monitor position.
Wayland protocol development is so slow that someone (from Valve?) started the Frog Protocols, with the sole purpose of being able to iterate much more quickly.
As far as I know, there is no standardized way of retrieving the current "desktop state", meaning retrieving a list of open windows, their positions, etc. I'm aware that Wayland is supposed to be "secure", but simply not offering this functionality introduces fragmentation and is likely more insecure than offering an API with some sort of permission system. On Sway, you can retrieve the "desktop state" using the sway-ipc-socket, which communicates via JSON (kill me). However, there is no way to query for applications / surfaces using the Wlr Layer Shell extension, which are used for workspace independent windows (e.g. status bar) [2].
Getting the Wayland clipboard to work is a nightmare. I just wanted to copy some text into the clipboard, but I gave up after a while. For text, the easiest hack I found was to start another process running wl-clipboard and pass the relevant arguments to it.
Hotplugging stuff doesn't necessarily work. The keys of my drawing tablet's pad are not recognized by any application until I restart the programs. Judging from my experiments, Wayland doesn't seem to generate the relevant events announcing that the device has been plugged in, while unplugging does send the events that render the device invalid [3].
Xdg-Desktop-Portal is required for any form of screensharing. Thanks to Wayland considering screensharing / recording "out of scope", xdg-desktop-portal backends have been implemented separately per compositor / desktop, leading to fragmentation. On Void Linux, the following implementations are available:
- GNOME
- GDK
- KDE
- LXQT
- WLR
- XAPP
While I was able to make parts of a Wayland Window transparent, I wasn't able to make it "click-through" / forward the events to the window behind it.
Setting mouse cursors is a massive pain in the ass. While I now know how to hide the cursor and how to display the "normal" cursor, I'm not sure how to handle other cursor types: you need to pass a "CursorType" string, and I legit couldn't find a list of all the valid strings [4].
As a user, using Wayland is nice.
Compared to X11, the internals of Wayland Compositors might also be a great upgrade, I don't know.
As a developer, I want to kill myself when working with Wayland code. The API of this "asynchronous object-oriented protocol" is a fucking disaster. It's a huge downgrade compared to Win32 or X11 with Xlib. You cannot write a simple application with Wayland. Every part of it - the different extensions, the fact that you generate the API code from XML, the entire control flow when interacting with Wayland - is utterly horrible. And this is supposedly the foundation all future Linux applications should build upon.