利用模拟器存档状态在超级马里奥兄弟中实现一个玩具实时操作系统 (RTOS)。

利用模拟器存档状态在超级马里奥兄弟中实现一个玩具实时操作系统 (RTOS)。
A toy RTOS inside Super Mario Bros. using emulator save states

原始链接: https://prettygoodblog.com/p/what-threads-are-part-2

作者使用超级马里奥兄弟作为线程，创建了一个多线程的NES模拟器，以一种切实可行的方式阐释了线程的概念。模拟器每160帧在三个不同的游戏实例之间切换，每个实例都有独特的颜色调色板。游戏世界被设计成可以激活互斥锁和条件变量等同步原语，允许与线程边界进行交互。例如，特定区域禁用中断，另一个区域充当互斥锁，一次只允许一个马里奥进入，而旗杆则使用条件变量来暂停线程，直到所有其他线程都到达它。模拟器通过保存和加载游戏状态来实现并发性，这模仿了真实CPU上线程的实现方式。作者提供了用于修改FCEUX模拟器的Lua脚本，并鼓励读者探索它，这表明虽然线程在实践中很复杂，但可以通过亲自动手创建来理解。

一篇Hacker News帖子重点介绍了一个独特的项目：一个在未修改的NES模拟器（FCEUX）上的超级马里奥兄弟游戏中实现的玩具实时操作系统（RTOS）。该RTOS使用模拟器的存档状态作为线程上下文，从而实现抢占式多任务处理。它具有基本的互斥锁、中断屏蔽和条件变量，并在超级马里奥兄弟1-1关卡中进行了演示。虽然项目本身被认为是简化了的（使用了完整的存档状态而非最小化的线程上下文），但它新颖的方法在视觉上解释了线程和并发性，受到了称赞。评论者赞赏其对底层概念的神秘化处理以及对临界区演示。改进建议包括线程间的共享内存以及扩展到毁灭战士或多核模拟。作者欢迎反馈，特别是关于其作为教学工具的潜力，并承认这种比喻的局限性。

（评论） 2023-12-27

（评论） 2024-09-16

异步 Rust 的视觉之旅 2025-04-25

Show HN：我在 tmux 配置文件中实现了贪吃蛇游戏 2025-03-29

原文

This is another post about programming, which I almost never write about.

Click here to jump straight to trying this thing out for yourself.

In my previous post on Threads, I made an offhand comparison:

Threads are just emulator save states, coupled with a condition upon which they will be resumed.

At the time, I thought this a pretty okay analogy — but I couldn’t stop thinking about it. I’ve been turning it around in my mind for a while. I think it has serious untapped potential as a pedagogical tool.

So I added multithreading to Super Mario Bros. for the NES.

No buried ledes here.

I should explain myself.

What you just watched happens to be a multithreaded NES emulation, with Super Mario Bros. as the threads.

There are three “threads” running, each a distinct instance of the game. Every so often, the emulator switches “threads”, swapping to a different instance of the game.

Each “thread” is assigned a different color palette, which we apply when we resume said thread. This is why the colors are constantly changing around in the video.

When World 1-1 starts, the “multithreading” begins. Specifically, my script creates three save states, each representing the current state of the game. Then I create three threads, and give them their respective save state to hold on to.

Then I start the thread scheduler.

The thread scheduler’s job is simple:

Every 160 frames, switch to a new thread
- The next thread ideally, in a rotating fashion (1, 2, 3, 1, 2, 3, 1, …)
- Skip over threads which are KILLED, BLOCKED (unless they can be unblocked), or SLEEPING (unless it’s time to wake them)
To switch to a new thread:
- First, update the current thread’s save state with the current game state
- Then, load in the new thread’s save state
- Finally, update the game color palette to reflect which thread we’re on

This covers what you see for the first bit of the video: each thread starts at the same time (the loading screen for World 1-1), with different “games” being swapped in every so often.

Essentially, three different games of Mario Bros. are being played “at the same time”, but only one is actually active at any given time.

This image shows the full World 1-1 map, with color-coded areas indicating various features explained below. I recommend clicking to expand this image.

Time slicing is cool — we’re running three “concurrent” games of Mario! — but that’s far from the only threading concept demonstrated here.

I’ve set up the game world in such a way that certain areas or features activate synchronization primitives (such as a mutex); you can physically interact with threading boundaries.

I describe the synchronization primitives I’ve set up below, but the net effect is:

If a Mario is standing on the three blocks at the start of the level, no other Mario’s game will run until he leaves (Disabled Interrupts)
Only one Mario can be inside the Pipe Sub-level at a time (Mutex)
Once a Mario touches the flagpole, his game pauses until all the other Marios have touched their flagpoles (Condition Variables)

The red shaded area on the map denotes a section of the world where interrupts are disabled, and therefore, the thread scheduler cannot run.

Whenever a Mario enters the red shaded area, his thread won’t lose control until he leaves the area, regardless of what other Marios are doing.

The yellow shaded area (difficult to discern — it’s the fourth pipe from the left; the one you can go down) demonstrates a mutex: an area of the game world where only one Mario may exist at a time.

When a Mario goes down the pipe, he tries to acquire the pipeMutex. If nothing else owns that mutex (i.e., no other thread is currently in the pipe sub-level), then he immediately gains ownership of the mutex and proceeds without issue.

However, if another Mario owns the mutex (is presently in the pipe sub-level), then the Mario which is presently entering the pipe will be blocked until the mutex is released (the other Mario leaves the pipe sub-level).

Marios outside of the pipe sub-level are not blocked, and are still allowed to run.

The green shaded area (the flagpole at the end of the level) demonstrates a condition variable: when a Mario touches the flagpole, he increments numMariosTouchedFlagpole — a “condition variable” — by 1, and then blocks until that same condition variable is equal to the number of threads. In other words, he waits until all the other Marios have touched the flagpole before continuing.

Whenever a Mario kills an enemy, his thread goes to sleep for 300 frames.

It’s hard to tell in the video, but a thread that goes to sleep doesn’t necessarily come back on time; it has to wait until the thread scheduler next decides to run it.

This actually is threads. We have taken a machine — cursed with an inability to do more than one thing at a time — and added concurrency to it, without modifying the core engine (or CPU) to have any notion of “threads”.

We added threads to this emulator in the same way that we add threads to normal CPUs: we take clever advantage of a mechanism which allows us to A) save the current state of the machine and B) load it back up in the future if and when we choose. All without the emulator itself ever being designed for, or having any notion of, “threads”.

And you can touch it. You interact with these bona fide threads not through a debugger, not by instantiating a Mutex, but by walking into a critical section and observing how the threading behavior changes in real time.

It’s sick!

Has anyone seen it? Do you know where it lives? What time does it tend to come home?

I want more people to understand the things that “nobody understands.” Not because we will imminently need them; but because there is certain joy and immense value in conquering these things anyways.

If you’re getting into software today, you are liable to be thrown right into the middle of the abstraction wilderness. One can spend decades in these lands: learning new frameworks; adopting new stacks; exploring endlessly in all directions but into the mysterious down. No need to go down there. Nobody knows what’s beyond that barrier.

But the concrete has not cured. We do not have the luxury of treating these layers as bedrock: don’t even bother trying to get in. Just trust that it’ll hold your house up. Oh, we’ve built the house — this trillion-dollar industry of a house — and sure, it’s standing. But when the foundation fails, what use is the house? When the foundation must evolve, how will we contribute to that process, knowing only how to build atop it? How could we accurately judge a new foundation, knowing nothing about how the old one was built?

Threads are not very complicated, and I don’t think I’m particularly smart for knowing how they work. Simple circumstance forced me to work with them at a deep level, which one cannot do without gaining an understanding of the thing. Anyone who had to work with them in the same way I did would likely have the same level of understanding. Conversely, there are countless things I’ve never worked with — and thus do not understand — which I certainly should know more about.

What is complicated is how to make threads work extremely efficiently and reliably. Switching contexts? Not too hard. Optimizing the hell out of them with data structures and algorithms? My intuition usually fails me on this front — I am no CS guru. But the core innovation of threads requires none of this.

That’s the case for almost everything of this sort: these are all relatively simple ideas! Taking the time to understand them only requires understanding structure and logical flow; one does not need to dive into complex math or advanced algorithms in order to grasp the root idea. And once it’s been grasped, we can add it to our foundational understanding; a new bone in the ol’ conceptual skeleton; supercharging work way up the stack in ways we cannot predict.

Learning frontend frameworks is useful. One can produce billions of dollars from such work! But nothing will give you an understanding of frontend frameworks as a concept like creating your own framework. Once you’ve done that, the way you perceive and grok any other framework will be permanently enhanced.

Before I built this, I would have struggled to talk about how to implement a mutex — I’ve never done it before! My implementation is certainly terrible; a naive, simplistic approach which works only because the stakes are so low. But that’s fine: I can see the next steps. The ways in which it could improve. Through working on this — even through identifying problems I want to avoid solving — my brain is building pathways and forging connections between concepts without my knowledge.

I don’t know how the Linux kernel implements mutexes, or the best way to implement them in a battle-hardened system; it’d be silly to say that building this toy would grant me such expertise. But I now know how to evaluate what I’m looking at if I have to. “Damn, they built it way better than I did” is a far more instructive, useful experience than “damn, they built a thing I have no understanding of!”

At first, I was going to find an open-source NES emulator and add all this directly into its source code. This was going to be quite cool and impressive of me. Then I found FCEUX, which exposes a Lua plugin system through which I can do everything I need. Jackpot.

I do still want the credit for the original plan, though.

A few hundred lines of Lua later, and we have a legitimate thread scheduler, with support for mutexes, condition variables, interrupt masking, sleep, and more.

I highly encourage you to read the code for yourself, but I’ll walk through some of it here.

Before we build a thread scheduler, we need to be comfortable in our environment. We’re writing a Lua plugin for an NES emulator… what can we do?

Luckily, the documentation is quite helpful — already, we can see that we have the tools we need to:

Create a save state (savestate.create() and savestate.save())
1. This is how we’ll “save” a thread so we can resume it later
Load a save state (savestate.load())
1. This is how we’ll resume a thread we previously put to sleep
Read memory from the game’s RAM (memory.readbyte())
1. This is how we’ll figure out what level the player is on, their coordinates within the level, etc.
2. We’ll use this helpful document to figure out where all the juicy bits live in the game’s RAM
Draw text on the screen (gui.drawtext())
1. This is how we’ll obscure the majority of the screen with irrelevant information
Control when frames get executed on the emulator (emu.frameadvance())
1. Our lua code needs to call this function whenever a frame of the game should run
  1. This lets us do anything we want in-between frames

This is everything we need!

Let’s start with a basic “do-nothing” script, which is functionally identical to no script at all:

while true do
    emu.advanceframe()
end

We want to only kick in and start multithreading once the player has started the game.

I don’t have the best solution here, but I decided to hook into the GAME_MODE (0x0770) and PRE_LEVEL_SCREEN_SHOWING (0x0757) memory addresses. When each has the value of 1, we know that the game is starting and is showing the “pre-level screen”, which is a good place to start in my opinion.

This is what that looks like:

function initiate()
    emu.frameadvance()

    if not emu.emulating() then
        return
    end

    local gameMode = memory.readbyte(0x0770)
    local preLevelScreen = memory.readbyte(0x0757)

    if gameMode ~= 1 or preLevelScreen ~= 1 then
        return
    end

    initiated = true
end

function loop()
    emu.frameadvance()
end

while true do
    if not initiated then
        initiate()
    else
        loop()
    end
end

This works, but doesn’t do anything yet.

Now that we can detect when the game has begun, we can start implementing threads.

Remember: threads are just snapshots of state, combined with a condition upon which they should be resumed.

For now, we’ll ignore the “condition to resume” part. We’ll focus solely on the time-slicing bits.

So, we’ll need:

A list of threads
A notion of the “current thread”
A way to switch from the “current thread” to some other thread
A timer which tracks when we should switch threads

Here’s the full implementation for just time slicing:

THREAD_SWITCH_FREQUENCY = 100
NUM_THREADS = 3

local threads = {}
local curThreadIndex = nil
local curFrame = 0
local lastSwitchedThreads = 0

local initiated = false

function shouldRunScheduler()
    return (curFrame - lastSwitchedThreads) >= THREAD_SWITCH_FREQUENCY
end

function threadScheduler()
    local newThreadIndex = curThreadIndex + 1

    if newThreadIndex > NUM_THREADS then
        newThreadIndex = 1
    end

    local oldThread = threads[curThreadIndex]
    local newThread = threads[newThreadIndex]

    savestate.save(oldThread.saveState)
    savestate.load(newThread.saveState)

    curThreadIndex = newThreadIndex
end

function initiate()
    emu.frameadvance()

    if not emu.emulating() then
        return
    end

    local gameMode = memory.readbyte(0x0770)
    local preLevelScreen = memory.readbyte(0x0757)

    if gameMode ~= 1 or preLevelScreen ~= 1 then
        return
    end
    
    for i = 1, NUM_THREADS do
        local thread = {}
        thread.id = i
        thread.state = THREAD_STATE_PREEMPTED
        thread.saveState = savestate.create()

        savestate.save(thread.saveState)
        table.insert(threads, thread)
    end

    initiated = true
    curThreadIndex = 1
    threadScheduler()
end

function loop()
    emu.frameadvance()

    if shouldRunScheduler() then
        threadScheduler()
        lastSwitchedThreads = curFrame
    end
end

while true do
    curFrame = curFrame + 1

    if not initiated then
        initiate()
    else
        loop()
    end
end

This works! Start World 1-1, and you’ll start switching between 3 threads, ruining the gameplay experience in its entirety.

I think this is a good place to stop with the implementation details.

Now that you have a function which can switch threads on demand, it’s not too hard to add:

Thread priorities
Sleeping
Locking on resources (mutexes, semaphores, etc.)
Whatever your heart desires!

I encourage you to try out the full code — linked above — for yourself!

First, obtain a legal copy of a Super Mario Bros. ROM for the NES. I offer no assistance on this front.

Second, download FCEUX. Click that blog link and download that unsigned executable. Do it.

Third, download the Lua script and save it somewhere you can find on your computer.

Fourth, read the Lua script and ensure I didn’t just trick you into downloading malware.

Fifth, open FCEUX. Click File → Load Lua Script. Click Browse, then find the Lua file you saved. Hit Load and then Start.

Sixth, click File → Open ROM. Find the ROM file you downloaded.

Seventh, play the game. You might want to configure controls in Options → Input Config.

Eighth, realize that constantly switching between three instances of Super Mario Bros. isn’t pleasant, and you had no good reason to think it would be.

I haven’t actually set up a situation in which a true deadlock (A holds X, B holds Y, A tries to get Y while B tries to get X) can occur, but it would be handled (in a primitive manner) by the thread scheduler.

Whenever there is no thread that can be run (as would be the case in a deadlock, or if all threads are dead, or if all threads are asleep), the thread scheduler will halt the game and show an error message.

One deadlock-ish way this can happen is if every Mario is waiting on the mutex to enter the pipe, and then the Mario inside the pipe dies. This is really a dangling mutex, but let’s call it a deadlock.

(It’s hard to tell in this video, because the ‘on death’ trigger occurs before the animation even plays, but the Mario inside the pipe dies and thus leaves a dangling mutex)

What if the thread scheduler runs much more often?

WARNING: I think this video could legitimately induce an epileptic seizure.

I don’t like it.

This is not a good thread scheduler.

It does not support thread priorities; idle tasks; semaphores; fairness algorithms; dynamic thread spawning and joining; tracking mutex wait lists; making me any money. It is horribly inefficient, and very annoying to play with. But I love it.

I love it because I was able to build something — which I at one time presumed to be magic — in roughly 300 lines of Lua. I had never done this before! But here it is: my very own thread scheduler, in the most ridiculous of environments. Maybe this will be my DOOM thing: turning every video game into threads. Probably not.

It’s one of those projects that delights at every turn. How utterly wild it was to see it working the first time! Honestly, part of me didn’t expect that it would ever work.

I hope this taught you about… threads, was it?

利用模拟器存档状态在超级马里奥兄弟中实现一个玩具实时操作系统 (RTOS)。 A toy RTOS inside Super Mario Bros. using emulator save states

Has anyone seen it? Do you know where it lives? What time does it tend to come home?

利用模拟器存档状态在超级马里奥兄弟中实现一个玩具实时操作系统 (RTOS)。
A toy RTOS inside Super Mario Bros. using emulator save states