Released at the Outline Demoparty in May 2026, Ommen, NL
An exploration of algorithmic density in 16 bytes of x86 assembly.
In the demoscene, exploring what can be achieved within extreme constraints is a rewarding technical challenge. The following 16 bytes of x86 real-mode DOS assembly code represent a careful exercise in algorithmic density. When executed, it utilizes the computer's video memory as a calculation space to draw an infinite Sierpinski fractal, while simultaneously interpreting that geometry as audio data.
int 10h ; 2 bytes
mov bh, 0xb8 ; 2 bytes
mov ds, bx ; 2 bytes
L:
lodsb ; 1 byte
sub si, byte 57 ; 3 bytes
xor [si], al ; 2 bytes
out 61h, al ; 2 bytes
jmp short L ; 2 bytes
1. The Canvas: A Primed Void
The code begins with a standard BIOS interrupt: int 10h. This initializes Video Mode 0, establishing a 40x25 text mode grid. The subsequent instructions point the Data Segment (DS) to 0xB800, the physical memory address of the VGA/CGA text buffer.
When the BIOS clears the screen during this interrupt, it does not fill the memory with absolute zeroes. In text mode, every character space consists of two bytes: the ASCII character and the color attribute. The BIOS initializes all 2,000 character slots uniformly: the ASCII byte is set to 0x20 (the Space character), and the color byte is set to 0x07 (Light Gray text on a Black background). While the screen appears completely empty, it is actually a canvas primed with a uniform pattern of data.
The Importance of Uniformity: The mathematical progression we are about to explore relies on a predictable environment. If the memory contained random artifact data, the algorithmic calculations would ingest those discrepancies. In a cellular automaton, an unexpected bit can disrupt the pattern. A reasonably uniform memory space provides a foundation for the fractal to emerge clearly.
2. The Engine: Additive Prefix Sums
To understand the pure mathematics of the fractal, let us temporarily isolate our variables. We will model a perfectly zeroed state instead of the base 0x20 initialization. Additionally, we will substitute add instead of xor, and step forward by 16 bytes at a time across this memory, assuming the accumulator AL is loaded with the value 2.
A real-mode DOS segment spans exactly 65,536 bytes. By moving forward 16 bytes per iteration, it takes exactly 4,096 steps to traverse the segment (\( 65536 / 16 = 4096 \)). When the SI register advances past 0xFFFF, it wraps cleanly back to 0x0000.
As the loop progresses, it adds the current value of the accumulator to the memory cell, reading the updated value back into the accumulator. This effectively creates a running prefix sum. Because 4,096 is a multiple of 256 (the capacity of our 8-bit register), the mathematical carryover aligns when the segment wraps, cleanly resetting AL to 2 at the end of each full sweep.
The value of the \( k \)-th modified cell during pass \( p \) across the segment follows a binomial coefficient sequence, scaled by our initial value of 2:
$$A^{(p)}[k] \equiv 2 \binom{k+p}{p-1} \pmod{256}$$
The following table illustrates the first 16 steps of calculation across 16 memory cells, demonstrating how the values accumulate row by row:
3. Crystallization: XOR and the Sierpinski Shift
A deeper pattern is present within those decimal values. When performing binary addition, the bit-planes carry over into adjacent positions. However, if we discard the arithmetic carry and perform addition strictly modulo 2, we are left with the Exclusive OR (XOR) operation.
By using xor instead of add, the algorithm isolates the bit-planes. Because our modeled starting value is 2 (binary 00000010), only Bit 1 is ever affected by this specific calculation. The cascading decimal numbers become a pure toggle between 0x00 and 0x02. This progression maps perfectly to Rule 60 in Stephen Wolfram's elementary cellular automata:
$$Cell^{(p)}[k] = Cell^{(p-1)}[k] \oplus Cell^{(p)}[k-1]$$
According to Lucas's Theorem, this XOR relationship is mathematically guaranteed to match the state of Bit 1 from the additive table. We can validate this by visualizing the binary propagation (where '2' denotes Bit 1 being set):
4. The Voice of the Machine: Translating Data to Audio
A remarkably elegant detail lies in the instruction: out 61h, al
Port 61h interfaces with the internal PC speaker. Bit 1 of this port directly controls the speaker cone by pushing it outward when set to 1, and returning it when set to 0. Our routine computes the fractal via XOR, updates the memory, and immediately sends that byte to the speaker port.
Because the algorithm specifically isolates and toggles Bit 1, the geometry of the Sierpinski triangle serves as a direct set of instructions for the speaker cone. The execution speed of the CPU establishes the functional sample rate.
The patterns of 1s and 0s generated by the fractal yield distinct square waves, varying naturally in pulse width and frequency. The following plots visualize the states from the table above as audio waveforms:
When the fractal produces an alternating row (e.g., Pass 2), it outputs a higher-frequency square wave. When the structure introduces larger blocks of zeroes (the empty regions within the triangles, such as in Pass 4), the speaker cone remains stationary, resulting in rhythmic pauses. The resulting audio is a direct sonic representation of the mathematical structure.
5. The 56-Byte Step: Octave Shifts and Diagonal Shears
Returning to the actual code, we notice it does not step by 16. The instruction sub si, byte 57, combined with the increment from lodsb, results in a net movement of -56 bytes per iteration. The routine traverses memory in reverse.
This adjustment alters both the auditory frequency and the visual layout of the output.
The Audio: 7-Wrap Sweeps and Frequency Halving
While a 16-byte step completes a segment sweep in exactly 4,096 iterations, 56 does not divide 65,536 evenly (\( \gcd(56, 65536) = 8 \)). The loop visits only offsets that are multiples of 8, requiring 8,192 iterations to hit all available addresses, and wrapping around the segment 7 times before returning to 0x0000.
Since 8,192 is divisible by 256, the mathematical continuity of the Sierpinski sequence is preserved. However, because the macro-cycle is now 8,192 steps long rather than 4,096, it takes twice as many CPU cycles to complete a pass. This halves the fundamental frequency of the system, dropping the auditory rhythm by one octave, yielding a slower and deeper tone.
It is here that the routine's dual-purpose nature truly shines. The algorithm writes directly to the ASCII character bytes, where Bit 1 is exclusively responsible for toggling the speaker cone. Meanwhile, the remaining seven bits mutate into a cascade of pseudo-random ASCII glyphs, providing the chaotic visual texture of the production. Remarkably, sending this entire mixed-data byte directly to system port 61h, which historically manages various low-level motherboard controls, does not disrupt the system. In standard DOS environments and modern emulators, pushing these extra bits to the port is effectively harmless. This elegant coincidence allows the visual character data to safely double as the audio signal without causing a hardware crash.
The Visuals: Diagonal Structuring
We can also observe how a -56 byte step maps to a 40-character (80-byte) screen width:
- Stepping backward by 56 bytes is spatially equivalent to moving forward by 24 bytes on an 80-byte grid (\( -56 \equiv 24 \pmod{80} \)).
- Given that each character space occupies 2 bytes, a 24-byte shift equals exactly 12 columns.
The sequence progresses up 1 row and right 12 columns per step. To determine the number of distinct columns visited, we calculate the Greatest Common Divisor: \( \gcd(12, 40) = 4 \).
The arithmetic ensures the loop visits exactly one-quarter of the columns (\( 40 / 4 = 10 \)). Consequently, the fractal is rendered not as a contiguous image; instead, the spatial offset shears it diagonally into 10 evenly spaced vertical columns ascending the screen.
6. Final Observations on Memory Dependency
If we picture this routine running on a classic monochrome green monitor, we would observe ten distinct pillars of flickering, ascending glyphs, accompanied by an electronic tone mirroring the geometric density of the pattern.
However, it is important to acknowledge a practical reality regarding the initial state of the hardware. The theoretical model assumes a perfectly initialized, predictable environment. In reality, after the BIOS interrupt 10h, different system configurations, varying VGA BIOS implementations, and modern emulators (like DOSBox or PCem) can leave slightly different artifacts or default values in the upper regions of RAM. Because our algorithm continually reads and XORs against whatever is already present in memory, it will interact directly with these existing bits.
As a result, the effect is highly sensitive to its environment. The visual characters displayed and the timbre of the sound may vary noticeably depending on the specific machine or emulator executing the code.
To guarantee an identical, uniform output across every possible hardware configuration, one would simply need to add a brief setup routine to explicitly clear or preset the entire memory segment. While this would ensure theoretical perfection, it would naturally require a slightly larger footprint than the strict 16-byte limit allows. Ultimately, any of these memory environments can be reproduced on any other system with a little bit more code space, but embracing the subtle unpredictability of the machine's natural state is simply part of the charm of working within such tight constraints. Thank you for reading.