Show HN: MacMind – A transformer neural network in HyperCard on a 1989 Macintosh

Original link: https://github.com/SeanFDZ/macmind

## MacMind: A Tiny Transformer That Demystifies AI

MacMind is a fully functional, 1,216-parameter transformer neural network implemented *entirely* in HyperTalk, a scripting language from 1987, and trained on a vintage Macintosh SE/30. By learning the bit-reversal permutation (a key step of the Fast Fourier Transform) from random examples, it demonstrates that the core principles of modern AI work even on severely limited hardware.

The project aims to make AI understandable by showing that large language models are not magic but scaled-up versions of basic mathematical processes such as backpropagation and attention. Every line of MacMind's code can be inspected and modified in HyperCard, exposing the underlying math.

Despite its small size, MacMind independently "discovered" the mathematical structure of the FFT, matching results published in 1965. It has token embeddings, positional encoding, and self-attention, all without compiled code or external libraries. Users can train the model, test its predictions, and visualize its attention map in a classic Mac environment. MacMind shows that the core processes of AI work regardless of computing power, offering a transparent, approachable window into the field.


A complete transformer neural network implemented entirely in HyperTalk, trained on a Macintosh SE/30.

MacMind is a 1,216-parameter single-layer single-head transformer that learns the bit-reversal permutation -- the opening step of the Fast Fourier Transform -- from random examples. Every line of the neural network is written in HyperTalk, a scripting language from 1987 designed for making interactive card stacks, not matrix math. It has token embeddings, positional encoding, self-attention with scaled dot-product scores, cross-entropy loss, full backpropagation, and stochastic gradient descent. No compiled code. No external libraries. No black boxes.

Option-click any button and read the actual math.

MacMind trained to step 1000 on a Macintosh SE/30


The same fundamental process that trained MacMind -- forward pass, loss computation, backward pass, weight update, repeat -- is what trained every large language model that exists today. The difference is scale, not kind. MacMind has 1,216 parameters. GPT-4 has roughly a trillion. The math is identical.

We are at a moment where AI affects nearly everyone but almost nobody understands what it actually does. MacMind is a demonstration that the process is knowable -- that backpropagation and attention are not magic, they are math, and that math does not care whether it is running on a TPU cluster or a 68000 processor from 1987.

Everything is inspectable. Everything is modifiable. Change the learning rate, swap the training task, resize the model -- all from within HyperCard's script editor. This is the engine with the hood up.


The bit-reversal permutation reorders a sequence by reversing the binary representation of each position index. For an 8-element sequence:

Position:    0    1    2    3    4    5    6    7
Binary:     000  001  010  011  100  101  110  111
Reversed:   000  100  010  110  001  101  011  111
Maps to:     0    4    2    6    1    5    3    7

So input [3, 7, 1, 9, 5, 2, 8, 4] becomes [3, 5, 1, 8, 7, 2, 9, 4].
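The rule above can be sketched in a few lines of Python (this is an illustrative reimplementation, not code from the stack or from validate.py):

```python
# Sketch of the bit-reversal permutation for an 8-element sequence
# (3-bit position indices), matching the table above.

def bit_reverse_index(i, bits=3):
    """Reverse the binary representation of a position index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (i & 1)  # shift the lowest bit of i into result
        i >>= 1
    return result

def bit_reverse_permute(seq):
    """Reorder seq so that output[i] = seq[bit_reverse_index(i)]."""
    return [seq[bit_reverse_index(i)] for i in range(len(seq))]

print(bit_reverse_permute([3, 7, 1, 9, 5, 2, 8, 4]))
# -> [3, 5, 1, 8, 7, 2, 9, 4]
```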

This permutation is the first step of the Fast Fourier Transform, one of the most important algorithms in computing. The model is never told the rule. It discovers the positional pattern purely through self-attention and gradient descent -- the same process, scaled up enormously, that taught larger models to understand language.

After training, the attention map on Card 4 reveals the butterfly routing pattern of the FFT. The model independently discovered the same mathematical structure that Cooley and Tukey published in 1965.


MacMind is a 5-card HyperCard stack:

| Card | Purpose |
| --- | --- |
| 1 -- Title | Project name and credits |
| 2 -- Training | Train the model and watch it learn in real time |
| 3 -- Inference | Test the trained model on any 8-digit input |
| 4 -- Attention Map | Visualize the 8x8 attention weight matrix |
| 5 -- About | Plain-text explanation of what the model is doing |

Click Train 10 for 10 training steps, or Train to 100% to train until the model gets a perfect score on a sample. For deeper training, run Train 10 repeatedly or click Train to 100% again -- the model picks up where it left off. For a longer run, open the Message Box (Cmd-M) and type trainN 1000 to train for 1,000 steps straight.

Each step generates a random 8-digit sequence, runs the full forward pass, computes cross-entropy loss, backpropagates gradients through every layer, and updates all 1,216 weights. Progress bars, per-position accuracy, and a training log update in real time.
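The loss computed at each step is standard cross-entropy, averaged over the 8 output positions. A minimal NumPy sketch of that calculation (not MacMind's actual HyperTalk, which implements the same math in field arithmetic):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, targets):
    """logits: [8 x 10] raw scores; targets: the 8 correct digit indices."""
    probs = softmax(logits)
    # negative log-probability of the correct digit at each position
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))

# An untrained model outputs near-uniform scores, so the loss starts
# around ln(10) ~= 2.3 and falls toward 0 as training progresses.
targets = np.array([3, 5, 1, 8, 7, 2, 9, 4])
print(cross_entropy(np.zeros((8, 10)), targets))
```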

Note: The training log field has a 30,000 character limit (a HyperCard constraint). After roughly 900 steps the log will fill up and HyperCard will display an error. To clear it and continue, open the Message Box (Cmd-M) and type:

put "" into card field "trainingLog"

Then resume training with trainN 500 (or whatever number of steps you want).

Card 2 -- Blank stack ready to train

After training, click New Random to generate a test input, then Permute to run the trained model. The output row shows the model's predictions and the confidence row shows how sure it is about each position.

To verify the result, apply the bit-reversal permutation by hand. The output should rearrange the input positions in this order:

Output[0] = Input[0]        Output[4] = Input[1]
Output[1] = Input[4]        Output[5] = Input[5]
Output[2] = Input[2]        Output[6] = Input[3]
Output[3] = Input[6]        Output[7] = Input[7]

For example, input [3, 7, 1, 9, 5, 2, 8, 4] should produce [3, 5, 1, 8, 7, 2, 9, 4]. If the model is well-trained, every position will be correct with confidence above 90%.

Card 3 -- Inference with correct prediction and confidence scores

The 8x8 grid visualizes which input positions the model attends to when producing each output position. After training, you should see the butterfly pattern: positions 0, 2, 5, 7 attend to themselves (fixed points of the permutation), while positions 1 and 4 attend to each other, and positions 3 and 6 attend to each other (swap pairs).
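The expected pattern follows directly from the bit-reversal map itself; a short sketch (illustrative, not read from the stack) derives the fixed points and swap pairs:

```python
# Derive the expected attention routing from the 3-bit bit-reversal map.

def bit_reverse_index(i, bits=3):
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

mapping = {i: bit_reverse_index(i) for i in range(8)}
fixed = sorted(i for i, j in mapping.items() if i == j)
swaps = sorted({tuple(sorted((i, j))) for i, j in mapping.items() if i != j})
print(fixed)  # [0, 2, 5, 7]
print(swaps)  # [(1, 4), (3, 6)]
```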

This is the same routing structure discovered by Cooley and Tukey in 1965 for the Fast Fourier Transform:

FFT Butterfly Diagram The classic FFT butterfly diagram (public domain). The model discovers this structure independently through attention.

Card 4 -- Attention map showing learned routing pattern


| Component | Dimensions | Parameters |
| --- | --- | --- |
| Token embeddings (W_embed) | 10 x 16 | 160 |
| Position embeddings (W_pos) | 8 x 16 | 128 |
| Query projection (W_Q) | 16 x 16 | 256 |
| Key projection (W_K) | 16 x 16 | 256 |
| Value projection (W_V) | 16 x 16 | 256 |
| Output projection (W_out) | 16 x 10 | 160 |
| Total | | 1,216 |
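A quick sanity check of the total in the table above:

```python
# Parameter count from the component dimensions listed above.
d = 16        # embedding dimension
vocab = 10    # digits 0-9
seq_len = 8   # sequence positions

params = {
    "W_embed": vocab * d,    # 160
    "W_pos":   seq_len * d,  # 128
    "W_Q":     d * d,        # 256
    "W_K":     d * d,        # 256
    "W_V":     d * d,        # 256
    "W_out":   d * vocab,    # 160
}
print(sum(params.values()))  # 1216
```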

Data flow:

Input digits [8]
    |
Token embedding lookup + position embedding --> [8 x 16]
    |
Q, K, V projections --> [8 x 16] each
    |
Attention scores = Q x K^T, scaled by 1/sqrt(16) --> [8 x 8]
    | softmax per row
Attention weights --> [8 x 8]
    |
Context = weights x V --> [8 x 16]
    |
Residual connection: context + embedded input --> [8 x 16]
    |
Output logits = residual x W_out --> [8 x 10]
    | softmax per position
Predictions --> [8 x 10] probability distribution over digits
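The diagram above can be sketched as a NumPy forward pass. The weights here are random placeholders standing in for an untrained model (MacMind's real weights live in its HyperCard fields), so only the shapes and flow are meaningful:

```python
import numpy as np

rng = np.random.default_rng(42)
d, vocab, seq_len = 16, 10, 8

# Placeholder weights with the same shapes as MacMind's parameters.
W_embed = rng.normal(0, 0.1, (vocab, d))
W_pos   = rng.normal(0, 0.1, (seq_len, d))
W_Q     = rng.normal(0, 0.1, (d, d))
W_K     = rng.normal(0, 0.1, (d, d))
W_V     = rng.normal(0, 0.1, (d, d))
W_out   = rng.normal(0, 0.1, (d, vocab))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(digits):
    x = W_embed[digits] + W_pos           # [8 x 16] token + position embeddings
    Q, K, V = x @ W_Q, x @ W_K, x @ W_V   # [8 x 16] each
    scores = (Q @ K.T) / np.sqrt(d)       # [8 x 8] scaled dot products
    weights = softmax(scores)             # [8 x 8], softmax per row
    context = weights @ V                 # [8 x 16]
    residual = context + x                # residual connection
    logits = residual @ W_out             # [8 x 10]
    return softmax(logits)                # probabilities per position

probs = forward(np.array([3, 7, 1, 9, 5, 2, 8, 4]))
print(probs.shape)  # (8, 10)
```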

All weights and activations are stored as comma-delimited numbers in hidden HyperCard fields on Card 2. A 16x16 weight matrix is 256 comma-separated values in a single field. Save the stack, quit, reopen it: the trained model is still there.
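The storage scheme amounts to flattening each matrix into one comma-separated string. A sketch of the round trip in Python (field handling is illustrative; the stack's actual field names are not shown here):

```python
# Flatten a weight matrix to comma-separated values, as a HyperCard
# field would hold it, then restore it by row length.

def matrix_to_field(matrix):
    return ",".join(str(v) for row in matrix for v in row)

def field_to_matrix(text, rows, cols):
    values = [float(v) for v in text.split(",")]
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]

m = [[1.5, -0.25], [0.0, 2.0]]
field = matrix_to_field(m)
print(field)  # 1.5,-0.25,0.0,2.0
print(field_to_matrix(field, 2, 2) == m)  # True
```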


Training on Real Hardware

MacMind was trained on a Macintosh SE/30 running System 7.6.1 and has also been tested through Basilisk II on Apple Silicon. HyperTalk is interpreted, and every multiply, every field access, every variable lookup goes through the interpreter. Each training step takes several seconds. Training to convergence (~1,000 steps) takes hours.

The model was left training overnight, grinding through backpropagation one 8 MHz multiply-accumulate at a time. By morning it had learned the permutation.


HyperCard 2.0 or later is required. HyperCard 1.x evaluates arithmetic left-to-right without standard precedence (2 + 3 * 4 = 20 instead of 14), which would silently corrupt every matrix multiplication and gradient computation in the model. HyperCard 2.0 introduced standard mathematical operator precedence. The stack was built and tested with HyperCard 2.1.

| | Minimum | MacMind Reference |
| --- | --- | --- |
| HyperCard | 2.0 | 2.1 |
| System software | System 7 | System 7.6.1 |
| RAM | 1 MB (2 MB recommended) | 4 MB |
| Processor | 68000 | 68030 (Mac SE/30) |

Also runs on Mac OS 8, Mac OS 9, and the Mac OS X Classic Environment (through 10.4 Tiger on PowerPC).

On real vintage hardware, each training step takes several seconds and full training takes hours. On a modern Mac running Basilisk II or SheepShaver, performance is comparable -- HyperTalk interpretation is the bottleneck, not the host CPU.


Quick Start (pre-trained)

  1. Download MacMind-Trained.img from Releases
  2. Open it on your Mac running System 7 through Mac OS 9, or in an emulator (Basilisk II, SheepShaver, Mini vMac)
  3. Double-click the MacMind stack
  4. Navigate to Card 3 (Inference), click New Random, then Permute

Watch It Learn (blank stack)

  1. Download MacMind-Blank.img from Releases
  2. Open it on your Mac or in an emulator
  3. Navigate to Card 2 (Training)
  4. Click Train 10 for short runs, or Train to 100% to train until the model gets a perfect score on a sample. For a longer run, open the Message Box (Cmd-M) and type trainN 1000 to train for 1,000 steps straight. The model picks up where it left off each time.

Validate the Math (Python)

The validate.py script is a Python/NumPy reference implementation of the exact same transformer. It trains on the same task with the same architecture and confirms convergence to 100% accuracy.

pip install numpy
python3 validate.py

MacMind is an original implementation by Sean Lavigne.


Also From Falling Data Zone

AgentBridge -- let AI agents talk to your Classic Mac. A native System 7 / Mac OS 8 / Mac OS 9 application that connects Claude and other AI agents to Classic Mac OS through a simple text-based protocol. Works on real hardware and emulators.

More apps at apps.fallingdata.zone.


MIT. See LICENSE.
