I made a 10¢ MCU Talk

Original link: https://www.atomic14.com/2025/10/29/CH32V003-talking

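This project demonstrates surprisingly capable audio playback on the CH32V003, a microcontroller that costs only about $0.10 and has just 16 KB of flash. Having started with simple buzzer tones, the goal was to make the MCU "talk". Storing audio directly requires heavy compression. Testing showed that 2-bit ADPCM strikes the best balance, compressing a 6-second clip ("Open the pod bay doors, HAL…") to under 12 KB, which fits comfortably alongside the playback code (under 1.4 KB). The audio quality is low but surprisingly recognizable. A custom tool was built to convert WAV files into this format. Beyond compressed audio, the project also explores LPC speech synthesis using the Talkie library, a technique from early speech synthesizers such as the Texas Instruments TMS5220. This allows phrases and words to be stored in a tiny footprint, though the voice sounds very synthetic. An AI-assisted web app was developed to generate LPC data from voice recordings. Overall, the project shows how clever compression and efficient algorithms make speech and audio playback possible on extremely limited hardware. The code is available on GitHub.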

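## 10-cent MCU project summary

A developer used the low-cost CH32 microcontroller (about 10 cents) to build a speech generation system. The project, described in detail on atomic14.com, explores audio compression and synthesis techniques for producing intelligible speech from limited hardware. The Hacker News discussion highlighted the affordability and surprising capabilities of the CH32 family, including integrated USB and Ethernet. Users discussed the need for better pin-planning tools (compared with STM32's STM32CubeIDE) and the licensing clarity of the OpenWCH project. The conversation also touched on tools for viewing KiCad PCB designs (kicanvas.org) and alternative audio codecs such as LPC10 and the NES's 1-bit DPCM. A key takeaway was the wish for easy access to schematics, ideally in a standard format such as PDF, rather than relying on Git repositories and specialized software. The project sparked interest in similar microcontroller audio projects, such as a MOD player designed for the same chip family.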

Original article

TLDR: Yes, you can fit about 7 seconds of audio into 16K of flash and still have room for code. And you can even play LPC encoded audio on a 10 cent MCU.

There’s quite a lot more detail in this video (and of course you can hear the audio!).

In the previous project, I had this ultra-cheap CH32V003 microcontroller playing simple tunes on a tiny SMD buzzer. It was just toggling a GPIO pin at musical note frequencies – 1-bit audio output – and it sounded surprisingly decent. That was a fun start, but now it’s time to push this little $0.10 MCU even further: can we make it actually talk?

CH32V003

Spoiler: Yes, we can! (well, there wouldn’t be much of a blog post if we couldn’t) This 8-pin RISC-V chip is now producing sampled audio data and spoken words. We’re really stretching the limits of what you can fit in 16 KB of flash.

16K Flash, 2K RAM

From Beeps to Actual Audio

Moving from simple beeps to real audio meant using the microcontroller’s PWM output as a rudimentary DAC. Instead of just on/off beeping, I’m driving a waveform at an 8 kHz sample rate using high-frequency PWM on the output pin. The hardware is the same tiny board as before, but I’ve swapped the small SMD buzzer for a small speaker. The buzzer works too, but it’s quieter and very tinny.
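To make the "PWM as a DAC" idea concrete, here is a minimal sketch in Python (chosen for readability; the real firmware runs on the MCU). It assumes an 8-bit PWM resolution and uses hypothetical function names, none of which come from the project's code. Each sample sets the duty cycle, and averaging the resulting pulse train, which is roughly what the speaker does, recovers the sample level.

```python
# Sketch only: PWM_PERIOD and all function names are assumptions,
# not taken from the project's firmware.

PWM_PERIOD = 256        # timer counts per PWM cycle (assumed 8-bit resolution)
SAMPLE_RATE_HZ = 8000   # audio sample rate mentioned in the article


def duty_for_sample(sample_u8: int) -> int:
    """Map an unsigned 8-bit sample (0..255) to a PWM compare value."""
    if not 0 <= sample_u8 <= 255:
        raise ValueError("sample must be in 0..255")
    return sample_u8 * PWM_PERIOD // 256


def pwm_cycle(compare: int) -> list:
    """One PWM cycle: the pin is high while the counter is below compare."""
    return [1 if count < compare else 0 for count in range(PWM_PERIOD)]


def average_level(cycle: list) -> float:
    """Approximate low-pass filtering as the mean of one cycle (0.0..1.0)."""
    return sum(cycle) / len(cycle)


if __name__ == "__main__":
    for sample in (0, 64, 128, 255):
        level = average_level(pwm_cycle(duty_for_sample(sample)))
        print(f"sample={sample:3d} -> average output {level:.3f}")
```

On real hardware the PWM carrier has to run far above the 8 kHz sample rate so that the carrier itself is inaudible. Every 125 µs the firmware would load the next sample into the timer's compare register.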

New Speaker

The sample I wanted to test with is just over 6 seconds long. It’s the iconic “Open the pod bay doors HAL…” sequence from 2001: A Space Odyssey.

Open the pod bay doors

If we keep this audio as 16-bit PCM at 8 kHz, we’d need about 96 KB, way beyond our 16 KB flash! And remember, that 16 KB has to hold both the audio data and our playback code. Clearly some aggressive compression is required.

| Format | Sample Rate | Bits/Sample | Size (6 s clip) | Fits in 16 KB? |
| --- | --- | --- | --- | --- |
| CD Quality | 44.1 kHz | 16-bit | 529 KB | ❌ 33× too big! |
| Phone Quality | 16 kHz | 16-bit | 192 KB | ❌ 12× too big! |
| Basic PCM | 8 kHz | 8-bit | 48 KB | ❌ 3× too big! |
| 4-bit ADPCM (IMA) | 8 kHz | 4-bit | 24 KB | ❌ 1.5× too big |
| QOA (Quite OK Audio) | 8 kHz | 3.2-bit | 19 KB | ❌ Still too big! |
| 2-bit ADPCM | 8 kHz | 2-bit | 12 KB | ✅ Fits! |
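The sizes in the table can be reproduced with a few lines of arithmetic. This sketch assumes a mono clip of exactly 6 seconds and decimal kilobytes (1 KB = 1000 bytes), which is what the table's figures appear to use. The function and constant names are illustrative.

```python
# Sketch: reproduce the table's sizes. Assumes mono audio, a 6-second clip,
# and 1 KB = 1000 bytes; names are illustrative, not from the project.

CLIP_SECONDS = 6
FLASH_BYTES = 16 * 1024

FORMATS = [
    ("CD Quality", 44_100, 16),
    ("Phone Quality", 16_000, 16),
    ("Basic PCM", 8_000, 8),
    ("4-bit ADPCM (IMA)", 8_000, 4),
    ("QOA (Quite OK Audio)", 8_000, 3.2),
    ("2-bit ADPCM", 8_000, 2),
]


def clip_bytes(sample_rate_hz, bits_per_sample, seconds=CLIP_SECONDS):
    """Raw payload size in bytes, ignoring headers and per-frame overhead."""
    return round(sample_rate_hz * bits_per_sample * seconds / 8)


def fits_in_flash(size_bytes, flash_bytes=FLASH_BYTES):
    """True if the audio alone fits; the playback code still needs room too."""
    return size_bytes <= flash_bytes


if __name__ == "__main__":
    for name, rate, bits in FORMATS:
        size = clip_bytes(rate, bits)
        verdict = "fits" if fits_in_flash(size) else "too big"
        print(f"{name:22s} {size / 1000:7.1f} KB  {verdict}")
```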

I considered a few encoding options for compressing the audio.

8-bit PCM: Simply using 8-bit samples at 8 kHz cuts the size in half (to ~48 KB for 6 s), but that’s still about 3× too large for our flash.
4-bit ADPCM: Adaptive Differential PCM is a simple lossy compression scheme that could quarter the size relative to 16-bit PCM. In theory 6 seconds would be ~24 KB, which is much closer to fitting but still about 1.5× too big.
“Quite OK Audio” (QOA): This is a nice codec that packs audio into about 3.2 bits per sample (roughly 1/5 the size of 16-bit PCM). That works out to about 19 KB for our clip, which is still too big.
2-bit ADPCM: Going even further with ADPCM, using only 2 bits per sample gives 4:1 compression relative to 8-bit audio. That’s a 75% storage saving.

2-bit ADPCM is the clear winner here, provided the audio quality is acceptable. Our 6-second clip shrinks to under 12 KB, which fits comfortably in flash with room left for code. The decoder for 2-bit ADPCM is also very lightweight: my implementation compiled to just over 1 KB of code (1340 bytes). The result is definitely low quality, but it actually sounds surprisingly OK.
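As a rough check on the flash budget, the sketch below estimates how long a 2-bit ADPCM clip could be if the only other thing in flash were the 1340 bytes of code. That is an optimistic assumption, because any other firmware code would eat into the total. The function name is illustrative.

```python
# Sketch: upper bound on clip length. Assumes the 1340 bytes quoted in the
# article is the only non-audio content in flash, which is optimistic.

FLASH_BYTES = 16 * 1024
CODE_BYTES = 1340
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 2


def max_clip_seconds(flash_bytes, code_bytes, sample_rate_hz, bits_per_sample):
    """Seconds of audio that fit in the flash left over after the code."""
    audio_bytes = flash_bytes - code_bytes
    bytes_per_second = sample_rate_hz * bits_per_sample / 8
    return audio_bytes / bytes_per_second


if __name__ == "__main__":
    seconds = max_clip_seconds(
        FLASH_BYTES, CODE_BYTES, SAMPLE_RATE_HZ, BITS_PER_SAMPLE
    )
    print(f"Room for about {seconds:.1f} s of 2-bit ADPCM audio")
```

At 8 kHz and 2 bits per sample the stream uses 2000 bytes per second, so the roughly 15 KB left over gives a little over 7 seconds.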

How does 2-bit ADPCM work?

It’s actually a very simple algorithm. Both the encoder and decoder maintain a predicted signal value and a step size index into a predefined table. Each 2-bit code tells the decoder how to adjust the current prediction and the step size index. In essence, we’re coding the difference between the real audio and our prediction, with only four possible levels (since 2 bits gives 4 values). After each sample, the algorithm adapts: if the prediction error was large, we move to a bigger step size (to allow larger changes); if the error was small, we use a smaller step size for finer resolution. This adaptive step is what makes it ADPCM (Adaptive Differential PCM).

Our codes are as follows:

00 (0): Go down by 1 step - subtract the step size from our current prediction
01 (1): Go up by 1 step - add the step size to our current prediction
10 (2): Go down by 2 steps - subtract 2 × the step size from our current prediction
11 (3): Go up by 2 steps - add 2 × the step size to our current prediction
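The four codes above map onto a very small encoder and decoder. The following Python sketch implements that mapping. The step table, the index adjustments, and the initial state are all assumptions made for illustration, since the article does not list the actual values used in the firmware.

```python
import math

# Sketch of a 2-bit ADPCM codec using the code meanings listed above.
# STEP_TABLE, INDEX_ADJUST and the initial state are assumptions, not the
# values used by the author's firmware.

STEP_TABLE = [4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128, 192,
              256, 384, 512, 768, 1024, 1536, 2048]
INDEX_ADJUST = [-1, -1, 2, 2]   # 1-step codes shrink the step, 2-step codes grow it
SAMPLE_MIN, SAMPLE_MAX = -32768, 32767


def apply_code(code, predicted, index):
    """Apply one 2-bit code; return the new (prediction, step index)."""
    step = STEP_TABLE[index]
    magnitude = 2 * step if code & 0b10 else step       # high bit: 1 or 2 steps
    delta = magnitude if code & 0b01 else -magnitude    # low bit: up or down
    predicted = max(SAMPLE_MIN, min(SAMPLE_MAX, predicted + delta))
    index = max(0, min(len(STEP_TABLE) - 1, index + INDEX_ADJUST[code]))
    return predicted, index


def encode(samples):
    """Pick, for each sample, the code whose prediction lands closest to it."""
    predicted, index = 0, 0
    codes = []
    for sample in samples:
        code = min(
            range(4),
            key=lambda c: abs(sample - apply_code(c, predicted, index)[0]),
        )
        predicted, index = apply_code(code, predicted, index)
        codes.append(code)
    return codes


def decode(codes):
    """Replay the codes from the same initial state the encoder used."""
    predicted, index = 0, 0
    samples = []
    for code in codes:
        predicted, index = apply_code(code, predicted, index)
        samples.append(predicted)
    return samples


if __name__ == "__main__":
    wave = [round(8000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(80)]
    decoded = decode(encode(wave))
    worst = max(abs(a - b) for a, b in zip(wave, decoded))
    print(f"worst-case error over {len(wave)} samples: {worst}")
```

Because the encoder updates its state by running the same `apply_code` step as the decoder, the two can never drift apart. The decoder only needs the code stream and the agreed starting state.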

2-bit ADPCM Compression

Even with this very high level of compression, the predicted waveform manages to track the original audio surprisingly well. The above graph shows a small snippet of the audio: the blue line is the original waveform and the yellow line is the ADPCM decoder’s output.

They’re not identical (and we wouldn’t expect them to be), but the general shape is preserved. When you play it back through the little speaker, it’s recognizable and surprisingly good.

To make my life easier, I built a quick conversion tool to encode WAV files into this 2-bit ADPCM format. The tool lets me drag and drop a WAV file, and it generates the data files that can be dropped into the firmware code.
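The last stage of such a converter might look something like the sketch below: pack four 2-bit codes into each byte and emit a C array for the firmware. The bit order (first code in the lowest bits), the array name, and the output layout are assumptions; the author's tool may do this differently.

```python
# Sketch: pack 2-bit codes into bytes and format them as a C array.
# Bit order, array name and layout are assumptions, not the tool's real output.


def pack_codes(codes):
    """Pack four 2-bit codes per byte, first code in the lowest two bits."""
    packed = bytearray()
    for start in range(0, len(codes), 4):
        byte = 0
        for slot, code in enumerate(codes[start:start + 4]):
            if not 0 <= code <= 3:
                raise ValueError("codes must be in 0..3")
            byte |= code << (2 * slot)
        packed.append(byte)   # a short final group is padded with code 0
    return bytes(packed)


def to_c_array(data, name="adpcm_data", per_row=12):
    """Render bytes as a C array definition that firmware could include."""
    rows = []
    for start in range(0, len(data), per_row):
        row = ", ".join(f"0x{b:02x}" for b in data[start:start + per_row])
        rows.append("    " + row + ",")
    body = "\n".join(rows)
    return f"const unsigned char {name}[{len(data)}] = {{\n{body}\n}};\n"


if __name__ == "__main__":
    print(to_c_array(pack_codes([1, 3, 0, 2, 1]), name="clip"))
```

Since padding with code 0 would decode as a real "down one step" sample, a tool like this would also need to store the true sample count alongside the data.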

2-bit ADPCM Buzzer Studio

LPC Speech Synthesis

Six seconds of audio is cool, but what about longer phrases or even arbitrary speech? Storing anything much longer with raw or ADPCM audio would quickly fill the 16K of flash.

For my second experiment, I tried something different: instead of recorded waveform audio, I used an old-school speech synthesis approach. This leverages the fact that spoken language can be encoded very compactly by modeling the human voice, rather than storing the raw sound. Specifically, I integrated a library called Talkie.

Talkie is a software implementation of the Texas Instruments LPC speech synthesis architecture from the late 1970s. This was implemented in a variety of chips, most commonly the TMS5220 and TMS5100 speech chips.
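At the heart of LPC synthesis is a small recursive filter that shapes a simple excitation signal (a pulse train for voiced sounds, noise for unvoiced ones) into something speech-like. The sketch below is a generic floating-point all-pole lattice filter with made-up coefficients. It illustrates the general technique only and is not Talkie's code, which works with fixed-point arithmetic and the chips' own coefficient tables.

```python
# Sketch of LPC-style synthesis: excitation -> all-pole lattice filter.
# Generic textbook structure with invented coefficients; not Talkie's code.


def pulse_train(period, length):
    """Voiced excitation: a unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def lattice_synthesize(excitation, reflection):
    """Filter the excitation through an all-pole lattice.

    reflection[i] is the reflection coefficient of stage i + 1; each value
    must have magnitude below 1 for the filter to be stable.
    """
    order = len(reflection)
    backward = [0.0] * (order + 1)   # backward signals from the previous sample
    output = []
    for sample in excitation:
        forward = sample
        # Walk from the outermost stage inwards, reusing last sample's state.
        for stage in range(order, 0, -1):
            k = reflection[stage - 1]
            forward -= k * backward[stage - 1]
            backward[stage] = backward[stage - 1] + k * forward
        backward[0] = forward
        output.append(forward)
    return output


if __name__ == "__main__":
    voiced = pulse_train(period=64, length=256)          # 125 Hz pitch at 8 kHz
    speechlike = lattice_synthesize(voiced, [-0.9, 0.5, -0.2])
    print(f"generated {len(speechlike)} samples")
```

Only the coefficients, pitch, and energy need to be stored for each short frame of speech, which is why this approach is so much more compact than storing waveform samples.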

TMS5220 and TMS5100 Variants

These were used in things like the original Speak & Spell, arcade games like early Star Wars, and speech add-ons for home computers (e.g. the BBC Micro).

Speak and Spell

The Talkie library (originally by Peter Knight, later added to by Adafruit) comes with a big set of examples and vocabulary. It’s also possible to extract examples from old ROMs from arcade games.

Each phrase or word only takes a few hundred bytes or even less, so you can fit quite a lot of speech into a few kilobytes of flash. The trade-off is that the voice has a very computer-esque timbre – think of the Speak & Spell’s voice. It’s clearly synthetic, but still understandable.

To say custom sentences not in the library, you either concatenate the available words/phonemes (which can be clunky), or you need to generate new LPC data. The original tools for this are a bit obscure: there’s BlueWizard (a classic Mac app) and PythonWizard (a command-line tool with a Tk GUI), both of which can analyze WAV files and produce LPC data.

I gave both a try with some success (and a few headaches setting them up). In the end, I cheated a bit and used an AI coding assistant to help me create a streamlined online tool for this.

The result is a little web app where I can upload a recording of, say, my own voice, and it outputs the LPC data. It even lets me play back the synthesized voice in-browser to check it.

LPC Encoder

So there we have it: our 10¢ microcontroller now has a voice! By using 2-bit ADPCM compression, we can store short audio clips (up to around 7 seconds) even in 16 KB of flash, and play them back via PWM with decent fidelity.

And with the Talkie LPC speech synthesis, we can make the device “speak” lots of words and phrases with only a tiny memory footprint.

If you want to hear it for yourself, check out the video demo linked at the top of this post. In the video, you’ll see (and hear) the WarGames clip and the Star Wars quotes running on the hardware. It’s honestly amazing what these cheap little MCUs can do. We’re really pushing the boundaries of cheap hardware here.

You can find all my code on GitHub in this repository.