花括号:Unix 与 C 语言的演变
Curly braces: An evolution of Unix and C

原始链接: https://thalia.dev/blog/unix-braces/

标志性的 Teletype Model 33 是早期 UNIX 系统主要使用的终端,但受限于 1963 年的 ASCII 标准,它缺乏 `{}` 大括号、`|` 等常用符号,甚至无法输入小写字母。为了绕过这些硬件限制,早期 UNIX 和 C 语言的开发经历了几个创造性的阶段。 在 C 语言最终定型之前,BCPL 及其后续版本 B 语言使用 `$(...$)` 来表示代码块。随着 UNIX 迁移到支持完整 ASCII 字符集的 Teletype Model 37,程序员得以使用现代语法。然而,为了保持与 Model 33 用户的兼容性,UNIX V4(1973 年)实现了一个透明终端驱动程序,将 `\(` 和 `\)` 序列自动转换为 `{` 和 `}`。 这些局限性塑造了 C 语言和 UNIX 的遗产。后来 C89 引入的三字母词(trigraphs,如 `??<`)和 C95 引入的双字母词(digraphs,如 `<%`),即便在 Model 33 被淘汰很久之后,依然为在受限硬件上编写 C 代码提供了标准化方案。归根结底,20 世纪 60 年代的机械限制在计算领域留下了永恒的印记,影响了从对小写字母的偏好到现代 C 语言语法风格设计的方方面面。

Hacker News 上的一场讨论探讨了编程语法的历史演变,重点关注了 1963 年 ASCII 标准和 33 型电传打字机等早期计算标准中缺少花括号(`{}`)及其他符号的情况。 评论者指出,这些硬件局限性影响了早期编程语言的设计。例如,Smalltalk 使用“向上箭头”(`↑`)和“向左箭头”(`←`)分别表示返回和赋值操作,因为这些字符在当时键盘上可用,后来才被插入符(`^`)和冒号加等号(`:=`)等现代约定所取代。另一位用户回忆起在旧语言中曾使用双括号(`[[ ]]`)作为花括号的替代方案,这凸显了在 Unix 和 C 语言的成型时期,有限的字符集如何迫使开发者做出创造性的语法选择。
相关文章

原文

19 May 2026

How were { } curly braces typed with a Teletype Model 33 on UNIX? These characters are especially important for C, but absent on this terminal. I was just asked a similar question and in response, this is a tour of the coevolution of UNIX and C, from this perspective, featuring “hello, world” through the ages.

This work is entirely my own (no AI) and the code samples are my construction. Sources for all inferences are cited.

ASCII 1963

The Teletype Model 33 famously couldn’t write lowercase letters. This teleprinter was designed around the first edition of the ASCII standard, ASA X3.4-1963, which hadn’t yet decided lowercase was worth adding. Some in the committee thought more control characters would be a better use of the limited encoding space. The standard soon evolved into its modern form, but the Model 33 was the first commercial use of ASCII and wildly popular, so its issues stuck.

In addition to missing lowercase, ASCII 1963 and the Model 33 lacked { } curly braces, | vertical bar, ` backtick, and ~ tilde, and they had up arrow instead of ^ caret and left arrow instead of _ underscore.

Trigraphs and digraphs

Curly braces are a prominent part of C syntax, used for blocks. For example:

int main(int argc, char *argv[]) {
    printf("hello, world!\n");
}

To support character sets without these characters, C89 invented trigraphs, so { could be written as ??< and } as ??>:

int main(int argc, char *argv[]) ??<
    printf("hello, world!\n");
??>

The trigraph ??/ for \ backslash can be used at the end of a line to produce a line continuation, which was lexical undefined behavior when within a universal character name. I encountered this case while writing a static analysis, but it was later fixed in C++26 .

Then, C95 introduced nicer-looking digraphs, so { can be written as <% and } as %>:

int main(int argc, char *argv[]) <%
    printf("hello, world!\n");
%>

But, trigraphs were only introduced after the Teletype Model 33 was obsolete. How did they write C code in the early ’70s?

Terminal drivers

Starting in UNIX V4 in November 1973, the teletype driver would translate between \( and { and between \) and }:

main(argc, argv)
char *argv[];
\(
        printf("hello, world!\n");
\)

This support was added sometime between the nsys kernel in August 1973 and the V4 manual in November 1973 . The Utah_v4 kernel (June 1974) and Dennis_v5 kernel (November 1974) have support, but nsys, a pre-release version of V4 before pipes were added back in, does not. The V2 and V3 kernels, which were written in assembly, did not survive, but the nsys kernel matches the V3 manual and the V1 kernel .

UNIX exposes devices through a common byte stream interface and this character translation is transparent to user space programs. Programs use the bytes for ASCII { } and the kernel translates them to \( \) on write to a Teletype Model 33, or in reverse on read.

This escaping evolved out of the need to delete characters sent by a terminal, since teleprinters can’t erase text that’s already been printed on paper. The scheme they used, inherited from Multics, was to process input by lines and interpret # “erase” as deleting the previous character and @ “kill” as clearing the current line. Either character can be escaped with backslash to get the literal character.

For example, this Utah_v4 session writes that program with a Teletype Model 33 and uses @ and # to fix a few mistakes:

% ed hello.c
?
a
main(argc, argv)
char *argv[];
\(
        printf("hallo, welt@    printf("hello #, world!\n");
\)
.
w
63
q
% cc hello.c
% a.out
hello, world!

If you signed in with another terminal, you would see:

% cat hello.c
main(argc, argv)
char *argv[];
{
        printf("hello, world!\n");
}

What about before UNIX V4? From its start in June 1972 , C used only braces. You just needed to use a terminal that could produce braces.

Early C structs

Interestingly, when structs were first added to C in December 1972 , they used parentheses instead of braces! For a time around nsys in August 1973, you could even write structs with either parentheses or braces. It was fully switched to only the modern syntax at the latest by June 1974 . This definition in nsys uses both styles :

struct user {
        int     u_rsav[2];              /* must be first */
        /* ... */
        struct  (
                int     u_ino;
                char    u_name[DIRSIZ];
        ) u_dent;
        /* ... */
} u;    /* u = 140000 */

But that’s just one language feature; blocks still required braces.

B

Before C was B, an interpreted language made by Ken Thompson for UNIX. B had no types—every value was a machine word—, perfect for the PDP-7 that UNIX started on, with 18-bit words.

A descendant of this early B remained in use for the Honeywell 6070, far after UNIX B was replaced with C. This machine has 36-bit words, so four characters fit into a word. The 1973 B language tutorial for the H6070 had the first-ever “hello, world” program, also using curly braces:

main( ) {
 extrn a, b, c;
 putchar(a); putchar(b); putchar(c); putchar('!*n');
}

a 'hell';
b 'o, w';
c 'orld';

B to NB

But this everything-is-a-word strategy fails on the PDP-11, which UNIX very quickly transitioned to. This machine has 16-bit words and 8-bit addressing. Since addresses could then be misaligned, B on the PDP-11 needed a hack where globals that weren’t word-aligned by the linker would be patched at runtime .

So, around May 1972 , Dennis Ritchie added char and [] pointer types to the language, calling it “new B”. Note the use of only [], instead of the later *:

main(argc, argv)
char argv[][]; {
        printf("hello, world!\n");
}

NB to C

He then turned it into a compiler that produces machine code instead of the inefficient threaded code of B, and renamed it to C. The syntax remained the same:

main(argc, argv)
char argv[][];
{
        printf("hello, world!\n");
}

However, the first C compiler in June 1972 had dropped the $( $) escapes for braces from B and support never returned .

But it still retained much of the semantics of B. Functions, arrays, and even labels were indirected via a writable pointer , leading to quirks like reassignable labels :

        goto init;
init:
        ouptr = oubuf;
        init = init1;
init1:

This indirection was removed when structs were added, which created a distinction between pointers and arrays. Pointers are reassignable and arrays are not. As such, * was introduced, at the latest by August 1973 :

main(argc, argv)
char *argv[];
{
        printf("hello, world!\n");
}

With a compiler and structs, C was fast and expressive enough to rewrite the kernel in C, culminating in the release of UNIX V4.

PDP-11 B

Backing up from C to B, we can finally use a Teletype Model 33 again! B on the PDP-11 supported braces, in addition to the following escapes :

  • *0: NUL
  • *e: End of file
  • *(: {
  • *): }
  • *t: Tab
  • **: *
  • *': '
  • *": "
  • *n: Line feed

UNIX was ported to the PDP-11 in February 1971 . Some time between then and the B reference manual in January 1972 , the use of { } curly braces was invented, setting it apart from its predecessors. At the time of the draft mid-1971 manual , they were clearly using the Teletype Model 37, a newer teleprinter that supported braces . If PDP-11 B did not support { } from the start, it gained them very shortly thereafter.

The runtime library resembles later C:

main() $(
        printf("hello, world!*n");
$)

Unfortunately, no B source code survived from this era. However, the compiled PDP-11 B runtime from June 1972 survived and has been disassembled, and a compiler that produces this output has been reconstructed .

PDP-7 B

And before the PDP-11, PDP-7 B also supported $( and $) instead of, or in addition to, braces. But it used two-char printing for the 18-bit word size:

main() $(
   write('he'); write('ll'); write('o,');
   write(' w'); write('or'); write('ld'); write(041012);
  $)

Only two B programs from the PDP-7 era of UNIX have survived, both showing this syntax :

main $(
   auto ch;
   extrn read, write;

   goto loop;
   while (ch != 04)
      $( if (ch > 0100 & ch < 0133)
            ch = ch + 040;
      if (ch==015) goto loop;
      if (ch==014) goto loop;
      if (ch==011)
       $( ch = 040040;
          write(040040);
          write(040040);
          $)
      write(ch);
   loop:
      ch = read()&0177;
      $)
  $)

This syntax is directly borrowed from its predecessor, BCPL.

BCPL to B

B was Ken Thompson’s version of BCPL, simplified to its core, as he so often did. The name too was a contraction of either BCPL or Bon, an unrelated language he created during his Multics days .

An example in the style of the 1967 BCPL manual, reflecting the state of the language at the time B forked from it :

let Start() be
       $( Writech(MONITOR,'h'); Writech(MONITOR,'e'); Writech(MONITOR,'l')
          Writech(MONITOR,'l'); Writech(MONITOR,'o'); Writech(MONITOR,',')
          Writech(MONITOR,' '); Writech(MONITOR,'w'); Writech(MONITOR,'o')
          Writech(MONITOR,'r'); Writech(MONITOR,'l'); Writech(MONITOR,'d')
          Writech(MONITOR,'!'); Writech(MONITOR,'*n')  $)

Although this manual uses mixed letter case and rich symbols, the canonical style was uppercase . The 1967 manual does not specify an entrypoint, so I adapted Start from START in the 1979 BCPL book .

BCPL only gained { and } for blocks later, in imitation of C .

Teletype Model 37

Even before UNIX V4 extended the terminal driver to replace \( and \) for the Teletype Model 33, UNIX programmers had stopped writing B code with $( and $). They had moved on from the Model 33 to the Teletype Model 37, its successor.

The Model 37 was 50% faster and supported the full, modern ASCII character set. They were no longer limited to the ASCII 1963 subset.

It was the most advanced electromechanical teleprinter ever made, i.e., it operates purely mechanically without digital logic, but was soon obsolesced by video terminals.

It had many escape sequences: black and red colors, half-forward and half-reverse line feeds (useful for sub- and superscripts), reverse line feed, horizontal and vertical tab setting, and half- and full-duplex . Last year, Brian Kernighan recounted a humorous use of one of these features in the UNIX group: Robert Morris Sr. sent an email to Joe Ossanna which contained a hundred reverse line feeds, making it suck the long fan-fold paper out the back of the Model 37 and drop it on the floor .

Terminals on UNIX

UNIX gained Teletype Model 37 support early on and it quickly became preferred.

PDP-7 UNIX supported only the Model 33 . But, already before the UNIX V1 manual was finalized, a draft mid-1971 manual implies that many UNIX users were already using the Teletype Model 37 . The V1 kernel supported the Model 37 . login from V2 at the latest to V5 would cycle through speeds and login messages for different terminals, supporting the TermiNet 300 and Teletype Model 37 . It grew in V6, once getty was rewritten in C, to support many more terminals and further in V7, but still supported the Model 37 .

No version of the assembly kernel uses braces in its source code, even V1, once development used the Model 37.

Modern implications

The character set limitations of the Teletype Model 33 have had lasting influence on modern computing.

UNIX uses lowercase almost exclusively. This is still Ken’s writing style .

PDP-7 UNIX sources don’t contain a single underscore (it would have been on the Model 33). Later versions use it sparingly. The core of libc still uses flatcase naming style.

Identifiers in C were very short to fit within the 7 or 8-byte limit. Many such functions are still in libc. Though, that’s due to the assembler—a topic worthy of another post, once I finish my assembler.

Design decisions from 1963 still affect us today, 63 years later!


I collect teletypes and am seeking a Teletype Model 37 . If you have any leads on one, please get in touch! And, I hope to eventually acquire a PDP-11 too.

Appendix: hello, world

All the “hello, world” snippets, together:

// BCPL, circa 1967
let Start() be
       $( Writech(MONITOR,'h'); Writech(MONITOR,'e'); Writech(MONITOR,'l')
          Writech(MONITOR,'l'); Writech(MONITOR,'o'); Writech(MONITOR,',')
          Writech(MONITOR,' '); Writech(MONITOR,'w'); Writech(MONITOR,'o')
          Writech(MONITOR,'r'); Writech(MONITOR,'l'); Writech(MONITOR,'d')
          Writech(MONITOR,'!'); Writech(MONITOR,'*n')  $)

/* PDP-7 B, 1969 */
main() $(
   write('he'); write('ll'); write('o,');
   write(' w'); write('or'); write('ld'); write(041012);
  $)

/* PDP-11 B, 1971 */
main() $(
        printf("hello, world!*n");
$)

/* NB, May 1972 */
main(argc, argv)
char argv[][]; {
        printf("hello, world!\n");
}

/* Early C, June 1972 */
main(argc, argv)
char argv[][];
{
        printf("hello, world!\n");
}

/* C, August 1973 at the latest */
main(argc, argv)
char *argv[];
{
        printf("hello, world!\n");
}

/* C and UNIX V4, November 1973 */
main(argc, argv)
char *argv[];
\(
        printf("hello, world!\n");
\)

/* C89, 1989 */
int main(int argc, char *argv[]) ??<
    printf("hello, world!\n");
??>

/* C95, 1995 */
int main(int argc, char *argv[]) <%
    printf("hello, world!\n");
%>

// C99, 2000
int main(int argc, char *argv[]) <%
    printf("hello, world!\n");
%>

References

联系我们 contact @ memedata.com