GPascal – A Blast from the Past (2011)

Original link: https://www.gammon.com.au/forum/?id=11203

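This text describes the inner workings of an early Pascal compiler designed for a limited-memory environment. The key to fitting it in memory is tokenisation. Error messages and keywords are represented by single-byte tokens that are expanded when displayed. The message tokens lie in a specific hex range (B0-DB) to avoid clashing with the source-code tokens. The compiler converts source code into tokens (reserved words, identifiers, and so on) so that comparisons during compilation are efficient. These tokens are represented by hex values. The text also contains full lists of the message tokens, source tokens, and P-codes. P-codes are pseudo machine instructions that form the intermediate representation of the compiled code. The document gives the hex code and function of each instruction. Stack operations are described in terms of `sp` and `sp-1`.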


Original text
Thanks, Fiendish!

To explain the source (http://www.gammon.com.au/GPascal/source/) a bit, a lot of work was put into fitting it into the available memory. One approach I used was to tokenise things like error messages.

Message tokens

This was done by putting bytes with the high-order bit set inside messages, and then expanding them out at display time. Here is a table of the various tokens that I extracted, in hex (from PAS1.ASM lines 1754+):


B0 = P-codes
B1 = full
B2 = Constant
B3 = Identifier
B4 = expected
B5 = missing
B6 = Illegal
B7 = Incorrect
B8 = string
BA = compiler
BB = literal
BC = mismatch
BD = Error
BE = zero
BF = source file
C0 = of
C1 = or
C2 = to
C3 = ended at 
C4 = Symbol
C6 = Stack
C7 = Instruction
C8 = table
C9 = Type
CA = list
CC = Number
CD = Line
CE = Gambit
CF = Games
D2 = Version 3.1 Ser# 5001
D3 = Copyright 1983
D4 = <C>ompile
D5 = <S>yntax
D6 = Written by Nick Gammon 
D7 = <Q>uit
D8 = Range
D9 = Parameter
DA = <E>dit,
DB = <
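
To illustrate the idea, here is a minimal Python sketch of display-time expansion. It is not the original 6502 routine. The function name `expand_message` and the byte layout of the packed message (tokens separated by literal spaces) are assumptions made for the example, and only a few entries from the table are included.

```python
# Hypothetical sketch: expand single-byte message tokens at display time.
# Only a few entries from the table above are included.
MESSAGE_TOKENS = {
    0xB1: "full",
    0xB4: "expected",
    0xC4: "Symbol",
    0xC8: "table",
}

def expand_message(raw: bytes) -> str:
    """Replace each known token byte with its word; pass other bytes through as characters."""
    parts = []
    for b in raw:
        if b in MESSAGE_TOKENS:
            parts.append(MESSAGE_TOKENS[b])
        else:
            parts.append(chr(b))
    return "".join(parts)

# Assumed packing of error 37: three tokens separated by spaces (5 bytes).
packed = bytes([0xC4, 0x20, 0xC8, 0x20, 0xB1])
print(expand_message(packed))  # Symbol table full
```

In this sketch the 17-character message takes 5 bytes, which is the kind of saving the tokens were meant to provide.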

Error messages

The error messages, in decimal, once the tokens are expanded, are (from PAS1.ASM lines 1222+):


 1:  Memory full 
 2:  Constant expected     
 3:  = expected 
 4:  Identifier expected     
 5:  , or : expected  
 6:  bug
 7:  *) expected 
 8:  Incorrect string     
 9:  . expected 
10:  ; expected 
11:  Undeclared Identifier 
12:  Illegal Identifier     
13:  := expected 
14:  literal string of zero length 
15:  compiler limits exceeded 
16:  THEN expected 
17:  ; or END expected  
18:  DO expected 
19:  Incorrect Symbol  
20:  bug   
21:  Use of procedure Identifier in expression 
22:  ) expected 
23:  Illegal factor 
24:  Type mismatch     
25:  BEGIN expected 
26:  "of " expected 
27:  Stack full     
28:  TO or DOWNTO expected  
29:  string literal too big 
30:  Number out of Range  
31:  ( expected 
32:  , expected 
33:  [ expected 
34:  ] expected 
35:  Parameters mismatched
36:  Data Type not recognised 
37:  Symbol table full   
38:  Duplicate Identifier 

Source tokens

When the source was being processed it was turned into "tokens" (e.g. numbers, symbols, reserved words, identifiers, etc.).

This made it easy to do comparisons in the compiler proper, because rather than having to do string compares, you simply checked a single byte. The source tokens, in hex, are (from PAS1.ASM line 559+):


81 = get
82 = const
83 = var
84 = array
85 = of
86 = procedure
87 = function
88 = begin
89 = end
8A = or
8B = div
8C = mod
8D = and
8E = shl
8F = shr
90 = not
91 = mem
92 = if
93 = then
94 = else
95 = case
96 = while
97 = do
98 = repeat
99 = until
9A = for
9B = to
9C = downto
9D = write
9E = read
9F = call
A1 = char
A2 = memc
A3 = cursor
A4 = xor
A5 = definesprite
A6 = plot
A7 = getkey
A8 = clear
A9 = address
AA = wait
AB = chr
AC = hex
AD = spritefreeze
AE = close
AF = put
DF = sprite
E0 = positionsprite
E1 = voice
E2 = graphics
E3 = sound
E4 = setclock
E5 = scroll
E6 = spritecollide
E7 = groundcollide
E8 = cursorx
E9 = cursory
EA = clock
EB = paddle
EC = spritex
ED = joystick
EE = spritey
EF = random
F0 = envelope
F1 = scrollx
F2 = scrolly
F3 = spritestatus
F4 = movesprite
F5 = stopsprite
F6 = startsprite
F7 = animatesprite
F8 = abs
F9 = invalid
FA = load
FB = save
FC = open
FD = freezestatus
FE = integer
FF = writeln

Notice that the "message tokens" in the range 0xB0 to 0xDB are not in the list. This is so that the output routine can convert tokens back to text, whether they are messages or reserved words, without clashes.
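
As a rough illustration of why the two ranges are kept apart, the Python sketch below maps a few reserved words to their token bytes and classifies a byte by range. The helper names `token_for` and `kind_of`, the case-insensitive lookup, and the treatment of every other high-bit byte as a reserved word are assumptions for the example, not taken from PAS1.ASM.

```python
# Hypothetical sketch: map reserved words to single-byte source tokens.
# Only a subset of the table above is included.
SOURCE_TOKENS = {
    "begin": 0x88, "end": 0x89, "if": 0x92, "then": 0x93,
    "repeat": 0x98, "until": 0x99, "sprite": 0xDF, "writeln": 0xFF,
}
MESSAGE_RANGE = range(0xB0, 0xDC)  # 0xB0..0xDB inclusive

def token_for(word: str):
    """Return the token byte for a reserved word, or None for anything else."""
    return SOURCE_TOKENS.get(word.lower())  # case folding is an assumption

def kind_of(byte: int) -> str:
    """Classify a byte the way an output routine could before expanding it."""
    if byte in MESSAGE_RANGE:
        return "message"
    if byte >= 0x80:
        return "reserved word"
    return "plain character"

# No source token lands in the message range, so classification is unambiguous.
assert all(code not in MESSAGE_RANGE for code in SOURCE_TOKENS.values())
print(hex(token_for("UNTIL")), kind_of(0x99), kind_of(0xBD))
```

Once a word has been reduced to a byte, the compiler can test for it with a single compare, which is what the `CMP` against a token value does in the assembly.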

This makes the snippet of source in the earlier post more understandable:


                1800 * REPEAT
                1801 *
9734: 20 02 90  1802 REPEAT   JSR  PSHPCODE   
9737: 20 49 80  1803 REP1     JSR  GTOKEN     
973A: 20 63 93  1804          JSR  STMNT      
973D: A5 16     1805          LDA  TOKEN      
973F: C9 3B     1806          CMP  #';'       
9741: F0 F4     1807          BEQ  REP1       
9743: A9 99     1808          LDA  #$99       
9745: A2 0A     1809          LDX  #10        
9747: 20 34 80  1810          JSR  CHKTKN     
974A: 20 40 90  1811          JSR  GETEXPR    
974D: 20 55 80  1812          JSR  PULWRK     
9750: 20 51 90  1813          JSR  WRK:OPND   
9753: A9 3D     1814          LDA  #61        
9755: 4C 88 80  1815          JMP  GENRJMP    

The code calls GTOKEN (get token) and processes a statement. Then it checks if we got a ";" token, and if so, gets another statement. When the statements separated by semicolons run out, it checks for token 0x99 (which is "until" from the above table) and if it doesn't get it, outputs error 10 (which is "; expected" from the earlier table).
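
The same control flow can be sketched in Python. This is a loose paraphrase of the listing, not a translation. The `Parser` class, its stand-in `statement` and `expression` methods, and the `(opcode, operand)` output format are all hypothetical. It also assumes that PSHPCODE remembers the current P-code address and that GENRJMP emits a jump back to it. The one detail taken directly from the tables is that decimal 61 is 0x3D, the JMZ ("Jump if (sp) zero") P-code.

```python
# Hypothetical sketch of the REPEAT statement logic, not the original routine.
UNTIL, SEMI = 0x99, ord(";")
JMZ = 0x3D  # decimal 61, "Jump if (sp) zero" in the P-code table

class CompileError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = -1
        self.token = None
        self.code = []  # emitted (opcode, operand) pairs

    def get_token(self):
        self.pos += 1
        self.token = self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def statement(self):
        # Stand-in: every statement is a single token.
        self.code.append(("STMT", self.token))
        self.get_token()

    def expression(self):
        # Stand-in: every expression is a single token.
        self.code.append(("EXPR", self.token))
        self.get_token()

    def repeat(self):
        loop_start = len(self.code)  # remember where the loop body begins
        while True:
            self.get_token()
            self.statement()
            if self.token != SEMI:
                break
        if self.token != UNTIL:
            raise CompileError(10)  # error 10: "; expected"
        self.get_token()
        self.expression()
        self.code.append((JMZ, loop_start))  # loop again while the condition is zero

p = Parser([0x98, "s1", SEMI, "s2", UNTIL, "cond"])
p.get_token()  # current token is now REPEAT (0x98)
p.repeat()
print(p.code)
```

Because the loop test comes after the body, a single conditional jump back to the remembered start is enough to implement REPEAT ... UNTIL.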

P-codes

These are the meanings of the P-codes (pseudo machine codes):


Code Function   Description
---- ---------- ------------------------------------

00 = LIT     	Load constant
01 = DEF:SPRT	DEFINESPRITE
02 = NEG     	Negate (sp)
03 = HPLOT   	PLOT
04 = ADD     	Add (sp) to (sp - 1)
05 = TOHPLOT 	PLOT (not used)
06 = SUB     	Subtract (sp) from (sp - 1)
07 = GETKEY  	GETKEY
08 = MUL     	Multiply (sp) * (sp - 1)
09 = CLEAR   	CLEAR
0A = DIV     	Divide (sp - 1) / (sp)
0B = MOD     	Modulus (sp - 1) MOD (sp)
0C = ADRNN   	Address of integer
0D = ADRNC   	Address of character
0E = ADRAN   	Address of integer array
0F = ADRAC   	Address of character array
10 = EQL     	Test (sp - 1) == (sp)
11 = FINISHD 	Stop run (end program)
12 = NEQ     	Test (sp - 1) != (sp)
13 = CUR     	Cursor position
14 = LSS     	Test (sp - 1) < (sp)
15 = FREEZE:S 	SPRITEFREEZE
16 = GEQ     	Test (sp - 1) >= (sp)
17 = INH     	Input hex number
18 = GTR     	Test (sp - 1) > (sp)
19 = LEQ     	Test (sp - 1) <= (sp)
1A = ORR     	OR  (sp - 1) | (sp)
1B = AND     	AND (sp - 1) & (sp)
1C = INP     	Input number
1D = INPC    	Input character
1E = OUT     	Output number
1F = OUTC    	Output character
20 = EOR     	Not (sp) (logical negate)
21 = OUH     	Output hex number
22 = SHL     	Shift left (sp) bits
23 = OUS     	Output string
24 = SHR     	Shift right (sp) bits
25 = INS     	Input string into array
26 = INC     	Increment (sp) by 1
27 = CLL     	Relative procedure/function call
28 = DEC     	Decrement (sp) by 1
29 = RTN     	Procedure/function return
2A = MOV     	Copy (sp) to (sp + 1)
2B = CLA     	Call absolute address
2C = LOD     	Load integer onto stack
2D = LODC    	Load character onto stack
2E = LDA     	Load absolute address integer
2F = LDAC    	Load absolute address character
30 = LDI     	Load integer indexed
31 = LDIC    	Load character indexed
32 = STO     	Store integer
33 = STOC    	Store character
34 = STA     	Store integer absolute address
35 = STAC    	Store character absolute address
36 = STI     	Store integer indexed
37 = STIC    	Store character indexed
38 = ABSCLL  	Absolute procedure/function call
39 = WAIT    	WAIT
3A = XOR     	XOR (sp - 1) ^ (sp)
3B = INT     	Increment stack pointer
3C = JMP     	Jump unconditionally
3D = JMZ     	Jump if (sp) zero
3E = JM1     	Jump if (sp) not zero
3F = SPRITE  	SPRITE
40 = MVE:SPRT 	POSITIONSPRITE
41 = VOICE   	VOICE
42 = GRAPHICS 	GRAPHICS
43 = SOUND   	SOUND
44 = SET:CLK 	SETCLOCK
45 = SCROLL  	SCROLL
46 = SP:COLL 	SPRITECOLLIDE
47 = BK:COLL 	GROUNDCOLLIDE
48 = CURSORX 	CURSORX
49 = CURSORY 	CURSORY
4A = CLOCK   	CLOCK
4B = PADDLE  	PADDLE
4C = SPRT:X  	SPRITEX
4D = JOY     	JOYSTICK
4E = SPRT:Y  	SPRITEY
4F = OSC3    	RANDOM
50 = VOICE3  	ENVELOPE
51 = SCROLLX 	SCROLLX
52 = SCROLLY 	SCROLLY
53 = SPT:STAT 	SPRITESTATUS
54 = MOV:SPT 	MOVESPRITE
55 = STOP:SPT 	STOPSPRITE
56 = STRT:SPT 	STARTSPRITE
57 = ANM:SPT 	ANIMATESPRITE
58 = ABS     	ABS (absolute value of (sp))
59 = INVALID 	INVALID
5A = LOADIT  	LOAD
5B = SAVEIT  	SAVE
5C = X:OPEN  	OPEN
5D = FR:STAT 	FREEZESTATUS
5E = OUTCR   	Output a carriage-return
5F = X:CLOSE 	CLOSE
60 = X:GET   	GET
61 = X:PUT   	PUT

Operations that mention (sp) refer to "whatever value is on the top of the stack", and (sp - 1) is the second value from the top.

Thus, for example, when you add, it pulls the top value from the stack, and then the second-top value, adds them, and pushes the result onto the stack.
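
A small Python sketch of that stack discipline, covering only a handful of the arithmetic P-codes. The `run` function and the `(opcode, operand)` program format are assumptions made for the example. The real interpreter reads operands from the P-code byte stream, and its encoding is not described here.

```python
# Hypothetical sketch of a stack evaluator for a few P-codes from the table.
LIT, NEG, ADD, SUB, MUL = 0x00, 0x02, 0x04, 0x06, 0x08

def run(program):
    """Execute (opcode, operand) pairs and return the final stack."""
    stack = []
    for op, arg in program:
        if op == LIT:
            stack.append(arg)          # load constant
        elif op == NEG:
            stack.append(-stack.pop())  # negate (sp)
        elif op in (ADD, SUB, MUL):
            top = stack.pop()     # (sp)
            second = stack.pop()  # (sp - 1)
            if op == ADD:
                stack.append(second + top)
            elif op == SUB:
                stack.append(second - top)
            else:
                stack.append(second * top)
        else:
            raise ValueError(f"P-code {op:#04x} not handled in this sketch")
    return stack

# (2 + 3) * 4
print(run([(LIT, 2), (LIT, 3), (ADD, None), (LIT, 4), (MUL, None)]))  # [20]
# 10 - 3: (sp) is subtracted from (sp - 1)
print(run([(LIT, 10), (LIT, 3), (SUB, None)]))  # [7]
```

Operand order matters only for the non-commutative operations such as SUB, which is why the table spells out which of (sp) and (sp - 1) is on each side.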
