Hunting a 34 year old pointer bug in EtherSlip

Original link: https://www.brutman.com/Adventures_In_Code/EtherSlip_ARP/EtherSlip_ARP.html

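## DOS networking and a 34-year-old bug

This article details a decades-old bug discovered while setting up a SLIP (Serial Line Internet Protocol) connection, which carries TCP/IP over a serial port, between a DOS PC and Linux. The author uses EtherSLIP, a packet driver that emulates an Ethernet connection, so that the existing mTCP programs can be used unmodified.

Testing turned up a "NULL assignment detected" error, which the Open Watcom runtime reports when the first 32 bytes of the data segment have been altered. The investigation traced the problem to a faulty memory copy in EtherSLIP's ARP (Address Resolution Protocol) handling. EtherSLIP simulates ARP responses, but a coding error copied data to the wrong place and corrupted memory in the data segment.

The bug stems from a misplaced segment register operation while the ARP response is being built, which causes data to be written to the wrong memory location. It stayed hidden because of factors such as the small memory model and the fact that typical TCP/IP stacks do not check the Ethernet header of an ARP response.

The fix is to remove the erroneous segment register move. The author stresses the importance of taking every warning seriously, notes the surprising longevity of bugs in old systems, and thanks the EtherSLIP developers for code that is still useful 34 years later.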


Original article

Posted: 2026-04-19
Tags: DOS, Networking, Segmented memory is hard

A few weeks ago I was revisiting my instructions for running a SLIP connection between a DOS PC and Linux. If you are not familiar with SLIP it stands for Serial Line Internet Protocol, and it lets you run TCP/IP over a PC serial port. TCP/IP is much faster over Ethernet, but a serial port can work too.

There are several packet drivers for DOS that let you make SLIP connections. One that I use often is "EtherSLIP" which is handy because it emulates an Ethernet packet driver but it is really just SLIP over a serial port. The emulation allows you to use programs designed for Ethernet packet drivers unmodified; otherwise, you'd have to run programs that are designed specifically for SLIP packet drivers. All of the mTCP programs expect Ethernet, but they don't actually know what is happening "under the covers" so any packet driver that emulates Ethernet works too. (Besides EtherSLIP there is also a Token Ring packet driver that emulates Ethernet.) EtherSlip is included in the Crynwr packet driver collection, which covers most classic ISA Ethernet cards.

I used Telnet to do my testing and there was something wrong with my cabling; it was slow and dropping packets like crazy. (It turned out to be a hardware problem.) When Telnet exited it gave me this error message:

*** NULL assignment detected

Well, that doesn't sound good. The compiler I use (Open Watcom 1.9) checks the heap at the end of a program to let you know if there was heap corruption, but this is a different error message. I dug through the PDF documentation and I found an explanation in the "Watcom C/C++ Programmer's Guide." Here is a summary of the problem:

  • Normally it is an error to use a NULL pointer and on a real operating system you will get a signal or an interrupt if you try to read or write using one. 16-bit DOS doesn't have that capability so it is allowed, even if it is an error.
  • While you can't detect reads using a NULL pointer, the compiler has a trick for trying to detect writes using it. The compiler reserves 32 bytes at the start of the data segment and writes a known pattern to it. At the end of the program the compiler checks to see if those 32 bytes have been altered. If they have, then something might have used a NULL pointer to do it. (Nothing else should point into that area.)

If you get the warning message then something clobbered those first 32 bytes in the data segment and you probably have a bug. Even if you don't get the warning message you might have a bug, but this trick can't detect that - a write outside of those 32 bytes will not be detected by this mechanism.
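To make the mechanism concrete, here is a minimal portable sketch of the same idea in C++. It is not the Open Watcom runtime code; the names guardInit and guardIntact and the constants are hypothetical, and the "data segment" is just a caller-supplied byte array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical constants mirroring the description above:
// 32 reserved bytes, each initialized to a known pattern.
constexpr std::size_t  kGuardSize = 32;
constexpr std::uint8_t kGuardFill = 0x01;

// Write the known pattern into the reserved area (done once at startup).
void guardInit(std::uint8_t* area) {
    assert(area != nullptr);
    std::memset(area, kGuardFill, kGuardSize);
}

// Return true if every reserved byte still holds the pattern.
// A write that lands outside the reserved bytes is invisible to this check.
bool guardIntact(const std::uint8_t* area) {
    assert(area != nullptr);
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (area[i] != kGuardFill) {
            return false;
        }
    }
    return true;
}
```

Calling guardIntact after a stray write to byte 2 of the area returns false, while a stray write to byte 40 goes unnoticed, which is exactly the limitation described above.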

Ok, so here is the situation:

  • This only happens when using EtherSLIP. I've never even seen this error before and I've been using this compiler for 15 years.
  • It seems to be triggered only when I have packets getting lost, which then requires retrying sending those packets.
  • The machine I'm using is an 8088 class machine so I can't use the Open Watcom debugger to catch the code that is causing this.

My first attempt: Lots of if-checks

The compiler run-time is telling me that I'm writing using a NULL pointer, so all I need to do is add some trace points on the suspected path and write a warning if I see a NULL pointer being used. That is simple to do but somewhat tedious as I might have to add a trace point for every pointer I use. But the suspect code path (resending a lost packet) is not that complicated so I started with this approach. Here is a sample of what I did:

void near TcpSocket::resendPacket( TcpBuffer *buf ) {
  if ( buf == NULL ) {
    TRACE_WARN(("Whoops: resendPacket tried to reference a NULL pointer."));
    return;
  }
  TcpPacket_t* packetPtr = &buf->headers;
  ...

I kept running the code and recreating the problem, but I never got my warning message. So I kept adding trace points in my code until I eventually determined that this approach was not working and I would need to try something different.

My second attempt: Detect the corruption earlier

The compiler can detect the corruption, but it only runs the check when the program exits. To get closer to the problem I can do the same check and do it while the program is running, hopefully narrowing down when and where it happens.

To get started I first looked at the compiler source code to see exactly what it was doing, and I found what I needed in bld/clib/startup/a/cstrt086.asm. (I've slightly simplified and reformatted it here for clarity.)

This is where the 32 bytes of reserved storage are defined. (It is allocated as 16 words of 0x0101.)

           assume  ds:DGROUP

           INIT_VAL        equ 0101h
           NUM_VAL         equ 16

_NULL      segment para public 'BEGDATA'
__nullarea label word
           dw      NUM_VAL dup(INIT_VAL)
           public  __nullarea
_NULL      ends

Here is the error message that I was seeing:

;
; miscellaneous code-segment messages
;
NullAssign      db      '*** NULL assignment detected',0

And here is the code that checks the storage for changes at the end of the program:

__exit  proc near
        public  "C",__exit
        push    ax
        mov     dx,DGROUP
        mov     ds,dx
        cld                             ; check lower region for altered values
        lea     di,__nullarea           ; set es:di for scan
        mov     es,dx
        mov     cx,NUM_VAL
        mov     ax,INIT_VAL
        repe    scasw
        pop     ax                      ; restore return code
        je      ok
;
; low memory has been altered
;
        mov     bx,ax                   ; get exit code
        mov     ax,offset NullAssign    ; point to msg
        mov     dx,cs                   ; . . .
        ...

That code defines 32 bytes of 0x01 at the beginning of the data segment, and they can be addressed using the variable name "__nullarea". The bytes are present and initialized before the program starts. At the end of the program the __exit routine will be called and it will check to see that those 32 bytes are still 0x01. If they are not, you will get an error message.

I created a callable function in C that does the same thing:

extern "C" uint8_t _nullarea;
uint8_t *_nullareap = &_nullarea;

bool failed = false;

extern "C" void nullCheck( const char *loc ) {

  // Only generate a trace message the first time it is detected.
  if (failed == true) return;

  bool good = true;
  for ( int i=0; i < 32; i++ ) {
    if ( _nullareap[i] != 0x01 ) {
      good = false;
      break;
    }
  }

  if ( good == false ) {
    TRACE_WARN(("Null check failed at %s\n", loc));
    Utils::dumpBytes( Trace_Stream, _nullareap, 32 );
    failed = true;
  }
}

And then, I inserted a call to this code in various places in my Telnet code to try to narrow down where the problem was happening.

Eventually I got to the code that calls the packet driver to send a packet on the wire:

nullCheck("Packet_send_pkt sendattempt");
int86x( Packet_int, &inregs, &outregs, &segregs);
nullCheck("Packet_send_pkt after soft int");

The second call to nullCheck was tripping. So it was not a problem in my Telnet code, it was something in the packet driver which is why my if-checks for NULL pointers never showed anything.

The trace showed me the following:

2026-04-17 16:41:13.76   Nullarea is at 318a:0000
. . .
2026-04-17 16:41:28.59 W Null check failed at Packet_send_pkt after soft int
Buffer address: 318a:0000
01 01 00 02 12 00 56 34 01 01 01 01 01 01 01 01   ......V4........
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01   ................

Interestingly, only six bytes were corrupted and the contents of the six bytes were the same in every trace that I looked at. Sometimes the six bytes would move around but that was probably due to the changes I was making to Telnet to add the code or the trace points.

Knowing that this was only happening on the packet send path I looked for those six bytes in the trace and found this right at the very top:

2026-04-17 16:41:13.22   mTCP telnet Version: Apr 17 2026
2026-04-17 16:41:13.27     PACKETINT=0x60 MAC=00.02.12.00.56.34 MTU=1400
2026-04-17 16:41:13.27     IPADDR=192.168.2.122 NETMASK=255.255.255.255 GATEWAY=192.168.2.121
2026-04-17 16:41:13.33     Debug level: 0xff, DOS Version: 6.00
2026-04-17 16:41:13.33   Tcp: Allocated 1 sockets, MTU is 1400, My MSS is 1360
2026-04-17 16:41:13.33     NAMESERVER=192.168.2.1
2026-04-17 16:41:13.38     DOS Sleep calls enabled: int 0x28:1  int 0x2f,1680:0

So the six bytes of corruption in the _nullarea are the MAC address of the simulated Ethernet device that EtherSLIP is providing. This is a very powerful clue - I now knew to look in EtherSlip on the send path, specifically where it might be copying a MAC address.

Some quick notes on x86 programming

Before we pick apart the packet driver code and expose the bug, we should review some x86 architecture.

Classic x86 architecture has 16 bit registers and a segmented memory model that allows you to address up to 1MB of memory. A segment is a region of memory that starts on a 16 byte boundary (a paragraph); on a classic IBM PC there are 64K possible segments, each spaced 16 bytes apart. Segment values are stored in special registers called segment registers.

To address a single byte of memory you combine a segment register and a 16 bit offset to construct a pointer. The segment defines the start of the memory region (always at a paragraph boundary) and the offset lets you reach any byte in that region, up to the range of the offset. Since the offset is a 16 bit value, that lets you address up to 64KB. To go outside of that 64KB region you have to change the segment register to point at a different paragraph.

Sixteen bit x86 has four segment registers and four corresponding offset registers:

Segment register      Offset register
CS (Code Segment)     IP (Instruction Pointer)
SS (Stack Segment)    SP (Stack Pointer)
DS (Data Segment)     SI (Source Index)
ES (Extra Segment)    DI (Destination Index)

In addition to these registers, there are four general purpose registers (AX, BX, CX, and DX), another stack offset register (BP), and a flags register.

A full pointer is usually written as the segment register and the offset register, such as "DS:SI" or "ES:DI". If the segment register is not specified it is implied.

Here is a more concrete example of how this mechanism works. Assume we want to look for the second parallel port I/O address in the BIOS data area. Documentation tells us that is located at address 0040:000A, which is in segment and offset notation. We can convert this to a "flat" address by multiplying the segment by 0x10 (as it points to a 16 byte paragraph) and then adding the offset value. 0x0040 * 0x10 + 0x000A = 0x0040A, which is the address when looking at it as a flat 1MB address space.

When writing code to read this address we would set DS to be 0x0040 and SI to 0x000A. We could then use an instruction to read that memory and move the value into a register. While having to combine two registers to address memory is cumbersome, it does allow us to address more than 64KB when using 16 bit registers.
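The segment-to-flat conversion can be sketched as a small C++ helper. The function name flatAddress is hypothetical and exists only for illustration; it simply shifts the segment left by four bits (a multiply by 0x10) and adds the offset.

```cpp
#include <cassert>
#include <cstdint>

// Convert a real-mode segment:offset pair to a flat address.
// Hypothetical helper for illustration only.
std::uint32_t flatAddress(std::uint16_t segment, std::uint16_t offset) {
    const std::uint32_t flat =
        (static_cast<std::uint32_t>(segment) << 4) + offset;
    // FFFF:FFFF is the largest pair, so nothing can exceed this value.
    assert(flat <= 0x10FFEFu);
    return flat;
}
```

With this helper, flatAddress(0x0040, 0x000A) gives 0x0040A, matching the BIOS data area example.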

A new segment starts at every paragraph in memory and the offset register can specify a 64K offset, so you have the possibility of two different segment and offset registers pointing at the same physical byte of memory. For example:

003F:0032 points at the same byte of memory as 0040:0022 and 0041:0012

  (0x003F * 0x10) + 0x0032 = 0x00422
  (0x0040 * 0x10) + 0x0022 = 0x00422
  (0x0041 * 0x10) + 0x0012 = 0x00422

That aliasing of memory addresses means you have to be careful when comparing pointers; you can have two pointers that look to be different, but effectively point at the same byte of memory. High level languages like C hide most of this complexity from you, but in assembly language we have to deal with it directly.
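One common way to compare such pointers safely is to convert both to a flat address first, or to normalize them so the offset is always in the range 0 to 15. The sketch below assumes a hypothetical FarPtr struct; it is not taken from any particular compiler's runtime.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation of a real-mode far pointer.
struct FarPtr {
    std::uint16_t segment;
    std::uint16_t offset;
};

// Flat address of the byte the pointer refers to.
std::uint32_t toFlat(FarPtr p) {
    return (static_cast<std::uint32_t>(p.segment) << 4) + p.offset;
}

// Two far pointers alias when their flat addresses match.
bool samePhysicalByte(FarPtr a, FarPtr b) {
    return toFlat(a) == toFlat(b);
}

// Normalize so the offset is 0..15; aliases then compare equal field by field.
FarPtr normalize(FarPtr p) {
    const std::uint32_t flat = toFlat(p);
    assert(flat <= 0xFFFFFu);  // must fit in the 1MB address space
    return FarPtr{ static_cast<std::uint16_t>(flat >> 4),
                   static_cast<std::uint16_t>(flat & 0xFu) };
}
```

All three pointers from the example above normalize to 0042:0002.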

And now a look at an ARP packet

We're going to be talking about ARP so we should review the ARP protocol format quickly.

ARP is used to find the hardware address of a machine on the same network segment. Given an IP address, a machine can broadcast a query to the entire local network and ask for the hardware address (the Ethernet MAC) of that machine. If there is a machine on the local network with that IP address, it will respond back directly to the requesting machine.

Here is a diagram of the ARP packet format, including where it sits in an Ethernet frame:

(Source: https://homepages.uc.edu/~thomam/Net1/Packet_Formats/arp.html)

On a SLIP connection ARP is not needed, as SLIP is a point-to-point protocol so there is only ever one machine to talk to. That machine is also logically serving as a gateway, so all traffic has to go through it. But EtherSlip is emulating an Ethernet adapter so it has to do things that are normal and expected on Ethernet, including handling ARP requests.
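For reference, the layout of an ARP packet for IPv4 over Ethernet, sitting behind a 14 byte Ethernet header, can be sketched as two C++ structs. The struct and function names are hypothetical; only byte arrays are used so that the compiler has no reason to insert padding, and multi-byte fields are big-endian on the wire.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Ethernet header: 14 bytes.
struct EthHeader {
    std::uint8_t dest[6];       // destination MAC
    std::uint8_t src[6];        // source MAC
    std::uint8_t etherType[2];  // 0x0806 for ARP
};

// ARP packet for Ethernet/IPv4: 28 bytes.
struct ArpBody {
    std::uint8_t hwType[2];     // 0x0001 = Ethernet
    std::uint8_t protoType[2];  // 0x0800 = IPv4
    std::uint8_t hwLen;         // 6
    std::uint8_t protoLen;      // 4
    std::uint8_t opcode[2];     // 1 = request, 2 = reply
    std::uint8_t senderHw[6];
    std::uint8_t senderIp[4];
    std::uint8_t targetHw[6];
    std::uint8_t targetIp[4];
};

static_assert(sizeof(EthHeader) == 14, "Ethernet header is 14 bytes");
static_assert(sizeof(ArpBody) == 28, "ARP body is 28 bytes");

// Read the big-endian ARP opcode from a raw Ethernet frame.
std::uint16_t arpOpcode(const std::uint8_t* frame, std::size_t len) {
    assert(frame != nullptr && len >= sizeof(EthHeader) + sizeof(ArpBody));
    const std::size_t off = sizeof(EthHeader) + offsetof(ArpBody, opcode);
    return static_cast<std::uint16_t>((frame[off] << 8) | frame[off + 1]);
}
```

The 14 byte header size is the same constant the packet driver uses when it skips from the start of the frame to the ARP packet.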

ARP handling inside the packet driver

The EtherSLIP packet driver is open source software and it is distributed as part of the Crynwr packet driver collection. One of the reasons to love open-source software is that decades later you can look at it and modify it for what you need, or find the bug that you think it might have.

Even with source code available, this was not a pleasant task. The changes to the SLIP driver that added Ethernet emulation were made in 1991 and the comments were sparse. The code is in x86 assembly language and it jumps around quite a bit.

Based on my debugging and traces I started my search for the bug in the code that lets a program send packets to the network. At a high level, the send_pkt routine in the packet driver takes a pointer to a packet to be sent and a length, formats the data for the SLIP protocol, and sends it down the wire. SLIP has its own packet framing requirements so it is not sending the packets verbatim. The packets being sent include an Ethernet header which needs to be removed, as that makes no sense for a SLIP connection.
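As a rough illustration of that send path, here is a C++ sketch that drops the 14 byte Ethernet header and applies SLIP framing as described in RFC 1055. This is not the EtherSLIP code, which is x86 assembly; the function name slipEncodeFrame is hypothetical, and sending a leading END byte is an assumption (RFC 1055 describes it as an optional way to flush line noise).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// SLIP special characters from RFC 1055.
constexpr std::uint8_t kSlipEnd    = 0xC0;
constexpr std::uint8_t kSlipEsc    = 0xDB;
constexpr std::uint8_t kSlipEscEnd = 0xDC;
constexpr std::uint8_t kSlipEscEsc = 0xDD;
constexpr std::size_t  kEthHeaderLen = 14;

// Strip the Ethernet header from a frame and SLIP-encode what remains.
std::vector<std::uint8_t> slipEncodeFrame(const std::uint8_t* frame,
                                          std::size_t len) {
    assert(frame != nullptr && len >= kEthHeaderLen);
    std::vector<std::uint8_t> out;
    out.push_back(kSlipEnd);  // assumed leading END
    for (std::size_t i = kEthHeaderLen; i < len; ++i) {
        const std::uint8_t b = frame[i];
        if (b == kSlipEnd) {            // END in data becomes ESC ESC_END
            out.push_back(kSlipEsc);
            out.push_back(kSlipEscEnd);
        } else if (b == kSlipEsc) {     // ESC in data becomes ESC ESC_ESC
            out.push_back(kSlipEsc);
            out.push_back(kSlipEscEsc);
        } else {
            out.push_back(b);
        }
    }
    out.push_back(kSlipEnd);  // END terminates the packet
    return out;
}
```

The escaping is why the bytes on the serial line can differ from the bytes in the application's buffer, even after the Ethernet header is removed.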

ARP packets get special handling in this code. If the packet driver code sees an ARP request from the application it doesn't try to send it on the wire. Instead it generates a simulated ARP response just like you were on Ethernet, satisfying the calling application. (That is a feature of EtherSlip as it is emulating an Ethernet packet driver, not a part of SLIP.)

mTCP is somewhat SLIP aware and will generally not send ARP packets when it is on a SLIP connection. It does this by stuffing a fake ARP entry for the other end of the SLIP connection into the ARP cache. This works because the other end of the SLIP connection is also the gateway and all packets go to the gateway, so if the gateway hardware address is known there is never a need to send an ARP request. However, mTCP was sending two different ARP requests even though it was on a SLIP connection:

  • At program startup the program always ARPs its own assigned address to see if another machine responds. If that were to happen it would result in an address collision, and a warning message would be generated.
  • During a retry mTCP will ARP the next hop address before resending the packet.

The ARP request at program startup does not cause problems, as EtherSlip does not generate a simulated ARP response packet when it sees that specific condition. EtherSlip will generate simulated ARP responses for the second case. mTCP could be made more explicitly aware of SLIP connections and never send ARP packets under any condition when it knows SLIP is in use - I will make that improvement. However, everything should work as-is given that EtherSlip simulates ARP responses.

So what went wrong?

Now we get to the good part ...

There was a TCP packet dropped and that packet needed to be resent. Before resending a TCP packet mTCP will send an ARP request to the next hop address, making sure everything is up to date. The packet driver detected the ARP request and went down the path of simulating an ARP response. First it asked mTCP for a receive buffer for the simulated ARP response it was going to create. Then it started creating the simulated ARP response in that buffer.

Here is the relevant code with some extra comments to annotate it:

; Brutman: On entry to this section of code ...
;   ds:si is the packet to be sent, now back to pointing at the Ethernet header.
;      bx has a copy of si
;   cx is the packet length, including the Ethernet header.
;   es:di is the user buffer where we will copy the fake ARP response to.
;      dx has a copy of di
;
; Set up ARP Reply by first copying the ARP Request packet.
        rep     movsb
;
; Skip Ethernet header
        add     bx,14
        add     dx,14
;
; Swap target and source protocol addresses from ARP request to ARP
; reply packet.
        push    es              ; mods by Joe Doupnik
        mov     si,ds
        mov     es,si
        mov     si,bx           ; incoming packet interior
        sub     si,2+6          ; walk back to originator's Ethernet address
        mov     cx,6            ; six bytes of Ethernet address
        mov     di,dx           ; outgoing packet
        sub     di,2+6+6        ; Ethernet destination address
        rep     movsb           ; copy originator's address as new dest
        pop     es              ;

So on entry to this code DS:SI points at the ARP request from the program and ES:DI points at the receive buffer for the simulated ARP response. The actual values for all of the registers are in the trace:

The outgoing ARP request buffer is at 318A:3CB4

2026-04-17 16:41:26.51   Packet: Sending 60 bytes, dumping 60
Buffer address: 318a:3cb4
FF FF FF FF FF FF 00 02 12 00 56 34 08 06 00 01   ..........V4....
08 00 06 04 00 01 00 02 12 00 56 34 C0 A8 02 7A   ..........V4...z
FF FF FF FF FF FF C0 A8 02 79 6D 54 43 50 20 62   .........ymTCP b
79 20 4D 20 42 72 75 74 6D 61 6E 00               y M Brutman.

The receive buffer address given to the packet driver for the simulated ARP response is at 489C:0002. (This appears in the trace when the simulated ARP response is given to Telnet.) Note the very different segment registers and offsets; these buffers are not close to each other in memory.

; Set up ARP Reply by first copying the ARP Request packet.
        rep     movsb

The REP MOVSB instruction copies the request into the response byte-for-byte, starting at the Ethernet header. It uses DS:SI as the source and ES:DI as the target pointers.

; Skip Ethernet header
        add     bx,14   ; These numbers are decimal
        add     dx,14

After the request is copied into the response buffer the code will move some fields around and fix other fields to make it look like an ARP response. It starts by advancing BX and DX past the Ethernet headers in the ARP request and response respectively. (BX and DX here are being used as offsets into segments, just as SI and DI are.) That is done with the two ADD instructions. So now we have the following in the registers:

Source buffer: DS=318A, SI=3CF0
Target buffer: ES=489C, DI=003E
BX (offset into the source buffer) = 3CC2
DX (offset into the target buffer) = 0010

SI and DI have both advanced 0x3C bytes; this is a side effect of the REP MOVSB instruction. But that is ok as we can compute their original values by subtracting 0x3C or using their copies in BX and DX, which have been modified to point past the Ethernet header.

Next we have this sequence of instructions:

        push    es              ; mods by Joe Doupnik
        mov     si,ds
        mov     es,si

That saves ES onto the stack so it can be restored later and copies DS into ES, using SI as an intermediate storage location. (x86 doesn't let you copy a segment register directly into another segment register.)

        mov     si,bx           ; incoming packet interior
        sub     si,2+6          ; walk back to originator's Ethernet address

These two instructions then move BX (currently pointing at the start of the ARP request header) into SI, then move SI backward 8 bytes so now SI is an offset to the source MAC address of the Ethernet frame that holds the ARP request.

        mov     cx,6            ; six bytes of Ethernet address
        mov     di,dx           ; outgoing packet
        sub     di,2+6+6        ; Ethernet destination address

These instructions do something similar to the simulated ARP response buffer. They take DX (currently pointing at the simulated ARP response header), move it into DI, and then subtract 14 from DI making it point to the beginning of the Ethernet frame for the simulated ARP response. CX is also set to 6, setting up the next instruction to copy six bytes.

Here are the registers at the time of the copy:

Source buffer: DS=318A, SI=3CBA
Target buffer: ES=318A, DI=0002
BX (offset into the source buffer) = 3CC2
DX (offset into the target buffer) = 0010

And here comes the copy instruction:

        rep     movsb           ; copy originator's address as new dest

Hopefully you see the mistake - DS was copied into ES before the copy, so while the source pointer (DS:SI) is correct and points into the ARP request, the destination pointer (ES:DI) is bogus and does not point into the simulated ARP response buffer. ES holds the wrong segment value, so the destination is 318A:0002 instead of 489C:0002. By accident that is two bytes into the start of the data segment, right where the compiler is looking for writes through a NULL pointer. This wasn't a write using a NULL pointer, but the effect is the same - data corruption where it doesn't belong.
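The effect can be reproduced with a small C++ simulation of real-mode memory. This is a sketch under stated assumptions, not the driver code: the Regs struct, the helper names, and the clobberEs switch are all hypothetical, and only the register values come from the trace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical register file; only the registers used here are modeled.
struct Regs { std::uint16_t ds, si, es, di, bx, cx, dx; };

using Memory = std::vector<std::uint8_t>;  // simulated 1MB of real-mode memory

// Flat index for segment:offset, wrapping at 1MB like an 8088.
std::size_t linearAddr(std::uint16_t seg, std::uint16_t off) {
    return ((static_cast<std::size_t>(seg) << 4) + off) & 0xFFFFFu;
}

// REP MOVSB with the direction flag clear: copy CX bytes from DS:SI to ES:DI.
void repMovsb(Memory& mem, Regs& r) {
    assert(mem.size() == (1u << 20));
    while (r.cx != 0) {
        mem[linearAddr(r.es, r.di)] = mem[linearAddr(r.ds, r.si)];
        ++r.si;
        ++r.di;
        --r.cx;
    }
}

// Copy the originator's MAC into the reply's Ethernet destination field.
// On entry BX and DX are offsets just past the two Ethernet headers.
// clobberEs == true mirrors the original sequence (DS loaded into ES);
// clobberEs == false models removing that move.
void copyOriginatorMac(Memory& mem, Regs r, bool clobberEs) {
    if (clobberEs) {
        r.es = r.ds;                                        // mov si,ds / mov es,si
    }
    r.si = static_cast<std::uint16_t>(r.bx - (2 + 6));      // source MAC in request
    r.cx = 6;
    r.di = static_cast<std::uint16_t>(r.dx - (2 + 6 + 6));  // start of reply frame
    repMovsb(mem, r);
}
```

Starting from DS=318A, ES=489C, BX=3CC2 and DX=0010, the clobbered variant writes the six MAC bytes to 318A:0002 and leaves 489C:0002 untouched; the other variant writes them to 489C:0002.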

If we look at the simulated ARP response we can see that the source MAC address from the ARP request was not copied into the simulated ARP response:

2026-04-17 16:41:26.89   Packet: Received 60 bytes, dumping 60
Buffer address: 489c:0002
FF FF FF FF FF FF 00 02 12 00 56 34 08 06 00 01   ..........V4....
08 00 06 04 00 02 00 00 00 22 34 66 C0 A8 02 79   ........."4f...y
00 02 12 00 56 34 C0 A8 02 7A 6D 54 43 50 20 62   ....V4...zmTCP b
79 20 4D 20 42 72 75 74 6D 61 6E 00               y M Brutman.

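The first six bytes of the frame (FF FF FF FF FF FF, the Ethernet destination address) are where that REP MOVSB should have copied data to; they should hold 00 02 12 00 56 34. Everything else is manipulated to look like an ARP response, as expected.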
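For comparison, here is a C++ sketch of what a correctly built reply would contain, with the field moves inferred from the request and response traces above. It is not a translation of the driver's assembly; the function name buildArpReply and its peerMac parameter are hypothetical. The value 00 00 00 22 34 66 seen in the response trace would be passed as peerMac.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Byte offsets within an Ethernet frame that carries an ARP packet.
constexpr std::size_t kEthDest = 0, kEthSrc = 6, kArpOpcode = 20,
                      kArpSenderHw = 22, kArpSenderIp = 28,
                      kArpTargetHw = 32, kArpTargetIp = 38,
                      kArpFrameLen = 42;

// Build a simulated ARP reply from an ARP request of len bytes.
// peerMac is a stand-in hardware address for the far end of the link.
void buildArpReply(const std::uint8_t* request, std::uint8_t* reply,
                   std::size_t len, const std::uint8_t* peerMac) {
    assert(request != nullptr && reply != nullptr && peerMac != nullptr);
    assert(len >= kArpFrameLen);
    std::memcpy(reply, request, len);                     // start from a copy
    std::memcpy(reply + kEthDest, request + kEthSrc, 6);  // originator is the new dest
    reply[kArpOpcode] = 0x00;                             // opcode 2 = reply
    reply[kArpOpcode + 1] = 0x02;
    std::memcpy(reply + kArpSenderHw, peerMac, 6);
    std::memcpy(reply + kArpSenderIp, request + kArpTargetIp, 4);
    std::memcpy(reply + kArpTargetHw, request + kArpSenderHw, 6);
    std::memcpy(reply + kArpTargetIp, request + kArpSenderIp, 4);
}
```

Fed the request from the trace, this produces the same bytes as the received response except for the Ethernet destination, which becomes 00 02 12 00 56 34 instead of staying at the broadcast address.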

This is just a bug - you can't mix and match segments and offsets like this and expect to get a good pointer. Depending on the compiler and how the calling program is written, that bug could have corrupted something more critical, possibly causing a crash or data corruption. It was just lucky that in this instance the errant MAC address landed in a place that was being monitored by the compiler runtime.

Fixing the bug

Before fixing the bug it is important to look at it in context and see what might have caused the bug. The rest of the code continues to restore ES to the correct segment value, and then swaps fields around using similar offset manipulations and REP MOVSB instructions.

There are at least two possible good fixes:

  • Remove the incorrect move of DS into ES, and continue using the ARP request buffer as the source of data.
  • Since you have already copied all of the ARP request into the simulated ARP response, all of the changes could have been made with DS:SI and ES:DI pointing into the simulated ARP response packet.

I chose the path of least resistance and removed the incorrect move of DS into ES. If one wanted to go further one could optimize the code by not copying the entire ARP request packet into the simulated ARP response, as a lot of that is wasted effort. Then again, this code should not run often so it might not be worth the effort.

How did this bug hide for so long? There are a few factors contributing to its age:

  • If your program uses the small memory model then ES and DS probably have the same value. When that happens, the bug is masked.
  • If the program is not checking the Ethernet header of the ARP response it would not have detected that it was set incorrectly. I doubt that any TCP/IP programs check the Ethernet header in this case, as what they need is actually in the ARP response itself.
  • The impact the bug has depends entirely on where the badly constructed target pointer lands in memory. If the bug hits a non-critical area of memory nobody would notice anything wrong.
  • Open Watcom has the detection code for NULL pointers. A different compiler might not have that feature.
  • DOS machines often crash for all sorts of reasons. If EtherSlip was causing problems people may have just been shrugging their shoulders and moving on.

The fixed source code with an executable can be found at EtherSlip_v11.8.zip. The original source code and the full Crynwr packet driver collection can be found at Crynwr.com.

The lesson? Take every warning seriously, and get to the root cause. If enough bugs like this creep into a system you wind up with unexplained behavior and a lot of people shrugging their shoulders saying "It's DOS ...".

To Russell, Denis, Phil, Joe and the other contributors to EtherSlip - thanks for the code, and we're still happily using it 34+ years later!


Created April 19th, 2026
(C)opyright Michael Brutman, mbbrutman at gmail dot com
