The original LZEXE (A.K.A. Kosinski) compressor source code has been released

Original link: https://clownacy.wordpress.com/2025/05/24/the-original-lzexe-a-k-a-kosinski-compressor-source-code-has-been-released/

The Kosinski compression format, used in Mega Drive Sonic games, is actually LZEXE, a DOS executable compressor from the 80s/90s. Fabrice Bellard, LZEXE's developer, recently released the source code (v0.91) under the MIT license. This includes x86 assembly compression logic and a Pascal frontend. Previously, a Kosinski compressor was created that perfectly matched the Sonic games' data, but it failed to reproduce the Mega CD BIOS's compressed data. This suggests different LZEXE versions (potentially bug-fixed or with new bugs) were used. The released source code may not be the exact version used for the Sonic games, but can be compared and modified. The source code availability marks a significant milestone, granting access to the source code of two out of the four 'KENS' compression formats: Kosinski and Saxman! Unfortunately, the hopes of finding Enigma and Nemesis, the remaining custom Mega Drive formats, are low.

The Hacker News thread discusses the release of the original LZEXE (a.k.a. Kosinski) compressor source code, noting Fabrice Bellard's authorship. Commenters praise Bellard as an influential programmer, highlighting his simple yet effective code in projects like QEMU and TCC. One user mentions LZEXE's influence on PKLITE, while others compare Bellard's impact to John Carmack's. The discussion touches upon the widespread use of LZ family compression algorithms in PC BIOSes and the 90s scene of EXE/COM compression and protection tools, with one user recalling their involvement as the author of the chkexe detection tool and the administrator of the exe mailing list.

Original text

Last year, I discovered that the Kosinski compression format is actually LZEXE, which was used for compressing DOS executables back in the 90s and the late 80s. Its developer catalogues three versions on his website: v0.90, v0.91, and v0.91e. While only binaries of v0.91 and v0.91e can be found on the website, v0.90 can be found mirrored on various other websites.

I got in touch with LZEXE’s developer, Fabrice Bellard, and he was able to release LZEXE’s source code, untouched since 1990! It is released under the terms of the MIT licence, allowing it to be freely used in other projects. To maximise performance, the compression logic was written in x86 assembly, while its frontend was written in Pascal. This particular source code appears to be for v0.91.

Back in 2021, I made my own Kosinski compressor which produced identical data to what could be found in the Mega Drive Sonic games. At the time, I noticed that it did not accurately reproduce the Mega CD BIOS’s compressed Sub-CPU payload data. The inaccuracies were so extensive that it appeared that the BIOS’s data was compressed with a different tool to the Sonic games. Notably, the compressor which was used for the Sonic games suffered from a number of bugs and shortcomings, causing the compressed data to be less efficient than it should have been. The Mega CD BIOS developers may have used a different version of the compressor, which lacked these bugs, or which had additional bugs.

With this in mind, the source code which has been released may not be for the exact compressor which was used by the Sonic games, though it could be modified to function identically to it. Since the compression logic was written in assembly, it should be simple enough to disassemble the compressor executables and compare them to the source code. Devon did the heavy lifting of extracting and unpacking the core logic, which can be found here.

With that, we now have the source code of two of the four ‘KENS’ format compressors – Kosinski and Saxman! Unfortunately, I do not have much hope of ever finding the original compressors for, let alone the source code of, the remaining two formats – Enigma and Nemesis. They are evidently custom formats which were designed specifically for the Mega Drive, likely meaning that the compressors and their source code never left the hands of Sega. Enigma encodes plane map data, operating on 16-bit words and specifically acknowledging the separation of the bits of the tile’s index from its X/Y flip, palette line, and priority; meanwhile, Nemesis encodes tiles, operating on nibbles and bunching data into groups of 32 bytes (8 x 8 4-bit nibbles).
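
To make the data these two formats operate on a little more concrete, here is a minimal Python sketch of the uncompressed structures themselves. It is not an implementation of Enigma or Nemesis. The bit positions follow the commonly documented Mega Drive VDP plane map layout, which is an assumption brought in from outside this post, and the names `PlaneMapEntry`, `split_plane_map_word`, and `tile_to_pixels` are made up for illustration.

```python
from typing import List, NamedTuple


class PlaneMapEntry(NamedTuple):
    """Fields of one 16-bit plane map word (hypothetical helper type)."""
    priority: bool
    palette_line: int
    flip_y: bool
    flip_x: bool
    tile_index: int


def split_plane_map_word(word: int) -> PlaneMapEntry:
    """Split a 16-bit plane map word into its fields.

    Assumed layout: bit 15 priority, bits 14-13 palette line,
    bit 12 Y flip, bit 11 X flip, bits 10-0 tile index.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("plane map words are 16 bits wide")
    return PlaneMapEntry(
        priority=bool(word & 0x8000),
        palette_line=(word >> 13) & 0x3,
        flip_y=bool(word & 0x1000),
        flip_x=bool(word & 0x0800),
        tile_index=word & 0x07FF,
    )


def tile_to_pixels(tile: bytes) -> List[List[int]]:
    """Unpack one 32-byte tile into 8 rows of 8 4-bit pixel values.

    Each row is 4 bytes; the high nibble of each byte is the left pixel.
    """
    if len(tile) != 32:
        raise ValueError("a tile is 8 x 8 nibbles, i.e. 32 bytes")
    rows = []
    for y in range(8):
        row = []
        for byte in tile[y * 4:(y + 1) * 4]:
            row.append(byte >> 4)    # left pixel
            row.append(byte & 0x0F)  # right pixel
        rows.append(row)
    return rows
```

For example, `split_plane_map_word(0xA801)` yields a high-priority entry on palette line 1, flipped horizontally, pointing at tile 1. A format which treats those fields separately, as Enigma is described as doing, only makes sense for hardware with this kind of word layout.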
