(Comments)

Original link: https://news.ycombinator.com/item?id=43946824

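A Hacker News thread discusses a project to reverse-engineer the 386 processor's prefetch queue circuitry. The author, kens, joined the thread and answered questions. Commenters expressed nostalgia for assembly programming and suggested future targets for chip analysis, including the AMD 29000, the Inmos Transputer, and the M68k series. One user even offered Transputer dies and die plots.

The discussion covered the challenges of reverse engineering, especially as the number of layers grows and feature sizes shrink. Kens described removing metal layers to see what lies underneath and noted the limits of optical microscopes. The thread also explored the evolution of circuit design, from simple implementations to optimized techniques such as the Manchester carry chain. One user brought up an analysis of the 386SX, which was reported to be slower than the 286 at the same clock, and asked what accounts for the difference.

Several commenters recalled the dramatic performance leaps of the 1980s and how the 386 changed their programming experience. Kens pointed out that a flagged account appeared to be a bot reposting old comments, and guessed that it was an experiment in karma farming.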

  • Original text
    Reverse engineering the 386 processor's prefetch queue circuitry (righto.com)
    118 points by todsacerdoti 10 hours ago | 39 comments










    man, reading all this makes me wanna bust out some old manuals and mess with assembly again, those jumps in chip power back then always blew my mind


    Author here. I hope you're not tired of the 386... Let me know if you have any questions.


    I'll never tire of any analysis you do. But if you are taking requests, I'd love two chips.

    The AMD 29000 series, a RISC chip with many architectural advances that eventually morphed into the K5.

    And the Inmos Transputer, a Forth like chip with built in scheduling and networking, designed to be networked together into large systems.

    https://en.wikipedia.org/wiki/AMD_Am29000

    https://en.wikipedia.org/wiki/Transputer



    Those would be interesting chips to examine, if I ever get through my current projects :-)


    If you are doing requests, I'd love to see the M68k series analyzed.


    Another vote for the 68000 series :)


    I have some Transputer die and die plots if you ever need those.


    Have you thought about doing these yourself?


    So Epictronics recently looked at the 386SX, the version with the 16-bit external bus, which was slower than the 286 at the same clock. What changed between that and this? Was the major difference the double clock hit on fetch? Or did it have a shorter prefetch queue as well, like the 8088?


    At what number of layers does it become difficult to reverse engineer a processor from die photos? I would think at some point, functionality would be too obscured to be able to understand the internal operation.

    Do they ever put a solid metal top layer?



    I've been able to handle the Pentium with 3 metal layers. The trick is that I can remove metal layers to see what is underneath, either chemically or with sanding. Shrinking feature size is a bigger problem since an optical microscope only goes down to about 800 nm.

    I haven't seen any chips with a solid metal top layer, since that wouldn't be very useful. Some chips have thick power and ground distribution on the top layer, so the top is essentially solid. Secure chips often cover the top layer with a wire that goes back and forth, so the wire will break if you try to get underneath for probing.



    Interesting! What is the reason for the 800 nm limit? I have successfully photographed my own designs down to 130 nm with optical microscopes, though not with metal layer removal. The resolution isn't perfect but features were clearly visible.


    Never, the 386 is way too important.


    Never!


    Ok, now do 486.


    I'm not as interested in the 486; I went straight to the Pentium: https://www.righto.com/2025/03/pentium-multiplier-adder-reve...


    I totally agree with your methodology. Stick to the classic leaps.


    Fair enough. But why?


    Because I saw a Navajo weaving of a Pentium and wanted to compare the weaving to the real chip: https://www.righto.com/2024/08/pentium-navajo-fairchild-ship...


    That was great. Thank you.

    Too bad (for the Navajo Nation) about the armed standoff and its aftermath.



    I was only joking but I'm glad you have decided to take it seriously.


    I remember reading about naive circuits like ripple-carry, where a signal has to propagate across the whole width of a register before it's valid. These seem like they'd only work in systems with very slow clocks relative to the logic itself.

    In this writeup, something that jumps out at me is the use of the equality bus, and Manchester carry chain, and I'm sure there are more similar tricks to do things quickly.

    When did the transition happen? Or were the shortcuts always used, and the naive implementations exist only in textbooks?



    As I understand it, you can use slower carry propagation techniques in parts of a design that aren't on the timing critical path. Speeding up logic that isn't on the critical path won't speed up your circuit; it just wastes space and power.

    Clock dividers (for example, for PLLs and for generating sampling clocks) commonly use simple ripple carry because nobody is looking at multiple bits at a time.



    Well, the Manchester carry chain dates back to 1959. Even the 6502 uses carry skip to increment the PC. As word sizes became larger and transistors became cheaper, implementations became more complex and optimized. And mainframes have been using these tricks forever.
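
To make the ripple-carry discussion above concrete, here is a minimal Python sketch. It is not taken from the article or from any chip; the function names and the 8-bit width are assumptions chosen for illustration. It shows a bit-by-bit ripple-carry adder, the same addition written with generate and propagate signals (the formulation a Manchester carry chain is built around), and a helper that measures the longest run of propagate bits a carry may have to travel through.

```python
def ripple_carry_add(a, b, width=8):
    """Add two unsigned integers one full adder at a time (illustrative)."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        # The carry out of this bit is not known until the carry in is known.
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def generate_propagate_add(a, b, width=8):
    """Same addition, expressed with generate (g) and propagate (p) signals."""
    mask = (1 << width) - 1
    g = a & b & mask    # bit i creates a carry regardless of the carry in
    p = (a ^ b) & mask  # bit i passes an incoming carry along
    carries = 0         # bit i holds the carry INTO position i
    c = 0
    for i in range(width):
        carries |= c << i
        c = ((g >> i) & 1) | (((p >> i) & 1) & c)
    return (p ^ carries) & mask, c


def longest_propagate_run(a, b, width=8):
    """Longest run of consecutive propagate bits: how far one carry can ripple."""
    p = (a ^ b) & ((1 << width) - 1)
    longest = run = 0
    for i in range(width):
        run = run + 1 if (p >> i) & 1 else 0
        longest = max(longest, run)
    return longest
```

In software the two adders compute exactly the same thing. The difference discussed in the thread is electrical: a Manchester carry chain implements the propagate path with pass transistors so a carry moves through a run of propagate bits quickly, and carry skip lets a carry bypass a block whose bits all propagate. For example, `longest_propagate_run(0xFF, 0x01)` is 7, since every bit above bit 0 only passes the carry along.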


    When are you going to implement the first electron-level 386 emulator?


    very nice analysis! personally I'm a DEC alpha fan.. but I guess that's too big an endeavor.. (or maybe a selected portion?)


    So many chips, so little time :-)


    May I suggest a video chip? Yamaha V9958

    I hope some day the tedious part of what you do, can be automated (AI?), so that you (or others) can spend their time on whatever aspect is most interesting. Vs all the grunt work needed to get to a point where you understand what you're looking at.

    Btw. any 4 bit cpus/uC's in your collection? Back in the day I had a small databook (OKI, early '90s iirc) that had a bunch of those. These seem to have sort of disappeared (eg. never saw a pdf of that particular databook on sites like Bitsavers).



    https://www.twitch.tv/tubetimeus is currently reverse engineering the IBM MCGA chip down to a gate-level diagram.


    I miss those dramatic performance leaps in the 80s. 10x in 5 years, give or take.

    Now we get like 2x in a decade (single core).



    There was no performance improvement clock for clock between 286 and 386 when running contemporary 16 bit code https://www.vogons.org/viewtopic.php?t=46350


    Well, that's not at all true.

    The 286 in the benchmark was using 60 ns Siemens RAM, and a 25 MHz unit which virtually no one has ever seen in the wild. 286s that people actually bought topped out at 12 MHz.

    The 386 in the test was using 70 ns RAM.

    Let's see them both with 60 ns RAM.



    I wrote blitters in assembly back in those days for my teenager hobby games. When I could actually target the 386 with its dword moves, it felt blisteringly fast. Maybe the 386 didn't run 286 code much faster but I recall the chip being one of the most mind-blowing target machine upgrades I experienced. Much later I recall the FPU-supported quadword copy in 486dx and of course P6 meeting MMX in Pentium II. Good times.
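
A rough way to see why dword moves felt so much faster for a blitter is to count move operations. The Python sketch below is only arithmetic under stated assumptions: the 64,000-byte buffer (a 320x200 screen at one byte per pixel) is an illustrative figure not mentioned in the thread, `moves_needed` is a hypothetical helper, and the count ignores bus width, alignment, and memory wait states.

```python
def moves_needed(num_bytes, move_width):
    """Move operations needed to copy num_bytes using moves of move_width bytes.

    Any bytes left over after the wide moves are copied one at a time.
    """
    wide_moves, leftover_bytes = divmod(num_bytes, move_width)
    return wide_moves + leftover_bytes


# Assumed example: a 320x200 buffer at one byte per pixel.
FRAME_BYTES = 320 * 200

byte_moves = moves_needed(FRAME_BYTES, 1)   # 8-bit moves
word_moves = moves_needed(FRAME_BYTES, 2)   # 16-bit moves
dword_moves = moves_needed(FRAME_BYTES, 4)  # 32-bit moves, available on the 386
```

Under these assumptions the copy takes 64,000 byte moves, 32,000 word moves, or 16,000 dword moves. How much of that fourfold reduction shows up as real speed depends on the memory system behind the CPU.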


    You're 100% right that the 386 had a huge amount of changes that were pivotal in the future of x86 and the ability to write good/fast code.

    I think a bigger challenge back then was the lack of software that could take advantage of it. Given the nascent state of the industry, lots of folks wrote for the 'lowest common denominator' and kept it at that (i.e. expense of hardware to test things like changing routines used based on CPU sniffing.)

    And even then of course sometimes folks were lazy. One of my (least) favorite examples of this is the PC 'version' (It's not at all the original) of Mega Man 3. On a 486/33 you had the option of it being almost impossibly twitchy fast, or dog slow thanks to turbo button. Or, the fun thing where Turbo Pascal compiled apps could start crapping out if CPU was too fast...

    Sorry, I digress. The 386 was a seemingly small step that was actually a leap forward. Folks just had to catch up.
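
The Turbo Pascal problem mentioned above is not explained in the thread. A commonly cited cause, treated here as an assumption, is a startup calibration step that counts loop iterations during one 55 ms timer tick and then divides the count with a 16-bit divide. The Python sketch below models only that arithmetic; the function names and loop counts are hypothetical, not taken from any runtime library.

```python
def div_32_by_16(dividend, divisor):
    """Model a 16-bit x86 DIV: the quotient must fit in 16 bits."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:
        # On real hardware this raises a divide error and the program aborts.
        raise OverflowError("quotient does not fit in 16 bits")
    return quotient, remainder


def calibrate_delay(loops_per_tick, tick_ms=55):
    """Loops per millisecond, as an assumed calibration routine might compute it."""
    return div_32_by_16(loops_per_tick, tick_ms)[0]


def survives_startup(loops_per_tick):
    """True if the assumed calibration completes without a divide error."""
    try:
        calibrate_delay(loops_per_tick)
        return True
    except OverflowError:
        return False
```

With these made-up numbers, a machine that manages 1,000,000 loops per tick calibrates fine, while one that manages 5,000,000 overflows the 16-bit quotient. Under this model the program is correct on the hardware it was written for and fails only once the CPU is fast enough to push the count past 65,535 x 55.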



    As did I :).

    Imagine how it felt going from an 8086 @ 8 MHz to an 80486SX (the cheapo version without FPU) @ 33 MHz. With blazingly fast REP MOVSD over some form of proto local bus Compaq implemented using a Tseng Labs ET4000/W32i vga chip.



    Ok.

    I'm speaking of e.g. the leap between the IBM PC in 1981 and the Compaq 386 five years later.

    Or between that and the 486 another five years later or so.



    [flagged]



    This appears to be a bot reposting comments from an older article on my blog.


    Can we reverse-engineer the purpose of the bot? Just for lulz?


    For the most part, the account posts comments on HN that previously appeared on reddit discussions of the same article (you can check this with Google). My guess is that it's an experiment in karma farming.





