(comments)

Original link: https://news.ycombinator.com/item?id=40399987

Someone claiming to work at AMD (formerly Xilinx) says that a tinfoil-hat-wearing commenter believes there is a secret hidden CPU inside field-programmable gate arrays (FPGAs). That belief rests on the assumption that these chips conceal cognitive capabilities akin to artificial general intelligence (AGI); however, AGI does not currently exist. In reality, an FPGA's programming is retrieved over an i2c bus during initialization, so to access that information one need only monitor the bus and intercept the entire bitstream, with no substantial modification to the FPGA itself. One user also proposes a "kill bit": a simple mechanism for disabling target devices via a radio signal or key, rather than an unimaginably complex hidden CPU. Contrary to what one might assume, FPGAs actually make hiding malicious components challenging, because the logic structure differs significantly between designs, leaving subgraph-matching algorithms insufficient. Overall, commenters are excited about the future potential of open hardware and software collaborating, particularly self-hosting RISC-V machines, and note the remarkable technological progress of recent decades.

Related articles

Original text


> The chip foundry wouldn't know what the FPGA will be used for, and where the proverbial "privilege bit" will end up being laid out on the chip, which mitigates against Privilege Escalation hardware backdoors. Exposure is limited to DoS attacks being planted into the silicon during FPGA fabrication, which yields a significantly improved level of assurance (i.e., the computer may stop working altogether, but can't betray its owner to an adversary while pretending to operate correctly).

I suppose in theory the FPGA could contain a hidden CPU that has full read/write access to the FPGA program.

Further, if the system becomes popular and more FPGAs need to be produced for the same system or the next generation, then the foundry has additional information and they can make a good guess of where the privilege bit will be. Even simpler, they could program an FPGA with the code and figure it out manually.



> I suppose in theory the FPGA could contain a hidden CPU that has full read/write access to the FPGA program.

All of them do at this point. It isn't hidden.

You can't buy a large FPGA without an ARM core in it. The ARM cores all have an opaque signed blob running in EL3 that you can't replace. This isn't a soft core on the fabric; it's dedicated silicon. And it has access to the ICAP (internal configuration access port) on Xilinx devices, and the equivalent on all the other manufacturers.



Yeah, OP (the person you're responding to) is wearing a tinfoil hat. Source: I work for AMD (formerly Xilinx). There are about a billion reasons it isn't true, starting with the fact that we'd charge you more money for the IP instead of hiding it, and ending with the fact that UltraScales are sold to the DoD, and there's very little room for shenanigans there.


The ARM Core?

It's usually used for whatever software you need, while the FPGA fabric handles the domain-specific circuits, signal processing and the like, in hardware. The ARM core usually runs Linux or some other OS and is used for communication with other devices. It's simply able to run at a higher frequency than most soft cores, because it's a fully integrated and optimized circuit. So in theory it's a nice thing, and it leaves you more space for custom circuits in the programmable logic.

PS: The latest PolarFire chips from Microchip use a RISC-V CPU. Since AMD has announced MicroBlaze-V (their proprietary soft-core CPU with the RISC-V ISA), I assume they will soon (tm) release their Zynq range with a dedicated RISC-V core as well. But even if it's an open instruction set, it's still a closed-source CPU.



I think backdooring the RAM would be easier. Modern DRAM has lots of complicated features (e.g. link training, targeted refresh, on-die ECC). I don't know exactly how it's implemented, but that's plenty of complexity to provide cover for backdoors.

It should be possible to add something that watches for specific memory access patterns and provides arbitrary read/write capabilities when the correct pattern is detected. This could be used for privilege escalation from untrusted but sandboxed code, e.g. JavaScript. It could work with any CPU architecture or OS, because the arbitrary memory reads could be used to detect the correct place to write.

This would be less effective with DIMMs or other multi-chip memory modules, but RISC-V computers are usually small single-board computers that only have a single DRAM chip.
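The knock-pattern trigger described above can be sketched as a toy simulation. Everything here is invented for illustration (the knock addresses, the class name, the arming protocol); it is not how any real DRAM is implemented — just a model of the idea that a hidden state machine watching the address stream can turn an in-sandbox access pattern into an out-of-sandbox write:

```python
KNOCK = [0x1000, 0x2000, 0x3000, 0x1000]  # hypothetical trigger sequence

class BackdooredDRAM:
    """Toy model: reads advance a hidden knock detector. After the full
    knock, the next read's address becomes a redirect target, and the
    following write lands there, regardless of what address the
    (sandboxed) software thinks it is writing to."""

    def __init__(self, size=0x10000):
        self.mem = bytearray(size)
        self.progress = 0      # knock state machine position
        self.redirect = None   # None -> "armed" -> target address

    def _watch(self, addr):
        # Advance on the expected address; reset on anything else.
        if addr == KNOCK[self.progress]:
            self.progress += 1
            if self.progress == len(KNOCK):
                self.progress = 0
                self.redirect = "armed"
        else:
            self.progress = 1 if addr == KNOCK[0] else 0

    def read(self, addr):
        if self.redirect == "armed":
            self.redirect = addr   # attacker picks the escalation target
            return 0
        self._watch(addr)
        return self.mem[addr]

    def write(self, addr, value):
        if isinstance(self.redirect, int):
            self.mem[self.redirect] = value  # lands outside the sandbox
            self.redirect = None
        else:
            self.mem[addr] = value
```

A single wrong address resets the state machine, so ordinary traffic is vanishingly unlikely to trip a sufficiently long knock by accident — which is exactly what makes this style of trigger hard to find by testing.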



Yes, and hardware support for encrypted RAM already exists:

https://en.wikipedia.org/wiki/Trusted_execution_environment

However, this will never be perfectly secure against backdoored RAM in a multitasking environment, because the memory access patterns alone leak information. Additionally, I don't think any of these systems support authenticated encryption, which means you could do things like corrupt branch targets and hope to land on a big NOP slide you control.



This sort of thing is analogous to the "Thompson hack" [1], where a malicious compiler has a self-propagating backdoor. It never shows up in the source code, but self-injects into the binaries.

Thompson demonstrated this under controlled conditions. But realistically, the backdoor begins to approach AGI-level cunning to evade attempts at detection. It has to keep functioning and propagating as the hardware and software evolve, while still keeping a profile (size, execution time, etc.) low enough to continue evading detection.

Work like this, which rebuilds modern computing on a completely different foundation, would seriously disrupt and complicate the use of this type of backdoor.

https://en.wikipedia.org/wiki/Backdoor_(computing)#Compiler_...
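The two-stage structure of the Thompson hack can be shown in a toy model. All names here are illustrative (`evil_compile`, `check_password`, the "joshua" master password); "compiling" is just source-to-source text transformation, but the essential property holds: the backdoor text appears in neither the login program's source nor a cleaned compiler's source, yet survives both:

```python
# Patch applied to any login program passing through the compiler.
LOGIN_PATCH = ('return password == stored',
               'return password == stored or password == "joshua"')

# Stage-1 logic as text, so it can be re-injected into a compiler.
SELF_PATCH = '''    if "def check_password" in source:
        source = source.replace(%r, %r)
''' % LOGIN_PATCH

def evil_compile(source):
    # Stage 1: backdoor any login program being compiled.
    if "def check_password" in source:
        source = source.replace(*LOGIN_PATCH)
    # Stage 2: propagate into any compiler being compiled, so the
    # backdoor survives even after evil_compile's own source is cleaned.
    if "def compile(source):" in source:
        source = source.replace("def compile(source):\n",
                                "def compile(source):\n" + SELF_PATCH)
    return source
```

Compiling a perfectly clean compiler with this one yields a trojaned compiler, which in turn backdoors a perfectly clean login program — with "joshua" appearing in no source file anywhere.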



FFS - when a smart person designs something clever, it "begins to approach AGI-level cunning"? AGI doesn't exist; at the moment it's purely mythical.


I wonder as well whether it wouldn't just be easier to snoop I/O and somehow exfiltrate the data. (This would be completely impractical for dragnet surveillance, of course – but I'm sure if a state actor knew that some organization was using this technique to avoid surveillance, _and_ was using a predictable software setup...)


> I suppose in theory the FPGA could contain a hidden CPU that has full read/write access to the FPGA program.

Even if it did, it would be exceptionally difficult for that CPU to identify which registers/gates on the FPGA were being used to implement which components of the soft CPU. The layout isn't fixed; there's no consistent mapping of hardware LUTs/FFs to synthesized functionality.



Even if the mapping changes, the network (graph of logic gates) will locally be similar. So a subgraph matching algorithm might be all that is needed.
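A minimal sketch of that idea: represent the netlist as a labeled graph (gate type plus fan-in edges) and search for an embedding of a small pattern by naive backtracking. This is invented illustrative code, exponential in the worst case; real tools use far more scalable algorithms (e.g. VF2-style matching):

```python
def subgraph_match(pattern, target):
    """Find a mapping from pattern nodes to target nodes that preserves
    gate types and fan-in edges. Graphs are dicts:
    node -> (gate_type, [fan-in node names]). Returns a dict mapping
    or None if no embedding exists."""
    pnodes = list(pattern)

    def consistent(mapping, pn, tn):
        if target[tn][0] != pattern[pn][0]:      # gate type must match
            return False
        for q in pattern[pn][1]:                 # mapped fan-ins of pn
            if q in mapping and mapping[q] not in target[tn][1]:
                return False
        for q, tq in mapping.items():            # mapped nodes fed by pn
            if pn in pattern[q][1] and tn not in target[tq][1]:
                return False
        return True

    def search(i, mapping):
        if i == len(pnodes):
            return dict(mapping)
        pn = pnodes[i]
        for tn in target:
            if tn in mapping.values():
                continue                         # injective mapping only
            if consistent(mapping, pn, tn):
                mapping[pn] = tn
                found = search(i + 1, mapping)
                if found:
                    return found
                del mapping[pn]                  # backtrack
        return None

    return search(0, {})
```

For example, the pattern "an AND gate feeding an XOR gate" is found inside a four-gate netlist even though the node names differ — which is the point being made: renaming and relocating logic doesn't hide its local shape.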


That would mean you connect your hidden CPU to essentially every wire inside the FPGA. Trivial to detect, extremely expensive, and probably even impossible given the timing model.


There's no need for such complexity. FPGAs read their programming from an i2c eeprom/flash when they boot; the hidden CPU just has to sniff that bus to get the entire bitstream and learn the mapping.


And then you know the mapping. That still means you need to connect to arbitrary wires. If you have the mapping but you aren't connected to the wire you want to disrupt or sniff, then tough luck: you can't do anything.

Theoretically, what you could do is MITM the bitstream and upload it to a server, then resynthesize, place and route with your sniff wires connected, and write that back to flash. But now you have to hide a radio, and either force a restart or hope a restart will happen.



For a nation state the most useful thing would be a "kill bit" where you can broadcast some signal or key and disable all your enemy's computers. That's fairly easy to do in an FPGA - the signal would be detected by the serdes block(s) and the kill bit could just kill the power or clock or some other vital part of the chip.
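The detection side of a "kill bit" really can be tiny. A toy model (the key value, width, and class name are all made up for illustration) is just a shift-register compare sitting on a serial link: every received bit shifts in, and if the last 32 bits ever equal the secret key, the kill bit latches:

```python
KILL_KEY = 0xDEADBEEF  # hypothetical 32-bit trigger pattern

class KillBitDetector:
    """Toy model of a kill-bit watcher on a serial link. In silicon
    this would be one shift register, one comparator, and a latch
    gating the clock or power - negligible area to hide."""

    def __init__(self, key=KILL_KEY, width=32):
        self.key = key
        self.mask = (1 << width) - 1
        self.shreg = 0
        self.killed = False

    def clock_bit(self, bit):
        self.shreg = ((self.shreg << 1) | (bit & 1)) & self.mask
        if self.shreg == self.key:
            self.killed = True  # latches; nothing un-kills the chip
```

With a longer key (say 128 bits) an accidental trigger from ordinary traffic becomes astronomically unlikely, while the hardware cost stays negligible.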


> For a nation state the most useful thing would be a "kill bit" where you can broadcast some signal or key and disable all your enemy's computers.

CNE (computer network exploitation) is generally considered far more valuable than CNA (computer network attack). First of all, keep in mind that all genuinely sensitive systems are air-gapped, so you can't effectively broadcast a signal to them.

Second, CNA is a one-off; after the attack you will typically lose access. CNE access, on the other hand, can persist for years or even decades, and is beneficial both in "cold" scenarios, for political and economic maneuvering, and closer to a flashpoint. CNA is usually only relevant when a conflict is turning hot.



No, it's the opposite. The FPGA makes it much harder to hide a trojan in the silicon. If the LUTs were biased, it would be detected fairly quickly. A dedicated circuit with an RF interface would be equally obvious in terms of chip usage and power draw.


It's certainly non-trivial to put a hidden CPU with full read/write access into an FPGA. The wire configuration inside the FPGA will be different for every design loaded into it; hell, even for the same design, place-and-route will do different things. So what would you connect your hidden CPU to?


It's really quite amazing to log in to a Linux shell on an OrangeCrab FPGA running a RISC-V softcore, built using an open-source toolchain. That was impossible not so long ago! At best you'd have something like Xilinx PetaLinux and all their proprietary junk.


Fun thing is that orangecrab's FPGA is not even a requirement.

A tiny iCE40 LP1K will fit SERV (and even QERV) no problem.

It's amazing how small a fully compliant RISC-V implementation can be.



This is shaping up to be a rallying moment for the community: open hardware and software finally working together! This will be huge by the end of the decade.


Rebuilding the system on itself and validating that the bitfile is the same is nice.

I'm amazed that it could be rebuilt in 512MB (and in "only" 4.5 hours on a ~65MHz CPU). My experience with yosys (and Vivado etc.) is that they seem to want many gigabytes.

> A 65MHz Linux-capable CPU inevitably invokes memories of mid-1990s Intel 486 and first-generation Pentium processors.

50-65MHz* and 512MB seems comparable to an early 1990s Unix workstation. Arguably better on the RAM side.

*4.5 Mflops on double precision linpack for lowRISC/50MHz



This is very, very cool. I've been thinking for a while that a fully self-hosted RISC-V machine is sorely needed. The biggest limiting factor at the moment actually seems to be finding an FPGA board which has enough RAM on board. The target board here has 512 megabytes, I think - but FPGA toolchains are much happier with several gigabytes to play with.


While I love the idea of self-hosting HW and SW, I can't even imagine the pain of building something like GCC on a 60MHz CPU. Not to mention that the Rocket CPU is written in Scala. I recently stopped using Gentoo on a RockPro64 because the compile times were unbearable, and that's a system orders of magnitude faster than what they want to use.


You can definitely go considerably faster. A lot of these FOSS cores are either outright unoptimized or target ASICs and so end up performing very badly on FPGAs. A well designed core on a modern FPGA (not one of these bottom of the barrel low power Lattice parts) can definitely hit 250+ MHz with a much more powerful microarch. It's neither cheap nor easy which is why we tend not to see it in the hobby space. That, and better FPGAs tend not to have FOSS toolchains and so it doesn't quite meet the libre spirit.

But, yes, even at 250MHz trying to run Chipyard on a softcore would certainly be an exercise in patience :)



People used 50MHz SPARC systems to do real work, and the peripherals were all a lot slower (10Mbps Ethernet, slower SCSI drives), with less and slower RAM. But it might take a week to compile everything you wanted, I agree; of course, there is always cross-compiling as well.


> That was before everything became a snap package in a docker image.

A modern app should consist of dozens of docker images in k8s on remote cloud infrastructure, all running "serverless" microservices in optimized python*, connected via REST* APIs to a javascript front-end and/or electron "desktop" app, with extensive telemetry and analytics subsystems connected to a prometheus/grafana dashboard.

That is ignoring the ML/LLM components, of course.

If all of this is running reliably, and the network isn't broken again, then you may be able to share notepad pages between your laptop and smartphone.

*possibly golang/protobufs if your name happens to be google and if pytorch and tensorflow haven't been invented yet



Oh, I believe in theory a 50MHz CPU is capable of doing almost everything I need; it just lacks software optimized for it. I think a week to compile everything is too optimistic.


Old compilers/IDEs like Turbo Pascal or Think C were/are usably fast on single-digit MHz machines and emulators.

And even if the CPU is 50 MHz, modern DRAM and NVMe flash are very fast compared to memory and storage on 1990s (or older) machines.

Older versions of Microsoft Office (etc.) ran about the same on 50 MHz systems as Office 365 runs today.



I did valuable work on a 2 MHz Apple II with a 4 MHz Z80 add-on running CP/M that I used to write the documentation. The documentation part was just as fast forty years ago as it is now, but assembling the code was glacially slow. The 6502 macro assembler running on the Apple took forty minutes to assemble code that filled an 8K EPROM.


I remember when I got CodeWarrior on my PowerMac 6100/60 and suddenly I could answer questions online about weird MacApp problems by making a temporary project with their code and compiling the whole of MacApp in 5 minutes.

Previously that had taken about 2 hours (Quadra with MPW), and I did clean builds only when absolutely necessary.

Truly painful was trying to write large programs in Apple (UCSD) Pascal on a 1 MHz 6502.



At one time many of us dreamed of having a computer that could run as fast as 60MHz. The first computers I used ran around 1MHz. Compilation will take longer on a slower machine, but that really isn't a big deal. If the computer is reliable and the build scripts are correct, you can just let the process run over days or weeks. I've run many tasks in my life that took days or weeks. Cue "compiling": https://xkcd.com/303/

The real problem is debugging. Debugging the process on a slow system can be unpleasant due to long turnarounds. Historically, the solution is to work in stages and be able to restart at different points (so you don't have to redo the whole process each time). That would work here too. In this case, there's an additional option: you can debug the scripts on a much faster, though less trustworthy, system. Then, once everything works, you can run it on the slower system.
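The stage-and-restart approach is easy to sketch with per-stage stamp files (all names here are invented for illustration): a stage that has already completed leaves a stamp behind and is skipped on the next run, so a multi-day build interrupted at hour 30 resumes where it left off instead of starting over:

```python
import os

def run_stages(stages, stamp_dir):
    """Run (name, fn) stages in order, skipping any stage whose stamp
    file already exists. Returns the names of stages actually run."""
    os.makedirs(stamp_dir, exist_ok=True)
    ran = []
    for name, fn in stages:
        stamp = os.path.join(stamp_dir, name + ".done")
        if os.path.exists(stamp):
            continue                   # finished in an earlier run
        fn()                           # the real work (synthesis, PnR, ...)
        open(stamp, "w").close()       # stamp written only on success
        ran.append(name)
    return ran
```

Deleting a single stamp re-runs just that stage, which is also how you'd force a redo after fixing a script.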



Wow, I am starting to read all your reading material that you have put up.

It's really what I have always wanted to do, and it's more than that, because you are using FPGAs. I am from India and I want to help you in any way I can, because I have also wanted to go on this journey. It's just amazing; I wish you all the blessings.
