(comments)

Original link: https://news.ycombinator.com/item?id=43484520

This Hacker News thread discusses the importance of reproducible builds and the challenges involved, highlighted by Debian's recent achievement of making its "bookworm" live images reproducible. A reproducible build guarantees that the same source code always produces bit-for-bit identical binary output, regardless of the build environment. Users asked how reproducibility is achieved, particularly with respect to file metadata such as timestamps. The discussion clarifies that timestamps are typically normalized to a fixed epoch, and that other factors, such as unstable hash orderings and parallel execution, must also be addressed. Some argue that reproducible builds enable "tivoization" (TiVo-style vendor lock-down) by letting vendors prove the provenance of signed binaries, while others emphasize the benefits: more effective caching throughout the build process, assurance that no malware was injected during the build, and the ability for auditors to independently verify that binaries match the published source code, strengthening supply-chain security and trust. The debate centers on whether the technology primarily serves user freedom or vendor control.


Original text
Debian bookworm live images now reproducible (lwn.net)
59 points by bertman 37 minutes ago | hide | past | favorite | 13 comments

What is the significance of a reproducible build, and how is it different than a normal distribution?


I don't get how someone achieves reproducibility of builds: what about file metadata like creation/modification timestamps? Do they forge them? Or is that data treated as not important (i.e. should two files with different metadata but identical contents have the same checksum when hashed)?


Timestamps are the easiest part: you just set everything according to the chosen epoch.

The hard part involves things like unstable hash orderings, non-sorted filesystem listings, parallel execution, address-space randomization, ...
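The two easy cases mentioned above (timestamp clamping and sorted file listings) can be sketched in a few lines. This is an illustrative example, not any particular distro's tooling; the fixed epoch value is arbitrary, standing in for the `SOURCE_DATE_EPOCH` convention:

```python
import hashlib
import io
import tarfile

# Arbitrary fixed epoch standing in for SOURCE_DATE_EPOCH.
EPOCH = 1700000000

def deterministic_tar(files: dict) -> bytes:
    """Build a tar archive in memory with normalized metadata:
    fixed mtime, fixed ownership, and entries added in sorted order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):          # stable entry ordering
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = EPOCH              # clamp timestamp to the epoch
            info.uid = info.gid = 0         # drop host-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Same contents, different insertion order: identical digests.
h1 = hashlib.sha256(deterministic_tar({"b.txt": b"world", "a.txt": b"hello"})).hexdigest()
h2 = hashlib.sha256(deterministic_tar({"a.txt": b"hello", "b.txt": b"world"})).hexdigest()
```

The same normalization applies recursively for nested archives: as long as every layer clamps its metadata, the outer checksum stays stable.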



Generally, yes: https://reproducible-builds.org/docs/timestamps/

Since the build is reproducible, it should not matter when it was built. If you want to trace a build back to its source, there are much better ways than a timestamp.



Those aren't needed to generate a hash of a file. And that metadata isn't part of the file itself (or at least doesn't need to be); it's part of the filesystem or OS.


That's an acceptable answer for the simple case when you distribute just a file, but what if your distribution is something more complex, like an archive with some sub-archives? Metadata in the internal files will affect the checksum of the resulting archive.


Yes.


I never really understood the hype around reproducible builds. It seems to mostly be a vehicle to enable tivoization[0] while keeping users sufficiently calm. With reproducible builds, a vendor can prove to users that they did build $binary from $someopensourceproject, and then digitally sign the result so that it - and only it - would load and execute on the vendor-provided and/or vendor-controlled platform. But that still kills effective software freedom as long as I, the user, cannot do the same thing with my own build (whether it is unmodified or not) of $someopensourceproject.

Therefore, I side with Tavis Ormandy on this debate: https://web.archive.org/web/20210616083816/https://blog.cmpx...

[0]: https://en.wikipedia.org/wiki/Tivoization



One of the big advantages from my perspective is you can cache a lot more effectively throughout the build process when things are deterministic.


To achieve that it is enough to hash inputs and cache the resulting outputs. Repeating a build from scratch with an empty cache would not necessarily have to yield the same hashes all the way down to the last artifact, but that's actually a simplification of the whole process, and not a bad thing per se.
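The input-addressed caching this comment describes can be sketched as follows. Everything here is hypothetical (the cache dict, the stand-in "compile" step, the `cc-13.2` version string); the point is only that the cache key hashes the inputs, so no bit-for-bit reproducible output is required for cache hits:

```python
import hashlib

cache: dict[str, bytes] = {}

def cache_key(source: bytes, compiler_version: str, flags: tuple) -> str:
    """Key the cache on all build inputs, not on the output."""
    h = hashlib.sha256()
    h.update(source)
    h.update(compiler_version.encode())
    for flag in flags:
        h.update(flag.encode())
    return h.hexdigest()

def build(source: bytes, compiler_version: str = "cc-13.2",
          flags: tuple = ("-O2",)) -> bytes:
    key = cache_key(source, compiler_version, flags)
    if key in cache:
        return cache[key]             # cache hit: skip the build entirely
    artifact = b"compiled:" + source  # stand-in for a real compile step
    cache[key] = artifact
    return artifact

first = build(b"int main(void) { return 0; }")
second = build(b"int main(void) { return 0; }")  # served from cache
```

With fully reproducible builds the cached output could additionally be verified or shared between machines by its own hash, which is where the extra caching leverage comes from.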


Reproducible builds are also important for:
- caching artefacts
- ensuring there's no malware somewhere that's been added in the build process


> ensuring there's no malware somewhere that's been added in the build process

i.e. supply-chain safety

It doesn't entirely resolve Thompson's "Trusting Trust" problem, but it goes a long way.



Auditors can take a copy of the source, reproducibly build it themselves, and thus prove that the binaries someone would like to run match the provided source code.
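The audit step described above reduces to a digest comparison. A minimal sketch, with an assumed pure function standing in for a full reproducible build pipeline:

```python
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def audit(published_source: bytes, vendor_binary: bytes, rebuild) -> bool:
    """Rebuild the published source independently and compare digests.
    `rebuild` stands in for the full (reproducible) build pipeline."""
    local_binary = rebuild(published_source)
    return sha256(local_binary) == sha256(vendor_binary)

# Stand-in deterministic "build": any pure function of the source works here.
def fake_build(src: bytes) -> bytes:
    return b"ELF:" + sha256(src).encode()

source = b"int main(void) { return 0; }"
honest = fake_build(source)     # what an honest vendor would ship
tampered = honest + b"\x90"     # a modified binary fails the audit
```

Only with a reproducible build does a digest mismatch prove tampering rather than mere environmental noise, which is exactly the property the thread is debating.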







