(comments)

Original link: https://news.ycombinator.com/item?id=43484520

This Hacker News thread discusses the importance of reproducible builds and the challenges involved, highlighted by Debian's recent achievement of making its "bookworm" live images reproducible. A reproducible build guarantees that the same source code always produces bit-for-bit identical binary output, regardless of the build environment. Users asked how reproducibility is achieved, particularly with respect to file metadata such as timestamps. The discussion clarifies that timestamps are typically normalized to a fixed epoch, and that other factors, such as unstable hash orderings and parallel execution, must also be addressed. Some argue that reproducible builds enable "tivoization" (TiVo-style vendor lock-down) by letting vendors prove the provenance of signed binaries, while others emphasize the benefits: more effective caching throughout the build process, assurance that no malware was injected during the build, and the ability for auditors to independently verify that binaries match the published source code, strengthening supply-chain security and trust. The debate centers on whether the technology primarily serves user freedom or vendor control.


Original text
Debian bookworm live images now reproducible (lwn.net)
59 points by bertman 37 minutes ago | hide | past | favorite | 13 comments

What is the significance of a reproducible build, and how is it different than a normal distribution?


I don't get how someone achieves reproducibility of builds: what about file metadata like creation/modification timestamps? Do they forge them? Or is that data treated as not important (i.e. should two files with different metadata but identical contents have the same checksum when hashed)?


Timestamps are the easiest part: you just set everything according to the chosen epoch.

The hard part involves things like unstable hash orderings, non-sorted filesystem listings, parallel execution, address-space randomization, ...
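The two easy cases mentioned above (timestamp clamping and sorted file listings) can be sketched in a few lines. This is an illustrative example, not any particular distro's tooling; the fixed epoch value is arbitrary, standing in for the `SOURCE_DATE_EPOCH` convention:

```python
import hashlib
import io
import tarfile

# Arbitrary fixed epoch standing in for SOURCE_DATE_EPOCH.
EPOCH = 1700000000

def deterministic_tar(files: dict) -> bytes:
    """Build a tar archive in memory with normalized metadata:
    fixed mtime, fixed ownership, and entries added in sorted order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):          # stable entry ordering
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = EPOCH              # clamp timestamp to the epoch
            info.uid = info.gid = 0         # drop host-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Same contents, different insertion order: identical digests.
h1 = hashlib.sha256(deterministic_tar({"b.txt": b"world", "a.txt": b"hello"})).hexdigest()
h2 = hashlib.sha256(deterministic_tar({"a.txt": b"hello", "b.txt": b"world"})).hexdigest()
```

The same normalization applies recursively for nested archives: as long as every layer clamps its metadata, the outer checksum stays stable.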



Generally, yes: https://reproducible-builds.org/docs/timestamps/

Since the build is reproducible, it should not matter when it was built. If you want to trace a build back to its source, there are much better ways than a timestamp.



Those aren't needed to generate a hash of a file. And that metadata isn't part of the file itself (or at least doesn't need to be); it's part of the filesystem or OS.


That's an acceptable answer for the simple case when you distribute just a file, but what if your distribution is something more complex, like an archive with some sub-archives? Metadata in the internal files will affect the checksum of the resulting archive.


Yes.


I never really understood the hype around reproducible builds. It seems to mostly be a vehicle to enable tivoization[0] while keeping users sufficiently calm. With reproducible builds, a vendor can prove to users that they did build $binary from $someopensourceproject, and then digitally sign the result so that it - and only it - would load and execute on the vendor-provided and/or vendor-controlled platform. But that still kills effective software freedom as long as I, the user, cannot do the same thing with my own build (whether it is unmodified or not) of $someopensourceproject.

Therefore, I side with Tavis Ormandy on this debate: https://web.archive.org/web/20210616083816/https://blog.cmpx...

[0]: https://en.wikipedia.org/wiki/Tivoization



One of the big advantages from my perspective is you can cache a lot more effectively throughout the build process when things are deterministic.


To achieve that it is enough to hash inputs and cache the resulting outputs. Repeating a build from scratch with an empty cache would not necessarily have to yield the same hashes all the way down to the last artifact, but that's actually a simplification of the whole process, and not a bad thing per se.
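The input-addressed caching this comment describes can be sketched as follows. Everything here is hypothetical (the cache dict, the stand-in "compile" step, the `cc-13.2` version string); the point is only that the cache key hashes the inputs, so no bit-for-bit reproducible output is required for cache hits:

```python
import hashlib

cache: dict[str, bytes] = {}

def cache_key(source: bytes, compiler_version: str, flags: tuple) -> str:
    """Key the cache on all build inputs, not on the output."""
    h = hashlib.sha256()
    h.update(source)
    h.update(compiler_version.encode())
    for flag in flags:
        h.update(flag.encode())
    return h.hexdigest()

def build(source: bytes, compiler_version: str = "cc-13.2",
          flags: tuple = ("-O2",)) -> bytes:
    key = cache_key(source, compiler_version, flags)
    if key in cache:
        return cache[key]             # cache hit: skip the build entirely
    artifact = b"compiled:" + source  # stand-in for a real compile step
    cache[key] = artifact
    return artifact

first = build(b"int main(void) { return 0; }")
second = build(b"int main(void) { return 0; }")  # served from cache
```

With fully reproducible builds the cached output could additionally be verified or shared between machines by its own hash, which is where the extra caching leverage comes from.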


Reproducible builds are also important for:
- caching artefacts
- ensuring there's no malware somewhere that's been added in the build process


> ensuring there's no malware somewhere that's been added in the build process

i.e. supply-chain safety

It doesn't entirely resolve Thompson's "Trusting Trust" problem, but it goes a long way.



Auditors can take a copy of the source, reproducibly build it themselves, and thus prove that the binaries someone would like to run match the provided source code.
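The audit step described above reduces to a digest comparison. A minimal sketch, with an assumed pure function standing in for a full reproducible build pipeline:

```python
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def audit(published_source: bytes, vendor_binary: bytes, rebuild) -> bool:
    """Rebuild the published source independently and compare digests.
    `rebuild` stands in for the full (reproducible) build pipeline."""
    local_binary = rebuild(published_source)
    return sha256(local_binary) == sha256(vendor_binary)

# Stand-in deterministic "build": any pure function of the source works here.
def fake_build(src: bytes) -> bytes:
    return b"ELF:" + sha256(src).encode()

source = b"int main(void) { return 0; }"
honest = fake_build(source)     # what an honest vendor would ship
tampered = honest + b"\x90"     # a modified binary fails the audit
```

Only with a reproducible build does a digest mismatch prove tampering rather than mere environmental noise, which is exactly the property the thread is debating.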







