Full-system verification is more powerful than unit tests: you can prove an implementation of a distributed system is free of entire classes of bugs, modulo your specification. The reason is that there is simply no practical way to write tests covering enough of the possible executions of a distributed system. Think arbitrary interleavings of message arrivals, failures, restarts, and events affecting every single entity in the system.

We built a framework on top of Verus (called Anvil) that allowed us to write top-level specifications of the form "if the ZooKeeper operator says it can reconcile a ZK cluster according to a 'spec', it will eventually bring up a ZK instance running on spec.num_replicas pods, managed by a stateful set, etc.". We can then make sure the implementation of our ZK operator, which repeatedly executes a control loop to reconcile the current and desired states of the ZK instance it is managing, delivers on that specification. Using verification here lets us be sure that our ZK operator is correct even in the face of failures, asynchrony, series of edits to the desired state (think repeatedly asking the operator to switch between 3 and 10 replicas), and more. This stuff is honestly impossible to write tests for (and unsurprisingly, no open-source Kubernetes operator we've seen tests such cases extensively). That said, we still need traditional testing to be sure our assumptions about the boundary between verified and unverified code are correct (e.g., think of the Kubernetes API itself, assumptions about built-in controllers, etc.).

The precursor to our use of verification in this context was, in fact, a system we built to do automatic reliability testing: https://www.usenix.org/system/files/osdi22-sun.pdf -- even that work found a lot of safety and liveness bugs, but I feel it barely scratched the surface of what we could do with verification.
So while the tooling still has a long way to go, I personally hope full-system verification becomes mainstream someday. |
---
Hmm interesting. Do you think deterministic simulation is a much more accessible approach than verification? Also curious what tooling you think is still missing/not up to par?
---
And you can use MIRAI, which takes the contracts defined by this crate and checks them at compile time, or Prusti, which has contracts with a similar syntax, also checked at compile time.
---
Your Verus example is how I write my Clojure code, with pre- and post-conditions on most of the functions. And the JVM has a flag that makes it easy to strip those out for production builds.
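Rust has an analogous split built in: `assert!` always runs, while `debug_assert!` is compiled out of release builds, much like stripping Clojure's `:pre`/`:post` checks with a JVM flag. A minimal sketch (the `integer_sqrt` function here is just an illustrative example, not from the thread):

```rust
// Runtime contract checks in Rust: `debug_assert!` is removed in release
// builds (cargo build --release), while `assert!` always runs.
fn integer_sqrt(n: u64) -> u64 {
    let mut r = 0u64;
    while (r + 1) * (r + 1) <= n {
        r += 1;
    }
    // Postcondition: r is the floor of the square root of n.
    debug_assert!(r * r <= n && n < (r + 1) * (r + 1));
    r
}

fn main() {
    assert_eq!(integer_sqrt(0), 0);
    assert_eq!(integer_sqrt(15), 3);
    assert_eq!(integer_sqrt(16), 4);
    println!("contracts held");
}
```

In debug builds the postcondition is checked on every call; the release binary pays no cost for it.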
---
A very good resource for both verifying code and functional programming is Software Foundations (https://softwarefoundations.cis.upenn.edu). One note, though: Verus and the tool Software Foundations works with (Coq) take different approaches to proving things. Verus attempts to prove properties automatically using something called an SMT solver, an automated system for solving constraints. Coq, on the other hand, requires you to prove much more manually, offering a more limited set of automations. Both have their advantages and disadvantages, namely that automation is great when it works and annoying when it doesn't.

(Another side note: zero-knowledge proofs (ZKPs) are kind of something different. A great many people who work in formal verification/proof don't touch ZKPs at all (e.g., me). They are better thought of as a cryptographic primitive.)
---
> The cost of deployment into the cloud is $0 at a scale that is unfathomable to any rack with the overhead being a near zero as well.

Obviously you haven’t seen my org’s cdk repos
---
> "proving" things about code for working programmers

I'd argue that this is antinomic: proving things about code isn't something working programmers do yet. I'd say that Hoare logic is a good starting point, as it is sometimes taught in introductory CS classes. Coq has a steep learning curve, especially if you're not familiar with OCaml or similar languages. Maybe Why3 is more beginner-friendly: https://www.why3.org

Proving vs. verifying: they could mean the same thing. Proving seems to me something more interactive in nature, while verifying could be automated (model checking, SMT solving of annotated programs).
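The central object of Hoare logic, the triple {P} C {Q}, can be made concrete without any verifier by turning the precondition, loop invariant, and postcondition into runtime checks. A sketch under made-up names (`triangle` is an illustrative example, not from the thread):

```rust
// Hoare-style reasoning made executable:
// {true}  sum := 1 + 2 + ... + n  {sum == n*(n+1)/2}
// The loop invariant is what a verifier would ask you to supply.
fn triangle(n: u32) -> u32 {
    let (mut i, mut sum) = (0u32, 0u32);
    while i < n {
        // Loop invariant, checked at the top of every iteration.
        debug_assert!(sum == i * (i + 1) / 2);
        i += 1;
        sum += i;
    }
    debug_assert!(sum == n * (n + 1) / 2); // postcondition Q
    sum
}

fn main() {
    assert_eq!(triangle(0), 0);
    assert_eq!(triangle(10), 55);
    println!("invariant and postcondition held");
}
```

A tool like Why3 or Dafny takes exactly these annotations and tries to prove them for all inputs instead of checking them per run.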
---
A trivial proof is still a proof. There's nothing trivial about `const a: int = foo();` though. Compilers disprove the claim by contradiction all the time.
---
ZK proofs could be used to implement secure cloud computing where a total compromise of a vendor’s machines / network wouldn’t in any way compromise the customers running workloads on that network.
---
You implied it would be a helpful tool for solving that particular problem, which is to dramatically misrepresent their utility to the parent asking for concrete examples of their relevance.
---
> IMO formal software verification is not really at the point where it is usable for normal programmers like us who don't have a PhD in CS / maths.

I know. Four decades ago I headed a project to build a system amazingly similar to this one, intended for real-time automobile engine control code.[1][2]

This new system for Rust looks practical. It seems to be intended for people who need bug-free code. Most verification systems come from people in love with formalism, and those involve too much human labor. Hints:

- The assertions and invariants need to be part of the language, not something in comments and not a different syntax. They should be syntax- and type-checked during compiles, even if the compiler doesn't do much with them.
- It's useful to work off the compiler's intermediate representation rather than the raw source code. Then you're sure the compiler and verifier interpret the syntax the same way.
- SAT solvers aren't powerful enough to do the whole job, and systems like Coq are too manual. You need two provers. A SAT solver is enough to knock off 90% of the proofs automatically. Then the programmer focuses the problem by adding assertions, until you get to the point where you have some assertion a and just need to prove that a implies b, as an abstraction mostly separate from the code. Then you go to the more elaborate prover. We used the Boyer-Moore prover for that. After proving a implies b, that became a theorem/rule the fast prover could use when it matched, so if the same situation came up again in code, the rule would be reused automatically.

I notice that the examples for this verified Rust system don't seem to include a termination check for loops. You prove that loops terminate by demonstrating that some nonnegative integer expression decreases on each iteration and never goes negative. If you can't prove that easily, the code has no place in mission-critical software.

Microsoft's F* is probably the biggest success in this area.[3]

[1] https://archive.org/details/manualzilla-id-5928072/page/n3/m...

[2] https://github.com/John-Nagle/pasv

[3] https://www.microsoft.com/en-us/research/video/programming-w...
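The termination argument described in that comment, a nonnegative integer measure that strictly decreases on every iteration, can be sketched as a runtime check; a verifier (Verus's `decreases` clauses, or F*) would prove the same fact statically. The `gcd` function is just an illustrative example:

```rust
// Euclid's algorithm with an explicit loop variant. Since b is unsigned
// it can never go negative, and a % b < b whenever b != 0, so the
// variant strictly decreases each iteration: the loop must terminate.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let variant_before = b; // the decreasing measure
        let t = a % b;
        a = b;
        b = t;
        debug_assert!(b < variant_before); // runtime stand-in for the proof
    }
    a
}

fn main() {
    assert_eq!(gcd(12, 18), 6);
    assert_eq!(gcd(7, 0), 7);
    println!("variant decreased on every iteration");
}
```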
---
Yeah I agree formal hardware verification is an order of magnitude easier to use. My guess is it's because the problem is so much simpler. No variables, no loops, no recursion, etc.
---
This is why I think interactive proof assistants (as opposed to "automatic" ones like Dafny) are a better starting point for learning. You're still gonna need to learn some higher-level concepts, but you won't have the frustrating experience of the automation just failing and leaving you shrugging. Software Foundations is great: https://softwarefoundations.cis.upenn.edu

If you stick with it long enough, you'll even build up to Hoare logic, which is the underpinning tools like Dafny use to generate the equations they throw to the solver.
---
I've found the DX of Dafny to be very approachable. The VSCode extension is pretty great, and you get feedback as you type. Also, its ability to give you counter examples in the IDE is really nice.
---
> What is the difference between "verifying" the correctness of code, as they say here, vs "proving" the correctness of code, as I sometimes see said elsewhere?

There is not much difference, except that verification usually includes identifying a formal logical proposition about the behaviour of the code. In other words, formally verified code is code that has been proven to meet at least one formal proposition about its behaviour. For example, a proposition about a function f might be: if variable x is greater than 0, then `f(x) = 1`. There is no such thing as proving something 'correct' in the abstract: someone needs to define what exactly correct means, and then someone proves the code meets that definition. So proving is only a subset of the overall formal verification task.

> Also, is there a good learning resource on "proving" things about code for working programmers without a strong CS / math background?

Most resources are specific to a particular technology and type of verification. There are some courses online that provide a high-level overview, e.g. https://anton-trunov.github.io/csclub-coq-course-spring-2021.... If you want to get into specifics, you might need to pick an approach. You could learn a dependently typed language; there are some good resources out there on proving things in Agda or Idris. Or play around with one of the formal verification systems that can be bolted onto C or Rust.

> Edit: I'm also very curious why "zero knowledge" proofs are so significant, and why this is so relevant. Eg I heard people talking about this and don't really understand why it's so cool: x.com/ZorpZK

ZK is an exciting area of cryptology because breakthroughs there power applications that people wouldn't have thought possible before. Applications to cryptocurrencies in particular can solve some of their scaling and privacy problems. For example, one of the biggest problems with cryptocurrencies is that every single transaction ever needs to be recorded in a ledger distributed to every node participating in the network. That simply won't scale to microtransactions.
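The example proposition above, "if x is greater than 0, then f(x) = 1", can be written as an executable property; the point of a proof is that it covers all inputs, whereas a test can only sample finitely many. A sketch with a hypothetical `f` (here, the signum function):

```rust
// A hypothetical f satisfying the proposition: for all x > 0, f(x) == 1.
fn f(x: i64) -> i64 {
    if x > 0 { 1 } else if x < 0 { -1 } else { 0 }
}

fn main() {
    // A test can only check the proposition on a finite sample...
    for x in 1..10_000i64 {
        assert_eq!(f(x), 1);
    }
    // ...while a formal proof would establish it for every positive x.
    println!("property held on the sampled range");
}
```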
Let's say that 1,000 people each start with 1 coin, and do 100,000 small transactions averaging 0.001 coins amongst themselves (maybe they bought a coffee, or paid for information on a per-view basis, or whatever). Storing those 100,000 transactions forever imposes an ongoing cost on every one of thousands of nodes on the network, long after the transactions have happened.

Now that could be solved with centralisation: the parties send their transactions to a trusted company, which maintains balances for each of them without sending transactions to the network, but lets them withdraw their balance to the network if they ever want to. But centralisation is a risk: what if the company betrays their trust?

Zero-knowledge cryptography allows the parties to roll up the signed transactions into a cryptographic proof saying: given these balances at the start of the 100,000 transactions, the person creating the roll-up has access to signed transactions proving that the balances of each of the 1,000 parties at the end are this. Notably, the proof can be much smaller than the 100,000 transactions themselves. So that enables applications where people work off in 'side chains', but can merge a side chain back into the main chain by submitting a proof about the effects (but not all the details) of the side chain.
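The roll-up idea can be caricatured in a few lines: apply many transactions off-chain and publish only a fixed-size commitment to the resulting balances. This sketch uses a plain hash and omits the zero-knowledge proof entirely; it only illustrates why the on-chain footprint stays constant no matter how many transactions happened:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// NOT a real ZK roll-up: a toy showing that arbitrarily many off-chain
// transactions can be summarized by a constant-size state commitment.
// A real roll-up would also post a succinct proof that the commitment
// was computed from valid signed transactions.
fn apply_txs(balances: &mut [i64], txs: &[(usize, usize, i64)]) {
    for &(from, to, amount) in txs {
        balances[from] -= amount;
        balances[to] += amount;
    }
}

fn commitment(balances: &[i64]) -> u64 {
    let mut h = DefaultHasher::new();
    balances.hash(&mut h);
    h.finish() // 8 bytes on-chain, regardless of transaction count
}

fn main() {
    let mut balances = vec![10i64, 10, 10];
    let txs = [(0, 1, 3), (1, 2, 5)];
    apply_txs(&mut balances, &txs);
    assert_eq!(balances, vec![7, 8, 15]);
    // Total supply is conserved by the off-chain transactions.
    assert_eq!(balances.iter().sum::<i64>(), 30);
    println!("on-chain commitment: {:016x}", commitment(&balances));
}
```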
---
How does Verus compare to SPARK? Are they of the same general class of verifier? How is Verus different, other than being a verifier for Rust rather than for Ada?
---
Interesting! Looks most similar to Creusot. The syntax is definitely nicer, but wrapping your entire code in a macro surely is going to upset rust-analyzer?
---
SMT solvers use a decision procedure known as CDCL(T). This uses a SAT solver at the core, which operates only on the core propositional structure of the input formula and dispatches higher-level constructs (e.g. functions, arrays, arithmetic) to specialized theory-specific solvers. This is an extension of the CDCL (conflict-driven clause learning) approach to SAT solving, a heuristic approach that uses information discovered about the structure of the problem to reduce the search space as it progresses. At a high level: the SAT solver proposes an assignment for the propositional skeleton, the theory solvers check whether that assignment is consistent in their theories, and any conflict they find comes back as a learned clause that prunes the SAT solver's subsequent search.
The theory specific solvers use a diverse set of decision procedures specialized to their domain. The “Decision Procedures” book is an excellent overview: http://www.decision-procedures.org/
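The SAT core that CDCL extends can be sketched as plain DPLL, i.e. backtracking search over literals without the clause-learning machinery. This toy solver is only meant to show the propositional skeleton that the theory solvers plug into:

```rust
// A minimal DPLL SAT solver over clauses in the usual integer encoding:
// literal 3 means "variable 3 is true", -3 means "variable 3 is false".
// CDCL adds conflict analysis and clause learning on top of this skeleton;
// CDCL(T) additionally consults theory solvers on each assignment.
fn dpll(clauses: &[Vec<i32>], assignment: &mut Vec<i32>) -> bool {
    // Simplify: drop satisfied clauses, remove falsified literals.
    let mut simplified: Vec<Vec<i32>> = Vec::new();
    for clause in clauses {
        if clause.iter().any(|l| assignment.contains(l)) {
            continue; // clause already satisfied
        }
        let rest: Vec<i32> = clause
            .iter()
            .filter(|&&l| !assignment.contains(&-l))
            .copied()
            .collect();
        if rest.is_empty() {
            return false; // conflict: every literal falsified
        }
        simplified.push(rest);
    }
    // Branch on the first unassigned literal, trying both polarities.
    match simplified.first().map(|c| c[0]) {
        None => true, // all clauses satisfied
        Some(lit) => {
            for choice in [lit, -lit] {
                assignment.push(choice);
                if dpll(&simplified, assignment) {
                    return true;
                }
                assignment.pop(); // backtrack
            }
            false
        }
    }
}

fn main() {
    let mut model = Vec::new();
    assert!(dpll(&[vec![1, 2], vec![-1]], &mut model)); // satisfiable
    let mut model2 = Vec::new();
    assert!(!dpll(&[vec![1], vec![-1]], &mut model2)); // unsatisfiable
    println!("model: {model:?}");
}
```

Real solvers add unit propagation, watched literals, and learned clauses, but the recursive search-and-backtrack shape is the same.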
---
Basically, we can prove liveness properties of the form "eventually, the controller will reconcile the cluster to the requested desired state". As you can imagine, there is a lot of subtlety and nuance to even specifying correctness here (think rapid changes to the desired state requirement, asynchrony, failures and what not).
Code: https://github.com/vmware-research/verifiable-controllers/, with a corresponding paper due to appear at OSDI 2024.
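A toy model of that liveness property, with a hypothetical controller whose reconcile loop converges to the desired replica count even after a burst of edits to the desired state (the real Anvil proofs also model asynchrony and failures, which this sketch ignores entirely):

```rust
// Toy reconcile loop: each step moves the actual state one unit toward
// the desired state. The liveness claim is "once edits to the desired
// state stop, repeated reconcile steps eventually reach it".
struct Controller {
    actual_replicas: u32,
}

impl Controller {
    fn reconcile_step(&mut self, desired: u32) {
        if self.actual_replicas < desired {
            self.actual_replicas += 1;
        } else if self.actual_replicas > desired {
            self.actual_replicas -= 1;
        }
    }
}

fn main() {
    let mut c = Controller { actual_replicas: 0 };
    // Rapid edits to the desired state while reconciliation is underway.
    for desired in [3, 10, 3] {
        c.reconcile_step(desired);
    }
    // After the edits stop, the loop keeps running until convergence.
    let final_desired = 3;
    while c.actual_replicas != final_desired {
        c.reconcile_step(final_desired);
    }
    assert_eq!(c.actual_replicas, final_desired);
    println!("converged to {} replicas", c.actual_replicas);
}
```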