Show HN: Cuq – Formal Verification of Rust GPU Kernels

Original link: https://github.com/neelsomani/cuq

This project builds a formal verification framework for Rust code that targets GPUs, closing a key gap in ensuring safety and correctness. Rust is increasingly used for GPU programming (via Rust-CUDA and rust-gpu), yet its GPU subset has no formal semantics and no verified connection to the underlying GPU execution model (NVIDIA's PTX). The work translates a subset of Rust's Mid-level Intermediate Representation (MIR) into the Coq proof assistant and links it to an existing formalization of the PTX memory model. This makes it possible to prove compilation *correctness*: that Rust's atomic and synchronization operations translate correctly to PTX, preserving memory safety. A prototype toolchain automatically translates Rust-CUDA kernels into Coq, enabling verification of properties such as barrier correctness and data-race freedom. The initial work focuses on a simplified MIR subset, but the framework is designed to eventually incorporate Rust's ownership and borrowing rules for full end-to-end verification. This lays a foundation for verified GPU compilation in the style of CompCert and promises safer, more reliable massively parallel Rust programs.

## Cuq: Formal Verification of Rust GPU Kernels - Summary

A new research prototype named Cuq (now being renamed; Coq itself is now Rocq) aims to formally verify the safety of Rust GPU kernels. Developed by nsomani, Cuq connects Rust's MIR (mid-level IR) to the Coq proof assistant, translating CUDA kernels into Coq semantics so they can be checked against the PTX memory model. It currently supports basic kernels, including saxpy and an atomic-flag operation. The goal is not a fully verified compiler but a step toward ensuring correct use of atomics, barriers, and memory scopes in Rust GPU code.

The launch, however, drew extensive discussion of the project's original name, owing to an unintended pun and the potential for unfortunate readings, which prompted the renaming effort. Naming controversy aside, commenters acknowledged the project's potential to improve the reliability of massively parallel Rust code on GPUs, a setting prone to race conditions and undefined behavior. Some also discussed targeting cuTile instead of PTX, which might be easier to parse but harder to reason about formally.

## Original Article

Rust's rise as a systems language has extended into GPU programming through projects like Rust-CUDA and rust-gpu, which compile Rust kernels to NVIDIA's PTX or SPIR-V backends. Yet despite Rust's strong safety guarantees, there is currently no formal semantics for Rust's GPU subset, nor any verified mapping from Rust's compiler IR to PTX's formally defined execution model.

This project introduces the first framework for formally verifying the semantics of Rust GPU kernels by translating Rust's Mid-level Intermediate Representation (MIR) into Coq and connecting it to the existing Coq formalization of the PTX memory model (Lustig et al., ASPLOS 2019). Rather than modeling Rust's ownership and borrowing rules directly, this work focuses on defining a mechanized operational semantics for a realistic subset of MIR and establishing memory-model soundness: proving that MIR atomic and synchronization operations compile soundly to PTX instructions under the PTX memory model.

Cuq = CUDA + Coq.
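For orientation, the kernels in scope look like the following. This is a minimal sketch in plain, host-runnable Rust, assuming an explicit `tid` parameter in place of the kernel attributes and device intrinsics that the actual Rust-CUDA examples use.

```rust
/// Sketch of a SAXPY-style kernel body: y[i] = a * x[i] + y[i].
/// `tid` stands in for the GPU thread index; a real Rust-CUDA kernel
/// would obtain it from a device intrinsic and carry a kernel attribute.
pub fn saxpy(tid: usize, a: f32, x: &[f32], y: &mut [f32]) {
    if tid < x.len() && tid < y.len() {
        y[tid] = a * x[tid] + y[tid];
    }
}

fn main() {
    let a = 2.0f32;
    let x = vec![1.0, 2.0, 3.0];
    let mut y = vec![4.0, 5.0, 6.0];
    // Emulate one "thread" per element on the host.
    for tid in 0..x.len() {
        saxpy(tid, a, &x, &mut y);
    }
    assert_eq!(y, vec![6.0, 9.0, 12.0]);
}
```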

  1. No formal semantics for Rust GPU code: Although Rust compilers can emit GPU code via NVVM or SPIR-V, the semantics of such kernels are defined only informally through the compiler's behavior. There is no mechanized model of MIR execution for GPU targets.

  2. Disconnect between high-level Rust and verified GPU models: NVIDIA's PTX memory model has a complete Coq specification, but that model has never been linked to a high-level language. Existing proofs connect only C++ atomics to PTX atomics.

  3. MIR as a verification sweet spot: MIR is a well-typed SSA IR that preserves Rust's structured control flow and side-effect information while stripping away syntax. It provides a precise, implementation-independent level at which to define semantics and translate to Coq.

  1. Define a mechanized semantics for MIR: Implement a Coq formalization of a simplified MIR subset sufficient to express GPU kernels: variable assignment, arithmetic, control flow, memory loads/stores, and synchronization intrinsics (an informal sketch of the resulting event traces follows this list).

  2. Translate MIR to Coq: Develop a translation tool that consumes rustc's -Z dump-mir output and produces corresponding Gallina definitions. The translation captures MIR basic blocks, terminators, and memory actions as Coq terms.

  3. Connect to PTX semantics: Use the existing Coq formalization of PTX to define a memory-model correspondence between MIR and PTX traces. The initial goal is to prove soundness in the same sense as Lustig et al. (ASPLOS 2019):

    If a MIR kernel is data-race-free under the MIR memory model, its compiled PTX program admits only executions consistent with the PTX memory model.

  4. Property verification: Leverage this semantics to verify kernel-level properties such as:

    • Absence of divergent barrier synchronization;
    • Preservation of sequential equivalence (e.g., for reductions or scans);
    • Conformance to the PTX consistency model under shared-memory interactions.
  5. Prototype toolchain: Deliver a prototype that automatically translates Rust-CUDA kernels into Coq terms, evaluates their semantics within Coq, and interfaces with PTX proofs.
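The semantics in items 1-3 above can be thought of as assigning each kernel a trace of memory and synchronization events that is then related to a PTX-level trace. The following is an informal Rust rendering of such an event alphabet, for intuition only: the actual definitions are Coq terms (the mapping section below refers to EvLoad, EvStore, and EvBarrier), so every name here is illustrative.

```rust
// Illustrative only: the real event alphabet lives in the Coq development.
#[derive(Debug)]
enum Scope { Cta, Sys }

#[derive(Debug)]
enum Order { Relaxed, Acquire, Release }

#[allow(dead_code)] // Load is part of the alphabet even though this trace has none
#[derive(Debug)]
enum Event {
    Load { addr: u64, order: Order, scope: Option<Scope> },
    Store { addr: u64, val: u64, order: Order, scope: Option<Scope> },
    Barrier(Scope),
}

fn main() {
    // A trace a flag-publishing kernel might emit: a relaxed data store,
    // a release store of the flag, then a block-wide barrier.
    let trace = vec![
        Event::Store { addr: 0x100, val: 42, order: Order::Relaxed, scope: None },
        Event::Store { addr: 0x104, val: 1, order: Order::Release, scope: Some(Scope::Sys) },
        Event::Barrier(Scope::Cta),
    ];
    println!("{trace:?}");
}
```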

  • A Coq formalization of Rust MIR semantics for GPU kernels using Rust nightly-2025-03-02.
  • A MIR→PTX memory-model correspondence theorem, establishing soundness of atomic and synchronization operations for a well-defined kernel subset.
  • A prototype translator generating Coq verification artifacts from Rust code.
  • Case studies on standard CUDA benchmarks (e.g., SAXPY, reductions) verifying barrier correctness and dataflow soundness.
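Barrier correctness in these case studies rules out, in particular, barriers reached under divergent control flow. Here is a host-side sketch of the property, with `barrier()` as a hypothetical placeholder for the block-wide barrier intrinsic (PTX bar.sync); it is an illustration, not the project's API.

```rust
// Hypothetical stand-in for a block-wide barrier: on a GPU, every
// thread in the block must reach the same barrier.
fn barrier() {}

// Correct: the barrier sits outside the divergent branch, so every
// thread reaches it regardless of `tid`.
fn reduce_step_ok(tid: usize, n: usize, buf: &mut [f32]) {
    if tid < n / 2 {
        buf[tid] += buf[tid + n / 2];
    }
    barrier();
}

// Divergent (incorrect): threads with tid >= n/2 skip the barrier,
// so on a real GPU the threads that do reach it would wait forever.
#[allow(dead_code)]
fn reduce_step_divergent(tid: usize, n: usize, buf: &mut [f32]) {
    if tid < n / 2 {
        buf[tid] += buf[tid + n / 2];
        barrier();
    }
}

fn main() {
    // One sequential "thread" per index, just to exercise the code on the host.
    let mut buf = vec![1.0f32; 8];
    for tid in 0..8 {
        reduce_step_ok(tid, 8, &mut buf);
    }
    assert_eq!(&buf[..4], &[2.0, 2.0, 2.0, 2.0]);
}
```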

While this first phase omits Rust's ownership and lifetime reasoning, the framework is designed to incorporate it later. Future extensions can integrate ownership types or affine resource logics into the MIR semantics, enabling end-to-end proofs of data-race freedom and alias safety.
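As a reminder of what ownership buys, here is a small host-side Rust illustration of the alias-safety guarantee that an ownership-aware MIR semantics could carry down to kernels; it is an analogy on CPU threads, not part of the Cuq toolchain.

```rust
use std::thread;

// Rust's ownership rules already prevent two host threads from holding
// aliasing mutable references; an ownership-aware kernel semantics would
// let the same argument be made formally about GPU threads.
fn main() {
    let mut data = vec![0u32; 4];

    // The borrow checker rejects two scoped threads that both capture
    // `data` mutably:
    //     s.spawn(|| data[0] += 1);
    //     s.spawn(|| data[3] += 1); // error: second mutable borrow of `data`
    //
    // Splitting the buffer instead gives each thread exclusive access
    // to a disjoint half.
    let (left, right) = data.split_at_mut(2);
    thread::scope(|s| {
        s.spawn(|| left[0] += 1);
        s.spawn(|| right[0] += 1);
    });
    assert_eq!(data, vec![1, 0, 1, 0]);
}
```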

This project establishes the missing formal bridge between Rust's compiler infrastructure and the only existing mechanized model of GPU execution. By defining verified semantics for MIR and connecting it to PTX, it provides the foundation for future CompCert-style verified compilation of GPU code and opens the door to ownership-aware proofs of safety and correctness for massively parallel Rust programs.

Rebuild the MIR dumps, translate them into Coq, and check the traces/bridges with the repository's build target, which performs three steps:

  1. rustc -Z dump-mir=all for examples/saxpy.rs and examples/atomic_flag.rs (writes into mir_dump/).
  2. tools/mir2coq.py parses the PreCodegen.after dumps and regenerates coq/examples/{saxpy,atomic_flag}_gen.v.
  3. make -C coq all type-checks the MIR semantics, the generated programs, and the MIR→PTX translation lemmas.

Afterwards you can inspect coq/examples/*_gen.v and re-run Eval compute queries found in coq/MIRTests.v to see the MIR event traces and their PTX images.

```
examples/*.rs --rustc -Z dump-mir--> mir_dump/*.mir --tools/mir2coq.py--> coq/examples/*_gen.v
        \                                                                 |
         \--> target/*.ptx (optional)                                     v
           Coq build (MIRSyntax + MIRSemantics + Translate + Soundness) -> PTX event traces
```

  1. Ensure the Rust nightly and Coq toolchain are available:

    • rustup toolchain install nightly-2025-03-02
    • rustup override set nightly-2025-03-02
    • opam install coq (Coq ≥ 8.18)
  2. In every new shell, activate the Coq switch so coq_makefile is on your PATH (typically `eval $(opam env)`).

  3. Run the end-to-end build described above.

Refer to docs/mapping-table.md for the full table. In short:

  • TyI32/TyU32/TyF32 loads and stores become EvLoad/EvStore in PTX with space_global, relaxed semantics, and the matching mem_ty (MemS32, MemU32, MemF32).
  • Acquire loads and release stores attach sem_acquire/sem_release and SYS scope, mirroring the observed ld.acquire.sys.<ty> and st.release.sys.<ty>.
  • Barriers translate to EvBarrier scope_cta.

The translator (coq/Translate.v) and the docs stay in sync via helper functions mem_ty_of_mir and z_of_val.
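The acquire/release mapping above is exactly the ordering discipline that makes a flag-publication pattern sound. Below is a host-side Rust sketch of that pattern using std::sync atomics; the GPU version would use device-scope atomics, but the release-store/acquire-load pairing carries the same argument. That it matches the shape of the atomic_flag example is an assumption on our part.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

// Release/acquire flag publication: the relaxed data store becomes
// visible to the reader through the release store / acquire load pair.
static DATA: AtomicU32 = AtomicU32::new(0);
static FLAG: AtomicBool = AtomicBool::new(false);

fn main() {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);   // plain (relaxed) write
        FLAG.store(true, Ordering::Release); // release store publishes it
    });

    while !FLAG.load(Ordering::Acquire) {}   // acquire load observes the flag

    // Guaranteed to read 42: the acquire load synchronizes with the
    // release store, ordering the earlier relaxed write before it.
    assert_eq!(DATA.load(Ordering::Relaxed), 42);
    producer.join().unwrap();
}
```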

  • Global memory only; shared-memory scopes and bank conflicts are out of scope.
  • Non-atomic accesses are relaxed and scope-less; only one acquire/release pair with SYS scope is modelled.
  • Floating-point values are treated as raw IEEE-754 bit patterns (Z payloads); no reasoning about NaNs or rounding edge cases yet.
  • Translator handles a curated subset of MIR (no arbitrary control flow, panic paths, or complex intrinsics).
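The floating-point limitation above means the model tracks only which bits move, not their numeric behaviour; in Rust terms, values are handled as if viewed through to_bits/from_bits:

```rust
// Treating an f32 as its raw IEEE-754 bit pattern, the way the Coq model
// carries float payloads as integers (Z) without reasoning about rounding
// or NaN behaviour.
fn main() {
    let x: f32 = 1.5;
    let bits: u32 = x.to_bits();     // raw IEEE-754 encoding
    assert_eq!(bits, 0x3FC0_0000);   // 1.5f32 is 0x3FC00000
    let back = f32::from_bits(bits); // round-trips exactly
    assert_eq!(back, 1.5);
}
```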
  1. Extend the translator grammar to cover additional MIR statements (comparisons, guards, simple loops/barriers) while preserving determinism.
  2. Enrich the PTX shim with reads-from / coherence relations from the PTX Coq model.
  3. Prove the remaining per-event lemmas (Load_ok, Store_ok) and lift the translate_trace_shape property toward an end-to-end soundness theorem.
  4. Integrate shared-memory scope tags and CTA-wide fences, then revisit atomics/fences beyond acquire-release.