Run CUDA, unmodified, on AMD GPUs

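SCALE is a toolkit that natively compiles CUDA applications for AMD GPUs without any modification to the original application or its build system. It "impersonates" an installation of NVIDIA's CUDA Toolkit, so existing build tools and scripts such as cmake keep working, and because its compiler accepts the same command-line options and CUDA dialect, it can serve as a drop-in replacement for nvcc. SCALE currently supports AMD gfx1030 and gfx1100 GPUs, with support for older architectures in development. Well-known open-source CUDA projects, reportedly including popular ones such as TensorFlow and PyTorch, are validated through SCALE's nightly automated tests.

SCALE consists of an nvcc-compatible compiler, implementations of the CUDA runtime and driver APIs for AMD GPUs, and open-source wrapper libraries that provide the "CUDA-X" functionality. It aims for full compatibility with NVIDIA CUDA, so that users do not have to maintain multiple codebases or sacrifice performance to support different GPU vendors. SCALE also offers opt-in language extensions intended to make GPU code easier to write and more efficient. SCALE is a work in progress, and its developers welcome feedback about missing APIs that block its use. For more information about SCALE, contact them by email at [[email protected]](mailto:[email protected]).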

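HIP (Heterogeneous-compute Interface for Portability) is a low-effort, rather than zero-effort, solution aimed at users who already know CUDA: it lets existing CUDA code be adapted to AMD GPUs with minimal changes. For ports that go through an abstraction layer, the responsibility lies with that layer's maintainers or with AMD itself. The claim of a zero-effort solution is questionable given the performance requirements and the complexity of full-stack integration; even with HIP as a foundation this looks like a hard task, and a completely effortless solution is unlikely to exist. The best-performing machine learning solutions will arrive once the ecosystem has fully integrated HIP, but reaching the best results will still take non-zero effort from experienced CUDA engineers. Either way, AMD has been criticised for neglecting machine learning, even though an open-source CUDA equivalent could bring stronger competition and possibly better performance.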

Original text

What is SCALE?

SCALE is a GPGPU programming toolkit that allows CUDA applications to be natively compiled for AMD GPUs.

SCALE does not require the CUDA program or its build system to be modified.
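
As a rough illustration of what "unmodified" means here, the following is a minimal sketch of an ordinary CUDA source file of the kind one would normally build with nvcc. SCALE is also described as accepting inline PTX, so the kernel uses one inline PTX instruction. The file name, function names and problem size are invented for this example; nothing in it is SCALE-specific, and it has not been verified against SCALE itself.

```cuda
// vec_add_ptx.cu (hypothetical file name): plain CUDA with inline PTX.
// Assumption: nothing here is SCALE-specific; it is ordinary nvcc-dialect CUDA.
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Adds two 32-bit integers with a single inline PTX instruction.
__device__ int add_ptx(int a, int b) {
    int result;
    asm("add.s32 %0, %1, %2;" : "=r"(result) : "r"(a), "r"(b));
    return result;
}

// Element-wise c[i] = a[i] + b[i].
__global__ void vec_add(const int* a, const int* b, int* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = add_ptx(a[i], b[i]);
    }
}

// Runs the kernel on n > 0 elements and checks the result on the host.
// Returns the number of mismatching elements, or -1 if a CUDA call failed.
int run_vec_add(int n) {
    std::vector<int> ha(n), hb(n), hc(n, 0);
    for (int i = 0; i < n; ++i) {
        ha[i] = i;
        hb[i] = 2 * i;
    }

    const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(int);
    int *da = nullptr, *db = nullptr, *dc = nullptr;
    bool ok = cudaMalloc(&da, bytes) == cudaSuccess
           && cudaMalloc(&db, bytes) == cudaSuccess
           && cudaMalloc(&dc, bytes) == cudaSuccess
           && cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        vec_add<<<blocks, threads>>>(da, db, dc, n);
        // The blocking device-to-host copy also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts null pointers, so this is safe on every path.
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    if (!ok) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hc[i] != 3 * i) ++mismatches;
    }
    return mismatches;
}
```

Calling `run_vec_add(1 << 20)` from a `main` and building with the usual `nvcc vec_add_ptx.cu -o vec_add_ptx` would be the typical workflow on NVIDIA hardware. The claim in the text is that the same source and build command are meant to work unchanged when `nvcc` resolves to SCALE's compiler.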

Support for more GPU vendors and CUDA APIs is in development.

To get started:

How does it work?

SCALE has several key innovations compared to other cross-platform GPGPU solutions:

SCALE accepts CUDA programs as-is. No need to port them to another language. This is true even if your program uses inline PTX asm.
The SCALE compiler accepts the same command-line options and CUDA dialect as nvcc, serving as a drop-in replacement.
SCALE "impersonates" an installation of the NVIDIA CUDA Toolkit, so existing build tools and scripts like cmake just work.

What projects have been tested?

We validate SCALE by compiling open-source CUDA projects and running their tests. The following open-source projects are currently part of our nightly automated tests and pass fully:

Which GPUs are supported?

The following GPU targets are supported, and are covered by our nightly tests:

AMD gfx1030 (Navi 21, RDNA 2.0)
AMD gfx1100 (Navi 31, RDNA 3.0)

The following GPU targets have undergone ad-hoc manual testing and "seem to work":

We are working on supporting the following GPUs:

AMD gfx900 (Vega 10, GCN 5.0)

What are the components of SCALE?

SCALE consists of:

An nvcc-compatible compiler capable of compiling nvcc-dialect CUDA for AMD GPUs, including PTX asm.
Implementations of the CUDA runtime and driver APIs for AMD GPUs.
Open-source wrapper libraries providing the "CUDA-X" APIs by delegating to the corresponding ROCm libraries. This is how libraries such as cuBLAS and cuSOLVER are handled.
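
To make the last item more concrete, here is a sketch of ordinary cuBLAS client code as it would be written for NVIDIA hardware. According to the description above, under SCALE a call such as `cublasSaxpy` would reach a wrapper library that forwards to the corresponding ROCm library (presumably rocBLAS, though the text does not name it). The helper name `saxpy_cublas` is invented for this example, and how the forwarding happens internally is not shown because the text does not describe it.

```cuda
// Ordinary cuBLAS client code; only standard CUDA runtime and cuBLAS calls are used.
// Assumption: the helper name saxpy_cublas is hypothetical, not part of any library.
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Computes y = alpha * x + y on the GPU via cuBLAS and copies the result back into y.
// Returns true on success, false on a size mismatch or if any call fails.
bool saxpy_cublas(float alpha, const std::vector<float>& x, std::vector<float>& y) {
    if (x.size() != y.size() || x.empty()) return false;
    const int n = static_cast<int>(x.size());
    const std::size_t bytes = x.size() * sizeof(float);

    float *dx = nullptr, *dy = nullptr;
    cublasHandle_t handle = nullptr;
    bool ok = cudaMalloc(&dx, bytes) == cudaSuccess
           && cudaMalloc(&dy, bytes) == cudaSuccess
           && cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS;

    if (ok) {
        // alpha is passed by host pointer, which is the default cuBLAS pointer mode.
        ok = cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1) == CUBLAS_STATUS_SUCCESS
          && cudaMemcpy(y.data(), dy, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    if (handle) cublasDestroy(handle);
    cudaFree(dx);
    cudaFree(dy);
    return ok;
}
```

With nvcc this would be linked against cuBLAS in the usual way (`-lcublas`). The point of the wrapper approach described above is that client code like this stays the same while the library behind the `cublas*` symbols changes.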

What are the differences between SCALE and other solutions?

Instead of providing a new way to write GPGPU software, SCALE allows programs written using the widely-popular CUDA language to be directly compiled for AMD GPUs.

SCALE aims to be fully compatible with NVIDIA CUDA. We believe that users should not have to maintain multiple codebases or compromise on performance to support multiple GPU vendors.

SCALE's language is a superset of NVIDIA CUDA, offering some opt-in language extensions that can make writing GPU code easier and more efficient for users who wish to move away from nvcc.

SCALE is a work in progress. If there is a missing API that is blocking your attempt to use SCALE, please contact us so we can prioritise its development.

There are multiple ways to get in touch with us: