
Startup takes on Nvidia: emulating CUDA on AMD cards, compiling and running unmodified programs directly

2024-07-16


  • Cressey | QbitAI (Quantum Bit)

Without any modification or conversion, AMD graphics cards can now run native CUDA programs.

A British startup has released a tool that compiles CUDA programs for AMD GPUs, free for commercial use.

Once released, the tool sparked widespread discussion and topped the Hacker News front page.



The tool is called SCALE, and the developers position it as a GPGPU (general-purpose GPU) programming toolkit.



So far, nine programs, including the large-model framework llama.cpp, have passed testing and run correctly.

Unlike other implementations, SCALE directly impersonates an installation of the CUDA toolkit, so CUDA programs can be compiled from source without being converted into another language.

As a result, SCALE can also support Nvidia-specific features such as inline PTX assembly.

No conversion needed: letting AMD run CUDA directly

According to the official website, SCALE has three main components: an nvcc-compatible compiler, AMD implementations of the CUDA runtime and driver APIs, and ROCm-based libraries.

The compiler can directly compile programs written in CUDA dialects, including nvcc's dialect of CUDA C++ and inline PTX assembly, into binaries that run on AMD GPUs.

The ROCm libraries are used to provide the "CUDA-X" APIs, which is how SCALE handles libraries such as cuBLAS and cuSOLVER.
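To make the claim concrete, here is a sketch of the kind of source SCALE says it compiles unchanged: an ordinary CUDA program using the runtime API plus a kernel containing inline PTX, the Nvidia-specific feature that translation-based tools like HIP cannot handle. This is a generic CUDA example, not code from SCALE's documentation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A deliberately "CUDA-only" kernel: the inline PTX assembly below is an
// Nvidia proprietary feature that source translators typically reject,
// but that SCALE claims to compile as-is for AMD targets.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = data[i];
        int r;
        // Inline PTX: r = v + 1
        asm("add.s32 %0, %1, 1;" : "=r"(r) : "r"(v));
        data[i] = r;
    }
}

int main() {
    const int n = 256;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));            // CUDA runtime API calls,
    cudaMemcpy(dev, host, n * sizeof(int),        // which SCALE reimplements
               cudaMemcpyHostToDevice);           // on top of AMD drivers
    addOne<<<(n + 127) / 128, 128>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[41] = %d\n", host[41]);          // expect 42 on a working GPU
    return 0;
}
```

Under SCALE's model, this file would build with its nvcc-compatible compiler exactly as it would with Nvidia's nvcc, with no source changes.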



SCALE's key innovation is that it accepts CUDA programs as-is, without porting them to another language. It is compatible with nvcc- and clang-style compilation, and existing build tools and scripts (such as CMake) continue to work normally.

According to the official statement, SCALE aims for full CUDA compatibility, so developers do not need to write separate code for different GPU platforms.
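The claim that existing build systems keep working might look like the following in practice: an unmodified CMake CUDA project is simply pointed at SCALE's nvcc-compatible compiler. The install path below is an assumption for illustration, not taken from the article; consult SCALE's documentation for the real location.

```cmake
# Hypothetical: redirect CMake's CUDA toolchain to SCALE's nvcc-compatible
# compiler instead of Nvidia's. Only this one variable changes; the rest of
# the project file is an ordinary CUDA build.
cmake_minimum_required(VERSION 3.18)
set(CMAKE_CUDA_COMPILER /opt/scale/bin/nvcc)   # assumed path, for illustration
project(vector_add LANGUAGES CXX CUDA)
add_executable(vector_add main.cu)
```

The point of the design is that the project's own CMakeLists.txt, sources, and scripts need no edits; only the compiler selection (via a cache variable, environment variable, or toolchain file) tells the build to target AMD.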

This sets it apart from AMD's HIP: HIP works by rewriting CUDA code, may fail to handle complex macros correctly, and does not support proprietary features such as inline PTX.

Indeed, the SCALE authors themselves argue that HIP cannot solve the CUDA compatibility problem.

In addition, the SCALE language is a superset of CUDA, offering optional language extensions that make GPU code easier and more efficient to write for developers who want to move away from nvcc.

The authors hope that developers will eventually be able to write code once and run it on different hardware platforms, and they are working to bridge the compatibility gap between the popular CUDA programming language and other hardware vendors.

Currently, SCALE supports various AMD GPU series as follows:

  • Supported: gfx1030 (RX 6000 series) and gfx1100 (RX 7000 series)
  • "Seems to work": gfx1010 (RX 5000 series) and gfx1101
  • In progress: gfx900 (RX Vega series)

In addition, the authors tested a number of open-source CUDA projects and successfully ran nine of them with SCALE.



Since SCALE is a brand-new project, the authors have also prepared a series of tutorials covering everything from installation to compilation, along with example programs of different types.

Each key step of the tutorial comes with the relevant commands, down to how to determine your own GPU's model; it is very thorough.
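For instance, determining the GPU model on an AMD system typically means reading the "gfx" target name, which the ROCm tool rocminfo reports. The compile invocation below is an assumed sketch, not the tutorial's actual command; see docs.scale-lang.com for the real steps.

```shell
# Find your AMD GPU's gfx target (rocminfo ships with ROCm):
rocminfo | grep -o 'gfx[0-9a-f]*' | head -n 1
# e.g. gfx1030 for an RX 6000-series card

# Then build an unmodified CUDA source with SCALE's nvcc-compatible compiler.
# The path is an assumption for illustration:
/opt/scale/bin/nvcc main.cu -o main
./main
```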



For problems during use, the authors describe common troubleshooting steps and have opened a Discord server for direct communication with the development team.



The startup behind SCALE, Spectral Compute, was founded in the UK in 2018. It claims a deep understanding of CPU and GPU architecture, with the goal of helping developers use computing resources efficiently.



Netizens: a challenge to Nvidia's moat?

Some commenters believe that if SCALE works as advertised, it will challenge Nvidia's moat and let AMD compete with it head-on.



However, it is too early to draw conclusions: SCALE's own documentation admits it still falls short of native CUDA in some respects.

The developers also state plainly that some CUDA APIs and features are not supported, though they have not published a specific list.



As for further shortcomings of this "AMD solution", one commenter who claimed to have spoken with the SCALE team said that SCALE currently cannot use Tensor Cores, which means the FlashAttention acceleration framework cannot run on AMD through it.

Moreover, since Nvidia cards have powerful matrix-multiplication units, even code that compiles and runs may perform worse on AMD cards than on Nvidia's.



Some commenters even argue that Nvidia dominates the market because AMD has been unwilling to invest in making its GPUs deliver higher machine-learning performance, not merely because of CUDA's advantage.



Even if CUDA programs can run efficiently, whether AMD cards are genuinely affordable and easy to obtain is another question.



Another group of commenters believes the biggest problem is not whether it works technically, but the legal questions behind it.

This issue has also sparked widespread discussion, but there is no conclusion yet.

Some believe that SCALE, like ZLUDA (another project for running CUDA programs on AMD), stands on shaky legal ground and could draw lawsuits from Nvidia.



Specifically, Nvidia's EULA permits the CUDA SDK to be used only to develop applications that run on Nvidia cards, which could prohibit compatible implementations such as SCALE.



But others immediately countered that SCALE does not use Nvidia's SDK at all, so how could the SDK's license agreement apply?



In short, whether it is technical deficiencies or legal issues, the discussion about this new tool is still ongoing.

As for whether it is actually useful, developers will vote with their feet.

Reference links:
[1] https://docs.scale-lang.com/
[2] https://news.ycombinator.com/item?id=40970560