
Dialogue with Zhao Shuai from Inspur Information: Open source brings development and prosperity to AI

2024-08-12


On August 9, the 2024 Open Compute China Summit was held under the theme "Open Collaboration: Collaboration, Intelligence, and Innovation." Zhao Shuai, General Manager of Inspur Information's server product line, shared the development trends of large models and the progress of the open AI computing ecosystem, and argued that all computing in the future will be AI. After the meeting, Zhao Shuai, together with Zhang Zheng, senior product manager of Inspur Information's AI & HPC product line, and Luo Jian, product planning manager of Inspur Information's server product line, gave media interviews and discussed in depth how open computing drives innovation in artificial intelligence.

Zhao Shuai said that today two-thirds of models are open source and more than 80% of AI projects are developed with open source frameworks. Open source models have been downloaded more than 300 million times, and more than 30,000 new models have been derived from them. It can be said that open source has brought about the development and prosperity of AI. On this basis, open hardware design has also become key to advancing the AI computing ecosystem. Only in this way can AI better reach every customer and take root in every industry.

As an example of what an open ecosystem makes possible, Zhao Shuai pointed to the open acceleration specification OAM, whose design was opened in 2019. Inspur Information demonstrated the industry's first UBB at the 2019 OCP Summit and launched the OAM reference system design MX1 in 2020. In 2021, it released the NF5498A5 server based on the OAM v1.0 specification, which helped several manufacturers develop and deploy high-end AI chips. OAM's open design standards have greatly accelerated the adaptation and compatibility work for computing chips, saved tens of billions of yuan in industry resource investment, sped up the deployment and iteration of computing power, and supported the maturing of upper-layer large models and AIGC applications.

Specifically, efficient training of large models usually requires an AI server system built from more than a thousand high-compute AI accelerator cards. Before thousands of chips can be interconnected and work together efficiently, the high-speed direct connection of chips inside a single server has to be solved. The open acceleration specification OAM addresses the problems of inconsistent form factors and interfaces among AI accelerator cards in a single server, inefficient high-speed interconnection, and long R&D cycles, and it has received support and participation from many companies.

In this regard, Zhang Zheng pointed out that Inspur Information has continuously promoted the evolution of standards and technologies for open acceleration specifications and has worked quietly in the open community for many years. At the beginning there were very few partners, and the work brought the company no profit, but it has turned out to bring huge benefits to the entire industry chain. Today, basically all domestic and foreign chip manufacturers have adopted open standards for their most high-end products. "Our bottom line is to make the industry work well," he said. "Only when the industry is good and healthy can we gain more value of our own in the industrial chain."

According to reports, system manufacturers represented by Inspur Information have developed a number of AI servers that comply with open acceleration specifications. Inspur Information defined the industry's first 8-card interconnected hardware system that complies with the OAM specification, an interconnect baseboard that follows the open computing specification. It was the first to reach a chip interconnect rate of 56 Gbps, the highest under the OAM specification and much higher than the transmission rate of PCIe 5.0, and the company is still developing open acceleration specifications with higher rates. Inspur Information's latest generation of OAM-based AI server, the NF5698G7, supports a variety of open acceleration chips based on the OAM specification. The company has also built a thousand-card liquid-cooled cluster for users to support the training of large AI models with more than 100 billion parameters.

With the pain points of accelerator standardization now addressed, new challenges have emerged.

Zhao Shuai said that all computing in the future will be AI, but as application paradigms diversify, CPUs are also developing in diverse directions. How, then, can a better computing platform be provided for these diverse CPUs, and how can efficiency be improved?

The newly launched Open Computing Module (OCM) specification provides the answer. OCM treats the CPU and memory as the smallest computing unit and standardizes the high-speed and low-speed interconnection interfaces. Just as OAM established a unified base for acceleration chips, OCM will promote unified interfaces for CPU computing units and a more complete ecosystem. The initiative was jointly launched by the China Electronics Standardization Institute, Intel, AMD, Inspur Information, Lenovo, Super Fusion, Baidu, Xiaohongshu, and others, and it is the first server computing module design specification in China. It aims to establish a standardized computing module unit centered on the processor. By unifying the external high-speed interconnection, management protocols, power supply interfaces, and other aspects of different processor computing units, it makes processor chips of different architectures compatible and builds a unified computing base for CPUs, addressing the challenges of the CPU ecosystem. This lets customers flexibly and quickly match the most suitable computing platform to diverse application scenarios such as artificial intelligence, cloud computing, and big data, and promotes the high-quality, rapid development of the computing industry.

Luo Jian added that Inspur Information does not build industry standards out of thin air; they are the result of collaboration with upstream and downstream industry partners. He emphasized that industry standards and product industrialization advance together, rather than a standard existing on its own. "Having only a standard is worthless to the entire industry! Only by putting the standard into the market and applying it on the customer side can real value be generated." (Dingxi)

This article is from NetEase Technology Report.