Breaking ecosystem isolation: Zhongke Jiahe launches domestic heterogeneous native AI computing tools

2024-07-22

Machine Heart Report

Author: Zenan

"With the help of system optimization software, the threshold for development will be lowered, various hardware will be unified, and the technology ecosystem will be developed. This is of great significance to the progress of the current intelligent ecosystem," said Sun Ninghui, academician of the Chinese Academy of Engineering, director of the Academic Committee of the Institute of Computing Technology of the Chinese Academy of Sciences, and chairman of CCF, at the press conference. "In addition to smart chips and AI industry applications, we need the participation of system software optimization parties to work together, so that we can make the domestic ecosystem better."



Academician Sun Ninghui at the press conference

Faced with the computing power bottleneck problem, we finally have a system-level solution.

On July 20, AI infrastructure startup Zhongke Jiahe officially released the first generation of heterogeneous native AI computing tools.

As domestic computing power moves toward large-scale deployment, Zhongke Jiahe's approach lets different types of chips run in parallel at scale while maximizing efficiency, and lets computing power users work with them directly without having to deal with the differences between chip ecosystems.

Cui Huimin, founder and CEO of Zhongke Jiahe, introduced the release: the Jiahe heterogeneous native AI computing tool already plays a role in the AI infrastructure built on domestic computing power. It is compatible with a variety of domestic AI chips and provides a high-performance unified interface that shields chip differences. On the heterogeneous native platform, AI computing clusters used for large-model inference can reduce latency by 3-74x, increase throughput by 1.4-2.1x, and improve the energy efficiency ratio by 1.46x; it supports dense large models with up to 340B parameters and MoE large models with up to 640B parameters.

At the same time, Zhongke Jiahe has provided high-performance inference support to more than 10 customers, including chip makers, integrators, and service providers. Its architecture supports mainstream large models from home and abroad and can perform diversified parallel inference.

The computing power providers and application partners that announced signed cooperations at the press conference include AMD, PowerLeader, Huawei, Hangzhou Artificial Intelligence Computing Center, Open Transcendent, Moore Threads, QingCloud Technology, Rise VAST, Suiyuan Technology, Wuwen Core Dome, Yunxi Computing Power, and H3C (sorted alphabetically by pinyin initial).



Cui Huimin, founder and CEO of Zhongke Jiahe, at the press conference

Heterogeneous native AI computing power, aiming to achieve "three zeros and one high"

The solution proposed by Zhongke Jiahe aims to let large AI models achieve zero-cost migration, zero-loss use, and zero-delay deployment across different chips, with high performance.

This set of software tools includes three products: the heterogeneous native large model inference engine "SigInfer", the heterogeneous native fine-tuning engine "SigFT", and the operator automatic generation and translation tool "SigTrans".

Among them, the newly released SigInfer is a cross-platform, high-performance heterogeneous native inference engine that supports not only server-grade AI accelerator cards but also consumer-grade GPUs, so it can be deployed in data centers and also accelerate a variety of edge devices.



As the technical foundation for heterogeneous computing, SigInfer gives the different kinds of AI computing power it connects a unified calling interface, so business applications can migrate smoothly. While dispatching to multiple different accelerators, SigInfer performs multi-level deep optimization to fully tap each chip's computing potential.

It has the features expected of a modern large-model inference engine, such as API serving, request scheduling, batch management, KV cache optimization, tensor parallelism, pipeline parallelism, expert parallelism, and even multi-machine pipeline parallelism.
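To make "a unified interface that shields chip differences" concrete, here is a minimal Python sketch of how an engine of this kind could expose one call surface over different accelerator backends with simple batch management. All class and method names are hypothetical illustrations, not SigInfer's actual API.

```python
# Hypothetical sketch of a unified inference interface over heterogeneous
# backends; names are illustrative, not SigInfer's real API.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    prompt: str
    max_new_tokens: int = 128


class Backend(ABC):
    """One implementation per accelerator family (NVIDIA, domestic NPU, ...)."""

    @abstractmethod
    def generate(self, batch: List[Request]) -> List[str]: ...


class CudaBackend(Backend):
    def generate(self, batch: List[Request]) -> List[str]:
        return [f"[cuda] completion for: {r.prompt}" for r in batch]


class DomesticNPUBackend(Backend):
    def generate(self, batch: List[Request]) -> List[str]:
        return [f"[npu] completion for: {r.prompt}" for r in batch]


class InferenceEngine:
    """Callers see a single interface; the engine queues requests,
    forms batches, and dispatches to whichever backend it was given."""

    def __init__(self, backend: Backend, max_batch: int = 8):
        self.backend = backend
        self.max_batch = max_batch
        self.queue: List[Request] = []

    def submit(self, req: Request) -> None:
        self.queue.append(req)

    def step(self) -> List[str]:
        batch, self.queue = self.queue[:self.max_batch], self.queue[self.max_batch:]
        return self.backend.generate(batch) if batch else []


# Swapping the backend changes the hardware, not the calling code.
engine = InferenceEngine(DomesticNPUBackend())
engine.submit(Request("Hello"))
print(engine.step())
```

In a real engine the `step` loop would also handle continuous batching, KV cache allocation, and the parallelism strategies listed above; the point here is only that the caller never touches backend-specific code.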

Zhongke Jiahe said that SigInfer already supports most of the large model structures in the industry.



Currently, SigInfer already delivers complete inference engine capabilities. The heterogeneous accelerator card clusters it supports can flexibly schedule NVIDIA AI accelerator cards together with domestic AI accelerator cards for hybrid inference, scaling up to trillion-parameter large models.

Using SigInfer for AI chip deployment lets large-model services maintain high throughput and low latency as business traffic grows; these metrics are critical for large-scale applications of generative AI.

On the same NVIDIA GPU, SigInfer delivers a clearly visible acceleration:



Going further, when domestic chips handle similar tasks, SigInfer likewise improves the throughput of AI accelerator cards in parallel computing while significantly reducing token output latency.

The heterogeneous native AI computing tools can adaptively adjust an AI accelerator's operating frequency according to the stage of large-model processing, operator characteristics, and the optimization target, thereby achieving high energy efficiency. Zhongke Jiahe ran the numbers for us: in data center operation, an A800 running SigInfer improves the energy efficiency ratio by 46% compared with vLLM.
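As a rough illustration of stage-aware frequency adaptation: prefill is typically compute-bound and favors a high clock, while decode is memory-bandwidth-bound and tolerates a lower clock with little throughput loss. The clock values and threshold below are assumptions for illustration, not Zhongke Jiahe's actual policy.

```python
# Hedged sketch of stage-aware frequency scaling for an AI accelerator.
# All numbers are illustrative assumptions, not the product's real policy.

PREFILL_CLOCK_MHZ = 1410  # compute-bound stage: run near peak clock
DECODE_CLOCK_MHZ = 1005   # memory-bound stage: a lower clock saves power


def pick_core_clock(stage: str, arithmetic_intensity: float) -> int:
    """Choose a core clock from the task stage and the operator's
    arithmetic intensity (FLOPs per byte of memory traffic)."""
    if stage == "prefill" or arithmetic_intensity > 50.0:
        return PREFILL_CLOCK_MHZ
    # Decode-stage attention and GEMV are dominated by memory traffic,
    # so lowering the clock barely hurts latency per token but improves
    # energy per token.
    return DECODE_CLOCK_MHZ


print(pick_core_clock("decode", arithmetic_intensity=2.0))  # -> 1005
```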

In addition to optimizing cloud infrastructure, Zhongke Jiahe also demonstrated performance optimization for edge-side inference. SigInfer can accelerate devices built on chips from Intel, Qualcomm, AMD, and other major manufacturers; compared with mainstream deployment solutions, it improves edge-side inference efficiency by up to 5x.

Behind the heterogeneous computing and efficiency gains lies the application and engineering of a series of cutting-edge techniques.

To improve the efficiency of parallel computing, Zhongke Jiahe introduced a series of optimizations, such as memory-access optimization in the decode stage that lets the KV cache be reused at the register level; compared with loading from L2 cache, this improves both latency and bandwidth.

At the same time, to offset the resulting loss of parallelism, Zhongke Jiahe's researchers also partitioned the computation along the sequence dimension of the data. Combined with the KV cache reuse optimization, this not only saves memory accesses but also increases parallelism, making the core computation of the attention mechanism more efficient.
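The sequence-dimension split can be illustrated with the standard online-softmax formulation of attention at decode time: the KV sequence is cut into chunks that can be processed independently, and the partial results are merged exactly. A minimal NumPy sketch follows (conceptual only, not SigInfer's kernel).

```python
# Chunked attention decode: split the KV sequence into chunks and merge
# partial softmax results with the online (log-sum-exp) rescaling trick.
import numpy as np


def chunked_decode_attention(q, K, V, chunk=256):
    """q: (d,), K/V: (seq, d). Returns softmax(K q / sqrt(d)) @ V,
    computed chunk by chunk along the sequence dimension."""
    d = q.shape[0]
    m = -np.inf                  # running max of attention logits
    denom = 0.0                  # running softmax denominator
    out = np.zeros(V.shape[1])   # running weighted sum of V rows
    for s in range(0, K.shape[0], chunk):
        logits = K[s:s + chunk] @ q / np.sqrt(d)
        m_new = max(m, logits.max())
        scale = np.exp(m - m_new)    # rescale previously accumulated state
        w = np.exp(logits - m_new)   # unnormalized weights for this chunk
        denom = denom * scale + w.sum()
        out = out * scale + w @ V[s:s + chunk]
        m = m_new
    return out / denom


# Check against the unchunked reference.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=64), rng.normal(size=(1024, 64)), rng.normal(size=(1024, 64))
logits = K @ q / np.sqrt(64)
ref = (np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()) @ V
assert np.allclose(chunked_decode_attention(q, K, V), ref)
```

Because each chunk only needs its own slice of K and V, the chunks can be assigned to different compute units, which is where the extra parallelism comes from.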

Zhongke Jiahe has also explored high-performance operator generation for heterogeneous computing power. Working with computing power manufacturers, it ported CUTLASS to domestic chip architectures, greatly improving the efficiency of matrix multiplication, and achieved a further performance gain of more than 20% by combining these optimizations with compilation technology.
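The tiling idea behind CUTLASS-style matrix multiplication can be sketched in a few lines: the output is computed block by block so that each block's operands fit in fast on-chip memory on real hardware. The NumPy stand-in below is purely conceptual; actual CUTLASS kernels are CUDA C++ templates.

```python
# Conceptual NumPy stand-in for tiled GEMM, the structure CUTLASS-style
# kernels use; on real hardware each block multiply would stage its
# operands in shared memory and registers.
import numpy as np


def tiled_matmul(A, B, tile=32):
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # One tile of output accumulates contributions from one
                # tile of A and one tile of B at a time.
                C[i:i + tile, j:j + tile] += (
                    A[i:i + tile, k:k + tile] @ B[k:k + tile, j:j + tile]
                )
    return C


A, B = np.random.rand(64, 96), np.random.rand(96, 128)
assert np.allclose(tiled_matmul(A, B), A @ B)
```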

With the support of a series of technologies, Jiahe's heterogeneous native AI computing tools have achieved excellent energy efficiency optimization.

Starting from compilation technology: Zhongke Jiahe's technical route

Unlike the capabilities offered by some earlier AI computing infrastructure companies, the heterogeneous computing and acceleration that Zhongke Jiahe provides are centered on compilation technology.

For a computer, the work of the compilation layer is "translation": it converts high-level programming languages written by humans into a form that machines can understand and execute.



Compilation also involves optimization, that is, improving the running efficiency of the generated machine code. Compilation plays a large role in chip performance, but it is often overlooked.

The CUDA computing platform plays a central role on NVIDIA chips, the most popular in the industry. It comprises a programming language, compilers, a range of high-performance acceleration libraries, and AI frameworks. When a computer executes tasks, CUDA acts as a dispatcher, making full use of the computing resources of different hardware so that complex models run faster. It is fair to say that today's AI ecosystem is largely built on CUDA.

For domestic computing power to achieve large-scale application, a comparable ecosystem and set of capabilities must be built.



In the era of generative AI, people's demand for computing power has driven the development of chip technology, but new challenges have also emerged:

  • From the perspective of chip companies, the ecosystem is becoming diversified and fragmented, which drives up development costs and creates problems with implementation efficiency and compatibility.
  • From the perspective of industry development, AI technology is advancing rapidly and covering ever more scenarios, which means more kinds of computing power are involved, further driving demand for heterogeneous computing.

Therefore, the industry urgently needs an efficient toolchain that supports a variety of domestic chips. If a set of universal, low-cost, high-performance basic software can help ecosystem partners quickly port applications developed for the NVIDIA ecosystem, the potential of domestic chips can be fully tapped, accelerating technology R&D and gradually building a virtuous cycle for the AI computing power ecosystem.

This is what Zhongke Jiahe has been doing.

The basic software platform layer that Zhongke Jiahe provides sits at the operator, compilation, and framework layers, building a bridge between hardware and software. Its heterogeneous native AI computing tools help users migrate smoothly across AI models and chip architectures, which brings great convenience to AI applications.



The capabilities at all these levels involve compilation technology. AI compilation covers both the graph layer and the operator layer, and spans a wider range of semantic conversion than traditional compilers. For example, AI compilers generally have to consider computational graph partitioning, subgraph fusion, parallel computing, and data partitioning, all of which are hard problems.
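Subgraph fusion, one of those hard problems, can be illustrated with a toy pass that merges runs of elementwise operators into their producer, so intermediate tensors never round-trip through memory. The tiny list-based IR below is hypothetical and purely for illustration.

```python
# Toy fusion pass over a flattened operator list standing in for a
# computational graph; the IR is hypothetical, for illustration only.
from typing import List

ELEMENTWISE = {"add_bias", "relu", "gelu", "scale"}


def fuse(ops: List[str]) -> List[str]:
    """Greedily merge runs of elementwise ops into the preceding op,
    producing one fused kernel per group."""
    fused, i = [], 0
    while i < len(ops):
        group = [ops[i]]
        while i + 1 < len(ops) and ops[i + 1] in ELEMENTWISE:
            i += 1
            group.append(ops[i])
        fused.append("+".join(group))
        i += 1
    return fused


# Three kernel launches collapse into one fused kernel, removing two
# intermediate tensor writes and reads.
print(fuse(["matmul", "add_bias", "relu"]))  # ['matmul+add_bias+relu']
```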

In this regard, Zhongke Jiahe has completed a good deal of research, such as global dataflow analysis at the tensor expression level, building precise computational graphs and data dependency graphs, and breaking operator boundaries to perform operator fusion, with very good results: on some networks its method achieves a speedup of up to 3.7x over the industry's state of the art. The work was published this year at a top conference in the computer science field.

Building end-to-end computing power enablement solutions to help the domestic AI ecosystem prosper

Zhongke Jiahe was founded in July 2023, with a team drawn mainly from the Institute of Computing Technology of the Chinese Academy of Sciences. Founder Cui Huimin graduated from the Department of Computer Science of Tsinghua University and heads the compilation team at the Institute of Computing Technology. The company's core team has more than 20 years of experience in compiler development and, as core members, has led or participated in compiler development for many domestic chips.

Since its founding, the company has focused on chip compilation and optimization technology, committed to providing universal, low-cost, high-performance computing resources, with the mission of "gathering the power of chips to build the domestic ecosystem". To date, Zhongke Jiahe has raised multiple rounds of financing totaling nearly 100 million yuan.



Zhongke Jiahe is building its product line along three routes: an AI large-model inference engine that supports heterogeneous computing power, a large-model fine-tuning framework, and an AI compilation tool suite. These not only help computing power users quickly adopt diversified AI computing power, but also help computing power suppliers improve their software ecosystems and competitiveness, filling in an important piece of the domestic AI computing power ecosystem puzzle.



More importantly, Zhongke Jiahe hopes to become a bridge that connects the many computing power users and computing power providers so the two sides can readily meet each other halfway, promoting the large-scale application of heterogeneous native AI computing power and contributing to the vigorous development of the domestic AI ecosystem.