During the 2024 Global AI Chip Summit, held September 6-7 this year, Yang Gongyifan will...
2024-08-07
“I hope our next generation of products can iterate into something more advanced than Nvidia’s current B200...”
This is the latest "small goal" that Yang Gongyifan, founder and CEO of AI training chip startup Zhonghao Xinying, recently shared with Xindongxi. Yang Gongyifan is a veteran chip developer who spent more than ten years on high-end chip R&D at companies such as Oracle and Google, where he participated in the development of Google TPU 2/3/4. He returned to China at the end of 2018 to build a complete chip design team and prototype verification team, and founded Zhonghao Xinying in 2020.
According to him, in 2023 Zhonghao Xinying achieved positive profit for the first time: net profit attributable to the parent company reached 81.33 million yuan on annual revenue of 485 million yuan, and the company has completed five rounds of financing to date. Its self-developed TPU training chip "Moment" ("Shana") has entered mass production, delivering nearly 1.5 times the computing power of NVIDIA's A100 for training models. With "Moment" as the cornerstone, Zhonghao Xinying has built the large-scale AI intelligent computing cluster "Taize" through high-speed interconnection of 1,024 chips; the thousand-card cluster reaches 200 PFLOPS of computing power.
▲Zhonghao Xinying AI training chip "Moment"
Unlike GPUs, which currently dominate the AI chip market, Yang Gongyifan chose the TPU architecture designed specifically for deep learning. In his opinion, "TPU architecture is a naturally advantageous architecture for large AI models. Under the same production process and the same technology, it will achieve 3 to 5 times the performance of traditional GPU architecture."
Yang Gongyifan said that Zhonghao Xinying is currently the only domestic AI chip company with core technology for TPU-architecture chips that integrate training and inference. He predicts that in the next 5-10 years, TPU and TPU-like architectures will reach 80% market share, with the remaining 20% going to traditional GPUs.
In addition to promoting the research and development and implementation of AI chips, Zhonghao Xinying has also developed its own pre-trained large models that can provide "rigid" output capabilities, which will eventually be opened to partners in finance, medical care, education, etc. for the implementation of professional large models in vertical fields.
AI chips are a well-known money-burning long-term race. How did Zhonghao Xinying achieve profitability in five years? As a chip startup, why does it develop large models and build its own intelligent computing center? How does it plan to stand out from the increasingly fierce competition among domestic AI chips? Recently, Yang Gongyifan, founder and CEO of Zhonghao Xinying, had an in-depth conversation with Xindongxi, sharing his thoughts and choices during the entrepreneurial process, as well as his judgment on the development of technology and the trend of commercial implementation.
▲Yang Gongyifan, founder and CEO of Zhonghao Xinying
Yang Gongyifan's more than 10 years of experience in the high-end chip field laid a solid foundation for him to find the right direction for entrepreneurship.
After obtaining a master's degree in computer science from Stanford University, Yang Gongyifan participated in and led the design and production of 12 top-tier high-performance CPUs at Oracle, including the SPARC T8/M8, accumulating more than ten successful tape-outs along the way.
▲Related papers published by Yang Gongyifan during his time at Oracle (Source: IEEE Xplore)
Joining Google in 2017 planted the seeds for him to return to China to start a business in the future.
During his time at Google, Yang participated in the design and development of TPU 2/3/4 as a member of the core chip R&D team. He mentioned that it was the more than ten years of work experience that enabled them to optimize the computing needs of applications and computing models, which led to the successful development of TPU 2/3/4.
In June 2017, eight co-authors from Google published the paper "Attention Is All You Need", which pushed the TPU-trained Transformer architecture to the fore and opened the prelude to the Transformer becoming dominant in large model research. At the same time, Yang Gongyifan realized that large models were gradually approaching human-like intelligence, and that the TPU would have a great impact on industrial development. He firmly believes that between 2025 and 2026, large models will replace humans as the core of social productivity.
▲Google TPU architecture (Source: YouTube)
Yang Gongyifan believes that the transformation large models bring to computing is the biggest in the field's history. All previous computing ran one or several applications on a single chip; now, for the first time, thousands of chips jointly serve a single application. This poses a huge challenge to the realization of the entire computing architecture, and behind the challenge lies opportunity.
He thought that it was rare to have such a big challenge, such a big change, and such a large application scenario space, so he had to do it. Based on this understanding, he returned to Shenzhen at the end of 2018 and formed a team to make domestically produced, independent and controllable TPU AI training chips, and officially established Zhonghao Xinying in 2020.
This also brought his first moment of entrepreneurial achievement: in 2019, the simulator was built, ran smoothly, and performed well, proving that their chip design was feasible.
At the beginning of the business, Yang Gongyifan's idea was to first form a team that could make products. Therefore, the initial team landed in Shenzhen, formed a chip design team and a prototype verification team, and completed the chip modular design. Later, in 2020, Zhonghao Xinying landed in Hangzhou. With the promotion of products, financing, and chip mass production, they formed a complete supply chain team and marketing team. The team size has now reached over 170 people, of which R&D personnel account for more than 80%.
But the early days of entrepreneurship were not smooth. At that time, most domestic players, investment institutions, and customers did not recognize the future development and application prospects of large models in the industry. According to Yang Gongyifan's recollection, in the early days of its establishment, Zhonghao Xinying had no income for two years. It did not achieve its first revenue until 2021, and officially achieved profitability in 2023. In 2023, the company's revenue reached 485 million yuan, and its net profit attributable to the parent company was 81.33 million yuan.
At present, Zhonghao Xinying has completed five rounds of financing. Disclosed rounds include an A round of about 100 million yuan completed in September 2022, led by Saizhi Bole Investment with participation from Hangzhou Hi-Tech Investment and others, and consecutive Pre-B and Pre-B+ rounds completed in 2023, worth hundreds of millions and tens of millions of yuan respectively.
In Yang Gongyifan's view, investors' recognition of the company has come in stages: in the early stage they focused on the team, in the middle stage on whether the products met market demand, and in the later stage on whether the products had core competitiveness and on new directions for industry expansion. Whether in product layout, choice of technology route, or team completeness, Zhonghao Xinying has not missed a step. It has now grown into the only domestic company mastering core chip technology for TPU-architecture AI chips that integrate training and inference.
Zhonghao Xinying's business layout is developing along one vision: "Let computing power become the driving force of human development."
Yang Gongyifan explained that the first step toward this goal is the chip itself, as it is the most basic infrastructure. Once that foundation is in place, the chip is built into a complete supply chain, ensuring the infrastructure can be supplied continuously.
At the algorithm level, on the other hand, ecosystem partners handle the implementation of industry models. Zhonghao Xinying's role here is to develop its own pre-trained large models and open them to industry partners in finance, education, medical care, and other fields.
In the era of large models, the TPU and the Transformer architecture are a natural fit.
Compared with the scalar computing units in CPUs and the vector computing units in GPUs, the TPU completes computational tasks with two-dimensional or even higher-dimensional computing units. The TPU's design also makes thousand-card clustering easier to realize: connecting 1,024 chips into a 32×32 two-dimensional matrix makes each chip fully symmetric with any other chip in the entire network.
He added that the TPU architecture is optimized for deep learning and can also handle large-scale deep learning computation, such as intelligent computing networks and ten-thousand-card interconnection. The TPU performs relatively poorly on general-purpose workloads, but in AI application scenarios, under the same production process and technology, the TPU architecture delivers 3-5 times the performance of a traditional GPU.
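The "each chip is fully symmetric with any other chip" claim can be made concrete with a small sketch. The Python example below assumes the 1,024 chips form a 2D torus (a 32×32 grid whose edges wrap around); this topology and all names are illustrative assumptions, not Zhonghao Xinying's actual interconnect design:

```python
# Illustrative sketch: wiring 1024 chips as a 32x32 two-dimensional torus
# (a grid with wrap-around edges) gives every chip the same relative
# neighbor pattern, so no chip is topologically special. This is an
# assumed topology for illustration only.

SIDE = 32  # 32 x 32 = 1024 chips

def neighbors(chip_id, side=SIDE):
    """Return the four wrap-around grid neighbors of a chip."""
    row, col = divmod(chip_id, side)
    return [
        ((row - 1) % side) * side + col,  # up
        ((row + 1) % side) * side + col,  # down
        row * side + (col - 1) % side,    # left
        row * side + (col + 1) % side,    # right
    ]

# The link relation is symmetric for all 1024 chips: if A links to B,
# then B links to A, and every chip has exactly four distinct links.
for chip in range(SIDE * SIDE):
    links = neighbors(chip)
    assert len(set(links)) == 4
    assert all(chip in neighbors(n) for n in links)
```

Because every position in a torus looks identical, collective operations such as all-reduce can use the same schedule on every chip, which is one reason such topologies suit large synchronized training jobs.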
In its fifth year, Zhonghao Xinying achieved mass production and industrialization of the "Moment" chip. This was his second moment of achievement: he still remembers the R&D team working through the night to "light up" the chip and complete product verification.
Compared with NVIDIA's A100, the GPTPU AI training chip "Moment" developed by Zhonghao Xinying delivers 1.5 times the computing performance, cuts energy consumption by 30% on the same training workload, and brings the unit cost of computing power down to 42% of the A100's.
"Thai Rule"AIThe server is equipped with8The "instant" training chip can support the training and reasoning of large models with more than 100 billion parameters. Yang Gongyifan revealed that among the products delivered during the current training, Zhonghao Xinying's customers have completedLlama 2Training and inference of other models on the Qianka cluster.
"Moment", with its ability to interconnect up to 1,024 chips at high speed, underpins the large-scale intelligent computing cluster "Taize", whose system-level cluster performance is dozens of times that of traditional GPUs. It can handle large-scale AI computing demands such as autonomous driving model training and protein structure prediction.
Another key to Zhonghao Xinying's ability to build TPU chips, Yang Gongyifan said, is that they did not rely solely on experience when recruiting engineers. Working with veteran engineers in the early design stages, they found that such engineers often struggled with TPU concepts and design innovations, were more likely to be constrained by repetitive experience, and could not think directly enough to quickly solve problems and optimize performance.
Facing the larger computing power requirements of the large model era, he added that ten-thousand-card clusters are a must, and the TPU architecture's natural advantage in networking capability gives it an edge in building such clusters, with relatively better performance.
Last year, Zhonghao Xinying also introduced top talents to form a large model algorithm team.
The TPU architecture's distinctive performance gives enterprise application software more freedom and makes parallel design easier to complete, thereby enabling performance optimization and system construction.
To find solutions that better meet enterprise needs, a chip maker that trains its own models can better understand the models' characteristics and application scenarios, raising customers' satisfaction with and loyalty to its chips. Today's general-purpose large models generally have strong "flexible" capabilities such as text comprehension, web information retrieval, and multi-turn dialogue, but in highly specialized industrial scenarios they usually struggle to grasp the business knowledge, logic, and terminology of niche fields. At the same time, general-purpose large models are poor at precise quantitative calculation, falling far short of industry scenarios such as civil aviation and finance that demand high numerical accuracy.
Based on this, Zhonghao Xinying is building a foundation large model with "rigid" output capabilities and opening it to industry partners in finance, education, and medical care. Using the software stack and their own data, partners can perform secondary training and data labeling on the model, giving it an industry knowledge base so it can gradually be deployed in niche scenarios within specific production environments.
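The "secondary training" workflow described above (a partner continues training an opened base model on labeled in-domain data, rather than training from scratch) can be sketched in miniature. In the toy Python example below, a 1-D logistic model stands in for the large model; the "pre-trained" parameters, the data, and the domain rule are all invented for illustration:

```python
# Toy sketch of secondary training: start from parameters fit for a
# general task, then continue gradient descent on labeled in-domain
# data so the decision rule shifts to the new domain. The model and
# data are illustrative; a real partner would fine-tune a large
# pre-trained model on industry data instead.
import math

def predict(w, b, x):
    """Logistic model: probability that x belongs to the positive class."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# "Pre-trained" parameters, imagined as coming from general-domain
# training (they place the decision boundary at x = 0).
w, b = 0.5, 0.0

# Partner-supplied labeled in-domain data: positive iff x > 2.0,
# i.e. the domain's decision boundary differs from the pre-trained one.
data = [(x / 10.0, 1 if x > 20 else 0) for x in range(-40, 41)]

# Secondary training: plain gradient descent continued from the
# pre-trained weights rather than from a random initialization.
lr = 0.5
for _ in range(1000):
    gw = gb = 0.0
    for x, y in data:
        err = predict(w, b, x) - y
        gw += err * x
        gb += err
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

# Accuracy of the adapted model on the in-domain data.
acc = sum((predict(w, b, x) > 0.5) == bool(y) for x, y in data) / len(data)
```

The point of starting from pre-trained weights is that the optimization only has to move the existing decision rule, not learn one from nothing; with large models, the same idea lets a modest amount of labeled industry data specialize a general model.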
Since last year, they have been studying how to implement demos; after confirming feasibility, they began moving step by step into the pre-training of industry models this year.
With such a complete business layout and judgment of industry trends, Yang Gongyifan believes that in the era of big models, in the next 5-10 years, the market share of TPU and TPU-like chips in the field of AI computing hardware will reach 80%, and the remaining 20% will be traditional GPUs.
He also clarified Zhonghao Xinying's goals in product iteration and commercialization. The next generation of chip products that Zhonghao Xinying is developing is expected to be upgraded to match the performance of NVIDIA B200. In terms of commercialization, it hopes to deepen cooperation with a wider range of customers such as integrators, operators, and Internet giants.
Since its team came together in 2018, Zhonghao Xinying has watched AI chips develop from an early-stage market into the window of opportunity opened by large models' demand for computing. But looking back, at the dawn of the AI chip market, a startup aiming to gradually win market share with the TPU architecture faced challenges on every front.
Today, large models have driven an upgrade in computing power requirements, AI chips have entered a new era, and the advantages of the TPU architecture have become apparent, which also let Zhonghao Xinying anticipate the timing of the market's explosion. Yang Gongyifan believes that AI's application scenarios will far exceed any previous computing use case, and that the computing resources AI models demand will exceed anything in human history, driving rapid short-term growth in the TPU architecture's application scenarios and market demand.
As for the future, he hopes that Zhonghao Xinying can become the leader in China's AI chip industry, and TPU has the best chance to become the "x86" in this field. A new chapter in the story of China's AI chips has begun...