
Academicians and experts on artificial intelligence: China cannot rely on "stacking chips" to develop AI

2024-07-29


[Global Times reporter Ma Jun] The United States, relying on its software and hardware advantages, is racing to "pile up" the world's most powerful artificial intelligence (AI) through sheer scale. American billionaire Elon Musk recently announced on social media that his AI startup xAI has begun using the "Memphis Supercluster", composed of 100,000 H100 GPUs, for AI training, calling it "the world's most powerful AI training cluster." Should China follow this technological route led by the United States? At the recent 2024 China Computing Power Development Expert Seminar, co-organized by the China Intelligent Computing Industry Alliance and the Computing Power Standard Working Group of the National Standardization Committee, many academicians and experts offered their views.

The future of super-intelligence integration will be divided into three stages

Chen Runsheng, an academician of the Chinese Academy of Sciences, said at the seminar: "Artificial intelligence big models are representative of new quality productive forces. The integrated development of big models and supercomputing is very important, and China needs to plan and consider it seriously." Zhang Yunquan, a researcher at the Institute of Computing Technology of the Chinese Academy of Sciences, noted that the rapid development of big models demonstrates the characteristics of new quality productive forces but has also run into a computing power bottleneck. Given China's deep technical accumulation in supercomputing, he hopes that super-intelligence fusion (the integration of supercomputing and intelligent computing, the latter represented by big models) can effectively resolve this challenge. Shan Zhiguang, director of the Information Technology and Industrial Development Department of the National Information Center, explained that "super-intelligence fusion arises from the current diversification of basic computing power, intelligent computing power, supercomputing power and so on; the question is whether hybrid computing resources or integrated computing systems can simultaneously meet the needs of applications that require multiple different kinds of computing power."

Predicting how super-intelligence integration will proceed, Qian Depei, an academician of the Chinese Academy of Sciences, believes it will evolve clearly through three stages, "for AI", "by AI" and "being AI", with everything from hardware to software adapting to and promoting the development of artificial intelligence technology. In the first stage, "for AI", the focus is on transforming and upgrading existing computer systems and developing dedicated hardware, ensuring that AI tasks can be supported and executed efficiently and providing a solid infrastructure for AI research. In the second stage, "by AI", AI will be used to transform traditional computing: on the one hand, AI methods will be applied to traditional supercomputing problems; on the other hand, AI is already influencing the structure of traditional computers, a trend that will gradually become more pronounced. In the final stage, "being AI", computer systems will exhibit inherently intelligent characteristics. Artificial intelligence will no longer be an add-on capability but a core attribute and basic component of computers, whose computing power and level of intelligence may far exceed today's supercomputing or intelligent computing.

Chen Runsheng noted that the scientific and industrial communities have been trying to solve the problem of integrating supercomputing and intelligent computing. For example, Nvidia's latest GB200 architecture is essentially two GPUs plus a CPU; in a sense it can be seen as drawing on the strengths of both intelligent computing and supercomputing, adding high-speed data transfer through the CPU to a layout in which two GPUs execute machine learning. However, he believes this architecture does not fundamentally solve the efficiency problem. "The combination of supercomputing and intelligent computing is inevitable, and it will be an organic integration, not simply putting the two side by side."

Zheng Weimin, an academician of the Chinese Academy of Engineering, also said that the development, training, fine-tuning and inference of large models are all inseparable from computing power, and computing power costs account for the bulk of overall expenses: as much as 70% in the training stage and as high as 95% in the inference stage. In view of this, computing power has become a key factor supporting the development of large models.

Intelligent computing should refer to "human intelligence"

Chinese academicians and experts expressed their views on China's current big-model craze and on whether to follow the US "scale-up" technology path. Qian Depei said that China now has more big models, and more kinds of them, than the US, and is also working on general artificial intelligence, but it faces strict hardware restrictions from the US, and the quality and quantity of data available for training big models are also comparatively limited. "Can the big models made this way be better than those in the US? I think we still have to suit China's national conditions and cannot simply follow the Americans."

Chen Runsheng also believes that the big models now emerging in China are basically improvements on the big models and algorithms proposed in the United States, with little attention paid to the basic theory underlying big models. He proposed at the seminar that, compared with the local memory model used by traditional supercomputers, intelligent computing shows a fundamental difference: information is stored in a distributed way. This storage method imitates the complex structure of the human brain's neural networks, using a large-scale, densely interconnected network of chips to carry ever-larger models. However, how to effectively embed human knowledge into these complex systems, and how information is distributed and stored within them, rest on algorithms and theory that have not been fully explored. "With the uncontrolled expansion of model scale, an insurmountable problem is energy exhaustion. Therefore, simply adding chips and increasing the complexity of the system to solve the storage problem of large models is not entirely advisable."

Therefore, Chen Runsheng believes that future intelligent computing should still take "human intelligence" as its reference, that is, simulate the operating mechanism of the human brain. The human brain is very small and consumes only tens of watts of energy, yet the intelligence it generates exceeds that of the most advanced AI systems, which consume as much energy as an entire city. "The development of big models and intelligent computing requires not only improvements in models and algorithms at the application level, but also breakthroughs in basic theory. Big models have completed only the first 10% of their development; 90% of the work remains. I also believe that big models will definitely not be achieved by piling up more and more chips. They must learn like the human brain, compressing spatial and temporal complexity and reducing energy consumption. So I think the most fundamental task is to study spatial complexity and build up the basic theory of intelligent computing. If we can make progress in basic theory, we can achieve fundamental, original innovation."

Yuan Guoxing, a researcher at the Institute of Applied Physics and Computational Mathematics in Beijing, believes it is unrealistic to expect one universal large model to solve the problems of every industry. In reality, different applications involve different technologies, require different algorithms, and place different demands on computing power. In scientific computing, for example, the requirements for computational accuracy keep rising, yet as machines scale up and data grows, the credibility of results keeps declining. NASA has put forward a similar view; its requirements for computational accuracy are very high. In the future, different applications will therefore have different big models and different kinds of computation to solve different problems, and today's big models have completely different requirements for computational accuracy and algorithms.
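To make the scale-versus-accuracy point concrete, the following minimal Python sketch (an illustration of the general phenomenon, not an example from the seminar) sums the same value many times in single precision; the relative error of the naive sequential sum grows as the number of terms increases, which is one reason large-scale scientific computing keeps raising its demands on numerical precision.

```python
# Illustrative sketch: accumulated rounding error grows with problem size
# when summing naively in single precision (float32).
import numpy as np

def naive_sum_float32(values):
    """Sequential left-to-right accumulation in single precision."""
    total = np.float32(0.0)
    for v in values:
        total = np.float32(total + np.float32(v))
    return float(total)

for n in (10_000, 100_000, 1_000_000):
    data = [0.1] * n
    approx = naive_sum_float32(data)
    exact = 0.1 * n                      # reference value, well within float64 precision
    rel_err = abs(approx - exact) / exact
    print(f"n={n:>9}  float32 sum={approx:.4f}  relative error={rel_err:.2e}")
```

Running this shows the relative error climbing by orders of magnitude as n grows, even though each individual addition is only slightly inexact.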

He Baohong, director of the Cloud Computing and Big Data Research Institute at the China Academy of Information and Communications Technology, added: "Computing and training have different requirements for the underlying infrastructure, and we also need to determine in which scenarios the differences should be masked and in which they should be exposed."

Need to develop sovereign-level large models

Zhang Yunquan said that the United States has recently taken a series of actions to try to "choke" China's development of artificial intelligence, including banning the sale of high-end GPUs, ending the sharing of source code for large models, and cutting off ecosystem cooperation. At the same time, when the computing scale of large models reaches 10,000 or even 100,000 GPUs, technical bottlenecks such as the energy consumption wall, the reliability wall and the parallelism wall must be overcome by developing special-purpose supercomputers for large models. In this context, if China wants to break through the computing power bottleneck of large models in the short term, one path is available: drawing on the advanced supercomputing technology accumulated over the past two decades to develop special-purpose supercomputers for large models, overcoming the computing power bottleneck so that China can keep pace with the world's most advanced large models and not fall behind.

Introducing the "Sovereign Big Model" plan under the super-intelligence fusion system, Zhang Yunquan said that China has deep technical accumulation in supercomputing and has invested heavily in intelligent computing power in recent years, focusing on building a systems engineering effort centered on a super-intelligence fusion computing system to meet the computing power needs of big models, in the hope of making maximum use of supercomputing technology to solve the computing power challenge. Under the plan, a "Sovereign Big Model" innovation consortium will rely on national supercomputing facilities, well-known professorial teams from the Chinese Academy of Sciences and key universities across the country, smart chip companies, big model solution companies and others to jointly create a sovereign-level big model comparable to OpenAI's: a root model that can support national development, not just an ordinary big model. Similar national-level super-large models are also highly valued by other countries. For example, Microsoft and OpenAI have jointly announced a plan to invest $100 billion in a new artificial intelligence supercomputer, and Japan has recently announced that it will invest heavily in developing a national-level big model.

Chen Runsheng believes that, given China's current basic conditions and the inevitable trend of big-model development, it is unrealistic to follow the Western approach completely, and difficult to catch up in the short term. It is therefore all the more important to find a path of our own for developing a sovereign-level big model.