What’s missing for large models to land on the edge?

2024-08-07

As large models begin to empower industries, large-scale deployment at the edge and on devices has become the top priority for their further development. Edge/on-device deployment favors instant response and privacy protection, but it also faces challenges in computing power fragmentation, energy efficiency, and viable application scenarios. For companies in the industry, this is both a difficulty and an opportunity.

Driven by industry application needs, large models are developing towards the edge

At present, China's large model industry is developing rapidly: statistics show that by the end of March, 117 large models had been released in China. Unlike the United States, which focuses on original breakthroughs, China's large model companies place more emphasis on application implementation. Zhou Hongyi, founder and chairman of 360 Group, said in an earlier speech that competing on ever-larger models, computing power, and data is not the only path toward trillion-parameter models; large AI models have more diversified development paths. Li Dahai, co-founder and CEO of Mianbi Intelligence, likewise emphasized the importance of industry-specific models, which have become one of the important development trends.

To empower a wide range of industries more effectively, large models and the computing power behind them cannot be deployed only in the cloud. Edge and on-device models have comparative advantages the cloud lacks. First, on-device models are more reliable: models deployed on terminals can interact with their environment continuously and in real time, a continuity that cloud models struggle to maintain. Second, on-device models better protect user privacy, an issue drawing growing attention. If robots enter homes widely in the future, privacy concerns will become more serious, and the advantage of on-device models in protecting data privacy will be even more evident.



Based on this understanding, some edge computing companies have taken the lead in launching scenario practices around edge AI and have successfully brought edge AI into multiple fields. For example, in intelligent manufacturing, NVIDIA's Jetson edge computing platform brings AI inference and computing capabilities into industrial scenarios, using GPU-accelerated visual inference to solve problems such as defect detection and flexible manufacturing on the assembly line. Intel's solutions have also been applied in intelligent monitoring, education, and intelligent medical care; by managing diverse edge devices, they help edge intelligence businesses run more flexibly, efficiently, and accurately.

The development of edge and on-device models has also driven the growth of edge AI computing, and related links of the industry chain, such as computing power and chips, are seeing a wave of large-scale development. Qiu Xiaoxin, founder and chairman of Aixin Yuanzhi, pointed out that truly large-scale implementation of large models requires close integration of cloud, edge, and device, and that the key to edge-device integration lies in AI computing and perception. Qiu Xiaoxin believes smart chips and multimodal large models have become the "golden combination" of the AI era. As large model applications spread, economy, efficiency, and environmental friendliness will become the keywords for smart chips, and efficient inference chips equipped with AI processors will be the more reasonable choice for implementing large models, which is also the key to promoting inclusive AI. STL Partners predicts that by 2030, the global potential market for edge computing will grow to US$445 billion, a compound annual growth rate of 48%.

Diversified computing power deployment to address fragmentation challenges

However, deploying large models at the edge and on devices cannot happen overnight. Given the limited computing resources of edge devices and the diverse computing requirements of large models themselves, edge deployment first faces challenges on the computing power front. On the one hand, model vendors need to apply techniques such as compression, pruning, and quantization to reduce model size and computational complexity so that models fit the performance envelope of edge/end devices; on the other hand, how to deploy the computing power infrastructure is also a key question.
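To make the compression step above concrete, the core idea of post-training quantization can be sketched in a few lines of plain Python (an illustrative, framework-free sketch, not any vendor's actual API): float32 weights are mapped to 8-bit integers via a scale and zero point, shrinking weight memory roughly 4x at the cost of a small rounding error.

```python
def quantize(weights, num_bits=8):
    """Affine-quantize a list of floats to signed num_bits integers."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / (qmax - qmin) or 1.0  # guard against all-equal weights
    zero_point = round(qmin - w_min / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(v - zero_point) * scale for v in q]

weights = [0.5, -1.2, 0.03, 0.8, -0.4]
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)
```

Production toolchains refine this scheme (per-channel scales, calibration data, quantization-aware training), but the scale/zero-point mapping is the same basic mechanism that lets a model fit edge memory budgets.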

Zhang Yu, Chief Technology Officer of Intel China's Network and Edge Business Unit, emphasized that as AI empowers industry after industry, different applications have very different computing power requirements, spanning a huge range: high-performance workloads require computing clusters, while low-demand workloads can run on one or a few devices. Yang Lei, Product Director of Arm Technology, added that deploying such large AI models on terminals still faces multiple challenges in cost, power consumption, and software ecosystem.



In response to this demand trend, computing infrastructure suppliers such as Inspur and Lenovo have in recent years been building computing power layouts around "device-edge-cloud-network-intelligence", with products including hardware such as intelligent edge gateways, edge servers, industrial controllers, and embedded computers to meet the computing needs of different industries. On the chip side, CPU+GPU+NPU integration has become the direction of processor development to handle more complex AI workloads. Qualcomm launched the Snapdragon X Elite with an integrated dedicated neural processing unit to support models with tens of billions of parameters. Intel's Meteor Lake processor combines an NPU with the AI functions of the processor's compute engines to improve the energy efficiency of PCs running AI workloads. Among domestic AI chips, Aixin Yuanzhi recently released the "Aixin Tongyuan AI Processor", whose core operator instruction set adopts a programmable dataflow microarchitecture that effectively improves energy efficiency and computing density, making it well suited to edge computing and AI inference.

Energy efficiency issues at the edge are prominent, and lightweight development is the key

Energy efficiency is also a key issue in developing large models for the edge. In fact, for large models to be deployed effectively at the edge and on devices, the energy efficiency problem is more prominent than in the cloud. Sachin Katti, senior vice president and general manager of Intel's Network and Edge Group, told the author that when discussing computing power, a key issue beyond optimizing compute and the software ecosystem is power consumption, especially for edge deployment. An edge-deployed device may consume about 200 W; a cloud-deployed device may consume between 1 kW and 2 kW; a single rack in a data center may draw as much as 100 kW; and summed across entire data centers, consumption may reach a scale of 50 GW to 100 GW.

Edge devices usually have limited computing power and memory, while large models require substantial resources for high-performance inference. How to optimize resource utilization and reduce energy consumption while maintaining model accuracy and response speed has therefore become a key issue. In response, vendors have promoted lightweight development frameworks and acceleration libraries, such as AMD's Ryzen AI model deployment engine, Intel's OpenVINO inference acceleration toolkit, and NVIDIA's TensorRT high-performance inference library. Combined with lightweight frameworks for embedded and mobile devices (such as PyTorch Mobile, TensorFlow Lite, and Paddle Lite), these can promote the widespread application of AI in mobile devices, the Internet of Things, and other edge computing scenarios.
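One reason inference acceleration libraries like those above save compute and energy at the edge is operator fusion: merging adjacent layers so fewer operations run at inference time. A classic case is folding a batch normalization layer into the preceding convolution's weights. A minimal single-channel sketch in plain Python (illustrative values; not tied to any specific library named above):

```python
import math

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into a conv weight w and bias b (one channel)."""
    s = gamma / math.sqrt(var + eps)
    return w * s, (b - mean) * s + beta

def conv_then_bn(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """Reference path: run conv, then batch-norm, as two separate ops."""
    y = x * w + b
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

w, b = 0.7, 0.1
gamma, beta, mean, var = 1.5, -0.2, 0.05, 0.9

wf, bf = fold_bn_into_conv(w, b, gamma, beta, mean, var)
x = 2.0
fused = x * wf + bf                                    # one op at inference time
reference = conv_then_bn(x, w, b, gamma, beta, mean, var)  # two ops
```

The fused path produces the same output with one multiply-add instead of two layers, which is exactly the kind of graph-level rewriting these toolkits automate across thousands of channels.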



In addition, the industry has begun to widely adopt liquid cooling for servers, and it is gradually being applied to data centers and large-scale model deployments. Existing liquid cooling technology can reportedly cool a 100 kW cluster, with expansion to 300 kW expected in the future.

Exploring AI applications: who will be the future “star scenario”?

As the industry increasingly emphasizes using large models to empower applications, finding the right "star scenario" has become key to success or failure. Currently, AI phones, AI PCs, and autonomous driving are the most promising application markets for large models.

The latest research from market research firm IDC shows that the AI phone market will reach 234.2 million units in 2024, up 363.6% from 50.5 million units in 2023, accounting for 19% of the overall smartphone market this year. By 2028, AI phone shipments will reach 912 million units, a compound annual growth rate of 78.4% from the 2023 base. Anthony Scarsella, research director for IDC's Worldwide Quarterly Mobile Phone Tracker, said cost will remain a key inhibitor as AI phones launch, because many powerful chips and NPUs are expensive and mainly sold in the ultra-high-end market. But as time passes and competition intensifies, these components are expected to reach mid-range and more affordable models.
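The growth figures above follow from the standard compound-annual-growth-rate formula; a quick arithmetic check, assuming the 78.4% figure is measured from the 2023 base of 50.5 million units:

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end/start)**(1/years) - 1."""
    return (end / start) ** (1 / years) - 1

yoy_2024 = 234.2 / 50.5 - 1   # 2023 -> 2024 year-on-year growth, ~363.6%
rate = cagr(50.5, 912, 5)     # 2023 -> 2028, ~78.4% per year
```

The same formula also reproduces Canalys's 42.11% figure for AI PCs (51 million to 208 million units over 2024 to 2028).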

AI PCs are maturing faster than originally expected and are set to drive a replacement wave across the global PC industry. Canalys forecasts that global AI PC penetration will rise from 19% in 2024 to 71% in 2028, with shipments growing from 51 million to 208 million units, a compound annual growth rate of 42.11%. Morgan Stanley predicts AI PCs will account for 2% of the overall PC market this year, rising to 16% next year, 28% in 2026, 48% in 2027, and 64% in 2028.

The use of large models in cars is still in its early stages, but as the concept of intelligence takes hold among consumers, it has become a broad consensus that cars will eventually become "mobile terminals on wheels", and automotive applications of large AI models will move into the fast lane. There are two main directions for putting large models in cars: one is the cockpit domain, enabling more intelligent human-computer interaction; the other is working with the autonomous driving system to further improve intelligent driving solutions. Zhang Chi, CTO of Mai Chi Zhixing Technology Co., Ltd., said that large models have accelerated the move of autonomous driving from highways to more complex urban scenarios and promoted end-to-end integration of perception, planning, and control. Liu Jianwei, co-founder and vice president of Aixin Yuanzhi, said the company anticipated the Transformer boom in 2022 and took the lead in launching chips equipped with its Aixin Tongyuan AI processors; its intelligent driving chips, such as the M55H and M76H, are already installed in popular current models.