
Zhou Jingren: Fully invest in upgrading AI infrastructure

2024-09-19


On September 19, at the 2024 Hangzhou Yunqi Conference, Alibaba Cloud CTO Zhou Jingren said that Alibaba Cloud is setting a new standard for AI infrastructure in the AI era, comprehensively upgrading its technical architecture from servers to computing, storage, networking, data processing, and model training and inference platforms, turning the data center into a supercomputer that delivers high-performance, efficient computing services for every AI model and application.
At the conference, the Tongyi large model family saw its major annual release. The foundation model was upgraded to performance comparable to GPT-4o, the open-source Qwen2.5 series was released, and more than 100 models spanning modalities such as language, audio, and vision were launched. Cumulative downloads of Tongyi's open-source models have exceeded 40 million, and the total number of Tongyi-native and derivative models has surpassed 50,000, making it a world-class model family second only to the United States' Llama.
Building strong AI infrastructure
Unlike the traditional IT era, the AI era places higher demands on infrastructure performance and efficiency: the CPU-dominated computing architecture has rapidly shifted to a GPU-dominated AI computing architecture. Alibaba Cloud is putting AI at the center, comprehensively rebuilding its underlying hardware, computing, storage, networking, databases, and big data systems and integrating them with AI scenarios, to accelerate model development and application and build powerful AI infrastructure for the AI era.
Zhou Jingren said: "Cloud vendors have a full stack of technology reserves, and through comprehensive infrastructure upgrades they make the entire AI life cycle of training, inference, deployment, and application more efficient."
At the conference, Zhou Jingren demonstrated the AI-driven upgrade of the entire Alibaba Cloud product family. The latest Panjiu AI server supports 16 GPUs and 1.5 TB of GPU memory per machine and uses an AI algorithm to predict GPU failures with 92% accuracy. Alibaba Cloud ACS launched GPU container compute for the first time, achieving compute affinity and performance gains through topology-aware scheduling. HPN 7.0, a high-performance network architecture designed for AI, can stably connect more than 100,000 GPUs and improves end-to-end model training performance by more than 10%. Alibaba Cloud CPFS file storage delivers 20 TB/s of data throughput, providing exponentially scalable storage for AI computing. The PAI artificial intelligence platform achieves integrated elastic scheduling of training and inference at the scale of 10,000 GPUs, with effective AI compute utilization exceeding 90%.
Over the past two years, model sizes have grown thousands of times, yet model computing costs continue to fall, and the cost for enterprises to use models keeps dropping. Zhou Jingren emphasized: "This is a technology dividend brought by comprehensive innovation in AI infrastructure. We will continue to invest in building advanced AI infrastructure and accelerate the adoption of large models across thousands of industries."
More than 300,000 enterprise customers have reportedly adopted Tongyi's large models. Going forward, industries such as biomedicine, industrial simulation, weather forecasting, and gaming are accelerating their embrace of large models, which will drive a new round of growth in AI computing power. Zhou Jingren said: "To cope with exponential growth in demand for GPU computing power, especially the coming inference market, Alibaba Cloud is ready."
Open-sourcing advanced large models
Over the past year, large model technology has reached multiple milestones. From large language models to video generation and multimodal models, capabilities continue to expand, with steady gains in mathematics, code, and reasoning.
As one of the earliest technology companies in the industry to invest in large model technology, Alibaba Cloud released the large language model Tongyi Qianwen in April last year. The Tongyi model family now covers all modalities, including language, image, video, and audio, with performance ranking among the best in the world. At the same time, the Tongyi models continue to be open-sourced and have become among the domestic large models most popular with enterprises and developers.
At the conference, Zhou Jingren announced that Tongyi's flagship model Qwen-Max has been fully upgraded, with performance approaching GPT-4o. At the same time, the most capable open-source Qwen2.5 series was released, forming a world-class model family second only to the American Llama. A total of 100 models were open-sourced this time, allowing companies and developers to use large models at low cost.
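As an illustration of how a developer might try one of the open-sourced Qwen2.5 models, the following is a minimal sketch using the Hugging Face transformers library; the model repository name, prompt, and generation settings are assumptions for demonstration, not details from the announcement.

```python
# Minimal sketch: load an open-sourced Qwen2.5 chat model and generate a reply.
# The repository ID and prompt below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed Hugging Face repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Briefly explain what an AI inference platform does."},
]

# Build a chat-formatted prompt, generate up to 128 new tokens, and decode
# only the newly generated portion.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```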
For programming scenarios, Alibaba Cloud's Tongyi Lingma has been upgraded again with a new AI programmer. Unlike the previous generation of the product, it combines multiple job skills, including architect, development engineer, and test engineer, and can independently complete task decomposition, code writing, defect repair, testing, and other development work, finishing application development in as little as minutes and helping to improve software development efficiency by dozens of times.
One year after the "war of a hundred models," applications have become the main theme of the large model industry. As the first company to propose the concept of Model as a Service (MaaS), Alibaba Cloud has always treated the prosperity of the large model ecosystem as its primary goal. The ModelScope (Moda) community is now the largest model community in China, with more than 6.9 million developer users, and has contributed more than 10,000 models in cooperation with industry partners.
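For developers who want to pull one of those community models locally, a minimal sketch using the ModelScope Python SDK might look like the following; the installation step and the model ID are assumptions for illustration, not instructions from the article.

```python
# Minimal sketch: download a model snapshot from the ModelScope community.
# Assumes the SDK is installed (pip install modelscope); the model ID is illustrative.
from modelscope import snapshot_download

# Fetch the model files into a local cache directory and print the path.
local_dir = snapshot_download("qwen/Qwen2.5-7B-Instruct")
print("Model files downloaded to:", local_dir)
```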
"we hope that enterprises and developers can develop and use ai at the lowest cost, so that everyone can use the most advanced large models." zhou jingren said. (li ji)
Source: guangming.com