Huawei launches new AI storage, bringing a new long-memory paradigm to large AI model training and inference

2024-09-21

On September 20, at the data storage summit held during Huawei Connect 2024, Dr. Zhou Yuefeng, Vice President of Huawei and President of the Data Storage Product Line, released the new AI storage system OceanStor A800. It uses a new long-memory paradigm to improve both the training and inference capabilities of large AI models, helping industries of all kinds move into the digital era.
Zhou Yuefeng, Vice President of Huawei and President of the Data Storage Product Line, delivers a keynote speech
Human civilization has evolved from the Stone Age through the agricultural and industrial ages to today's digital age. Data is a key factor of production in developing new productivity, and it is driving major progress in core AI fields such as large AI models, embodied intelligence, and AI for science.
The digital intelligence era is the golden age of data. But while data volumes grow explosively and the value of data rises, the industry also faces challenges such as insufficient bandwidth between XPUs and storage, low availability of computing clusters, and long inference times, all of which place higher demands on storage. The digital intelligence era calls for storage built for AI. Data storage with extreme performance, high scalability, data resilience, sustainability, new data paradigms, and data fabric capabilities is the only path to the AI digital intelligence era.
To meet these challenges, Huawei has released the new AI storage system OceanStor A800. Building on its efficient training capabilities for large AI models, it greatly enhances inference capabilities and makes major advances in cluster performance and new data paradigms, promoting the adoption of AI across industries.
1. A single set of AI storage devices supports large model training on 100,000-card clusters. With an architecture in which front-end network cards are fully shared and controllers are fully interconnected with the back-end SSDs, a single set of storage devices can be fully interconnected with a 100,000-card cluster. One set of OceanStor A800 storage can maintain static full connections with training clusters of up to 192,000 cards, with a 40% performance improvement and an 80% reduction in space usage.
2. Strengthening computing with storage increases AI cluster availability by 30%. AI training is frequently interrupted. According to statistics, the longest uninterrupted training run at an AI training center lasts only 2.6 days, which forces the GPUs/NPUs to save checkpoint (ckpt) data repeatedly. China Mobile uses Huawei OceanStor AI storage for large model training, achieving a single storage cluster of 150 PB with 8 TB/s of bandwidth and 230 million IOPS, and increasing cluster availability by 32%, which provides strong support for larger-scale model training in the future.
3. Replacing computing with storage: long-memory storage improves the inference experience and reduces system costs.
Long context has become an inevitable trend in large model inference. According to the scaling law, providing enough inference compute and enough intermediate tokens can greatly improve inference accuracy. With dedicated AI storage, long contexts and massive numbers of intermediate inference tokens can be kept for a long time, maximizing the logical thinking and reasoning ability of large models, especially their slow-thinking ability.
As the industry's first storage system to provide long-memory capabilities, OceanStor A800 adopts a multi-level KV cache mechanism to persist and efficiently reuse the KV cache, giving large model inference long-memory capabilities and reducing repeated computation in the prefill stage. For customers, inference latency is reduced by 78% and the throughput of a single XPU card is increased by 67%, greatly improving the inference experience while reducing costs.
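The idea of a multi-level KV cache that avoids repeated prefill work can be illustrated with a toy sketch. This is not Huawei's implementation. The class name TieredKVCache, the prefill function, the BLOCK size, and the string placeholders standing in for real key/value tensors are all assumptions made for illustration. The sketch keeps a small fast tier in front of a larger slow tier and, for each request, reuses the longest cached block-aligned prefix so that only the remaining tokens need to be computed.

```python
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple

BLOCK = 4  # tokens per cached block (hypothetical granularity)

Prefix = Tuple[int, ...]


class TieredKVCache:
    """Toy two-tier cache: a small LRU fast tier backed by a larger slow tier."""

    def __init__(self, fast_capacity: int) -> None:
        self.fast: "OrderedDict[Prefix, str]" = OrderedDict()
        self.slow: Dict[Prefix, str] = {}  # stands in for persistent storage
        self.fast_capacity = fast_capacity

    def put(self, prefix: Prefix, kv: str) -> None:
        self.fast[prefix] = kv
        self.fast.move_to_end(prefix)
        # Demote least recently used entries instead of discarding them.
        while len(self.fast) > self.fast_capacity:
            old_prefix, old_kv = self.fast.popitem(last=False)
            self.slow[old_prefix] = old_kv

    def get(self, prefix: Prefix) -> Optional[str]:
        if prefix in self.fast:
            self.fast.move_to_end(prefix)
            return self.fast[prefix]
        if prefix in self.slow:
            kv = self.slow.pop(prefix)
            self.put(prefix, kv)  # promote back to the fast tier
            return kv
        return None


def prefill(tokens: List[int], cache: TieredKVCache) -> Tuple[int, int]:
    """Return (tokens_reused, tokens_computed) for one request."""
    usable = len(tokens) - len(tokens) % BLOCK
    reused = 0
    # Look for the longest block-aligned prefix that is already cached.
    for end in range(usable, 0, -BLOCK):
        if cache.get(tuple(tokens[:end])) is not None:
            reused = end
            break
    # "Compute" the remainder and cache each new block-aligned prefix.
    for end in range(reused + BLOCK, usable + 1, BLOCK):
        cache.put(tuple(tokens[:end]), f"kv[0:{end}]")
    return reused, len(tokens) - reused


if __name__ == "__main__":
    cache = TieredKVCache(fast_capacity=2)
    history = list(range(10))
    print(prefill(history, cache))                 # first turn: nothing to reuse
    print(prefill(history + [10, 11, 12], cache))  # follow-up turn reuses a prefix
```

In this sketch a follow-up request that extends an earlier conversation only computes the tokens beyond the cached prefix, which is the kind of saving in the prefill stage that the article describes. Entries pushed out of the fast tier are kept in the slow tier rather than dropped, so older context can still be reused later.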
The new AI storage for the digital era is already at work in industry scenarios such as financial credit, investment research and analysis, healthcare, and drug development. Zhou Yuefeng said that in the golden age of data, Huawei's innovative AI storage unleashes advanced data storage capabilities and lays the foundation for the digital era.