Dialogue with Xiong Dapeng, Chairman of Yizhu Technology: Storage and computing integration may open a second growth curve for computing power in the AI era
2024-08-14
The explosion of artificial intelligence (AI) has brought about massive demand for computing power. In the post-Moore era, advanced chip processes are approaching physical limits, and integrated storage and computing is expected to become one of the important technology routes in the future.
Storage and computing integration means that data storage and computing are combined in the same area of the same chip. Where do the performance and cost advantages of storage and computing integrated chips show up? What challenges does large-scale commercial use currently face? Could storage and computing integration give the domestic chip industry a chance to overtake its competitors?
Recently, China Business News discussed these questions with Xiong Dapeng, founder, chairman and CEO of Yizhu Technology. In his view, storage and computing integrated technology has revolutionary potential in the future computing field: it will break Moore's Law and start the second growth curve of computing power. "Especially in the AI era, this technology may become a key factor in promoting the growth of computing power."
Breaking the von Neumann architecture and eliminating three major problems
In the traditional von Neumann architecture, computing and storage are handled separately by computing units (CPUs, GPUs and other XPUs) and storage units. Data is fetched from memory and written back to memory after processing. The time needed to read and move data from memory outside the processing unit is often several times the time of the operation itself, which lowers computing efficiency and effective computing power.
"Today, with the prevalence of large models, in order to complete the calculation, it is necessary to move model parameters, and the number of parameters is very large, so the time spent accounts for a high proportion, even exceeding 80%. In some cases, this proportion is even higher. Therefore, the data bandwidth limits the actual effective performance. The chip performance on paper may be a P, but the actual performance may be far lower than this number. This is the so-called 'storage wall'." Xiong Dapeng told Caixin.
The "storage wall" comes with a second problem: a large amount of energy is consumed in moving data, which significantly reduces the chip's energy efficiency. This is the "energy wall".
In addition, there is the "compilation wall": dynamic data flow scheduling is complex, and the compiler cannot automatically optimize operators and executable programs for the data flow as it could in a static, predictable situation. Reaching higher effective computing power therefore depends on manual tuning, which increases the time and labor cost of deployment and migration. "These three points have greatly restricted the development of the AI industry, where resources are increasingly scarce and power consumption is rising significantly," Xiong Dapeng said.
Integrated storage and computing technology breaks with the von Neumann architecture: it combines the storage function and the computing function on the same chip and uses the storage units directly for data processing. In an in-memory computing architecture that modifies the "read" circuit, the result of a calculation can be obtained in the "read" circuit and "written" directly back to the destination address in memory. Frequent data transfer between the computing unit and the storage unit is no longer needed, which eliminates the cost of data movement, greatly reduces power consumption, and greatly improves computing efficiency.
"Storage and computing integrated technology is expected to become one of the important technology routes in the post-Moore era. From the first principles of effective computing power, for storage and computing integration, the amount of data movement is greatly reduced, and the effective computing power shows linear growth. It can be said that storage and computing integration will break Moore's Law and start the second growth curve of computing power. At the same time, I believe in the transformative potential of storage and computing integrated technology in the future computing field, especially in the AI era. This technology may become a key factor in promoting the growth of computing power." Xiong Dapeng said.
A solution with better energy efficiency and cost performance
Compared with high-bandwidth memory (HBM), which has become popular recently, chips built on the integrated storage and computing architecture offer better system energy efficiency and cost-effectiveness.
HBM is a high-performance memory interface technology that is mainly used to improve the data processing capabilities of GPUs and high-performance computing (HPC) systems. This technology stacks DRAM chips vertically and uses high-speed interconnects to closely connect them to the processor, thereby greatly increasing bandwidth.
"HBM is an effective technical route to solve the 'storage wall' problem, but it comes at the cost of cost and power consumption, because providing large bandwidth requires higher power consumption and is very expensive, far exceeding the price of traditional DRAM." Xiong Dapeng said, "In essence, HBM is a storage chip and does not have computing functions. It needs to be paired with computing chips such as GPGPU to realize computing functions."
From the perspective of system cost, storage and computing integrated chips may be cheaper than the combination of traditional GPGPU and HBM.
On the one hand, the computing power density or PPA of the storage-computing integrated architecture is higher. "The equivalent data bandwidth of the storage-computing integrated architecture is far greater than that of HBM, which may be several times or even more than ten times. At the same time, its computing power density is more advantageous, and the actual effective computing power, cost-effectiveness, and energy efficiency ratio are much higher than those of the GPGPU+HBM solution." Xiong Dapeng said.
On the other hand, integrated storage and computing technology depends relatively little on advanced processes, while GPGPU and HBM depend on them heavily. "HBM relies on advanced processes and carries great supply chain risks. The integrated storage and computing route, however, even on less advanced processes such as 12nm and 22nm, may not be inferior in performance to 4nm or even 3nm. This is also the idea of overtaking in another lane."
Although integrated storage and computing may require more chips to reach the same performance, high cost-effectiveness and high energy efficiency remain among its significant advantages.
Large-scale deployment in the field of large models may come in the next 2-3 years
The research and application of storage-computing integrated technology are accelerating worldwide.
At present, overseas high-computing-power chip companies that have adopted the integrated storage and computing route include the AI chip startup Groq, which is valued at more than US$2.8 billion and is seen as a strong competitor to Nvidia, and d-Matrix, which has received investment from Microsoft, Temasek, Samsung, Marvell, SK Hynix, Ericsson and many other companies.
In addition, Samsung has also published a study on in-memory computing based on MRAM in Nature, demonstrating the high accuracy of its AI algorithm. SK Hynix has launched a DRAM in-memory computing product based on the GDDR interface, which can greatly increase computing speed and reduce power consumption.
"As far as I know, most overseas companies use SRAM to achieve storage and computing integration, but its capacity is low and the cost is high. For example, Groq's complete solution requires more than 570 chips, while if NVIDIA H100 is used, the number of chips required is only in the single digit. This is mainly due to insufficient storage density." Xiong Dapeng said that many emerging companies in China have made breakthroughs in storage and computing integration technology, providing the possibility for China's chip industry to overtake in other lanes.
However, scaling up the computing power of storage and computing integrated chips still faces many challenges: first, unreliable accuracy; second, for designs based on analog computing, digital-to-analog conversion creates bottlenecks in energy consumption, die size and performance; third, large AI models require large storage capacity.
"The fully digital path can solve these problems well, and this is also the basis for Yizhu Technology to make AI high-computing power inference chips." Xiong Dapeng said.
In a typical analog storage and computing system, data is stored as analog signals, represented by different voltage levels in the storage unit, and MAC (multiply-accumulate) and other operations are performed based on Ohm's law and Kirchhoff's laws. The biggest problem with this approach is unreliable accuracy and precision; analog circuit noise and various sources of variation are among the causes. Whether because of the manufacturing process or the working environment, the value represented by a memristor will show errors or drift. Mixed digital-analog methods try to balance efficiency and accuracy, but still cannot guarantee high or trustworthy accuracy.
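The analog MAC described above can be sketched in a few lines. This is an idealized illustration, not a model of any vendor's device: each cell contributes a current I = V × G (Ohm's law), and the currents on a shared column wire add up (Kirchhoff's current law). The voltages, conductances, and the uniform ±5% variation are hypothetical values chosen only to show how device variation shifts the result.

```python
import random


def analog_crossbar_mac(voltages, conductances, relative_noise=0.0, rng=None):
    """Column currents of an idealized crossbar array.

    Each cell contributes I = V * G (Ohm's law); currents on a shared
    column wire sum (Kirchhoff's current law), giving one dot product
    per column. relative_noise perturbs each conductance by up to
    +/- that fraction to mimic variation or drift (illustrative model).
    """
    rng = rng or random.Random(0)
    n_rows, n_cols = len(conductances), len(conductances[0])
    currents = []
    for col in range(n_cols):
        total = 0.0
        for row in range(n_rows):
            g = conductances[row][col]
            g *= 1.0 + rng.uniform(-relative_noise, relative_noise)
            total += voltages[row] * g
        currents.append(total)
    return currents


# Hypothetical inputs: 3 row voltages (volts), 3x2 conductances (siemens).
V = [0.2, 0.5, 0.1]
G = [[1e-6, 4e-6],
     [2e-6, 5e-6],
     [3e-6, 6e-6]]

ideal = analog_crossbar_mac(V, G)
noisy = analog_crossbar_mac(V, G, relative_noise=0.05, rng=random.Random(42))

for i, (a, b) in enumerate(zip(ideal, noisy)):
    print(f"column {i}: ideal {a:.3e} A, with 5% variation {b:.3e} A")
```

The ideal column currents equal the exact dot products, while the perturbed run drifts away from them, which is the accuracy problem the paragraph describes.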
Xiong Dapeng said that Yizhu Technology's solution is a fully digital storage and computing integrated design based on memristors (ReRAM). Because it is fully digital, data is stored in the storage units in binary form and each memristor represents only one bit, so the only distinction is between high and low levels, high and low resistance, or high and low current. Under these conditions, the stored values can be read reliably.
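Why one bit per cell tolerates variation can be illustrated with a small sketch. It is not Yizhu Technology's implementation: the resistance values, the read threshold, and the ±20% variation are hypothetical, and the MAC is done in ordinary integer arithmetic after the bits are read back. The point is only that when the two states are far apart, thresholding recovers every bit exactly despite the variation.

```python
import random

LOW_RES_OHMS = 1e4     # hypothetical low-resistance state, read as bit 1
HIGH_RES_OHMS = 1e6    # hypothetical high-resistance state, read as bit 0
THRESHOLD_OHMS = 1e5   # read threshold placed between the two states


def store_bits(value, n_bits, rng, relative_noise=0.2):
    """Store an unsigned integer as one noisy resistance per bit (LSB first)."""
    cells = []
    for k in range(n_bits):
        nominal = LOW_RES_OHMS if (value >> k) & 1 else HIGH_RES_OHMS
        cells.append(nominal * (1.0 + rng.uniform(-relative_noise, relative_noise)))
    return cells


def read_bits(cells):
    """Recover the integer by thresholding each cell's resistance."""
    return sum(1 << k for k, r in enumerate(cells) if r < THRESHOLD_OHMS)


def digital_mac(inputs, stored_weights):
    """Multiply-accumulate on the values read back from one-bit cells."""
    return sum(x * read_bits(cells) for x, cells in zip(inputs, stored_weights))


rng = random.Random(7)
weights = [13, 200, 7, 91]   # hypothetical 8-bit weights
inputs = [3, 1, 4, 2]        # hypothetical integer activations

stored = [store_bits(w, 8, rng) for w in weights]
result = digital_mac(inputs, stored)
exact = sum(x * w for x, w in zip(inputs, weights))
print(result == exact)
```

With these assumed values, a 20% shift in resistance never crosses the threshold, so the digital result matches the exact dot product, in contrast to the analog case where the same variation changes the answer.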
In addition, storage and computing integration also faces the problem of engineering implementation. "As a new technology route, how to use it and integrate it into the existing ecosystem is a big challenge. Programmability and compatibility with the existing ecosystem are crucial," Xiong Dapeng told China Business News.
Overall, storage-computing integrated technology is globally regarded as an effective means to resolve the contradiction between high computing power demand and high energy consumption costs. It also provides an important opportunity for the Chinese chip industry to catch up. In the next few years, as technology continues to mature and market demand increases, storage-computing integrated chips are expected to be widely used in multiple fields and promote the innovative development of the entire industry. At present, the application of storage-computing integrated chips in the field of large models is still in the development stage. Xiong Dapeng expects that large-scale implementation will be achieved in the next 2-3 years.
(This article comes from China Business Network)