
Cloud giant on a rampage: 2 million self-developed CPUs shipped! A new round of chip reshuffling begins

2024-07-24



Zhidongxi
Author: ZeR0
Editor: Mo Ying

Last week, Graviton4, the processor independently developed by Amazon Web Services, the world's largest cloud computing provider, became generally available, initially powering the new Amazon EC2 R8g instances.

It is a stirring moment, with a sense of déjà vu: after so much hardship, the Arm server CPU has finally come to fruition.

The server CPU field has seen its share of ups and downs. In the early days, a group of reduced-instruction-set (RISC) pioneers dominated the market, only to be overtaken by the complex-instruction-set (CISC) x86 architecture. By the time the RISC newcomer Arm tried to enter the data center race, x86 already ruled the market.

In fact, Arm had been eyeing this emerging market as early as 2008, but a decade and several attempts later it had still made little headway.

Arm's first ticket into the data center market came from Amazon Web Services, the leader of the cloud computing industry.

At that time, Amazon Web Services launched a "three-hit combo":

1. In January 2015, it unexpectedly acquired the Israeli chip design company Annapurna Labs, which attracted close attention from the industry;

2. In 2017, Amazon launched its first self-developed network chip, Amazon Nitro, and brought the world's first commercial DPU chip to the stage of history;

3. In 2018, the first Amazon Graviton processor was released, giving Arm server CPUs a firm place in data center history.

Amazon Web Services then launched a textbook multi-track custom chip development effort, and other major American and Chinese companies followed suit with their own processors. The long tug-of-war over server CPUs finally shifted from one-sided x86 dominance to gradually rising momentum for the Arm camp.

Graviton has also gradually become the world's most widely used Arm server CPU, and Amazon Web Services is seen as the "hope of the whole village" that will lead the Arm ecosystem to expand in data centers. A report by Bernstein last year showed that Amazon Web Services accounted for more than half of the global Arm server CPU market.


▲Amazon Web Services has released five Graviton processors in five years (Source: Zhidongxi)

Today it is common for large companies to develop their own chips, but only a few have succeeded. The reference textbook on self-developed server CPUs that Amazon Web Services has written over five years rewards careful study.

1. Six years of hard work on self-developed CPUs, paving the way for Arm server chips

The first generation blazed the trail; the second generation conquered the territory.

That is a fair summary of how Amazon Web Services rose to fame through self-developed server chips: the Graviton processor released in November 2018 fired the cloud computing giant's first shot at self-developed CPUs; a year later its successor, Graviton2, debuted, marking the official entry of Arm server CPUs into the data center market in competition with x86.

Graviton2 integrates 30 billion transistors, packs four times as many cores as the previous generation, doubles the L1/L2 caches, and offers 2TB/s of bus bandwidth, achieving a 7-fold performance improvement over its predecessor. Compared with comparable x86-based instances, Graviton2-based instances deliver 40% higher performance at a 20% lower cost per instance.

Its outstanding performance at low power let Amazon Web Services shift general-purpose workloads to Graviton2 at scale to save power and costs. Since then, adoption of the Graviton series has soared, and its workloads have expanded from the initial caching and web serving to data analytics, machine learning, high-performance computing, and more.

Graviton's initial success in the market can be regarded as a turning point that changed Arm's destiny.

Behind this, Amazon Web Services put real thought into underlying innovation: for the first time it dropped simultaneous multithreading (SMT) in favor of giving each physical core exclusively to a single thread, so that each vCPU occupies one physical core. This isolates vCPUs from one another and prevents performance jitter caused by contention for shared resources.
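The core-exclusivity idea can be illustrated with a minimal Linux sketch using Python's `os.sched_setaffinity`. This shows CPU pinning in general, not AWS's internal mechanism, which enforces the mapping in the hypervisor and hardware rather than in the guest:

```python
import os

def pin_to_core(core_id: int) -> set:
    """Restrict the calling process to a single CPU core and return the new mask."""
    os.sched_setaffinity(0, {core_id})  # pid 0 = the current process
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # After pinning, this process runs only on core 0: its caches and
    # execution pipeline are not shared with another runnable thread on the
    # same core, which is the isolation property described for Graviton vCPUs.
    print(pin_to_core(0))
```

On an SMT x86 host, two vCPUs may be hyperthread siblings sharing one physical core; pinning one thread per physical core removes that source of contention, which Graviton guarantees by design.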


From the first generation to the second generation, Graviton achieved considerable performance improvements by increasing the number of cores, but for the third generation, Amazon Web Services needs to incorporate more design innovations.

Adding cores and raising the clock frequency are the two common levers for improving performance. The third-generation Graviton3, released in 2021, pulled neither: the core count stayed the same and the frequency rose only slightly, because raising frequency is risky at data center scale. It adds substantial energy consumption and demands upgraded power delivery and cooling, ultimately raising customers' costs.

Graviton3 has made several innovations that are different from its predecessor:

1. Chiplet design is used to package 7 silicon dies together;

2. The instruction-level parallelism method is used to increase the number of instructions that can be executed in a single core cycle, allowing the core to complete more tasks;

3. For workloads that are sensitive to memory bandwidth and latency, the memory space has been increased by 40%, and DDR5 has been used to increase the memory channel bandwidth by 50%.
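The channel-bandwidth figure above can be sanity-checked with back-of-envelope arithmetic. The specific memory speeds below (DDR4-3200 for the prior generation, DDR5-4800 for Graviton3) are illustrative assumptions, not figures from the article; each channel is taken to be 64 bits (8 bytes) wide:

```python
# Back-of-envelope check of the "+50% memory channel bandwidth" claim.
# Assumed speeds (for illustration only): DDR4-3200 vs DDR5-4800.

def channel_bw_gb_s(mega_transfers_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth of one memory channel in GB/s."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

ddr4 = channel_bw_gb_s(3200)  # 25.6 GB/s per channel
ddr5 = channel_bw_gb_s(4800)  # 38.4 GB/s per channel
gain = (ddr5 / ddr4 - 1) * 100
print(f"DDR4: {ddr4} GB/s, DDR5: {ddr5} GB/s, gain: {gain:.0f}%")
```

The 3200-to-4800 transfer-rate jump alone accounts for a 50% per-channel gain, consistent with the figure quoted above.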

As a result, compared with its predecessor, Graviton3 improves application workload performance by 25% across the board and consumes up to 60% less power than comparable x86 instances. With a built-in machine learning hardware acceleration unit, the processor also achieves a 3x improvement in machine learning performance and is used by AI researchers and enterprises for MLOps in the cloud.


The Graviton3E, launched in 2022, is specially optimized for floating-point and vector operations. Its vector computing performance is twice that of Graviton3, making it particularly suitable for scenarios such as artificial intelligence/machine learning and high-performance computing.

The latest generation, Graviton4, adopts the newer Neoverse V2 core and raises the core count to 96. Each core's L2 cache doubles to 2MB, and memory bandwidth increases by 75%.


Each Graviton generation has delivered a double-digit performance improvement over the last, while power consumption per unit of compute keeps falling. Energy saving and emission reduction matter enormously for the sustainable growth of data centers. Well-known cloud customers such as Twitter, Databricks, Formula 1, and Snap have used Graviton-based services and praised their cost-cutting and efficiency gains.

According to foreign media reports, by mid-2022 Graviton accounted for about 20% of Amazon Web Services' CPU instances, most of them Graviton2, and roughly 50% of Amazon Web Services' new virtual machine instances came from the Graviton series.

Some cloud customers have publicly stated that renting Graviton services saved them 10% to 40% in computing costs.

As an early Graviton user, Dahua Infinity cut its big data operating costs by 20% with Graviton2; Tuya, a heavy user of Graviton2 instances, has also upgraded to the new generation of instances, improving the encryption and decryption performance of its IoT platform by 50%.

According to market research firm IDC, Arm's share of server shipments was about 10% in the first quarter of 2023. By this point, Arm's ecosystem problems in the server market had begun to be resolved.

To date, Amazon Web Services has deployed more than 2 million Graviton processors across 33 regions and more than 100 availability zones on six continents. These processors power more than 150 compute instance types and serve more than 50,000 companies and developers worldwide.


2. The only cloud giant to use Arm architecture on a large scale

In the process of serving cloud customers, the Amazon Web Services team found that if they wanted to revolutionize the cost-effectiveness of computing for all possible workloads, they needed to completely rethink instances and delve into the underlying technologies, including custom chips.

Why design chips based on the Arm architecture?

For Amazon Web Services, this is both a necessity and a forward-looking strategy.

First, Arm licenses are relatively easy to obtain and offer a high degree of design freedom, making it easier for Amazon Web Services to design processors that better meet the needs of its cloud business.

Second, power consumption has long been a headache for data centers. Given the scale effect, the few watts saved by each chip matter greatly. Arm has been battle-tested in the mobile processor market, with proven advantages in energy efficiency, compute density, and cost.

In addition, as mentioned earlier, Graviton is cautious about raising frequency and compensates with higher instruction-level parallelism, making it more competitive on cost-performance. Under high CPU utilization, each Graviton vCPU occupies its own physical core, so there is no contention; speed stays consistently high and the price advantage becomes obvious.

According to Amazon Web Services, the Amazon EC2 R8g instance based on the new-generation Graviton4 processor delivers a 30% performance improvement over the seventh-generation R7g instance using Graviton3, comes in larger instance sizes with three times the vCPUs and memory, and offers better cost-effectiveness for memory-intensive workloads such as databases, in-memory caches, and real-time big data analytics.

Compared to R7g instances, R8g instances can speed up web applications by up to 30%, databases by up to 40%, and large Java applications by up to 45%.
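These speedups translate directly into price-performance, as a toy calculation shows. The hourly price below is hypothetical; only the 40% database speedup comes from the article. At an unchanged price, a 1.4x throughput gain cuts the cost per unit of work by roughly 29%:

```python
# Toy price-performance illustration (hypothetical prices, not AWS list
# prices): faster completion at the same hourly rate means each unit of
# work costs less.

def cost_per_unit_work(price_per_hour: float, relative_throughput: float) -> float:
    """Cost of one normalized unit of work at a given throughput."""
    return price_per_hour / relative_throughput

r7g_cost = cost_per_unit_work(1.00, 1.0)  # baseline instance
r8g_cost = cost_per_unit_work(1.00, 1.4)  # +40% database speedup from the article
saving = (1 - r8g_cost / r7g_cost) * 100
print(f"cost per unit of work falls by {saving:.0f}%")
```

This is why the article's headline percentages understate nothing: even without a price cut, a faster chip at the same rate is a cheaper chip per query served.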

Its performance and cost-effectiveness advantages have been verified by some actual tests.


According to benchmark results released by Phoronix, with the same number of vCPUs the new Graviton4 core is roughly on par with Intel Sapphire Rapids and comparable to AMD's fourth-generation EPYC. Across workloads such as high-performance computing, encryption, code compilation, ray tracing, databases, and 3D modeling, the generational gains are excellent.


▲In testing, the cost-effectiveness of R8g instances based on Graviton4 exceeds that of Amazon Web Services cloud instances based on Intel Xeon and AMD EPYC (Source: Phoronix.com)

As one of the first customers of R8g instances, Honeycomb reported that Graviton4's throughput gains are striking: throughput per vCPU has more than doubled since the company first adopted Graviton four years ago. Honeycomb plans to migrate its entire workload to Graviton4 as soon as the R8g instance family is officially released.

Epic Games, maker of the hit game "Fortnite", commented that the EC2 R8g instance based on the latest Graviton4 is the fastest EC2 instance it has ever tested, performing well in its "most competitive and latency-sensitive workloads" and substantially improving game server performance.

Preliminary testing results of R8g instances for SAP HANA Cloud show that R8g instances can improve analytical performance by up to 25% and transactional workload performance by up to 40% compared to Graviton3-based instances.


▲Comparison of different specifications of R8g instances

So far, only Amazon Web Services has truly achieved large-scale use of the Arm architecture.

Why Amazon Web Services? As Dai Wen, General Manager of Solutions Architecture for Greater China at Amazon Web Services, said at this year’s China Summit: “Only in the cloud computing environment can we have the opportunity to make such full-stack innovations from applications to CPUs.”

Self-developed chips are not innovations on paper; they demand accumulated engineering experience. They must pursue not only high performance but also sufficient stability, reliability, and security.

Using the same Arm microarchitecture does not guarantee a CPU of the same performance, and designing a chip does not guarantee mass production or commercial success. The scaling linearity and communication latency problems raised by interconnecting hundreds of CPU cores alone can stump many chip teams, to say nothing of the ecosystem hurdles that Arm server chip designers must clear.

Amazon Web Services' R&D approach is to deeply understand cloud customers' workloads and then work backwards to chip design. This customer-centric approach allows Amazon Web Services to make adjustments in the short term to quickly adapt to market dynamics.

Take Graviton4: for the first time, Amazon Web Services designed a CPU architecture around real applications, shifting its evaluation from traditional microbenchmark suites to measurements based on actual workloads. For example, the front-end and back-end CPU parameters needed to optimize Cassandra databases, Groovy applications, and nginx servers all differ.

The huge customer base has built a high barrier for Amazon Web Services. Its extensive data center clusters around the world can support the deployment of Graviton series processors. The scale effect formed by the world's largest cloud computing business can effectively spread costs for Amazon Web Services.

Continuous innovation in cloud services lets Amazon Web Services understand which applications are used most and how they consume resources, so it can pick the technical improvements that benefit users most, optimize in a targeted way, rapidly improve the software and hardware stack and even the CPU design, and develop matching vCPUs and hardware cores.

At the same time, Amazon's various managed service product lines all use a unified infrastructure, so Graviton innovations can be applied to all managed services in a timely manner. Users can easily enjoy the cost-effectiveness improvements brought by Graviton by changing computing options.

Users only need to think about which instance best meets their needs; Amazon Web Services takes responsibility for reducing the cost of software migration and learning. With more managed services deeply integrated with Graviton, seamless migration from x86 to Arm becomes simple and fast.

3. How do self-developed chips affect cloud computing?

Today, self-developed chips have become a standard practice for major technology companies. Whether it is to reduce costs and increase efficiency, build competitive advantages, or improve controllability and reduce dependence on third-party chip companies, it is a good story that can easily convince downstream customers and investors.

But nine years ago, when Amazon Web Services took the lead in developing its own chips, it was still an advanced exploration.

Looking back at the history of cloud computing, Amazon Web Services' 2006 release of the first EC2 (Elastic Compute Cloud) instance is considered a historic moment. More and more companies subsequently embraced cloud computing and began migrating their applications to the cloud.

Today Amazon Web Services runs high-performance computing clusters with tens of thousands of nodes in the cloud to train large models and handle high-concurrency real-time streaming applications, which was hard to imagine back then: the first EC2 instance had a clock frequency of just 1.7GHz, 250Mbps of network bandwidth, less than 2GB of memory, and only a 160GB mechanical disk.

In the first few years of its cloud business, Amazon Web Services had to solve many hard problems. What worried the team most was that with a customized Xen as the virtualization hypervisor, no matter how much time went into optimizing the code, the virtualization layer would always consume host resources, and x86 CPUs were poor at handling network traffic.

It was not until 2013 that the Israeli chip company Annapurna Labs caught Amazon Web Services' attention. Through their collaboration, Amazon Web Services moved network processing into hardware for the first time. The surprisingly good results in production convinced Amazon Web Services to lock in this outstanding partner: in January 2015 it announced the acquisition of Annapurna Labs, and its chip self-development journey began.

Looking back, this was definitely a smart investment in the history of Amazon Web Services.

Just two years after this transaction, Amazon Web Services announced the Nitro virtualization platform, which offloads all security, management, and monitoring to hardware and provides customers with nearly 100% of the host computing power.

From then on, cloud computing embarked on the path of complete physical isolation of business and infrastructure, and the underlying virtualization technology innovation and the upper-level server type development could be carried out in parallel.

This produced a key turning point for EC2 instances: from 2006 to 2017, it took Amazon Web Services 11 years to expand from 1 EC2 instance type to 70; from 2017 to 2023, EC2 instances exploded from 70 to 750 types in six years, offering a suitable compute instance for every kind of workload.

Building on Nitro's success, Amazon Web Services has developed three product lines: network chips, server CPUs, and AI training and inference chips. The Nitro network chip is now in its fifth generation, continuously improving network performance, storage performance, and security hardening. Graviton has seen four generations and five models. The AI inference chip Inferentia and the AI training chip Trainium give users AI acceleration options beyond GPUs through more cost-effective inference and training instances.

This lets Amazon Web Services keep the flexibility of full-stack internal innovation: from customized motherboards and servers, to deeply customized chips at the bottom layer, to a horizontally expanding portfolio of self-developed chips, Amazon Web Services is gradually integrating and coordinating chips, hardware, and software, bringing better cost-effectiveness and reliability to the business and building its own core competitiveness.

The self-developed chips are linked with Amazon Web Services' self-developed storage servers and high-speed network systems, allowing more chips to be efficiently interconnected, thereby significantly reducing computing time. Based on these innovations, Amazon Web Services can support one of the most challenging tasks in cloud computing - artificial intelligence and machine learning.

At the recent Amazon Web Services New York Summit, Amazon Web Services announced that 96% of AI/ML unicorns have run their businesses on Amazon Web Services, and 90% of the companies on the 2024 Forbes AI 50 list chose Amazon Web Services. From 2023 to date, Amazon Web Services has officially released 326 generative AI features, and the number of officially available machine learning and generative AI services during the same period is more than twice that of other suppliers.

Broad use cases and deep technical accumulation feed each other. This astonishing number of AI use cases lets Amazon offer customers enough practical examples to choose the best options, while extensive customer feedback is the best driver of its chip design. Continuous iteration of chip technology will in turn support ever more cost-effective cloud services and help democratize generative AI.

Conclusion: No single chip is the answer for cloud computing

There are a plethora of chip options on the market, and cloud infrastructure providers can play a valuable role in how to integrate all of these together to better enable various innovations from infrastructure to cloud services.

Unlike independent chip companies, Amazon Web Services does not develop its own chips to compete in the merchant market, but to offer customers a "one-stop shop" stocking both its self-developed chips and mainstream options such as Intel CPUs and Nvidia GPUs. Customers can pick the product mix that best fits their workloads based on the profiles of these chip instances.

Graviton's six-year evolution is the story of Arm server CPUs finding their footing. Arm gave Amazon Web Services a flexible basis for customizing CPUs, while Amazon Web Services drove change in the server chip market and became the best showcase for Arm's cost and energy-efficiency advantages in data centers.

As long as Graviton still has room to reduce costs and increase efficiency, Amazon Web Services can continue to lower prices and pass on the benefits of scale and technology to cloud customers.