
Cerebras launches world's fastest AI inference solution, 20 times faster than NVIDIA's solution




2024/8/28 9:51:23 | Source: IT Home | Author: Yuanyang | Editor: Yuanyang


IT Home reported on August 28 that Cerebras Systems today announced the launch of Cerebras Inference, which the company calls the world's fastest AI inference solution. The new solution delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B, which Cerebras says is 20 times faster than NVIDIA GPU-based AI inference solutions offered in hyperscale clouds such as Microsoft Azure.
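To put those throughput figures in perspective, here is a minimal back-of-the-envelope sketch. The per-model token rates are the ones quoted above; the 1,000-token response length is an arbitrary illustration, not a figure from the article.

```python
# Rough latency estimates from the quoted per-model throughput.
# The token rates are Cerebras's announced figures; the 1,000-token
# response length is a hypothetical example, not from the article.

RATES_TOKENS_PER_SEC = {
    "llama-3.1-8b": 1800,
    "llama-3.1-70b": 450,
}

RESPONSE_TOKENS = 1000  # hypothetical response length

for model, rate in RATES_TOKENS_PER_SEC.items():
    seconds = RESPONSE_TOKENS / rate
    print(f"{model}: {RESPONSE_TOKENS} tokens in ~{seconds:.2f} s")

# Output:
# llama-3.1-8b: 1000 tokens in ~0.56 s
# llama-3.1-70b: 1000 tokens in ~2.22 s
```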

In addition to its performance, the new inference solution is priced significantly below popular GPU clouds, starting at just 10 cents per million tokens, which Cerebras says delivers 100x better price/performance for AI workloads.

The solution is intended to let AI application developers build the next generation of AI applications without compromising on speed or cost. It runs on the Cerebras CS-3 system and its Wafer Scale Engine 3 (WSE-3) AI processor; according to Cerebras, the CS-3 has 7,000 times the memory bandwidth of the NVIDIA H100, addressing the memory-bandwidth bottleneck of generative AI inference.
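As a rough sanity check on that multiplier, a minimal sketch follows. NVIDIA's published HBM3 bandwidth for the H100 SXM (about 3.35 TB/s) is a public spec; the WSE-3 figure below is implied by the article's multiplier rather than quoted directly.

```python
# Sanity-check the quoted 7,000x memory-bandwidth multiplier.
# H100 SXM HBM3 bandwidth (~3.35 TB/s) is NVIDIA's published spec;
# the WSE-3 figure derived here is implied by the article's
# multiplier, not quoted in the article itself.

H100_BANDWIDTH_TBPS = 3.35  # TB/s, HBM3 on the H100 SXM
MULTIPLIER = 7000           # factor claimed in the article

wse3_implied_tbps = H100_BANDWIDTH_TBPS * MULTIPLIER
print(f"Implied WSE-3 bandwidth: ~{wse3_implied_tbps / 1000:.0f} PB/s")
# Implied WSE-3 bandwidth: ~23 PB/s
```

That implied figure is in the same ballpark as the roughly 21 PB/s of on-chip SRAM bandwidth Cerebras has publicized for the WSE-3, which suggests the multiplier compares on-wafer SRAM against the H100's off-chip HBM.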

According to IT Home, Cerebras Inference is offered in the following three tiers:

The Free Tier provides free API access and generous usage limits to anyone who logs in.

The Developer Tier, designed for flexible serverless deployments, provides users with an API endpoint at a fraction of the cost of alternatives on the market, with the Llama 3.1 8B and 70B models priced at 10 cents and 60 cents per million tokens, respectively (see the cost sketch after this list).

The Enterprise Tier provides fine-tuned models, custom service-level agreements, and dedicated support. Enterprises can access Cerebras Inference through a Cerebras-managed private cloud or on their own premises, making it well suited to sustained workloads.
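To make the Developer Tier pricing concrete, here is a minimal cost sketch. The per-million-token prices are the ones quoted above; the workload size is an arbitrary example, and flat per-token pricing with no input/output split is an assumption, since the article does not specify.

```python
# Estimate Developer Tier cost for a hypothetical workload.
# Prices are the article's quoted rates; the 50M-token volume and
# the flat per-token pricing (no input/output split) are assumptions.

PRICE_PER_MILLION_TOKENS_USD = {
    "llama-3.1-8b": 0.10,   # 10 cents per million tokens
    "llama-3.1-70b": 0.60,  # 60 cents per million tokens
}

WORKLOAD_TOKENS = 50_000_000  # hypothetical monthly volume

for model, price in PRICE_PER_MILLION_TOKENS_USD.items():
    cost = WORKLOAD_TOKENS / 1_000_000 * price
    print(f"{model}: ${cost:.2f} for {WORKLOAD_TOKENS:,} tokens")

# Output:
# llama-3.1-8b: $5.00 for 50,000,000 tokens
# llama-3.1-70b: $30.00 for 50,000,000 tokens
```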

The Cerebras team said: "With record-breaking performance, industry-leading pricing, and open API access, Cerebras Inference sets a new standard for open LLM development and deployment. As the only solution that can provide high-speed training and inference at the same time, Cerebras opens up new possibilities for AI."

The field of AI is evolving rapidly, and while NVIDIA currently dominates the AI market, the emergence of companies such as Cerebras and Groq signals a possible shift in industry dynamics. As demand for faster and more cost-effective AI inference continues to grow, these challengers are contesting NVIDIA's dominance, especially in the inference space.