
A ChatGPT moment for open source large models? The much-anticipated Llama 3 405B is about to be released

2024-07-23


After much anticipation, the Llama 3 405B, originally scheduled to be released on the 23rd, is finally here.

As the top model in the Llama 3 series, the 405B version has 405 billion parameters and is one of the largest open source models to date.

In the early hours of last night, Meta suddenly leaked the evaluation data of Llama 3.1-405B. Some netizens expected that a Llama 3.1-70B version might be released at the same time, because "pre-release model leaks are an old tradition at Meta, and it happened with the Llama model last year."

Some analysts believe that Llama 3 405B is not only another improvement in artificial intelligence capabilities, but also a potential "ChatGPT moment" where state-of-the-art AI is truly democratized and put directly into the hands of developers.

Three predictions for the upcoming Llama 3 405B announcement

Some analysts have predicted the highlights of the upcoming Llama 3 405B announcement from three perspectives: data quality, model ecosystem, and API solutions.

First, Llama 3 405B may revolutionize data quality for specialized models.

For developers focused on building specialized AI models, a long-standing challenge is obtaining high-quality training data. Smaller expert models (1-10B parameters) often leverage distillation techniques, augmenting their training datasets with the output of larger models. However, using such data from closed-source giants like OpenAI or Nvidia is strictly restricted, limiting commercial applications.

Llama 3 405B changes this. As an open source powerhouse that rivals proprietary models, it provides a new foundation for developers to create rich, unrestricted datasets. This means developers can freely use the distilled output of Llama 3 405B to train niche models, greatly accelerating innovation and deployment cycles in specialized fields, as sketched below. Expect a surge in the development of high-performance, fine-tuned models that are both powerful and in line with open source ethics.
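To make the distillation workflow concrete, here is a minimal sketch of such a pipeline in Python. It assumes a hypothetical OpenAI-compatible inference endpoint serving the 405B model; the URL, model identifier, and API key are placeholders for whichever provider is actually used:

```python
import json
import requests

# Placeholders: substitute the endpoint, model name, and key of your provider.
API_URL = "https://example-provider.com/v1/chat/completions"
MODEL = "meta-llama/Meta-Llama-3.1-405B-Instruct"
API_KEY = "YOUR_API_KEY"

def distill(prompt: str) -> str:
    """Ask the large 'teacher' model for an answer to use as training data."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.7,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Build a small synthetic dataset for fine-tuning a niche "student" model.
prompts = [
    "Explain the difference between a mutex and a semaphore.",
    "Summarize the key clauses of a standard NDA in plain language.",
]
with open("distilled_dataset.jsonl", "w") as f:
    for p in prompts:
        f.write(json.dumps({"prompt": p, "completion": distill(p)}) + "\n")
```

Each line of the resulting JSONL file is a prompt-completion pair that can feed a standard fine-tuning job for a small expert model.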

Second, Llama 3 405B will anchor a new model ecosystem: from foundation models to combinations of experts

The launch of Llama 3 405B has the potential to redefine the architecture of AI systems. The sheer size of the model (405 billion parameters) could suggest a one-size-fits-all solution, but its real power lies in integration with a layered system of models. This approach is particularly relevant for developers working with AI at different scales.

Expect a shift toward a more dynamic model ecosystem, with Llama 3 405B acting as the backbone, supported by small and medium-sized models. These systems will likely employ techniques such as speculative decoding, where less complex models handle the bulk of the generation and call on the 405B model only when needed for verification and correction. This not only maximizes efficiency, but also opens up new avenues for optimizing compute resources and response times in real-time applications, especially on hardware optimized for these tasks, such as the SambaNova RDU.
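As an illustration of the speculative decoding pattern, the sketch below uses the assisted-generation feature of Hugging Face transformers, in which a small draft model proposes tokens and the larger target model verifies a whole batch of them in a single forward pass. Small OPT checkpoints stand in for the Llama pair here, since actually serving the 405B model requires a multi-GPU cluster:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-ins for the real pairing (e.g. a small Llama drafting for 405B);
# both models must share a tokenizer for assisted generation to work.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
target = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")  # "big" verifier
draft = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")   # small drafter

inputs = tokenizer("Speculative decoding speeds up inference by", return_tensors="pt")
outputs = target.generate(
    **inputs,
    assistant_model=draft,  # draft model proposes, target model verifies
    max_new_tokens=40,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the expensive model only checks drafted tokens instead of generating each one itself, most of the compute shifts to the cheap model while the output still matches what the large model would produce.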

Finally, Llama 3 405B will trigger a race for the most efficient APIs

With great power comes great responsibility: deployment is a major challenge for Llama 3 405B. Developers and organizations will need to carefully navigate the model's complexity and operational requirements. AI cloud providers will compete to offer the most efficient and cost-effective API solutions for deploying Llama 3 405B.

This situation gives developers a unique opportunity to engage with different platforms and compare how various APIs handle such a large model. The winners in this space will be those whose APIs not only manage the computational load efficiently, but also avoid sacrificing model accuracy or disproportionately increasing the carbon footprint.
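For developers making that comparison, a rough benchmarking loop might look like the following sketch. The provider URLs are hypothetical placeholders; any OpenAI-compatible endpoint that reports token usage can be measured the same way:

```python
import time
import requests

# Hypothetical endpoints; replace with the real providers being compared.
PROVIDERS = {
    "provider_a": "https://api.provider-a.example/v1/chat/completions",
    "provider_b": "https://api.provider-b.example/v1/chat/completions",
}
PROMPT = "List three considerations when deploying a 405B-parameter model."

for name, url in PROVIDERS.items():
    start = time.perf_counter()
    resp = requests.post(
        url,
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "model": "meta-llama/Meta-Llama-3.1-405B-Instruct",
            "messages": [{"role": "user", "content": PROMPT}],
            "max_tokens": 128,
        },
        timeout=120,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    tokens = resp.json()["usage"]["completion_tokens"]
    print(f"{name}: {elapsed:.2f}s total, {tokens / elapsed:.1f} tokens/s")
```

Cost per token and accuracy on a held-out evaluation set would round out such a comparison.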

In short, Llama 3 405B is not just another tool in the AI arsenal; it is a fundamental shift towards open, scalable, and efficient AI development. Analysts believe that whether users are fine-tuning niche models, building complex AI systems, or optimizing deployment strategies, the arrival of Llama 3 405B will open up new horizons.

What do netizens think?

Netizens posted in the LocalLLaMA subreddit, sharing information about Meta Llama 3.1 with 405 billion parameters. Judging from several key AI benchmarks, this model's performance exceeds that of the current leader, OpenAI's GPT-4o, which would mark the first time an open source model beats the current state-of-the-art closed-source LLM.

As shown in the benchmarks, Meta Llama 3.1 outperforms GPT-4o in many tests, including GSM8K, HellaSwag, BoolQ, MMLU-humanities, MMLU-other, MMLU-STEM, and Winograd. However, it lags behind GPT-4o in HumanEval and MMLU-social sciences.

Ethan Mollick, associate professor at the Wharton School of the University of Pennsylvania, writes:

If these statistics are true, then it can be said that the top-of-the-line AI model will be available to everyone for free starting this week.

Every government, organization, and company in every country around the world will have access to the same AI capabilities as everyone else. It will be interesting.

Some netizens summarized several highlights of the Llama 3.1 model:

The model was trained using 15T+ tokens from public sources, and the pre-training data cutoff date is December 2023;

Fine-tuning data includes a publicly available instruction fine-tuning dataset (different from Llama 3) and 15 million synthetic samples;

The model supports multiple languages, including English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai.

Some netizens said that this is the first time an open source model has surpassed closed-source models such as GPT-4o and Claude 3.5 Sonnet, achieving SOTA on multiple benchmarks.