
The most powerful open-source large model ascends overnight: Llama 3.1 is released, ushering in an era of GPT-4-level AI for everyone

2024-07-24



Zhidongxi (Smart Things)
Author: Zhidongxi Editorial Department

Zhidongxi reported on July 24 that last night, Meta announced the launch of its most powerful open-source model to date, Llama 3.1 405B, along with newly upgraded Llama 3.1 70B and 8B models.

Llama 3.1 405B supports a context length of 128K tokens and was trained on more than 15 trillion tokens using over 16,000 H100 GPUs. It is the first Llama model Meta has ever trained at this scale.

The researchers' evaluation across more than 150 benchmark datasets shows that Llama 3.1 405B rivals leading industry models such as GPT-4o, Claude 3.5 Sonnet, and Gemini Ultra.


Beyond the model's performance, Meta founder and CEO Mark Zuckerberg personally wrote a post endorsing it. He said that in addition to offering better cost and performance than closed-source models, the open-source 405B model will be the best choice for enterprises to fine-tune and to use to train smaller models.

Meta AI now runs on Llama 3.1 405B and has launched new features such as AI photo editing, AI programming, and a smart assistant for VR/AR devices. Zuckerberg predicted that usage of the Meta AI assistant will surpass ChatGPT within a few months.


▲Meta AI supports real-time audio and video interaction between the Quest headset and its users

Meta's open-source ecosystem is also ready: more than 25 partners, including Amazon AWS, NVIDIA, Databricks, Groq, Dell, Microsoft Azure, and Google Cloud, will offer the Llama 3.1 models.

To date, total downloads across all Llama model versions have exceeded 300 million. The release of Llama 3.1, a model comparable to mainstream closed-source models, may mean that the open-source story Meta wants to tell has only just begun...


Model download link:

https://llama.meta.com/

https://huggingface.co/meta-llama

Paper link:

https://t.co/IZqC6DJkaq


▲Summary of Meta Llama 3.1 model paper interpretation

1. The 405B open-source model benchmarks against GPT-4o, and 25+ partners are ready

Meta evaluated performance on over 150 benchmark datasets, and Llama 3.1 405B proved comparable to GPT-4o, Claude 3.5 Sonnet, and Gemini Ultra across a range of tasks including general knowledge, steerability, mathematics, tool use, and multilingual translation.


In human evaluations on real-world scenarios, Llama 3.1 405B's overall performance was better than GPT-4o and Claude 3.5 Sonnet.


The upgraded Llama 3.1 8B and 70B models also outperform other models of similar parameter size. These smaller models support the same 128K-token context window, multiple languages, improved reasoning, and state-of-the-art tool use, enabling more advanced applications.


Meta updated its license to allow developers, for the first time, to use the outputs of Llama models, including the 405B, to improve other models.

At the same time, Meta's open-source ecosystem has expanded further, with more than 25 companies launching services for the new Llama 3.1 models.

Among them, Amazon Web Services, Databricks, and NVIDIA are launching full suites of services to support developers in fine-tuning and training their own models. AI chip startups such as Groq have built low-latency, low-cost inference services for all the new models Meta released.

At the same time, these models will be available on major cloud platforms such as Amazon Web Services, Microsoft Azure, Google Cloud, and Oracle.

Companies such as Scale AI, Dell, and Deloitte are ready to help enterprises adopt Llama models and train custom models on their own data.

Llama 3.1 405B is not only the strongest open-source model; it has the potential to become the strongest model outright. The gap between open source and closed source has narrowed dramatically once again.

2. A fully optimized training stack, focused on keeping the model scalable

To train a model on 15 trillion tokens and achieve the desired results in a reasonable time, Meta fully optimized its training stack.


To address these challenges, Meta chose to focus on strategies that keep the model development process scalable and straightforward:

1. The researchers chose a standard decoder-only Transformer architecture with minor adjustments, rather than a mixture-of-experts (MoE) model, to maximize training stability.

2. The researchers used an iterative post-training procedure in which each round applies supervised fine-tuning and direct preference optimization. This lets the model create the highest-quality synthetic data for each round and improve performance in every capability.

Compared with previous Llama models, Meta improved both the quantity and quality of data used for pre-training and post-training. These improvements include more careful preprocessing and curation pipelines for pre-training data and more rigorous quality-assurance and filtering methods for post-training data.

As expected from scaling laws for large language models, Meta's new flagship model outperforms smaller models trained with the same strategy. Meta also used the 405B-parameter model to improve the training quality of its smaller models.

At the same time, to support large-scale inference of the 405B-parameter model, the researchers quantized the model from BF16 to FP8, effectively reducing compute requirements and allowing the model to run within a single server node.
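To give a feel for what BF16-to-FP8 quantization does, here is a minimal NumPy sketch that simulates per-tensor FP8 (E4M3-style) quantization: scale so the largest weight maps to the FP8 maximum, round each value to a 4-bit significand, and rescale. This is an illustration of the general idea only, not Meta's actual quantization scheme, whose details (per-channel scales, calibration, kernel support) are more involved.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_dequantize_fp8(w: np.ndarray):
    """Simulate per-tensor FP8 (E4M3-style) quantization of a weight tensor.

    Scales so the largest magnitude maps to the FP8 max, rounds each value
    to a 4-bit significand, then rescales back to float. Returns the
    dequantized weights and the scale factor.
    """
    scale = FP8_E4M3_MAX / np.max(np.abs(w))
    scaled = w * scale
    # Round to a 4-bit significand: decompose as m * 2**e with |m| in [0.5, 1),
    # then keep 4 significant bits of the mantissa.
    m, e = np.frexp(scaled)
    m = np.round(m * 16) / 16
    q = np.clip(np.ldexp(m, e), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q / scale, scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4, 4)).astype(np.float32)
w_dq, scale = quantize_dequantize_fp8(w)
# Relative error is bounded by the significand precision (~1/16 here).
max_rel_err = np.max(np.abs(w_dq - w) / np.maximum(np.abs(w), 1e-12))
```

The payoff in practice is halving the bytes per weight versus BF16, which is what lets a 405B-parameter model fit within a single server node at inference time.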

For instruction and chat fine-tuning, the researchers produced the final model through several rounds of alignment on top of the pre-trained model. Each round involved supervised fine-tuning (SFT), rejection sampling (RS), and direct preference optimization (DPO), with synthetic data generation producing the vast majority of SFT examples, yielding higher-quality synthetic data across all capabilities.
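For readers unfamiliar with the DPO step mentioned above, here is a minimal, self-contained sketch of the DPO objective for a single preference pair. Real training operates on batches of token-level log-probabilities from the policy and a frozen reference (SFT) model; the numbers below are illustrative placeholders.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability a model assigns to a
    response; ref_* come from the frozen reference model. beta controls
    how strongly the policy may deviate from the reference.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log sigmoid

# A positive margin (policy already favors the chosen response more than
# the reference does) gives a small loss; a negative margin a large one.
low = dpo_loss(-10.0, -20.0, -12.0, -15.0)   # margin = +7
high = dpo_loss(-20.0, -10.0, -15.0, -12.0)  # margin = -7
```

Minimizing this loss pushes the policy to widen the margin between chosen and rejected responses without a separate reward model, which is what makes DPO attractive inside an iterative SFT-then-align loop.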

In addition, Meta employed a variety of data-processing techniques to filter this synthetic data down to the highest quality, allowing the new models to scale the amount of fine-tuning data across capabilities.

On the data side, the researchers also carefully balanced the mix to produce models that are high quality across all capabilities. For example, quality on short-context benchmarks was maintained even while scaling to a 128K context length.

In addition, Meta announced a comprehensive Llama System. Beyond the Llama model itself, the system coordinates multiple components and calls external tools, helping developers build customized products that go beyond the base model.

The Llama System will include a range of new components, including newly open-sourced safety tools such as Llama Guard 3 (a multilingual safety model) and Prompt Guard (a prompt-injection filter). To connect these scattered components, Meta also released a request for comment on the Llama Stack API, a standard interface that makes it easier for third-party projects to use Llama models.

For ordinary developers, using a 405B-scale model remains a challenge, requiring substantial computing resources and expertise.

With the Llama System, generative AI development becomes more than just prompting a model. Everyone should be able to use the 405B model for a wider range of tasks, including real-time and batch inference, supervised fine-tuning, model evaluation for specific applications, continual pre-training, retrieval-augmented generation (RAG), function calling, synthetic data generation, and more.
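As one illustration of the workflows listed above, the retrieval step of RAG can be sketched in a few lines. This toy version uses bag-of-words cosine similarity in place of a real embedding model, and the assembled prompt would then be sent to a Llama 3.1 endpoint (not shown); it is a conceptual sketch, not part of Meta's Llama Stack.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_rag_prompt(question: str, documents: list[str]) -> str:
    """Retrieve the most relevant document and prepend it to the prompt."""
    best = max(documents, key=lambda d: cosine(embed(question), embed(d)))
    return f"Context: {best}\n\nQuestion: {question}\nAnswer:"

docs = [
    "Llama 3.1 405B supports a context length of 128K tokens.",
    "Meta AI adds an Imagine Me image generation feature.",
]
prompt = build_rag_prompt("What context length does Llama 3.1 support?", docs)
```

In a production pipeline, the count vectors would be replaced by dense embeddings and a vector index, but the grounding pattern, retrieve then generate, is the same.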

This is the largest model Meta has launched to date, and more device-friendly sizes, more modalities, and agent-level updates will follow.

3. Powered by the 405B model: Meta AI and the Quest smart voice assistant get upgrades

Now, multiple Meta endpoints, such as WhatsApp and the Meta AI chatbot, have all started using Llama 3.1 405B.


Meta AI now supports seven new languages. This time, Meta launched a batch of new Meta AI creative tools, focusing on areas such as visual generation, mathematics, and coding.

Starting with visual generation: Meta AI launched an "Imagine Me" image-generation prompt feature, which lets users type "imagine me" in a Meta AI chat and add a prompt, such as "imagine me as a member of the royal family" or "imagine me in a surrealist painting," to generate images and share them with friends and family.


Meta AI will also launch an "Edit With AI" feature that lets users easily add, remove, or change objects with a click while keeping the rest of the image unchanged, such as turning a cat into a corgi. Meta AI will also support adding newly created images to Facebook posts, as well as to social platforms such as Instagram, Messenger, and WhatsApp.


For math and programming, users can get help with math homework through step-by-step explanations and feedback, write code faster with debugging support and optimization suggestions, and grasp complex technical and scientific concepts with expert guidance.


Users can combine Meta AI's coding expertise and image generation capabilities to build new games from scratch or create new interpretations of classic games. It only takes a few minutes to turn your ideas into reality, and even allows users to preview the game directly.

It is worth mentioning that Meta AI also runs on Ray-Ban Meta smart glasses and will arrive in experimental mode on Meta Quest in the United States and Canada next month. Meta AI will replace the current voice commands on Quest, letting users control the headset hands-free, get answers to questions, stay on top of real-time information, check the weather, and more.

Users can also use Meta AI in conjunction with the view they see in the headset, asking it about things they see in the physical environment.

4. Zuckerberg's open letter: Open source is better for developers, Meta, and the world

Right after the Llama 3.1 series was released, Zuckerberg's long blog post went up on the official website, further heightening the tension between the open-source and closed-source camps.


▲Partial screenshot of Zuckerberg’s open letter

At the outset, Zuckerberg noted that the gap between open-source and closed-source models is gradually narrowing. Last year, Llama 2 was only comparable to the previous generation of the most advanced closed-source models; this year, Llama 3 is comparable to the most advanced models and ahead in some areas.

Starting next year, he expects Llama models to be the most advanced in the industry. Even now, the Llama series already leads in openness, modifiability, and cost-effectiveness.

In the blog post, he addressed closed-source models directly and answered three big questions: why open-source AI is good for developers, why it is good for Meta, and why it is good for the world.

First, why is open source AI good for developers?

He believes that developers need to train and fine-tune their own models to meet their specific needs; developers need to control their own destiny rather than be bound by a closed supplier; developers need to protect their own data; developers need efficient and low-cost models; developers want to invest in an ecosystem that will become a long-term standard.

As for the benefit of open-source AI to Meta: Meta's business model is building the best experiences and services for people, and to do that, he believes Meta must ensure it always has access to the best technology and is not locked into a competitor's closed ecosystem.

At the same time, open source AI will enable Meta to develop Llama into a complete ecosystem with the potential to become an industry standard.

He also mentioned that one of the key differences between Meta and closed-source model players is that selling access to AI models is not Meta’s business model, which means that open source will not cut into its revenue, sustainability, or ability to continue investing in research.

Finally, Meta has a long history of open source projects and success.

On the debate over the safety of open-source AI models, Zuckerberg's view is that open-source AI will be safer than the alternatives. He believes open source ensures that more people around the world can enjoy the benefits and opportunities AI brings, that power is not concentrated in the hands of a few companies, and that the technology can be deployed more evenly and safely across society.

Conclusion: Meta makes another move, and the debate over open and closed source for large models changes

The debate over open-source and closed-source big models continues...

From the release of the Meta Llama 3.1 series, it is clear that the gap between open-source and closed-source large models is narrowing, with open source catching up and at times pulling ahead. As a staunch supporter of the open-source camp and a pioneer of technological innovation, Meta has been determined to build its own open-source ecosystem since the first Llama release. Compared with previous Llama launches, Meta has also assembled internal teams for this release to get the Llama series into the hands of as many developers and partners as possible.

Meta has made another move, making the outcome of the open-versus-closed-source debate even harder to call. Ultimately, though, in real applications many companies and developers will choose open-source or closed-source models based on their specific needs and circumstances, so it will take time to prove the model's actual capabilities and the real scenarios where it fits.