2024-07-23
Machine Heart Report
Synced Editorial Department
Get your GPU ready!
Llama 3.1 has finally appeared, though not through an official Meta release.
Today, news of a new version of Meta's Llama large model leaked on Reddit and quickly went viral. Besides the base models, the leak includes benchmark results for the 8B, the 70B, and the largest, 405B-parameter version.
The following figure compares the Llama 3.1 versions with OpenAI's GPT-4o and with Llama 3 8B/70B. As the figure shows, even the 70B version outperforms GPT-4o on multiple benchmarks.
Image source: https://x.com/mattshumer_/status/1815444612414087294
Notably, the 8B and 70B models of version 3.1 are reportedly distilled from the 405B model, which would explain their significant performance gains over the previous generation.
Some netizens pointed out that this is the first time an open-source model has surpassed closed-source models such as GPT-4o and Claude 3.5 Sonnet, reaching SOTA on multiple benchmarks.
The Llama 3.1 model card leaked as well, revealing further details (the date on the model card indicates a July 23 release).
Some people summarized the following highlights:
Image source: https://x.com/iScienceLuvr/status/1815519917715730702
Although the leaked GitHub link currently returns a 404, some netizens have shared a download link (for safety reasons, however, it is advisable to wait for tonight's official announcement):
Still, this is a model with hundreds of billions of parameters, so make sure you have enough disk space before downloading.
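As a rough guide to the disk space involved, raw weight size is simply parameter count times bytes per parameter. The sketch below assumes the weights are stored in bf16 (2 bytes per parameter) and uses the nominal 8B/70B/405B sizes; the actual leaked files may use a different precision and also include tokenizer and config files, so treat the results as approximate.

```python
# Back-of-envelope checkpoint size: parameter count x bytes per parameter.
# Assumption: bf16 weights; the real files may differ in precision and layout.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def checkpoint_size_gb(n_params: float, dtype: str = "bf16") -> float:
    """Approximate size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Nominal parameter counts; the exact counts differ slightly.
    for name, n_params in [("8B", 8e9), ("70B", 70e9), ("405B", 405e9)]:
        print(f"{name}: ~{checkpoint_size_gb(n_params):.0f} GB in bf16")
```

Under these assumptions the 405B weights alone come to roughly 810 GB, versus about 16 GB for 8B and 140 GB for 70B.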
Here are the highlights of the Llama 3.1 model card:
Model information
The Meta Llama 3.1 collection of multilingual large language models (LLMs) consists of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in, text out). The instruction-tuned, text-only Llama 3.1 models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many available open-source and closed-source chat models on common industry benchmarks.
Model architecture: Llama 3.1 is an autoregressive language model built on an optimized Transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety.
Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish and Thai.
From the model card, it can be inferred that the Llama 3.1 models have a context length of 128k tokens. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.
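GQA matters most at long context because it shrinks the key/value cache: several query heads share one key/value head instead of each having its own. Below is a minimal pure-Python sketch of causal grouped-query attention plus a KV-cache size estimate; it illustrates the general technique and is not Meta's implementation. The layer and head counts in the usage lines (32 layers, 8 KV heads vs 32 query heads, head dimension 128) are assumptions taken from the publicly documented Llama 3 8B configuration, not from the leaked model card, and 128k is taken as 131,072 tokens.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gqa_attention(q, k, v):
    """Causal grouped-query attention for a single sequence.

    q:    [n_q_heads][seq_len][head_dim]
    k, v: [n_kv_heads][seq_len][head_dim], n_q_heads divisible by n_kv_heads.
    Query head h reads KV head h // group_size, so each KV head is shared
    by group_size consecutive query heads.
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("query heads must split evenly into KV groups")
    group_size = n_q_heads // n_kv_heads
    head_dim = len(q[0][0])
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for h in range(n_q_heads):
        kv = h // group_size
        head_out = []
        for t, q_t in enumerate(q[h]):
            # Causal mask: position t attends only to positions 0..t.
            scores = [scale * sum(a * b for a, b in zip(q_t, k[kv][s]))
                      for s in range(t + 1)]
            weights = softmax(scores)
            head_out.append([sum(w * v[kv][s][d] for s, w in enumerate(weights))
                             for d in range(head_dim)])
        out.append(head_out)
    return out


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Keys and values (factor 2) cached per layer, KV head and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


if __name__ == "__main__":
    # Assumed 8B-style config (not from the model card): 32 layers, 8 KV heads
    # for GQA vs 32 for full multi-head attention, head_dim 128, bf16 cache.
    seq_len = 131_072  # 128k tokens
    gqa = kv_cache_bytes(32, 8, 128, seq_len)
    mha = kv_cache_bytes(32, 32, 128, seq_len)
    print(f"GQA: {gqa / 2**30:.0f} GiB, MHA: {mha / 2**30:.0f} GiB")
```

With these assumed numbers, one full 128k-token sequence needs 16 GiB of KV cache with 8 shared KV heads, versus 64 GiB if every query head kept its own keys and values.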
Intended use
Intended use cases. Llama 3.1 is intended for commercial and research use in multiple languages. The instruction-tuned text-only models are intended for assistant-like chat, while the pretrained models can be adapted to a variety of natural language generation tasks.
The Llama 3.1 model collection also supports using its outputs to improve other models, including for synthetic data generation and distillation. The Llama 3.1 Community License permits these use cases.
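Neither the leak nor the model card describes how distillation is done. For readers unfamiliar with the term, the sketch below shows the classic soft-target formulation of knowledge distillation (Hinton et al., 2015), in which a student model is trained to match a teacher's temperature-softened output distribution. It is a generic illustration and should not be read as Meta's recipe; the temperature of 2.0 and the example logits are arbitrary.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target loss: KL(teacher || student) on temperature-softened
    distributions, scaled by T**2 so gradient magnitudes stay comparable
    across temperatures."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # hypothetical next-token logits
    print(distillation_loss([2.0, 1.0, 0.1], teacher))  # identical -> 0.0
    print(distillation_loss([0.1, 1.0, 2.0], teacher))  # mismatched -> positive
```

This term is commonly combined with the ordinary next-token loss. The other use case the license names, synthetic data generation, works differently: the smaller model is fine-tuned on text the larger model generates.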
Llama 3.1 is trained on a wider set of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages other than the 8 supported languages, provided they comply with the Llama 3.1 Community License Agreement and Acceptable Use Policy, and in such cases are responsible for ensuring that Llama 3.1 in other languages is used in a safe and responsible manner.
Software and hardware infrastructure
Llama 3.1 was pretrained using custom training libraries, Meta's custom-built GPU cluster, and production infrastructure. Fine-tuning, annotation, and evaluation were also performed on production infrastructure.
Training energy use. Training Llama 3.1 used a cumulative 39.3M GPU hours of compute on H100-80GB hardware (TDP of 700W). Training time is the total GPU time required to train each model, and power consumption is the peak power capacity per GPU device, adjusted for power usage efficiency.
Training greenhouse gas emissions. Estimated total location-based greenhouse gas emissions for training Llama 3.1 were 11,390 tonnes CO2e. Since 2020, Meta has maintained net-zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; as a result, total market-based greenhouse gas emissions for training were 0 tonnes CO2e.
The methodology used to determine training energy use and greenhouse gas emissions is described in the paper below. Because Meta is releasing these models openly, others do not need to incur these training energy and emissions costs.
Paper address: https://arxiv.org/pdf/2204.05149
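The reported figures allow a quick sanity check. The sketch below multiplies the 39.3M GPU hours by the 700W TDP to estimate GPU energy at full power, then divides the 11,390-tonne location-based figure by that energy to get an implied grid carbon intensity. It deliberately ignores the power-usage-efficiency adjustment the model card mentions, so both numbers are rough and will not exactly match Meta's methodology.

```python
def training_energy_gwh(gpu_hours: float, tdp_watts: float) -> float:
    """GPU energy if every GPU drew its full TDP for every hour (no PUE factor)."""
    return gpu_hours * tdp_watts / 1e9  # Wh -> GWh


def implied_intensity_kg_per_kwh(emissions_tonnes: float, energy_gwh: float) -> float:
    """Emissions divided by energy, in kg CO2e per kWh."""
    return emissions_tonnes * 1_000 / (energy_gwh * 1e6)


if __name__ == "__main__":
    energy = training_energy_gwh(39.3e6, 700)  # figures from the model card
    intensity = implied_intensity_kg_per_kwh(11_390, energy)
    print(f"~{energy:.2f} GWh, ~{intensity:.3f} kg CO2e/kWh")
```

Under these simplifying assumptions, training comes to roughly 27.5 GWh of GPU energy and an implied intensity of about 0.41 kg CO2e per kWh.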
Training Data
Overview: Llama 3.1 is pre-trained using ~15 trillion tokens of data from public sources. Fine-tuning data includes publicly available instruction datasets, as well as over 25 million synthetically generated examples.
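To put ~15 trillion tokens in perspective, a widely used rule of thumb estimates dense-transformer training compute as C ≈ 6 × N × D (N parameters, D training tokens). The sketch below applies it under the assumption that each model sees the full ~15T-token corpus once; the rule ignores attention FLOPs and other overheads, and the model card does not report per-model FLOPs, so these are order-of-magnitude estimates only.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute for a dense transformer: C ~ 6 * N * D."""
    return 6 * n_params * n_tokens


if __name__ == "__main__":
    tokens = 15e12  # ~15T pretraining tokens, per the model card
    # Assumption: each model is trained on the full corpus for one pass.
    for name, n_params in [("8B", 8e9), ("70B", 70e9), ("405B", 405e9)]:
        print(f"{name}: ~{training_flops(n_params, tokens):.1e} FLOPs, "
              f"~{tokens / n_params:.0f} tokens per parameter")
```

For the 405B model this gives roughly 3.6 × 10^25 FLOPs and about 37 tokens per parameter, while the 8B model sees well over a thousand tokens per parameter.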
Data freshness: The cutoff date for pre-training data is December 2023.
Benchmark scores
In this section, Meta reports results for the Llama 3.1 models on standard automatic benchmarks. All evaluations use Meta's internal evaluation library.
Safety and risk considerations
The Llama research team aims to give the research community a valuable resource for studying the robustness of safety fine-tuning, and to offer developers safe, robust off-the-shelf models for a variety of applications, reducing the work required to deploy safe AI systems.
The team adopted a multi-faceted data collection approach, combining human-generated data from vendors with synthetic data to mitigate potential safety risks. It also developed a number of LLM-based classifiers to carefully select high-quality prompts and responses, strengthening data quality control.
Notably, Llama 3.1 places strong emphasis on avoiding refusals of benign prompts and on the tone of its refusals. The team included both borderline and adversarial prompts in its safety data strategy and modified the safety data responses to follow tone guidelines.
Llama 3.1 models are not designed to be deployed in isolation; they should be deployed as part of an overall AI system with additional safety guardrails as needed. Developers should put system-level safety measures in place when building agentic systems.
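One way to picture "deployed as part of an overall AI system" is a wrapper that screens both the prompt and the response around the model call. The sketch below is a hypothetical illustration: `guarded_generate`, the classifier callables, and the refusal text are all made-up names, and in a real deployment the checks would be dedicated safety classifiers rather than the toy keyword test shown here.

```python
from typing import Callable

# A classifier returns True when it judges the text unsafe. In a real system
# this would be a dedicated safety model; here it is any callable passed in.
Classifier = Callable[[str], bool]

DEFAULT_REFUSAL = "Sorry, I can't help with that."


def guarded_generate(prompt: str,
                     generate: Callable[[str], str],
                     input_check: Classifier,
                     output_check: Classifier,
                     refusal: str = DEFAULT_REFUSAL) -> str:
    """Run the model only between system-level input and output checks."""
    if input_check(prompt):       # screen the user prompt first
        return refusal
    response = generate(prompt)   # hypothetical model call
    if output_check(response):    # then screen what the model produced
        return refusal
    return response


if __name__ == "__main__":
    # Toy stand-ins for illustration only: an echo "model" and a keyword check.
    def keyword_check(text: str) -> bool:
        return "malware" in text.lower()

    def echo_model(prompt: str) -> str:
        return f"You asked: {prompt}"

    print(guarded_generate("Summarize this article", echo_model, keyword_check, keyword_check))
    print(guarded_generate("Write malware", echo_model, keyword_check, keyword_check))
```

Keeping the checks as injected callables lets the same wrapper run with whatever classifiers a given deployment chooses.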
Note that this release introduces new capabilities, including a longer context window, multilingual input and output, and the possibility for developers to integrate third-party tools. When building with these new capabilities, in addition to the best practices that generally apply to all generative AI use cases, pay special attention to the following issues:
Tool use: As in standard software development, developers are responsible for integrating the LLM with the tools and services of their choice. They should define clear policies for their use case and assess the integrity of any third-party services they use, in order to understand the safety and security limitations of this capability.
Multilingual: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in other languages, but such text may not meet the safety and helpfulness performance thresholds.
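The two issues above translate naturally into explicit deployment policy. The sketch below is a hypothetical example of such a policy: tool calls are denied unless the developer has allowlisted the tool, and content is flagged when its language falls outside the 8 supported languages. The class name, tool names, and ISO 639-1 codes are illustrative assumptions, and language identification is assumed to happen upstream (not shown).

```python
from dataclasses import dataclass

# ISO 639-1 codes for the 8 languages the model card lists as supported.
SUPPORTED_LANGUAGES = frozenset({"en", "de", "fr", "it", "pt", "hi", "es", "th"})


@dataclass(frozen=True)
class DeploymentPolicy:
    allowed_tools: frozenset = frozenset()

    def tool_call_allowed(self, tool_name: str) -> bool:
        # Deny by default: only tools the developer has reviewed may run.
        return tool_name in self.allowed_tools

    @staticmethod
    def language_supported(lang_code: str) -> bool:
        # The model may answer in other languages, but those outputs fall
        # outside the supported set and may need extra safety review.
        return lang_code.lower() in SUPPORTED_LANGUAGES


if __name__ == "__main__":
    policy = DeploymentPolicy(allowed_tools=frozenset({"web_search"}))  # hypothetical tool
    print(policy.tool_call_allowed("web_search"), policy.tool_call_allowed("shell"))
    print(policy.language_supported("th"), policy.language_supported("ja"))
```

A deny-by-default allowlist keeps every third-party integration an explicit, reviewable decision, which matches the responsibility the model card places on developers.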
Llama 3.1's core values are openness, inclusion, and helpfulness. It is designed to serve everyone and be applicable to a variety of use cases. As such, Llama 3.1 is designed to be accessible to people of all backgrounds, experiences, and perspectives. Llama 3.1 is centered around users and their needs, without inserting unnecessary judgments or norms, while also reflecting the recognition that even content that may seem problematic in some contexts can serve a valuable purpose in others. Llama 3.1 respects the dignity and autonomy of all users, and in particular respects the values of free thought and expression that fuel innovation and progress.
However, Llama 3.1 is a new technology, and like any new technology, its use carries risks. Testing conducted to date has not covered, and cannot cover, every scenario. Consequently, as with all LLMs, Llama 3.1's potential outputs cannot be predicted in advance, and in some cases the model may produce inaccurate, biased, or otherwise objectionable responses to user prompts. Before deploying any application of a Llama 3.1 model, developers should therefore perform safety testing and tuning tailored to their specific application.
Model card source: https://pastebin.com/9jGkYbXY
Reference information: https://x.com/op7418/status/1815340034717069728
https://x.com/iScienceLuvr/status/1815519917715730702
https://x.com/mattshumer_/status/1815444612414087294