Llama 3.1 was hacked as soon as it was released: Cursing Zuckerberg, dangerous formulas came out of his mouth!

Llama 3.1 was hacked as soon as it went online: people cursed Zuckerberg and blurted out dangerous formulas!

2024-07-24

Mengchen from Aofei Temple Quantum Bit | Public Account QbitAI

The most powerful modelLlama 3.1, it was breached as soon as it went online.

Swearing at his boss Zuckerberg, and even know how to bypass blocked words.

Designing dangerous viruses, how to hack WifiJust open your mouth and say it.

Llama 3.1 405B surpasses GPT-4o, and the open source large model has reached the top. The side effect is that it is also more dangerous.

But it's not all bad.

The previous versions of the Llama series have been criticized by some users for excessive security protection:

It won't even "kill" a Linux process, which is very impractical.

Now, with the enhanced capabilities of version 3.1, we finally understand that this killing is not that killing.

Llama 3.1 was hacked just after it was released

The first person to break Llama 3.1 was the jailbreak master@Pliny the Prompter。

In the hands of my brother, almost no large model can survive.

In an interview with the media, Pliny said that on the one hand he does not like to be told what he cannot do and hopes to challenge the researchers behind the AI model.

Responsible jailbreaking, on the other hand, is a type of red team testing that helps identify vulnerabilities and get them fixed before they really become a big problem.

Here is his general routine, I won’t go into more details:

The answer format is set. First, let the big model reject the user's request by starting with "I'm sorry". Then insert a meaningless dividing line. After the dividing line, it is required to semantically reverse the first three words of each rejection, so "I can't" becomes "I can". Then, from time to time, the key words are turned into garbled characters to confuse the AI.

When the AI answered, it saw that I had already refused at the beginning, so there was no "moral burden" overall.

It doesn't seem dangerous to semantically reverse the first three words of each rejection.

Once you say "I can", the following content follows the principle of "probabilistic prediction of the next token", and the most likely thing is that you will just blurt out the answer.

So this method, in factIt is precisely the ability of cutting-edge large models to follow complex instructions, the more powerful the model is, to a certain extent, more easily fooled.

A recent study found that large models have a simpler security vulnerability: as long as the "past tense" is used, the security measures will not work.

Llama 3.1 also failed to defend against this move.

Apart from the safety issue, what is the strength of the most powerful model, Llama 3.1 405B, in other aspects?

We also took this opportunity to test it out.

Traps that even the most powerful models cannot escape

The most popular and outrageous question recently"Which is bigger, 9/11 or 9/9?", the answers to the official Instruction version of Llama-3.1-405B are always very straightforward, but unfortunately there is a high probability that they will be answered incorrectly.

If you ask him to explain, he will come up with some twisted ideas, and while chatting he will forget to speak Chinese, but he does forget to use emoticons.

Llama 3.1 has basically made no progress in solving the problems that have long plagued other large models.

For example, the classicThe “Reversal of the Curse” Problem, if you answer it the right way, you will get it, but if you answer it the wrong way, you won’t get it.

Recent researchAlice in Wonderland Problem, and also need reminders to do it right.

However, I was able to answer the question correctly in the Chinese version. Perhaps it is because the probability that "Alice" is a female name in the Chinese context is higher.

The letters of the alphabet also make the same mistakes as GPT-4o.

So regardless of these tricky questions, in what scenarios can Llama 3.1 really show its strength?

An entrepreneur shared:8B small model for fine-tuning, in chat, summarization, and information extraction tasksStronger than the GPT-4o mini+ prompt word, which is also a small model。

To be fairer,Compared with the fine-tuned version, Llama 3.1 8B still has a considerable advantage。

Therefore, the greatest significance of the Llama series is that it has never been the official version of the Instruct model, but that after it was open sourced, everyone used various private data to modify and fine-tune it according to their own needs.

Before 405B was released, someone experimented with model merging, stitching two Llama 3 70Bs into a 120B model, which surprisingly worked.

This time, it seems that Meta has also learned from this experience.The final release we see is actually the average of different checkpoints during the training process.。

How to create your own Llama 3.1

So the question is, how can you create a custom Llama 3.1 model for a domain-specific industry use case?

The big winner behind the scenes, Huang Renxun, personally took action this time.

NVIDIA also announced the launch of the new NVIDIA AI Foundry service and NVIDIA NIM™ inference microservice on the same day. Huang Renxun said:

“Meta’s open source release of Llama 3.1 marks a critical moment in the adoption of generative AI by enterprises around the world. Llama 3.1 will set off a wave of companies and industries creating advanced generative AI applications.

Specifically, NVIDIA AI Foundry has integrated Llama 3.1 throughout the process and can help companies build and deploy custom Llama super models.

The Nim microservice is the fastest way to deploy Llama 3.1 models into production, with up to 2.5 times higher throughput than running inference without Nim.

What’s more special is that on the NVIDIA platform,Enterprises can train custom models using their own data as well as synthetic data generated by Llama 3.1 405B and NVIDIA Nemotron™ Reward models。

The open source agreement updated in Llama 3.1 also specifically states that it is allowed to use the data produced by Llama to improve other models, but the word Llama must be added to the beginning of the model name after use.

For the security issues discussed above, NVIDIA also provides professional "guardrail technology"NeMo Guardrails。

NeMo Guardrails enables developers to build three kinds of boundaries:

Topic guardrails prevent apps from straying into non-target areas, such as preventing a customer service assistant from answering questions about the weather.
Functional safety guardrails ensure that applications respond with accurate and appropriate information. They filter out unwanted language and enforce that models only cite reliable sources.
Information security guardrails restrict applications to establish connections only with external third-party applications that have been confirmed to be safe.

One More Thing

Finally, I will share some platforms where you can try Llama 3.1 for free.If you have any questions that interest you, you can try it yourself.

On the first day the model was launched, the traffic was very large and the server of the large model arena was overwhelmed at one point.

Large Model Arena: https://arena.lmsys.org
HuggingChat：https://huggingface.co/chat
Poe：https://poe.com

Reference Links:
[1]https://x.com/elder_plinius/status/1815759810043752847
[2]https://arxiv.org/pdf/2406.02061
[3]https://arxiv.org/abs/2407.11969
[4]https://x.com/corbtt/status/1815829444009025669
[5]https://nvidianews.nvidia.com/news/nvidia-ai-foundry-custom-llama-generative-models

news

Llama 3.1 was hacked as soon as it went online: people cursed Zuckerberg and blurted out dangerous formulas!

Mengchen from Aofei Temple Quantum Bit | Public Account QbitAI

Introduction

my contact information