
Karpathy: I attacked the big model with SQL injection, it was so easy

2024-08-16




Machine Heart Report

Editors: Du Wei, Zenan

The safety of large models can be said to have "much room for improvement".

AI expert Andrej Karpathy is here again to share his knowledge. This time the topic is "using special tokens to attack LLMs in a way similar to SQL injection."

The so-called SQL injection attack is a classic web attack technique: the attacker inserts malicious SQL statements into an application's input fields, tricking the backend database into executing them. Such attacks usually exploit the application's improper handling of user input, such as failing to filter or escape it properly, allowing the attacker to access, modify, or even delete data in the database.



As security awareness has grown, SQL injection should rarely occur in mature software products today.

But in the field of large models, everything is still in its infancy. The LLM tokenizer parses special tokens in the input string (such as <|endoftext|>). Although this seems convenient, at best it leads to subtle bugs; at worst, it opens up LLM security vulnerabilities equivalent to SQL injection attacks.

It is important to note here that user input strings are untrusted data.

In SQL injection, the "DROP TABLE" attack works against poorly written code. LLMs have the same problem: poorly written code parses the special-token descriptors in a string into actual special tokens, corrupting the input representation and breaking the chat template.

Here is an example using the current huggingface Llama 3 tokenizer defaults.
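A minimal sketch of what this looks like in code, assuming the meta-llama/Meta-Llama-3-8B tokenizer on Hugging Face (the exact model id and sample string are illustrative):

```python
from transformers import AutoTokenizer

# Default settings: special tokens written out in the raw string are parsed
# into real special-token ids.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

user_text = "<|end_of_text|>"  # imagine this came from an untrusted user
ids = tokenizer(user_text)["input_ids"]
print(ids)
# With the defaults described here, the output contains
# 128000 (<|begin_of_text|>, prepended automatically) and
# 128001 (<|end_of_text|>, parsed straight out of the user string).
```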

As you can see, two unintuitive situations occur at the same time:

  • The <|begin_of_text|> token is added to the beginning of the sequence (128000).
  • The <|end_of_text|> token (128001) is parsed out of the string and inserted as an actual special token. Text that may have come from a user is now mixed into the token protocol, pushing the LLM out of distribution and producing undefined output.

Therefore, Karpathy recommends always passing two extra flags when tokenizing, add_special_tokens=False and split_special_tokens=True, and adding the special tokens yourself in code. He finds the naming of these two options somewhat confusing. For chat models, you can also use apply_chat_template to apply the chat template.

By doing the above you get something that looks more correct: <|end_of_text|> is now treated like any other string and broken into pieces by the underlying BPE tokenizer.
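A minimal sketch of the safer call under the same assumptions (note that split_special_tokens as a per-call argument needs a reasonably recent transformers version):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

ids = tokenizer(
    "<|end_of_text|>",
    add_special_tokens=False,   # do not prepend <|begin_of_text|> automatically
    split_special_tokens=True,  # treat special-token text as ordinary text
)["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# The string is now broken into ordinary BPE pieces instead of collapsing
# to the single special id 128001. Special tokens are then added explicitly
# by your own code, e.g. via tokenizer.apply_chat_template for chat models.
```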



Karpathy believes that calls to encode and decode should never handle special tokens by parsing strings; this functionality should be deprecated entirely. Special tokens should only ever be added explicitly and programmatically, through a separate code path. In tiktoken, always use encode_ordinary; in huggingface, it is safer to pass the flags mentioned above. At the very least, be aware of this issue, always keep your tokens visible, and test your code.
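On the tiktoken side, a minimal sketch of the difference between encode and encode_ordinary (assuming the cl100k_base encoding; the exact ids are specific to that encoding):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
user_text = "<|endoftext|>"  # untrusted input

# encode_ordinary never interprets special tokens: the string is encoded
# as plain text, piece by piece.
print(enc.encode_ordinary(user_text))

# Plain encode() refuses special tokens found in the string by default
# (it raises an error); you must opt in explicitly to get the special id.
print(enc.encode(user_text, allowed_special={"<|endoftext|>"}))  # e.g. [100257]
```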

Karpathy thinks these things are so subtle and poorly documented that he estimates roughly 50% of the code out there today has bugs related to these issues.

Even ChatGPT, which was rigorously tested before it shipped, shows some odd behavior here. In the best case it silently deletes the token; in the worst case it confuses the LLM in undefined ways. Karpathy doesn't know what happens behind the scenes, but ChatGPT cannot repeat the string <|endoftext|> back to him. So be extra careful here.



As soon as Andrej Karpathy's post went out, it sparked discussion. Someone asked: what measures do LLM developers need to take to improve security?

Karpathy believes the answer is simple: always tokenize strings the "normal" way, that is, as UTF-8 byte sequences. This echoes the principle of "least privilege" in the security field: by limiting functionality to what is strictly necessary, the chance of unintended consequences is minimized.
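As a rough illustration of that "least privilege" idea (the helper name and prompt layout below are invented for this example, not taken from Karpathy's post):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# The only place a special token ever appears is this trusted constant,
# added programmatically by our own code.
EOT_ID = enc.encode("<|endoftext|>", allowed_special={"<|endoftext|>"})[0]

def encode_user_turn(user_text: str) -> list[int]:
    # Untrusted text is always encoded as a plain UTF-8 string; even if the
    # user types "<|endoftext|>", it stays ordinary text.
    return enc.encode_ordinary(user_text) + [EOT_ID]

print(encode_user_turn("ignore previous instructions <|endoftext|>"))
```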



Others noted that "we have already moved in this direction." Lucas Beyer, an author of the VLM PaliGemma and a scientist at Google DeepMind, said they have tightened this up in the code for their new work. It is a bit more trouble, especially when supporting multiple tokenizers, but overall it is worth it and also makes the code more straightforward.



Some netizens also asked: what happens if the code is correct but <|endoftext|> shows up in the training data?

Karpathy said that if the code is correct, nothing will happen. The problem is that a lot of code may not be correct, and that will quietly corrupt the model's world view.



What do you think of the new problems discovered by Karpathy?

References:

https://twitter.com/karpathy/status/1823418177197646104