
Copyright traps: a text version of the "cat and mouse game" in the AI era

2024-07-27


Since the wave of generative AI swept the world, many content creators have claimed that their works were used to train artificial intelligence models without permission. So far, though, it has been difficult to prove whether a given work actually appears in a particular training dataset.

Now, researchers have developed a new way to prove it. A research team from Imperial College London recently developed "copyright traps": pieces of hidden text that let writers and publishers subtly mark their works, so they can later detect whether those works were used to train an artificial intelligence model. The idea is similar to strategies copyright holders have used before, such as adding fake locations to maps or fake words to dictionaries.

These AI copyright traps tap into one of the biggest fights in the AI field: many publishers and writers are locked in lawsuits against tech companies, claiming that their intellectual property was included in AI training datasets without permission. The lawsuit the New York Times filed against OpenAI may be the most prominent example.

As of now, the code for generating and detecting the traps is available on GitHub. The team also plans to build a tool that lets users generate and inject copyright traps themselves.

“There is currently a complete lack of transparency around what content is used to train AI models, which we believe is an impediment to finding the right balance between AI companies and content creators,” Yves-Alexandre de Montjoye, professor of applied mathematics and computer science at Imperial College London, who led the research, said at the International Conference on Machine Learning, a top AI conference in Vienna this week.

To create the traps, he and his team used a word generator to produce thousands of synthetic sentences. These sentences are long and mostly gibberish, for example: "When turbulent times hit... What's on sale and, more importantly, when is the best time to buy? This list tells you who's open on Thursday nights with their regular sales hours and other opening times for neighbors. You're still in."
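The paper's generator samples sequences from a language model; as a loose illustration of the same idea, the sketch below assembles long, low-meaning sentences by sampling words at random from a small vocabulary. The vocabulary, sentence length, and function names here are illustrative assumptions, not the team's actual code.

```python
import random

# Hypothetical mini-vocabulary; the real generator samples from a
# language model, so this is only a loose stand-in.
VOCAB = (
    "turbulent times sale thursday neighbors regular hours open list "
    "buy best when more importantly still you tells who with their"
).split()

def make_trap_sentence(n_words: int = 40, seed: int = 0) -> str:
    """Build one long, mostly meaningless candidate trap sentence."""
    rng = random.Random(seed)
    words = [rng.choice(VOCAB) for _ in range(n_words)]
    return " ".join(words).capitalize() + "."

# A pool of 100 candidate traps, as in the article; one is later
# chosen at random and injected many times into a text.
traps = [make_trap_sentence(seed=i) for i in range(100)]
print(traps[0])
```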

"We generated 100 trap sentences and then randomly selected one to inject many times into a text," Yves-Alexandre de Montjoye explains. A trap can be injected in a variety of ways, for example as white text on a white background, or embedded in the article's source code. The sentence must be repeated in the text 100 to 1,000 times.
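As a rough sketch of that injection step, the snippet below hides several hundred repetitions of one trap sentence in a web page as white text on a white background, so readers never see it but text scrapers still extract it. The styling, repetition count, and function name are illustrative assumptions, not the team's tooling.

```python
def inject_trap(article_html: str, trap: str, copies: int = 500) -> str:
    """Hide `copies` repetitions of a trap sentence inside a page.

    White text on a white background (plus a tiny font) is invisible
    to readers but still gets picked up by scrapers building corpora.
    """
    hidden = (
        '<div style="color:#fff;background:#fff;font-size:1px">'
        + " ".join([trap] * copies)
        + "</div>"
    )
    return article_html.replace("</body>", hidden + "</body>")

page = "<html><body><p>Visible article text.</p></body></html>"
trap = "When turbulent times hit, what's on sale and when is the best time to buy."
print(len(inject_trap(page, trap, copies=500)))
```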

To detect the traps, they fed the 100 generated synthetic sentences to a large language model and checked whether it treated them as new. If the model had seen a trap sentence in its training data, it would show a lower "perplexity score" for it; if the model was "surprised" by the sentence, that meant it was encountering it for the first time, so the trap had not been in the training data.
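The detection side hinges on perplexity: a model tends to assign lower perplexity to sequences it saw during training. A minimal sketch using the Hugging Face transformers library, with GPT-2 as a stand-in model (the study actually probed CroissantLLM):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is only a stand-in here; the study targeted CroissantLLM.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the model: exp of mean token loss."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# A trap sentence that was in the training data should score markedly
# lower than comparable sentences the model has never seen.
print(perplexity("When turbulent times hit, what's on sale?"))
```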

In the past, researchers have proposed exploiting the fact that language models memorize their training data to determine whether a given piece of content appears in that data. The technique, called a membership inference attack, works best against large state-of-the-art models, because they tend to memorize large amounts of data during training.
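In its simplest form, such a membership inference test compares a candidate sequence's perplexity against a reference distribution of sequences known not to be in the training data; an unusually low score suggests membership. A hedged sketch, reusing the perplexity() helper from the previous snippet (the z-score threshold is an arbitrary illustrative choice, not the paper's statistical test):

```python
from statistics import mean, stdev

def looks_like_member(trap: str, controls: list[str], z_cut: float = -3.0) -> bool:
    """Flag the trap as 'seen in training' if its perplexity is a
    low-side outlier relative to never-seen control sentences.

    Uses perplexity() from the sketch above; `controls` needs at
    least two sentences, and z_cut is an illustrative threshold.
    """
    scores = [perplexity(c) for c in controls]
    z = (perplexity(trap) - mean(scores)) / stdev(scores)
    return z < z_cut
```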

"In contrast, smaller models that are increasingly popular and can run on mobile devices are less susceptible to membership inference attacks because they memorize less data, making it more difficult to determine whether they were trained on specific copyrighted texts," said Gautam Kamath, an assistant professor of computer science at the University of Waterloo who was not involved in the research.

Copyright traps offer a way to perform membership inference attacks even on smaller models. Yves-Alexandre de Montjoye's team injected their traps into the training dataset of CroissantLLM, a new French-English bilingual language model trained from scratch by industry and academic partners working with the Imperial College London team. CroissantLLM has 1.3 billion parameters, a small fraction of the size of state-of-the-art models (GPT-4, for example, reportedly has 1.76 trillion parameters).

“The results show that it is indeed possible to introduce such traps into textual data, significantly improving the effectiveness of membership inference attacks, even for smaller models,” says Gautam Kamath, though he adds that there is still a lot of work to be done.

“Repeating a 75-character phrase 1,000 times in a text has a big impact on the original document. It may let the trainers of an AI model spot the trap and skip the content containing it, or simply delete the trap and train on the remaining text. It also makes the original text difficult to read,” Gautam Kamath pointed out.

“This makes copyright traps impractical at this point in time. A lot of companies do deduplication, in other words, they clean up their data, and these copyright traps might get removed,” said Sameer Singh, a professor of computer science at the University of California, Irvine, and co-founder of the startup Spiffy AI, who was also not involved in the study.
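Singh's deduplication point is easy to see concretely: even a naive sentence-level dedup pass collapses a trap repeated hundreds of times into a single occurrence, largely defusing it. A toy illustration, not any company's actual pipeline:

```python
def dedup_sentences(text: str) -> str:
    """Keep only the first occurrence of each sentence (naive '.' split)."""
    seen: set[str] = set()
    kept = []
    for s in text.split("."):
        key = s.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(s.strip())
    return ". ".join(kept) + "."

doc = "Real sentence one. Trap trap trap. Trap trap trap. Real sentence two."
print(dedup_sentences(doc))
# -> "Real sentence one. Trap trap trap. Real sentence two."
```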

Another way to improve copyright traps, in Gautam Kamath's view, is to find other ways to mark copyrighted content so that membership inference attacks work better against them, or to improve membership inference attacks themselves.

Yves-Alexandre de Montjoye acknowledged that the traps were not foolproof. "A motivated attacker could remove them if they knew they existed," he said.

"But whether they can remove them all is still an open question, and it may be a bit of a cat-and-mouse game," he said. "Even so, the more traps there are, the more difficult it becomes to remove them all without investing a lot of engineering resources."

“It’s important to remember that copyright traps may be a stopgap measure or simply an inconvenience for model trainers. No one can publish a piece of content containing a trap and guarantee that it will always be a valid trap,” said Gautam Kamath.

Original link:

https://www.technologyreview.com/2024/07/25/1095347/a-new-tool-for-copyright-holders-can-show-if-their-work-is-in-ai-training-data/