
Nature published an article: The definition of "academic plagiarism" is being blurred by AI. How should we deal with it?

2024-08-02




(Source: Piotr Kowalczyk, illustrator and graphic designer)

Editor's Note: In April this year, Science reversed a previously hard-line rule: its updated policy permits the legitimate use of generative artificial intelligence (AI) and large language models (LLMs) to create illustrations and write paper content, as long as this is noted in the paper's "Methods" section.

Now, AI can help researchers free up more time to think. But does this count as plagiarism? And under what circumstances should the use of this technology be allowed?

Recently, science journalist Diana Kwon published an article in Nature magazine discussing the application of generative AI tools in academic writing and the challenges and impacts it brings.

She noted that generative AI tools such as ChatGPT show great potential value in saving time, improving clarity and reducing language barriers, but they also may involve issues of plagiarism and copyright infringement.

She also mentioned that the use of AI in academic writing has exploded, especially in the biomedical field. Detecting AI-generated text is difficult, because slight editing can make it almost undetectable. At the same time, as more and more applications and tools integrate AI capabilities, the line between legitimate and illegitimate use of AI tools may blur further.

Finally, she argues that clearer guidelines on the use of AI in scholarly writing are needed to help researchers and journals determine when it is appropriate to use AI tools and how to disclose their use.

Academic Headlines has prepared a straightforward translation that preserves the main ideas of the original text. The content is as follows:

From the resignation of Harvard University's president in January following allegations of plagiarism to the revelation of plagiarized text in peer review reports in February, the academic world has been roiled by plagiarism this year.

But academic writing faces a bigger problem: the rapid adoption of generative artificial intelligence (AI) tools has raised questions about whether this counts as plagiarism and under what circumstances it should be allowed. "There's a wide spectrum of AI use, from completely human-written to completely AI-written, with a huge area of confusion in the middle," said Jonathan Bailey, a copyright and plagiarism consultant in New Orleans, Louisiana.

Generative AI tools based on large language models (LLMs), such as ChatGPT, can save time, make text clearer and more readable, and reduce language barriers. Many researchers now believe that such tools are acceptable in certain situations and that their use should be fully disclosed.

But these tools have complicated an already fraught debate over the inappropriate use of other people's work. LLMs are trained to generate text by learning from large amounts of published writing. This could therefore lead to something like plagiarism if a researcher took credit for the machine's work, or if a machine-generated text closely resembled someone's work but didn't credit the source. These tools could also be used to disguise deliberately plagiarized text, and their use would be difficult to detect. "It's going to be very, very difficult to define what we mean by academic dishonesty or plagiarism, and where the boundaries are," says Pete Cotton, an ecologist at the University of Plymouth, UK.

In a 2023 survey of 1,600 researchers, 68% of respondents said AI would make plagiarism easier and harder to detect. "Everyone is concerned about other people using these systems, and they're concerned about not using them when they should," said Debora Weber-Wulff, a plagiarism expert at the Berlin University of Applied Sciences. "There's a bit of panic about this."

When plagiarism meets AI

According to the US Office of Research Integrity, plagiarism is "the use of another's ideas, processes, results, or words without proper citation or acknowledgement." A 2015 study estimated that 1.7% of scientists admitted to having plagiarized, and 30% knew of a colleague who had done so.

LLMs may make this situation worse. Deliberate plagiarism of human-written text can easily be concealed if someone first has an LLM rewrite it. These tools can be directed to rewrite in sophisticated ways, such as adopting the style of an academic journal, says Muhammad Abdul-Mageed, a computer scientist and linguist at the University of British Columbia in Canada.

A central question is whether using unattributed content that was written entirely by a machine, not a human, counts as plagiarism. Not necessarily, many researchers say. The European Academic Integrity Network, for example, defines unauthorized or undeclared use of AI tools for writing as “unauthorized content generation,” not plagiarism. “To me, plagiarism would be something that can be attributed to another identifiable person,” Weber-Wulff says. She adds that while there have been cases of generative AI producing text that is nearly identical to existing human-written content, that’s generally not enough to be considered plagiarism.

However, some people believe that generative AI tools infringe copyright. Plagiarism and copyright infringement both involve the improper use of another person's work, but plagiarism is a violation of academic ethics, while unauthorized use of copyrighted works may violate the law. "These AI systems are built on the work of millions or even hundreds of millions of people," said Rada Mihalcea, a computer scientist at the University of Michigan in Ann Arbor.

Some media companies and authors have protested what they see as copyright infringement by AI. In December 2023, The New York Times filed a copyright lawsuit against Microsoft and OpenAI. The lawsuit claims that the two companies copied and used millions of New York Times articles to train LLMs, and that the content generated by these LLMs is now "competing" with the content of the publication. The lawsuit includes instances where prompts caused GPT-4 to copy several paragraphs of newspaper articles almost verbatim.

In February, OpenAI filed a motion in federal court to dismiss parts of the lawsuit, saying that "ChatGPT is in no way a substitute for a New York Times subscription." A Microsoft spokesperson said that "lawfully developed AI tools should be allowed to develop responsibly" and "they cannot replace the important role journalists play."

If a court rules that training an AI on text without permission does constitute copyright infringement, Bailey says, “it’s going to be a huge shock to AI companies.” Without an extensive training set, a tool like ChatGPT “wouldn’t exist.”

AI is exploding

Whether or not this is called plagiarism, the use of AI in academic writing has exploded since the release of ChatGPT in November 2022.

In a preprint updated in July, researchers estimated that at least 10% of biomedical abstracts in the first half of 2024 were written with the help of LLMs, equivalent to approximately 150,000 papers per year. The study, led by Dmitry Kobak, a data scientist at the University of Tübingen in Germany, analyzed 14 million abstracts published in the academic database PubMed between 2010 and June 2024. It showed that the emergence of LLMs was associated with increased use of stylistic words such as "delves," "showcasing," and "underscores," and then used these unusual word-frequency patterns to estimate the proportion of abstracts processed with AI. "The emergence of LLM-based writing assistants has had an unprecedented impact in the scientific literature," they write.



Figure | The emergence of LLMs is associated with increased use of stylistic words.
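The core idea behind that estimate can be illustrated with a minimal Python sketch (not the authors' actual code): it compares how often a few marker words appear in abstracts from before and after the arrival of LLMs and reports the excess. The marker-word list and the toy abstracts below are placeholders.

```python
# Minimal sketch of the excess-word-frequency idea described above.
# Assumption: abstracts are plain strings; the marker-word list is illustrative.
import re
from collections import Counter

MARKER_WORDS = {"delves", "showcasing", "underscores"}

def marker_rates(abstracts):
    """Fraction of abstracts that contain each marker word."""
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for word in MARKER_WORDS & tokens:
            counts[word] += 1
    n = max(len(abstracts), 1)
    return {word: counts[word] / n for word in MARKER_WORDS}

def excess_usage(pre_llm, post_llm):
    """Per-word excess frequency in the post-LLM corpus relative to the pre-LLM one."""
    before, after = marker_rates(pre_llm), marker_rates(post_llm)
    return {word: after[word] - before[word] for word in MARKER_WORDS}

# Toy example; the real study analyzed 14 million PubMed abstracts.
pre = ["We study protein folding and report new results."]
post = ["This study delves into protein folding, showcasing new results."]
print(excess_usage(pre, post))  # e.g. {'delves': 1.0, 'showcasing': 1.0, 'underscores': 0.0}
```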

Kobak predicts that LLM use "will certainly continue to increase" and will "likely become more difficult to detect."

The undisclosed use of software in academic writing is not new. Since 2015, Guillaume Cabanac, a computer scientist at the University of Toulouse in France, and his colleagues have been uncovering “gibberish” papers written by a software called SCIgen, as well as papers containing “distorted phrases” created by software that automatically translates or paraphrases text. “Even before generative AI came along, people had these tools,” Cabanac says.

However, there is some value in using AI in academic writing. The researchers say this can make texts and concepts clearer, reduce language barriers, and free up time for experimentation and reflection. Hend Al-Khalifa, a researcher in information technology at King Saud University in Riyadh, said that before generative AI tools became available, many of her colleagues who speak English as a second language would have difficulty writing papers. "Now, they focus on the research and use these tools to eliminate the hassle of writing," she said.

But there is still confusion about when the use of AI constitutes plagiarism or ethical violations. Soheil Feizi, a computer scientist at the University of Maryland, College Park, says that using LLMs to rewrite content from an existing paper is clearly plagiarism. But if LLMs are used transparently to help express ideas — whether generating text based on detailed prompts or editing drafts — then they should not be penalized. "We should allow people to express themselves effortlessly and clearly using LLMs," Feizi says.

Many journals now have policies that allow some degree of LLM use. After initially banning text generated by ChatGPT, Science updated its policy in November 2023 to say that the use of AI technology when writing manuscripts should be fully disclosed, including the systems and prompts used. Authors are responsible for ensuring accuracy and for "making sure there is no plagiarism." Nature has also said that authors of research manuscripts should document the use of any LLM in the methods section. An analysis of 100 large academic publishers and 100 highly ranked journals found that as of October 2023, 24% of publishers and 87% of journals had guidelines on the use of generative AI. Almost all of those that provided guidance said AI tools could not be listed as authors, but policies varied on the types of AI use allowed and the level of disclosure required. Weber-Wulff said clearer guidelines on the use of AI in academic writing are urgently needed.

Currently, Abdul-Mageed says widespread use of LLMs when writing scientific papers is hampered by their limitations. Users need to create detailed prompts describing the audience, language style, and subfield of research. “It’s actually very difficult to get a language model to give you exactly what you want,” he says.
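As a rough illustration of the kind of detailed prompt Abdul-Mageed describes, the sketch below uses the OpenAI Python client to spell out the audience, language style, and research subfield before asking for an edit. The model name, prompt wording, and draft placeholder are hypothetical examples, not details from the article.

```python
# Hypothetical illustration only: prompt text, model name, and draft are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = (
    "Audience: readers of a peer-reviewed ecology journal.\n"
    "Style: formal academic English; keep the field's standard terminology.\n"
    "Subfield: community ecology of intertidal invertebrates.\n"
    "Task: polish the following draft paragraph for clarity and grammar "
    "without changing its scientific claims:\n\n"
    "<draft paragraph goes here>"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```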

But, Abdul-Mageed said, developers are building applications that will make it easier for researchers to generate professional scientific content. In the future, he said, users might be able to generate an entire paper from scratch simply by selecting options from a drop-down menu and pressing a button, without having to write a detailed prompt.

Boundaries may blur further

The rapid adoption of LLMs for writing has also spurred a large number of tools designed to detect LLM-generated text. Although many claim high accuracy rates — over 90% in some cases — research shows that most do not live up to their claims. In a study published last December, Weber-Wulff and her colleagues evaluated 14 AI detection tools widely used in academia. Only five of them accurately identified 70% or more of texts as AI- or human-written, and none scored above 80%.

When someone lightly edited the AI-generated text by replacing synonyms and rearranging sentences, the detectors' accuracy dropped below 50% on average. Such text is "almost impossible to detect with current tools," the authors wrote. Other studies have also shown that asking an AI to rewrite text multiple times can significantly reduce the accuracy of detectors.

There are other problems with AI detectors, too. One study showed that they were more likely to misclassify English writing as AI-generated if it was written by a non-native English speaker. Feizi says the detectors cannot reliably distinguish between texts written entirely by AI and those where the author used an AI-based service to improve grammar and sentence clarity. "Distinguishing between these situations will be very difficult and unreliable, potentially leading to very high rates of false positives," he said, adding that being falsely accused of using AI could cause "considerable damage" to the reputation of those academics or students.

The line between legitimate and illegitimate use of AI may blur further. In March 2023, Microsoft began integrating its generative AI tools into its applications, including Word, PowerPoint, and Outlook. Some versions of its AI assistant Copilot can draft or edit content. In June, Google also began integrating its generative AI model Gemini into tools such as Docs and Gmail.

“AI is becoming so embedded in everything we use that I think it’s going to be increasingly difficult for people to know if something you’re doing is being influenced by AI,” said Debby Cotton, an expert in higher education at St. Mark’s and St. John’s University in the U.K. “I think we may not be able to keep up with the pace at which it’s developing.”

Compiled by: Ma Xuewei

Original article by Diana Kwon, freelance science journalist

Original link: https://www.nature.com/articles/d41586-024-02371-z