
Posted on arXiv half a year earlier, yet accused of plagiarism: CAMEL lives in the shadow of Microsoft's AutoGen

2024-07-17


Machine Heart Report

Synced Editorial Department

arXiv is not a peer-reviewed venue, so papers posted there do not have to be cited. Is this reasonable?

If you are interested in AI agents, you have probably heard of Microsoft's AutoGen. It is an open-source programming framework for building AI agents that lets multiple agents solve tasks by chatting with one another. Within it, LLM agents can play various roles, such as programmers, designers, or a combination of several roles.

On GitHub, this project has received 28k stars, and the paper also won the Best Paper Award at the ICLR 2024 LLM Agent Workshop.



However, there is actually controversy behind this paper.

In November 2023, an AI researcher (Li Guohao, a PhD student at King Abdullah University of Science and Technology and the initiator of the open-source projects Camel-AI.org and DeepGCNs.org) posted that, because AutoGen is so similar to his team's paper CAMEL, they are asked the same question at every event they attend: what is the difference between the two?





Li Guohao expressed his frustration: his team's paper was posted on arXiv well before AutoGen, yet it is now regarded as an imitator of AutoGen (CAMEL was posted in March 2023; AutoGen in August 2023).



CAMEL paper link: https://arxiv.org/abs/2303.17760



AutoGen paper link: https://arxiv.org/pdf/2308.08155

According to Li Guohao, the two methodologies share the following similarities:



Even the examples used are somewhat similar:





As the later paper, AutoGen did mention CAMEL and pointed out some differences between CAMEL and AutoGen. But where this material appears is puzzling: all of it is in the appendix. This may be a major reason why other researchers know AutoGen but not CAMEL. After all, how many people read the appendix carefully?



The paragraph about CAMEL in the AutoGen paper: "CAMEL (Li et al., 2023b) is a communicating agent framework that shows how to use role-playing to enable chat agents to communicate with each other to complete tasks. CAMEL can also record agent conversations for behavior analysis and capability understanding. CAMEL uses an 'inception-prompting' technique to enable autonomous cooperation between agents. Unlike AutoGen, CAMEL itself does not support tool use (such as code execution). Although CAMEL is proposed as an infrastructure for multi-agent conversations, it only supports static conversation modes, while AutoGen also supports dynamic conversation modes."



Table 1 summarizes the differences between AutoGen and other related multi-agent systems along four dimensions. The first is infrastructure: whether the system is designed as a general infrastructure for building LLM applications. The second is conversation mode: the types of conversation modes the system supports. In "static" mode, the agent topology remains unchanged regardless of the input; AutoGen allows flexible conversation modes, including both static and dynamic modes that can be customized to different application requirements. The third is execution: whether the system can execute code generated by the LLM. The fourth is human involvement: whether (and how) the system allows humans to participate in the execution process. AutoGen allows humans to participate flexibly in multi-agent conversations and lets them choose to skip providing input.



Another paragraph about CAMEL in the AutoGen paper: "AutoGen can help develop highly capable agents that leverage the strengths of LLMs, tools, and humans. Creating such agents is critical to ensuring that multi-agent workflows can effectively troubleshoot and make progress on tasks. For example, we observed that CAMEL, another multi-agent LLM system, was unable to effectively solve problems in most cases, primarily because it lacked the ability to execute tools or code. This failure shows that simple role-playing LLMs and multi-agent dialogues are not enough, and that highly capable agents with a variety of skills are also necessary. We believe that more systematic work is necessary to develop application-specific agent guidelines, create large OSS knowledge bases, and create agents that can discover and improve their own skills."

During the review of AutoGen's submission to the ICLR main conference, Li Guohao, first author of CAMEL, raised this issue in the public comments and called it a "notable omission."



In their reviews of AutoGen, the ICLR reviewers and the area chair also flagged this practice as inappropriate.



The area chair, for example, wrote: "The authors do discuss this work in an appendix, but this practice is undesirable because supplementary materials are not reviewed at the same level as the main paper. In short, this seems to allow authors to say that they cite and discuss the paper without actually citing and discussing it in the part of the paper that 99% of people are likely to read. I think this practice is worrying."



Why did the AutoGen authors do this? They replied that when they submitted their paper to ICLR 2024, papers like CAMEL had not yet been published at peer-reviewed conferences or journals, so under the ICLR 2024 reviewer guidelines they were not obliged to cite or compare with it (CAMEL was accepted to NeurIPS 2023 in September 2023; the ICLR 2024 reviewer guidelines stipulate that papers published after May 28, 2023 do not need to be cited).



They also listed the parts of the paper that discuss CAMEL:



Since ICLR had set the rules in advance, the area chair could not say much. He wrote: "While I understand the rationale behind this policy, it may lead to strange results in the current publishing climate. Because of ICLR's policy, I will not factor it into my decision, but this will reduce my confidence."

Regarding the similarities Li Guohao pointed out, the AutoGen authors also offered a rebuttal:



In response to the reviewers' questions, they replied as follows:



In the end, the area chair did not treat the similarity to CAMEL or the citation issue as the paper's main problem. However, the AutoGen paper was eventually rejected for other reasons (the authors later submitted it to the ICLR 2024 LLM Agent Workshop instead).

According to Li Guohao, the authors of the two papers actually met in person, but the encounter was unpleasant:



Li Guohao hopes his posts will draw the academic community's attention to the issue.



What do you think about this?