
When agents start to create themselves, will the explosion of AI products still be a distant dream?

2024-08-21


Tencent Technology | Author: Hao Boyang

Editor: Zheng Kejun

In 2024, the hottest topic in the AI field is undoubtedly the agent.

"Large models are cool, but what can I actually do with them?" That was the question of the year for AI applications in 2023. By 2024, agents have become the most promising answer to it.

Agents route large models through complex processes and tools, enabling them to handle more complex and customized tasks, ultimately yielding software or physical entities with autonomy, perception, decision-making, and action capabilities. Industry leaders such as Andrew Ng and Jim Fan have weighed in to vouch for their effectiveness.

In a blog post this March, Andrew Ng noted that on the HumanEval dataset, GPT-3.5 scores 48.1% zero-shot and GPT-4 scores 67.0% zero-shot, yet GPT-3.5 wrapped in an agent workflow reaches 95.1% accuracy.

(Figure: In Andrew Ng's experiment, GPT-3.5 with agent techniques far outperforms vanilla GPT-4)
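The gain comes from iteration rather than a single call. Below is a minimal sketch of the kind of reflection-style coding workflow Ng describes, not his implementation; `call_model()` is a hypothetical stand-in for a real LLM API client.

```python
# A minimal reflection-style coding agent in the spirit of the workflow
# Andrew Ng describes (a sketch, not his implementation). `call_model`
# is a hypothetical stand-in for a real LLM API client.

def call_model(prompt: str) -> str:
    """Hypothetical LLM call; wire up a real API client here."""
    raise NotImplementedError

def run_tests(code: str, tests: str) -> tuple[bool, str]:
    """Execute candidate code against unit tests; return (passed, error log)."""
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate function
        exec(tests, namespace)  # run asserts against it
        return True, ""
    except Exception as e:
        return False, repr(e)

def coding_agent(task: str, tests: str, max_rounds: int = 3) -> str:
    """Generate code, then repeatedly feed test failures back to the model."""
    code = call_model(f"Write a Python function for this task:\n{task}")
    for _ in range(max_rounds):
        passed, errors = run_tests(code, tests)
        if passed:
            break
        code = call_model(
            f"Task:\n{task}\nYour code:\n{code}\n"
            f"It failed with:\n{errors}\nReturn a fixed version."
        )
    return code
```

Even this crude loop hints at why a weaker model with feedback can beat a stronger model called once: the workflow, not just the base model, does much of the work.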

So over the past year, everyone from big companies to solo tinkerers has been building agents, from Microsoft's Copilot to gimmicky AI fortune-telling apps. Tools for building agent frameworks, such as LangChain, Coze, and Dify, have sprung up like mushrooms after rain, and their popularity keeps climbing.

(Figure: Companies in the agent and AI process-automation space, compiled by INSIGHT)

Andrej Karpathy, a former OpenAI scientist, once said that ordinary people, entrepreneurs, and geeks may hold an advantage over companies like OpenAI when it comes to building AI agents.

Is the era of new product managers based on AI agent workflows coming? Not necessarily, because AI may be better at building agents than humans.

Automated loop logic

On August 19, three researchers from the University of British Columbia published a paper titled "Automated Design of Agentic Systems." In it, they designed a system that lets AI discover and build agents, and iterate on them, entirely by itself.

Recall OpenAI's classic definition of an agent: a system that can store knowledge, make plans, and use tools.

When we build agents with workflows, we too are applying existing knowledge (knowledge of agent design), making plans (laying out the process), and using tools (calling APIs) to produce an output. Nothing in that exceeds what an agent itself can do.

So why not build an agent that can automatically discover and design agents?

Following this idea, the paper's authors call the designer a "meta-agent" and task it with designing new agents. Each designed agent is added to a database as data, and ever stronger versions are iterated out continuously.

They call this whole approach ADAS (Automated Design of Agentic Systems).

So, how does this system actually work?

Setting the chain in motion

The process of generating new agents in the ADAS system can be divided into three parts (sketched in code after the list):

The first part establishes the search space, which can be understood as the set of basic tools and rules available for designing potential new agents.

The second part runs the search algorithm, which specifies how the meta-agent uses the search space and combines its elements to build a new agent.

The last part runs the evaluation function, which scores the constructed agent on performance and other objectives.
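Put together, the three parts form one outer loop. Here is a minimal sketch of that loop; `meta_agent_propose` and `evaluate` are hypothetical helpers, not the paper's actual code.

```python
# Sketch of the ADAS outer loop: the meta-agent proposes a new agent as
# code, the evaluation function scores it, and the scored agent joins the
# archive that conditions the next proposal. Helper names are hypothetical.

def meta_agent_propose(task: str, framework_code: str, archive: list) -> str:
    """Hypothetical: prompt the meta-agent LLM to write new agent code."""
    raise NotImplementedError

def evaluate(agent_code: str, task: str) -> float:
    """Hypothetical: run the generated agent on benchmark tasks, return a score."""
    raise NotImplementedError

def adas_search(task: str, framework_code: str, n_iterations: int = 25) -> list:
    archive: list[dict] = []  # discovered agents plus their benchmark scores
    for _ in range(n_iterations):
        # Search algorithm: propose new agent code, conditioned on the task,
        # the framework (search space), and every prior discovery.
        agent_code = meta_agent_propose(task, framework_code, archive)
        # Evaluation function: score the generated agent on held-out tasks.
        score = evaluate(agent_code, task)
        archive.append({"code": agent_code, "score": score})
    return archive
```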

In the paper, the researchers explained step by step how to build the above three core parts.

The first step is to decide what basic elements the search space is built from, and the researchers believe the best medium is code.

This is because code is Turing-complete and can express all possibilities: in theory, the meta-agent can discover any possible building block (prompts, tool use, control flow) and any agent system that combines those building blocks in any way.

More importantly, the many agent-building workflows that already exist on platforms such as LangChain are already in code form, so the relevant data can be obtained without conversion. Tool calls and capability components such as RAG (retrieval-augmented generation) likewise come with ample code bases.

Using code to construct the search space also means an agent generated by ADAS can be run directly for error checking and scoring, with no human intervention required.
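Concretely, because each generated agent is just a piece of code, the loop can load, run, and score it with no human in between. A sketch follows; the `forward(task_input)` entry point is an illustrative assumption.

```python
# Sketch: scoring a generated agent automatically. The `forward` entry
# point is an illustrative assumption, not necessarily the paper's name.

def score_generated_agent(agent_code: str, test_cases: list[tuple]) -> float:
    namespace: dict = {}
    try:
        exec(agent_code, namespace)     # load the generated agent
        forward = namespace["forward"]  # its assumed entry point
    except Exception:
        return 0.0  # broken code simply scores zero; no human needed
    correct = 0
    for task_input, expected in test_cases:
        try:
            if forward(task_input) == expected:
                correct += 1
        except Exception:
            pass  # runtime errors also just cost points
    return correct / max(len(test_cases), 1)
```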

After defining the search space, the researchers designed the search algorithm, that is, how the meta-agent explores possible ways to complete the task. This is accomplished largely through prompt engineering.

The first step is to give it a series of system prompts.

Then a base prompt gives the meta-agent the following information (a rough sketch of the assembled prompt follows the list):

1. Basic description of the task.

2. The basic framework code: utilities such as prompt formatting and wrappers, plus the ability to call foundation models (FMs) and APIs.

(Figure: part of the framework code)

3. Format and examples of task input and output.

4. A library of examples formed by agents generated in earlier iterations, including their baseline test results.

(Figure: An example from the agent library)
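Assembled, the meta-agent's prompt is essentially those four ingredients concatenated. A rough sketch of what that assembly could look like; the structure and field names are illustrative, not lifted from the paper.

```python
# Illustrative assembly of the meta-agent's prompt from the four pieces
# listed above. Wording and layout are assumptions, not the paper's text.

def build_meta_prompt(task_description: str,
                      framework_code: str,
                      io_examples: str,
                      archive: list[dict]) -> str:
    # Show earlier discoveries with their scores so the meta-agent can aim
    # for something both novel and stronger than anything in the library.
    library = "\n\n".join(
        f"# score={a['score']:.3f}\n{a['code']}" for a in archive
    )
    return (
        f"## Task\n{task_description}\n\n"
        f"## Framework code you may build on\n{framework_code}\n\n"
        f"## Input/output format and examples\n{io_examples}\n\n"
        f"## Previously discovered agents\n{library}\n\n"
        "Write code for a new agent that is novel and scores higher."
    )
```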

With these basic prompts in place, ADAS can start running and generate an agent to solve a given problem.

In this process, the meta-agent conducts two rounds of reflection to ensure the generated agent is novel and correct: it checks whether the newly generated code framework contains errors and whether the design is sufficiently innovative compared with past entries in the agent example library.

The meta-agent also makes a preliminary assessment, based on its "understanding" of practical performance, of whether the new agent is likely to outperform its predecessors.

If any of these conditions is not met, the meta-agent must either revise the generated agent or simply create a new one.

(Figure: Prompt for the first round of reflection)

After two rounds of reflection and revision, the meta-agent sends the newly generated agent, which it judges to be novel and good, to the evaluation system in the third step.
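In code, one generate-and-reflect round might look roughly like this; `call_model` is the same hypothetical LLM helper as in the earlier sketch, and the prompt wording is an abbreviation, not the paper's.

```python
# Sketch of one proposal round with two reflection passes. Prompts are
# abbreviations; the paper's actual reflection prompts are much longer.

def propose_with_reflection(meta_prompt: str) -> str:
    agent_code = call_model(meta_prompt)
    # Reflection 1: is the design actually new relative to the library?
    agent_code = call_model(
        f"{meta_prompt}\nYour draft:\n{agent_code}\n"
        "Is this meaningfully different from the archived agents? "
        "If not, revise it. Return the (possibly revised) code."
    )
    # Reflection 2: is the code free of errors, and does it look likely to
    # outperform its predecessors? If not, fix it or start over.
    agent_code = call_model(
        f"Check this agent code for bugs and weaknesses, then return a "
        f"corrected version:\n{agent_code}"
    )
    return agent_code
```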

The evaluation system assigns the agent an objective capability score based on benchmark tests, then returns it to the agent example library.

The meta-agent then carries out the next round of iterative optimization, drawing on the previous scores and the examples in the library, to reach a higher task performance score.

In this process, a fully automated chain is born. Producing an excellent agent may take tens of iterations.

Beyond Hand-Crafted Agents

How complex can an agent produced by the automated ADAS process get? The figure below shows an agent framework generated after 14 iterations.

This agent's structure has five chains of thought producing preliminary answers. After three expert models and a model imitating human evaluation give feedback on those answers, the answers pass through three rounds of refinement in an optimization stage. Finally, three results are selected by evaluation and merged into the final answer.
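Expressed as a skeleton, that architecture is roughly the following pipeline. This is a reconstruction from the figure's description, not the generated code itself; `call_model` is again the hypothetical LLM helper from the earlier sketches.

```python
# Skeleton of the 14th-iteration agent described above: five CoT drafts,
# feedback from three expert models plus a human-style grader, three
# refinement rounds, then select-and-merge. Reconstructed for illustration.

def discovered_agent(question: str) -> str:
    # 1. Five independent chain-of-thought drafts.
    drafts = [call_model(f"Think step by step:\n{question}") for _ in range(5)]
    # 2. Three rounds of feedback-driven refinement.
    for _ in range(3):
        feedback = [
            call_model(f"As expert #{i}, critique these answers:\n{drafts}")
            for i in range(3)
        ]
        feedback.append(
            call_model(f"Rate these answers as a human grader would:\n{drafts}")
        )
        drafts = [
            call_model(f"Question:\n{question}\nDraft:\n{d}\n"
                       f"Feedback:\n{feedback}\nImprove the draft.")
            for d in drafts
        ]
    # 3. Keep the three best drafts and merge them into one final answer.
    scored = sorted(drafts,
                    key=lambda d: float(call_model(f"Score 0-10:\n{d}")),
                    reverse=True)
    return call_model(f"Merge into one final answer:\n{scored[:3]}")
```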

A design of this level of complexity would take a human about a week to complete, and that’s just the time to write the prompts and design the architecture, not to mention testing and comparison.

Of course, this is also the result of continuous iteration of meta-agent design.

During the iterations, the quality of the agents it generated rose rapidly. By the third iteration, the meta-agent had taught itself a multi-chain-of-thought strategy; by the fourth, it had learned to use dynamic memory to refine answers. By the 14th iteration, the agent it generated had reached the complexity described above.

In the end, on ARC its best discovered agent improved performance by more than 250% over the initial plain large language model, and by 75% over the best hand-crafted agent method, CoT-SC (self-consistency with chain-of-thought).

And not only on ARC: agents generated via ADAS comprehensively outperform today's strongest hand-crafted baselines, such as CoT, LLM Debate, and Self-Refine. The more complex the task and the more it crosses domains, the greater the advantage of the ADAS-generated agents.

These generated agents also show some transfer ability: an agent that solves science problems, for instance, can also do well in mathematics. An optimal framework is therefore likely to solve related problems across many fields.

The era of hand-crafting agents may be ending, but the era of discovering agent paradigms may continue: across the tests, ADAS did not find construction methods outside the current agent-building paradigms; it recombined and exploited existing ones.

For run-of-the-mill AI agent developers, though, that is already enough to put their jobs at risk.

Before ADAS can spread, however, it still has one hurdle to clear: cost.

According to the researchers, a full search-and-evaluation run against the OpenAI API costs about $500 on ARC, and about $300 in the reasoning and problem-solving domains; that works out to roughly $20 per iteration. Against such costs, human labor still holds some advantage for now.

But the researchers also note that because the work started early, they used the "gpt-3.5-turbo-0125" model; the newer "gpt-4o-mini" costs less than a third as much and performs better. Moreover, in the experiments, agents iterated with GPT-3.5-level capability hit a performance ceiling after a certain point, so the iterations after the 14th were wasted. A design with better evaluation and resource management could therefore cut costs significantly.

Obviously, the price advantage of labor will not last long.

Has the explosion of intelligent agents really begun?

Why is this automated technology so important?

In the mobile Internet era, apps for every field flourished, creating a technological boom. But because developers had to learn new tools, mobile app development went through a long penetration phase before it accommodated enough developers.

Earlier eras were slower still. According to the "Crossing the Chasm" theory that Geoffrey Moore proposed in the 1990s from the experience of personal computers, in a technology's early years only about 13.5% of people, the early adopters, will use it. And that figure is for mere use, not development.

Therefore, the shortage of developers may be an important bottleneck in the promotion of technology.

Of course, agent development may penetrate much faster, because it is far simpler than earlier kinds of software development. Wordware, popular a while back, for example, lets ordinary users build agents in natural language, lowering the threshold.

Even so, designing chains of thought and multi-step loops remains complex, and the process involves more and more tools. So the people who can truly commit to agent development and wield these tools well are still few.

Zuckerberg once said in a conversation with Jensen Huang that even if large-model technology stopped advancing, it would take five years just to fully realize the potential of agents.

So compared with the technology itself, developers may be the core bottleneck for an agent explosion. There are still far too few people who can do this work.

Agents, however, can be plentiful.

If more commercial companies adopt and refine this technology for automatically generating and tuning agents, the early bottleneck of technical talent will naturally disappear, and the speed at which agents' coverage and depth of capability are explored across fields will rise sharply.

Perhaps next year, the first Killer AI App in human history will be created by an AI.