2024-08-13
By Jin Lei and Xifeng, from Aofeisi
QbitAI | WeChat official account: QbitAI
After Devin, another AI software engineer is flooding everyone's feeds.
It is called Genie, and it claims to be the strongest on Earth right now, able to think and act like a human!
So just how strong is "the strongest on Earth"?
Let's look at the benchmark scores first.
On the authoritative SWE-Bench leaderboard, Genie took first place by solving 30.07% of the problems.
(SWE-Bench is a benchmark for evaluating large models for solving real-world software problems.)
This result is far ahead of the 19.27% scored by second place, setting a record 57% increase over the previous SOTA!
As for how Genie performs in practice, in the words of the team:
It can solve real-life software problems just like a human engineer.
There are four ways to get Genie started: a prompt, a GitHub Issue, a Linear ticket, or the API.
For example, to solve a GitHub Issue, you first give Genie a link to the repo, and it starts resolving the issue automatically:
Genie iteratively works out which files it needs to solve the problem, and keeps searching until it is satisfied with what it has found:
Next, it goes through an automatic, iterative analysis process:
Then Genie rapidly starts writing and running code on its own:
If a bug occurs while running the code, Genie will repeat the process of analyzing, writing code, and running only the problematic area until it runs smoothly.
The whole process takes only 84 seconds!
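The write, run, and retry-on-failure loop described above can be pictured with a minimal sketch. This is only an illustration under stated assumptions, not Cosine's implementation: `solve_with_retries`, `write_code`, and `run_code` are hypothetical names, and the toy stand-ins below replace any real model or sandbox.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    code: str
    error: Optional[str]  # None means the run succeeded


def solve_with_retries(
    write_code: Callable[[str, Optional[str]], str],
    run_code: Callable[[str], Optional[str]],
    issue: str,
    max_rounds: int = 5,
) -> List[Attempt]:
    """Repeat write -> run until run_code reports no error or rounds run out."""
    history: List[Attempt] = []
    last_error: Optional[str] = None
    for _ in range(max_rounds):
        # Hypothetical model call: it sees the issue plus the latest error, if any.
        code = write_code(issue, last_error)
        # Hypothetical sandbox run: returns an error message, or None on success.
        last_error = run_code(code)
        history.append(Attempt(code, last_error))
        if last_error is None:
            break
    return history


# Toy stand-ins so the sketch runs without any model or sandbox.
def toy_write(issue: str, last_error: Optional[str]) -> str:
    return "return a + b" if last_error else "return a - b"


def toy_run(code: str) -> Optional[str]:
    return None if "+" in code else "AssertionError: add(1, 2) != 3"


attempts = solve_with_retries(toy_write, toy_run, "add() returns the wrong result")
print(len(attempts), attempts[-1].error)
```

In this toy run the first attempt fails, the error is fed back, and the second attempt passes, so the loop stops after two rounds.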
In the team’s words:
Genie has observed and learned how human programmers solve software problems millions of times.
This is a number that no human programmer could achieve in a lifetime.
But what is even more unexpected is that Cosine, the team behind Genie, has only 5 people.
And CEO Alistair also wrote a message to thank OpenAI:
Without you, we couldn't have made Genie.
So how did the Cosine team create Genie?
The main feature of Genie is its ability to mimic the cognitive processes, logic, and workflow of human engineers.
To do this, the team revealed, it has spent the past year collecting a dataset of the development activity of real human programmers.
The data pipeline combines methods such as results analysis, static analysis, self-play, and step-by-step verification with AI models trained on a large amount of labeled data. The benefit is that as base model capabilities improve, the quality of the data the team can extract improves accordingly.
Genie is ultimately trained on this proprietary data.
The dataset encodes the complete process of human reasoning, including perfect information traceability, incremental knowledge discovery, and a step-by-step decision-making process based on actual work cases of software engineers.
Genie's reasoning process consists of four main steps: planning, searching, writing code, and running code. According to the team, this gets past the limitations of other AI engineers, which rely on bolting extra tools such as web browsers and code interpreters onto a base model, and lets Genie handle diverse, highly context-dependent, and previously unseen problems the way a human would.
This training method immediately reminded netizens of a similar idea proposed by Karpathy before:
For LLM, the ideal training data is not the content you wrote, but your complete thought process and every editing action during the writing process. However, we can only do our best with the existing resources.
In addition, Genie's training also introduces a self-improvement mechanism.
The initial training data was mostly runnable, error-free code, which left Genie poorly equipped to deal with errors. To solve this problem, the team used the first version of Genie to generate synthetic data containing errors, and then used this data to train the next version of the model.
Specifically, an old version of Genie is used to propose a solution, and if the solution is wrong, the final state of the task is used to teach it to get from the current state to the correct state.
By repeating this process over and over, Genie's initial solution gradually becomes more accurate, giving the correct answer directly in most cases, and even if it is wrong, only minor corrections need to be made in the data set.
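As a rough sketch of the self-improvement idea described above: an older model proposes a solution, and whenever the proposal is wrong, the pair (current state, correct final state) becomes a new training example. Everything here is an assumption made for illustration: `build_correction_examples` and `old_model` are hypothetical names, and plain string comparison stands in for whatever correctness check the team actually uses.

```python
from typing import Callable, Dict, List


def build_correction_examples(
    propose: Callable[[str], str],
    tasks: Dict[str, str],
) -> List[Dict[str, str]]:
    """Collect (wrong proposal -> known final state) pairs from an older model.

    `tasks` maps a task description to its known-correct final state.
    """
    examples: List[Dict[str, str]] = []
    for task, correct_state in tasks.items():
        proposal = propose(task)  # hypothetical call to the previous model version
        # Assumption: equality stands in for a real check such as running tests.
        if proposal != correct_state:
            examples.append(
                {
                    "task": task,
                    "current_state": proposal,
                    "target_state": correct_state,
                }
            )
    return examples


# Toy "old model" that gets one of two tasks wrong.
def old_model(task: str) -> str:
    return {"add": "a + b", "sub": "a + b"}[task]


corrections = build_correction_examples(old_model, {"add": "a + b", "sub": "a - b"})
print(corrections)
```

Only the failed task yields an example, which matches the article's point that as the model improves, fewer and smaller corrections end up in the dataset.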
Another key to improving Genie's capabilities lies in the large model support provided by OpenAI.
The team said that when they first started developing Genie, they only had access to short-context models for fine-tuning, in the 16-32k token range. They used these models for a lot of early development and trained them on more than 100 million tokens of data. Although they found that the architecture they had designed had clear advantages, it was fundamentally limited by the amount of information the model could process at one time.
After trying various compression/chunking methods, the only solution was to use a model with a larger context.
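For readers unfamiliar with chunking, here is a minimal sketch of splitting a long token sequence into fixed-size, overlapping windows. It is a generic illustration, not the method Cosine tried; `chunk_tokens`, `window`, and `overlap` are hypothetical names, and the tiny window size is only for demonstration.

```python
from typing import List


def chunk_tokens(tokens: List[str], window: int, overlap: int = 0) -> List[List[str]]:
    """Split tokens into windows of at most `window` items, sharing `overlap` items."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if not tokens:
        return []
    step = window - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(tokens) - overlap, 1)
    return [tokens[i:i + window] for i in range(0, last_start, step)]


tokens = [str(i) for i in range(10)]
chunks = chunk_tokens(tokens, window=4, overlap=1)
print(chunks)
```

Each window sees only part of the input, so information that spans windows is never visible to the model all at once, which is the limitation a larger context window removes.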
With long-context model support from OpenAI, the latest version of Genie has been trained on billions of tokens of data.
The team believes that data quality is more important than hyperparameter adjustment and data volume. Therefore, they also conducted a lot of experiments in data mixing, including language, task type, task length and other dimensions. The following is the proportion of data in different programming languages used to train Genie:
There are also data proportions of different types of instances:
As we mentioned above, the Cosine startup team currently consists of only 5 people.
In the introduction on the official website, they also describe themselves very directly as:
Small but mighty.
According to the introduction, some members come from unicorn companies, some have experience in managing global teams, and some even started programming at the age of 8.
But when Cosine was first established, there were only three people, and their goal was to understand human reasoning.
It is worth mentioning that the team also has a Chinese member: Yang Li, a co-founder of Cosine who was named to the Forbes 30 Under 30 list in 2021.
In addition, regarding Genie itself, CEO Alistair also said:
We started envisioning Genie as early as 2022, but it was not technically feasible at the time.
It was not until the past six months or so, as the large model gradually matured, that Genie became a reality.
Well, it has to be said that large models have delivered once again.
Genie's waitlist is now open. If you are interested, you can sign up at the link below.
Waitlist address:
https://cosine.sh/register
Reference Links:
[1] https://x.com/alistairpullen/status/1822981361608888619?s=46
[2] https://cosine.sh/blog/genie-technical-report
[3] https://cosine.sh/blog/state-of-the-art
[4] https://x.com/AlistairPullen/status/1823030874579120223
[5] https://x.com/yangli_