
I haven’t seen OpenAI’s Q*, but here come the Q*s of a bunch of startups

2024-07-31



How far are we from AI that can "think slowly"?

By Stephanie Palazzolo

Compiled by Wan Chen

Editor | Jingyu

Last year, around the time Sam Altman was briefly fired, OpenAI researchers sent a joint letter to the board of directors warning that a mysterious project codenamed Q* could threaten all of humanity. OpenAI acknowledged Q* in a subsequent internal message to employees and described the project as an "autonomous system that surpasses humans."

Although Q* has yet to be seen, rumors about it have never gone away.

Lu Yifeng, a senior engineer at Google DeepMind, once offered Geek Park an educated guess from a professional perspective: a model needs to recognize what it is unsure about and know what to do when it is unsure. At that point, the model may need to search the Internet, read books, run experiments, entertain some unlikely ideas, and discuss with others, just as humans do.

This year, when I asked questions in the AI assistant apps of the major model makers, the answers felt more reliable than last year's. Many vendors also say they are working hard to make their models think more and further improve their reasoning capabilities. So where does that effort stand today?

In response to these questions, The Information reporter Stephanie Palazzolo examined how startups are improving model reasoning in the article "How OpenAI's Smaller Rivals Are Developing Their Own AI That 'Reasons'", including a Chinese counterpart to Q*. The following has been compiled by Geek Park:

01

OpenAI’s smaller competitors

Develop their own "reasoning" AI

Bubble aside, how useful this wave of AI really is has been repeatedly examined under the spotlight this year.

A large model works by predicting probabilities and generating tokens one at a time, which clearly falls short of what everyone expects: it can only parrot the corpus it was fed during training, and it makes things up when faced with questions it has never seen. Further improving the model's reasoning ability is the key.

We have yet to see progress from OpenAI and Google on this front, but some startups and individual developers say they have come up with "cheap hacks" that enable AI to perform some forms of reasoning.

These shortcuts involve breaking a complex problem into simpler steps and asking the model dozens of additional questions to help it analyze those steps.

For example, when asked to draft a blog post about a new product, the AI application automatically triggers additional queries, such as asking the large model to evaluate the answer it is about to give and point out where it needs improvement. Of course, none of these background steps are visible in the user interface.

This is similar to the Socratic method of teaching students to think critically about their own beliefs or arguments. Rather than giving direct answers, Socrates used continuous question-and-answer exchanges to guide students to discover problems on their own, expose the contradictions and gaps in their views, and gradually correct them until they reached sound conclusions.

With that feedback in hand, the AI application can ask the large model to rewrite the blog post above, taking into account the critique it just produced. This process is often called reflection, and one AI application entrepreneur said it usually leads to better results.
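In code, the reflection loop is just a few chained model calls. The sketch below is a minimal illustration of the idea described above, not any particular product's implementation; `generate` is a hypothetical placeholder for whatever chat-completion API a developer uses, and the prompts are invented for the example.

```python
# Minimal sketch of a reflection loop: draft, self-critique, rewrite.
# `generate` is a placeholder for any chat-completion call (OpenAI, Llama 3, etc.);
# prompts and function names here are illustrative, not from any specific product.

def generate(prompt: str) -> str:
    """Placeholder: send `prompt` to a large language model and return its reply."""
    raise NotImplementedError("wire this up to your model API of choice")


def write_with_reflection(task: str) -> str:
    # Step 1: produce a first draft.
    draft = generate(f"Write a blog post for this request:\n{task}")

    # Step 2: ask the model to critique its own draft (hidden from the end user).
    critique = generate(
        "Review the draft below. List concrete weaknesses and what to improve.\n"
        f"Draft:\n{draft}"
    )

    # Step 3: rewrite the draft, taking the critique into account.
    return generate(
        "Rewrite the draft below, addressing every point in the critique.\n"
        f"Draft:\n{draft}\n\nCritique:\n{critique}"
    )
```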

In addition to reflection, developers can also follow Google's example and try a technique called sampling, which takes advantage of a large model's ability to generate varied, creative answers: developers ask the same question dozens or even a hundred times and then select the best response.

For example, a programming assistant app might ask a large model for 100 different answers to the same question and then run all of the resulting code snippets. It keeps the snippets that produce the correct answer and automatically picks the simplest one.
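A rough sketch of that best-of-N sampling loop follows. It is illustrative only: `generate_code` and `run_tests` are assumed placeholder helpers (one samples a candidate solution from the model, the other executes it against tests), and using length as a proxy for "simplest" is a deliberate simplification.

```python
# Rough sketch of best-of-N sampling for a coding assistant.
# `generate_code` and `run_tests` are hypothetical helpers; "simplest" is
# approximated by length, which is a simplification.

from typing import List, Optional


def generate_code(problem: str, temperature: float = 0.8) -> str:
    """Placeholder: sample one candidate solution from the model."""
    raise NotImplementedError


def run_tests(code: str, tests: List[str]) -> bool:
    """Placeholder: execute `code` against `tests`; return True if all pass."""
    raise NotImplementedError


def best_of_n(problem: str, tests: List[str], n: int = 100) -> Optional[str]:
    passing = []
    for _ in range(n):
        candidate = generate_code(problem, temperature=0.8)  # high temperature -> varied samples
        if run_tests(candidate, tests):                      # keep only code that works
            passing.append(candidate)
    # Among the working candidates, return the shortest as a crude proxy for "simplest".
    return min(passing, key=len) if passing else None
```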

Meta also highlighted some similar techniques in its recent Llama 3 paper.

But this solution — calling a large language model 100 times, or asking it to output so much text and code — is extremely slow and costly. This may be why some developers have criticized a coding assistant made by Cognition, a startup that uses these techniques, for its slow performance.

Developers have seen this problem too and are trying to solve it: they select examples in which the model shows good reasoning on a particular kind of problem and feed those examples back into the model's training data, focusing it on that kind of problem. As one entrepreneur put it, the approach is like learning multiplication tables in elementary school. At first, students may have to work out each product by hand, but over time they memorize the tables and the answers become almost part of their intuition.
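One hedged way to picture that feedback loop: sample several reasoning attempts per problem, keep only the traces that reach a verified correct answer, and write them out as fine-tuning data. The helpers `generate_with_reasoning` and `is_correct` below are assumptions for illustration, as is the prompt/completion JSONL layout.

```python
# Hedged sketch of turning good reasoning traces into new training data.
# `generate_with_reasoning` and `is_correct` are assumed placeholders, and the
# prompt/completion JSONL layout is just one common fine-tuning format.

import json
from typing import List, Tuple


def generate_with_reasoning(problem: str) -> Tuple[str, str]:
    """Placeholder: return (step_by_step_reasoning, final_answer) from the model."""
    raise NotImplementedError


def is_correct(problem: str, answer: str) -> bool:
    """Placeholder: check the answer against a known solution or verifier."""
    raise NotImplementedError


def build_finetune_set(problems: List[str], attempts_per_problem: int = 8) -> List[dict]:
    dataset = []
    for problem in problems:
        for _ in range(attempts_per_problem):
            reasoning, answer = generate_with_reasoning(problem)
            if is_correct(problem, answer):
                # Keep only traces that reached the right answer, so the model is
                # later fine-tuned on examples of reasoning that actually worked.
                dataset.append({"prompt": problem,
                                "completion": f"{reasoning}\nAnswer: {answer}"})
                break  # one good trace per problem is enough for this sketch
    return dataset


def write_jsonl(rows: List[dict], path: str) -> None:
    """Write the collected traces as JSON lines for a fine-tuning run."""
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```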

To develop this kind of AI, developers need control over the large model, and the closed models from OpenAI or Anthropic offer little of that, so they are more likely to use open-weight models such as Llama 3 ("open-weight" means the model's trained weights are publicly released, even if its training code and data are not).

The two methods above may be among the techniques behind OpenAI's breakthrough in reasoning. Of course, OpenAI has not yet released Q*, also known as the "Strawberry" project.

02

China's Q*

Chinese developers and researchers are also gradually mastering these technologies.

Researchers at China's Skywork AI and Nanyang Technological University published a paper on the problem in June, naming their technique Q* as a nod to the OpenAI version they had never seen.

China's Q* technology allows large models to solve problems with multiple steps, such as complex logic puzzles.

The approach works by "searching", at each step of the answer, for the best next step the large model should try, rather than simply following its chain of steps straight through to a conclusion (the method is also known as Monte Carlo tree search and was used earlier in Google's AlphaGo). The search is guided by a special equation called a Q-value model, which helps the large model estimate the future reward of each possible next step, that is, the likelihood that the final answer will turn out correct.
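The sketch below is only a simplified illustration of that kind of step-level search, not the paper's code: `propose_next_steps`, `q_value`, and `is_final` are assumed placeholders, and the greedy pick-the-best-step loop stands in for the fuller Monte Carlo tree search the researchers describe.

```python
# Simplified illustration of step-level search guided by a Q-value model.
# All helpers are assumed placeholders, and the greedy loop below stands in for
# the fuller Monte Carlo tree search described by the researchers.

from typing import List


def propose_next_steps(problem: str, partial_solution: List[str], k: int = 5) -> List[str]:
    """Placeholder: sample `k` candidate next reasoning steps from the model."""
    raise NotImplementedError


def q_value(problem: str, partial_solution: List[str], step: str) -> float:
    """Placeholder: estimate the future reward of taking `step`, i.e. the
    likelihood that the final answer will end up correct."""
    raise NotImplementedError


def is_final(step: str) -> bool:
    """Placeholder: detect whether a step states the final answer."""
    raise NotImplementedError


def solve_with_q_search(problem: str, max_steps: int = 20) -> List[str]:
    solution: List[str] = []
    for _ in range(max_steps):
        candidates = propose_next_steps(problem, solution)
        # Pick the step the Q-value model rates as most promising, rather than
        # simply taking the model's first continuation.
        best = max(candidates, key=lambda step: q_value(problem, solution, step))
        solution.append(best)
        if is_final(best):
            break
    return solution
```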

The researchers say they plan to release the technology publicly this fall.

Alex Graveley, CEO of AI agent startup Minion AI and former chief architect of GitHub Copilot, said they are still trying to teach large language models to go back one step when they realize they have made a mistake. That realization can happen when the model produces a wrong answer, or when it is asked to reflect on its intermediate steps (similar to the blog-post example above) and recognizes that something went wrong, he said.
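A loose sketch of that backtracking idea follows, with assumed helpers (`next_step`, `step_looks_wrong`) and a simple pop-and-retry policy that is an illustration rather than Minion AI's actual approach.

```python
# Loose sketch of "go back one step" backtracking; the helpers and the simple
# pop-and-retry policy are illustrative assumptions, not Minion AI's method.

from typing import List


def next_step(problem: str, steps: List[str]) -> str:
    """Placeholder: ask the model for the next reasoning step."""
    raise NotImplementedError


def step_looks_wrong(problem: str, steps: List[str]) -> bool:
    """Placeholder: ask the model (or a verifier) to reflect on the latest step."""
    raise NotImplementedError


def solve_with_backtracking(problem: str, max_steps: int = 30) -> List[str]:
    steps: List[str] = []
    for _ in range(max_steps):
        candidate = next_step(problem, steps)
        steps.append(candidate)
        if step_looks_wrong(problem, steps):
            steps.pop()  # undo the step the model now believes was a mistake
            continue     # and try a different continuation on the next pass
        if candidate.lower().startswith("final answer"):  # assumed finishing convention
            break
    return steps
```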

There are more attempts across the industry, including the "Quiet-STaR" paper published in March by Stanford University and Notbad AI. Just as humans pause to think before speaking or writing, the paper describes how to teach large language models to generate internal "thinking" steps while working through complex reasoning problems, helping them make better decisions.

OpenAI’s Q*/Strawberry technology may have a head start, but everyone else seems to be racing to catch up.

*Header image source: GulfNews

Geek Question

How far do you think we are from AI that can "think slowly"?

