news

OpenAI releases PVG: Using small models to verify large model outputs and solve the "black box" problem

2024-07-18


In the early morning of July 18, OpenAI published its latest technical research, Prover-Verifier Games, on its official website.

As ChatGPT is widely used in fields such as law, finance, and marketing, it is essential that the model's outputs are safe, accurate, and understandable. However, because of the complexity and variability of neural networks, the accuracy of generated content is hard to verify, leaving users with "black box" outputs.

To address this problem, OpenAI proposed a new training framework, Prover-Verifier Games (PVG for short), in which a small model such as GPT-3 verifies and supervises the output of a larger model such as GPT-4, improving the accuracy and controllability of the output.

In fact, the concept behind PVG was first proposed in a paper published in August 2021, which OpenAI drew on. It is a game-theoretic training method that improves the quality of a model's output by simulating the interaction between a prover and a verifier.

In this mechanism, the Prover's task is to generate content, and the Verifier's task is to determine whether the content is correct.

The core idea of the PVG framework is to improve the overall performance of both the prover and the verifier through multiple rounds of iterative training. In each round, the verifier first uses supervised learning to predict the correctness of content sampled from the prover in the previous round.

This teaches the verifier to pick out correct answers from a large pool of content. The prover then optimizes its output through reinforcement learning so that it is not only correct but also convincing enough for the verifier to accept.
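The following is a minimal, self-contained sketch of that round structure. The classes and helper (ToyProver, ToyVerifier, label_fn) are hypothetical stand-ins for illustration only, not OpenAI's actual models or code; only the reward signal from the verifier is shown, and the prover's policy update is omitted.

```python
class ToyVerifier:
    """Scores how convincing a solution looks, in [0, 1]; trained with supervised labels."""

    def __init__(self) -> None:
        self.bias = 0.0

    def score(self, solution: str) -> float:
        # Placeholder heuristic; the real verifier is a smaller language model.
        raw = 0.5 + self.bias + (0.1 if "step" in solution else 0.0)
        return min(1.0, max(0.0, raw))

    def supervised_update(self, solution: str, is_correct: bool) -> None:
        # Nudge the verifier toward accepting correct solutions and rejecting incorrect ones.
        target = 1.0 if is_correct else 0.0
        self.bias += 0.1 * (target - self.score(solution))


class ToyProver:
    """Generates candidate solutions; in PVG it would be trained with RL against the verifier."""

    def generate(self, problem: str) -> str:
        return f"step-by-step solution to: {problem}"


def pvg_training(prover, verifier, problems, label_fn, rounds=3):
    previous_samples = []  # (solution, is_correct) pairs sampled in the previous round
    for _ in range(rounds):
        # 1) Verifier: supervised learning on solutions sampled from the previous round's prover.
        for solution, is_correct in previous_samples:
            verifier.supervised_update(solution, is_correct)
        # 2) Prover: reinforcement learning, using the verifier's acceptance score as reward
        #    (only the reward signal is computed here; the policy update itself is omitted).
        rewards = [verifier.score(prover.generate(p)) for p in problems]
        # 3) Sample and label fresh solutions for the next round's verifier step.
        previous_samples = [(prover.generate(p), label_fn(p, prover.generate(p))) for p in problems]
    return rewards


# Toy usage: one problem, with a stand-in labeling function in place of ground-truth checks.
final_rewards = pvg_training(ToyProver(), ToyVerifier(), ["age word problem"], lambda p, s: True)
```

In the real setup both roles are language models, with the verifier typically much smaller than the prover.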

In this process, the prover and the verifier are interdependent: the prover must generate content that convinces the verifier, and the verifier must have enough judgment to identify which content is correct.

To further improve efficiency, OpenAI built on this design with two types of provers: the helpful prover and the sneaky prover.

The helpful prover aims to generate content that is both correct and convincing, optimizing its output so that the verifier accepts it.

The sneaky prover, by contrast, tries to generate false but equally convincing content to deceive the verifier. This adversarial design not only raises the difficulty of training but also forces the verifier to develop stronger judgment and robustness.
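To make the two roles concrete, the sketch below shows one plausible way to assign rewards to each prover type: the helpful prover is rewarded for correct solutions that the verifier accepts, while the sneaky prover is rewarded for incorrect solutions that the verifier nonetheless accepts. The exact reward shaping OpenAI uses may differ; this is only an illustration.

```python
def prover_reward(role: str, verifier_accepts: bool, is_correct: bool) -> float:
    """A plausible role-conditioned reward for PVG training (illustrative, not OpenAI's exact scheme)."""
    if role == "helpful":
        # Rewarded for solutions that are correct AND convincing to the verifier.
        return 1.0 if (verifier_accepts and is_correct) else 0.0
    if role == "sneaky":
        # Rewarded for solutions that are wrong yet still convince the verifier.
        return 1.0 if (verifier_accepts and not is_correct) else 0.0
    raise ValueError(f"unknown prover role: {role}")
```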

During PVG training, each round optimizes both the verifier and the prover. The verifier first uses supervised learning to predict the correctness of the content, learning to pick out correct answers from a large pool of candidates; the prover then optimizes its generated content through reinforcement learning.

For example, suppose a user asks: "Shauna's father is 5 times as old as Shauna. Shauna is currently 3 times as old as Aaliyah. If Aaliyah is 3 years old, how old is Shauna's father?"

GPT-4's output is that the father is 45 years old.

The verifier model, GPT-3, needs to confirm Aaliyah's age, then calculate Shauna's age from it, and finally calculate Shauna's father's age from Shauna's age. The whole process requires multiple rounds of verification.
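The arithmetic the verifier has to check is, step by step (reading "5 times" as "5 times as old", consistent with the stated answer of 45):

```python
aaliyah_age = 3
shauna_age = 3 * aaliyah_age   # Shauna is 3 times Aaliyah's age -> 9
father_age = 5 * shauna_age    # Her father is 5 times Shauna's age -> 45
print(aaliyah_age, shauna_age, father_age)  # 3 9 45
```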

In addition, during the training process, OpenAI also introduced a method called "checkability training".

The verifier and prover models are trained alternately: the verifier uses supervised learning to predict the correctness of solutions sampled from the prover in the previous round, while the prover uses reinforcement learning to keep optimizing based on the verifier's acceptance and the correctness of its solutions in the current round.

The researchers said that as the number of training rounds increases, the verifier's soundness gradually improves, shown by greater resistance to the incorrect solutions generated by sneaky provers. At the same time, the solutions generated by helpful provers become clearer and easier for humans to understand.

However, it should be noted that training the verifier model requires a large amount of real, accurately labeled data to improve its ability to distinguish true from false. In other words, if the verifier model is biased, the verified content may still contain incorrect or unsafe output.

The source material of this article is OpenAI official website and papers. If there is any infringement, please contact us to delete it.