2024-08-15
Mingmin from Aofei Temple
Quantum Bit | Public Account QbitAI
20,000 words generated in one go: the race over large model output length is on!
The latest research by Tsinghua University and Zhipu AI has successfully increased the output length of GLM-4 and Llama-3.1.
For the same prompt, output length jumped from about 1,800 words to 7,800 words, a roughly fourfold increase.
For context, current large models generally generate under 2k words per response. This hampers content creation, question answering, and similar tasks: models may answer incompletely, and creativity suffers.
The research was jointly led by Li Juanzi and Tang Jie, founders of Zhipu AI and professors at Tsinghua University.
The paper and code have been made open source on GitHub.
Some netizens have already tried it out: LongWriter-llama3.1-8b generated a 10,000-word long-form piece, "The Decline and Fall of the Roman Empire," and runs on a 2018 MacBook Pro (32GB).
The output is very accurate and deserves an A++.
A 9B model that handles 10,000-word output
This study mainly includes three aspects of work.
First, the researchers built a testing tool, LongWrite-Ruler. Testing multiple large models with it, they found that every model ran into difficulty once asked to generate more than 2,000 words (a probe of this kind is sketched below).
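A minimal sketch of such a max-length probe, assuming a hypothetical `chat(prompt)` wrapper around any chat-completion API; the prompts and word counting are illustrative, not LongWrite-Ruler's actual harness:

```python
# Illustrative max-output-length probe in the spirit of LongWrite-Ruler.
# `chat` is a hypothetical helper; swap in a real API client to run this.

def chat(prompt: str) -> str:
    """Hypothetical chat-completion call; replace with a real client."""
    raise NotImplementedError

def probe_max_output_length(requested=(1000, 2000, 5000, 10000, 20000)):
    results = {}
    for n in requested:
        reply = chat(f"Write a {n}-word article about the Roman Empire.")
        results[n] = len(reply.split())  # crude word count of the actual output
    return results

# Finding reported by the authors: the actual length plateaus around 2,000
# words no matter how many words the prompt requests.
```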
Further analysis of interaction logs between users and large models showed that just over 1% of user requests explicitly asked for more than 2,000 words of text.
To pin down the cause, they varied the maximum output length of the dataset used in the model's supervised fine-tuning (SFT) phase.
The results show that a model's maximum output length is significantly positively correlated with the maximum output length present in its SFT dataset.
The conclusion: existing models are limited in output length mainly because long-output samples are missing from the SFT dataset.
Even though the model has seen longer sequences during the pre-training phase, the lack of long text samples in the SFT phase still affects the output length.
To overcome this limitation, the researchers proposed AgentWrite, an agent-based pipeline that decomposes a very long text generation task into multiple subtasks, each producing one segment.
The process works as follows: AgentWrite first drafts a detailed writing plan from the user's instruction, including the main content points and a target word count for each paragraph. Following that plan, it prompts the model to generate each paragraph's content in turn, as the sketch below shows.
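A minimal Python sketch of this plan-then-write loop, assuming the same hypothetical `chat(prompt)` helper as above; the prompt wording and plan parsing are illustrative, not the authors' exact implementation:

```python
# Plan-then-write decomposition in the spirit of AgentWrite (illustrative).

def chat(prompt: str) -> str:
    """Hypothetical chat-completion call; replace with a real client."""
    raise NotImplementedError

def agent_write(instruction: str, num_sections: int = 10) -> str:
    # Stage 1: plan. Ask for one line per paragraph, with key points and
    # a word budget, so total length is controlled up front.
    plan = chat(
        f"Break this writing task into {num_sections} paragraphs. For each, "
        f"list the main points and a target word count.\nTask: {instruction}"
    )
    steps = [line for line in plan.splitlines() if line.strip()]

    # Stage 2: write. Generate paragraphs one by one, conditioning on the
    # plan and everything written so far to keep the sections coherent.
    text = ""
    for step in steps:
        text += chat(
            f"Task: {instruction}\nPlan:\n{plan}\nWritten so far:\n{text}\n"
            f"Now write only this paragraph: {step}"
        ) + "\n\n"
    return text
```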
Building on AgentWrite, the team used GPT-4o to generate 6,000 long-output SFT examples with output lengths ranging from 2k to 32k words, forming the LongWriter-6k dataset, and added this data to the training process.
To verify the method's effectiveness, the team also proposed LongBench-Write, a benchmark containing a variety of user writing instructions with required output lengths of 0-500 words, 500-2,000 words, 2,000-4,000 words, and more than 4,000 words.
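For intuition about how such a benchmark can reward length adherence, here is a simplified scoring function of our own devising (LongBench-Write defines its own metric; this is only an illustration of the idea):

```python
# Toy length-adherence score: 100 when the output hits the requested length,
# decaying linearly with relative error. Not LongBench-Write's actual metric.

def length_score(required_words: int, actual_words: int) -> float:
    rel_error = abs(actual_words - required_words) / required_words
    return max(0.0, 100.0 * (1.0 - rel_error))

print(length_score(4000, 3900))  # 97.5: close to the target
print(length_score(4000, 1800))  # 45.0: the kind of truncation seen pre-LongWriter
```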
The evaluation results show that the model output length increases significantly after using AgentWrite.
With direct preference optimization (DPO) added on top, the GLM-4-9B-based model achieved the best performance among all the models tested.
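For reference, DPO trains the policy directly on pairs of preferred and dispreferred outputs, without a separate reward model. Below is the standard DPO loss in PyTorch; the actual preference pairs and hyperparameters used for LongWriter are not reproduced here:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Standard DPO objective over sequence-level log-probabilities."""
    # Log-ratios of the policy against a frozen reference model.
    chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    # Maximize the margin between preferred and dispreferred responses.
    return -F.logsigmoid(chosen - rejected).mean()

# Toy usage with made-up log-probabilities:
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
```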
Quick-fingered netizens have already put it to the test.
A Reddit user asked LongWriter-llama3.1-8b to generate a history of the decline and fall of the Roman Empire. Generation took 22 minutes (hardware-dependent), averaging 3.34 tokens per second.
The generated content is somewhat formulaic; answers to different questions share a similar structure and pacing.
Regardless, this is a good start and the improvement is obvious.
The research team also said it will further extend the models' output length and improve output quality, and will begin studying how to do so efficiently without sacrificing generation quality.
Reference Links:
https://github.com/THUDM/LongWriter