
Dialogue with Qi Peng's team at the Chongqing AI Research Institute of Shanghai Jiao Tong University: The current level of large models is only equivalent to that of a five-year-old child

2024-07-21



(Image source: Unsplash)

Recently, news that large models cannot tell which is bigger, 9.11 or 9.9, sparked discussion.

When a user asked 12 large AI models at home and abroad, including GPT-4o, a primary-school-level math question, "Which is bigger, 9.11 or 9.9?", only four of them (Alibaba Tongyi Qianwen, Baidu Wenxin Yiyan, MiniMax and Tencent Yuanbao) answered correctly, while the other eight, GPT-4o among them, gave wrong answers.

This suggests that the mathematical capabilities of large models are still weak and that many problems remain to be solved.
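One plausible explanation for this kind of failure (an illustration only, not a claim about any specific model's internals) is that "9.11" and "9.9" get compared the way software version numbers are, segment by segment, rather than as decimal numbers. A minimal Python sketch of the two readings:

```python
def version_style_compare(a: str, b: str) -> str:
    """Compare like version numbers: 9.11 > 9.9 because 11 > 9."""
    pa = [int(x) for x in a.split(".")]
    pb = [int(x) for x in b.split(".")]
    return a if pa > pb else b

def numeric_compare(a: str, b: str) -> str:
    """Compare as decimal numbers: 9.11 < 9.9."""
    return a if float(a) > float(b) else b

print(version_style_compare("9.11", "9.9"))  # 9.11 (the wrong answer for decimals)
print(numeric_compare("9.11", "9.9"))        # 9.9  (the right answer)
```

Both readings are internally consistent, which is exactly why a model trained mostly on text, where "9.11" often is a version number or a date, can get the arithmetic reading wrong.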

In an earlier exclusive conversation with Titanium Media AGI, Dr. Qi Peng, director of the AI Big Model Center at the Chongqing Artificial Intelligence Research Institute of Shanghai Jiao Tong University (the Shanghai-Chongqing Artificial Intelligence Research Institute), said that although large models have great potential, can handle complex problems, and have the ability to learn and generalize, the limitations of the model architecture make large language models more like "liberal arts students" that lack science and engineering capabilities. Moreover, constrained by insufficient computing power, insufficient text data, deviations in accuracy and reliability, and insufficient model size, their intelligence remains at a child's level, more like that of a "five-year-old child": they struggle with complex tasks, and "hallucinations" will persist for a long time.

Qi Peng earned his bachelor's and master's degrees at Tsinghua University and completed his doctorate at the University of Wisconsin in the United States. He currently works at the Chongqing Institute of Artificial Intelligence of Shanghai Jiao Tong University. Qi Peng has worked in data science, AI, and related fields for many years, has participated in many national science and technology projects, and holds numerous intellectual property rights.

Since ChatGPT became popular worldwide, Qi Peng has led the AI Big Model Center team of the Chongqing Artificial Intelligence Research Institute of Shanghai Jiao Tong University in independently developing the "Zhaoyan" large language model over the past year; in March this year it ranked third in the world and second in China on the SuperCLUE Chinese large-model intelligent-agent evaluation benchmark.

At the same time, in July this year, Qi Peng led Shanghai Jiao Tong University doctoral student Zhuang Shaobin and others in an open source community project that successfully reproduced a Sora-like video model. Using the Latte spatiotemporal decoupled attention architecture and careful training on the InternVid video dataset, it can generate videos up to 16 seconds (128 frames) long. Compared with previous open source models that could only generate 3 seconds (24 frames), this is a more than fivefold improvement.

On July 12, Qi Peng and Zhuang Shaobin had an exclusive conversation with Titanium Media for about two hours, focusing on topics such as the current development status of Sora, the challenges faced by the development of large models, the industry's implementation status, and future development directions.

Talking about the impact of Sora technology, Qi Peng told Titanium Media AGI that Sora is more like a new "hammer" that can solve a variety of problems. Beyond video generation, Sora's video model can also play a role in fields such as autonomous driving and physical-world simulation. The most intuitive application is video generation: users only need to enter text descriptions to quickly generate video content that meets their requirements, improving the efficiency and convenience of video production.

When it comes to industry implementation, Qi Peng pointed out that big models are widely used in many vertical industries, but there are relatively few real implementation cases. There are two main reasons: first, big models lack mathematical and engineering capabilities; second, as part of machine learning, the statistical nature of big models means they cannot achieve 100% accuracy.

Looking forward to the future development of AGI, Qi Peng emphasized that human society is at a critical stage on the road to AGI. Although current model capabilities do not meet the AGI standard, one day in the future, when people look back on this period of history, they may realize that ChatGPT placed us at an important historical node.

"An important goal of the institute is to realize the commercialization of technology. The Big Model Center is currently focusing on the practical application of AIGC, especially the 'last mile' problem, how to transform research results into actual products or services to meet market demand. Although the intelligence of big models can be continuously improved, from five years old, ten years old to eighteen years old, and even to the level of top experts, such systems will always need supporting facilities and tools to support their operation and application. The cost of facility research and development may be relatively low, but they play a vital role in promoting the practical application and social value of big models." Qi Peng said.


Dr. Qi Peng, Director of AI Big Model Center, Chongqing Institute of Artificial Intelligence, Shanghai Jiao Tong University

The following is a summary of the conversation between Titanium Media AGI and Qi Peng and Zhuang Shaobin:

Titanium Media AGI: Compared with other video models, what is the core difference of the Sora-like video model jointly developed by Shanghai Jiao Tong University and the Chongqing Artificial Intelligence Research Institute?

Qi Peng: This project was developed by a team led by Dr. Zhuang Shaobin. The team chose to use fully open source data for model training. The team not only open sourced the data, but also made the training process public. In this way, other researchers or developers can reproduce the model training process in their own environment according to the same steps and parameter settings to verify the effectiveness and stability of the model.

The core differences are mainly reflected in three aspects:

First, the team used fully open source data for model training, which means the entire training process is based on publicly accessible datasets. This ensures the transparency and reproducibility of the training process: anyone interested can use the same datasets to reproduce or improve the model.

Second, the team adopted an indirect training method that trains models efficiently at lower computing cost. Large-scale datasets and complex models normally demand long training times and heavy computing resources; with this approach, training time can be shortened by increasing the number of computing nodes without raising the computing cost per node.

Third, the team also carried out some underlying optimization work, especially the optimization of video memory overhead. These optimizations enable the model to stably train long videos on a cluster or server, improving the training efficiency and scalability of the model.
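The article does not say which memory optimizations the team used; activation (gradient) checkpointing is one common technique for cutting video-memory overhead, and its trade-off can be sketched with rough, assumed numbers:

```python
import math

def activation_memory_mb(layers: int, per_layer_mb: int, checkpoint: bool = False) -> int:
    """Rough activation-memory estimate; all numbers are illustrative assumptions."""
    if not checkpoint:
        # Plain backprop keeps every layer's activations in memory.
        return layers * per_layer_mb
    # Checkpoint every ~sqrt(L) layers: keep only checkpoints,
    # recompute the activations inside each segment during backprop.
    segment = math.ceil(math.sqrt(layers))
    return (segment + math.ceil(layers / segment)) * per_layer_mb

print(activation_memory_mb(64, 100))                   # 6400 MB without checkpointing
print(activation_memory_mb(64, 100, checkpoint=True))  # 1600 MB with checkpointing
```

The memory saved is paid for with extra recomputation, which is exactly the kind of trade-off that lets longer videos fit on a fixed cluster.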

Titanium Media AGI: What is the logic and reason behind choosing the open source model?

Qi Peng: Unlike commercial projects, the advantage of using an open source model for research projects that are a collaboration between a team and the open source community is that it can attract more R&D personnel to participate. Since there are no copyright and commercial restrictions, anyone interested in the project can easily obtain and use the model, and can propose their own improvement suggestions or contribute new code. This model can help the continuous improvement and optimization of the model, and can also strengthen cross-disciplinary and cross-field communication and cooperation.

Titanium Media AGI: This reproduced Sora-like video model uses the Latte spatiotemporal decoupled attention architecture. What is its relationship to the DiT architecture?

Qi Peng: The Sora-like model architecture developed by the team does not abandon the Transformer or other established designs. It is an extension of DiT that adds a time dimension to support video processing. The new architecture is intended to better fit the characteristics of video data and improve performance on video generation and processing tasks.

Titanium Media AGI: The DiT architecture has limitations in generating long videos. Can the Latte spatiotemporal decoupled attention architecture solve these problems?

Zhuang Shaobin: The best model that the team is currently training can generate videos up to 16 seconds long. This is a big improvement over previous models based on the UNET architecture, because the models at that time could usually only generate two to three seconds of video. 16 seconds is not a particularly long duration, but it is a relatively long record in the field of video generation.

Continuity and coherence problems in video generation are mainly driven by data quality. If the video data contains incoherence such as picture jumps, the trained model is likely to generate incoherent videos. In addition, the frame rate and resolution of the training data affect generation quality: a model trained only on low-resolution, low-frame-rate data may be unable to generate high-resolution, smooth videos.

Why can't we generate a one- or two-minute video end to end? Such a video means thousands of frames of data (two to three thousand or more), which requires hundreds or even thousands of times more computing resources. Although the Latte spatiotemporal decoupled attention architecture can in theory be extended to such lengths, no organization currently has enough computing power and data to support such training.
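The resource gap Zhuang describes follows from the quadratic cost of self-attention over all video tokens. A back-of-envelope sketch (the tokens-per-frame figure and the long-clip frame count are assumptions for illustration):

```python
def attention_cost(frames: int, tokens_per_frame: int = 256) -> int:
    """Proportional self-attention cost: quadratic in total sequence length."""
    n = frames * tokens_per_frame
    return n * n  # constants and model width dropped

cost_16s = attention_cost(128)    # the reproduced model's 16 s (128 frames)
cost_long = attention_cost(2560)  # a hypothetical ~2,560-frame, minutes-long clip
print(cost_long // cost_16s)      # a 20x longer clip costs 400x more attention
```

Linear growth in clip length therefore translates into a compute bill that grows with the square of length, which is why "hundreds or thousands of times more resources" is not an exaggeration.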

Titanium Media AGI: Who is using Sora at the moment? What problems does it solve? What value does it bring?

Zhuang Shaobin: On the C-end, for non-professional video producers such as ordinary home users, video generation models such as Sora can greatly reduce the difficulty of video production. Users only need to enter a simple text description to generate polished video content, making it easier to participate in video creation.

On the B-side, Sora can generate complex or even imaginative video materials for professional video editors and creative personnel. Professionals can fine-tune and optimize the materials provided by the model, thereby improving work efficiency and creative quality.

Beyond video production, Sora is also being explored in fields such as autonomous driving, 3D generation and modeling, and physics research. An autonomous driving system needs to accurately predict the dynamic changes of surrounding objects, and Sora, as a "world simulator", can simulate and predict object trajectories, providing more accurate environmental modeling for such systems.

For example, in the field of autonomous driving, Tesla's autonomous driving solutions and similar advanced driver assistance systems have made significant technological progress. They can perceive the surrounding environment in real time, including vehicles, pedestrians, obstacles, etc., which is the basis for realizing autonomous driving. Sora helps the autonomous driving system make decisions in advance to avoid potential dangerous situations such as collisions and rear-end collisions. At the same time, by predicting the movement of objects, the system can also optimize the driving route and speed, improve traffic efficiency, and reduce congestion and emissions.

In general, Sora lowers the threshold for video production, allowing more people to participate in video creation. Both non-professional users on the C-end and professional video producers on the B-end can benefit from it.

Qi Peng: Sora is more like a "hammer", a new tool that can solve a variety of problems. Beyond video generation, the Sora text-to-video model can also play a role in fields such as autonomous driving and physical-world simulation. The most intuitive application is video generation: users only need to enter text descriptions to quickly generate video content that meets their requirements, improving the efficiency and convenience of video production.

Often, technology is not developed to solve a specific problem, but rather a powerful solution is discovered accidentally during the research process. Once mature, this method can be widely applied in multiple fields to solve a range of problems.

Currently, Sora is still in the testing phase and is not widely used publicly. In China, there may be some use cases of internal or external beta versions, but the number is relatively small and mainly limited to generating short videos or movie clips. Since this is a beta version, it may be provided free of charge in many cases. If it starts charging in the future, the cost will be a small part of the current video production cost, such as a few hundred yuan, which will greatly reduce the cost of video production.

Titanium Media AGI: What challenges did the team encounter during the development of the Sora model? How did you overcome these challenges?

Qi Peng: This project is mainly a cooperation with the open source community. The main R&D work is carried out by Dr. Zhuang Shaobin and one or two other R&D personnel. The project is divided into four groups, responsible for data collection and labeling, model training, model evaluation, and training acceleration and machine optimization.

Zhuang Shaobin: During the model training process, the biggest challenge the team faced was insufficient computing resources. Especially when dealing with large-scale data and complex models, the demand for computing resources is very high. In order to make more efficient use of limited machine resources, the algorithm team of the project team carried out a lot of optimization work.

These optimizations include advanced optimization strategies such as model parallelism, pipeline parallelism, and graphics memory optimization for a single model.

In addition, the team also carried out optimizations in the video field, so that the project has clear application scenarios and target areas, and better meets the actual application needs of the project.

Titanium Media AGI: Previously, the Chongqing Artificial Intelligence Research Institute of Shanghai Jiao Tong University and the Rural Revitalization (Chongqing) Research Institute released the rural revitalization agricultural model "Zhaoyan Zhaofeng". Why did you develop this model?

Qi Peng: Chongqing, the only municipality directly under the central government with extensive rural areas, provides rich scenarios and broad space for applying large agricultural models. The rural revitalization model uses a large amount of online data together with agricultural data from the Academy of Agricultural Sciences. These data underpin the construction and training of the model, allowing it to reflect the actual situation of agricultural production more accurately. The project is jointly developed with government agencies, the Rural Revitalization (Chongqing) Research Institute, and other parties. This cooperation model helps integrate resources, technology, and funds to jointly advance the research, development, and application of large agricultural models.

The rural revitalization model plans to create 14 models, and currently there are 3-4 related products. Through the big models, the knowledge of experts is converted into popular and easy-to-understand information to solve problems in agricultural production, management and people's livelihood, and help agricultural practitioners to obtain and use agricultural knowledge as easily as urban residents, helping to narrow the information gap between urban and rural areas and improve the efficiency and benefits of agricultural production.

Titanium Media AGI: What is the bottleneck in the development of large model technology at this stage?

Qi Peng: First, let's clarify what the team means by a big model: a large language model. Large language models are the mainstream line, and their core lies in knowledge and logic. As they continue to develop, their intelligence may gradually rise from the IQ of a five-year-old child to that of a ten-year-old, an eighteen-year-old, or even beyond human level. This process depends mainly on the model's mastery and application of knowledge and logic.

Unlike large language models, the text-to-video model is another line of large models. It does not involve complex knowledge and logic but focuses on understanding and simulating the laws of the physical world. A text-to-video model can predict and respond to changes in the physical world based on perception and experience, but it lacks high-level logical understanding and knowledge summarization capabilities.

In addition, there are multimodal models, which can encode and uniformly process multiple forms of information such as text, images, and sounds. Multimodal models are one of the future development directions, which can more comprehensively understand and process complex information in the real world.

At present, large models have entered a plateau period and it seems difficult to achieve a qualitative leap in intelligence. We still believe that larger models can often handle more complex problems and have stronger learning and generalization capabilities. Once a model can achieve 99.9% accuracy, then this large model will become a new productivity tool that can handle a variety of tasks.

The development of big models faces problems such as insufficient computing power, insufficient text data, deviations in accuracy and reliability, and insufficient model size. As a result, the "IQ" of big models is not high enough, more like the IQ level of a five or six-year-old child. The ability of big models to handle complex tasks is limited and cannot meet people's expectations.

Secondly, due to the limitations of the large language model architecture, the large model is a bit like a "liberal arts student": very good at language processing, but not so good at mathematics and engineering. A large model can be compared to the "CEO or COO" of a company: this "CEO or COO" may not understand much of the technology, but can mobilize various highly specialized capabilities.

At the same time, the difficulties encountered by domestic large companies and start-ups in developing large models are mainly due to the huge investment costs, while commercialization is not sufficient to support the continuous investment in computing power and data.

If the intelligence level of the big model cannot be significantly improved in a short period of time, then developing applications becomes a viable option. At the current stage of big model development, customers need to explore and improve in different application scenarios. Through application commercialization, revenue can be generated to support the continued development and optimization of the big model. This not only ensures the economic sustainability of the project, but also provides possibilities for future technological innovation.

In addition, large-scale model enterprises can also support the development of projects through financing. However, financing is not easy, and it depends on whether the market recognizes the potential and value of the project.

Titanium Media AGI: The market is very enthusiastic about big models, but their application is progressing slowly, which is inconsistent with market expectations. Why is the application of big models progressing slowly?

Qi Peng: There are two reasons.

First, the limited technical capability of current models caps the improvements they deliver, dampening enthusiasm for proactive upgrades;

Second, the application of new technologies requires new hardware and computing power support, but companies are not well prepared and lack sufficient computer rooms and intelligent computing resources to deploy and run large models, making it difficult for large models to be implemented in vertical industries. The second problem can actually be solved through corresponding policies. If companies can trust the government-invested research institutes or computing centers to ensure data security, they can start developing large model solutions before building their own intelligent computing power computer rooms.

Large models, especially those that generate high-quality text, images, and other content, usually require substantial computing resources to run. For example, when 1 million users use a large model at the same time, the annual computing cost may reach hundreds of millions of yuan, making commercialization difficult. For ordinary users, such high-cost large-model applications may be unaffordable, which also limits the promotion of C-end applications.

At this stage, solutions may include adopting more efficient algorithms, optimizing model structures to reduce computational workloads, or using distributed computing resources such as cloud computing to spread costs.
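The "hundreds of millions" serving cost can be reached with a back-of-envelope calculation; every number below is an assumption for illustration, not data from the interview:

```python
# Hypothetical serving-cost estimate for 1 million active users.
users = 1_000_000
queries_per_user_per_day = 20   # assumed usage pattern
tokens_per_query = 1_000        # prompt + response, assumed
yuan_per_million_tokens = 20    # assumed inference price

daily_cost = users * queries_per_user_per_day * tokens_per_query / 1e6 * yuan_per_million_tokens
annual_cost = daily_cost * 365
print(f"{annual_cost:,.0f} yuan/year")  # 146,000,000 yuan/year
```

Under these assumptions the bill lands in the hundreds of millions of yuan per year, which is why cheaper algorithms and shared cloud infrastructure matter so much for C-end products.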

However, the current large-model agent is still like a "five-year-old child" in some respects, with a low "IQ", unstable performance, and frequent hallucinations, which seriously affect user experience and trust. These problems are unacceptable in application scenarios that require high accuracy, such as government or financial customer service. Even in consulting or operations-and-maintenance fields where accuracy requirements are lower, the current accuracy rate of 60% to 80% has not reached the critical point for widespread application.

Improving the performance and stability of intelligent agents requires continuous optimization of algorithms, increasing the diversity and quantity of training data, introducing more complex model architectures, etc. At the same time, it is also necessary to strengthen real-time monitoring and error handling mechanisms to ensure the stability of large models in complex environments.

Image recognition is a very important field in the application of multimodal large models. Based on the pre-trained model, new image recognition models can be developed at a very low cost, covering many long-tail scenarios, and have great market potential. Although image recognition has many application scenarios, the current large image recognition models still have the problem of low accuracy and relatively high computing power requirements.

In addition, since the previous generation of artificial intelligence has been relatively mature in image understanding, people have not fully accepted the additional value that large models can generate, which also affects the speed of its promotion.

Titanium Media AGI: How do you view the current industrial innovation of vertical industry big models? Why are there so few vertical industry cases that have been implemented?

Qi Peng: In terms of vertical industry implementation, taking humanoid robots in the manufacturing industry as an example, it may take another five to ten years for humanoid robots to reach the level of being usable in households. This is mainly because their software generalization capabilities are not yet sufficient, and the hardware also needs further research and development and improvement.

A more practical research direction is the generalization problem of robotic arms in manufacturing scenarios. Robotic arms themselves are very mature, and the market is dominated by major domestic and foreign manufacturers, but existing arms lack sufficient generalization capability and cannot flexibly adapt to a variety of tasks. As a result, every new task requires reprogramming, which is impractical when tasks change frequently.

The key to solving the generalization problem of the robotic arm lies in software development, especially software that enables the robotic arm to handle a wider range of scenarios. It is expected that within one or two years, through software optimization and development, the generalization ability of the robotic arm will be significantly improved.

Of course, achieving robotic-arm generalization faces a key challenge: the lack of data. Training a robotic arm that can handle a variety of scenarios requires a large amount of high-quality data to support the learning and optimization of the algorithm.

In fact, big models can be used as an intelligent entity in the manufacturing industry, which can call different software as a whole. This means that in the complex systems of the manufacturing industry, various software that originally required manual operation or programming connection can now be automatically called and integrated through big models in theory.

Users only need to interact with the big model through language or thoughts, and the big model can automatically execute the corresponding programs and complete various tasks. However, due to the different production environments, systems, and APIs of different manufacturing companies, the adaptability of big models in different scenarios has become a major challenge. Even a big model that is well tuned in one scenario may not work properly in another environment. Therefore, enterprise developers need to fine-tune for specific scenarios to improve the performance and accuracy of big models.

This limitation directly affects the widespread application and in-depth development of large models in manufacturing. Because manufacturing often involves highly complex and sophisticated operations, requiring high-precision calculations and control. If large models are not competent for these tasks, then they cannot play their due potential in manufacturing.

In addition to the capacity limitations of large models themselves, compatibility issues between systems are also an important factor restricting the application of large models in manufacturing. Different companies or production units may use completely different systems, including different software, hardware, and APIs. This makes it difficult to directly apply a large model to another scenario after it has been optimized in one scenario, because the system environments of the two scenarios may be completely different. This difference between systems increases the complexity and cost of applying large models in manufacturing.

In fact, there is a solution. For vertical industries such as manufacturing, finance, and retail, standardized interfaces of big models can be defined. These interfaces will clarify the specific capabilities that big models can provide, so that all systems can call the functions of big models through these interfaces. The advantage of this is that no matter how the system environment changes, as long as they follow these standardized interface specifications, they can seamlessly connect with the big model.

Therefore, by defining standardized interfaces, enterprise developers can greatly reduce the difficulty of matching large models with different systems, allowing large models to adapt to different production environments more quickly. Standardized interfaces help ensure that large models can run stably in various systems and reduce compatibility issues caused by system differences.
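The standardized-interface idea can be sketched as an abstract contract that any vendor's model implements; the class and method names below are hypothetical, not an existing standard:

```python
from abc import ABC, abstractmethod

class BigModelInterface(ABC):
    """Hypothetical standardized interface a vertical-industry system would call."""

    @abstractmethod
    def complete(self, prompt: str, max_tokens: int = 256) -> str:
        """Generate text for a prompt."""

class EchoModel(BigModelInterface):
    """Stand-in backend; a real vendor would wrap its own API behind the same method."""

    def complete(self, prompt: str, max_tokens: int = 256) -> str:
        return prompt[:max_tokens]

def factory_assistant(model: BigModelInterface, instruction: str) -> str:
    # Caller code depends only on the interface, never on a vendor SDK,
    # so swapping models does not require changing the production system.
    return model.complete(instruction)

print(factory_assistant(EchoModel(), "check spindle temperature"))
```

Because `factory_assistant` only sees the abstract type, any model implementing `complete` can be dropped in, which is the compatibility guarantee the interview argues for.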

In general, big models are widely used in many vertical industries, but there are relatively few real implementation cases. There are two main reasons: First, due to the lack of mathematical and engineering capabilities, it is difficult for big models to achieve sufficient accuracy and stability in practical applications. Second, as part of the scope of machine learning, the nature of big models based on statistical methods determines that they cannot be 100% correct.

In fact, the structure of the human brain is not 100% accurate, but human judgment is often accurate enough to meet the needs of most practical scenarios. In contrast, even after training, the accuracy of large models may still remain at around 95%, which may not be enough in some scenarios with extremely high accuracy requirements. In addition, the relatively poor mathematical ability of large models also limits their application in some fields.

If you want to overcome these limitations, you need to realize the importance of supporting facilities for large models. By providing large models with necessary supporting facilities and tools, you can make up for their lack of mathematical and engineering capabilities, so that they can better adapt to the needs of actual application scenarios. Such supporting facilities may include more accurate data sets, more efficient algorithms, more stable hardware platforms, etc.

Titanium Media AGI: Why do large models produce hallucinations?

Qi Peng: Sometimes the original data itself is missing or has problems, so the large language model cannot learn the correct knowledge during the training process and therefore cannot make correct inferences. This error is not caused by defects in the large language model itself, but by inaccurate input data.

If a large model is trained in a hypothetical environment where all information points to wrong conclusions, then the large model will also make wrong judgments based on this wrong information. This emphasizes the important influence of data and environment on the performance of intelligent agents and large models.

Sometimes large models generate responses that seem logical and well reasoned but are not actually true or accurate, much as five-year-olds often insist on false memories.

Even adults experience hallucinations or memory errors when processing information. For example, in court testimony or case analysis, parties may produce false memories under pressure or misleading information, even in very serious and important situations.

Titanium Media AGI: Where are the differences in the domestic and international large-scale model market environments reflected?

Qi Peng: At present, foreign companies still maintain strong confidence in improving the technology and have not fully turned to application development. This may be related to the relatively mature and stable foreign market, which gives companies more resources and space to focus on technology research and innovation. In contrast, the domestic market faces more intense competition, and most large-model base R&D companies have already turned to applications on a large scale.

Competition in the domestic market is reflected not only in the number of companies but also in price wars. With multiple companies providing similar services at once, the price of large-model services has dropped rapidly, making it difficult for companies to recover costs. Abroad, companies like OpenAI, with ChatGPT as their flagship, can continue to earn revenue by virtue of their technological lead and market recognition, and use it for further research and innovation.

In the domestic market, due to the fierce price war and relatively weak willingness to pay, enterprises may have to focus more on developing new applications to seek commercial breakthroughs. Although this strategy can alleviate the economic pressure of enterprises to a certain extent, it may also lead to insufficient investment in technology research and development, thus affecting their long-term competitiveness.

Titanium Media AGI: What are the future development directions of AGI?

Qi Peng: I believe human society is at a critical stage on the road to AGI. At this stage, parts of the industry believe certain technologies or models are not on the right path to AGI and do not count as AGI. But one day, when we look back on this period, we may realize we were standing at an important historical juncture.

Take Tesla's autonomous driving technology as an example. Five years ago, people might have expected L4 autonomous driving to take 10 to 20 years to achieve, yet the technology has since made significant progress. Such unexpected leaps lead the industry to believe that true AGI may also arrive sooner than planned.

Zhuang Shaobin: What is the ideal state of AGI? AGI should not only possess high-level reasoning ability; more importantly, it should be applicable in real life, especially in industry.

At present, people have seen a lot of applications of robots and AI technology on physical devices, which shows that people are working hard to liberate AI technology from computers and transform it into a tangible and dynamic entity. This leap is very important for AI technology. Only in practical applications can AI create greater value.

Titanium Media AGI: In addition to the DiT route, are there other possible routes or strategies for the development of AGI? What is the path to achieve AGI?

Qi Peng: In the development of AGI, we need a diverse and inclusive attitude. If we compare AGI architectures to students of different ability levels in one class: all of them can complete the basic assignments, but they diverge on the difficult ones. Similarly, even architectures that differ in performance can all handle basic tasks, while their capabilities on hard tasks vary.

In particular, with the support of large amounts of data and computing power, different architectures can improve their basic capabilities by scaling up parameters, so that all of them reach a certain baseline. At the same time, new trends are emerging in the large-model field, such as linear attention mechanisms, which aim to reduce the computational cost of the traditional Transformer and improve efficiency.
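The efficiency gain behind linear attention can be sketched in a few lines. The NumPy example below is illustrative only, not any specific institute's implementation: it contrasts standard softmax attention, which materializes an n×n score matrix (quadratic in sequence length), with a kernelized linear attention that regroups the matrix products as φ(Q)(φ(K)ᵀV), avoiding the n×n matrix. The feature map φ(x) = relu(x) + 1 is one simple positive choice from the kernelized-attention literature, assumed here for illustration.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the (n, n) score matrix makes cost quadratic in n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])               # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                     # (n, d)

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized linear attention: phi(Q) @ (phi(K).T @ V) never forms
    an (n, n) matrix, so the cost is linear in sequence length n."""
    phi = lambda X: np.maximum(X, 0) + 1                   # simple positive feature map
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                          # (d, d), independent of n
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T + eps         # (n, 1) normalizer
    return (Qp @ KV) / Z                                   # (n, d)

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)  # (8, 4)
print(linear_attention(Q, K, V).shape)   # (8, 4)
```

The two functions produce outputs of the same shape but are not numerically identical; linear attention trades exact softmax weighting for per-token cost that no longer grows with sequence length.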

There is actually no fixed path to the ultimate realization of AGI. The various current models and technologies have their advantages and limitations. In the development of AGI, it is necessary to continuously explore and integrate multiple architectures and technologies. Different architectures and technologies will provide important references and lessons for AGI in this process, promoting its continuous development. At the same time, it is also necessary to pay attention to the practicality and self-correction ability of the model.

Titanium Media AGI: How to balance research innovation and commercialization in the field of large models in China?

Qi Peng: In terms of innovative research, due to limited funds, the institute needs to clearly define the goals it can strive to achieve, rather than blindly pursuing projects that require a lot of resources, such as large language models that can only be undertaken by large companies such as Baidu.

Secondly, the research team should choose projects that are achievable with reasonable effort and have practical value. For example, the Sora-like model the team developed on the Latte spatiotemporal-coupled attention architecture, which targets 16-second high-definition video generation, is a goal the institute can realistically pursue with its existing resources. The institute should also choose research directions that require fewer resources, such as model optimization or supporting applications.

In terms of commercialization, the institute should focus on the application of AIGC, especially the "last mile" problem. This means that the institute needs to focus on how to transform research results into actual products or services to meet market demand and achieve commercialization.

Although the intelligence of large models can keep improving, from a five-year-old's level to a ten-year-old's, to an eighteen-year-old's, and even to that of top experts, such systems will always need supporting facilities and tools for their operation and deployment. The R&D cost of these supporting facilities may be relatively low, but they play a vital role in turning large models into practical applications with real social value.

Therefore, domestic research institutes and teams in the field of AI should focus mainly on the research and development of these supporting facilities to support the operation and practical application of large models.

(This article was first published on Titanium Media App, author: Dou Yueyi, Lin Zhijia, editor: Lin Zhijia)