Zhu Jiaming's new preface: Eight major trends of AI and human intelligence beginning to "share wisdom"|2024 Shanghai Book Fair①
2024-08-19
"Dialogue Era: The Road to Building a Powerful Country by Forging New Productivity" by Zhu Jiaming, Tao Hu, Shen Yang, etc., published by Peking University Press in August 2024, priced at 78 yuan
[Introduction] The Shanghai Book Fair is in full swing: people stream in and out of the exhibition hall, and information and ideas circulate and spread at every level.
The new book "Dialogue Era: The Road to a Powerful Country by Forging New Productivity" published by Peking University Press and compiled by guests of Wenhui Lecture Hall was unveiled at the book fair and was included in the list of new books for cadres to study in August at Chang'an Street Reading Club on August 14. The book is divided into three chapters: the "emergence" of artificial intelligence, the integrated development of the virtual and real worlds, and the infrastructure of digital intelligence technology. It mainly focuses on topics such as artificial intelligence, large models, chips, brain-computer interfaces, Web3, satellite Internet, digital ecology, metaverse, and AI ethics.
Across fourteen lectures, forty-one experts, scholars, and industry figures, including Zhu Jiaming, Lin Baojun, Wang Jianyu, Shen Yang, Li Miao, Cai Hengjin, Lu Yong, Lin Longnian, Lin Yonghua, Tao Hu, Yang Guang, Wei Hui, He Liang, Ji Weidong, Feng Xiang, Jiang Xiaoyuan, Yu Hai, He Jing, Fu Changzhen, and Li Quanmin, analyze technological innovation and industrial development in next-generation information technology, artificial intelligence, aerospace, biomedicine, quantum technology, and related fields, and discuss the frontiers, trends, and challenges of global AI development as well as the ethical governance of artificial intelligence. The book reflects a dialogue between science and the humanities and the linkage between technological development and social change, combining cutting-edge depth with new knowledge, and helps readers understand the concept and substance of new quality productivity and its important role in advancing China's modernization.
It is particularly worth mentioning that the preface of this book is 17,000 words long, written by one of the authors, economist Zhu Jiaming, and focuses on the frontiers, trends and challenges of artificial intelligence development from 2022 to 2024. It also elaborates on the impact of artificial intelligence on the macroeconomy. The latest information contained in the preface is as of early July 2024. Today, we have selected the frontiers, trends and challenges of AI development in the preface.
During the book fair, readers can go to Peking University Press located at E1-07, East Hall 1 to purchase (40% off), or order on online platforms such as Dangdang, JD.com, and Taobao.
The Era of Dialogue at Peking University Press at the Shanghai Book Fair (Booth E1-07, East Hall 1)
Eight major trends as AI and human intelligence begin to enter "co-intelligence"
Artificial intelligence is an important part of the new productivity. It is a comprehensive technology spanning thought, science and technology, the economy, and society, and it differs from the agricultural, industrial, and information technologies of earlier human history. It originated in a belief, a concept, and a spirit carried by the intellectual elite from ancient Greece to modern times: intelligence is not owned by humans alone; machines made by humans may also produce intelligence, because intelligence can ultimately be computed. The birth of the Turing machine in 1936 was undoubtedly a milestone in the history of artificial intelligence. For more than 80 years, artificial intelligence has meant for human society not only a particular science and technology but also disruptive change in thought, the economy, and society. Through continuous iteration and evolution, it has demonstrated, and will continue to demonstrate, the historical significance of the interaction between long-termism and accelerationism. This article discusses the frontiers, trends, and challenges of global artificial intelligence development since 2022.
1. Large Language Models (Large Models)
A major breakthrough in June: errors in reinforcement learning can now be discovered and corrected
The development of artificial intelligence can be divided into stages. In November 2022, OpenAI released ChatGPT, and generative artificial intelligence (GenAI) began to flourish. Generative AI is a machine learning technology, based on neural networks that imitate the human brain, that creates new content in the form of text, images, music, video, and more.
The representative form of GenAI is the large language model (LLM): a deep learning model trained on massive amounts of text data that can generate natural language text and understand its meaning. In other words, a large language model is built on deep learning, using multi-layer neural networks to identify complex patterns in data by simulating the way the human brain processes information.
At this stage, the core of artificial intelligence is the large language model. Leading countries and companies have driven the development of large language models, which have shown explosive growth and formed an ever-expanding cluster. The main variables affecting their performance are training data, model size (i.e., the number of parameters), generation algorithms, and optimization techniques. Their characteristics include: (1) Large parameter counts, typically reaching billions or even hundreds of billions. (2) Image recognition and predictive analysis capabilities. (3) The ability to understand and generalize from data, learning to perform a variety of complex tasks; in natural language processing (NLP), this enables accurate and efficient machine translation, sentiment analysis, and intelligent question answering.
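The scale of these parameter counts can be made concrete with a common back-of-the-envelope estimate (an illustrative approximation of my own, not a figure from the preface): a decoder-only transformer has roughly 12 × n_layers × d_model² parameters.

```python
# Rough parameter-count estimate for a decoder-only transformer.
# The 12 * n_layers * d_model^2 rule of thumb is an illustrative
# approximation (it ignores embeddings and biases), not a figure
# taken from the text.
def approx_params(n_layers: int, d_model: int) -> int:
    return 12 * n_layers * d_model * d_model

# A GPT-3-scale configuration (96 layers, d_model = 12288) lands
# near the often-quoted figure of 175 billion parameters.
print(approx_params(96, 12288))  # ~174 billion
```

The estimate shows why "hundreds of billions" follows naturally from stacking dozens of wide layers.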
ChatGPT, Google's Gopher and LaMDA, and Meta's Llama are representative large language models worldwide. Among them, GPT-4, released by OpenAI in 2023, is the general name for a series of models rather than a single model. In May 2024, OpenAI's GPT-4o model demonstrated outstanding ability across hundreds of languages in understanding text, speech, and images; it can hold real-time voice conversations and accurately capture and express human emotions. In June of the same year, Anthropic officially launched the Claude 3.5 Sonnet model, which surpassed Claude 3 Opus and GPT-4o in coding ability, visual ability, and new modes of interaction. More striking still, Claude 3.5 Sonnet introduced the innovative "Artifacts" feature, which lets users edit and build on AI-generated content in real time in a dynamic workspace, turning conversational AI into a collaborative partner seamlessly integrated into users' projects and workflows. Claude 3.5 Sonnet also redefined the cost-effectiveness of capable models, running at twice the speed of its predecessor at one fifth of the cost.
Also in June, a breakthrough occurred in the field of large language models: OpenAI released CriticGPT, based on the GPT-4 model, to catch errors in ChatGPT's code output. In other words, CriticGPT uses GPT-4 to find GPT-4's mistakes. It can write critiques of ChatGPT's responses and help human trainers better understand and satisfy human intent, discovering and correcting errors in reinforcement learning from human feedback (RLHF). This indicates that artificial intelligence has taken a key step toward the goal of evaluating the output of advanced AI systems.
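The idea behind CriticGPT, one model reviewing another's output, can be sketched as a generate-then-critique loop. Everything below (the `generate` and `critique` functions and the toy bug) is hypothetical scaffolding for illustration, not OpenAI's actual API or method.

```python
# Minimal generate-then-critique loop, in the spirit of CriticGPT:
# one model produces an answer, a second model flags problems, and
# flagged answers are revised. All functions here are hypothetical
# stand-ins, not OpenAI's actual API.
def generate(prompt: str) -> str:
    # Stand-in for a code-writing model; contains a deliberate bug.
    return "def add(a, b): return a - b"

def critique(prompt: str, answer: str) -> list[str]:
    # Stand-in for a critic model that inspects the answer.
    return ["uses '-' where '+' was requested"] if "-" in answer else []

def generate_with_critic(prompt: str, max_rounds: int = 2) -> str:
    answer = generate(prompt)
    for _ in range(max_rounds):
        issues = critique(prompt, answer)
        if not issues:
            break
        # In a real system the critique would be fed back to the
        # generator; here we just apply the obvious fix.
        answer = answer.replace("-", "+")
    return answer

print(generate_with_critic("write an add function"))
```

The design point is that the critic need only recognize errors, which is typically easier than producing a correct answer from scratch.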
2. AI Platform
There are nine major platforms in the world, and the trend is verticalization and specialization
As AI reaches into every aspect of human production and life, building AI platforms has become a general trend. AI platforms provide the world's leading multimodal technologies, such as voice, image, and NLP, as well as open conversational AI systems and ecosystems. Currently there are nine major AI platforms in the world, including Google, TensorFlow, Microsoft Azure, OpenAI, NVIDIA, H2O.ai, Amazon Web Services (AWS), DataRobot, and Fotor. Among them, NVIDIA Omniverse is an open platform designed for virtual collaboration and real-time realistic simulation; backed by a powerful ecosystem of GPUs and CUDA-X AI software, it provides industry-leading solutions spanning machine learning, deep learning, and data analytics.
The development trend of AI platforms is mainly toward verticalization and specialization. AI art platforms, for example, use artificial intelligence to process and create images, helping artists and non-professionals alike quickly generate engaging, aesthetically valuable paintings, from which they can draw creative inspiration and artistic experience, bringing more innovation and possibility to the art world. Midjourney and Stable Diffusion are AI art platforms of growing influence. Likewise, Suno v3.5, an AI music generation tool, has extended the length of generated music from 2 minutes to 4 minutes and significantly improved musical structure. In auditory art, which is difficult to capture precisely in words, AI music platforms have shown creative potential beyond humans. Suno has announced a forthcoming feature that lets users create songs from any sound, transforming the sounds of daily life into music and opening new possibilities for musical creation.
3. AI stack
The foundational pillars include: data, computing, and models
From a hardware perspective, the foundation of the AI stack is the GPU, CPU, and TPU, with the GPU as the most important component of the generative AI stack. But the AI stack also includes AI software, and in its final form it is a system and an ecosystem.
A closer analysis shows that the AI stack is a structured framework comprising the layers and components required to develop and deploy AI systems. Its key components include data management, computing resources, machine learning frameworks, and machine learning operations (MLOps) platforms. The generative AI stack consists of three layers: the top layer involves knowledge and expertise in specific domains, the middle layer provides the data and infrastructure used to build AI models, and the bottom layer is cloud computing resources and services. Progress in each layer is critical to advancing AI. The basic pillars of the AI stack are data, computing, and models. Generative AI in particular requires vast computing resources and large datasets, processed and stored in high-performance data centers, and it has driven a reshaping of the full stack.
Generally speaking, based on the AI stack, it is possible to build artificial intelligence applications with features such as quick search, quick translation, intelligent recognition, and intelligent manipulation.
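The three-layer picture described above can be written down as a simple structure (the layer labels paraphrase this text; they are not a standard industry taxonomy):

```python
# The three-layer generative-AI stack as described in the text,
# expressed as a simple structure (labels paraphrase the text,
# not an official taxonomy).
GENAI_STACK = {
    "top": "domain knowledge and expertise",
    "middle": "data and infrastructure for building AI models",
    "bottom": "cloud computing resources and services",
}

for layer in ("top", "middle", "bottom"):
    print(f"{layer}: {GENAI_STACK[layer]}")
```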
4. Physical World Simulator
A third world, the AI physical world: beyond human perception of time and space
For contemporary humans there are three worlds: the experienced real world, the virtual world, and the physical world beyond human perception of time and space. Artificial intelligence directly affects humanity's relationship with all three. In the experienced real world, the parallel operation and interaction of artificial and natural intelligence has changed how the real world exists. In the virtual world, artificial intelligence together with virtual reality technology can guide humans into immersive, non-real states of experience, the metaverse being one path. In the physical world beyond human perception of time and space, artificial intelligence helps humans break through the limits of their senses to comprehend a universe tens of billions of light-years across and microscopic scenes measured in nanometers. In scientific experimentation, AI technology is no longer merely a tool but a precondition.
At the beginning of 2024, the fundamental significance of Sora's appearance was that, functioning as a physical world simulator, it showed a physical world humans may not otherwise perceive, one likely more real than what human eyes see. Once humans perceive and integrate into worlds created by AI physics engines, they will experience more diverse physical rules.
When Sora performs video generation, the resulting videos can, supported by perception, memory, and control modules, follow the physical laws of the real world to a certain extent. This allows it to simulate people, animals, and environments in the real world, gives it broader imaginative space, and largely achieves spatial, temporal, and causal consistency. Sora is a readable world model, and how well it performs at this stage is not the essence of the matter. After the release of Open-Sora 1.1, the quality and duration of video generation improved greatly, and the optimized Causal Video VAE architecture substantially improved performance and inference efficiency.
One of NVIDIA's important contributions is the Earth-2 digital twin of the Earth. Earth-2 incorporates CorrDiff, a generative AI model trained on WRF numerical simulations, and can accurately predict weather at 12.5 times the resolution (from 25 km down to 2 km). The next step for Earth-2 is to push prediction accuracy from 2 km down to tens of meters. At the higher resolution it runs 1,000 times faster than physical simulation with 3,000 times better energy efficiency, which makes real-time prediction possible.
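The resolution gain here is worth a line of arithmetic (assuming a uniform horizontal grid, which is a simplification): going from 25 km to 2 km is a 12.5× linear refinement, and the number of grid cells per horizontal layer grows with the square of that factor.

```python
# Going from a 25 km grid to a 2 km grid: the linear resolution
# factor is 12.5x, and the number of cells per horizontal layer
# grows with the square of that factor (assuming a uniform grid).
coarse_km, fine_km = 25, 2
linear_factor = coarse_km / fine_km
cell_factor = linear_factor ** 2
print(linear_factor, cell_factor)  # 12.5 156.25
```

That quadratic blow-up in cell count is why a 1,000× speedup over physical simulation matters for running such grids in real time.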
The prospect is clear: humans will construct perception-memory-control complexes capable of building realistic, physically correct "world models". It is in this sense that Microsoft scientist Sébastien Bubeck proposed "AI physics" as a concept and research direction, and NVIDIA CEO Huang Renxun proposed that the next wave of AI will be physical AI. NVIDIA's digital twin target is therefore not only the Earth but the entire physical world.
5. Embodied intelligence and intelligent robots
Ultimate application: making artificial intelligence concrete and human
The development of artificial intelligence will inevitably give rise to an artificial intelligence ecosystem, with embodied AI (EAI), or intelligent robots, as its main body.
Embodied intelligence extends artificial intelligence further into the physical world. It is an intelligent system that can understand, reason about, and interact with the physical world; it has human-computer interaction and natural language understanding capabilities and can think, perceive, and act. Intelligent robots, furthermore, will simulate human thought processes to learn and to produce the behavioral feedback humans expect. Driven by multimodal AI, they will self-learn, perceive the world, understand and execute human instructions, complete personalized tasks and collaboration requirements, and continue to evolve, performing varied, testable, measurable tasks in real physical environments. In short, the defining characteristic of embodied intelligence is the ability to autonomously perceive the physical world from a first-person perspective.
Intelligent robots in their various forms are the physical embodiment of embodied intelligence, with an overall architecture of a perception layer, an interaction layer, and a motion layer. The evolution of Tesla's humanoid robot Optimus from its first to its second generation, the massive investment raised by the American humanoid robot startup Figure AI in February this year, and the 25 humanoid robots exhibited at NVIDIA's 2024 GPU Technology Conference (GTC) all show the rapid development of the humanoid robot field.
In March 2024, NVIDIA launched Project GR00T, the world's first general-purpose foundation model for humanoid robots. Robots driven by this model can understand natural language and imitate actions by observing human behavior; users can teach them to quickly learn and coordinate skills to adapt to and interact with the real world. The emergence of Project GR00T suggests that the era of real robots may be approaching. This is also the ultimate application of AI: making artificial intelligence concrete and "human".
The rise of embodied intelligence marks robotics' shift from a traditional control-based paradigm to a new paradigm of learning and manipulation. The explosion of large-model technology and falling hardware costs have spawned embodied intelligence companies aiming to develop intelligent robots that can interact with the physical world.
In May 2024, the IEEE International Conference on Robotics and Automation (ICRA), one of the most influential academic conferences in robotics, was held in Yokohama, Japan. This year's theme, "CONNECT+", showcased not only the latest progress in robotics but also a revolution in "embodied intelligence" and "learning". In the long run, embodied intelligence matters greatly to the development of the AI industry and holds undeniable value for artificial general intelligence (AGI).
6. Spatial Intelligence
More than just a machine version of the human eye, it reveals the world from never-before-imagined perspectives
There are two types of spatial intelligence. One is the spatial intelligence formed by natural evolution: over millions of years, nature gave humans eyes that capture light and project 2D images onto the retina, and brains that convert this data into 3D information. The other is spatial intelligence based on artificial intelligence technology: machines simulate humanity's complex visual reasoning and action, directly understanding and operating in the 3D world through visual information with the assistance of multiple sensors.
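The 3D-to-2D projection described above can be illustrated with the standard pinhole-camera model (a textbook simplification introduced here for illustration, not something from the preface): a point (X, Y, Z) projects to (f·X/Z, f·Y/Z), discarding depth.

```python
# Pinhole-camera projection: the textbook simplification of how a
# 3D point maps onto a 2D image plane, losing depth in the process.
def project(point3d: tuple[float, float, float],
            f: float = 1.0) -> tuple[float, float]:
    x, y, z = point3d
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (f * x / z, f * y / z)

# Two different 3D points can land on the same 2D pixel -- the depth
# ambiguity that a brain or a spatial-intelligence model must resolve.
print(project((1.0, 2.0, 2.0)))  # (0.5, 1.0)
print(project((2.0, 4.0, 4.0)))  # (0.5, 1.0)
```

The two identical outputs show exactly what is lost in the retina's 2D image and must be reconstructed, by brains or by machines.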
Comparing the two, the difference is significant. Spatial intelligence formed by natural evolution is limited in its spatial dimensions; it is difficult, even impossible, for it to break beyond 3D space. Spatial intelligence based on AI technology, however, can break through the spatial dimension: such a space dissolves geographical boundaries and is fluid, boundless, free, and open. Moreover, such a space is no longer bound by Newtonian time, achieving real-time responsiveness and the optimization of time. For example, Google researchers have developed an algorithm that can convert a mere set of photos into 3D shapes or scenes.
In this regard, Fei-Fei Li, director of the Stanford Institute for Human-Centered AI and a member of the U.S. National Academy of Engineering, has offered this profound reflection: "Combining visual acuity with encyclopedic depth of knowledge can bring forth a new capability. What this new capability is remains unknown, but I believe it is by no means merely a machine version of the human eye. It is a brand-new existence, a deeper and more detailed perspective that can reveal the world from angles we have never imagined." In other words, spatial intelligence based on AI technology will surpass the spatial intelligence formed by natural evolution and present spatial states that the human brain cannot reach by imagination alone. Quantum space as described by quantum mechanics, for instance, is a space with topological characteristics composed of discrete or continuous states. Naturally evolved spatial intelligence cannot sense or recognize quantum space, but spatial intelligence supported by AI technology may be able to.
In short, spatial intelligence based on large AI models guides humans into "a completely new existence", and embodied intelligence is likely to be its "native inhabitant".
7. Deep evolution of artificial intelligence
Moore's Law may be breaking down, and metacognition may cause scaling laws to fail
Artificial intelligence is at a historic moment of deep evolution, in which Moore's Law and scaling laws play an increasingly prominent role.
Moore's Law is an empirical law summarized by Intel co-founder Gordon Moore: the number of transistors that fit on an integrated circuit doubles approximately every 18 to 24 months; in other words, processor performance doubles roughly every two years. The problem is that a Moore's Law crisis set in once chips reached the 28-nanometer (nm) node, and reaching the 1 nm process will mean hitting the limit of Moore's Law. The entire hardware foundation of artificial intelligence, with chips at its core, now faces this crisis and limit. Yet computing power growth has not stalled. In June 2024, at Computex 2024 in Taipei, NVIDIA CEO Huang Renxun announced that the company's GPU architecture updates would accelerate from once every two years to once a year, and noted that the computing power of its AI chips has grown an astonishing 1,000-fold in the past 8 years, showing that it is technically possible to break through the crisis and limit of Moore's Law.
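The figures in this paragraph can be checked with a line of arithmetic: classic Moore's Law (doubling every ~2 years) yields only 16× over 8 years, while a 1,000-fold gain in 8 years implies doubling roughly every 0.8 years.

```python
import math

# Classic Moore's Law: doubling every ~2 years over 8 years gives 16x.
moore_factor = 2 ** (8 // 2)

# NVIDIA's claimed 1,000x in 8 years implies a far shorter doubling
# time: 8 / log2(1000) years, i.e. roughly every 0.8 years.
implied_doubling_years = 8 / math.log2(1000)
print(moore_factor, round(implied_doubling_years, 2))  # 16 0.8
```

The gap between 16× and 1,000× is the quantitative sense in which AI chip progress has outrun the classic Moore's Law cadence.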
Scaling laws mainly arise from the study of critical phenomena. The core idea is that as the number of model parameters, the size of the dataset, and the amount of floating-point computation used for training increase, model performance improves. To obtain the best performance, all three factors must be scaled up together; when not constrained by the other two, model performance has a power-law relationship with each individual factor.
In the field of artificial intelligence specifically, GPT-4's performance on a given problem can be predicted from a model 1,000 times smaller; in other words, before GPT-4's training even begins, its performance on that problem is already known. Scaling laws are therefore very important for training large models, and can be regarded as another underlying law of AI's deep evolution.
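The power-law relationship can be sketched numerically. The sketch below uses an illustrative loss curve L(N) = (Nc/N)^α with made-up constants (α and Nc here are not real GPT figures); the point is that once the constants are fit on small models, loss at 1,000 times the parameter count is predictable.

```python
# Illustrative scaling law L(N) = (Nc / N) ** alpha with made-up
# constants (alpha and Nc are NOT real GPT figures). The point:
# once alpha and Nc are fit on small models, loss at 1,000x the
# parameter count follows from the same power law.
alpha, Nc = 0.076, 8.8e13

def loss(n_params: float) -> float:
    return (Nc / n_params) ** alpha

small, large = 1.75e8, 1.75e11  # a 1,000x jump in parameter count
ratio = loss(large) / loss(small)
print(round(ratio, 3))  # 0.592
```

Because the ratio depends only on the 1,000× factor and α, every 1,000× increase in parameters shrinks the loss by the same multiplicative factor, which is exactly what makes extrapolation from small models possible.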
Not long ago, on an episode of The Next Big Idea podcast, Bill Gates commented incisively on scaling: "Scaling will definitely continue to work. But at the same time, the move from the simple algorithms we have today to more human-like metacognition will be a change, and that's the bigger frontier." Because consciousness may be related to metacognition, and metacognition is not a measurable phenomenon, metacognition may cause scaling to fail.
8. Near- and medium-term trends
The general AI stage is accelerating, and the dawn of super AI has appeared
From the perspective of 2024, we can roughly see the near- and medium-term trends of artificial intelligence:
(1) The artificial narrow intelligence (ANI) stage is coming to an end. ANI systems perform specific tasks, such as image recognition or speech recognition. The peak of this stage is the emergence of large models supporting generative AI and the popularization of intelligent machines.
(2) The artificial general intelligence (AGI) stage is accelerating.
(3) The dawn of artificial superintelligence (ASI) is on the horizon. ASI is an artificial intelligence system that surpasses the "human mind", catching up with and then rapidly exceeding the collective wisdom of all humankind, more powerful than human intelligence itself.
Artificial intelligence has entered a historical period in which it refreshes the human imagination daily. In this period, the world's subjects and frames of reference will change, knowledge systems will be reconstructed, human intelligence and artificial intelligence will begin to enter a state of "co-intelligence", traditional economic organizations, national systems, and legal systems will change, and human civilization may even be reorganized.
Zhu Jiaming
July 6, 2024
(The titles and subtitles in the original preface were supplemented and slightly adjusted by the editor)
Text: Zhu Jiaming Photo: Zhu Meiquan Editor: Li Nian Responsible Editor: Li Nian
Please indicate the source when reprinting this article.