
LeCun is not optimistic about reinforcement learning: "I do prefer MPC"

2024-08-26


Original title: Yann LeCun is not optimistic about reinforcement learning: "I do prefer MPC"

Editor: Zhang Qian, Xiao Zhou

Is it still worth studying a theory from more than 50 years ago?

“I really prefer model predictive control (MPC) to reinforcement learning (RL). I’ve been saying this since at least 2016. RL requires an extremely large number of attempts to learn any new task. In contrast, MPC is zero-shot: if you have a good world model and a good task objective, MPC can solve new tasks without any specific task learning. That’s the magic of planning. This doesn’t mean RL is useless, but its use should be a last resort.”

In a recent post, Yann LeCun, chief AI scientist at Meta, expressed this view.

Yann LeCun has long been a critic of reinforcement learning, arguing that it requires far too many trials and is therefore very inefficient. That is very different from how humans learn: babies do not recognize objects by observing a million samples of the same object, nor by trying dangerous things and learning from the consequences. They learn by observing, predicting, and interacting with the world, even without supervision.

In a talk half a year ago, he even advocated "abandoning reinforcement learning" (see "GPT-4's research path has no future? Yann LeCun sentenced autoregression to death"). In a subsequent interview, however, he explained that he did not mean giving it up entirely, but minimizing its use: the right way to train a system is to first let it learn good representations of the world, and world models, mainly from observation (and perhaps a little interaction).

At the same time, LeCun also pointed out that he prefers MPC (model predictive control) to reinforcement learning.

MPC is a control technique that uses a mathematical model of a system to optimize its behavior in real time over a finite horizon. Since its introduction in the 1960s and 1970s, it has been widely used in fields such as chemical engineering, oil refining, advanced manufacturing, robotics, and aerospace. For example, some time ago Boston Dynamics shared their many years of experience using MPC for robot control (see "Boston Dynamics Technology Revealed: Backflips, Push-ups and Car Rollovers, 6 Years of Experience and Lessons").

One of the latest developments in MPC is its integration with machine learning techniques, known as ML-MPC. In this approach, machine learning algorithms are used to estimate system models, make predictions, and optimize control actions. This combination of machine learning and MPC has the potential to provide significant improvements in control performance and efficiency.

LeCun's own research on world models also draws on MPC theory.

Recently, LeCun's preference for MPC has attracted some attention in the AI community.

Some say that MPC works well if our problem is well modeled and has predictable dynamics.

Perhaps, for computer scientists, there is still much to explore in the fields of signal processing and control.

However, some have pointed out that solving an exact MPC problem is itself hard, and that the premise of LeCun's argument, "if you have a good world model", is difficult to satisfy in practice.

Others note that reinforcement learning and MPC are not necessarily an either/or choice; each may have scenarios where it fits best.

Indeed, some studies have combined the two, with good results.

Reinforcement Learning vs MPC

In the discussion above, one commenter recommended a Medium article that analyzes and compares reinforcement learning and MPC.

Drawing on that post, let us examine the advantages and disadvantages of each in detail.

Reinforcement learning (RL) and model predictive control (MPC) are two powerful techniques for optimizing control systems. Each has its advantages and disadvantages, and the better choice depends on the requirements of the specific problem.

So, what are the advantages and disadvantages of the two methods, and what problems are they suitable for solving?

Reinforcement Learning

Reinforcement learning is a machine learning method that learns through trial and error. It is particularly well suited to problems with complex dynamics or unknown system models. In reinforcement learning, an agent learns to take actions in an environment so as to maximize a reward signal. The agent acts, observes the resulting state, and is rewarded or penalized based on the outcome; over time, it learns to take the actions that yield higher rewards. In control systems, reinforcement learning provides a dynamic, adaptive way to optimize system behavior. Common applications include:

Autonomous Systems: Reinforcement learning is used in autonomous control systems such as autonomous vehicles, drones, and robots to learn optimal control policies for navigation and decision making.

Robotics: Reinforcement learning enables robots to learn and adapt their control policies to accomplish tasks such as grasping, manipulation, and locomotion in complex dynamic environments.

......

Reinforcement Learning (RL) workflow.

Agent: the learner and decision maker.

Environment: The surroundings or entities the agent interacts with; the agent observes it and takes actions that affect it.

State: A complete description of the state of the world. The agent may observe a state fully or only partially.

Reward: A scalar feedback signal indicating how well the agent is performing. The agent's goal is to maximize the total long-term reward, and it adjusts its policy in response to the rewards it receives.

Action space: The set of valid actions that an agent can perform in a given environment. A finite number of actions constitutes a discrete action space; an infinite number of actions constitutes a continuous action space.
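The components above can be made concrete with a minimal tabular Q-learning sketch on a hypothetical 5-state corridor environment (the environment, hyperparameters, and names are illustrative assumptions, not from the article or any particular library):

```python
import random

# Toy environment: a 5-state corridor. State 4 is the goal and yields
# the only reward; every other transition returns 0.
N_STATES = 5
ACTIONS = [0, 1]                     # discrete action space: 0 = left, 1 = right

def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = (nxt == N_STATES - 1)
    return nxt, (1.0 if done else 0.0), done

# Agent: a tabular Q-function improved by trial and error.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1
random.seed(0)

def greedy(state):
    best = max(Q[state])
    return random.choice([a for a in ACTIONS if Q[state][a] == best])

for episode in range(200):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the current policy, sometimes explore
        action = random.choice(ACTIONS) if random.random() < epsilon else greedy(state)
        nxt, reward, done = step(state, action)
        # Q-learning update: move Q toward reward + discounted future value
        Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt

policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy[:4])   # the learned policy moves right, toward the goal
```

Note how many environment interactions even this trivial task consumes; that sample inefficiency is exactly what LeCun's criticism targets.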

Model Predictive Control

Model Predictive Control (MPC) is a widely used control strategy that has been applied in many fields, including process control, robotics, autonomous systems, and so on.

The core idea of MPC is to use a mathematical model of the system to predict its future behavior, then use those predictions to compute control actions that optimize a performance objective.

After years of refinement, MPC can now handle increasingly complex systems and harder control problems. As shown in the figure below, at each control interval the MPC algorithm computes an open-loop sequence of control moves over the control horizon, chosen to optimize the behavior of the plant (the controlled system) over the prediction horizon.

Discrete MPC scheme.
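The receding-horizon scheme can be sketched for a toy scalar linear plant (the dynamics, horizon, cost weights, and discretized control set below are illustrative assumptions, not from the article):

```python
import itertools

# Assumed world model: unstable scalar linear dynamics x' = a*x + b*u.
a, b = 1.2, 0.5
HORIZON = 5                          # prediction horizon
U_SET = [-2.0, -1.0, 0.0, 1.0, 2.0] # discretized admissible controls

def predicted_cost(x, controls):
    """Roll the model forward and accumulate a quadratic state/control cost."""
    cost = 0.0
    for u in controls:
        x = a * x + b * u
        cost += x * x + 0.1 * u * u
    return cost

def mpc_step(x):
    # Open-loop optimization: exhaustively search all 5^5 control sequences.
    best = min(itertools.product(U_SET, repeat=HORIZON),
               key=lambda seq: predicted_cost(x, seq))
    return best[0]                   # receding horizon: apply only the first move

x = 2.0                              # initial state; the target is x = 0
for _ in range(15):
    x = a * x + b * mpc_step(x)      # re-plan at every control interval
```

Given the (assumed perfect) model, the controller regulates the unstable plant toward zero with no task-specific learning at all, which is the zero-shot behavior LeCun highlights. Real MPC solvers replace this brute-force search with structured optimization such as quadratic programming.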

Applications of MPC in control systems include:

Process Industry

Power System

Car Control

Robotics

In robotics in particular, MPC is used to plan and optimize motion trajectories, ensuring smooth, efficient movement of robot arms and mobile platforms in applications such as manufacturing and logistics.

The following table compares reinforcement learning and MPC in terms of model requirements, learning method, speed, robustness, sample efficiency, and applicable scenarios. In general, reinforcement learning is the right choice for problems that are difficult to model or have complex dynamics, while MPC is a good choice for problems that are well modeled and have predictable dynamics.

One of the latest advances in MPC is its integration with machine learning techniques, known as ML-MPC. ML-MPC takes a different approach to control than traditional MPC, using machine learning algorithms to estimate system models, make predictions, and generate control actions. The main idea behind it is to use data-driven models to overcome the limitations of traditional MPC.

Machine-learning-based MPC can adapt to changing conditions in real time, making it suitable for dynamic and unpredictable systems. Compared with traditional model-based MPC, it can deliver higher accuracy, especially for complex systems that are difficult to model.

In addition, machine learning-based MPC can reduce the complexity of the model, making it easier to deploy and maintain. However, compared with traditional MPC, ML-MPC also has some limitations, such as requiring a large amount of data to train the model and poor interpretability.
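A minimal ML-MPC sketch: fit a linear model of an unknown plant from sampled transitions by least squares, then plan with the learned model (the plant, noise level, cost terms, and all names here are illustrative assumptions, not from the article):

```python
import itertools
import random

# "True" plant, treated as unknown: the controller may only sample it.
def plant(x, u):
    return 1.2 * x + 0.5 * u

# Step 1 (learning): collect noisy transitions and fit x' ~ a*x + b*u
# by ordinary least squares, solving the 2x2 normal equations directly.
rng = random.Random(0)
data = []
for _ in range(200):
    x, u = rng.uniform(-3, 3), rng.uniform(-2, 2)
    data.append((x, u, plant(x, u) + rng.gauss(0, 0.01)))

Sxx = sum(x * x for x, _, _ in data)
Sxu = sum(x * u for x, u, _ in data)
Suu = sum(u * u for _, u, _ in data)
Sxy = sum(x * y for x, _, y in data)
Suy = sum(u * y for _, u, y in data)
det = Sxx * Suu - Sxu * Sxu
a_hat = (Sxy * Suu - Suy * Sxu) / det
b_hat = (Sxx * Suy - Sxu * Sxy) / det   # recovers roughly a=1.2, b=0.5

# Step 2 (control): receding-horizon MPC planning with the learned model.
U_SET = [-2.0, -1.0, 0.0, 1.0, 2.0]
HORIZON = 5

def predicted_cost(x, controls):
    cost = 0.0
    for u in controls:
        x = a_hat * x + b_hat * u       # predictions use the LEARNED model
        cost += x * x + 0.1 * u * u
    return cost

x = 2.0
for _ in range(15):
    best = min(itertools.product(U_SET, repeat=HORIZON),
               key=lambda seq: predicted_cost(x, seq))
    x = plant(x, best[0])               # the real plant executes the first move
```

Swapping the least-squares fit for a neural network gives the data-driven flavor of ML-MPC described above; the planning loop itself is unchanged.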

It seems computer scientists still have some way to go before MPC is truly brought into the field of AI.

Reference link: https://medium.com/@airob/reinforcement-learning-vs-model-predictive-control-f43f97a0be27