
Google Robotics Expert: AI will also encounter the same obstacles that robots encounter in reality

2024-07-16



Machine Heart Report

Editor: Zhang Qian

“Machine learning has been living in a bubble that’s the envy of roboticists, chemists, biologists, and neuroscientists, and as it really starts to work, we’re all going to run into the same reality walls that everyone else has been dealing with for years.”

Some say that progress in robotics has been slow, or even nonexistent, compared to other subfields of machine learning.

Alex Irpan, a robotics researcher at Google DeepMind who has worked on embodied-intelligence projects such as SayCan, RT-1, and RT-2, agrees with this assessment. But he believes it is because robotics is a field tightly coupled to reality, and the complexity of reality guarantees that roboticists will run into walls. He also points out that these problems are not unique to robotics: technologies such as large language models (LLMs) will hit similar complexity as soon as they face the real world.

He recently wrote a blog post titled “The Tragedies of Reality Are Coming for You” to make this point.



The tragedies of reality are coming for you

In 2023, I attended an ML conference. As the night went on, the conversation turned to a question: "If you could move resources from one subfield of machine learning to another, which subfield would you take them from, and which would you give them to?"

I can't remember what the others said, but one person said they would cut robotics. When I pressed them further, they said robotics was progressing too slowly and nothing was happening relative to other areas.

They said that robotics is progressing more slowly than the software-only subfields of machine learning. I think they are right, but I would add two points:

  • Robot learning progresses slowly because it is hard to achieve anything without solving hard problems.
  • The challenges of robotics are not unique to robots.

In robotics, a common saying is "reality is messy." With respect to code, I would extend that to "reality is complicated." In robotics, you are pushing messy reality into an abstraction layer good enough for code to act on top of it. As a field, computer science has spent decades building good abstraction layers between hardware and software. Code describes how to get power to the hard drive, processor, and display, and it is reliable enough that I never have to think about it.



There are many benefits to doing this. Once you have done the hard work and moved into an abstract logical space, everything becomes easier. Code and data are incredibly reproducible. I have the draft of this blog post synced across three devices without even having to think about it.

However, as Joel Spolsky says, all abstractions are leaky to some extent, and I find that the leaks in robotics tend to be larger. There are many ways for things to go wrong that have nothing to do with whether the code is correct.

Is some of this down to fundamentals of the discipline? A little. A lot of robotics hardware is more experimental than a laptop or a Linux server. Consumer robotics is not a big industry yet, and "experimental" often means "weird, and more prone to failure."

However, I don't think hardware is the main cause of the problem. Reality itself is the root of the problem. Benjamin Holson put it very well in his article "Mythical Non-Roboticist":

The first difficulty is that robots have to deal with imperfect perception and imperfect execution in the real world. Global mutable state is a bad programming style because it's really hard to deal with, but for robotic software, the entire physical world is global mutable state, and you can only observe it unreliably and hope that your actions are close to what you want to achieve.

Robotics research depends on building new bridges between reality and software, but the same thing happens outside robotics research as well. Any software that interfaces with reality has an imperfect understanding of that reality. Any software that tries to effect change in the real world must deal with reality's global mutable state. And any software whose behavior depends on what happens in reality will face adversarial noise and complexity.
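To make "global mutable state" concrete, here is a minimal sketch (my own illustration, not code from the post; all names are made up) of the situation Holson describes: the true state of the world lives outside the program, the sensor reads it with noise, and the actuator changes it only approximately.

```python
import random

# Toy illustration (hypothetical, not from the original post): a 1-D robot
# moving toward a target. The "world" is global mutable state that the
# controller can only observe through a noisy sensor and only change
# through an imperfect actuator.

world_position = 0.0   # global mutable state: the robot's true position
TARGET = 10.0

def observe() -> float:
    """Noisy sensor: we never see the true state, only an estimate of it."""
    return world_position + random.gauss(0.0, 0.5)

def act(commanded_step: float) -> None:
    """Imperfect actuator: the world changes, but not exactly as commanded."""
    global world_position
    slip = random.gauss(0.0, 0.2)
    world_position += commanded_step * (1.0 + slip)

def control_loop(max_steps: int = 200, tolerance: float = 0.3) -> int:
    """Observe, decide, act -- and hope the world ends up where we intended."""
    for step in range(max_steps):
        estimate = observe()
        if abs(TARGET - estimate) < tolerance:
            return step                     # we *think* we're done; reality may disagree
        act(0.5 * (TARGET - estimate))      # proportional step toward the target
    return max_steps

steps = control_loop()
print(f"stopped after {steps} steps, true position = {world_position:.2f}")
```

Nothing here is wrong as code, yet two runs of the same program stop at different times and in slightly different places; that gap between what the code believes and what the world actually did is the kind of leak a purely logical abstraction never has.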

Game AI is a good example. Chess AIs are reliably superhuman. Yet some superhuman Go AIs can be beaten if you play them in a specific way, as Tony T. Wang et al. found in the ICML 2023 paper "Adversarial Policies Beat Superhuman Go AIs." The adversarial technique discovers strategies simple enough for humans to replicate.

In Appendix G.2, one of our authors, a Go expert, was able to implement this [cyclic] attack without any algorithmic assistance by studying the adversary's game records. Playing under standard human conditions on the KGS online Go server, they achieved a win rate of over 90% against a top-ranked KataGo bot unaffiliated with the authors.
The author even won while giving the bot a nine-stone handicap, an enormous advantage: a human professional with that handicap would have a near-100% win rate against any opponent, human or AI. They also beat versions of KataGo and Leela Zero searching 100,000 times per move, a level generally far beyond human capability. Other humans have since used the cyclic attack to beat various other top Go AIs.

Meanwhile, a few years ago, OpenAI created a system that defeated the reigning world champions at Dota 2. After the system was opened to the public to test its robustness, one team devised a strategy that won ten games in a row against it.



From this you could take the pessimistic view that even a "reality" as simple as a 19 x 19 Go board or Dota 2 carries enough extra complexity to make robust behavior challenging. I think that view is unfair, since neither system had robustness as a top priority, but they are an interesting case study nonetheless.

There has been a lot of hype lately around LLMs: what they can do, where they can be applied. Implicit in this is the belief that LLMs can dramatically change how people interact with technology at work and at leisure. In other words, LLMs will change how we interact with reality. I have joined the hype myself, precisely because I suspect foundation models are overhyped in the short term and underhyped in the long term. But this also means that, for a field that historically has not had to think much about reality, all the messiness of reality is on its way.

At the same ML conference where that person called robotics a waste of resources, I mentioned that we were experimenting with foundation models on real robots. Someone said that seemed a little scary, and I assured them it was just a research prototype. But I find LLMs that generate and execute software a little scary too, and it is interesting that they were vaguely worried about one and not the other. Silicon Valley is a bit contradictory here: people believe both that software can drive amazing change through startups and that their own software is not worth deep thought or reflection. I think the world of bits is as much a part of reality as the world of atoms. They operate at different levels, but both are part of reality.

I have noticed (with some schadenfreude) that LLM practitioners are starting to hit the same pain points robotics hit before them. Things like "we can't replicate these training runs because they're too expensive." Yes, that has been a problem in robotics for at least a decade. Or "I can't get Bing to tell me the release date of Avatar 2, because it keeps retrieving news stories about itself and getting derailed by them before it answers."

We now live in a world where any public text on the internet can irreversibly affect retrieval-augmented generation. Welcome to global mutable state. Whenever I see claims that ChatGPT's behavior has regressed, I think of all the conspiracy theories I and others have come up with to explain sudden, inexplicable drops in robot performance, and of whether the problem lay with the model, the environment, or our own over-extrapolation.
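Here is a hedged sketch of the same point for LLMs (the names search_web and llm_answer below are stand-ins I made up, not a real API): with retrieval-augmented generation, the answer depends on whatever the shared index contains at the moment of the call, so newly published text, including text about the bot itself, silently changes future outputs.

```python
# Hypothetical illustration of "the web as global mutable state" for
# retrieval-augmented generation. search_web and llm_answer are stubs.

WEB_INDEX: list[str] = [
    "Avatar 2 is scheduled for release in December 2022.",
]

def search_web(query: str) -> list[str]:
    """Stand-in for a retrieval call: returns whatever the index holds *now*."""
    words = [w.strip("?").lower() for w in query.split()]
    return [doc for doc in WEB_INDEX if any(w in doc.lower() for w in words)]

def llm_answer(question: str, context: list[str]) -> str:
    """Stand-in for a model call conditioned on retrieved context."""
    return f"answer to {question!r} given {len(context)} document(s): {context}"

question = "When is the release date of Avatar 2?"
first_answer = llm_answer(question, search_web(question))

# Someone publishes an article about the bot's own earlier answer...
WEB_INDEX.append("News: a chatbot gave a confusing answer about Avatar 2.")

# ...and the very same question now retrieves different context.
second_answer = llm_answer(question, search_web(question))
assert first_answer != second_answer   # global mutable state in action
print(first_answer)
print(second_answer)
```

Reproducing a result then means pinning not just the model weights and the prompt, but also a snapshot of what the retriever could see at the time.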

As the saying goes, "all robot demos lie," and people have discovered that all LLM demos lie as well. I think this is fundamentally unavoidable, because human attention is limited. What matters is assessing the type, size, and significance of the lies. Do they show how the model or robot generalizes? Do they mention how carefully the examples were chosen? These questions get harder once reality is involved. Messi looks like a good player right now, but "can he do it on a cold, rainy night in Stoke"?

What complicates the issue is that the answer to these questions isn’t always “no.” Messi could do it on a cold, rainy night in Stoke. He’s good enough. It makes the question difficult because getting a “yes” right is much more important than getting a “no” right. As LLMs get better, and as AI becomes more common in everyday life, we as a society need to get better and better at judging whether a model has proven itself. One of my main concerns about the future is that we’re not good at assessing whether a model has proven itself.

However, I expect roboticists to be ahead of the curve here. We were complaining about evaluation long before LLMs gamed the common benchmarks. We were struggling to get enough data to cover the long tail of autonomous driving long before "we need better data coverage" became a rallying cry for foundation-model pretraining teams. As machine learning really starts to work, all of us are going to run into the same walls of reality that everyone else has been dealing with for years. These challenges are surmountable, but they will be hard. Welcome to the real world. Welcome to the world of pain.

Original link: https://www.alexirpan.com/2024/07/08/tragedies-of-reality.html